30 Oct 2012

Packetpig on Amazon Elastic MapReduce

Packetpig can now be used with Amazon's Elastic MapReduce (EMR) for Big Data Security Analytics.

We've added some sugar around the EMR API to help you start running packet captures through our Packetpig User Defined Functions (UDFs) as easily as possible.

Let's start with a very basic example, pumping a set of captures through the supplied pig/examples/binning.pig.

'binning.pig' uses the PacketLoader UDF to extract IP and TCP/UDP attributes from each packet in each capture. If you look in the script, you'll see the full record format in the schema of its LOAD statement.
We want to extract all of these fields and store them in a CSV file for later analysis.
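
To give a feel for what that looks like, here is a stripped-down sketch in the same spirit (not a copy of binning.pig; the field list, paths and types below are illustrative assumptions, so check the script itself for the real schema and registration lines):

-- Illustrative sketch only: mirrors the shape of binning.pig rather than copying it.
-- The real script registers the Packetpig jars via an include and defines the full schema.
%default pcap 'captures/*.pcap'
%default output 'output/binning'

packets = LOAD '$pcap'
    USING com.packetloop.packetpig.loaders.pcap.packet.PacketLoader()
    AS (ts:long,
        ip_version:int, ip_proto:int,
        ip_src:chararray, ip_dst:chararray,
        tcp_sport:int, tcp_dport:int,
        udp_sport:int, udp_dport:int);

-- keep the attributes we care about and write them out as CSV
slim = FOREACH packets GENERATE ts, ip_src, ip_dst, ip_proto, tcp_sport, tcp_dport;
STORE slim INTO '$output' USING PigStorage(',');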

First, let's set up our credentials. Set these environment variables in your terminal.

export AWS_ACCESS_KEY_ID=[your key]
export AWS_SECRET_ACCESS_KEY=[your key]
export EMR_KEYPAIR=[name of key you create in ec2 console]
export EMR_KEYPAIR_PATH=[path to saved key you just created]
export EC2_REGION=us-west-1 (optional, defaults to us-east-1)

Now, run the job:

$ lib/run_emr -o s3://your-bucket/output/  \
-l s3://your-bucket/logs/ \
-f s3://packetpig/pig/examples/binning.pig \
-r s3://your-bucket/captures/ \
-w
...
Created job flow j-33QXAKHCEOXUO

Type lib/run_emr --help for more information, but for now: we specify the output dir with -o, the log dir with -l, the Pig script with -f and the read (capture) dir with -r.
-w specifies that we'd like to watch the job's progress.

After a while, you'll see the bootstrap process begin, some packages will be installed, and then Hadoop will start.

At this stage, an EC2 node has been spawned to run the Hadoop master and it's also where the mappers and reducers will run in this example.

It's boring to watch logs; it would be nicer if we could see more.
$ lib/run_emr -e
j-33QXAKHCEOXUO RUNNING david's pig jobflow
        Setup Pig      COMPLETED                   22s
        binning.pig    RUNNING                   3485s

$ lib/run_emr -x j-33QXAKHCEOXUO
Connect to http://localhost:9100/jobtracker.jsp - hit ctrl-c to stop the ssh forwarder

Do as it says and open localhost:9100 in your browser; you'll see the Hadoop job tracker, which is useful for gauging how well you've tuned your node type and node count.
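
The output mentions an SSH forwarder; if you prefer to set one up yourself, a plain SSH local port forward to the master node should do the same job. A hedged sketch, with a placeholder hostname and an assumed port mapping:

# Roughly what the forwarder does under the hood. The master's public DNS name
# comes from the EMR console; the port mapping assumes EMR's JobTracker UI port.
ssh -i "$EMR_KEYPAIR_PATH" -N \
    -L 9100:localhost:9100 \
    hadoop@ec2-xx-xx-xx-xx.us-west-1.compute.amazonaws.com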

In my case, I'm looking at 22.64% of mappers completed after 1h 14m. That's a bit slow!
The default is to run a single m1.large instance (4 cores).

$ lib/run_emr -o s3://your-packetpig-output/ \
-l s3://your-packetpig-logs/ \
-f s3://packetpig/pig/examples/binning.pig \
-r s3://yourbucket/captures/ \
-w -k 20 -t m1.xlarge
Created job flow j-38QAABHC3RXO7

Now we're looking at 20 m1.xlarge nodes == 80 cores.

If you change your mind about the job, you can easily terminate it like so:

$ lib/run_emr -d j-38QAABHC3RXO7

All the included Packetpig scripts in pig/examples are mirrored in s3://packetpig/pig/examples.
If you want to run your own, just change the -f argument to point to wherever your script is.
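
For example, assuming you've uploaded a script of your own (the bucket and script names below are placeholders):

$ lib/run_emr -o s3://your-bucket/output/ \
-l s3://your-bucket/logs/ \
-f s3://your-bucket/scripts/my_analysis.pig \
-r s3://your-bucket/captures/ \
-w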

Here's a video showing how you can use Packetpig and EMR to find Zero Days in past traffic.

28 Oct 2012

News and Updates

It's been a very busy couple of weeks at Packetloop leading up to our presentation last weekend at Ruxcon. A lot of work has gone into finalising the commercial release of Packetloop, as we continue to work with our Early Access users, understanding and incorporating their feedback into the platform. The feedback has been great, and it's exciting to watch new users explore and understand the user interface for the first time, and even more gratifying to see them understand and exploit the power of Packetloop.

We were lucky enough to be able to take the entire team to Ruxcon to support Michael Baker (our CTO) with his presentation, Finding Needles in Haystacks the Size of Countries, and I would estimate there were some 300 people in the room to watch it. I was interested to see the expressions on the faces of some of the country's best security professionals when Michael showed how we can easily process vast amounts of network packet captures (Big Data!) and use our tools to identify previously undetected Zero Day attacks.


Best of all was the stunned silence that came over the room when he showed a couple of visualisations that captured the power of the Packetloop/PacketPig tools, showing security data in a way that people had previously not considered. The entire presentation can be found here on Slideshare, but if you just want to see the visualisations in action, check them out here. Many thanks to those who took the time to come to the presentation, and to also come and seek us out later for further conversations.

Michael is also currently collaborating on a series of blog posts with the team at Hortonworks (whose founders include authors of Apache Pig). The topic of this series is the use of Pig to perform Big Data Security Analytics. Much of our work in this space has been done using our open source platform, PacketPig. The first of this series, co-written with Russell Jurney (@rjurney) and titled Big Data Security Part One: Introducing PacketPig, is a great read.

Given our participation in Ruxcon, we took the opportunity to sponsor the Risky.Biz podcast, which covered both the Breakpoint and Ruxcon conferences last week. Risky.Biz host Patrick Gray did an interview with Michael about Packetloop for the show. The entire interview can be heard here. During the interview and some subsequent conversations, Patrick posed a number of questions around the prospect of businesses uploading their internal data into the cloud for security processing. This is an interesting question, and is worthy of further discussion.

In essence, we are advocating a new data source, one that may be seen as higher risk due to its external, third-party nature. Obviously we have to offer a value proposition that outweighs this perceived risk. We believe that value proposition is the power of Big Data Security Analytics and the knowledge and intelligence it provides. We should, however, put some perspective around this risk by keeping in mind that companies already have data in the cloud, such as email, CRM and device logs, and that they use online applications for everything from project and document management to financial applications. These all store the resulting data in the cloud.

There are several ways you can mitigate these risks when using a cloud-based solution such as Packetloop, including:
  • Storing your full packet captures in your own S3 bucket and providing us the keys to process them (a quick example follows this list)
  • Sending us an encrypted drive with the source data
  • Implementing the Packetloop onsite appliance (see below).
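
As a quick sketch of the first option, uploading captures to a bucket you control can be done with a tool like s3cmd (the bucket name and paths are placeholders, and s3cmd is assumed to be configured with your AWS credentials):

$ s3cmd mb s3://your-captures-bucket
$ s3cmd sync ./captures/ s3://your-captures-bucket/captures/
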
Of course, you can also delete the data you have uploaded once we have processed it, but then you miss out on one of the key benefits of the Packetloop platform: the ability to search older stored full packet captures for previously undetected Zero Day attacks. An analogy for this sort of retrospective review of older data can be found in sport. Athletes' drug-test 'B' samples are kept for up to seven years and are retrospectively tested to see if the athlete was in fact using a drug that went previously undetected. This has had a lot of press lately with the downfall of a certain multiple Tour de France winner. Bruce Schneier wrote a great article for Wired this week about this very topic, and the power of being able to look into the past for answers. The parallels to Information Security are very real. It's our belief that over the next five years full packet captures will become the standard for logging and analysis of all data, including sensitive data, as this is currently the only way you can use existing IPS to produce indicators and warnings about potential threats, and it's the only way you can Play, Pause and Rewind your network data.

We do, however, acknowledge that for some organisations, their data classification or regulatory position will simply prevent them from using our cloud-based security analytics service. We understand this, and that is why we intend to create an on-premises appliance version of Packetloop straight after we release commercially in the cloud. Ultimately, we are trying to provide the best Big Data Security Analytics tools in the market, and we will let you choose your level of involvement.

21 Oct 2012

Finding Needles in Haystacks @ Ruxcon

Yesterday I was in Melbourne presenting "Finding Needles in Haystacks (the size of countries)" at Ruxcon. If you are looking for the latest version of the slides, they are here - [PDF] [Slideshare]. It was an awesome conference with high-quality presentations. Special thanks to Chris Spencer and the Ruxcon panel for selecting our CFP submission.
I was a little concerned about how it would be received, as 'Big Data' hasn't really penetrated the security world yet. However, that fear was soon dispelled, and I think our visualisations really helped to reinforce the concepts.


The Worldwide Attack Globe received a great response. It showed almost 1 million attacks over a 12-day period. This was a real-world dataset from an early customer of Packetloop's.


The Worldwide Attack Globe can also be used to show and filter different data types. In this example I demoed how Tor endpoints can be plotted on the globe and then zoomed in on a very persistent attacker from the Republic of Ireland.



One of the concepts I wanted to focus on was data fidelity. Big Data tooling makes it possible to maintain full fidelity at time scales from years down to minutes. Beyond that, it is sometimes seeing data in a different way, or seeing it animate, that triggers the discovery and the knowledge. This was shown in the 'Full HD - Play, Pause and Rewind' demonstration.



Thanks again to everyone who attended and filled Room 1. Also thanks to all those who took time out to chat with us and share ideas.

18 Oct 2012

Teaming up with Hortonworks for Packetpig Blog Series

Recently I connected with Russell Jurney (@rjurney) on Twitter after he posted a couple of tweets related to Packetpig. Russell works for Hortonworks, a Big Data platform company co-founded by Alan Gates, one of the developers of Pig and the author of Programming Pig.

I had been following Russell after reading his datasyndrome blog, which is an awesome reference for people keen to learn about Big Data and how Pig, Hadoop and NoSQL databases like Cassandra and MongoDB can be linked together in pipelines.

Soon after this, Russell asked if I would like to collaborate with him on a series of blog posts on the Hortonworks blog, the first of which came out recently.

So check out "Big Data Security Part One: Introducing Packetpig" - I hope you enjoy the post and the series!