6 Dec 2012

What is Big Data Security Analytics? Part One: Visualization


Recently we were showcased in GigaOM's article "6 ways big data is helping reinvent enterprise security". The article focused on Packetloop's visualization, and on how the nexus of Big Data and NoSQL makes such visualizations possible. This nexus is incredible and, I believe, will drive a lot of really awesome work. The ability to navigate large data sets effortlessly allows an analyst to explore and explain the data. However, I see this as only the first of a set of powerful features that Big Data Security Analytics will deliver.

So what is Big Data Security Analytics?

It's the delivery of knowledge and intelligence about security events with the highest possible fidelity and context. Knowledge and intelligence drive informed decisions based on real evidence, and those decisions increase the security effectiveness of an organisation. I also think it's important to view the subject with the size and speed benefits of Big Data Security Analytics removed. I was recently challenged to do this by Scott Crawford, a leading industry analyst, and although it was disarming at first, it forced me to explain our work in terms of the subjects in the figure below.


So I was keen to discuss these subjects in a series of blog posts that showcase the real effectiveness of Big Data Security Analytics and the problems Packetloop can solve, setting size and speed aside.

Visualization

For us, visualization is all about encoding complex, densely featured information so that the best pattern-matching system in the world can interpret it. The best pattern-matching system in this case is you! We all have difficulty interpreting novelty, outliers, anomalies and trends when faced with a spreadsheet, but as soon as we see the data visually it's easy. A classic example of this is Anscombe's Quartet: four data sets that have almost identical statistical properties (sum, mean, variance, etc.) and that also exhibit the same strong correlation between x and y. However, as soon as you visualize the same information, you can immediately see the differences between the datasets and the trends within each one.
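The quartet is easy to check for yourself. This Python sketch uses Anscombe's published values and the standard-library `statistics` module to show that, once rounded, all four datasets share the same means, variances and correlation. The `summarise` and `pearson` helpers are written here for illustration, and the rounding precision is an arbitrary choice: the exact figures differ slightly in the third decimal place.

```python
import math
import statistics

# Anscombe's Quartet (Anscombe, 1973): four x/y datasets with near-identical summary statistics.
X = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # x values shared by datasets I-III
QUARTET = {
    "I": (X, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def summarise(xs, ys):
    """Rounded summary statistics (sample variance) for one dataset."""
    return {
        "mean_x": statistics.mean(xs),
        "mean_y": round(statistics.mean(ys), 2),
        "var_x": statistics.variance(xs),
        "var_y": round(statistics.variance(ys), 1),
        "corr": round(pearson(xs, ys), 2),
    }


for name, (xs, ys) in QUARTET.items():
    print(name, summarise(xs, ys))
```

All four lines print the same summary, yet a scatter plot shows that II is a curve, III is a straight line with one outlier, and IV is determined by a single extreme point.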



http://visual.ly/anscombes-quartet
So how does this concept translate to security? In Packetloop's Threats module we have a stacked bar visualization. A great dataset with which to demonstrate and analyse this is DARPA98. Strip away the size of the dataset (3 months of time, 171K attacks and 64M packets) and it's a basic Low, Medium and High severity visualization. Low attacks are shown in light blue, High/Critical attacks in dark blue, and Medium attacks in a shade between the two.
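At its core, a severity stacked bar is a simple aggregation: bucket every event by day, then count events per severity within each bucket. The sketch below is a minimal, hypothetical version in Python; the `Event` record, its field names and the toy events are assumptions for illustration, not Packetloop's data model.

```python
from collections import Counter, defaultdict
from datetime import datetime
from typing import NamedTuple


class Event(NamedTuple):
    """Hypothetical attack event; fields are illustrative, not Packetloop's schema."""
    ts: datetime
    severity: str  # "Low", "Medium" or "High"
    attack: str


def stack_by_severity(events):
    """Return {day: Counter(severity -> count)}: one bar per day, one segment per severity."""
    bars = defaultdict(Counter)
    for e in events:
        bars[e.ts.date()][e.severity] += 1
    return dict(sorted(bars.items()))


# Made-up events, for illustration only.
events = [
    Event(datetime(1998, 6, 15, 9, 0), "Low", "probe-a"),
    Event(datetime(1998, 6, 15, 9, 5), "Medium", "probe-b"),
    Event(datetime(1998, 6, 15, 9, 6), "Low", "probe-a"),
    Event(datetime(1998, 6, 16, 14, 0), "High", "exploit-c"),
]

for day, segments in stack_by_severity(events).items():
    print(day, dict(segments))
```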


Even in this simple visualization, I can enable a single feature to help me find outliers. In this first example I am going to track "New Attacks" and see how that changes my analysis. New Attacks are attacks that have never been seen before.
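A "New Attacks" overlay can be computed by walking the events in time order and counting, per bucket, the attack types that appear for the first time. This Python sketch assumes "new" means an attack type's first appearance in the dataset and uses hypothetical `(timestamp, attack_name)` pairs; Packetloop's actual definition and storage may differ.

```python
from collections import Counter
from datetime import datetime


def new_attacks_per_day(events):
    """Count, per day, attack types that appear for the first time in the dataset.

    `events` is an iterable of (timestamp, attack_name) pairs, a hypothetical shape.
    """
    seen = set()
    new_counts = Counter()
    for ts, attack in sorted(events):  # chronological order, so "first seen" is well defined
        if attack not in seen:
            seen.add(attack)
            new_counts[ts.date()] += 1
    return new_counts


# Made-up events: two novel types on day one; one repeat plus two novel types on day two.
events = [
    (datetime(1998, 6, 14, 8, 0), "probe-a"),
    (datetime(1998, 6, 14, 9, 0), "probe-b"),
    (datetime(1998, 6, 15, 1, 0), "probe-a"),
    (datetime(1998, 6, 15, 2, 0), "finger-probe"),
    (datetime(1998, 6, 15, 2, 1), "ftp-probe"),
]
print(new_attacks_per_day(events))
```

Because the overlay is a separate series, it can be drawn against its own axis, independent of the bar heights.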


After hiding the Legend drawer, you can see that the Frequency of Attacks is plotted against the left-hand Y-axis (y1) and the number of New Attacks against the right-hand Y-axis (y2). Sharp spikes show periods of time when novel attacks were used against the organisation; these are the periods I would want to investigate in greater detail. In the figure below you can see a sharp spike in the "New Attacks" line on June 15th. What immediately piques my interest is that there are a number of new attacks in that period, yet the stacked bar is made up entirely of Low and Medium attacks. This is a classic example of Severity not always being a good indicator of attacker behaviour and tactics. The "New Attacks" line, however, clearly shows me that an individual or group of attackers is trying a number of vectors I haven't seen before within a relatively short period of time.
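Spotting a spike by eye works well, but the same idea can also be expressed as a simple rule. The sketch below flags any bucket whose New Attacks count exceeds the mean by more than `k` population standard deviations; the z-score rule, the threshold `k=2.0` and the daily counts are assumptions for illustration, not how Packetloop highlights spikes.

```python
import statistics


def spikes(series, k=2.0):
    """Return indices whose value exceeds mean + k * (population) standard deviation."""
    mu = statistics.mean(series)
    sigma = statistics.pstdev(series)
    return [i for i, v in enumerate(series) if v > mu + k * sigma]


# Made-up daily "New Attacks" counts; day 5 contains a burst of novel attacks.
daily_new = [0, 1, 0, 0, 1, 9, 0, 1, 0, 0]
print(spikes(daily_new))
```

A large spike also inflates the standard deviation, so on noisy data a median-based threshold may be more robust.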


So instead of viewing the information by Severity (Low, Medium and High), let's pivot the data and view the same information through the Attack lens. In the visualization below, each attack is allocated a colour and the height of each bar shows the frequency, or number of times, the attack was used. If we focus on the peak of New Attacks on Tuesday the 16th, the stacked bar hides the nature and complexity of that period, while the New Attacks line clearly shows there is interesting information hidden in there. Why is it hidden? In the first example, overlaying novel attacks exposed the fact that severity is not always a good indicator. When we switch to viewing by attack, we still see a large jump in new attacks, but they are dwarfed by the high number of other attacks that aren't new. Viewing a 3-month period with one bar per day simply hides the detail.
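Pivoting from the Severity lens to the Attack lens keeps the time buckets and changes only the field used to segment each bar. This hypothetical `stack_by` helper makes that explicit by taking both the bucketing and the segmenting functions as parameters; the tuple layout and events are made up for illustration.

```python
from collections import Counter, defaultdict
from datetime import datetime


def stack_by(events, key, bucket):
    """Bucket events in time with `bucket`, then segment each bar with `key`.

    Changing the lens (severity vs. attack type) only changes the `key` function.
    """
    bars = defaultdict(Counter)
    for e in events:
        bars[bucket(e)][key(e)] += 1
    return dict(bars)


def day_of(event):
    """Daily bucket for a (timestamp, severity, attack) tuple."""
    return event[0].date()


# Made-up events: (timestamp, severity, attack).
events = [
    (datetime(1998, 6, 16, 10, 1), "Low", "probe-a"),
    (datetime(1998, 6, 16, 10, 2), "Low", "probe-b"),
    (datetime(1998, 6, 16, 10, 3), "Medium", "probe-c"),
]

by_severity = stack_by(events, key=lambda e: e[1], bucket=day_of)
by_attack = stack_by(events, key=lambda e: e[2], bucket=day_of)
print(by_severity)
print(by_attack)
```

Both lenses produce bars of identical height; only the segmentation changes.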


Packetloop can zoom instantly from years of data down to a single minute. Zooming in from the 3-month view to the 24-hour view uncovers the detail: in a single hour there is a flurry of attacks that the organisation had never seen before.
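Zooming can be thought of as re-bucketing: the longer the time range, the wider each bar. The sketch below picks the coarsest bucket width that still yields a minimum number of bars, then aligns timestamps to that bucket. The bucket list, the `min_bars=24` threshold and both helper functions are hypothetical choices, not Packetloop's zoom levels.

```python
from datetime import datetime, timedelta

# Hypothetical zoom levels, coarsest first.
BUCKETS = [timedelta(days=1), timedelta(hours=1), timedelta(minutes=1)]


def choose_bucket(start, end, min_bars=24):
    """Return the coarsest bucket width giving at least `min_bars` bars over [start, end)."""
    span = end - start
    for width in BUCKETS:
        if span / width >= min_bars:  # timedelta / timedelta -> float
            return width
    return BUCKETS[-1]


def floor_to(ts, width, origin=datetime(1970, 1, 1)):
    """Align a timestamp to the start of its bucket (timedelta // timedelta -> int)."""
    return origin + ((ts - origin) // width) * width


print(choose_bucket(datetime(1998, 6, 1), datetime(1998, 9, 1)))           # 3 months
print(choose_bucket(datetime(1998, 6, 16), datetime(1998, 6, 17)))         # 24 hours
print(choose_bucket(datetime(1998, 6, 16, 10), datetime(1998, 6, 16, 10, 30)))  # 30 minutes
```

With these assumptions, the three ranges map to daily, hourly and per-minute bars respectively.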


Zooming in further, to just 30 minutes of the original 3 months, lays out each minute clearly. A mouseover shows the Severity, Attack Type and Frequency.


Below the main visualization is a series of data panels that show more detail than an annotation can provide. In this case, I can see that every instance of the new attack "Finger/execution attempt" originated from a single Source IP address (152.169.215.104). This is a relatively innocuous attack and is clearly being used for information gathering before an exploit is attempted.
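A data panel like this amounts to a group-by on one field for a chosen attack. The sketch below counts a single attack type per source IP; the tuple layout and helper name are hypothetical, and the events use RFC 5737 documentation addresses rather than the real DARPA98 IPs.

```python
from collections import Counter


def sources_for(events, attack):
    """Count events of a single attack type per source IP, most frequent first."""
    return Counter(src for name, src, _dst in events if name == attack).most_common()


# Made-up (attack, source, destination) tuples.
events = [
    ("finger-probe", "198.51.100.7", "192.0.2.10"),
    ("finger-probe", "198.51.100.7", "192.0.2.10"),
    ("finger-probe", "198.51.100.7", "192.0.2.10"),
    ("port-scan", "203.0.113.9", "192.0.2.10"),
]
print(sources_for(events, "finger-probe"))
```

A single entry in the result is the programmatic equivalent of noticing that every instance of an attack came from one address.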


So what other attacks has this Source IP address used, and what does this attacker's timeline look like, first within the current 30 minutes and then across the entire 3 months? The Advanced Filter lets you select the IP and filters all attack data down to events whose Source is 152.169.215.104.


Once filtered, you can see a dozen attacks, using numerous different attack vectors, against a single destination (172.16.112.50) within those 30 minutes. All could be classed as methods of gaining specific information about the target. So a lot of work is clearly going into enumerating the host, but as yet there is no smoking gun.
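A filter like this is essentially a conjunction of predicates over event fields. This sketch filters hypothetical `Event` records by source, destination and time range, then summarises the result the way the post does (number of events, distinct attack vectors, destinations). The field names, both helpers and the events are assumptions for illustration, not Packetloop's Advanced Filter.

```python
from datetime import datetime, timedelta
from typing import NamedTuple


class Event(NamedTuple):
    """Hypothetical event record; field names are illustrative."""
    ts: datetime
    attack: str
    src: str
    dst: str


def filter_events(events, src=None, dst=None, start=None, end=None):
    """Keep events matching every filter that is set; the time range is [start, end)."""
    return [
        e for e in events
        if (src is None or e.src == src)
        and (dst is None or e.dst == dst)
        and (start is None or e.ts >= start)
        and (end is None or e.ts < end)
    ]


def summarise(events):
    """Summarise a filtered view: event count, distinct attack vectors and destinations."""
    return {
        "events": len(events),
        "vectors": len({e.attack for e in events}),
        "destinations": sorted({e.dst for e in events}),
    }


# Made-up events using RFC 5737 documentation addresses.
t0 = datetime(1998, 6, 15, 10, 0)
events = [
    Event(t0, "finger-probe", "198.51.100.7", "192.0.2.10"),
    Event(t0 + timedelta(minutes=4), "ftp-probe", "198.51.100.7", "192.0.2.10"),
    Event(t0 + timedelta(minutes=9), "port-scan", "203.0.113.9", "192.0.2.20"),
    Event(t0 + timedelta(days=8), "ftp-warning", "198.51.100.7", "192.0.2.10"),
]

window = filter_events(events, src="198.51.100.7", start=t0, end=t0 + timedelta(minutes=30))
print(summarise(window))                                     # the 30-minute view
print(summarise(filter_events(events, src="198.51.100.7")))  # the whole timeline
```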


Zooming out shows the entire attack timeline for this attacker: 21 attacks using 10 different attack vectors over a period of 3 months.


Looking back at the filter, I can see that the attacker has hit two destinations, and by selecting one and then the other I get his attack timeline for each. What's interesting about the entire timeline is that I now start to see indicators and warnings related to FTP and warez, as well as actual exploit delivery in the form of x86 shellcode.
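Selecting each destination in turn is equivalent to splitting one attacker's events into a separate, time-ordered timeline per destination. The sketch below does that in one pass; the tuple layout, helper name and events are hypothetical, using RFC 5737 documentation addresses.

```python
from collections import defaultdict
from datetime import datetime


def timelines_by_destination(events, src):
    """Split one source's events into a chronologically sorted timeline per destination.

    `events` holds hypothetical (timestamp, attack, src, dst) tuples.
    """
    per_dst = defaultdict(list)
    for ts, attack, e_src, dst in sorted(events):
        if e_src == src:
            per_dst[dst].append((ts, attack))
    return dict(per_dst)


# Made-up events, deliberately out of order.
events = [
    (datetime(1998, 6, 22, 3, 0), "ftp-warning", "198.51.100.7", "192.0.2.10"),
    (datetime(1998, 6, 15, 10, 0), "finger-probe", "198.51.100.7", "192.0.2.10"),
    (datetime(1998, 7, 23, 0, 33), "dns-shellcode", "198.51.100.7", "192.0.2.20"),
    (datetime(1998, 6, 15, 11, 0), "port-scan", "203.0.113.9", "192.0.2.10"),
]

for dst, timeline in timelines_by_destination(events, "198.51.100.7").items():
    print(dst, [attack for _ts, attack in timeline])
```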


If we filter by the destination 172.16.112.50, we can see the attack timeline between the attacker and that specific victim: a lot of information gathering on June 15th, then a break of a week, then a number of warnings related to FTP and warez activity on June 22nd and 23rd.
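A break like this is easy to miss in a long timeline but straightforward to compute: sort the attacker's event times and report consecutive pairs separated by more than a threshold. The `quiet_periods` helper, the three-day threshold and the timestamps below are made up for illustration.

```python
from datetime import datetime, timedelta


def quiet_periods(timestamps, min_gap=timedelta(days=3)):
    """Return (last_event_before, first_event_after) pairs separated by at least `min_gap`."""
    ts = sorted(timestamps)
    return [(a, b) for a, b in zip(ts, ts[1:]) if b - a >= min_gap]


# Made-up timeline: reconnaissance on one day, roughly a week of silence, then more activity.
timeline = [
    datetime(1998, 6, 15, 10, 0),
    datetime(1998, 6, 15, 10, 20),
    datetime(1998, 6, 22, 3, 0),
    datetime(1998, 6, 23, 4, 0),
]
for before, after in quiet_periods(timeline):
    print(f"quiet from {before} to {after} ({after - before})")
```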


If we filter by the destination 172.16.114.50, the timeline is completely empty except for the delivery of shellcode over DNS 3 times within a 7-minute period on July 23rd.


Zooming back into that exact time period shows when the attacks were delivered: two attacks at 12:33am and another occurrence of the same attack at 12:39am.
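Bursts like this can also be found programmatically. The sketch below uses a two-pointer sweep over sorted timestamps to find the largest number of events inside any window of a given length. The seven-minute window mirrors the example, but the `max_in_window` helper and its timestamps are hypothetical.

```python
from datetime import datetime, timedelta


def max_in_window(timestamps, window=timedelta(minutes=7)):
    """Largest number of events within any span of length `window` (sliding two-pointer sweep)."""
    ts = sorted(timestamps)
    best, left = 0, 0
    for right, t in enumerate(ts):
        while t - ts[left] > window:  # shrink from the left until the span fits
            left += 1
        best = max(best, right - left + 1)
    return best


# Made-up timestamps: two deliveries in the same minute, a third six minutes later.
deliveries = [
    datetime(1998, 7, 23, 0, 33, 5),
    datetime(1998, 7, 23, 0, 33, 40),
    datetime(1998, 7, 23, 0, 39, 10),
]
print(max_in_window(deliveries))
```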


I will post a video of this soon, but to give you a rough idea, this analysis took no more than a minute or two to perform against a dataset of 64 million packets spanning 3 months.

Summary

Big Data and NoSQL make it possible to process, store and query security event information at incredible size and scale. However, Big Data Security Analytics is the intelligence that can be gleaned from this data. This post demonstrated the strengths of visualization and of specific overlays (New Attacks) that allow you to explore and explain the data. The ability to change the lens and view the same attack data from different perspectives, and to zoom from years to minutes and back again, makes Big Data Security Analytics an extremely powerful analysis and intelligence tool.
