Weekly Intelligence Summary Lead Paragraph: 2015-06-26

The developments in InfoSec risk this week that almost certainly had the greatest impact on Verizon Enterprise clients were zero-day attacks on a new vulnerability in Adobe Flash Player and Adobe's release of a security bulletin.  Cisco issued a security advisory for its Virtual WSA, ESA, and SMA products because they ship with a common default SSH key for their remote support functionality. Neither Cisco nor the VCIC is aware of threat activity, but security infrastructure vulnerabilities are in a class by themselves for update management. Incomplete or absent strong authentication was a theme carried over from last week’s bogus report that OPM passwords were circulating on the deep web. This week, the OPM director told the US Congress the attackers used (static) login credentials stolen from a contractor to access federal employees’ data. Expedia warned its registered users to be on guard for phishing attempts after a partner hotel was successfully phished (losing static IDs/passwords), compromising the data of customers who had used Expedia to make reservations at that hotel. Citing leaks, re/code reported that SEC and Secret Service investigators pursuing the FIN4 threat actor have asked at least eight companies for details of their data breaches. FIN4, profiled by FireEye in December, used stolen credentials to access email accounts, mining insider information and abusing the victims’ email systems to dupe colleagues, peers, and customers. The email logins used a static user name and password for identification and authentication. Fortune is two-thirds of the way through an investigative report on last year’s Sony Pictures Entertainment (SPE) breach. Part 1, “Who was manning the ramparts at Sony Pictures?”, and Part 2, “The storm builds”, are candid accountings of the business and security cultures in the company. Part 3 is due Sunday. 
The VCIC collected reports especially useful for two segments of Verizon Enterprise clients: Microsoft published, The Latest Picture of the Threat Landscape in the European Union – part 1. And Websense published 2015 Industry Drill-Down Report: Financial Services (PDF) (blog summary).

Weekly Intelligence Summary Lead Paragraph: 2015-06-19

Palo Alto Networks’ Unit 42 may have been one of the most active security teams in the InfoSec space this week after it published reports on the Lotus Blossom campaign targeting Southeast Asian governments, Evilgrab malware being distributed via a strategic web compromise in Myanmar, and the KeyBase keylogger family of malware. Kaspersky also published some of its intel on Lotus Blossom, though it calls the actors responsible Spring Dragon. Media reports continued to surface regarding the Office of Personnel Management breach, and both Brian Krebs and Nextgov published timelines of key events leading up to and continuing after the compromise was announced. In other collections related to the OPM breach, opportunistic criminals attempted to latch onto the hype by selling a data dump of 23,000 .gov/.mil email addresses…stolen in 2013…from the Federal Prison Industries agency. In more current breach news, LastPass announced it suffered a breach of customer email addresses and master passwords at the hands of unknown threat actors. Robert Graham provides a good overview of why you shouldn’t be worried if you’re a LastPass user, unless you use a simplistic master password. Trend Micro reported that an Adobe Flash Player vulnerability (CVE-2015-3105) patched just last week was already being exploited by the Magnitude exploit kit. And forget steroid use, there’s a new crime in Major League Baseball and it’s called corporate espionage.

Mitigations Aren’t Effective After the First Six (A DBIR Attack Graph Analysis)


So that title, right?  A bit inflammatory.  Before I explain myself, let me step back and review what we’re doing.

If you haven’t, watch the first 2 minutes of this video or read about the DBIR attack graph here.

Now, our job as defenders is to make the paths attackers have available to them as long as possible.  The longer the path, the more expensive, harder, and more time-consuming the attack is.  Some attackers won’t be able to execute the attack any more. Others will look for easier targets.  (DBIR data shows most attacks are opportunistic.)  And those who do attack will be able to afford fewer attacks.

With the DBIR attack graph, we can test how much longer the shortest path gets with each mitigation.

What to Mitigate

In the figure below, we see just that.  What we see is that after the first six mitigations, mitigating further actions or attributes simply doesn’t increase the shortest path available to hackers.

So what are the first six?

  1. Prevent Software Installation (e.g. application whitelisting)
  2. Deny the attacker the ability to execute a Denial of Service attack (e.g. DDoS protection)
  3. Deny the attacker command and control of their malware (e.g. outbound filtering)
  4. Prevent phishing (see the DBIR for suggestions)
  5. Mitigate payment information (e.g. don’t store credit card data if you don’t have to)
  6. Prevent the use of stolen credentials (e.g. two factor authentication)

Now don’t get me wrong, you can mitigate more if you want.  And it’s likely some attackers, who lack the knowledge or means to take the shortest path but are able to take others, will be affected by other mitigations.  However, observability, signalling, and threat actor uniqueness will need to wait for another blog.

And if that’s all you want to know, no need to read the rest of the blog!

The Detail

You’re still with us?  Good.

The way we are doing this analysis is somewhat interesting.  We generate all paths between a pair of nodes (in this case ‘start’ and ‘end’) up to a certain length and order the paths by length.  We then look for the node that mitigates the most of the shortest paths (i.e. a node that is in the shortest path, the next shortest, the next shortest, and so on).  We remove that node and retest the new shortest path length.  We repeat this until there are no more paths from start to end.

The figure above includes the 9 patterns as well.  (Note: there’s a bug in nvd3 that causes errors if points are exactly on top of each other.  The fix is in nvd3 1.8, which we will upgrade to when it comes out.  Until then, if you aren’t able to select points, just reload the page, or the iframe page specifically.)  Some of the patterns are defined by specific actions or attributes, in which case mitigating those solves the pattern, leading to trivial solutions.

Notice this allowed mitigation of both actions and attributes.  It’s easy to understand how to mitigate an attacker action, such as blocking command and control.  Sometimes it’s harder to understand mitigating attributes.  However, things such as not storing payment data or strongly encrypting all credential data show that attributes can be mitigated within VERIS.

You can also analyze edges.  This represents, rather than completely mitigating an action, simply mitigating an attacker’s ability to progress from one action or attribute to another.  Mitigating a node removes all edges associated with that node, so mitigating edges is a much slower process.  It also has distinct steps of mitigation, similar to those seen in the bar chart in the DBIR Attack Graph Web App.  You can see the same chart as above, but with edges mitigated, in the figure below:

One of the reasons the improvements flatten out is that we are considering attacks on any attribute.  That means we are preparing for nation-state espionage attacks, malicious theft of documents, simple disposal errors, and many other breaches.  The reality is, most organizations are not equally under threat from all types of breaches.  In future analyses I hope to show the benefit of focusing on specific attributes to mitigate.  By focusing on them, truly mitigating the risk hopefully becomes attainable.

It is worth noting that this is based off of the DBIR Attack Graph and so inherits all underlying assumptions about the creation of the graph and the role paths play.  Those can be reviewed in the DBIR Attack Graph: Redux! blog.

If you want to replicate the analysis, simply download the DBIR attack graphs (DBIR 2015, Crimeware Pattern, Cyber Espionage Pattern, Denial of Service Pattern, Lost and Stolen Assets Pattern, Miscellaneous Errors Pattern, Payment Card Skimmers Pattern, Point of Sale Pattern, Privilege Misuse Pattern, and the Web Applications Pattern).  Then use the code below to generate either the set of node mitigations or the set of edge mitigations.  Regardless of which you choose, you will first need to set up the environment.

 Node Mitigations:
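The node-mitigation script itself lives with the VERISAG code; as a stand-in, here is a minimal, dependency-free Python sketch of the greedy loop described above. The graph layout (dict-of-dicts with weights), the toy node names, and the function names are all illustrative assumptions, not the actual DBIR code:

```python
def all_paths(graph, node, end, path=None, dist=0.0, cutoff=6):
    """Enumerate simple node->end paths as (length, path) tuples, up to
    `cutoff` hops, in a dict-of-dicts weighted digraph."""
    path = path or [node]
    if node == end:
        yield dist, path
        return
    if len(path) > cutoff:
        return
    for nbr, w in graph.get(node, {}).items():
        if nbr not in path:
            yield from all_paths(graph, nbr, end, path + [nbr], dist + w, cutoff)

def remove_node(graph, node):
    """Drop a node and every edge touching it."""
    graph.pop(node, None)
    for nbrs in graph.values():
        nbrs.pop(node, None)

def node_mitigations(graph, start='start', end='end'):
    """Greedy loop from the post: among nodes on the shortest path, mitigate
    the one that also lies on the most remaining paths, then re-test."""
    mitigated = []
    while True:
        paths = sorted(all_paths(graph, start, end))
        if not paths:
            break
        shortest_len, shortest = paths[0]
        interior = [n for n in shortest if n not in (start, end)]
        if not interior:
            break  # a direct start->end edge can't be node-mitigated
        target = max(interior, key=lambda n: sum(n in p for _, p in paths))
        mitigated.append((target, shortest_len))
        remove_node(graph, target)
    return mitigated

# Toy graph: weights are 1/count, so smaller = more common.
toy = {
    'start': {'phish': 1.0, 'sqli': 2.0},
    'phish': {'creds': 1.0},
    'sqli': {'end': 3.0},
    'creds': {'end': 1.0},
}
order = node_mitigations(toy)
```

Each entry in `order` pairs the mitigated node with the shortest-path length at the time it was removed, so you can watch the attack getting longer round by round.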


Edge Mitigations:


Weekly Intelligence Summary Lead Paragraph: 2015-06-12

InfoSec intelligence collections this week were driven predominantly by targeted attacks, and fortunately some included actionable observables.  But the targeted attack on the Office of Personnel Management (OPM) reported last week is not among those with new actionable intelligence that the VCIC has high confidence is related to that breach. We did collect quality intelligence about OPM, especially from ThreatConnect’s follow-up to last week’s OPM-focused assessment, which was drawn from intelligence it published in February as related to the Anthem breach. A new variant of an old foe, Duqu 2, drove both quality and quantity of targeted attack collections.  Kaspersky was among the targeted companies and has the best, albeit technical, assessment with indicators.  CrySyS, the company that discovered the original Duqu, also published an assessment; Wired has a concise summary.  Der Spiegel, in German, reported the targeted attack on the Bundestag is continuing and speculates it may be very expensive to resolve; Reuters has a report in English. L’Express, in French (BBC report in English), reported the April attack on the TV5Monde network was the work of APT28 and not a Middle Eastern actor with affinity to ISIS, as was previously reported.  Trend Micro’s Rik Ferguson posted a concise blog entry skeptical of the theory that the TV5Monde attack was APT28’s work.  To-do lists should be refreshed to add deploying new Adobe Flash Player versions and then the new patches announced by Microsoft, including one for a zero-day exploited by Duqu 2.

The DBIR Attack Graph: Redux!

In a previous blog I introduced the idea of building an attack graph from the Verizon Data Breach Investigations Report data.  The wheels of industry did not stop with that blog post, and we have updates!

You may want to review the previous blog to get an idea of what’s going on.  The basic idea is to use VERIS data from the DBIR to build an attack graph showing us which attacker actions lead to compromise of which confidentiality, integrity, and availability attributes.  Nodes in the graph represent the various action and attribute enumerations within VERIS.


First, we have an updated methodology!  I wasn’t particularly excited about the need for a static mapping, so now we have a methodology that no longer requires it!  We accomplish this in two passes:

In the first pass, we look for breaches with a single action/attribute pairing.  From these breaches we can definitively conclude that the action leads to the attribute.  This forms the basic mapping that our static file provided last time.

In the second pass we use some sequential logic to actually build the graph.  We collect all actions and attributes in each record and form all potential action->attribute pairings.  We then process them in the following way:

  1. If an action->attribute relationship exists in the basic mappings, add it to the attack graph and remove it from the relationships list. (For example, if an action of social.Phishing and an attribute of integrity.credentials appeared alone in a breach during the first pass, that pairing is in the basic mapping.)
  2. If an action->attribute relationship is not in the basic mapping, delete it
  3. If actions are left with no mapping to an attribute, map them directly to all attributes in the record that did have a mapping from an action
  4. If multiple action->attribute pairs exist in the record, create backwards mappings from the attributes to the actions they were not mapped to.

In step four, we do not know which action->attribute pair happened first, so, to be safe, we create both backwards mappings.  This is, however, an area for improvement.

Once the logic is applied, we do a few other things.  We count the number of times each relationship and node is found, and we capture the type (action or attribute) and sub-type of each node.  Finally, we adjust the ‘weight’ attribute of the nodes and edges so that smaller means more common.  (This allows us to use shortest-path algorithms.)


Let’s take an example.  We start with a mapping file of social.Phishing -> integrity.alter behavior and malware.web drive-by -> confidentiality.data.personal.  A VERIS breach has actions social.Phishing, malware.web drive-by, and hacking.Mail command injection. It also has attributes of integrity.alter behavior and confidentiality.data.personal.  The action -> attribute pairs would be:

  • social.Phishing -> integrity.alter behavior
  • social.Phishing -> confidentiality.data.personal
  • malware.web drive-by -> integrity.alter behavior
  • malware.web drive-by -> confidentiality.data.personal
  • hacking.Mail command injection -> integrity.alter behavior
  • hacking.Mail command injection -> confidentiality.data.personal
  1. In step 1, social.Phishing -> integrity.alter behavior and malware.web drive-by -> confidentiality.data.personal exist in both the mapping and the record, so nodes for those actions/attributes are added to the graph (or their counts incremented) and edges for the relationships are added (or incremented)
  2. In step 2, all other pairs are removed because they were not in the mapping.
  3. In step 3, hacking.Mail command injection is left without any pairings, so it is created in the graph (or its count incremented), and edges from it to integrity.alter behavior and confidentiality.data.personal are added
  4. In step 4, because we don’t know which of the action->attribute pairs from step 1 occurred first, we create both potential backwards edges (with the hope that, as we process multiple breaches, the common backwards paths will appear more regularly than the ones which did not occur).  We create or increment edges for integrity.alter behavior -> malware.web drive-by and confidentiality.data.personal -> social.Phishing

We repeat this process for every breach in a large corpus, effectively treading and retreading the actions, attributes, and relationships which occur most often.
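The per-record logic above can be sketched in Python. The `Counter`-based bookkeeping, function names, and record representation here are illustrative assumptions; the real code lives in the VERISAG repository:

```python
from collections import Counter
from itertools import product

def process_record(actions, attributes, basic_map, nodes, edges):
    """Apply the four sequencing rules to one breach record.  `actions` and
    `attributes` are sets of VERIS enumerations, `basic_map` is the set of
    action->attribute pairs learned in pass one, and `nodes`/`edges` are
    Counters accumulated across the whole corpus."""
    for n in actions | attributes:
        nodes[n] += 1
    kept = set(product(actions, attributes)) & basic_map   # rules 1 and 2
    for pair in kept:
        edges[pair] += 1
    mapped_actions = {a for a, _ in kept}
    for a in actions - mapped_actions:                     # rule 3
        for _, attr in kept:
            edges[(a, attr)] += 1
    for act, attr in kept:                                 # rule 4: backwards edges
        for other_act, _ in kept:
            if other_act != act:
                edges[(attr, other_act)] += 1
    return kept

def edge_weights(edges):
    """Invert counts so more common relationships are 'shorter', which lets
    shortest-path algorithms follow the most-traveled attack routes."""
    return {pair: 1.0 / count for pair, count in edges.items()}

# The worked example from the post:
basic_map = {('social.Phishing', 'integrity.alter behavior'),
             ('malware.web drive-by', 'confidentiality.data.personal')}
nodes, edges = Counter(), Counter()
process_record({'social.Phishing', 'malware.web drive-by',
                'hacking.Mail command injection'},
               {'integrity.alter behavior', 'confidentiality.data.personal'},
               basic_map, nodes, edges)
```

Running the worked example reproduces steps 1 through 4: the two mapped pairs, the two edges from the unmapped Mail command injection action, and the two backwards edges.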

The Results

Full DBIR Corpus Attack Graph:

2015 DBIR Corpus Attack Graph:

Frankly, just looking at it, it looks a lot like the graphs from the first blog, just a bit blurrier.  It is nice to click around it a bit.  The real power comes when you start to analyze the graph. To do our analysis, we primarily focus on two tasks: finding the shortest path between two nodes and measuring the lengths of paths.  Shortest paths use the Dijkstra’s algorithm implementation from networkx; a path’s length is simply the sum of the edge weights along it.
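For the shortest-path piece, the networkx calls look roughly like this. The toy graph, its VERIS-style labels, and the weights are made up for illustration:

```python
import networkx as nx

# Toy attack graph; edge weights are 1/count, so more common
# relationships are "shorter".
G = nx.DiGraph()
G.add_weighted_edges_from([
    ('action.social.Phishing', 'attribute.integrity.Alter behavior', 0.2),
    ('attribute.integrity.Alter behavior', 'action.malware.C2', 0.5),
    ('action.malware.C2', 'attribute.confidentiality.Secrets', 0.4),
    ('action.hacking.SQLi', 'attribute.confidentiality.Secrets', 1.5),
])

# Dijkstra's algorithm over the 'weight' attribute (networkx default).
path = nx.dijkstra_path(G, 'action.social.Phishing',
                        'attribute.confidentiality.Secrets')
length = nx.dijkstra_path_length(G, 'action.social.Phishing',
                                 'attribute.confidentiality.Secrets')
```

Here the phishing-to-secrets route runs through altered behavior and C2, and its length is just the sum of the three edge weights.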

The Analysis – Single Graph, Multiple Paths

A few weeks back at BSides Nashville, Trey Ford challenged me to find a way to use the DBIR data to recommend specifically what organizations should fix.  He was looking for something that could take the DBIR data and recommend controls.  While we aren’t quite there, we’re very close.  With the DBIR attack graph, we are able to recommend the VERIS enumeration to mitigate and to quantify the effect of doing so.  Add a VERIS<->control mapping and we’ll be there!

What we will mitigate are either attacker actions, preventing them from taking the action, or the compromise of an attribute.  For example, implementation of two-factor authentication might mitigate either the compromise of confidentiality of credentials or the action of a hacker using the stolen credentials.

Picking an action or attribute to mitigate is harder than it sounds.  In reality, we can make educated guesses, but we cannot say definitively which would be best, because removing a node potentially mitigates multiple of the shortest paths between a pair of nodes.  Unfortunately, being sure you’ve found the best node to remove is very hard. Our first method for guessing is to calculate the shortest path between each action:attribute pair.  We can then guess that the node that lies on the highest number of those paths is a good candidate to mitigate, as taking it out removes the highest number of shortest paths.  This is also known as betweenness centrality.  Another method we can try is a variant of eigenvector centrality based on Google’s PageRank, modified to always jump to the action nodes when no other path exists.  Most means of identifying candidates to mitigate will be different methods for estimating centrality, modified to deal with action->attribute paths.
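The shortest-path counting method can be sketched as follows. `mitigation_candidates` is a hypothetical name, and restricting the tally to interior nodes is an assumption of this sketch (it is, in effect, betweenness centrality limited to action->attribute pairs):

```python
from collections import Counter

import networkx as nx

def mitigation_candidates(G, actions, attributes):
    """Tally how many shortest action->attribute paths each interior node
    sits on; the top node is the best candidate to mitigate."""
    tally = Counter()
    for a in actions:
        for t in attributes:
            try:
                path = nx.dijkstra_path(G, a, t)
            except (nx.NetworkXNoPath, nx.NodeNotFound):
                continue
            tally.update(path[1:-1])  # interior nodes only
    return tally.most_common()

# Toy demo: every shortest action->attribute path runs through 'm'.
G = nx.DiGraph()
G.add_weighted_edges_from([
    ('a1', 'm', 0.1), ('a2', 'm', 0.1),
    ('m', 't1', 0.1), ('m', 't2', 0.1),
    ('a1', 't1', 1.0),  # a longer direct route Dijkstra will avoid
])
candidates = mitigation_candidates(G, ['a1', 'a2'], ['t1', 't2'])
```

In the toy graph all four action:attribute pairs route through `m`, so it tops the candidate list.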

To measure the effectiveness, we will capture the shortest paths and their lengths before and after removing the node.  We can then compare which paths were removed by removing the node and measure how the distances increased in the remaining paths.  Below are some of the most effective examples from the DBIR.  The following analysis included all actions and all attributes.  It could just as easily have been run for a single action and all attributes if attempting to mitigate a specific action.  Alternately it could have been run for a single attribute and all actions if trying to prevent compromise of a specific attribute.

This should make the benefit of preventing unauthorized software installation clear.  It removes nearly 10% of potential attack paths and increases the length of the others by over 5%.

(This is a good time for a sidebar on what the length means.  In this case, length is roughly equivalent to the inverse of the number of times the relationship or enumeration was seen in our corpus.  It suffers from the same limitations as the DBIR; i.e. it is skewed toward the types of incidents our partners can provide.  While it doesn’t represent any specific absolute value, if we assume that how often an attack happens is correlated with how cheap and easy it is to accomplish, we can read the length as how likely an attack path is, or equivalently how expensive and hard it is for the attacker.  So in the case of removing the Phishing action above, attacks became roughly 2% less likely, or 2% more expensive and harder for the attacker to execute.)
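The before-and-after measurement described above might look like this in Python. The function name, the toy graph, and the exact aggregation (mean fractional increase) are illustrative assumptions:

```python
import networkx as nx

def mitigation_effect(G, node, pairs):
    """Shortest path lengths for each action->attribute pair before and
    after removing `node`: returns (fraction of paths eliminated, mean
    fractional length increase on the surviving paths)."""
    def lengths(graph):
        out = {}
        for a, t in pairs:
            try:
                out[(a, t)] = nx.dijkstra_path_length(graph, a, t)
            except (nx.NetworkXNoPath, nx.NodeNotFound):
                pass
        return out

    before = lengths(G)
    H = G.copy()
    H.remove_node(node)
    after = lengths(H)
    eliminated = 1.0 - len(after) / len(before)
    increases = [after[p] / before[p] - 1.0 for p in after]
    harder = sum(increases) / len(increases) if increases else 0.0
    return eliminated, harder

# Toy demo: removing 'm' forces a1 onto a much longer direct route.
G = nx.DiGraph()
G.add_weighted_edges_from([
    ('a1', 'm', 0.1), ('m', 't1', 0.1),
    ('a1', 't1', 1.0), ('a2', 't1', 0.5),
])
eliminated, harder = mitigation_effect(G, 'm', [('a1', 't1'), ('a2', 't1')])
```

In the toy case no path is eliminated outright, but the surviving paths get much longer on average, which is exactly the pair of numbers the blog reports per mitigation.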

The Analysis – Single Graph, Single Path

We may only be concerned about a single path, say from compromised credentials to use of stolen credentials.  In this case we don’t just want to mitigate the shortest path, but the next shortest path, and the next, and so-on, increasing cost to the attacker as much as possible.

When we run the analysis, we find that mitigating the direct path from phishing to compromise of an organization’s secrets causes an almost 9% increase in the attack difficulty.  Once the direct path is mitigated, we can mitigate either the use of stolen credentials or software installation to get another 8.5% increase.

Let’s say your primary concern is cyber-espionage.  We can rerun the analysis with a graph made solely of cyber-espionage breaches:

For cyber-espionage we see a significant difference.  Removing the direct link represents a 36% improvement, significantly more than in the overall DBIR.  We also notice that the nodes to mitigate have changed.  No longer are stolen credentials and software installation the targets; instead, we prevent phishing from altering behavior and prevent hackers from using backdoors or C2.

The Analysis – Multiple Graphs

The previous analysis methods dealt with comparing a graph to itself.  However, one key point of the 2015 DBIR is that different organizations experience different types of attacks.  The patterns clearly define 9 unique types of attacks and Figure 19 shows how different sub-sectors of industry bear closer resemblance to each other than to the rest of their industry.  As such, we may want to compare an attack graph built off of breaches that match the cyber-espionage pattern to the overall DBIR corpus.

Using the same two graphs as above, we will instead compare them to each other.  For each node and edge, we divide the value in the graph to analyze (cyber-espionage) by the value in the baseline graph (the overall 2015 DBIR), then invert the ratio, since smaller is better for the attacker.  We also have to list nodes and edges that are in one graph and not the other, as creating the cyber-espionage subset of the data may have eliminated nodes or edges from the graph entirely.

 It’s clear from this data that Phishing is significantly more important in the cyber-espionage pattern than in the greater corpus.  This agrees with the cyber-espionage section of the DBIR but goes beyond it to show us which edges are more common as well.  We can expand on this by looking at the differences in path lengths between the two graphs in the next section.
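The weight-ratio comparison just described can be sketched over plain dict-of-dicts graphs (the function name and toy edges are illustrative):

```python
def compare_graphs(analyzed, baseline):
    """Edge-weight ratios between two dict-of-dicts attack graphs.  Weights
    shrink as relationships get more common, so the ratio is inverted:
    values above 1 mean the edge matters more in `analyzed`."""
    a_edges = {(u, v): w for u, nbrs in analyzed.items() for v, w in nbrs.items()}
    b_edges = {(u, v): w for u, nbrs in baseline.items() for v, w in nbrs.items()}
    ratios = {e: b_edges[e] / w for e, w in a_edges.items() if e in b_edges}
    # Edges present in only one graph have to be reported separately.
    only_analyzed = sorted(e for e in a_edges if e not in b_edges)
    only_baseline = sorted(e for e in b_edges if e not in a_edges)
    return ratios, only_analyzed, only_baseline

# Toy demo: phishing is twice as common in the espionage subset.
espionage = {'phish': {'secrets': 0.5}, 'backdoor': {'secrets': 0.8}}
overall   = {'phish': {'secrets': 1.0}, 'rfi': {'personal': 0.3}}
ratios, only_esp, only_overall = compare_graphs(espionage, overall)
```

The shared `phish -> secrets` edge gets a ratio above 1 (more important in the espionage subset), while the edges unique to each graph are listed on the side.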

The Analysis – Multiple Graphs, Multiple Paths

This is actually very similar to a single graph with multiple paths, except that instead of comparing a ‘before’ and ‘after’ graph, we compare the graph to be analyzed and the baseline graph.  As we do so, we find which paths exist in one and not the other, as well as which of the shared paths are significantly more likely in one than in the other.  This helps us understand the unique paths attacks will take in the analyzed graph compared to the baseline.  We will continue the cyber-espionage analysis to see how its paths differ from the overall DBIR corpus:

By comparing the cyber-espionage attack graph and the 2015 DBIR, we clearly see what makes espionage unique.  Phishing leading to compromise of an organization’s secrets is 44% more likely in cyber-espionage than in the greater corpus.  We also see web drive-by leading to secrets taking the 4th spot with a 35% increase.  This helps pull into stark resolution what makes a cyber-espionage attack unique.

We can also analyze things that are unlikely in cyber-espionage compared to incidents overall.   We see how hacking and malware leading to repurposing of systems or personal information are simply not as important in cyber-espionage.  Remote File Inclusion (RFI) leading to the compromise of personal information is almost 5x less likely in cyber-espionage breaches than in general.  Knowing what is both more likely, as well as what is less likely, provides us robust information to plan our cyber defenses based on our threats.
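The per-pair path comparison can be sketched similarly with networkx (function name, toy graphs, and node labels are again illustrative):

```python
import networkx as nx

def compare_paths(analyzed, baseline, pairs):
    """Shortest-path length ratios per action->attribute pair between two
    attack graphs.  Ratios above 1 mean the path is 'shorter' (more likely)
    in `analyzed`; pairs reachable in only one graph are listed separately."""
    def length(G, a, t):
        try:
            return nx.dijkstra_path_length(G, a, t)
        except (nx.NetworkXNoPath, nx.NodeNotFound):
            return None

    ratios, only_analyzed, only_baseline = {}, [], []
    for a, t in pairs:
        la, lb = length(analyzed, a, t), length(baseline, a, t)
        if la is not None and lb is not None:
            ratios[(a, t)] = lb / la
        elif la is not None:
            only_analyzed.append((a, t))
        elif lb is not None:
            only_baseline.append((a, t))
    return ratios, only_analyzed, only_baseline

# Toy demo: phishing-to-secrets is twice as likely in the espionage graph,
# and the RFI-to-personal path exists only in the baseline.
esp = nx.DiGraph()
esp.add_weighted_edges_from([('phish', 'secrets', 0.5)])
base = nx.DiGraph()
base.add_weighted_edges_from([('phish', 'secrets', 1.0),
                              ('rfi', 'personal', 0.3)])
ratios, only_esp, only_base = compare_paths(
    esp, base, [('phish', 'secrets'), ('rfi', 'personal')])
```

This mirrors the findings above: the shared path gets a likelihood ratio, and paths like RFI-to-personal that barely appear in espionage show up in the baseline-only list.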


Hopefully this shows the potential of using breach data to make practical decisions to prevent specific threat actor actions or mitigate the effect of breach of specific attributes.  And to help you get started, you can pick up the scripts for building the attack graph from either VERIS JSON or verisr dataframes in the VERISAG repository!  Give it a try on the VCDB data set or start collecting VERIS breach data within your own organization and analyze it yourself!  And consider giving back to the community by becoming a DBIR partner so that next year, what you’ve learned helps shape our understanding for the industry!

A DBIR Attack Graph Web App!

A few weeks ago, I published a blog on using the DBIR to generate attack graphs.  It was meant to show how, in aggregate, VERIS data could be used not just for summary statistics, as in the DBIR, but also to create an attack graph identifying what paths attackers take.

While the blog laid out some potential avenues for analysis of the DBIR attack graph, doing that analysis isn’t for the faint of heart.  It takes an understanding of graphs and the types of algorithms that can be used with them.  I’m back in this blog telling you I’m hooking you up!

Presenting the: DBIR Attack Graph Web App!

To learn about the DBIR Attack Graph Web App, watch the above tutorial video, skip to the usage section of the video, or read on below!

What You’re Doing

The DBIR Attack Graph Web App is meant to make analyzing DBIR attack graphs simple enough that anyone can do it!  You only have two things to input:

  1. What worries you: This lets you subset the data down to just the industry or pattern you are interested in.  Behind the scenes, it is loading a graph derived from just that subset of the DBIR data.
  2. What you are trying to protect: This lets you choose what you want to stop.  If you are a technology company, you might choose protecting the confidentiality of your corporate secrets, copyrighted information, Internal information, and source code.  If you are a medical company, you may choose the confidentiality of your medical, personal, and payment data.  And if you are a web services company, you might choose the degradation or loss of availability and defacement (i.e. integrity of your web site).

That’s it!  Hit the ‘Analyze’ button and in a few seconds the back end will calculate which VERIS action or attribute is best to mitigate and, more importantly, will quantify the improvement for you!

Let me repeat that.  It will quantify the improvement for you.  This is an area we’ve been severely lacking in as we engineer our defenses against threats.  We can guess at how well mitigating a vulnerability or implementing a control will help us, but until this point, it was basically impossible to quantify.  That’s because implementing security controls is like locking the door to your house.  You can do it, but if you didn’t also close the window, you didn’t make breaking in any harder for the burglar.  The attack graph web app takes this into account as it quantifies the improvement.

Now, that’s not to say this analysis is perfect.  Many assumptions are made in making the graph as well as a few in the analysis.  The assumptions are documented here.  That said, this is significantly ahead of where we are today.  I look forward to getting to improve it even more to help organizations spend the least amount of resources to make attacks as hard as possible for the bad guys!

What You’re Seeing

To help make it easier to understand, we’ve also added visualizations.  When you select what worries you, a graph representing that subset of the DBIR data will be shown on the right.  The attack paths associated with it will be shown below.  Similarly, when you choose what you want to protect, the paths from all other attributes to the end node in the upper left will be grayed out, and only paths ending in the attributes you want to protect will be shown in the list of paths below.  Finally, when you analyze the data, the paths after analysis will appear in the list of attack paths, allowing you to compare the before and after states.

For Example

Let’s say you are an IT company operating a web service with a lot of customers and a lot of intellectual property, but no bank information.  You want to know how to plan your defenses.  In the “What worries you?” dialog, you could choose “Sector 51: Information”.  When you do this, you will see the graph on the right change to represent just breaches for the Information sector.  The bar chart below will also change to only represent paths within that graph.

Because you want to protect your customers, from the “What are you trying to protect?” dialog, you could choose “Credentials”, “Copyrighted”, “Digital certificate”, “Personal”, “Internal”, “Source code”, “System”, and “Secrets”.  When you choose those attributes, you will notice paths gray out in the graph.  This is because you have said not to consider paths from some attributes.  You will also see a subset of paths in the bar chart below.  These paths will be restricted to ending in the attributes you chose.

When you click “Analyze”, the server will figure out what to mitigate and will calculate the effect of that mitigation.  It will report back: “Mitigate action.hacking.variety.Use of stolen creds to eliminate 4.8% of attack paths and improve defenses on remaining paths by 11%.”  This means that by preventing the use of stolen credentials by the attackers through a control such as 2-factor authentication, you could remove 4.8% of attacker action->compromised attribute pairs you wanted to protect.  Of the 95.2% of action->attribute paths which still exist, in aggregate they are now 11% harder for attackers to accomplish.

In the bar chart at the bottom of the web app, you will now see the paths after mitigation.  Some paths will no longer be there, as they were eliminated.  Others will be longer because the path now takes more or different steps.  Mousing over the paths will provide the attacker action and associated compromised attribute.  We can see that the shortest path is no longer “Action.hacking.variety.Use of stolen creds -> Attribute.confidentiality.data.variety.Credentials” but is now “Action.hacking.variety.Brute force -> Attribute.confidentiality.data.variety.Credentials”.


The old adage that the defenders have to be right every time and the attackers only have to be right once still holds.  Any path will do for the attacker.  But not all paths are created equal.  Nor are all attackers.  This tool makes it possible to understand the holistic effect of mitigating different attacker actions or attribute compromises and hopefully helps organizations plan a better defense.


There is A LOT going on on the back end of the web app.  If you’d like to peel back the curtain and see truly how the sausage is made, head on over to this blog where I go into detail about how the attack graph is generated, how the analysis is conducted, and even where you can get the code to do the analysis yourself.  It includes the ability to load your own VERIS data as well as multiple analysis functions not implemented in the web app so I highly encourage those interested to give it a read!


Weekly Intelligence Summary Lead Paragraph: 2015-06-05

The VCIC collected several reports this week from promising new intelligence sources. CyberX Labs released an analysis on BlackEnergy 3, which it believes is a new variant in the BlackEnergy malware family capable of exfiltrating data from compromised systems. Blueliv published a report on both the Dyre and Dridex families of banking malware. The analysis includes metrics compiled during the company’s time studying the malware. While both sources are seemingly new and have an unestablished history, the VCIC looks forward to future reports from both vendors. Thursday, the Office of Personnel Management announced a data breach that exposed the personal information of over 4 million current and former federal employees. The VCIC is working diligently to collect actionable intelligence regarding this incident.  We continued to collect reports related to the data breach suffered by Germany’s parliament two weeks ago. Few details of the incident have been officially disclosed to the public, though anonymous sources close to the investigation report that the malware used in the attack was similar to code used by Russian threat actors in a previous attack against the German government. Speaking of attribution, ThreatConnect makes the case as to why adversary intelligence and attribution should be an important part of an organization’s security program. This week’s quarterly metrics report comes via Verisign, which published its DDoS Trends Report for Q1 2015. The VCIC highly recommends adding the ThreatConnect and Verisign publications to your reading lists.

The Other DBIR: Database Breach Investigations Report

As we all know, the DBIR is the Verizon Data Breach Investigations Report. But what about databases? This post will look at breaches of databases in the DBIR data set. We will do this by filtering the VERIS DBIR data using verisr. We’ll look at the Actors, Actions, Assets, and Attributes common to database breaches. To find the database-related breaches, we will filter for breaches where the asset breached was a database server. We can also look for those where the attacker’s action was SQL injection:
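As a rough illustration of the same filter outside verisr, here is a Python sketch over raw VERIS JSON. The enumeration names follow the VERIS schema, but the `data/` path and function names are assumptions of this sketch:

```python
import json
from glob import glob

def is_database_breach(incident):
    """True if a database server asset was affected or a hacking action used
    SQL injection (enumeration names per the VERIS schema)."""
    assets = [a.get('variety', '')
              for a in incident.get('asset', {}).get('assets', [])]
    hacking = incident.get('action', {}).get('hacking', {}).get('variety', [])
    return 'S - Database' in assets or 'SQLi' in hacking

def load_database_breaches(pattern='data/*.json'):
    """Load raw VERIS incident files and keep only database breaches.
    (The data/ glob pattern is an illustrative assumption.)"""
    keep = []
    for path in glob(pattern):
        with open(path) as f:
            incident = json.load(f)
        if is_database_breach(incident):
            keep.append(incident)
    return keep

# Illustrative minimal incident records:
db_incident = {'asset': {'assets': [{'variety': 'S - Database'}]}}
sqli_incident = {'action': {'hacking': {'variety': ['SQLi']}}}
desktop_incident = {'asset': {'assets': [{'variety': 'U - Desktop'}]},
                    'action': {'hacking': {'variety': ['Brute force']}}}
```

Either condition is enough to keep a record, matching the post's "database server asset or SQL injection action" filter.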

Initial inspection of our dataset shows 1,224 incidents across 11 years from almost 40 partners:
db breaches by year

Of those breaches, most have confirmed data disclosure (which makes complete sense for databases). The ones where data disclosure is not ‘Yes’ are basically all ‘Unknown’. While it’s an assumption, it’s a fairly good one that these breaches also disclosed data and that the auditing capabilities were simply not in place to tell.
db disclosure


The majority of actors attacking databases are external, and the majority of the external actors are organized crime with a financial motive. The highly visible attacks (for ideology, fun, or grudges) often get the press; however, they are in the minority among external actors and even rarer among internal actors.


Internal actors tend to be end users, though they exhibit the same financial motive.


The real fun comes when we analyze the actions of the hackers. We can’t discount, though, that Misuse, Social Engineering, and Physical actions make up sizable portions of the actions. I’ll point out why I left Malware out of that list below.



First, hacking. Nothing surprising here: lots of actors hacking databases using SQL injection, 90+% of the time via a web application. That said, you can’t discount that stolen credentials are used to compromise the database 50% of the time. You also can’t discount the backdoors and command shells, which lead us to Malware.



At first blush, it looks like malware is a significant hacking action, but not for the reason you might think. When you filter for incidents where action.malware includes “SQL injection”, you find only seven. Yes, seven incidents where the malware hacked the database. Instead, malware is primarily a facilitator for hacking actions.


Social, Misuse, and Physical

While hacking may be the main path in, about one sixth of database incidents involved a social action. Primarily this was pretexting (pretending to be someone else), either in person or by phone. There is also a good amount of bribery and the ever-present phishing.


Misuse is on par with social actions. If you’ve ever heard of a nurse looking up a celebrity’s medical records or a law enforcement officer caught looking up an ex-spouse’s police record, you’ve heard of misuse. As can be expected, it’s happening on the LAN and at the terminal.

And why should misuse be limited to insiders? Malicious actors can many times visit the victim’s public area and connect to the local network. In this case it’s not technically misuse, but it sure is similar.

Assets and Attributes

So we know it’s hacking or, to a lesser extent, a slew of other actions. But what is compromised and taken? To answer succinctly, web apps are compromised and confidentiality is breached. We see a large breach of integrity as well, however, like malware actions, most can be attributed to compromises of integrity used to breach confidentiality. (For example, breaching web app integrity to use SQLi to dump the database.) The breaches of availability can almost all be attributed to loss, where the database, or a portion thereof, was deleted.


We can, however, find some insights by looking deeper at the size of the breaches. A few interesting patterns emerge:

  • Personal Data:
    • Breaches of individual people’s data are more likely to be small, with a [long tail](https://securityblog.verizonenterprise.com/?p=6740). This includes breaches of bank records, credentials, and medical data.
    • This doesn’t apply to personal data (social security numbers), where there are even numbers of small, medium, and large breaches.
    • It also doesn’t apply to payment data (credit card data), where almost all breaches are in the tens to hundreds of thousands of records.
  • Corporate Data:
    • Much more distributed around the center of the range, in the thousands and tens of thousands of records, regardless of whether the data is internal, secrets, or copyrighted data.
    • Note: Corporate data is not as often stored in a database, so the sample sizes are small.


Timeline to Discovery

I’ll be honest, the timeline doesn’t look good. Compromise and exfiltration happen in hours and minutes respectively. Detection happens in months and is primarily external. (Caveat: This is over the entire timeframe. Over the years, we have seen improvements in discovery and so, hopefully, more recent database breaches are the ones caught faster.)


What Can You Do?

  1. Find all of your customer data. It’s easy to overlook basic credentials to unimportant websites. Don’t just focus on financial data, focus on _all_ customer data.
  2. Protect web apps from SQL injection. SQL injection is a very small subset of the DBIR data set; that said, most SQL breaches are hacking attacks on the web app. Look to trusted resources ([NIST](http://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-53r4.pdf), the [Council on Cyber Security](http://www.counciloncybersecurity.org/critical-controls/), or [SANS](https://www.sans.org/critical-security-controls/), for example) for controls which can help mitigate SQL injection risks in web apps.
  3. Audit your database. The only reason it should take months to catch a database breach is that either the correct logs weren’t collected or they were collected and not monitored. Better monitoring and response should fix that.
  4. And if you monitor your database, pay special attention to misuse. Databases present a significant temptation to even the most honest people. Letting them know you’re watching will help keep those temptations in check, and help you rectify them when they do happen before they become a breach that ends up in the DBIR.
  5. Consider loss prevention solutions which may detect/respond to the loss of information from your network.
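On recommendation 2, the standard control against SQL injection is parameterized queries. A minimal sketch using Python's built-in sqlite3 module (table, user, and input are made up) shows why string concatenation fails where parameter binding holds:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'alice@example.com')")

user_input = "alice' OR '1'='1"  # a classic injection attempt

# Vulnerable: concatenation lets the input rewrite the query itself.
vulnerable = conn.execute(
    "SELECT email FROM users WHERE name = '" + user_input + "'").fetchall()

# Safe: the driver binds the value as data, never as SQL.
safe = conn.execute(
    "SELECT email FROM users WHERE name = ?", (user_input,)).fetchall()

print(vulnerable)  # [('alice@example.com',)] -- the injection succeeded
print(safe)        # [] -- the literal string matches no user
```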

Weekly Intelligence Summary Lead Paragraph: 2015-05-29

The charming InfoSec nickname-of-the-week, courtesy of FireEye, was The Teenage Mutant Malvertiser Network. FireEye also reported a new point of sale malware variant, NitlovePOS.  The Verizon Cyber Intelligence Center continues to collect intelligence on criminal attacks on SOHO and consumer-grade network hardware.  Kafeine reported a new exploit kit dedicated to hardware attacks and CSRF Pharming.  The VCIC is on the lookout for links between hardware attacks and DoS attacks perpetrated by threat actor “DD4BC.” This week, Incapsula reported that actor has begun to target the payment industry.  Among the most detailed collections was ESET’s report on the Moose worm spreading on Linux consumer-grade hardware platforms with default passwords.  The data breach impacting the largest number of victims was of 10 million users of the Gaana music streaming site.  Yet another US-based healthcare organization reported a breach as Beacon Health Systems notified about 220K customers of a 15-month long breach that ended in January.  “Only” about 100K  taxpayers are among the victims of a data breach the US Internal Revenue Service reported this week shortly before we learned they’ve reduced cyber-security staff by 11% over the last four years.

Tweetalytics of DBIR-2015

Following the public release of DBIR 2015 on April 15, there was a flurry of activity on Twitter. Last year we looked at how the InfoSec community reacted to the Shellshock vulnerability on Twitter. Once again we turn to our favorite social media platform to assess the impact of the DBIR on the InfoSec world.

Data Size and Potential Reach of DBIR

We sampled a little over three thousand original tweets in the April 12 to May 15, 2015 timeframe. Add to that about 1,800 retweets, and we got a grand total of roughly forty-eight hundred tweets to work with. Not bad, considering that InfoSec is not exactly what most Twitter users have on their mind when they turn to Twitter. There were roughly two thousand users involved in all this activity. Almost all (98%) of the tweets were in English, which could mean that DBIR-related awareness is restricted to English-speaking countries (which would be disappointing). Alternately, InfoSec professionals in non-English-speaking countries may tend to be well versed in English and prefer it for tweeting about InfoSec. We suspect it is a little bit of both.

Not surprisingly, the tweet that got the most attention in terms of both retweets and favorites was the one announcing the release of the DBIR from our official corporate handle, VZEnterprise. That single tweet was favorited 72 times and retweeted 212 times, giving it a potential viewership of around 19 thousand (the number of followers of that account). If we simply add up the number of followers of all the users in our dataset, we get a grand total of roughly eleven and a half million potential viewers. This number may seem large, but it is not surprising considering the presence of some very high-profile corporate/media Twitter accounts like Forbes Technology and Engadget in our data. The single most influential personal Twitter account in terms of number of followers was that of Craig Brown, Ph.D., with around four hundred thousand followers. Who says nerds have no social lives?
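The "potential reach" figure above is a back-of-the-envelope sum of follower counts over unique accounts. A tiny sketch of that arithmetic (all handles and counts here are invented; the real dataset summed to roughly 11.5 million):

```python
tweets = [
    {"user": "VZEnterprise", "followers": 19000},
    {"user": "VZEnterprise", "followers": 19000},  # same account tweets twice
    {"user": "some_infosec_pro", "followers": 2500},
    {"user": "tech_outlet", "followers": 400000},
]

# Count each account once, no matter how many tweets it posted.
reach = sum({t["user"]: t["followers"] for t in tweets}.values())
print(reach)  # 421500
```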

Spread and Top-N Stats

Figure 1: Spread of Twitter activity related to DBIR 2015

Before looking at some of the Top-N stats, let us start by looking at how the news about the DBIR spread. Figure 1 shows the daily histogram of all the activity so far. Not surprisingly, the majority of the activity came immediately after the release. After all, the DBIR is probably the most anticipated InfoSec report of the year. An amusing pattern is the drop in activity over weekends; yes, we InfoSec folks need a break from work too :). The bump in activity around 5/12 is from the ‘DBIR Puzzle solved’ announcement.

Figure 2: Top 10 Hashtags

There are no surprises to be found in the top 10 hashtags shown in Figure 2. Also, Figure 3, the top 10 user accounts in terms of the number of DBIR-related tweets posted, shows that we (the Verizon accounts) don’t spam the InfoSec world, but rather rely on word of mouth for DBIR awareness. We do get a lot of mentions in DBIR tweets (are you surprised by this?), as can be seen in Figure 4.

Figure 3: Top 10 Active Users

Figure 4: Top 10 Users Mentioned

Text Analytics

We have kept it simple so far with some basic statistical measures, but things get complicated from here on. Put your nerd cap on and be prepared to be drowned in the sea of Text Mining/Analytics.

Figure 5: Word Cloud of Terms

We start off light by building a word cloud of terms found in the tweet texts. The first word cloud we built was not very informative because it was biased towards terms like ‘Verizon’, ‘Data’, and ‘Breach’. While we expected those to dominate, they didn’t really convey a whole lot of information. Figure 5 gives a much better picture after we filtered out these terms. Clearly we see two very popular themes from the DBIR: the cost-per-record model and the mobile malware analysis. But a word cloud is not the best way to visualize information in text; read on to find out more.
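The preprocessing behind a word cloud like Figure 5 boils down to counting terms and filtering out the dominant-but-uninformative ones. A hedged sketch in Python (the sample tweets and stop list are invented; the original analysis used R text-mining tooling):

```python
from collections import Counter
import re

tweets = [
    "Verizon DBIR: new cost per record model for data breach impact",
    "Mobile malware barely registers in the Verizon data breach report",
    "DBIR cost model challenges the flat cost per record estimate",
]
drop = {"verizon", "data", "breach", "dbir"}  # expected terms, filtered out

# Tokenize, lowercase, and keep only informative terms.
terms = Counter(
    w for t in tweets
    for w in re.findall(r"[a-z]+", t.lower())
    if w not in drop and len(w) > 3
)
print(terms.most_common(3))  # 'cost' and 'model'/'record' dominate
```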

Figure 6: Frequently Occurring Terms

Figure 6 shows the most frequently occurring terms in our dataset. The line width conveys the strength of association between two terms (i.e., how often they were found together in a tweet). We can also see the mobile malware analysis and cost model clusters forming in this figure. We also found a third cluster: tweets linking to DBIR takeaways published by the InfoSec industry. Isn’t this much better than the word cloud? Hold on, don’t get excited just yet; we have more…
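The association strength in Figure 6 is essentially a co-occurrence count: how often two terms appear in the same tweet. A minimal sketch with invented sample data:

```python
from collections import Counter
from itertools import combinations

# Each set holds the informative terms extracted from one tweet.
tweet_terms = [
    {"mobile", "malware", "overblown"},
    {"mobile", "malware", "dbir"},
    {"cost", "record", "model"},
    {"cost", "record", "dbir"},
]

pairs = Counter()
for terms in tweet_terms:
    # Sort so each unordered pair is counted under one canonical key.
    for a, b in combinations(sorted(terms), 2):
        pairs[(a, b)] += 1

print(pairs[("malware", "mobile")])  # 2: co-occurred in two tweets
print(pairs[("cost", "record")])     # 2
```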

OK, enough lightweight stuff; let’s dive into two heavyweight concepts of text mining: Latent Dirichlet Allocation and Latent Semantic Analysis. Latent Dirichlet Allocation, or LDA for short, helps us discover underlying topics in text, and Latent Semantic Analysis, or LSA for short, will help us identify patterns in our dataset.
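To make LDA concrete: the standard collapsed Gibbs sampler repeatedly resamples a topic for each token given every other assignment. This is an illustrative pure-Python toy (the post's actual models were built with R tooling; all documents and parameters here are made up):

```python
import random

def lda_gibbs(docs, n_topics, n_iter=100, alpha=0.1, beta=0.01, seed=42):
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    widx = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    # Random initial topic per token, plus the count tables the sampler
    # maintains: doc-topic, topic-word, and per-topic totals.
    z = [[rng.randrange(n_topics) for _ in d] for d in docs]
    ndk = [[0] * n_topics for _ in docs]
    nkw = [[0] * V for _ in range(n_topics)]
    nk = [0] * n_topics
    for di, d in enumerate(docs):
        for ti, w in enumerate(d):
            k = z[di][ti]
            ndk[di][k] += 1; nkw[k][widx[w]] += 1; nk[k] += 1
    for _ in range(n_iter):
        for di, d in enumerate(docs):
            for ti, w in enumerate(d):
                k, wi = z[di][ti], widx[w]
                # Remove this token, then resample its topic from the
                # conditional distribution given all other assignments.
                ndk[di][k] -= 1; nkw[k][wi] -= 1; nk[k] -= 1
                weights = [(ndk[di][j] + alpha) * (nkw[j][wi] + beta)
                           / (nk[j] + beta * V) for j in range(n_topics)]
                r = rng.random() * sum(weights)
                k = 0
                while r > weights[k] and k < n_topics - 1:
                    r -= weights[k]; k += 1
                z[di][ti] = k
                ndk[di][k] += 1; nkw[k][wi] += 1; nk[k] += 1
    return ndk, nkw, vocab

docs = [["cost", "record", "model"], ["cost", "model", "estimate"],
        ["mobile", "malware", "dbir"], ["mobile", "malware", "android"]]
ndk, nkw, vocab = lda_gibbs(docs, n_topics=2)
# Every token in each 3-word document ends up assigned to some topic.
print([sum(row) for row in ndk])  # [3, 3, 3, 3]
```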

Figure 7: Topics in DBIR tweets

For our topic modeling we sliced and diced our tweet terms, passed them through some clean-up procedures, built an LDA model, and plotted it in Figure 7. The five topics which emerged are plotted across time. This is perhaps a much better way to visualize the topics/clusters found in DBIR tweets and how they trended across time. We did tell you all this “advanced analytics” stuff was quite useful, didn’t we? Wait, we are not done just yet.

Figure 8: Multidimensional Scaling of Tweets

Next up, we introduce the concept of Multidimensional Scaling (MDS), a technique used to visualize high-dimensional data in a two-dimensional space. The dimensions in this case are the unique terms found in our tweets. Once again, we sliced and diced our data, passed it through a blender (well, a multidimensional scaler), and finally plotted tweets related to the same topics as our LDA in Figure 8. What’s going on here? There seem to be a few clusters, but the tweets are spread all over the space. Are the tweets really that semantically different from each other? Find out below…
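For the curious, classical MDS recovers low-dimensional coordinates from a pairwise distance matrix via double centering and an eigendecomposition. A hedged sketch (not the tooling used for the figures; the sample points are made up):

```python
import numpy as np

def classical_mds(D, k=2):
    """D: (n, n) matrix of pairwise Euclidean distances."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    B = -0.5 * J @ (D ** 2) @ J           # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)        # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:k]      # keep the k largest
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0))

# Points that really live in 2-D: MDS should reproduce their distances.
X = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0], [3.0, 4.0]])
D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
Y = classical_mds(D)
D2 = np.linalg.norm(Y[:, None] - Y[None, :], axis=-1)
print(np.allclose(D, D2))  # True: distances preserved up to rotation
```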

Figure 9: Multidimensional Scaling of Tweets after Semantic Analysis

To answer this question we turn to LSA. After performing LSA (which gives us a measure of how close the tweets are to each other semantically), and once again passing the results through our multidimensional scaler, we get Figure 9. Now this is much better. As it turns out, most tweets are indeed semantically similar to each other, which is not surprising considering we’re dealing with one large topic, the DBIR, a few sub-topics, and of course the 140-character limit. After all, how creative can one get in 140 characters? Wait, don’t answer that.

What is interesting is that, although we get distinct clusters using LSA + MDS, our topics didn’t segregate into these clusters like we had anticipated. In other words, with LDA we saw a very distinctive separation of topics, but with LSA + MDS, although most activity fell into semantically separate clusters, those clusters didn’t map to the topics. Perhaps that is because LSA and MDS are not really topic modeling techniques.

Graphing the DBIR Twitterverse

You didn’t think we were going to let you go that easily, did you? Finally, we present some graph analytics for your reading pleasure (well, more for viewing pleasure). Graph analysis is a very useful technique for looking at how various entities interact with each other.

Figure 10: Graph of Tweet Terms

Figure 10 shows the associations between frequent terms in our tweet dataset. It is remarkably similar to Figure 6, with the addition of a few more terms. The color of the edges connecting the terms denotes the cluster where that association was most observed. But this graph is just a repeat of what we have already seen, and as such not really interesting. Ready to be blown away?

Figure 11: Graph of Twitter Conversations related to DBIR

Figure 11 is the crowning glory of all of our Tweetalytics. What you see is a graph of all the conversations in our dataset. The blue circles are Twitter accounts (users), and the red circles are original tweets. There are three types of activity (i.e., edges) going on:

  1. A user tweeting a tweet.
  2. A user retweeting a tweet.
  3. A user retweeting another user.
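A graph like this can be sketched with plain adjacency sets (a graph library such as networkx would work equally well); node degree then hints at who sits in the dense central cluster. All handles and tweet ids below are invented:

```python
from collections import defaultdict

# (source, destination, edge kind) triples, mirroring the three edge
# types above: tweeting, retweeting a tweet, retweeting a user.
edges = [
    ("user:VZEnterprise", "tweet:dbir_release", "tweeted"),
    ("user:analyst_a", "tweet:dbir_release", "retweeted"),
    ("user:analyst_b", "tweet:dbir_release", "retweeted"),
    ("user:analyst_b", "user:analyst_a", "retweeted_user"),
    ("user:byod_bot", "tweet:byod_42", "tweeted"),  # isolated bot cluster
]

adj = defaultdict(set)
for src, dst, _kind in edges:
    adj[src].add(dst)
    adj[dst].add(src)

degree = {node: len(nbrs) for node, nbrs in adj.items()}
print(max(degree, key=degree.get))  # tweet:dbir_release -- the popular tweet
```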

For us, this was a fascinating view of the conversations that allowed us to quickly make some visual observations:

  • The huge cluster in the center is where the core of the interaction (in terms of retweeting) is happening.
  • In that center cluster, you see our most popular tweet and the corresponding corporate account that posted it.
  • Beyond that, there is a lot of other interaction going on, mostly among InfoSec-related accounts.
  • The ‘BYOD News’ bot account you see on the bottom right posted a total of 68 tweets, but it sits far from the main activity as it has no interactions with the traditional InfoSec accounts/tweets found in the center.
  • The second cluster, bottom center, is most probably a bot which tweets out material about books, reports, and such. Again, it is far from the center cluster and has no interactions with InfoSec tweets/users.
  • The outer ring is comprised of one-off tweets/retweets from user accounts that did not interact with the center cluster and had limited reach. In other words, over 75% of the users tweeting about the DBIR were doing so individually, sharing their interest in the DBIR even if they had no interaction with the more popular accounts in the center.

Muchas gracias to Gabe Bassett, fellow DBIR 2015 co-author and our resident graph analytics expert, for his tips and insights for this section.

Final Word

So what does all this mean in layman’s terms? (OK, you can take off your nerd cap now.) Apart from our love of Twitter as a platform to rant and rave about every silly personal detail of our lives, there are actually some very useful ways in which Twitter can be used for things like information distribution, customer engagement, event management, and brand awareness. Although the InfoSec community on Twitter is very small compared to the global Twitter user base, it is very vocal and very active. The DBIR aims to raise awareness about data breaches, what causes them, and how to prevent or mitigate them. Twitter can be used as one of the platforms to raise awareness of the DBIR itself, in addition to traditional means like news outlets, conferences, and presentations. And we did see data to confirm this. What is encouraging is that, more and more, industry professionals beyond the traditional InfoSec domains (based on the user accounts we saw in the dataset) are becoming aware of the issues related to data breaches and spreading the knowledge. In the process, they are broadening their own understanding of the problem. That is, after all, why we do the DBIR year after year.