The 2014 Data [In]Security Hall of Fame

[Image: 2014 Hall of Fame logo]
Ahh the holidays. A time when we think about goodwill towards our fellow man, exchanging gifts, and of course making lists! All the good boys and girls know that one of our projects here at Verizon Security Labs is the VERIS Community Database (VCDB), a free repository of breach incident data available to the public. As we go through the year adding cases to the dataset, we mark some of them as being “Hall of Fame” (HOF) candidates. So this year, instead of making yet another set of predictions of what to expect in 2015, we decided to review our nominees for the 2014 Data Security Hall of Fame.

Now, this is not intended to be a list of the biggest breaches, and not all of them are supposed to be funny. Think of this as our curated list of the most interesting data security events of 2014 in the VCDB.

The law fought the law…and the law won

The first story of 2014 destined for the HOF goes to an event that actually happened in 2013, but was reported nationwide in January of 2014. A county sheriff in West Virginia was going through a divorce and wanted to get information about his wife’s suspected new love interest. So naturally he put a keylogger on her computer … at work … on a computer belonging to the West Virginia Supreme Court. This incident made the HOF because we honestly don’t see a lot of incidents involving physical keyloggers and we don’t see many incidents where a law man is the threat actor. That makes this a very rare and unusual incident indeed.

Honorable mention goes to an incident that was reported in December of 2013. We had to disqualify it from the 2014 Hall of Fame because it wasn’t even reported in 2014, but it’s still an interesting read. The American Civil Liberties Union had been trying to get a copy of an FBI interrogation manual but could not due to the manual being classified. However, in an ironic turnabout, the document had been checked into the Library of Congress (thus making it a public document) by an FBI agent that was attempting to register a copyright of the work.

Bringing A New Meaning to “Brute Force”

Lost and stolen devices have proven to be a major concern for the Healthcare industry. In fact, 52% of the IT security incidents affecting Healthcare in the VCDB Explore Interface involve lost or stolen devices. Full disk encryption would have prevented the disclosure of data in almost all of these incidents.

We say “almost all” because of a story that made everyone immediately think of this comic. A doctor from Brigham and Women’s Hospital in Boston was robbed at gunpoint by two individuals who stole his mobile phone and laptop computer. The assailants tied the doctor to a tree and made him enter his password into the phone and laptop to get around the devices’ encryption.   So much for all those breach notification letters that say the criminals are after the value of the asset, not the data inside it.

Public displays of hacking

Website defacement is often used as a means of spreading political messages. Groups like the Syrian Electronic Army and various factions of Anonymous have been prolific hackers that spread messages in support of (or opposition to) governments around the world. Let’s be honest, though, website defacement is getting a little boring. One group decided to step up their game in August. A group of hackers calling themselves the Anti-Communist Party Hackers managed to take over a Chinese television station and began to place pro-democracy overlays on top of the live news. It took several hours to eject the hackers and the Chinese government spent days purging the Internet of images and discussion about the event.

Heads up to all those corporate big-wig phishers–Anon Ghost is watching you.  In March, the hacktivist group boasted about defacing a Yorkshire Banking site and striking back against fat cat bankers. The only problem was that the website they hacked turned out to be a phishing site that had been made to look like Yorkshire Bank.  Still, phishing must be lucrative or it wouldn’t be so popular–clearly the CEOs of phishing sites should beware.  Now that they’ve become part of the corporate establishment (dare we say the 1%’ers), they’re fair targets.

It’s easy to think of defacement as an important tool for the politically oppressed, but not all vandalism is the work of activists trying to spread a political message. In July the world’s  worst superhero, Florida Man, hacked a road construction sign and changed it to an obscene message. This epic act of hacking really came down to an unlocked panel that provided physical access to a keyboard and a weak or missing password on the configuration console.

Best blunders of 2014

Miscellaneous errors are the root cause of more security incidents in the VERIS Community Database than any other pattern. They account for nearly a quarter of the dataset. Publishing errors are the second most common variety of error behind those incidents. Most of the time these publishing errors are just cases of documents being posted on a website accidentally, but in 2014 we saw a different twist on publishing errors; a blunder so nice we saw it thrice!

This year during the media build-up for the Super Bowl, CBS News aired a segment on the physical security in place for the event. At one point footage was shown from inside the command center, and clearly displayed was the wifi SSID and password that they were using. A few months later the exact same set of circumstances played out with the World Cup in Brazil. And then the next month, it happened to the Los Angeles Police Department. This was hardly a new phenomenon, though. Back in 2012 reporters covering Prince William’s service in the Royal Air Force published photos of their wifi passwords and even some sensitive documents.

Another incident in the Oops! category is from when the White House accidentally emailed reporters talking points about the (at the time) classified CIA Torture report.  Now that the report has been released, we wonder if the talking points reflect any subsequent edits.

However, the award for biggest error of the year has to go to Emory University in Atlanta. Emory uses Microsoft System Center Configuration Manager (SCCM) to manage endpoint configuration and automate operating system deployments. Earlier this year, Emory’s SCCM server decided to reformat all university-owned machines and install a fresh copy of Windows 7 right before final exams. By the time anyone figured out that the server had initiated this action, it had already begun formatting itself. Full incident history over at The Wayback Machine.

Really, servers run so much faster without all that pesky data on them!

[Image: Format all the things]

Meanest insider of 2014

Insiders account for about 42% of the incidents in VCDB. Most of these incidents are errors, but when the action is on purpose it’s usually motivated by personal gain. To be sure, stealing from people is bad, but the meanest insider of 2014 goes to the woman who admitted she forged 1,300 mammogram reports because she had “personal issues that caused her to stop caring about her job.” When she fell behind in processing the stacks of mammogram films, her solution was to go into the hospital’s computer system, impersonate the doctors, and give each patient’s scan a clear reading. Sadly, patients whose positive cancer diagnoses were delayed bore the consequences in pain, suffering and shortened life spans.

[Image: "I stopped caring about my job" meme]

Most epic hack of 2014

Every year the Academy Awards saves the award for best picture for last, and even though this isn’t an awards show, we decided to do the same. Every year the hackers of the world produce so many truly epic hacks that it’s hard to pick a winner. And so, without further ado, here are the nominees for 2014’s most epic hack.

With all the attention being paid to Home Depot and the Sony hack in the latter part of 2014, it’s almost easy to forget that back in May eBay confirmed that it had been hacked and that 145,000,000 user credentials were compromised, making it the fourth biggest hack by record count of all time. Adding insult to injury, their website was defaced as well. The Syrian Electronic Army (SEA) claimed credit for the hack, forever cementing their position as one of the most successful hacking groups ever. SEA gained access to the eBay network by sending phishing messages to a system administrator. eBay stock took a big hit after the announcement, but has largely recovered and is back on the same trend line it has been following since the beginning of 2013.

[Image: eBay stock price]
The Home Depot hack may not have been the biggest by record count, but it is one of the largest breaches of payment card data ever recorded at 56,000,000 cards exposed. That’s bigger than the 40 million cards stolen from Target in 2013. Speaking of the Target breach, the attackers that breached Home Depot gained initial entry with stolen credentials from a third-party vendor that serviced Home Depot, a tactic also seen in the Target breach.

In December we learned that Sony Pictures Entertainment had been hacked. United States officials have blamed North Korea for launching the attack in retaliation for Sony’s release of the movie The Interview. The enormity of this hack is certain to be something we’ll be talking about for a long time. The attackers released movies onto the Internet along with internal email, salary data and employee health information, and also wiped data from Sony computers. Sony has gone on to cancel the release of the movie after the attackers made threats referencing September 11, 2001. This incident may become the poster child for worst-case (save for human injury or loss of life) impact of a data breach.

The final nominee goes to Clinkle, a startup in the mobile payments market. Although the size of the hack is nothing compared to the other nominees, it does hold the distinction of being hacked before it even launched. Hey, if you’re a 22 year old and someone hands you $25 million, the first thing you must do is take a selfie, right? That is what Clinkle CEO Lucas Duplan did, as evidenced by the picture that was leaked after the company was hacked.  Clinkle was supposed to be the next hot thing in mobile payments, but instead, names, phone numbers and profile pictures of users were released on the internet.  That doesn’t exactly inspire a lot of faith in a mobile payment provider.

Which one of these hacks should win the award for most epic hack of 2014? We can’t decide. Why don’t you tell us your choice by reaching out to us on Twitter: @vzdbir

Weekly Intelligence Summary Lead Paragraph: 2014-12-12

Let’s get Sony out of the way first. There has been no significant new actionable intelligence gathered regarding this breach. The folks at Risk Based Security have an excellent timeline of the Sony Pictures breach that’s full of details, some analysis and no hyperbole. Collections from Symantec and Blue Coat provided significant new intel about the Destover malware. We collected thoughtful analyses of the Sony breach from Scott Terban’s Krypt3ia blog and from an opinion piece by Ira Winkler and Araceli Gomes in IDG publications. In the rest of the world, InfoSec risk continued to evolve. Microsoft released seven bulletins, re-released two and assessed MS14-080, MS14-081 and MS14-082 as more likely to be exploited. Adobe reported attacks on a new vulnerability in Flash Player with a security bulletin and patch. Adobe also patched 21 vulnerabilities in Acrobat/Adobe Reader and ColdFusion. F-Secure expanded their analysis of Regin with two white papers. Another espionageware campaign targeting Russia, with some similarities to “Red October,” was the topic of one report on “The Inception Framework” by Blue Coat and another report by Kaspersky: Cloud Atlas. Perhaps they couldn’t agree on “Inception Atlas.”

I Made a Million Models

1 million, 185 thousand, 960 to be specific.  But let’s back up.

The Setup

The common thought is that to be able to wield machine learning models, you need three things:

  1. deep domain expertise
  2. rigorous scientific and statistical acumen
  3. technical computer skills

The idea is that someone will use their deep domain expertise to hypothesize combinations of features which can predict the desired variables.  They will then identify appropriate models to do so based on the features and the underlying data.  Finally, they will use their technical skills to train the model in some language such as R or Python.

But there’s another way.  The algorithms to build models in R and Python are refined to the point where appropriate data can simply be put in one end and the model comes out the other.  ‘Appropriate data’ is a bit of a qualification.  A given data set can be formatted to two or three versions which cover all potential models.  This can all be scripted so that the feature observations do not need to be massaged every time.

Once  the technical aspects are covered, correct machine learning algorithms must be picked to generate the model.  If your data supports basically any algorithm, there is less incentive to worry about picking the correct algorithm as they can all be run.  While this implies additional compute resources, the cost of compute is low and getting lower by the day.

Finally, domain knowledge supports picking the features.  Again, the high availability of compute resources means there is less need to pick features as all combinations can be tried.  It does help to try them in a responsible order.  Adding features to try in order of least correlation is a simple way of intelligently picking features.  There is still the need for domain knowledge to identify data sets and identify features to use from the data sets.  Future research will hopefully provide potential options for automatically generating features from data.
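
To make the ordering idea concrete, here is a minimal sketch in R of picking features in order of least correlation with the features already selected. The data frame name `features` and the helper name `order_features` are illustrative, not taken from the original script.

  # Order candidate features so each addition is the one least correlated
  # (on average) with the features already selected.
  order_features <- function(features) {
    cors <- abs(cor(features))
    diag(cors) <- NA
    # seed with the feature least correlated with everything else overall
    selected  <- names(which.min(colMeans(cors, na.rm = TRUE)))
    remaining <- setdiff(names(features), selected)
    while (length(remaining) > 0) {
      avg_cor <- sapply(remaining, function(f)
        mean(abs(cor(features[[f]], features[selected]))))
      selected  <- c(selected, names(which.min(avg_cor)))
      remaining <- setdiff(names(features), selected)
    }
    selected
  }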

The Payoff

To generate the million models, I used the Wisconsin Diagnostic Breast Cancer Data Set, as its use served a secondary purpose outside the scope of this blog post.  I wrote a script that has three phases (a sketch of the model loop follows the list):

  1. Set the random seed for repeatability and generate the training and test data sets such that they can support all models.
  2. Order features in order of increasing correlation with the already added features.
  3. Iterate through all combinations of the features for all models.  The model used, its sensitivity, specificity, features, and parameters are then saved to a results file.
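
For illustration, here is a minimal sketch of what such a loop can look like in R using the caret package, building on the order_features() helper sketched earlier. The data frame name wdbc, the diagnosis column, the method list and the output file name are assumptions rather than the original script, which covered many more model types and feature combinations.

  library(caret)   # wraps svm, knn, rf, gbm, nnet, glm and many other model types

  set.seed(1)                                    # phase 1: repeatability
  in_train  <- createDataPartition(wdbc$diagnosis, p = 0.7, list = FALSE)
  train_set <- wdbc[in_train, ]
  test_set  <- wdbc[-in_train, ]

  # phase 2: order features by increasing correlation with those already added
  ordered_feats <- order_features(train_set[, setdiff(names(train_set), "diagnosis")])

  model_methods <- c("svmRadial", "knn", "rf", "gbm", "nnet", "glm")  # illustrative subset
  results <- list()

  # phase 3: iterate feature combinations and model types, recording performance
  for (k in 2:6) {
    for (feats in combn(ordered_feats[1:8], k, simplify = FALSE)) {
      for (m in model_methods) {
        fit <- try(train(train_set[, feats], train_set$diagnosis, method = m,
                         tuneLength = 1, trControl = trainControl(method = "none")),
                   silent = TRUE)
        if (inherits(fit, "try-error")) next
        cm <- confusionMatrix(predict(fit, test_set[, feats]), test_set$diagnosis)
        results[[length(results) + 1]] <- data.frame(
          method      = m,
          features    = paste(feats, collapse = "+"),
          sensitivity = unname(cm$byClass["Sensitivity"]),
          specificity = unname(cm$byClass["Specificity"]))
      }
    }
  }
  write.csv(do.call(rbind, results), "model_results.csv", row.names = FALSE)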

All models were generated over roughly a week in R on a 2012 MacBook Pro with a 2.6 GHz Core i7 and 8 GB of RAM.  The model loop was run 93,530 times, generating 1,185,960 models while iterating through combinations of 13 of the 30 available features.  The following progression provides an idea of the models building over time:

[Figure: 11,700 Models]

[Figure: 1,440 Models]

[Figure: 400,000 Models]

[Figure: 1,185,960 Models]

As we analyze these plots, initially we find models with 100% sensitivity and 100% specificity relatively easily, but not both.  We notice a gap in the upper left corner, particularly with 100% specificity.  However, as we test more and more models, that gap shrinks.  Still, even in our final model set, we notice very few models with 100% specificity.  Interestingly, there are four support vector machines (SVM) with 100% sensitivity and specificity including one using just two features.

We can also analyze how the models perform individually:

[Figure: 11,700 Models, 50% Alpha]

[Figure: 400,000 Models, 50% Alpha]

[Figure: 1,185,960 Models, 50% Alpha]

To make the final plot more nuanced, we can turn the alpha down to two percent:

[Figure: 1,185,960 Models, 2% Alpha]

This provides some interesting insights into the model generation algorithms.  First, the circle third from the left on the first row is a random coin flip (CF) model, which is, understandably, centered around 50%/50% and never does particularly well, even across 90,000 iterations.  The Perceptron (P) model is very narrowly focused and, in fact, performs best of all the models other than the four 100% accurate SVMs.  Interestingly, K Nearest Neighbors (KNN) and Relevance Vector Machine (RVM) also perform similarly, with a stronger bias toward high sensitivity but general improvement in both sensitivity and specificity at the same time.  The RVM never reaches the accuracy of the KNN and Perceptron models.  Overall, bagging (with a partial least squares regression) seems to perform the best, though it never reaches the overall accuracy some of the other models obtain. It favors improving either sensitivity or specificity, but has many models which perform well in both.  The clustering of the Decision Tree (DT) and Boosting (GBM) models is quite interesting: rather than the general coverage of the other models, they clearly tend toward hot spots.  The Logistic Regression (LR) and Linear Model (LM) actually do quite well, including in comparison to the Artificial Neural Net (ANN).  This may be because the ANN does not always converge.  All three favor sensitivity or specificity in a model, but have few models which perform well in both.  The Naive Bayes (NB) appears to be biased toward high specificity, while the Random Forest (RF) tends to be biased toward high sensitivity.  The SVM is surprisingly thin in the middle.  The Robust Linear Model (RLM) is very thin as it does not perform with multiple features and is therefore left out of most models.

In general, perceptron models perform the best with multiple models with sensitivity of 100% and specificity of 98%.  The best model with 100% specificity is an ANN with 96% sensitivity followed by a perceptron with 94% sensitivity.  Most high-performing models have roughly 6-10 features. (Models up to 17 features were analyzed out of 30 possible features.)  Ultimately, 55,501 models were identified with 90% sensitivity and specificity or above.

Even though these were generated procedurally with separate training and testing data sets, the sheer volume may provide for completely random overfitting of both the training and test data.  We have a few options to address this.  We can use n-fold cross-validation to retrain the algorithm-feature set combinations which performed the best and then validate the continued performance of those pairings. We could have separated our data into three sets and run the best performing models only once on the final data set to validate the models.  We could use a method such as Elder Research’s ‘target shuffling’ simulation to establish likely distributions for random correlation.  This allows comparison of the actual model performance with a known random performance distribution.  We can even use the models which perform best under validation to create an ensemble model using multiple algorithms and feature sets to provide resilience against randomly occurring correlations.
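
As an example of the first option, here is a minimal sketch of re-checking one promising algorithm/feature-set pairing with 10-fold cross-validation in caret; the data frame and feature names are hypothetical, not the actual best performers.

  library(caret)

  # Re-train a promising pairing with 10-fold cross-validation instead of a single
  # train/test split, to check that its performance holds up.
  ctrl <- trainControl(method = "cv", number = 10, classProbs = TRUE,
                       summaryFunction = twoClassSummary)
  best_feats <- c("radius_mean", "texture_mean")        # hypothetical winning feature set
  cv_fit <- train(wdbc[, best_feats], wdbc$diagnosis, method = "svmRadial",
                  metric = "ROC", trControl = ctrl)
  cv_fit$results                                        # cross-validated ROC, Sens, Spec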

In conclusion, we have identified a method for transferring the cost and schedule of creating models away from our expensive resources (humans) to less expensive resources (compute clusters).  There is no reason that models could not be hunted this way and only the best ones kept, ultimately leading to better classification, hopefully of malice in the information systems we defend.

Making Informed Decisions by Using Meaningful Security Metrics

While security metrics are used in a number of ways, the ultimate purpose of security metrics is to support the decision-making process. Making informed decisions is key to effectively managing information security risk. Every year Verizon publishes the Data Breach Investigations Report (DBIR) to help businesses do exactly that: make informed decisions based upon real data analysis.

The DBIR is a great tool to understand the current state of information security on a strategic level. However, every organization must have a mechanism to measure its own “state of security” on an ongoing basis using internal security metrics.

The decision-making process may happen at different levels, for example, at an operational level and at an executive level. Tactical security metrics help decision-making at the operational level, whereas strategic security metrics support decision-making at the executive level in addition to the operational one.

While thinking about security metrics, one should keep the following in mind:

  • Security metrics must be meaningful (easily understood), actionable, accurate, timely, and provide leading indicators.
  • Metrics should show progress towards managing risk posture over time.
  • Security metrics may be qualitative as well as quantitative.

Many organizations struggle with creating good security metrics, especially for executive reporting. Following are some ideas to start designing metrics; a small sketch of computing one of them follows the list. Based upon the current maturity level of an organization and the availability of data, one can choose a subset of the following and slowly add additional metrics over time.

  • Patch management that includes percentage of coverage and mean time to patch.
  • Vulnerability management that includes percentage of coverage, mean time to fix vulnerabilities, and percentage of servers with no high risk vulnerabilities.
  • Incident management including mean time to discover and mean time between incidents.
  • Cost of information security as percentage of total IT budget, expense buckets (hardware/software, security payroll/training, consulting and professional services).
  • Effectiveness of awareness program, number/percentage of associates trained, awareness testing, retraining after discovering gaps.
  • Mock incident exercises, identified issues, reduction in discovered issues over time.
  • Asset management including known assets vs. discovered assets.
  • Identity management including number of service accounts, percentage of systems with local accounts, percentage with multi-factor authentication.
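
To make one of these concrete, here is a minimal sketch in R of computing two patch-management metrics, coverage and mean time to patch, from a hypothetical ticket export; the file name and column names are assumptions.

  # Hypothetical export: one row per host/patch with release and install dates
  # (installed_on is NA when the patch is still outstanding).
  patches <- read.csv("patch_tickets.csv", stringsAsFactors = FALSE)

  applied <- !is.na(patches$installed_on)

  # Mean time to patch, in days, over patches that have actually been applied
  mttp <- mean(as.numeric(as.Date(patches$installed_on[applied]) -
                          as.Date(patches$released_on[applied])))

  # Coverage: percentage of in-scope hosts with no outstanding patches
  hosts_behind <- unique(patches$host[!applied])
  coverage <- 100 * (1 - length(hosts_behind) / length(unique(patches$host)))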

Selection of relevant metrics is a challenge. Dividing this into smaller tasks can make this challenge easier and help improve security posture over time.

 


Weekly Intelligence Summary Lead Paragraph: 2014-12-05

It’s been a week since news of an incident at Sony Pictures began to surface and new reports collected in that timespan show the company suffered a significant breach. According to multiple accounts the individuals behind the attack stole a trove of data, including internal documents, employees’ personal data and yet-to-be released films. The FBI issued an advisory regarding wiper malware that may be connected to the incident. Kaspersky, Symantec and Trend Micro each published additional intelligence on “Destover.” The link to Sony remains unconfirmed. There’s also speculation that North Korea, motivated by an upcoming Sony Pictures film mocking supreme leader Kim Jong-un, is responsible for the attack. However, there’s no official confirmation from Sony, law enforcement or FireEye (whose services the company retained) regarding that speculation. The VCIC’s more actionable intelligence collections this week include Cylance’s report on Operation Cleaver, a suspected Iranian group responsible for attacking critical infrastructure around the globe, as well as FireEye’s report on a group using phishing to steal insider financial information. And Brian Krebs was at it again this week after he announced Bebe Stores, Inc. suffered a suspected payment card breach. Unfortunately, the year of the point of sale breach continues.

When is an Intelligence Feed Record New?

A common question we grapple with when evaluating intelligence feeds is “If I see the same observable twice, what does it mean?”  This is actually two questions in one: “Is my feed sending me the same observation multiple times?” and “Is the second observation an observation of a single incident or a new incident?”

These are both tough questions to answer.  In the first case, the intelligence feed may not provide any indicator of uniqueness per record, making it impossible to immediately tell if it is a duplicate or not.  The second question is even more complex.  Without significant context for the observation, there is no way to tell what caused it, which would indicate whether it was a second observation of a single incident or a new incident altogether.

Ultimately, whichever question is being asked, the actionable question is “Do I initiate new incident handling processes for the second record?”  This may mean adding it again to detection systems, resetting detection timers, scanning the network for the observable, etc.

Let’s rephrase the question as a statistical question: “At what point is it statistically unlikely that the second observation is related to the first?”  To answer this, we need to define what we mean by “what point”.  Effectively, the feature of our data is the time between occurrences of an observable in our intelligence feed.  As such, “what point” refers to the time between the observation of an observable and its next observation.  We’ll use “days” to measure this, though if your feeds are updated frequently enough you may prefer ‘hours’.

Calculation

In reality, this is a fairly simple question to answer if you have a historical data store of the intelligence stream. To build our data set we use the following steps (a sketch in R follows the list):

  1. Randomly sample the historical data store for a set number of observables, say 1000.
  2. Collect every observation of those observables for the intelligence feed.
  3. Sort the time series
  4. Calculate and store the number of days between each observation in a list.  This list will form our distribution of days between occurrences.
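
A minimal sketch of those steps in R, assuming the historical store is a data frame obs with one row per observation and columns observable and seen_on (both names are assumptions):

  set.seed(42)
  sampled <- sample(unique(obs$observable), 1000)             # step 1: random sample

  D <- unlist(lapply(sampled, function(o) {                   # steps 2-4
    seen <- sort(as.Date(obs$seen_on[obs$observable == o]))   # step 3: sort the series
    if (length(seen) < 2) return(NULL)
    as.numeric(diff(seen))                                    # days between observations
  }))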

Once we have this list of days, the answer would normally be to find the value 3 standard deviations from the mean of the distribution.  However, we have an issue.  Because our values are temporal, they are not independent.  (I.e. when the next observation occurs probably depends on the previous observation.)  We can see this in the data as a clear power law probability distribution:

[Figure: Distribution of Days Between Observation Occurrences]

This means the data is both long tailed and skewed.  As such the mean and standard deviation will not accurately represent the data.  (See Michael Roytman’s talk at bSidesLV for more information on long tailed distributions.)  Instead we use a robust estimate of scale.  We will use the τ estimate proposed by Maronna and Zamar in 2002 (Robust estimates of location and dispersion of high-dimensional datasets; Technometrics 44(4), 307–317).  In R, this code is available in the robustbase library.  If our distribution is stored in “D”, we can find our estimate of scale by running:

  • install.packages("robustbase")
  • library(robustbase)
  • scaleTau2(D)

(If you would prefer python, I have transcoded the function here.)

The other issue we need to address is the use of the mean.  The outliers would significantly influence the mean.  As such, we use the geometric median.  Since our data is one dimensional, the geometric median is the same as the standard median.

So to find our cutoff, we take:

  • threshold <- median(D) + 3 * scaleTau2(D)

Or, if you prefer python:

  • from scipy import stats as scistats
  • import numpy as np
  • threshold = np.median(D) + 3 * scaleTau2(D)

The below list provides the descriptive statistics for the distribution in the histogram above:

  • Samples : 444
  • Mean : 12.4414414414
  • Mode : [2, 93]
  • First Quartile : 3.0
  • Second Quartile/Median : 6.0
  • Third Quartile : 10.0
  • Minimum : 2
  • Maximum : 159
  • Variance : 397.633958283
  • Std. deviation : 19.9407612263
  • Skew : 3.64549527587
  • Kurtosis : 16.0680960044
  • Outlier Threshold : 21.0636588967

We see that both the mean and standard deviation are influenced by the outliers.  Using them to calculate the cutoff would give roughly 72, seven times the third quartile.  Instead, the Outlier Threshold of 21 provides a much more reasonable value.

Usage

Our use for the threshold is deciding when to consider an observation a new incident and when to treat it as a continuation of an existing incident.  With the threshold, it is easy.  If the number of days between observations of the observable is greater than the threshold, it is a new incident.  If not, it is a continuation of the old.  In addition, the threshold may provide clues about how long to keep looking for an observation after it has been reported on an intelligence feed.  Both usages provide a significant step forward in practical usage of intelligence feeds.
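
A minimal sketch of that triage decision, using the threshold of 21 days computed above (function and argument names are illustrative):

  # Decide whether a repeated observation opens a new incident or is folded into
  # the existing one, based on days elapsed since the last sighting.
  is_new_incident <- function(last_seen, now, threshold = 21) {
    as.numeric(as.Date(now) - as.Date(last_seen)) > threshold
  }

  is_new_incident("2014-10-01", "2014-12-05")   # TRUE: past the threshold, treat as new
  is_new_incident("2014-11-20", "2014-12-05")   # FALSE: continuation of the old incident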

Special thanks to Rob Bird and Allison Miller who helped with some of the sticky statistics.

Weekly Intelligence Summary Lead Paragraph: 2014-11-28

The “Regin” espionageware platform dominated risk intelligence collections. Mashable published a good general summary of Regin. But the risk is almost certainly greater from the latest Adobe Flash vulnerability for Verizon Enterprise clients. Adobe released an out-of-cycle security bulletin and patch for Flash Player after F-Secure discovered the new vulnerability attacking via the Angler exploit kit (EK). Angler was also in last week’s INTSUM for exploiting a vulnerability from the previous Flash security bulletin. Sony Pictures was the victim of the most significant data breach this week, resulting in the company deciding to take their network down after an extortion attempt.

A Thought Experiment about Shared Credentials

Earlier this year the following question was posed to us:

“What is more likely to get compromised by an external attacker? One account with a strong password shared by 5 people or 5 accounts with strong passwords known only individually?”

The instinctive reaction is to shout the evils of shared passwords, but the specific question raised the difficulty of providing an answer. Internal misuse and the accountability provided by unique user logins were not to be factored in.

Weekly Intelligence Summary Lead Paragraph: 2014-11-21

Tuesday, Microsoft released MS14-068 out-of-cycle to mitigate a vulnerability in Kerberos that could be exploited to take over Windows domains.  The severity of the impact of a successful attack drove our recommendation for a 30-day deployment and pre-planning for a much shorter fuse if risk changes.  We’ve been collecting all the reliable intelligence we can regarding last week’s MS14-066 (SChannel). We have no reports of threats in the wild for it.  We can’t say the same for Adobe’s Flash Player bulletin from last week because Kafeine from DontNeedCoffee.com discovered the Angler EK is exploiting one of the 15 vulnerabilities from the bulletin. And ESET reported one of the two vulnerabilities patched last week by MS14-064 (OLE) was being exploited through IE for a drive-by-download on an Alexa 11,000 news site. So both vulnerabilities are being exploited in the wild. It doesn’t appear that attack used malvertisements, but the risk that enterprise users will encounter a malvertisement continues to grow.  Lastline Labs reported that 1% of ads served online are malicious and Trend Micro reported the Flashpack EK in malvertisements dropping Zeus, Dofoil and CryptoWall Trojans. To our colleagues in the U.S., the VCIC extends our wishes for a happy Thanksgiving holiday and hope the only thing all our clients will see from us for the rest of November is next week’s INTSUM.

Twitter and Information Security awareness

Twitter is giving traditional media a run for its money in many aspects, especially when it comes to getting the news out. Over the last few years a common pattern has emerged where news breaks first over Twitter or a comparable social media platform only to be picked up later by traditional media such as TV/Radio/Newspapers. In fact, most of the traditional media powerhouses have started incorporating social media in their portfolio both as means of reaching a younger tech savvy audience as well as receiving information about events as soon as they appear on social media. Twitter is by far the most popular choice of social network for breaking news as well as subsequent user/community interactions and for official corporate accounts to interact with the user base at large. With this in mind we analyze and assess the impact of Twitter when it comes to raising awareness of critical Information Security-related events.

The questions we tried to answer were…

  • How effective is the Twitter platform when it comes to raising awareness about InfoSec in general, and high profile events in specific?
  • Can InfoSec professionals and organizations, who face a constant uphill battle to keep up with what now seems like an endless barrage of computer/network attacks, use Twitter to their advantage?

On the whole, 2014 has already seen some very high profile vulnerability disclosures as well as data breaches. The intent of this exercise was to do a cross-sectional study of one such high profile vulnerability disclosure, the ‘Shellshock’ vulnerability, which made its appearance on Twitter and mainstream media in late Sept 2014. We use this particular vulnerability disclosure for our study because it was a very high profile disclosure with the potential to impact a lot of easy targets on the web and intranets (don’t discount the internal threats!). Also, this disclosure came on the heels of another high profile vulnerability disclosure, ‘Heartbleed’.

Data at a glance

For this study we downloaded some 330,000-odd tweets using Twitter’s search API, spanning the several weeks during which this vulnerability disclosure generated the most activity both on social and mainstream media.
Due to the nature of Twitter conversations, where we have tweets, retweets, favorites, replies and combinations thereof, it was interesting to look at the overall conversation map. This allowed us to gain a high level understanding of the scope of the conversations and the interactions that happened in those conversations.

Figure 1: Conversation Map

Figure 1 shows a succinct view of the data we sampled. The numbers in the boxes show the number of tweets for that particular category (e.g. original tweets are tweets which are neither retweets nor replies). For comparison’s sake we also show what percentage each category takes up with respect to its parent and grand-parent categories. What is clear is that there was not a whole lot of interaction here apart from retweeting. There were not many back-and-forth conversations (replies and replies to replies < 3%), nor did many tweets get favorited (<10%). Of retweets, it was mostly the original tweets that tended to be retweeted rather than a subsequent reply to a tweet. All this pointed to the fact that the InfoSec Twitter crowd is a very close-knit, small crowd, but not very interactive, at least as far as discussing InfoSec events on the Twitter platform is concerned.
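
As an aside, the categorization behind a conversation map like Figure 1 can be reproduced with a few lines of R over the downloaded sample; the data frame tweets and its column names are assumptions about the schema, not the original analysis code.

  # Bucket each tweet as an original, a retweet, or a reply, then tally the shares.
  tweets$category <- ifelse(tweets$is_retweet, "retweet",
                     ifelse(!is.na(tweets$in_reply_to_status_id), "reply", "original"))
  table(tweets$category)
  round(100 * prop.table(table(tweets$category)), 1)   # percentage in each category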

To elaborate on how minuscule this activity is, as far as high-profile Twitter activities go, we compared it with the July 9th 2014 Soccer World Cup semi-final match between Germany and Brazil (which Germany won 7–1). That single match alone (lasting a little over 90 minutes) generated more than 35 million tweets with a peak rate of approximately 580,000 tweets/minute (more than our entire sample size spanning several weeks). Although we don’t have a similar Conversation Map of that activity, the sheer number of tweets compared to our data gave us a good idea of how insignificant our ‘Shellshock’ activity was compared to a high profile sports related activity.

Note : The numbers were calculated from the data we downloaded using the Twitter search API. The search API does not provide all tweets but a sample of them (anywhere from 1% to 40% without indicating the true volume). The assumption here was that the sampling distribution is true enough to the distribution found in the total amount of tweets related to ‘shellshock’.

Timeline and Activity Trend

The one argument that Twitter has going for it over traditional media is the speed with which news is received and propagated. There are various reasons behind such a claim, but let’s first see if there is any truth to this claim in the InfoSec world. Below we see two graphs. Figure 2 shows the per-day tweets and retweets related to ‘shellshock’ around the time when Shellshock dominated the InfoSec news world, and Figure 3 is the Google Trends data for searches for the phrase ‘shellshock’ during that same time period.

Figure 2: Timeline of Tweets Related to ‘shellshock’

Figure 3: Google Trend for ‘shellshock’

Immediately what we saw was a remarkable similarity in the two trends. The peak of the activity was on Sept 25th & 26th, gradually declining over the next two weeks. We also saw a repetitive pattern of a drop in activity over the weekends (the 27th Sat and 28th Sun, repeated on Oct 4th & 5th and again on the 11th & 12th). This was a clear indicator that the InfoSec community, and in general people who are interested in InfoSec, were monitoring/searching for more information about ‘Shellshock’ and at the same time also interacting and exchanging information about it on Twitter, but this activity was mostly happening during work-days.

If we look at the timeline of creation dates of the accounts who participated in this conversation in Figure 4, we don’t see a huge spike in new user creation near the timeline of the event, which tells us that quite a lot of seasoned Twitter pros were engaged in the conversation.

Figure 4: User Account Creation Timeline

Diversity

To analyze the diversity in the Twitter communication we looked at the Top 10 languages that were used to tweet in Figure 5 and the Top 10 user timezones in Figure 6.

Figure 5: Top 10 Languages Tweeted in

Figure 6: Top 10 Timezones in User Profiles

Unsurprisingly, English was the dominant language, with Japanese and Spanish coming in at a distant 2nd and 3rd. The United States of America (USA) and Europe dominated the top user timezones. This was expected given their high concentration of IT industries and, as a consequence, more awareness about InfoSec in these locations. What was surprising was how little Asia, APAC, South America, and Africa contributed to the conversation. Does this point to relatively lower InfoSec awareness in these regions? Or does it simply point to a lack of popularity of Twitter in these regions? Given that Twitter has a huge worldwide user base, we suspect the former more than the latter.

Twitter does provide a way for its users to individually geo-tag each tweet with a location, but out of the 330,000 or so tweets we sampled, fewer than 1,000 had this information available, so we didn’t dive deep into per-tweet location analysis.

References

Individual tweets can contain hashtags, references to external URLs, and mentions of individual users. Analyzing this information gives us valuable insights about the internal and external resources used in these tweets.

Figure 7: Top 10 Hashtags in All Tweets

Figure 7 shows the Top 10 hashtags found in all tweets. The ‘shellshock’ hashtag took a very comfortable 1st spot. What’s more, the previous high profile vulnerability ‘heartbleed’ also got a place in the Top 10. Because the Top 10 hashtags in just the unique tweets (sans retweets) show a similar distribution, we didn’t replicate that plot here. Figure 8 shows the top 10 URLs in all the tweets. It was a bit of a surprise to see a CNET URL getting the top spot. This was due to a single tweet that got retweeted 11K times to push that CNET URL to the top. That same tweet is mentioned at the start of this article. When we removed the retweets from the equation, the top URL spot went to a very detailed explanation of the vulnerability by Troy Hunt.

Figure 8: Top 10 URLs in all Tweets

Figure 9 shows the top 10 users who were mentioned in all the tweets. The user account ‘whsaito’ took the top spot on account of a single tweet that got retweeted almost 11K times. (The same one which has the CNET URL and which appears at the top of this article.)

Figure 9: Top 10 Users mentioned in all Tweets

Figure 10 shows the Top 10 active user accounts in terms of the number of tweets sent from those accounts. These accounts can be thought of as being most active in raising awareness about this vulnerability by tweeting about it multiple times.

Figure 10: Top 10 Active Twitter Accounts

To round out this discussion, we also present the Top 10 retweeted, replied-to, and favorited user accounts in terms of number of tweets in Figure 11.

Figure 11: Top 10 retweeted/replied/favorited user accounts

Interactions

We already looked at a high level interaction map in Figure 1. Now let’s dive into it a bit. For starters, let’s see how many followers our InfoSec Twitter users tend to have. In Figure 12 below we show a kernel density plot of the follower counts of each unique user involved in the data sample. What was very apparent is that the InfoSec crowd is not very popular amongst the Twitter user base. This is evident in most of the accounts having fewer than 1,000 followers. But there are a few high profile accounts which have followers in the millions (the InfoSec rock-stars).

Figure 12: Follower Counts of Users

Figure 13: Retweeted and Favorited Counts of Tweets

In Figure 13 above we show how many times unique tweets tend to be retweeted and favorited. This is a way to measure how popular InfoSec tweets tend to be. Most tweets don’t tend to be retweeted or favorited more than 100 times, a very small number, especially when compared to some of the high profile trending activity on Twitter. This is a pity as it again points to the lesser reach of InfoSec-related tweets.

Figure 14: Follower Counts v/s Retweet Counts and Favorite Counts

For an interesting comparison, we looked at the relationship between a user’s number of followers and whether it had any impact on a tweet being retweeted or favorited. Conventional wisdom would seem to suggest that there indeed should be some correlation there, i.e. the more followers, the more retweets or favorites, but we found evidence to the contrary. Figure 14 seems to suggest that the fewer followers you have, the more times your tweets get retweeted/favorited. One possible explanation for this contradiction is that we didn’t take the PageRank effect into account, i.e. a tweet being retweeted by someone who has lots more followers than the original user account. This was left as an exercise for the future.
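
A minimal sketch of that check in R and ggplot2, using assumed column names for the follower and retweet counts:

  # Rank correlation between an account's follower count and how often its
  # tweets were retweeted, computed over the unique (non-retweet) tweets.
  unique_tweets <- tweets[!tweets$is_retweet, ]
  cor(unique_tweets$followers_count, unique_tweets$retweet_count,
      method = "spearman", use = "complete.obs")

  library(ggplot2)
  ggplot(unique_tweets, aes(followers_count + 1, retweet_count + 1)) +
    geom_point(alpha = 0.1) +
    scale_x_log10() + scale_y_log10()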

Conclusion

So what did we learn? Twitter has the potential to reach vast numbers of users and organizations even outside the traditional InfoSec community to raise awareness about high profile security incidents or vulnerability disclosures. However, as things stand now, the InfoSec world is very close-knit and largely not popular outside its own sphere. InfoSec tweets about high profile events such as the ‘Shellshock’ vulnerability tend to reach a restricted and niche user base as opposed to, say, a high-profile sporting event such as a Soccer World Cup match or the Super Bowl. This was evident from the fact that most tweets related to Shellshock were seen, retweeted, replied to, and favorited by a very small fraction of the Twitter user base.

If InfoSec organizations and professionals want to use Twitter to their advantage, then they have their work cut out for them. They need to engage more people and organizations from outside the core InfoSec community in the conversation. The more the conversation, the more the awareness. InfoSec will never be as popular as soccer or the Super Bowl, but if it manages to attract enough attention from the non-InfoSec crowd, it would be a step in the right direction.

Technical Notes

  • For the interested, the data was downloaded using a Python script and Twitter’s search API.
  • It was analyzed and plotted using R & ggplot2.
  • An interesting continuation of this analysis would be to put the data in a Graph Structure to explore the conversations in more detail, and see if we can discover any interesting clusters in the conversations.
  • Another possible research path is performing text analytics on the data for finding clusters based on words occurring in the tweets.