Let’s get Sony out of the way first…again. The Guardians of Peace leaked more documents this week stolen during November’s breach. It doesn’t end there. The FBI officially pinned the attack on the government of North Korea. After weeks of listening to anonymous sources peg the attack on the North Korean regime as the US government played a game of will we or won’t we, the VCIC finally has an official statement it can cite regarding attribution. Other noteworthy collections this week include a report of a successful spear phishing attack against ICANN that resulted in a breach of its centralized zone data system, as well as a report of a payment card breach at Park-n-Fly. Sucuri published two blog posts on the SoakSoak malware campaign, which is responsible for compromising thousands of WordPress sites. Seeing as it’s the Christmas season, AlertLogic published an advisory on a “vulnerability” in Linux dubbed Grinch. SANS, Trend Micro and others recommend not losing sleep over it. ESET and Palo Alto Networks released their security predictions for 2015 and Dell SecureWorks published a report on the criminal underground. Be sure to add it to your reading list.
Now, this is not intended to be a list of the biggest breaches, and not all of them are supposed to be funny. Think of this as our curated list of the most interesting data security events of 2014 in the VCDB.
The law fought the law…and the law won
The first story of 2014 destined for the HOF goes to an event that actually happened in 2013, but was reported nationwide in January of 2014. A county sheriff in West Virginia was going through a divorce and wanted to get information about his wife’s suspected new love interest. So naturally he put a keylogger on her computer … at work … on a computer belonging to the West Virginia Supreme Court. This incident made the HOF because we honestly don’t see a lot of incidents involving physical keyloggers and we don’t see many incidents where a law man is the threat actor. That makes this a very rare and unusual incident indeed.
Honorable mention goes to an incident that was reported in December of 2013. We had to disqualify it from the 2014 Hall of Fame because it wasn’t even reported in 2014, but it’s still an interesting read. The American Civil Liberties Union had been trying to get a copy of an FBI interrogation manual but could not because the manual was classified. However, in an ironic turnabout, the document had been checked into the Library of Congress (thus making it a public document) by an FBI agent who was attempting to register a copyright on the work.
Bringing A New Meaning to “Brute Force”
Lost and stolen devices have proven to be a major concern for the Healthcare industry. In fact, 52% of the IT security incidents affecting Healthcare in the VCDB Explore Interface are lost or stolen devices. Full disk encryption would have prevented the disclosure of data in almost all of these incidents.
We say “almost all” because of a story that made everyone immediately think of this comic. A doctor from Brigham and Women’s Hospital in Boston was robbed at gunpoint by two individuals who stole his mobile phone and laptop computer. The assailants tied the doctor to a tree and made him enter his password into the phone and laptop to get around the devices’ encryption. So much for all those breach notification letters that say the criminals are after the value of the asset, not the data inside it.
Public displays of hacking
Website defacement is often used as a means of spreading political messages. Groups like the Syrian Electronic Army and various factions of Anonymous have been prolific hackers that spread messages in support of (or opposition to) governments around the world. Let’s be honest, though, website defacement is getting a little boring. One group decided to step up their game in August. A group of hackers calling themselves the Anti-Communist Party Hackers managed to take over a Chinese television station and began to place pro-democracy overlays on top of the live news. It took several hours to eject the hackers and the Chinese government spent days purging the Internet of images and discussion about the event.
Heads up to all those corporate big-wig phishers–Anon Ghost is watching you. In March, the hacktivist group boasted about defacing a Yorkshire Banking site and striking back against fat cat bankers. The only problem was that the website they hacked turned out to be a phishing site that had been made to look like Yorkshire Bank. Still, phishing must be lucrative or it wouldn’t be so popular–clearly the CEOs of phishing sites should beware. Now that they’ve become part of the corporate establishment (dare we say the 1%’ers), they’re fair targets.
It’s easy to think of defacement as an important tool for the politically oppressed, but not all vandalism is the work of activists trying to spread a political message. In July the world’s worst superhero, Florida Man, hacked a road construction sign and changed it to an obscene message. This epic act of hacking really came down to an unlocked panel that provided physical access to a keyboard and a weak or missing password on the configuration console.
Best blunders of 2014
Miscellaneous errors are the root cause of more security incidents in the VERIS Community Database than any other pattern. They account for nearly a quarter of the dataset. Publishing errors are the second most common variety of error accounting for security incidents. Most of the time these publishing errors are just cases of documents being posted on a website accidentally, but in 2014 we saw a different twist on publishing errors: a blunder so nice we saw it thrice!
This year, during the media build-up for the Super Bowl, CBS News aired a segment on the physical security in place for the event. At one point footage was shown from inside the command center, and clearly displayed was the wifi SSID and password that they were using. A few months later the exact same set of circumstances played out with the World Cup in Brazil. And then the next month, it happened to the Los Angeles Police Department. This was hardly a new phenomenon, though. Back in 2012, reporters covering Prince William’s service in the Royal Air Force published photos of their wifi passwords and even some sensitive documents.
Another incident in the Oops! category is from when the White House accidentally emailed reporters talking points about the (at the time) classified CIA Torture report. Now that the report has been released, we wonder if the talking points reflect any subsequent edits.
However, the award for biggest error of the year has to go to Emory University in Atlanta. Emory uses Microsoft System Center Configuration Manager (SCCM) to manage endpoint configuration and automate operating system deployments. Earlier this year, Emory’s SCCM server decided to reformat all university-owned machines and install a fresh copy of Windows 7 right before final exams. By the time anyone figured out that the server had initiated this action, it had already begun formatting itself. Full incident history over at The Wayback Machine.
Really, servers run so much faster without all that pesky data on them!
Meanest insider of 2014
Insiders account for about 42% of the incidents in VCDB. Most of these incidents are errors, but when the action is on purpose it’s usually motivated by personal gain. To be sure, stealing from people is bad, but the meanest insider of 2014 goes to the woman who admitted she forged 1300 mammogram reports because she had “personal issues that caused her to stop caring about her job.” When she fell behind in processing the stacks of mammogram films, her solution was to go into the hospital’s computer system, impersonate the doctors, and give each patient’s scan a clear reading. Sadly the result is that patients whose positive cancer diagnoses were delayed bore consequences in terms of pain, suffering and shortened life spans.
Most epic hack of 2014
Every year the Academy Awards saves the award for best picture for last, and even though this isn’t an awards show, we decided to do the same. Every year the hackers of the world produce so many truly epic hacks that it’s hard to pick a winner. And so, without further ado, here are the nominees for 2014’s most epic hack.
In December we learned that Sony Pictures Entertainment had been hacked. United States officials have blamed North Korea for launching the attack in retaliation for Sony’s release of the movie The Interview. The enormity of this hack is certain to be something we’ll be talking about for a long time. The attackers have released movies onto the Internet as well as internal email, salary data, employee health information, and also wiped data from Sony computers. Sony has gone on to cancel the release of the movie after the attackers made threats referencing September 11, 2001. This incident may become the poster child for worst-case (save for human injury or loss of life) impact of a data breach.
The final nominee goes to Clinkle, a startup in the mobile payments market. Although the size of the hack is nothing compared to the other nominees, it does hold the distinction of being hacked before it even launched. Hey, if you’re a 22-year-old and someone hands you $25 million, the first thing you must do is take a selfie, right? That is what Clinkle CEO Lucas Duplan did, as evidenced by the picture that was leaked after the company was hacked. Clinkle was supposed to be the next hot thing in mobile payments, but instead, names, phone numbers and profile pictures of users were released on the internet. That doesn’t exactly inspire a lot of faith in a mobile payment provider.
Which one of these hacks should win the award for most epic hack of 2014? We can’t decide. Why don’t you tell us your choice by reaching out to us on Twitter: @vzdbir
Let’s get Sony out of the way first. There has been no significant new actionable intelligence gathered regarding this breach. The folks at Risk Based Security have an excellent timeline of the Sony Pictures breach that’s full of details, some analysis and no hyperbole. Collections from Symantec and Blue Coat provided significant new intel about Destover malware. We collected thoughtful analyses of the Sony breach from Scott Terban’s Krypt3ia blog and from the opinion piece by Ira Winkler and Araceli Gomes in IDG publications. In the rest of the world, InfoSec risk continued to evolve. Microsoft released seven bulletins, re-released two and assessed MS14-080, MS14-081 and MS14-082 as more likely to be exploited. Adobe reported attacks on a new vulnerability in Flash Player with a security bulletin and patch. Adobe also patched 21 vulnerabilities in Acrobat/Adobe Reader and ColdFusion. F-Secure expanded their analysis of Regin with two white papers. Another espionageware campaign targeting Russia, with some similarities to “Red October,” was the topic of one report on “The Inception Framework” by Blue Coat and another report by Kaspersky: Cloud Atlas. Perhaps they couldn’t agree on “Inception Atlas.”
1 million, 185 thousand, 960 to be specific. But let’s back up.
The common thought is that to be able to wield machine learning models, you need three things:
- deep domain expertise
- rigorous scientific and statistical acumen
- technical computer skills
The idea is that someone will use their deep domain expertise to hypothesize combinations of features which can predict the desired variables. They will then identify appropriate models to do so based on the features and the underlying data. Finally, they will use their technical skills to train the model in some language such as R or Python.
But there’s another way. The algorithms to build models in R and Python are refined to the point where appropriate data can simply be put in one end and the model comes out the other. ‘Appropriate data’ is a bit of a qualification. A given data set can be formatted into two or three versions which cover all potential models. This can all be scripted so that the feature observations do not need to be massaged every time.
Once the technical aspects are covered, correct machine learning algorithms must be picked to generate the model. If your data supports basically any algorithm, there is less incentive to worry about picking the correct algorithm as they can all be run. While this implies additional compute resources, the cost of compute is low and getting lower by the day.
Finally, domain knowledge supports picking the features. Again, the high availability of compute resources means there is less need to pick features as all combinations can be tried. It does help to try them in a responsible order. Adding features to try in order of least correlation is a simple way of intelligently picking features. There is still the need for domain knowledge to identify data sets and identify features to use from the data sets. Future research will hopefully provide potential options for automatically generating features from data.
To generate the million models, I used the Wisconsin Diagnostic Breast Cancer Data Set, as its use served a secondary purpose outside the scope of this blog post. I wrote a script that has three phases:
- Set the random seed for repeatability and generate the training and test data sets such that they can support all models.
- Order features in order of increasing correlation with the already added features.
- Iterate through all combinations of the features for all models. The model used, its sensitivity, specificity, features, and parameters are then saved to a results file.
All models were generated over roughly a week in R on a 2012 MacBook Pro with a 2.6 GHz Core i7 and 8 GB of RAM. The model loop was run 93,530 times, generating 1,185,960 models while iterating through combinations of 13 of the 30 available features. The following progression provides an idea of the models building over time:
As we analyze these plots, initially we find models with 100% sensitivity and models with 100% specificity relatively easily, but not both. We notice a gap in the upper left corner, particularly with 100% specificity. However, as we test more and more models, that gap shrinks. Still, even in our final model set, we notice very few models with 100% specificity. Interestingly, there are four support vector machines (SVM) with 100% sensitivity and specificity, including one using just two features.
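The ordering and iteration phases above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the original R script: the helper names (`pearson`, `order_by_least_correlation`, `feature_combinations`), the toy columns, and the greedy rule (pick the feature whose strongest absolute correlation with the already chosen features is smallest) are all choices made for this example.

```python
import itertools
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def order_by_least_correlation(columns):
    """Greedily order feature names so that each new feature is the one whose
    strongest absolute correlation with the already chosen features is smallest."""
    remaining = sorted(columns)
    ordered = [remaining.pop(0)]  # arbitrary but deterministic starting feature
    while remaining:
        best = min(
            remaining,
            key=lambda name: max(
                abs(pearson(columns[name], columns[chosen])) for chosen in ordered
            ),
        )
        remaining.remove(best)
        ordered.append(best)
    return ordered


def feature_combinations(ordered, max_size):
    """Yield every feature subset up to max_size, smallest subsets first."""
    for size in range(1, max_size + 1):
        yield from itertools.combinations(ordered, size)


random.seed(42)  # fixed seed for repeatability
# Toy data (hypothetical): "b" nearly duplicates "a", while "c" is independent noise.
a = [random.random() for _ in range(200)]
columns = {
    "a": a,
    "b": [x + random.gauss(0, 0.01) for x in a],
    "c": [random.random() for _ in range(200)],
}
ordered = order_by_least_correlation(columns)
combos = list(feature_combinations(ordered, max_size=3))
```

In a full run, each tuple in `combos` would be handed to every model type in turn and the resulting sensitivity, specificity, and parameters appended to the results file.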
We can also analyze how the models perform individually:
To make the final plot more nuanced, we can turn the alpha down to two percent:
This provides some interesting insights into the model generation algorithms. First, the circle third from the left on the first row is a random coin flip (CF) model which is, understandably, centered around 50%/50% and never does particularly well, even across 90,000 iterations. The Perceptron (P) model is very narrowly focused and, in fact, performs best of all of the models other than the four 100% accurate SVMs. Interestingly enough, K Nearest Neighbors (KNN) and Relevance Vector Machine (RVM) also perform similarly, with a stronger bias to high sensitivity but general improvement in both sensitivity and specificity at the same time. The RVM never reaches the accuracy of the KNN and Perceptron models. Overall, bagging (with a partial least squares regression) seems to perform the best, though it never reaches the overall accuracy some of the other models obtain. It favors improving either sensitivity or specificity, but has many models which perform well in both. The clustering of the Decision Tree (DT) and Boosting (GBM) models is quite interesting. Rather than the general coverage of the other models, they clearly tend towards hot spots. The Logistic Regression (LR) and Linear Model (LM) actually do quite well, including in comparison to the Artificial Neural Net (ANN). This may be due to the fact that the ANN does not always converge. All three favor sensitivity or specificity in a model, but have few models which perform well in both. The Naive Bayes (NB) appears to be biased towards high specificity while the Random Forest (RF) tends to be biased towards high sensitivity. The SVM is surprisingly thin in the middle. The Robust Linear Model (RLM) is very thin as it does not perform with multiple features and is therefore left out of most models.
In general, perceptron models perform the best with multiple models with sensitivity of 100% and specificity of 98%. The best model with 100% specificity is an ANN with 96% sensitivity followed by a perceptron with 94% sensitivity. Most high-performing models have roughly 6-10 features. (Models up to 17 features were analyzed out of 30 possible features.) Ultimately, 55,501 models were identified with 90% sensitivity and specificity or above.
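For readers who want to reproduce the filtering step, here is a minimal Python sketch of how sensitivity and specificity can be computed from predictions and how a results list can be cut down to models at or above 90% on both. The function names, the record layout, and the sample values are assumptions for illustration; they are not taken from the original results file.

```python
def sensitivity_specificity(actual, predicted, positive=1):
    """Return (sensitivity, specificity) for binary labels.

    Sensitivity = true positives / all actual positives.
    Specificity = true negatives / all actual negatives.
    """
    tp = fn = tn = fp = 0
    for truth, guess in zip(actual, predicted):
        if truth == positive:
            if guess == positive:
                tp += 1
            else:
                fn += 1
        else:
            if guess == positive:
                fp += 1
            else:
                tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


def keep_strong(results, floor=0.90):
    """Keep only result records at or above the floor on both measures."""
    return [
        r for r in results
        if r["sensitivity"] >= floor and r["specificity"] >= floor
    ]


# Hypothetical labels: 4 malignant (1) and 6 benign (0) cases.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(actual, predicted)

# Hypothetical result records in the shape a results file might hold.
results = [
    {"model": "P", "sensitivity": 1.00, "specificity": 0.98},
    {"model": "CF", "sensitivity": 0.52, "specificity": 0.49},
    {"model": "ANN", "sensitivity": 0.96, "specificity": 1.00},
]
strong = keep_strong(results)
```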
Even though these were generated procedurally with separate training and testing data sets, the sheer volume may provide for completely random overfitting of both the training and test data. We have a few options to address this. We can use n-fold cross-validation to retrain the algorithm-feature set combinations which performed the best and then validate the continued performance of those pairings. We could have separated our data into three sets and run the best performing models only once on the final data set to validate the models. We could use a method such as Elder Research’s ‘target shuffling’ simulation to establish likely distributions for random correlation. This allows comparison of the actual model performance with a known random performance distribution. We can even use the models which perform best under validation to create an ensemble model using multiple algorithms and feature sets to provide resilience against randomly occurring correlations.
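The target shuffling idea can be sketched as follows. This is a simplified illustration and not Elder Research’s implementation: the toy one-feature "nearest class mean" model, the function names, and the choice to score on the training data are all assumptions made to keep the example short. The point is only the procedure: shuffle the labels, refit, record the score, and see how often chance alone matches the real result.

```python
import random


def fit_centroids(xs, ys):
    """'Train' a toy one-feature model: the mean of x for each class label."""
    means = {}
    for label in set(ys):
        values = [x for x, y in zip(xs, ys) if y == label]
        means[label] = sum(values) / len(values)
    return means


def accuracy(means, xs, ys):
    """Fraction of points whose nearest class mean matches their label."""
    correct = sum(
        1 for x, y in zip(xs, ys)
        if min(means, key=lambda label: abs(x - means[label])) == y
    )
    return correct / len(ys)


def target_shuffle(xs, ys, rounds=500, seed=0):
    """Return (actual score, share of shuffled-label runs scoring at least as well)."""
    rng = random.Random(seed)
    actual = accuracy(fit_centroids(xs, ys), xs, ys)
    labels = list(ys)
    at_least_as_good = 0
    for _ in range(rounds):
        rng.shuffle(labels)  # break any real link between features and target
        score = accuracy(fit_centroids(xs, labels), xs, labels)
        if score >= actual:
            at_least_as_good += 1
    return actual, at_least_as_good / rounds


# Toy data (hypothetical): two classes whose feature values are well separated.
data_rng = random.Random(1)
xs = [data_rng.gauss(0, 1) for _ in range(50)] + [data_rng.gauss(3, 1) for _ in range(50)]
ys = [0] * 50 + [1] * 50
actual_score, p_value = target_shuffle(xs, ys)
```

A small share of shuffled runs matching the real score suggests the model found more than a random correlation; with a million candidate models, the same comparison would need to be made against the best shuffled score, not a typical one.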
In conclusion, we have identified a method for transferring the cost and schedule of creating models away from our expensive resources (humans) to less expensive resources (compute clusters). There is no reason that models could not be hunted this way, with only the best ones kept, ultimately leading to better classification, hopefully of malice in the information systems we defend.
While security metrics are used in a number of ways, the ultimate purpose of security metrics is to support the decision-making process. Making informed decisions is key to effectively managing information security risk. Every year Verizon publishes the Data Breach Investigations Report (DBIR) to help businesses do exactly that: make informed decisions based upon real data analysis.
The DBIR is a great tool to understand the current state of information security on a strategic level. However, every organization must have a mechanism to measure its own “state of security” on an ongoing basis using internal security metrics.
The decision-making process may happen at different levels, for example, at an operational level and at an executive level. Tactical security metrics support decision-making at the operational level, whereas strategic security metrics support decision-making at the executive level in addition to operational security.
While thinking about security metrics, one should keep the following in mind:
- Security metrics must be meaningful (easily understood), actionable, accurate, timely, and provide leading indicators.
- Metrics should show progress towards managing risk posture over time.
- Security metrics may be qualitative as well as quantitative.
Many organizations struggle with creating good security metrics, especially for executive reporting. Following are some ideas to start designing metrics. Based upon current maturity level of an organization and availability of data, one can choose a subset of the following and slowly add additional metrics over time.
- Patch management that includes percentage of coverage and mean time to patch.
- Vulnerability management that includes percentage of coverage, mean time to fix vulnerabilities, and percentage of servers with no high risk vulnerabilities.
- Incident management including mean time to discover and mean time between incidents.
- Cost of information security as percentage of total IT budget, expense buckets (hardware/software, security payroll/training, consulting and professional services).
- Effectiveness of awareness program, number/percentage of associates trained, awareness testing, retraining after discovering gaps.
- Mock incident exercises, identified issues, reduction in discovered issues over time.
- Asset management including known assets vs. discovered assets.
- Identity management including number of service accounts, percentage of systems with local accounts, percentage with multi-factor authentication.
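As a starting point, two of the metrics above (patch coverage and mean time to patch) can be computed from very simple records. The sketch below is only an illustration: the record layout, host names, and dates are hypothetical, and a real program would pull them from a patch management or asset system.

```python
from datetime import date

# Hypothetical records: one per (host, patch); patched_on is None if still missing.
records = [
    {"host": "web01", "released": date(2014, 11, 11), "patched_on": date(2014, 11, 14)},
    {"host": "web02", "released": date(2014, 11, 11), "patched_on": date(2014, 11, 25)},
    {"host": "db01", "released": date(2014, 11, 11), "patched_on": None},
    {"host": "db02", "released": date(2014, 11, 11), "patched_on": date(2014, 11, 18)},
]


def patch_coverage(records):
    """Percentage of records where the patch has been applied."""
    patched = sum(1 for r in records if r["patched_on"] is not None)
    return 100.0 * patched / len(records)


def mean_time_to_patch(records):
    """Average days from patch release to deployment, over patched records only."""
    days = [
        (r["patched_on"] - r["released"]).days
        for r in records
        if r["patched_on"] is not None
    ]
    return sum(days) / len(days) if days else None


coverage = patch_coverage(records)
mttp = mean_time_to_patch(records)
```

Tracking both numbers matters: mean time to patch looks only at systems that were eventually patched, so it should always be read alongside coverage.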
Selection of relevant metrics is a challenge. Dividing this into smaller tasks can make this challenge easier and help improve security posture over time.
It’s been a week since news of an incident at Sony Pictures began to surface and new reports collected in that timespan show the company suffered a significant breach. According to multiple accounts the individuals behind the attack stole a trove of data, including internal documents, employees’ personal data and yet-to-be released films. The FBI issued an advisory regarding wiper malware that may be connected to the incident. Kaspersky, Symantec and Trend Micro each published additional intelligence on “Destover.” The link to Sony remains unconfirmed. There’s also speculation that North Korea, motivated by an upcoming Sony Pictures film mocking supreme leader Kim Jong-un, is responsible for the attack. However, there’s no official confirmation from Sony, law enforcement or FireEye (whose services the company retained) regarding that speculation. The VCIC’s more actionable intelligence collections this week include Cylance’s report on Operation Cleaver, a suspected Iranian group responsible for attacking critical infrastructure around the globe, as well as FireEye’s report on a group using phishing to steal insider financial information. And Brian Krebs was at it again this week after he announced Bebe Stores, Inc. suffered a suspected payment card breach. Unfortunately, the year of the point of sale breach continues.
A common question we grapple with when evaluating intelligence feeds is “If I see the same observable twice, what does it mean?” This is probably, actually, two questions in one: “Is my feed sending me the same observation multiple times?” and “Is the second observation an observation of a single incident or a new incident?”
These are both tough questions to answer. In the first case, the intelligence feed may not provide any indicator of uniqueness per record, making it impossible to immediately tell if it is a duplicate or not. The second question is even more complex. Without significant context for the observation, there is no way to tell what caused it, and the cause is what would indicate whether it was a second observation of a single incident or a new incident altogether.
Ultimately, whichever question is being asked, the action question would be “Do I initiate new incident handling processes for the second record?” This may be adding it again to detection systems, resetting detection timers, scanning the network for the observable, etc.
Let’s rephrase the question as a statistical question: “At what point is it statistically unlikely that the second observation is related to the first?” To answer this, we need to define what we mean by “what point”. Effectively the feature of our data is the time between occurrences of an observable in our intelligence feed. As such, “what point” refers to the time between the observation of an observable and its next observation. We’ll use “days” to measure this, though if your feeds are updated frequently enough you may prefer “hours”.
In reality, this is a fairly simple question to answer if you have a historical data store of the intelligence stream. To build our data set we use the following steps:
- Randomly sample the historical data store for a set number of observables, say 1000.
- Collect every observation of those observables for the intelligence feed.
- Sort the time series
- Calculate and store the number of days between each observation in a list. This list will form our distribution of days between occurrences.
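The steps above can be sketched in Python as follows. The function name, the `(observable, date)` record shape, and the sample feed are assumptions for illustration; a real historical data store would need its own loading code.

```python
import random
from collections import defaultdict
from datetime import date


def days_between_observations(feed, sample_size=1000, seed=0):
    """Build the distribution of days between repeat observations.

    feed: iterable of (observable, date) pairs from the historical store.
    """
    by_observable = defaultdict(list)
    for observable, seen_on in feed:
        by_observable[observable].append(seen_on)

    # Randomly sample observables (all of them if fewer than sample_size exist).
    rng = random.Random(seed)
    names = sorted(by_observable)
    sampled = rng.sample(names, min(sample_size, len(names)))

    gaps = []
    for name in sampled:
        dates = sorted(by_observable[name])  # sort the time series
        gaps.extend(
            (later - earlier).days for earlier, later in zip(dates, dates[1:])
        )
    return gaps


# Hypothetical feed records.
feed = [
    ("1.2.3.4", date(2014, 1, 1)),
    ("1.2.3.4", date(2014, 1, 4)),
    ("1.2.3.4", date(2014, 1, 10)),
    ("evil.example", date(2014, 2, 1)),
    ("evil.example", date(2014, 2, 3)),
    ("seen-once.example", date(2014, 3, 1)),
]
D = days_between_observations(feed)
```

Observables seen only once contribute nothing to the list, since there is no gap to measure.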
Once we have this list of days, the answer would normally be to find the value 3 standard deviations from the mean of the distribution. However, we have an issue. Because our values are temporal, they are not independent. (I.e. when the next observation occurs probably depends on the previous observation.) We can see this in the data as a clear power law probability distribution:
This means the data is both long tailed and skewed. As such the mean and standard deviation will not accurately represent the data. (See Michael Roytman’s talk at bSidesLV for more information on long tailed distributions.) Instead we use a robust estimate of scale. We will use the τ estimate proposed by Maronna and Zamar in 2002 (Robust estimates of location and dispersion of high-dimensional datasets; Technometrics 44(4), 307–317). In R, this code is available in the robustbase library as the scaleTau2 function. If our distribution is stored in “D”, we can find our estimate of scale by loading the library and running scaleTau2(D).
(If you would prefer python, I have transcoded the function here.)
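For readers without access to either version, the τ scale can be sketched in pure Python as below. This is my own reading of the estimator with the commonly used constants c1 = 4.5 and c2 = 3, written for illustration; it is not the author’s transcoded function, the name `scale_tau2` is hypothetical, and it has not been checked digit for digit against robustbase. It also simply refuses to run when the median absolute deviation is zero.

```python
import math
from statistics import NormalDist, median


def scale_tau2(values, c1=4.5, c2=3.0):
    """Robust tau estimate of scale (sketch after Maronna and Zamar, 2002)."""
    n = len(values)
    med = median(values)
    abs_dev = [abs(v - med) for v in values]
    sigma0 = median(abs_dev)  # raw (unscaled) median absolute deviation
    if sigma0 == 0:
        raise ValueError("MAD is zero; this sketch cannot estimate the scale")

    # Weighted location: points further than c1 * sigma0 from the median get weight 0.
    weights = [max(0.0, 1.0 - (d / (c1 * sigma0)) ** 2) ** 2 for d in abs_dev]
    mu = sum(v * w for v, w in zip(values, weights)) / sum(weights)

    # Squared standardized deviations, capped at c2 squared so outliers have bounded effect.
    rho = [min(((v - mu) / sigma0) ** 2, c2 ** 2) for v in values]

    # Consistency factor so the result estimates the standard deviation for normal data.
    std_normal = NormalDist()
    b = c2 * std_normal.inv_cdf(0.75)
    expected_rho = (
        2 * ((1 - b * b) * std_normal.cdf(b) - b * std_normal.pdf(b) + b * b) - 1
    )
    return sigma0 * math.sqrt(sum(rho) / (n * expected_rho))


# Hypothetical long-tailed sample of days between observations.
gaps = [2, 2, 3, 3, 4, 5, 6, 6, 7, 9, 10, 14, 93, 159]
threshold = median(gaps) + 3 * scale_tau2(gaps)
```

Because each point’s contribution is capped, the two extreme gaps in the sample pull the estimate far less than they would pull a standard deviation.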
The other issue we need to address is the use of the mean. The outliers would significantly influence the mean. As such, we use the geometric median. Since our data is one dimensional, the geometric median is the same as the standard median.
So to find our cutoff, we take:
    threshold <- median(D) + 3 * scaleTau2(D)
Or, if you prefer Python (where scaleTau2 is the transcoded function mentioned above):
    from scipy import stats as scistats
    import numpy as np
    threshold = np.median(D) + 3 * scaleTau2(D)
The below list provides the descriptive statistics for the distribution in the histogram above:
- Samples : 444
- Mean : 12.4414414414
- Mode : [2, 93]
- First Quartile : 3.0
- Second Quartile/Median : 6.0
- Third Quartile : 10.0
- Minimum : 2
- Maximum : 159
- Variance : 397.633958283
- Std. deviation : 19.9407612263
- Skew : 3.64549527587
- Kurtosis : 16.0680960044
- Outlier Threshold : 21.0636588967
We see that both the mean and standard deviation are influenced by the outliers. Using them to calculate the cutoff would give roughly 72 (12.4 + 3 × 19.9), about 7 times the third quartile. Instead, the Outlier Threshold of 21 provides a much more reasonable value.
We use the threshold to decide when to consider an observation a new incident and when to treat it as a continuation of an existing incident. With the threshold, it is easy. If the number of days between observations of the observable is greater than the threshold, it is a new incident. If not, it is a continuation of the old. In addition, the threshold may provide clues about how long to keep looking for an observation after it has been reported on an intelligence feed. Both usages provide a significant step forward in practical usage of intelligence feeds.
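Applying the rule is straightforward. The following Python sketch groups one observable’s observation dates into incidents; the function name and the sample dates are hypothetical, and the threshold of 21 days is the rounded value from the statistics above.

```python
from datetime import date


def split_into_incidents(observation_dates, threshold_days):
    """Group one observable's observation dates into incidents.

    A gap greater than threshold_days starts a new incident; anything else
    is treated as a continuation of the current one.
    """
    incidents = []
    for seen_on in sorted(observation_dates):
        if incidents and (seen_on - incidents[-1][-1]).days <= threshold_days:
            incidents[-1].append(seen_on)
        else:
            incidents.append([seen_on])
    return incidents


# Hypothetical observations of a single observable.
seen = [date(2014, 1, 1), date(2014, 1, 5), date(2014, 1, 20), date(2014, 3, 1)]
incidents = split_into_incidents(seen, threshold_days=21)
```

Here the first three observations (gaps of 4 and 15 days) fall into one incident, while the observation 40 days later starts a second one.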
The “Regin” espionageware platform dominated risk intelligence collections. Mashable published a good general summary of Regin. But for Verizon Enterprise clients, the risk is almost certainly greater from the latest Adobe Flash vulnerability. Adobe released an out-of-cycle security bulletin and patch for Flash Player after F-Secure discovered the new vulnerability being exploited via the Angler exploit kit (EK). Angler was also in last week’s INTSUM for exploiting a vulnerability from the previous Flash security bulletin. Sony Pictures was the victim of the most significant data breach this week, resulting in the company deciding to take its network down after an extortion attempt.
Earlier this year the following question was posed to us:
“What is more likely to get compromised by an external attacker? One account with a strong password shared by 5 people or 5 accounts with strong passwords known only individually?”
The instinctive reaction is to shout the evils of shared passwords, but the specific question raised the degree of difficulty of providing an answer. Internal misuse and the accountability provided by unique user logins were not to be factored in.
Tuesday, Microsoft released MS14-068 out-of-cycle to mitigate a vulnerability in Kerberos that could be exploited to take over Windows domains. The severity of the impact of a successful attack drove our recommendation for a 30-day deployment and pre-planning for a much shorter fuse if risk changes. We’ve been collecting all the reliable intelligence we can regarding last week’s MS14-066 (SChannel). We have no reports of threats in the wild for it. We can’t say the same for Adobe’s Flash Player bulletin from last week because Kafeine from DontNeedCoffee.com discovered the Angler EK is exploiting one of the 15 vulnerabilities from the bulletin. And ESET reported one of the two vulnerabilities patched last week by MS14-064 (OLE) was being exploited through IE for a drive-by-download on an Alexa 11,000 news site. So both vulnerabilities are being exploited in the wild. It doesn’t appear that attack used malvertisements, but the risk that enterprise users will encounter a malvertisement continues to grow. Lastline Labs reported that 1% of ads served online are malicious and Trend Micro reported the Flashpack EK in malvertisements dropping Zeus, Dofoil and CryptoWall Trojans. To our colleagues in the U.S., the VCIC extends our wishes for a happy Thanksgiving holiday and hope the only thing all our clients will see from us for the rest of November is next week’s INTSUM.