Markerbench

Metricon X — Opening Remarks

2019-03-21T00:00:00-04:00

This is the nominal text of Andy Jaquith’s opening remarks for Metricon X, delivered on March 21, 2019. It has been lightly edited for clarity and a few identities have been slightly disguised.

Welcome

I appreciate everybody coming today. It’s a great turnout for a conference that we rather deliberately did not advertise. If you’re here, it’s because you wanted to be here. You’ve self-selected.

The theme of the conference is “plus ça change…,” the second half of which is “plus c’est la même chose.” Colloquially: “the more things change, the more they stay the same.” So what we’re really here to talk about are the constants and the change. But because I suspect that we will have ample time to reheat some of the old chestnuts (the constants), I’d like to offer a few remarks on the changes — that is, notable happenings in the world of security metrics over the last 12 years.

Data-driven security took root

One of the most gratifying things to emerge in security over the last 10-plus years is the increased fluency and comfort people have with real security data. This is not completely new. Bill Cheswick’s work at Bell Labs in the late 1990s on network mapping, for example, helped create a company (Lumeta) that specialized in analyzing networks, and developed a specialty in analytics for use in M&A situations. Jim Cowie, formerly CTO of Renesys, as another example, was doing large-scale analytics on BGP routes at the turn of the millennium. The last dozen years have brought many more examples, notably:

The Verizon Data Breach Investigations Report (DBIR), which fused together law-enforcement data and private sources to paint a data-rich picture of what data breaches look like, are caused by, and cost. The DBIR, and publications such as Larry Ponemon’s eponymous studies on breach costs, helped popularize a metric known as “cost per record.” As a result, we now have relatively well-accepted currency for calculating potential and actual consumer information exposures.
Observables and ratings. Spurred on, in part, by the challenges of the questionnaire-based approach to evaluating vendor security, vendors such as BitSight and Security Scorecard have focused on inferring the security of companies based on what they can empirically observe. If your MX and DNS records are messed up, or if spam is coming from IP address space you control, or if externally-facing systems appear to be compromised, then the rest of your security program probably isn’t any good either. Ratings are derived from how spotless one’s external presence is. Data about your supply base, for example, can help you make a decision about when you need to dispatch the goon squad to interrogate a high-risk vendor.
The increased use of statistical and data science tools to analyze large security data sets. These include Python (e.g., pandas and NumPy) and the R ecosystem, the Hadleyverse and so on. There are a healthy number of “R-heads” in the security metrics community, such as Jay Jacobs, Bob Rudis and many others. I count myself among them. Although many of the studies are custom-made, the prevailing attitude is to practice reproducible science using a tool-driven analysis and workflow. Find interesting problems and data sets. Explore them. Publish findings. Repeat!

And also, somewhere along the line, data science became a Thing. Some of us used to call it “statistics.” Speaking of which…

“AI” has come to security, with uneven results

“AI” has come to security, with uneven results. I say “AI” in quotes because what we call AI in the popular press is not about endowing computing machines with cognition. I must tell you, every time I see that Microsoft commercial with the rapper Common extolling the virtues of “AI,” I feel like Marvin Minsky spins another turn in his grave, and that Douglas Hofstadter rips up and crumples one of his piano compositions and weeps.

Once you get beyond the commercials, “AI” is primarily about creating models to make better predictions, using a bag of tricks that includes supervised and unsupervised learning, neural networks, Bayesian strategies, Markov networks, bootstrapping, anomaly detection, and a whole set of other buzzwords that many of our attendees have better first-hand experience with than I.

In security, many of these “AI” techniques are being put to use to help solve some very real operational security problems, for example, making a security operations team more efficient. Consider an enterprise-class SOC with dozens of analysts. The sensor grid will ingest daily log volumes in the tens of millions, extract tens of thousands of potentially suspicious activities, and then reduce these down to dozens of cases to put in front of human analysts. As a rule of thumb, it’s roughly one million pieces of straw in every haystack, for each needle found in it.

Financial services and national agencies are two types of organizations that have the threat volume, funding and organizational capability to fund vendor and internal efforts in this space. They have big haystacks and lots of needles to find. A large focus of research and vendor efforts is in increasing the signal-to-noise ratio. From a measurement perspective, this means using “AI” to correctly classify genuine intrusions (true positives) and non-intrusions (true negatives), and reduce the false-positive and false-negative rate.
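To make the measurement vocabulary concrete, here is a minimal sketch of the rates one can derive from the four outcomes named above. The counts are invented for illustration and do not come from any real SOC; the function name is likewise just a placeholder.

```python
def confusion_rates(tp, fp, tn, fn):
    """Derive common classifier rates from confusion-matrix counts."""
    return {
        # Of everything flagged as an intrusion, how much really was one?
        "precision": tp / (tp + fp),
        # Of all real intrusions, how many were flagged?
        "recall": tp / (tp + fn),
        # Share of benign events wrongly flagged.
        "false_positive_rate": fp / (fp + tn),
        # Share of real intrusions that were missed.
        "false_negative_rate": fn / (fn + tp),
    }


# Hypothetical daily counts: about ten million events, 50 real intrusions.
rates = confusion_rates(tp=40, fp=160, tn=9_999_790, fn=10)

for name, value in rates.items():
    print(f"{name}: {value:.6f}")
```

With these made-up numbers the false-positive rate is tiny, yet four out of five cases reaching an analyst are still noise, because the needles are so rare relative to the straw. That is one way to see why raising signal-to-noise is hard.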

But results have been “uneven” because it’s a tough problem space. Many vendors will tell you that they’ve got bulletproof, universal techniques that solve all sorts of superficially related problems. For example, network intrusion detection and insurance fraud are both anomaly detection problems, right? I’ve heard a vendor say, “well, our AI/neural net/ML engine solves both of these problems.” Actually, they are in different domains and have very different characteristics in terms of variety of data sources, completeness, and outlier detection strategies. There is no one size fits all. I’m inherently suspicious of generalizable AI in security. But every time I see a well-bounded, domain specific strategy, I’m happy.

In addition, there is lots of low-hanging fruit that can be harvested by simply fusing data together at the presentation level to make investigations more efficient. SOC labor optimization is more like an operations research problem than an “AI” problem. With respect to making SOCs more efficient, there’s plenty of room for experimentation at both ends of the funnel, by attacking the top and middle of the funnel to present the truest and most accurate incidents; and then, improving the efficiency of the investigations of the cases that fall through to the bottom of the funnel.

Success disasters are great teachers

Dr Dan Geer first introduced me to the concept of a “success disaster”: something that goes so well that it creates painful side-effects. Here in New York, you could argue that the cronut craze that began in 2013 was a success disaster for the Dominique Ansel Bakery. Sure, there were lines around the block, but it led to a black market in resellable cronuts, counterfeit cronuts, quotas for cronuts, and I am sure, staff burnout and ingredient shortages. It was also a disaster for ordinary customers. If, for you, the Ansel Bakery had been a lovely place to have your morning French roast while leisurely enjoying a croissant, reading Le Monde and chain-smoking Gauloises cigarettes, it is no longer. That dream was trampled by all of the marauding tourists.

In security metrics, it’s been gratifying to see a lot more focus on data, analytics and metrics. And many of the metrics I’ve been seeing are much better than the stuff that drove me batty when I wrote my book twelve years ago. You know, stuff like turning highs, mediums, lows into cardinal numbers like 5, 3, and 1, or (worse) 9, 3 and 1, and then doing math on them and claiming the results are “quanty.” Or creating an “index” that uses mystery math to jam a bunch of semi-related indicators into a score that can’t be easily explained, on the theory that because the Dow Jones Industrial Average is an index, and we all know that a higher number means we’re richer, then our security metric needs to be an index too. These are mistakes anybody can make, and usually do when they start off.
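A small sketch shows why arithmetic on relabeled highs, mediums and lows is shaky: the answer depends on which arbitrary numbers were chosen. The two units and their findings are invented for illustration.

```python
# Two arbitrary mappings of ordinal ratings onto numbers.
SCALE_531 = {"high": 5, "medium": 3, "low": 1}
SCALE_931 = {"high": 9, "medium": 3, "low": 1}

# Hypothetical findings per organizational unit.
findings = {
    "unit_a": ["high"],
    "unit_b": ["medium", "medium", "low"],
}


def total_score(ratings, scale):
    """Sum the numeric stand-ins for a list of ordinal ratings."""
    return sum(scale[r] for r in ratings)


def riskiest_unit(findings, scale):
    """Return the unit with the highest summed score under a given scale."""
    return max(findings, key=lambda unit: total_score(findings[unit], scale))


print(riskiest_unit(findings, SCALE_531))  # unit_b (7 beats 5)
print(riskiest_unit(findings, SCALE_931))  # unit_a (9 beats 7)
```

The same findings produce opposite rankings under the two scales, so the "result" is an artifact of the mapping rather than a property of the data.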

Many organizations have matured their thinking and have gotten religion about measuring things. At a bank I’m familiar with, for example, the GRC team produces a 100-page monthly pack of metrics that cover all areas of technology risk. Many of the metrics count things that risk or control owners consider important, typically trailing indicators, often with breakdowns by organizational units, and almost always with commentary and correct attribution about sources. The 1,000 or so metrics in this pack are assiduously collected every month and assembled into a polished report. This is wonderful. It is a success. It is also a disaster, because the quantity of data is challenging to assimilate. It is challenging to see the forest for the trees.

Here’s another success disaster: vulnerability management. Everybody in the audience knows what a vulnerability scan is, and what it does. It finds weaknesses and exposures in technology assets, typically on endpoints such as servers and desktops. The tools have gotten very good and produce few false positives. What’s more, there’s a general consensus on an industry-wide rating scheme for measuring severity: the Common Vulnerability Scoring System (CVSS). The market is mature, with well-established vendors such as Qualys, Rapid7 and Tenable.

What’s not to like about vulnerability scanners? They have a consistent measurement system, are accurate and pervasive. If the scanner says something is bad, it must be right? We should fix all “critical” vulnerabilities right away, shouldn’t we? Sounds great. But the problem is that there are too many darned vulnerabilities: millions in the typical large enterprise. What do you fix first? This is very much a success disaster.

These kinds of problems are excellent teachers, because they force you to think differently about the problems. In the vulnerability management space, for example, one must begin with the concession that not all vulnerabilities are cost-effective to fix. Some matter more than others. How important is the asset they are on? And is the vulnerability weaponized? Are attackers actively exploiting the vulnerability in the wild? Both of these are tedious and error-prone processes to do as one-offs, but can be attacked with a bit of engineering. So now you have vendors such as Kenna (founded by one of securitymetrics.org’s early members, Ed Bellis), applying logic over-the-top of the scanners you’re already using. Maybe you don’t need to fix 1 million vulnerabilities. Maybe this week, the only thing you worry about is the one-half of one percent of the vulns, or 5,000 patches relating to a single CVE that other companies are seeing abused by scripted attacks. That is a nice win, even better than the proverbial 80/20.
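As a rough sketch of what "applying logic over the top of the scanners" might look like, the snippet below ranks findings by exploitation in the wild first, asset importance second, and raw CVSS last. That ordering is an assumption made for illustration, not a description of any vendor's product, and the CVE identifiers and asset names are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    cve: str                 # placeholder identifier, not a real CVE
    asset: str               # hypothetical asset name
    cvss: float              # scanner-reported severity
    asset_critical: bool     # is the asset important to the business?
    exploited_in_wild: bool  # are attackers actively using this?


def priority(f):
    """Assumed ordering: active exploitation, then asset importance, then CVSS."""
    return (f.exploited_in_wild, f.asset_critical, f.cvss)


findings = [
    Finding("CVE-EXAMPLE-1", "intranet-wiki", 9.8, False, False),
    Finding("CVE-EXAMPLE-2", "payments-api", 7.5, True, True),
    Finding("CVE-EXAMPLE-3", "test-server", 8.1, False, True),
    Finding("CVE-EXAMPLE-4", "payments-api", 9.1, True, False),
]

queue = sorted(findings, key=priority, reverse=True)
fix_this_week = [f for f in queue if f.exploited_in_wild]

# The arithmetic from the talk: 5,000 patches out of 1 million vulnerabilities.
share = 5_000 / 1_000_000

print([f.cve for f in queue])
print(f"share to fix first: {share:.1%}")
```

Note that the highest CVSS score in this toy list ends up last in the queue, because nobody is exploiting it and the asset does not matter much.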

For coping with success disasters in areas such as risk and control issues, I tend to worry less about the overall numbers of issues, and focus more on the pockets of risk “debt” that aren’t being paid down. Suppose you’ve got 10,000 risk issues and control breaks on the books, across the whole company. That sounds like a lot, but only 250 of them are in your highest-severity bracket. What’s the best way to figure out which ones to attack?

There are many ways to look at the data — for example, finding who has the largest number of high-severity issues, or those with the largest number of longest-aged ones. Mean-time-to-close is another. Personally, I like “velocity” as the right way to look at the problem. Who’s paying down debt fastest, and who’s letting it sit?

I stole a metric from the warehousing industry called “turnover,” which is defined as the number of SKUs flowing through a warehouse, divided by the average inventory. For example, Apple’s inventory turnover in 2017 was 60, meaning it sold through everything in its warehouses every 6 days.

When adapted for issue turnover, we define it as the number of closed issues divided by the average inventory. You don’t get credit for issues you postpone or renew. So for example, if you start with 100 issues on Feb 1, and end with 120 on Feb 28th, that’s bad, right? But what if you closed 65, and added 85? That’s pretty good, because you closed more than half of your issues during the month. Your issue turnover is about 0.6 (65 closed against an average inventory of 110), which when expressed as an annualized figure means your inventory would turn over about 7 times per year. That’s actually quite outstanding. Now imagine computing issue turnover by organizational unit and severity of issue. You’d see the high and low performers right away.
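The turnover arithmetic is easy to check in a few lines. This sketch assumes "average inventory" means the simple mean of the opening and closing counts, which is one common convention and not necessarily the one used at the bank.

```python
def issue_turnover(opening, closing, closed):
    """Closed issues divided by average inventory over the period.

    Assumes average inventory is the mean of opening and closing counts.
    """
    average_inventory = (opening + closing) / 2
    return closed / average_inventory


# Figures from the example: 100 open on Feb 1, 85 added, 65 closed, 120 open on Feb 28.
opening, added, closed = 100, 85, 65
closing = opening + added - closed

monthly = issue_turnover(opening, closing, closed)
annualized = monthly * 12

# Warehouse analogy: an annual turnover of 60 means stock clears about every 6 days.
days_per_turn = 365 / 60

print(f"closing inventory: {closing}")
print(f"monthly turnover: {monthly:.2f}")
print(f"annualized turnover: {annualized:.1f}")
print(f"days per turn at turnover 60: {days_per_turn:.1f}")
```

With these figures the monthly ratio comes out near 0.59, or roughly seven turns a year. Grouping the same calculation by organizational unit and severity would give the high and low performers the talk describes.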

This issue turnover metric works well because it is easy to understand and rewards the behaviors we want to see: paydown of issue debt. This is another example of how a success disaster causes us to evolve our thinking, and allows us to prioritize better.

Controls instrumentation offers terrific bang for the buck

When I joined a large investment bank as the MD for technology risk measurement and analytics, I was excited that I’d be able to put some of my ideas about security metrics into practice. I’d done a fair bit of metrics work on a smaller scale in prior roles, but the bank had both the commitment and the resources to do it properly. But what I found out quite quickly after coming in was that the primary use of “metrics” was in demonstrating controls conformance, chiefly for Sarbanes-Oxley and assurance régimes such as SSAE-18. Our biggest customer wasn’t the security organization — it was our external auditors. They needed our data to be able to show quantitatively that the key controls were working. Our second biggest customer was the finance organization, because they ran SOX, although they were less interested in the data than the results.

The “sweet spot” for the continuous controls monitoring program was identity and authorization, which lies at the heart of technology risk management. “No privilege without identity. Approve all privileges. Remove them in a timely manner when roles change or someone leaves the firm.” These were well-instrumented operational processes with well-defined systems to tap for the data. Because we calculated control effectiveness at a very granular level, we could state with confidence whether a particular control was effective or not. We had the data to prove it. No arguments.
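To illustrate what calculating control effectiveness at a granular level could mean for the "remove them in a timely manner" control, here is a minimal sketch. The one-day removal window, the record layout and the user IDs are all assumptions invented for this example.

```python
from datetime import date

SLA_DAYS = 1  # hypothetical policy: access removed within one day of departure

# Hypothetical leaver records joined from HR and entitlement systems.
leavers = [
    {"user": "u1", "left": date(2019, 1, 7), "removed": date(2019, 1, 7)},
    {"user": "u2", "left": date(2019, 1, 9), "removed": date(2019, 1, 15)},
    {"user": "u3", "left": date(2019, 1, 11), "removed": None},  # still active
    {"user": "u4", "left": date(2019, 1, 14), "removed": date(2019, 1, 15)},
]


def is_effective(record, sla_days=SLA_DAYS):
    """The control worked if access was removed within the allowed window."""
    if record["removed"] is None:
        return False
    return (record["removed"] - record["left"]).days <= sla_days


effectiveness = sum(is_effective(r) for r in leavers) / len(leavers)
failures = [r["user"] for r in leavers if not is_effective(r)]

print(f"control effectiveness: {effectiveness:.0%}")
print(f"exceptions to chase: {failures}")
```

Because every record is evaluated individually, the output is both a rate and a list of named exceptions, which is what makes the "no arguments" claim possible.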

A key insight the team had was to begin applying a similar approach to a large annual process that many of you are intimately familiar with, the Risk and Control Self-Assessment (RCSA, or as my contact from the Fed calls it, the “ricksa”). If you’ve had the pleasure of doing one, it’s usually an annual exercise that touches the entire enterprise. Both business-managed and control-function-managed controls are included. Everybody does it a little differently, but the basic steps are similar: (1) define “assessment units” that will perform the risk and control assessments; (2) set up the ratings scales for assessing inherent and residual risk; (3) have each assessment unit assess their inherent risk; (4) have each control owner assess the controls that help reduce these risks; (5) synthesize the results, calibrate them, determine residual risk and roll everything up.

All of this sounds nice in theory, but the defects in practice are known.

Because so many people are involved, RCSAs can’t be done regularly; at most, most organizations will do them once a year.
Because the ratings are subjective, a lot of time is spent “calibrating” and “challenging” to try to ensure that nobody lied particularly egregiously. And,
Because of time constraints and the lack of detailed empirical facts about the control environment, assessors must evaluate in a very coarse-grained way, perhaps, at a sub-line of business level at best. What this means is that a significant risk or control weakness affecting a particular asset is steamrollered over by the tyranny of averages.
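The "tyranny of averages" in the last point can be shown with a toy roll-up. The asset names, compliance figures and the 85% passing threshold below are invented for illustration.

```python
from statistics import mean

THRESHOLD = 0.85  # hypothetical passing mark for patch compliance

# Hypothetical per-asset compliance within one sub-line of business.
compliance = {f"web-{i:02d}": 0.98 for i in range(1, 10)}
compliance["payments-db"] = 0.10  # one badly exposed asset

rollup = mean(compliance.values())
failing = [asset for asset, score in compliance.items() if score < THRESHOLD]

print(f"rolled-up compliance: {rollup:.1%}")
print(f"passes at unit level: {rollup >= THRESHOLD}")
print(f"assets actually failing: {failing}")
```

The unit passes comfortably on average while its most exposed asset is nowhere near compliant, which is exactly the detail a coarse-grained assessment never sees.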

In short, these RCSA exercises aren’t timely, objective or precise. So what good are they? Based on comments from practitioners, not much good at all. And the regulators know it, which is why they are quite openly fishing for alternative approaches.

What we found was that applying the continuous controls monitoring strategy to RCSA offered a terrific bang for the buck. The key was to do it in a commercial way. For example, consider Dorian’s wonderful Unified Compliance Framework, which offers a consistent and universal taxonomy of controls that can be mapped to every technology or cyber framework, regulation or statute. If you pick just three of these mandates, for example ISO 27000, the EU’s General Data Protection Regulation (GDPR) and the NIST Cyber-Security Framework, UCF will tell you that you need something like 600 controls, with another 300–400 implied. You would never want to automate the measurement of that number of controls. That would not be commercial, and you’d never be done.

Instead, why not pick the 50 technology controls that we know from experience offer the biggest risk reduction potential, and instrument just those? We developed a playbook, which went more-or-less like this: “hey subject matter experts, we think change management, software lifecycle, data quality, tech ops, asset management, intrusion detection etc etc are the most important risk areas. How would you define ‘success’ in these areas? What metrics can we agree on that describe success? Who owns the data?” And then: defining a project plan for sourcing, loading, transforming and refining the data, in waves, so that we can compute the metrics we agreed constitute success. As a sweetener, we bribed the data owners with free labor to get their data into the computing plant.

There are some caveats:

The data is never complete, but that’s ok, because it’s good enough to be indicative… and certainly better than “1-5 scales” that are based mostly on opinions leavened with a few facts.
The early results are always ugly, but that’s ok, because un-instrumented controls are always ugly the first time one sees the data. But nobody ought to get fired if the data’s all new and the control implementers haven’t been given time to fully adopt or get their performance in shape.
And it takes time, but that’s ok, especially if one sequences the plan to deliver quick wins first.

In short, having a rigorous plan to deliver incremental value from a small number of representative metrics makes assessment processes more timely, precise and objective. It’s important to keep the exercise limited to key controls that you can tangibly measure. And it is critical to keep reminding everybody about all of the cost and complexity that’s being removed — typically, millions of dollars of labor that is largely guesswork.

Audience is everything

People want data for different reasons. And people consume data differently. What might seem good to you might be Greek to someone else. As a rule, I believe that when we build exhibits and reports, we tend to condescend to the reader. We assume that if we don’t lard exhibits with lots of reds, yellows, and greens, the person who is reading it won’t get it. Or we use simple pie and bar charts that waste space and are not data-dense. I ranted about this in my book a long time ago, but it’s still true. I rarely see information graphics related to security metrics that are more complicated than one-dimensional, for example, categorical data displayed as a bar chart. This is understandable in many ways, because most information graphs used in high-volume reports don’t need to do too much. They’re not there because they provide a lot of diagnostic power. They are meant to just get a simple message out. But is the message even right? If you don’t know who your audience is and what they want, it can’t possibly be — and so you are forced to keep it simple. If you knew your audience better, you could take them along much further, with more relevant and powerful metrics.

When I look at published metrics and exhibits, I ask five questions that have a simple mnemonic: A-B-C-D-E.

A is for Audience. Do we know who we’re putting our metrics in front of? Do we know what they want?
B is for Behaviors. If you’re looking at a chart or exhibit, what behaviors do I want the audience to change based on the inferences or conclusions in the data?
C: can I Concisely and clearly communicate, in the simplest way possible, the data that the audience will need to make…
D: …the Decisions based on the data I put in front of them?
E: Lastly, does my data include commentary with an Editorial voice that showcases my expertise and provides context to guide the audience to the decision?

Because Audience is everything, you have to start there. That’s a key lesson I’ve learned personally over the last dozen years.

Outside of the security field, two relatively new disciplines have emerged as Things that people specialize in that relate to the question of Audience. The first is data visualization as a discrete field of study, and a sub-field related to information dashboard design. For data visualization (or “data vis”), toolsets such as Tableau, D3 and ggplot2 have turned visualization into a rich grammar that can be programmed, layered and reused. And websites like Information Is Beautiful and Flowing Data celebrate novel ways of mashing up and showcasing data. Stephen Few has been doing pathbreaking work on dashboard design — I can’t recommend his work highly enough, because of the rigor with which he approaches make-overs of the sorts of dashboards that we are all showing our bosses. As security and risk professionals, we all benefit from the increasing formalism of the field of data visualization, and from efforts to promote more “visualization literacy.”

Data journalism is the second Thing I’ve been following that benefits our field, and it too relates to Audience. Since Nate Silver’s FiveThirtyEight election prediction work made it mainstream, nearly every premier news publication has invested in what is now called data journalism. Data journalists are either quants like Nate who happen to write persuasively, or data-curious journalists who got their Nerd on and developed a niche. The essence of data journalism is telling stories with data. Notable publications that are doing this really well include the New York Times, which has been doing some extraordinary data journalism over the last ten years; the Economist, which has always had excellent, honest, sound data graphics but has recently gone much deeper into analytics; and of course, the now-ESPN-owned fivethirtyeight.com. And academics such as Alberto Cairo are also doing incredible work in this space.

A few years ago I made a highly speculative hire — I hired the head of the data journalism team from a major business publication. The theory was, we’ve got lots of data, but we’re doing a crap job telling the story. Let’s see if we can bring in someone with a hybrid skillset. She writes well, and fast — is used to writing on deadline. As a reporter, she’s got a nose for the headline. And she’s got data chops. Maybe not like a full-on data scientist would, but hey, give it time. It turned out she was exactly what we needed. It was a true win-win… the bank got a massive upgrade in clarity and impact. And my new team member was happy as a clam because by making the jump into financial services, we were also able to raise her compensation by a very healthy amount.

The point I’m trying to make here is that the skills that made our data journalist such a valuable member of the team were, more-or-less, ABCDE. In short: knowing your audience, what they want, and what you want out of them. And then, constructing the simplest and most efficient narrative that encourages inquiry, while also setting the stage for decisions that shape behavior.

This talk was meant as a retrospective, so I could have talked about any number of things. I mentioned these five trends…

data-driven security
“AI” in security
success disasters as teachers
controls instrumentation
audience focus

…because they represented topics that I’ve learned a lot about, and that have benefited the industry. Thanks for listening to this rather old-school speech — no slides — and I look forward to seeing what Metricon XX will bring.

The Twenty-Year War on Cybercrime

2015-06-06T00:00:00-04:00

This is the text of a speech I delivered at the Gartner Group Security and Risk Management Summit in June 2015. I originally wrote the speech for Sir Roger Carr, the Chairman of BAE Systems, to use at one of his public appearances. But it was too good not to re-use for myself as BAE Applied Intelligence’s strategy lead. I felt no shame in doing so, seeing that I’d written it…

Introduction

Good afternoon. Thank you for coming. It is a privilege to speak with you today. I’ve been asked to speak to you about digital crime: its rise, its significance, and what can be done about it.

But I also know that I am the last thing between you and beer, so I will keep this talk as short and sweet as I can.

Certainly, “cyber security” (I hate that phrase, but there we are) is a topic that cannot be treated lightly, and it is ambitious to try and cover the whole subject in 20 minutes. Nonetheless, I will discuss the rise of digital crime: how criminal enterprises, state-sponsored actors, and other parties are robbing the industrialized world of its secrets and personal information. I’ll discuss the impact that these activities have on businesses, citizens and governments. And I’ll discuss what can be done from our perspective as BAE Systems, one of the world’s largest defense contractors and providers of digital crime solutions.

Introduce Self

But first, allow me to introduce myself and BAE Systems.

I am the strategy officer for BAE Systems Applied Intelligence. I’m a recovering analyst; you might know me from, as they say on late-night TV, “another network,” in this case Forrester, where I covered data security and mobile security, and advised hundreds of enterprise clients on these topics, and on security strategy. I also wrote a fairly well-regarded book on security metrics called, funnily enough, “Security Metrics.”

Introduce BAE

Most of you probably know BAE Systems because of the work we do with the UK government and the Ministry of Defence. BAE’s role is to safeguard and enhance our customers’ vital interests. We have a robust defense business: we build aircraft such as the Typhoon; we build, service and repair naval ships; we make land-based armaments, such as the Bradley Fighting Vehicle; and we are a key supplier to aerospace and defense companies worldwide.

Most of you probably do not know that we have a billion-dollar risk and security business, which we call Applied Intelligence. We are probably the largest cyber-security company that you have never heard of. We have over 5,000 customers in many industries on three continents, with a concentration in financial services. We secure our customers’ intellectual property and their email; we detect fraud and reduce the cost of compliance; we help them identify and reduce their financial and reputational risks; we host their key collaboration services; and we monitor and defend their networks from intrusions.

All of these activities give us a unique vantage point on the challenges of cyber-security, and on the problem of digital crime.

The Rise of Digital Crime

First, let’s talk about the rise of digital crime: what it is, and what it means. When we speak about “digital crime” we mean the use of computers either as the main component of, or as an accessory to, criminal activities that result in financial gain or in competitive advantage. Broadly speaking, “digital crime” includes all dastardly deeds that span cyber-crime, financial crime, fraud, and insider activity. The common element is that unlike purely physical crimes (for example, pickpockets on a crowded subway car), these crimes rely on technology in some way.

Increasingly, we see significant interplay between the different types of digital crime. Cyber is a key enabler of financial fraud, of healthcare fraud, and of the theft of industrial secrets. As reported by Scotland Yard in April 2014, seven out of ten financial fraud offenses involve cyber in some way. And because every part of society is becoming increasingly automated, instrumented, and network-connected, we expect that cyber will be involved in an increasingly large proportion of crimes over the next few years.

Two types of threat actors: nation-states and criminal enterprises

Today, digital crime is perpetrated by two main types of actors: nation-states and criminal enterprises. Many of the most important cyber incidents that you have no doubt read about over the last five years have involved nation-states. These nation-states are engaged in state-sponsored hacking and industrial espionage on a grand scale. Two years ago, for example, US forensics firm Mandiant revealed that an elite hacking unit of the People’s Liberation Army was responsible for stealing industrial secrets from the U.S. defense industrial base, leading security software firms, and other businesses. More recently, North Korea stands accused of penetrating the networks of Sony Pictures to embarrass executives and steal intellectual property.

The goal of these types of state-sponsored cyber activities is to obtain industrial secrets for sovereign advantage. The adversaries are advanced, persistent, and most certainly a threat.

Criminal enterprises present a danger of a different sort. Their goals are to obtain what one might call “toxic data”: payment card details; personal health information; and personally identifying information, such as pension and other government identifiers. This information is fungible, and can be sold on black markets for profit, or to commit identity theft — at which point it is used for fraudulent financial purposes.

Some examples. Last year, the U.S. retailer Target suffered from a data breach that caused the payment card details of over 40 million customers to be stolen, plus the personal details of over 70 million additional customers. And last month, the healthcare company Anthem was breached, exposing millions of healthcare records. A Bloomberg report suggested that the real targets of the Anthem breach were the employees of its customers, which included Northrop Grumman and Boeing. Attackers were in effect using weaknesses found in Anthem’s defenses to get to these other companies.

The advantages attackers have over defenders

Although both classes of attacker — state-sponsored actors and organized criminal enterprises — have different objectives, they have several things in common, which give them advantages over their targets, who must defend themselves:

First, both classes of adversary are supported by an integrated criminal supply chain. The supply chain is fully stratified, with loose networks of cyber weapons suppliers, middle-men, intermediaries, distributors, and 24 x 7 support providers. The wheels of this supply chain are helpfully greased by digital currencies such as Bitcoin, which enable the anonymous exchange of funds between buyers and sellers.
Second: both classes of adversary are highly creative, willing to use all means at their disposal. These means include hacking, lying, fraud, identity theft, infiltration, and compromising trusted suppliers. They also include the use of any and all channels: phone, cyber, wi-fi, in-person and physical.
Third: both depend on the fact that their victims’ networks are increasingly far-flung, cloud based, and porous. With the advent of mobile, cloud, social networking, consumerization, and extended digital supply chains, companies must deal with exponentially more complexity in their networks than they did just ten years ago.

But it gets worse. You may not know that the lingua franca of the Internet, the TCP and IP protocols, was never designed to be secure. These protocols were designed to make the Internet resilient, to allow packets to flow to their destinations even when parts of the infrastructure were damaged. Every security protocol we have was written after the fact to flow on top of those resilient, but insecure, protocols. Because security was never woven into the basic building blocks of the Internet, attackers inevitably find flaws in the ones we’ve fitted on top of them.

Against such a backdrop, the adversary is always assured of asymmetric advantage. Defenders have to get it right all the time. Attackers, just once. To use a colloquial phrase, one might expect that for attackers, this should be rather like shooting fish in a barrel. And indeed it has been.

The Impact of Digital Crime

The impact of digital crime is significant no matter how one chooses to measure it.

The cost of digital crime begins with the direct costs: the cost of cleanup, notifications to customers, and fines. Target stores has spent almost $150 million cleaning up after its data breach. Heartland Payment Systems, a payment processor, was breached in 2008 and had over a hundred million payment card details stolen, with direct costs from the breach totaling nearly $150 million, only $30 million of which was covered by insurance. In general, industry analysts estimate that breaches of customer information can cost victims (companies and customers) millions of dollars. But the criminals nearly always make a mint: the gangs that broke into Target, for example, may have made over $675 million in profit.
The cost of digital crime includes the damage to the victim’s reputation. A significant breach can cause significant personal embarrassment to executives and to customers. The co-chairman of Sony Pictures, for example, was forced to resign last month because her company’s security was so poor. The CEO of Target stores resigned because of its hack. Security is indeed becoming a board-level issue in the sense that people are getting fired because they don’t have enough of it.
The cost of digital crime includes changes in stock price and profits in the wake of a security breach, although these are usually temporary. Often overlooked are the inevitable class-action lawsuits that arise against public companies after data breaches. The management of Heartland Payment Systems has spent over five years defending itself against 27 separate consumer and institutional class-action lawsuits.
Finally, the cost of digital crime includes the loss of trust of one’s customers. Once lost, it is often difficult to regain. This is particularly challenging with firms that sell to other businesses. In the Heartland case, after years of growing its merchant base at double-digit rates between 10 and 20 percent, in the 2 years following the breach, merchant growth went into reverse, dropping 2%.

(more examples here…)

These costs — direct costs, damage to reputation, stock price and profit drops, lawsuits, and loss of trust — are significant costs for any individual organization to bear. Taken in aggregate, the near-continuous stream of bad news leads to a gradual erosion of trust in digital business in general.

What can be done

The problems associated with digital crime are complex. So are the solutions, but that is in part because of the way we as customers, suppliers and national governments have been thinking about the problem of digital crime. We need to think differently. We need to think deeply. And we need to think quickly.

Systems thinking, not silo thinking

First, we need to think about systems as a whole, and not about silos.

To use an analogy, consider the West’s responses to various failed and successful hijacks of aircraft by terrorists. The first plots were revealed in 2006. A plot was foiled to detonate liquid explosives on 7 airplanes over the Atlantic. These explosives were peroxide-based and easily disguised in drinks bottles. After foiling the plot, the US and UK airline authorities duly banned bringing liquids through airport security. In September 2001, the 9/11 hijackers took control of airline cockpits using knives and box-cutters. Authorities duly prohibited knives and box-cutters on flights. Then, in December of that year, shoe-bomber Richard Reid tried to set off a PETN-based bomb embedded in his shoe; the plot was foiled. Authorities duly forced passengers to remove their shoes.

Security expert Bruce Schneier argues that none of these things have made any difference in minimizing the risk of hijackings. Only two things have: reinforcement of cockpit doors, and the fact that passengers are willing to fight back against attackers.

Whether you agree with Bruce or not on this point, you can surely agree that the pattern used for preventing hijackings is “silo thinking”: looking for the artifacts used in the last attack and hoping that strategy will be effective in preventing the next one. Enumerating the things that are bad, rather than spotting the patterns that are bad.

In cyber, we have been following a similar script. Consider the case of Target stores. Target suffered a horrendous breach; most people can appreciate the seriousness of that. What is less appreciated is that Target was compliant with the industry standard for security at the time of the hack: the Payment Card Industry’s Data Security Standard (PCI-DSS). By definition, Target owned and operated:

anti-virus software
firewalls
intrusion detection systems, and:
log management software to filter through security device logs.

All of these items are mandated by PCI-DSS and are required to be installed on systems that process cardholder data.

In addition, the retailer also operated a security operations center in Minnesota. It had installed a $1.5 million advanced malware detection system, FireEye, which did detect the malware that ultimately compromised its network.

In short, Target could not possibly have been accused of skimping on security.

What happened? Target’s failure came down to something fairly simple: the various silos of security did not talk to each other. Target’s advanced malware detection system saw the malware and created an alert. But the information was not acted on by Target’s staff. It was lost amidst the noise, or not presented in a relevant or timely way. Target did not arrive at the conclusions they needed to fast enough, which was not “you’ve got malware” but: “your point of sale systems are being taken over by a criminal enterprise.” In short, Target’s tragedy was the failure to think of its data sources, individual security systems, directories, suppliers and point-of-sale terminals as a single, interconnected system, and to attach relevance and meaning to the patterns of behavior seen within it.

A system, in the broad definition, is a set of connected technologies or processes that form a greater, more complex whole. Target thought it had a system in place, but it’s clear it only had silos: FireEye, the Bangalore team, the Security Operation Center in Minnesota, and many individual security technologies. When needed the most, they acted (or didn’t act) separately.

When we rethink security, we must re-imagine security processes as an integrated whole. Systems thinking. To prevent and detect attacks, one must integrate all the elements — email, networks, physical, web, monitoring systems and many others. The components don’t all have to be from the same company, but they need to be integrated in such a way that the data flows seamlessly. Crucially, the information needs to be filtered and packaged so that it can be rapidly assessed, evaluated and acted on by human analysts.

Getting the full picture of risk

Second, we need to think about the full picture of risk.

Digital crime, particularly cyber crime, does not happen in a vacuum. Regardless of whether an attacker is trying to steal secrets, purloin personal information or launder lucre, nearly every type of digital crime can be reduced to a few common steps.

The attacker must plan his “campaign”: perform reconnaissance, communicate with confederates, collect insider information, create exploits, or infiltrate a network of people.
The attacker must commit his crime: break into a system, steal an identity, launch a denial of service attack, abuse administrator privileges, or use non-public information.
The attacker must harvest his gains: purchase or sell goods, make fraudulent claims, sell secrets, or launder money.

Every method used in these steps generates some sort of tell-tale signal or artifact: a phone call, an entry in a log, a transaction, an intrusion alert, a payment or a sale.

Appreciating the full picture of risk means having full knowledge, within the span of your control, of all of these artifacts. It means having the ability to sift through noise to find signal. It means acquiring, analyzing and acting on information at high speeds and at large scales. And it means having effective processes, technology and skills to spot anomalies, communicate them coherently, and act quickly.

Scaling up

Third, we need to scale up.

The problems of digital crime are complex, critical and costly. I will explain this by way of example. Much of the work that our customers inspire us to do addresses multi-billion-pound problems, for example:

First-party financial fraud costs institutions $18 billion a year globally
Intellectual property stolen from U.S. firms costs $300 billion every year
U.S. health care fraud costs insurers and the government nearly $75 billion annually, of which over $6 billion is cyber-related
Tax fraud globally is estimated at 5% of the total global economy: over $300 billion in the US and over $100 billion here in the UK

What unites these problems is that they are sufficiently large to escape the grasp of any one company, institution, or government. Effective approaches must necessarily be multi-company, industry-wide, and transnational in scope. For complex, critical and costly problems, only large-scale solutions will suffice.

For example, here in the UK, we work with the Insurance Fraud Bureau. Software supplied by our Applied Intelligence unit analyzes every auto and property insurance claim submitted by every claimant in the country. This industry-wide capability has resulted in over 600 arrests and a large reduction in the amount of insurance fraud committed. This is not something that could work for a single insurer. This truly is a Big Data problem.

Here in the United States, we are working with several state insurance agencies to reduce medical insurance fraud, again, as an industry-wide solution within each state. We provide essential network security services for nearly 15% of all American banking and credit union institutions. We monitor a quarter-million daily transactions processed by a New York-based clearing house, about $1.2 billion worth of instruments every day.

These are all examples of how having a multi-company, transnational vantage helps solve industry-wide problems.

Conclusions

The three strategies I’ve described — employing systems thinking, not silo thinking; getting the full picture of risk; and “scaling up” to span industries and international boundaries — are key to solving the complex, costly and critical problem of digital crime. But these items will not be sufficient in and of themselves. Because what we also need as businesses, as consumers and as society as a whole is a new mindset.

The risk intelligence mindset

The mindset we need to adopt is a more informed, intelligent approach to thinking about and managing risk: “risk intelligence” if you like. Not every plan to protect the business will be perfect. It is impossible to imagine a world in which there is no fraud, no theft, and no successful cyber-attacks. BAE might well wish it could sell silver bullets in addition to the conventional kind, but silver bullets do not exist.

What I mean by “risk intelligence” is that customers have enough information to act, even in conditions of uncertainty. I mean that when customers’ most well-considered security and risk plans fail, they can still act decisively, and can make decisions appropriate for their businesses. Customers need to be able to:

quickly acquire data about risks and threats at the highest level that could affect them and their customers;
effectively analyze the data on hand to create information that can be put to use; and then:
decisively act on that information to achieve better business outcomes: for example, reducing fraud, repelling cyber attacks, or rapidly responding to a break-in.

Learning from John Boyd

There is a precedent for this type of thinking, and it comes courtesy of BAE’s main business, the military business. In the 1970s American military strategist Colonel John Boyd wrote about something called the “OODA loop,” which stands for Observe, Orient, Decide and Act. Boyd theorized that in combat conditions, one must:

Observe the enemy’s movements;
Orient oneself by creating a mental picture of the situation;
Decide on the courses of action available, and then:
Act decisively

Boyd believed that the combatant who can observe, orient, decide and act fastest would win the battle. This means achieving the clearest and most accurate conception of battlefield position, and then taking action, as fast as possible. Boyd also believed that a combatant who can observe, orient, decide and act faster can overwhelm his adversary’s decision-making capability, achieving victory in a fraction of the time required by conventional warfare.

That was why Hitler’s blitzkrieg attacks were so effective. It is why the US-led Operation Desert Storm, for which Boyd was a key architect, was able to conquer Iraq — a country whose territory is nearly twice the size of the UK — in less than four days.

It is also why digital crimes take days, months and years to detect. Adversaries are able to observe, orient, decide and act much more quickly than their victims.

So, when we say that to properly combat digital crime, we need “risk intelligence,” we mean quickly acquiring data, effectively analyzing it, and decisively acting. In essence: speeding up customers’ own analytics and decision-making processes to match or exceed the speed of the adversary.

Result: make customers’ jobs easier

Imagine a world where risk intelligence becomes the norm. Done right, our customers’ jobs become simpler. Today, the Chief Information Security Officer’s role in most organizations is to catalog all of the vulnerabilities in the environment; prioritize them; and then serially eliminate them one after the other. He or she buys many best-of-breed products to solve many narrow problems. Along the way, he or she writes policies that few people read, and some business unit owners actually regard as harmful. He or she spends valuable staff time answering hundreds of pesky audit questionnaires. That is the day job.

The after-hours job is what happens when the company is actually compromised or subjected to fraud or attack. In these circumstances, the Security Officer scrambles, dodges, and weaves before making the best of a bad situation. Because policy is prioritized over speed of decision-making, the Security Officer is always caught by surprise.

In future, the Chief Information Security Officer’s job will be measured not by the pound — that is, by the weight of policies produced and purchase orders placed. It will be measured instead by the tick — that is, by the number of ticks of the clock between when the adversary initially acts, and when he or she is able to acquire, analyze and act in response, or in advance of the adversary’s next move.

Parting thought

I will close with a quote from Sir Winston Churchill:

“Want of foresight, unwillingness to act when action would be simple and effective, lack of clear thinking, confusion of counsel until the emergency comes, until self-preservation strikes its jarring gong – these are the features which constitute the endless repetition of history.”

Let us learn from history.

Thank you for your time and attention today.

The DevOps Security Handbook: Building Security In With Chef, Part III

2013-10-06T20:15:00-04:00

Introduction

This is the third in a series of occasional posts about security and DevOps. The ultimate goal of this series is to show how to build a reasonably secure Apache web server using the popular DevOps automation tool Chef. The server will be suitable for serving static content such as that generated by OctoPress. Each post explores a new aspect of Chef.

If you read the first and second posts in this series, you learned how to set up the Chef workstation and server; created webserver and base roles; created a test environment and a virtual machine; and built a partially hardened server called tester.local. This server has a minimized Apache configuration, and a restricted OpenSSH configuration.

In this post, I will demonstrate one of the most challenging aspects of any server automation project: copying sensitive keying materials, such as SSL private keys, to server nodes. Although SSL certificates themselves are not sensitive, certificate private keys are. In order to use Chef to truly “build security in,” these materials must be securely conveyed from the Chef server to the target server nodes. To do this, you will use Chef’s encrypted data bag feature and an add-on feature called chef-vault. You will create a custom cookbook recipe that performs all of the necessary decryption and file-creation actions on the target node. At the end of this post, you will possess a repeatable, reliable and secure method for conveying SSL keying materials or other secrets to target nodes.

Generate self-signed SSL certificate

To use SSL with your webserver, you must have an SSL certificate. In production environments, you will likely use a certificate signed by a public certificate authority, such as VeriSign, Thawte, or GoDaddy. But you can also use a self-signed certificate, which you will do here. At the command line, change to your chef-repo/.chef directory. Type:

openssl genrsa -aes128 -out tester.local.key-with-password 2048

This creates a 2048-bit RSA key and encrypts it with 128-bit AES, using a key derived from a password that you supply. You should see output similar to the following (and will be prompted to enter a password):

Generating RSA private key, 2048 bit long modulus
........+++
..+++
e is 65537 (0x10001)
Enter pass phrase for tester.local.key-with-password:
Verifying - Enter pass phrase for tester.local.key-with-password:

In most situations it isn’t desirable to have a password protecting the actual key file, because when you start Apache, it will block until the password is entered. If you have a large-scale infrastructure, you don’t want to type in passwords every time some random server starts up. (As a compensating control — later on in this post — you will make the key file accessible only to root.) To ensure that Apache starts cleanly, let’s remove the password. Type:

openssl rsa -in tester.local.key-with-password -out tester.local.key

Enter the password when prompted. A new key file tester.local.key will be created. Remove the old password-protected key file:

rm tester.local.key-with-password

Next, generate a certificate signing request (CSR). Type:

openssl req -new -key tester.local.key -out tester.local.csr

Enter data used in the certificate: country name, state, locality, organization name, OU name, common name, and email address, as shown in the sample output below:

You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank
For some fields there will be a default value,
If you enter '.', the field will be left blank.
-----
Country Name (2 letter code) [AU]:US
State or Province Name (full name) [Some-State]:Massachusetts
Locality Name (eg, city) []:Boston
Organization Name (eg, company) [Internet Widgits Pty Ltd]:Markerbench
Organizational Unit Name (eg, section) []:SSL Test Certificate Directorate
Common Name (e.g. server FQDN or YOUR name) []:tester.local
Email Address []:nobody@markerbench.com

Please enter the following 'extra' attributes
to be sent with your certificate request
A challenge password []:
An optional company name []:

Most of the fields are mandatory, but only one really matters: the common name. This must match the name of the webserver host; in this case, tester.local.

The CSR will be written to tester.local.csr. After it is created, create a self-signed certificate by typing:

openssl x509 -req -days 365 -in tester.local.csr -signkey tester.local.key -out tester.local.pem

You will see this output:

Signature ok
subject=/C=US/ST=Massachusetts/L=Boston/O=Markerbench/OU=SSL Test Certificate Directorate/CN=tester.local/emailAddress=nobody@markerbench.com
Getting Private key

The certificate will be written to tester.local.pem. You can verify the certificate contents by using OpenSSL’s x509 command:

openssl x509 -in tester.local.pem -noout -text

You should see output similar to this:

Certificate:
Data:
    Version: 1 (0x0)
    Serial Number:
        90:aa:b2:e4:06:ca:50:32
    Signature Algorithm: sha1WithRSAEncryption
    Issuer: C=US, ST=Massachusetts, L=Boston, O=Markerbench, OU=SSL Test Certificate Directorate, CN=tester.local/emailAddress=nobody@markerbench.com
    Validity
        Not Before: Oct  5 21:05:32 2013 GMT
        Not After : Oct  5 21:05:32 2014 GMT
    Subject: C=US, ST=Massachusetts, L=Boston, O=Markerbench, OU=SSL Test Certificate Directorate, CN=tester.local/emailAddress=nobody@markerbench.com
    Subject Public Key Info:
        Public Key Algorithm: rsaEncryption
        RSA Public Key: (2048 bit)
            Modulus (2048 bit):
                00:c4:3b:79:55:78:15:c2:82:a6:e3:e9:f0:64:c7:
                …(content omitted)
                56:e1:57:0d:b0:e0:37:31:19:ee:31:95:8f:2f:a6:
                c3:3b
            Exponent: 65537 (0x10001)
Signature Algorithm: sha1WithRSAEncryption
    1e:a6:1a:27:d3:5d:08:bc:ad:00:df:4e:6a:5b:4c:a4:be:80:
    …(content omitted)
    23:0e:02:be:3e:e8:89:75:58:03:7d:70:ac:13:a3:f4:d5:02:
    2e:d8:58:7f

Congratulations. You have created a self-signed certificate called tester.local.pem in the local directory, and a corresponding private key file tester.local.key.
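If you want a programmatic sanity check that a certificate and private key belong together, Ruby's standard OpenSSL library can do it. The following is a minimal sketch, not part of the original walkthrough: it generates a throwaway key and self-signed certificate in memory (so it runs without any files on disk), then confirms that the key matches the certificate and that the certificate verifies against its own public key. The shortened subject and the SHA-256 digest are illustrative assumptions.

```ruby
require "openssl"

# Generate a 2048-bit RSA key (the in-memory analogue of `openssl genrsa`).
key = OpenSSL::PKey::RSA.new(2048)

# Build a self-signed certificate: subject and issuer are the same name.
# The subject below is an illustrative, shortened version of the one used above.
name = OpenSSL::X509::Name.parse("/C=US/ST=Massachusetts/L=Boston/O=Markerbench/CN=tester.local")
cert = OpenSSL::X509::Certificate.new
cert.version    = 0   # 0 encodes X.509 v1, as in "Version: 1 (0x0)"
cert.serial     = 1
cert.subject    = name
cert.issuer     = name
cert.public_key = key.public_key
cert.not_before = Time.now
cert.not_after  = cert.not_before + 365 * 24 * 60 * 60   # valid for 365 days
cert.sign(key, OpenSSL::Digest.new("SHA256"))

# Does the private key correspond to the public key in the certificate?
key_matches = cert.check_private_key(key)
# Does the signature verify against the certificate's own key (self-signed)?
self_signed = cert.verify(key)
# Pull the common name out of the subject's [field, value, type] triples.
common_name = cert.subject.to_a.find { |field, _value, _type| field == "CN" }[1]

puts "key matches: #{key_matches}, self-signed: #{self_signed}, CN: #{common_name}"
```

To run the same checks against the files created above, you could load them with OpenSSL::X509::Certificate.new(File.read("tester.local.pem")) and OpenSSL::PKey::RSA.new(File.read("tester.local.key")) instead of generating new ones.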

Installing Chef-vault for distributing secrets

With the certificate and private key created, the next challenge is to use Chef to copy these two files to the correct locations on the web server. SSL certificates are stored at /etc/ssl/certs; private keys are stored at /etc/ssl/private.

There are many ways to convey the SSL certificate and key to the webserver. The easiest way would be to include these files in a cookbook as file resources, then use a recipe to copy them to the correct locations on the server. That is easy, but not very secure because the keying materials would be stored as part of the cookbook, and therefore in the clear. (The Git source tree contains your cookbooks and all of their supporting files. If the SSL private key were checked in as an unencrypted regular file, other people would be able to see it.)

It would be much nicer to store the SSL private key in an encrypted form, somehow, so that it can be copied to the server without worrying about who sees it (an attacker would just see ciphertext). After it is copied to the server node, it can be decrypted in place and moved to the correct destination in /etc/ssl/private.

Chef has some tools that make conveying secret materials easier. It provides a construct called a data bag for storing custom configuration items and other materials that target nodes need. Data bags are essentially hash maps (otherwise known as associative arrays, or in Rubyspeak, “hashes”) that are stored on the Chef server and retrieved by target nodes when chef-client runs. Data bag items can be encrypted. To encrypt a data bag item, you pass a symmetric encryption key (or password) to the knife data bag create command. For example:

knife data bag create certs tester_local_key --secret-file /tmp/my_secret_key

…where my_secret_key is a secret key generated, for example, by OpenSSL. To decrypt the item on target nodes, the target nodes perform an equivalent decryption operation, passing in the same secret key used to originally encrypt the item.

Go back and re-read that last paragraph. See anything problematic? Nodes that need to decrypt the data bag item need the secret key used to encrypt it. That might sound obvious, but it raises a question: how does the secret key actually get copied to the nodes? The Chef documentation is silent about how this is done; it leaves the problem of key management as an exercise for the reader. We are left to assume that the secret key is “somehow” copied to the target nodes.

How should this be done? You could copy the secret key using a Chef recipe, but you’d have the same problem all over again — sensitive keying material would be exposed. You could, instead, copy the secret key manually to the node over SSH, but that defeats the whole point of Chef — automation of configuration tasks.

It would be much nicer if there were a way to encrypt sensitive materials without requiring lots of complicated key management. Sadly, Chef does not provide such a method. However, a clever programmer named Kevin Moser, who works for Nordstrom, has created a Chef plugin called chef-vault that solves the key management problem rather elegantly.

Kevin’s chef-vault tool takes a clever approach. Chef-vault uses a type of key encapsulation to protect secret materials using the public keys of target nodes that need them. These public keys are the same ones the nodes use to authenticate to Chef. Because these keys must, by definition, already exist, using them for encryption creates no extra work for you. Essentially, Chef-vault’s encrypt operation does the following:

Creates a symmetric encryption key (“secret key”). This secret key encrypts the plaintext (the thing you want to encrypt), and creates a ciphertext.
Adds the ciphertext to the data bag.
For each target node, encrypts the secret key with the node’s public key, creating an encapsulated key blob. The same operation is repeated for authorized users using their public keys as well.
Adds each encapsulated key blob to the data bag.

The result of Chef-vault’s encrypt operation is a data bag that contains an encrypted item for the secret being protected, and, for each authorized user and target node, an encrypted data blob that allows each user or node (and only that user or node) to recover the encryption key and, thus, decrypt the encrypted item. Chef-vault (essentially) extends Chef’s data bag structures to use its own public-key encryption system so that secret keys can be conveyed securely to target nodes. This rather neatly solves the secret-key distribution problem.
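The four steps above can be sketched with Ruby's standard OpenSSL library. This is an illustration of the key-encapsulation idea only, not chef-vault's actual code or storage format: the party names, the freshly generated RSA key pairs (stand-ins for the keys that nodes and users already use to authenticate to Chef), the plain hashes standing in for data bag items, and the RSA padding defaults are all assumptions.

```ruby
require "openssl"

# Hypothetical stand-ins for the RSA key pairs that nodes and users
# already hold for authenticating to the Chef server.
parties = {
  "tester.local" => OpenSSL::PKey::RSA.new(2048),
  "arj"          => OpenSSL::PKey::RSA.new(2048),
}

plaintext = "-----BEGIN RSA PRIVATE KEY-----\n(example secret)\n"

# Step 1: create a random secret key and encrypt the plaintext with it.
cipher = OpenSSL::Cipher.new("aes-256-cbc")
cipher.encrypt
secret_key = cipher.random_key
iv         = cipher.random_iv
ciphertext = cipher.update(plaintext) + cipher.final

# Step 2: add the ciphertext to the (simulated) data bag item.
item = { "cipher" => "aes-256-cbc", "iv" => iv, "encrypted_data" => ciphertext }

# Steps 3 and 4: encrypt the secret key once per party with that party's
# public key, and store each resulting blob under the party's name.
keys_item = parties.transform_values { |rsa| rsa.public_encrypt(secret_key) }

# A party recovers the secret key with its own private key, then decrypts.
def recover(name, rsa, item, keys_item)
  blob = keys_item[name]
  return nil if blob.nil?   # no blob stored for this name: not authorized
  secret = rsa.private_decrypt(blob)
  decipher = OpenSSL::Cipher.new(item["cipher"])
  decipher.decrypt
  decipher.key = secret
  decipher.iv  = item["iv"]
  decipher.update(item["encrypted_data"]) + decipher.final
end

recovered = parties.map { |name, rsa| recover(name, rsa, item, keys_item) }
outsider  = recover("other.local", OpenSSL::PKey::RSA.new(2048), item, keys_item)

puts recovered.all? { |text| text == plaintext }
puts outsider.nil?
```

The point of the design is that the secret key never travels in the clear: each authorized party gets its own copy, readable only with a private key it already has, while a party with no stored blob has nothing to decrypt.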

Enough talk. Let’s install chef-vault. Normally, you would use the Ruby command gem install chef-vault to install it. However, as of this writing, only chef-vault version 2.1 has the ability to encrypt entire files. That ability is missing in the version of chef-vault in the Gem repositories. So you must build it yourself using the latest version from GitHub.

At the command line, change to a directory outside of your chef-repo directory. Type:

git clone https://github.com/Nordstrom/chef-vault.git

You will see output similar to this:

Cloning into 'chef-vault'...
remote: Counting objects: 667, done.
remote: Compressing objects: 100% (343/343), done.
remote: Total 667 (delta 324), reused 619 (delta 279)
Receiving objects: 100% (667/667), 107.30 KiB | 0 bytes/s, done.
Resolving deltas: 100% (324/324), done.

Change to the new chef-vault directory and build the Gem:

cd chef-vault
gem build chef-vault.gemspec

You will see output similar to this:

WARNING:  no homepage specified
WARNING:  description and summary are identical
  Successfully built RubyGem
  Name: chef-vault
  Version: 2.1.0
  File: chef-vault-2.1.0.gem

Install chef-vault by typing:

gem install chef-vault-2.1.0.gem

Ruby will report successful installation as follows:

Successfully installed chef-vault-2.1.0
1 gem installed
Installing ri documentation for chef-vault-2.1.0...
Installing RDoc documentation for chef-vault-2.1.0...

Creating an encrypted vault for the SSL certificate and key

With chef-vault installed, you can now use it to encrypt sensitive materials and convey them securely to nodes. As an example, let’s encrypt the SSL certificate file. Change back to the chef-repo directory. Type the following, where username is your Chef username:

knife encrypt create certs tester_local_pem --mode client --file .chef/tester.local.pem -A "username"

This Knife encrypt command tells chef-vault to encrypt the contents of file .chef/tester.local.pem (the SSL certificate) and to authorize the user username to decrypt or update its contents. You can use any valid Chef username, or multiple usernames separated by commas. (If you need to find out what your Chef username is, type knife user list.)

The contents of the encrypt operation are added to a vault named certs. The vault is backed by a data bag with the same name. You can verify that the data bag certs exists by typing:

knife data bag list

You will see the data bag certs in the list. You can see the items added to the data bag via the knife data bag show command. Type:

knife data bag show certs

You will see the following items:

tester_local_pem
tester_local_pem_keys

The first item, tester_local_pem, is a hash that contains the encrypted contents of the file. The second item, tester_local_pem_keys, is a hash containing the list of authorized nodes, users and their associated public-key-encrypted blobs.

Take a look at the encrypted file contents. The command for viewing data bag items is knife data bag show name-of-data-bag name-of-item. Type:

knife data bag show certs tester_local_pem

You should see output similar to the following:

file-content:
  cipher:         aes-256-cbc
  encrypted_data: QpB63Qv2650jwmWfj3IX4iXAIoGz8WZkggoV+wbLyI0T4nUivD5QBovdjtJU
  YkhI9QOrbW55HVwew7tLW+ee0cetjZm+Amaa0Gyo8ehBsTbRAeY3jkdWv8Ia
  …
  (content omitted)
  …
  jGAUa+xcdDedmBSiRxoUwrjSq85hnAGwmKovXqKZeK4=

  iv:             kOrZ5kIrTCmwRloUodCtgA==

  version:        1
file-name:
  cipher:         aes-256-cbc
  encrypted_data: kEL5rHzmx85diXKC1AL7EXdEID+SC1E58GuNFBeu9lK1k+Bv5GbcQXK/iDtS
  L8tQ

  iv:             xEhV676bjE4SwVYZZwkFtw==

  version:        1
id:           tester_local_pem

As you can see, the file-content hash key contains a child hash containing the cipher (AES-256-CBC), the initialization vector, and the encrypted data blob. The file-name hash key contains similar data that corresponds to the original file name (which is also encrypted).

Let’s look at the encryption keys. Type:

knife data bag show certs tester_local_pem_keys

You should see output similar to the following:

admins:  arj
arj:     SDqZuaFrpy28YOSDDhkyDmDBLPZHuRSDXjOgHklnaetDjl8QI7zuTvznmg1Q
…
(content omitted)
…
f+s7gdSVBZ0el7Uc9gDhOFZA0hz0ADqcIPd2hA90PQ==

clients:
id:      tester_local_pem_keys

Here, the admins entry’s value is arj, indicating that the user arj is authorized to decrypt or update the contents. The arj entry contains the secret key, encrypted with arj’s public key. Of course, instead of seeing arj you will see your own username. Note that the clients entry is empty because no nodes are authorized to decrypt yet.

You can decrypt the secret by using the knife decrypt command. Type:

knife decrypt certs tester_local_pem file-content --mode client

This command decrypts the payload stored in the key path tester_local_pem > file-content in data bag certs. Because your Chef user is an authorized user, you should be able to see the decrypted content. It starts with the string -----BEGIN CERTIFICATE-----. Compare the output to the contents of the file .chef/tester.local.pem; the contents should be identical.

At this point, only one party (your Chef user) is authorized to decrypt the certificate. Of course, the web server needs to be authorized also! To authorize more users or nodes after the original knife encrypt create operation, use the knife encrypt update command. In this case, you want to authorize the tester.local node that will actually use the SSL certificate. Type:

knife encrypt update certs tester_local_pem --mode client --file .chef/tester.local.pem -S "name:tester.local"

You can examine the contents of the data bag by typing knife data bag show certs tester_local_pem again. If you do that, you will see that the contents of the data bag item are much the same as before, although the initialization vectors (for example, file-content > iv) and the encrypted_data blocks are different. That is because the vault has re-encrypted the contents with different keys.
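To see why re-encryption changes both values, here is a minimal Ruby sketch using the standard openssl library. It is not chef-vault's code; the encrypt_fresh helper is hypothetical and simply encrypts the same text twice, each time under a newly generated key and initialization vector.

```ruby
require 'openssl'

# Encrypt plaintext under a fresh random key and IV; returns [iv, ciphertext].
def encrypt_fresh(plaintext)
  cipher = OpenSSL::Cipher.new('aes-256-cbc')
  cipher.encrypt
  cipher.random_key            # new random 256-bit key for this run
  iv = cipher.random_iv        # new random initialization vector
  [iv, cipher.update(plaintext) + cipher.final]
end

first  = encrypt_fresh('same certificate text')
second = encrypt_fresh('same certificate text')

puts "IVs identical?         #{first[0] == second[0]}"
puts "Ciphertexts identical? #{first[1] == second[1]}"
```

The plaintext never changes, but because the key and IV are regenerated, the stored blobs look unrelated from one run to the next.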

Examine the tester_local_pem_keys entry. Type:

knife data bag show certs tester_local_pem_keys

You will see that the contents of this entry are now a little different:

admins:       arj
arj:          SDqZuaFrpy28YOSDDhkyDmDBLPZHuRSDXjOgHklnaetDjl8QI7zuTvznmg1Q
…(content omitted)
f+s7gdSVBZ0el7Uc9gDhOFZA0hz0ADqcIPd2hA90PQ==

clients:      tester.local
id:           tester_local_pem_keys
tester.local: y7DM7oQZj9+Yd5oRLFA4eSVOZ/+g/NYUNjfMJvsxxd1Nv85yLigzjb1JlaYm
…(content omitted)
yYWtFXX47765NivPGNTszfJQ8igNNBy1+YvfQn/wNw==

You can see that the clients entry contains the value tester.local, and that a corresponding encrypted data blob named tester.local has been added. Splendid!

With the certificate correctly added to the vault, let’s add the private key. Instead of doing a two-step process of creating the encrypted data items and then authorizing the node, let’s do it in one step by supplying both the user and node to the create operation. Type:

knife encrypt create certs tester_local_key --mode client --file .chef/tester.local.key -A "arj" -S "name:tester.local"

Substitute your own username for arj, of course.

If you want to, you can verify that the SSL private key was added successfully by typing the now-familiar knife data bag show certs tester_local_key command.

Creating a cookbook for configuring SSL

At this point you have added the SSL certificate and its corresponding private key to the encrypted data vault certs. Now you need to get the vault’s contents over to the target nodes so you can create the certificate and private key files.

First, create a new cookbook called ssl-config:

knife cookbook create ssl-config

Add the new cookbook to the webserver role so that it is executed whenever chef-client runs:

knife role edit webserver

Add the recipe for ssl-config to the role by editing the run_list as follows:

"run_list": [
  "recipe[apt]",
  "recipe[apache2]",
  "recipe[ssl-config]"
],

Next, edit the default recipe file ssl-config/recipes/default.rb as follows:

chef_gem 'chef-vault'
require 'chef-vault'

directory '/etc/ssl/certs' do
  recursive true
  owner 'root'
  group 'root'
  mode '0755'
end

directory '/etc/ssl/private' do
  owner 'root'
  group 'root'
  mode '0700'
end

# Certificate entries equal to hostname but with . replaced by _
vault         = 'certs'
hostname      = node['fqdn']
cert_prefix   = hostname.gsub('.','_')
cert_cert     = "#{cert_prefix}_pem"
cert_key      = "#{cert_prefix}_key"
cert_chain    = "#{cert_prefix}_chain"
puts "Creating certificates for #{hostname} using vault #{vault}."

# Decrypt certificate
puts "Decrypting certificate from hash item #{cert_cert}."
begin
  item = ChefVault::Item.load(vault,cert_cert)
  file "/etc/ssl/certs/ssl-cert-snakeoil.pem" do
    owner 'root'
    group 'root'
    mode '0444'
    content item['file-content']
  end
rescue ChefVault::Exceptions::KeysNotFound
  raise ChefVault::Exceptions::ItemNotFound,
    "Certificate not found at #{vault}/#{cert_cert}!"
end

# Decrypt certificate chain
puts "Decrypting certificate chain."
begin
  item = ChefVault::Item.load(vault,cert_chain)
  file "/etc/ssl/certs/#{hostname}.chain" do
    owner 'root'
    group 'root'
    mode '0444'
    content item['file-content']
  end
rescue ChefVault::Exceptions::KeysNotFound
  Chef::Log.warn("No certificate chain in #{vault}/#{cert_chain}.")
end

# Decrypt private key
puts "Decrypting key from hash item #{cert_key}."
begin
  item = ChefVault::Item.load(vault,cert_key)
  file "/etc/ssl/private/ssl-cert-snakeoil.key" do
    owner 'root'
    group 'root'
    mode '0400'
    content item['file-content']
  end
rescue ChefVault::Exceptions::KeysNotFound
  raise ChefVault::Exceptions::ItemNotFound,
    "Private key not found at #{vault}/#{cert_key}!"
end

# Configure the SSL default site if enabled
apache_site "default-ssl" do
  enable node['apache']['default_site_enabled']
end

There might appear to be a lot going on in this recipe, but it is actually quite simple. First, the chef_gem and require lines tell the target node’s Chef client to download the chef-vault Gem.

The directory '/etc/ssl/certs' do block creates the directory that should contain the SSL certificate /etc/ssl/certs if it does not already exist. Directory ownership is changed to root and it is made world-readable.

The directory '/etc/ssl/private' do block creates the directory that should contain the SSL private key /etc/ssl/private if it does not already exist. Directory ownership is changed to root and it is made readable only by root.

The next part of the recipe assigns variables used for looking up and decrypting the node’s certificate, private key and certificate chain. The key names for these items are equal to the fully-qualified domain name of the node with periods escaped as underscores, plus the _pem, _key, and _chain suffixes, respectively. For example, for your test VM tester.local these values are tester_local_pem, tester_local_key, and tester_local_chain. (In case you were wondering: that is why the knife encrypt create commands you typed earlier created items named tester_local_pem and tester_local_key.)
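The naming convention can be tried outside of Chef with a few lines of plain Ruby. The vault_item_names helper below is hypothetical and is not part of the ssl-config cookbook. It uses String#tr so that every period is replaced, which matters for names with more than one period (String#sub would replace only the first).

```ruby
# Hypothetical helper: derive the vault item names for a node from its
# fully-qualified domain name.
def vault_item_names(fqdn)
  prefix = fqdn.tr('.', '_')   # replace every period with an underscore
  {
    cert:  "#{prefix}_pem",
    key:   "#{prefix}_key",
    chain: "#{prefix}_chain"
  }
end

vault_item_names('tester.local').each do |kind, name|
  puts "#{kind}: #{name}"
end
```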

The next three code blocks (each beginning with the comment # Decrypt) actually decrypt the file contents and save them to files. Let’s look at the first of these.

In the first decryption block, the line ChefVault::Item.load(vault,cert_cert) decrypts the certificate object and assigns the result to the variable item. The value of item will be a hash. The next six lines, beginning with file "/etc/ssl/certs/ssl-cert-snakeoil.pem" do, create the certificate file, assign ownership to root, make it world-readable, and set the contents to item’s hash entry named file-content. Note that all of this code is enclosed in a begin/rescue/end block, so that the ChefVault::Exceptions::KeysNotFound exception can be trapped. ChefVault::Item.load throws this exception if the vault does not contain the expected entry, in this case one whose key is tester_local_pem. If the entry is not found (for example, because you forgot to add the certificate to the vault), the recipe will throw an exception and fail, as it should.

The second decryption block decrypts and saves the certificate chain, if one was added to the vault. Because tester.local’s SSL certificate was self-signed, it does not need a certificate chain. However, in production situations you might have one, and if you do, you can ensure that it is copied to the server by adding it to the vault using the usual knife encrypt create command and specifying an item named nodename_chain, where nodename is the escaped form of the fully-qualified domain name (periods replaced by underscores). Unlike the first decryption block, however, the recipe does not crash and burn if the certificate chain item is not found. Instead, the recipe simply warns that no chain was found.

The third decryption block decrypts and saves the private key. As with the first decryption block, the recipe fails if the key’s expected entry is not found in the vault.
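The difference between the required items (certificate and private key) and the optional one (certificate chain) comes down to what the rescue clause does. Here is a minimal sketch of that pattern in plain Ruby, runnable without Chef. Everything in it is a stand-in: KeysNotFound mimics ChefVault::Exceptions::KeysNotFound, FAKE_VAULT is an ordinary Hash rather than a real vault, and load_item and fetch_content are hypothetical helpers.

```ruby
# Stand-in for ChefVault::Exceptions::KeysNotFound, defined here only so the
# sketch runs without the chef-vault gem.
class KeysNotFound < StandardError; end

# Stand-in for the vault: item name => item contents. No chain item exists.
FAKE_VAULT = {
  'tester_local_pem' => { 'file-content' => "-----BEGIN CERTIFICATE-----\n(example)\n" },
  'tester_local_key' => { 'file-content' => "-----BEGIN RSA PRIVATE KEY-----\n(example)\n" }
}.freeze

# Hypothetical loader that raises when the item is missing.
def load_item(name)
  FAKE_VAULT.fetch(name) { raise KeysNotFound, name }
end

# Return the item's content. A missing required item is fatal; a missing
# optional item produces a warning and nil.
def fetch_content(name, required:)
  load_item(name)['file-content']
rescue KeysNotFound
  raise "Item not found: #{name}" if required
  warn "No item named #{name}; continuing."
  nil
end

puts fetch_content('tester_local_pem', required: true)
puts fetch_content('tester_local_chain', required: false).inspect
```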

The last block turns on the default-ssl site in Apache, which is preconfigured to use the ssl-cert-snakeoil certificate and private key files.

The ssl-config recipe is fairly bare-bones, but sufficiently flexible that it will work with any SSL-enabled web server node. As discussed above, all you must do is (1) ensure that the node’s SSL certificate and private key are added to the vault correctly, and (2) configure the node’s run-list so that it executes the ssl-config recipe.

Copying SSL certificates to the server

With all of the prep work out of the way, it is time to finally configure the server. Upload the ssl-config cookbook to the Chef server:

knife cookbook upload ssl-config

SSH into tester.local:

vagrant ssh

Once in, run chef-client:

sudo su
chef-client

Many console messages will scroll past you at a dizzying pace. Look for these lines:

Recipe: ssl-config::default
…
Creating certificates for tester.local using vault certs.
Decrypting certificate from hash item tester_local_pem.
Decrypting certificate chain.
[2013-10-06T23:29:50+00:00] WARN: No certificate chain in certs/tester_local_chain.
Decrypting key from hash item tester_local_key.

These indicate that the recipe worked as expected. If there is a problem finding or decrypting the certificate or private key, the output will show an exception. Assuming the recipe ran successfully, the output will also contain lines showing that the certificate and private key files were created. Look for lines similar to these, which show that the certificate file was created:

- create new file /etc/ssl/certs/ssl-cert-snakeoil.pem
- update content in file /etc/ssl/certs/ssl-cert-snakeoil.pem from none to 53f4ae
    --- /etc/ssl/certs/ssl-cert-snakeoil.pem    2013-10-06 23:29:51.663590528 +0000
    +++ /tmp/.ssl-cert-snakeoil.pem20131006-9127-hmw8o1 2013-10-06 23:29:51.667592528 +0000
    @@ -0,0 +1,23 @@
    +-----BEGIN CERTIFICATE-----

…and these, which show that the private key file was created:

- create new file /etc/ssl/private/ssl-cert-snakeoil.key
- update content in file /etc/ssl/private/ssl-cert-snakeoil.key from none to 60fcbe
    --- /etc/ssl/private/ssl-cert-snakeoil.key  2013-10-06 23:29:51.727622527 +0000
    +++ /tmp/.ssl-cert-snakeoil.key20131006-9127-m9p4zk 2013-10-06 23:29:51.731624527 +0000
    @@ -0,0 +1,27 @@
    +-----BEGIN RSA PRIVATE KEY-----

After the recipe runs, you can verify the files were correctly created by cat-ing the files /etc/ssl/certs/ssl-cert-snakeoil.pem and /etc/ssl/private/ssl-cert-snakeoil.key. The files should be owned by root/root; permissions should be restricted to 444 and 400, respectively.
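If you prefer to check the permissions programmatically rather than by eye, a few lines of Ruby will do it. The mode_of helper is hypothetical. The sketch demonstrates it on a temporary file so that it runs anywhere; on the node you would pass the real certificate and key paths instead.

```ruby
require 'tempfile'

# Hypothetical helper: return a file's permission bits as a four-digit
# octal string, for example "0400".
def mode_of(path)
  format('%04o', File.stat(path).mode & 0o7777)
end

# Demonstrate on a temporary file that is restricted the same way the
# private key is.
Tempfile.create('example-key') do |f|
  File.chmod(0o400, f.path)
  puts "#{f.path}: #{mode_of(f.path)}"
end
```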

Testing the webserver

To test that the webserver is working as it should, we need to do two more things. First, edit the webserver role to enable SSL and the default site. Then, run chef-client on the node again so that the new settings take effect.

First, edit the role using the usual command knife role edit webserver. As shown below, enable SSL by adding "ssl" to the default_modules array, and set the default_site_enabled value to true:

"override_attributes": {
    "apache": {
      "allow_override": "None",
      "contact": "nobody@example.com",
      "default_modules": [
        "alias",
        "cgi",
        "deflate",
        "dir",
        "log_config",
        "logio",
        "mime",
        "rewrite",
        "ssl",
        "setenvif"
      ],
      "default_site_enabled": true,
....

Also, enable port 443 in the listen_ports section:

      "listen_ports": [
        "80",
        "443"
      ],

On tester.local, run chef-client as root again and watch the node converge using these new settings.

Then, open your browser to tester.local using regular HTTP. You should see a page that screams It works! Try using HTTPS; you should see the same message, likely after an SSL warning about an untrusted certificate.

Save your work

You are done. Back up your nodes, roles, data bags and environments from the Chef server to your local workstation. Type:

knife backup export

You will see the following output:

Backing up nodes
Backing up nodes tester.local
Backing up roles
Backing up roles base
Backing up roles webserver
Backing up data bags
Backing up data bag certs item tester_local_key
Backing up data bag certs item tester_local_key_keys
Backing up data bag certs item tester_local_pem
Backing up data bag certs item tester_local_pem_keys
Backing up environments
Backing up environments testing

Next, edit .gitignore in your chef-repo directory so that your SSL certificate and private key are not stored in Git. Add this line somewhere near the top (for example, underneath the line .chef/*.pem):

.chef/*.key

Finally, commit your work. You can see what files were modified with the usual command git status; if you do, you will see that some new files have been added:

.chef/chef_server_backup/data_bags/certs/
cookbooks/ssl-config/

You will see that a few have also been modified. Commit everything:

git add -A
git commit -m "DevOps Security Handbook Part 3"

Remember, the keying materials (the .key and .pem files in the .chef directory) are not versioned in your Git repository. This is both a feature and a bug. You can safely move the tester.local.pem and tester.local.key files to offline media now, if you wish; they are safely encrypted in the data bag certs and no longer need to be in the local filesystem.

Next: Adding custom content

If you have completed the instructions in this post, you learned how to do some very useful things. You created a self-signed SSL certificate and private key for tester.local. You installed the chef-vault plugin for storing the SSL certificate and private key as encrypted data bag items. You authorized the user arj and node tester.local to decrypt these items. And you created a cookbook that decrypts the certificate, private key and certificate chain and creates files in the correct locations on the server.

In the next post, you will use Chef to configure Apache for serving custom content. You will create a non-privileged user whose home directory stores static HTML. This directory will be served up by Apache as the default website. In keeping with the SSH configuration introduced in this post, the user account will be configured to use SSH public keys for authentication rather than passwords.

This post was updated July 22, 2015 to change the naming convention for SSL certificate files on the target box. It also added a short section that enables the default normal and SSL sites, as well as a short section for testing the actual SSL configuration.

The DevOps Security Handbook: Building Security In With Chef, Part II

2013-10-03T13:30:00-04:00

Introduction

This is the second in a series of occasional posts about security and DevOps. The ultimate goal of this series is to show how to build a reasonably secure Apache web server using the popular DevOps automation tool Chef. The server I am describing how to build will be suitable for serving static content. Readers of this blog know that I am a fan of static blogging tools like Octopress, which I use to generate this website.

If you read the first post in this series, you learned how to set up the Chef workstation and server account. You created an Apache server role and a test environment; set up a virtual machine; and built your first node. In this post, I will show you how to create a new role called base that includes security enhancements to OpenSSH. You will also fine-tune Apache to remove non-essential modules.

Tightening the Apache configuration

To recap, in the last post I described how to create a sample virtual machine called tester.local, onto which Chef installed the Apache 2 web server. If you were (as they say in the game-show world) “playing along at home,” you created a sample role called webserver that caused the apache2 and apt packages to be installed on the node tester.local. You also bootstrapped the node so that it converged into the desired state.

As a refresher, let’s review a few details from last time. In your chef-repo directory, at the command line type:

knife role edit webserver

You should see something that looks like this:

{
  "name": "webserver",
  "description": "Web server for my.org",
  "json_class": "Chef::Role",
  "default_attributes": {
  },
  "override_attributes": {
    "apache": {
      "listen_ports": [ "80" ]
    }
  },
  "chef_type": "role",
  "run_list": [
    "recipe[apt]",
    "recipe[apache2]"
  ],
  "env_run_lists": {
  }
}

This configuration works just fine, of course. It sets up Apache with the usual defaults. Lots of modules are enabled, and a default website is configured automatically. For demonstrations, that might be dandy. But in production situations, you should tighten up the configuration so that it is more secure. Security professionals know, as a general rule, that when something has fewer configured options, it is usually more secure. In that spirit, let’s:

Minimize the attack surface by removing Apache modules we don’t need
Decrease the amount of information “leaked” by the server by turning off server tokens and signatures
Increase server performance by eliminating HTTP keep-alives
Remove the default server website

If you have tried to do these things in the past, you probably wrote a shell script or some other kind of custom script. Or perhaps, like me, you painstakingly hand-tuned the server and wrote down all of your specific hardening steps in a notebook in case you needed to do it again. The genius of Chef’s apache2 cookbook is that you no longer have to do those things. The apache2 cookbook recipes are cleverly written; they allow Apache to be heavily customized without requiring you to write code. Nearly everything that Apache does (or should not do) can be controlled through attributes.

Attributes and their values can be defined in cookbooks via attribute files and within recipes. They can also be defined for individual roles or environments. When override attributes are defined in more than one place, those defined for environments beat those defined for roles, which in turn beat those defined in cookbooks. (For default attributes the last two swap: role values beat environment values.)

Attribute values can also have multiple priorities. In increasing order of precedence, these are the default, force-default, normal, override, force-override and automatic priority types. That is, default attributes are used unless force-default, normal, override, force-override or automatic values are supplied somewhere; force-default attributes apply unless normal, override, force-override or automatic values are found; and so on. The precedence rules are fairly complex; OpsCode’s documentation discusses them at length.
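As a rough mental model, the priority types can be thought of as an ordered list in which the highest level that has a value wins. The Ruby sketch below illustrates only that idea. It is not how Chef implements attributes: it ignores where a value was defined (cookbook, role or environment) and Chef's merging of nested hashes. The effective_value helper is hypothetical.

```ruby
# Priority types, from lowest to highest precedence.
PRECEDENCE = %i[default force_default normal override force_override automatic].freeze

# Hypothetical helper: given the values supplied at each priority level,
# return the one from the highest level present (or nil if none).
def effective_value(values)
  winner = PRECEDENCE.reverse.find { |level| values.key?(level) }
  winner && values[winner]
end

# A cookbook default of "On" loses to a role override of "Off".
puts effective_value(default: 'On', override: 'Off')

# With no override present, a normal value beats the default.
puts effective_value(default: 'On', normal: 'Off')
```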

In this case, you will define several override attributes that will take precedence over the default values defined in the apache2 recipes. When chef-client runs on the target node tester.local, these overridden values will be used in the various recipes to produce a more secure web server.

At the console, type:

knife role edit webserver

In the editor screen, modify the webserver role so that it looks like this:

{
  "name": "webserver",
  "description": "Web server for my.org",
  "json_class": "Chef::Role",
  "default_attributes": {
  },
  "override_attributes": {
    "apache": {
      "allow_override": "None",
      "contact": "nobody@example.com",
      "default_modules": [
        "alias",
        "cgi",
        "deflate",
        "dir",
        "log_config",
        "logio",
        "mime",
        "rewrite",
        "setenvif"
      ],
      "default_site_enabled": false,
      "directory_index": "disabled",
      "directory_options": "None",
      "ext_status": false,
      "keepalive": "Off",
      "keepaliverequests": "100",
      "keepalivetimeout": "15",
      "listen_ports": [
        "80"
      ],
      "serversignature": "Off",
      "servertokens": "Prod",
      "timeout": "120",
      "traceenable": "Off"
    }
  },
  "chef_type": "role",
  "run_list": [
    "recipe[apt]",
    "recipe[apache2]"
  ],
  "env_run_lists": {
  }
}

The hash named apache (inside the override_attributes hash) contains the attributes that modify how Apache is configured. If you are familiar with Apache configuration files, you can probably guess what many of the attributes do. In order, the override values do the following:

allow_override: Prevents .htaccess files placed in content directories from overriding any directives already in place for the directory
contact: Sets the contact email address printed on Apache error pages to a bogus address
default_modules: Restricts Apache loadable modules to just the few needed to serve static content; in this case, mod_alias, mod_cgi, mod_deflate, mod_dir, two logging modules, mod_mime (MIME support), mod_rewrite (for URL re-writing) and mod_setenvif (useful for sending different responses based on browser types)
default_site_enabled: Disables the default website
directory_index: Disables directory indexing
directory_options: Disables all “extra features” in directories, such as fancy indexing, symlink-following, multi-views, server-side includes and so forth
ext_status: Disables extended status messages
keepalive, keepaliverequests and keepalivetimeout: Disables HTTP Keep-Alive messages, which can cause performance to suffer in many cases
serversignature: Removes server signatures from error messages
servertokens: Minimizes the response header field to include just the webserver software (“Apache”) but not the version, OS or compiled-in options
timeout: Sets the time the server is allowed to respond to a request to 120 seconds
traceenable: Removes support for the HTTP TRACE method
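Several of these attributes correspond directly to Apache directives. The Ruby sketch below shows the general idea of rendering attribute values into configuration text with ERB. The template here is illustrative only and is not the apache2 cookbook's real template; the directive names (ServerTokens, ServerSignature, TraceEnable, KeepAlive, Timeout) are standard Apache ones, and render_fragment is a hypothetical helper.

```ruby
require 'erb'

# Illustrative template only; the apache2 cookbook's own templates are
# larger and may be organized differently.
TEMPLATE = <<~CONF
  ServerTokens <%= apache['servertokens'] %>
  ServerSignature <%= apache['serversignature'] %>
  TraceEnable <%= apache['traceenable'] %>
  KeepAlive <%= apache['keepalive'] %>
  Timeout <%= apache['timeout'] %>
CONF

# Hypothetical helper: render the template using the given attribute hash.
def render_fragment(apache)
  ERB.new(TEMPLATE).result(binding)
end

overrides = {
  'servertokens'    => 'Prod',
  'serversignature' => 'Off',
  'traceenable'     => 'Off',
  'keepalive'       => 'Off',
  'timeout'         => '120'
}

puts render_fragment(overrides)
```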

Of these attributes, the default_modules attribute is the most interesting because its value causes various Apache modules to be enabled or disabled. By default, the apache2 recipe loads a huge number of modules. By overriding the defaults you can restrict what is loaded to a small subset.

Note that Apache always loads a few other modules regardless of the value of the default_modules attribute. These include authorization, content negotiation, timeout and status modules. But by keeping the list of modules small, you keep the server’s memory footprint smaller. You also get rid of features that aren’t needed in most websites and can be sources of risk, such as WebDAV support, LDAP authentication, proxying and so forth.

I do not claim to be an Apache expert by any means, but the settings in the list above are reasonably tight. Certainly, they are good enough to demonstrate how you can use attributes to customize how the Apache cookbook runs.

Now that you have created override attributes for the web server role, it is time to put them to use. Save and close the role editor; the contents will be saved to the Chef server.

SSH into the test VM and execute the node’s run-list again so that the new attribute values are applied. Recall from the last post that the Chef role webserver had been assigned to tester.local. All that you need to do, therefore, is run the client again. SSH into the box and elevate to root:

vagrant ssh
sudo su

and then:

chef-client

You should see a dizzying rush of console messages, including many indicating that various Apache-related files are being modified. The run process should only take a few seconds. Assuming all recipes succeed, you will see a message at the bottom similar to the following:

Recipe: apache2::default
  * service[apache2] action restart
    - restart service service[apache2]

Chef Client finished, 31 resources updated

Congratulations; your Apache server is now just a little bit faster, and a little bit tighter. You did it solely by twiddling a few attributes, without having to write any code. Nice, huh?

Creating a new role for server hardening

Let’s do some more attribute-twiddling. This time, your objective is to tighten the configuration of two components found on most servers: the SSH configuration and the Chef client itself.

Download the cookbooks for SSH and the Chef client:

knife cookbook site install openssh
knife cookbook site install chef-client

Upload the cookbooks to the Chef server:

knife cookbook upload --all

Create a second role. This role, called base, will be used by all servers and will include recipes that every server should use. Type:

knife role create base

…and supply the following contents into the editor:

{
  "name": "base",
  "description": "Essential recipes for securing every server",
  "json_class": "Chef::Role",
  "default_attributes": {
  },
  "override_attributes": {
    "openssh": {
      "server": {
        "allow_agent_forwarding": "no",
        "allow_tcp_forwarding": "no",
        "client_alive_count_max": "0",
        "client_alive_interval": "600",
        "ignore_user_known_hosts": "yes",
        "login_grace_time": "30s",
        "password_authentication": "no",
        "permit_root_login": "no",
        "rsa_authentication": "no"
      }
    }
  },
  "chef_type": "role",
  "run_list": [
    "recipe[openssh]",
    "recipe[chef-client::delete_validation]"
  ],
  "env_run_lists": {
  }
}

The openssh recipe configures SSH on the machine. The override attributes above it configure the OpenSSH server daemon so that it uses sensible settings. Root logins are disabled and password authentication is disallowed; only public-key authentication is allowed. Agent and TCP forwarding are disabled, making the server unsuitable for use as a “jump box.” (For more information on hardening SSHD, see the many fine articles on the subject.)
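The attribute names above are snake_case versions of sshd_config keywords. As an illustration of that correspondence, the Ruby sketch below converts a few of them into keyword form. This is an assumption about the mapping made for illustration, not the openssh cookbook's actual code, and sshd_directive is a hypothetical helper. (OpenSSH treats keywords case-insensitively, so the capitalization is cosmetic.)

```ruby
# Hypothetical helper: turn a snake_case attribute name into an
# sshd_config-style keyword, e.g. "permit_root_login" -> "PermitRootLogin".
def sshd_directive(attribute_name)
  attribute_name.split('_').map(&:capitalize).join
end

# A few of the override attributes from the base role.
server = {
  'permit_root_login'       => 'no',
  'password_authentication' => 'no',
  'client_alive_interval'   => '600'
}

server.each do |name, value|
  puts "#{sshd_directive(name)} #{value}"
end
```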

In addition to the SSH settings, notice the addition of the chef-client::delete_validation recipe. This recipe does something rather important from a security perspective. As discussed previously, Chef server communicates with its nodes and clients using public/private key pairs. When a new node is added, a shared “validation key” is copied to the new node. This is a standard 2048-bit RSA private key with a name similar to organization-validator.pem; it is stored in your Chef repository’s .chef directory. It is not versioned by Git because .chef/*.pem is added to .gitignore, and it is obviously very sensitive. Anyone who obtained the validation key could conceivably join your Chef node set and gain access to the configuration data, recipes and more. Despite the sensitivity of this key, however, after the bootstrap operation completes, Chef inexplicably leaves it on the new node! It would be much nicer to remove it after the bootstrap.

For security reasons, you should remove the validation key after the initial bootstrap because it is not needed any more. The chef-client::delete_validation recipe does that. That is why it is in the run-list for the base role.
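The effect of the recipe can be sketched in plain Ruby. This is not the recipe's source, which uses Chef resources and may differ in detail. The delete_validation_key helper and the file names are hypothetical, and the guard that the node's own client key must already exist is an assumption added here so that a node is never left without any credentials.

```ruby
require 'tmpdir'

# Hypothetical helper: delete the validation key, but only once the node
# has its own client key. Returns true if a file was removed.
def delete_validation_key(validation_key, client_key)
  return false unless File.exist?(client_key) && File.exist?(validation_key)

  File.delete(validation_key)
  true
end

# Demonstrate in a scratch directory with made-up file names.
Dir.mktmpdir do |dir|
  validation = File.join(dir, 'validation.pem')
  client     = File.join(dir, 'client.pem')
  File.write(validation, 'shared validation key (example)')
  File.write(client, 'node client key (example)')

  puts delete_validation_key(validation, client)   # removes the file
  puts File.exist?(validation)                     # the key is gone
end
```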

Adding the base role to the server

After you define the base role, you need to apply it to the test VM tester.local by adding it to the node’s run list. At present, tester.local is only running recipes that are part of the webserver role. As you might expect, you can add to a node’s run-list by using knife. Type:

knife node run_list add tester.local "role[base]"

You will see output similar to the following that confirms that the base role has been added to tester.local’s run list.

tester.local:
  run_list:
    role[webserver]
    role[base]

SSH back into the test box (type vagrant ssh followed by sudo su). Run chef-client again.

You will see many messages scroll by indicating that the /etc/ssh/sshd_config and /etc/ssh/ssh_config files have been updated. By default, the Chef openssh cookbook configures these files with the default settings that ship with OpenSSH. Console output should look similar to the following:

Recipe: openssh::default
  * package[openssh-client] action install (up to date)
  * package[openssh-server] action install (up to date)
  * service[ssh] action enable
    - enable service service[ssh]

  * service[ssh] action start (up to date)
  * template[/etc/ssh/ssh_config] action create
    - update content in file /etc/ssh/ssh_config from 265a26 to 74365c
        --- /etc/ssh/ssh_config 2012-04-02 11:49:30.000000000 +0000
        +++ /tmp/chef-rendered-template20131003-3037-n6ytk  2013-10-03 02:14:48.674543237 +0000
        @@ -1,53 +1,3 @@
...
  * template[/etc/ssh/sshd_config] action create
    - update content in file /etc/ssh/sshd_config from 33469d to 1ba1c4
        --- /etc/ssh/sshd_config    2013-05-11 06:10:17.805866080 +0000
        +++ /tmp/chef-rendered-template20131003-3037-6xl885 2013-10-03 02:14:49.114323240 +0000
        @@ -1,88 +1,14 @@
        -# Package generated configuration file
        -# See the sshd_config(5) manpage for details
        +# Generated by Chef for tester.local

Recipe: openssh::default
  * service[ssh] action restart
    - restart service service[ssh]

You can verify that SSH has been reconfigured correctly by trying to SSH into tester.local using the default Vagrant account credentials (vagrant/vagrant). They should no longer work. However, typing the vagrant ssh command should still get you in. That is because the vagrant ssh command authenticates using an embedded private key that is hardcoded into Vagrant. The public half of this key is an authorized key in the vagrant account’s list of public keys. (You can verify this yourself by examining the file /home/vagrant/.ssh/authorized_keys on tester.local. It shows one entry whose description reads “vagrant insecure public key.” How did it get there? Well, that is part of the “contract” of building a Vagrant-compatible base box.)

Note: running the openssh recipe with the attributes as shown above can have adverse consequences on production nodes if you aren’t prepared. The recipe with the attributes as shown removes SSH root access. Unless you have another way of becoming root on the box, you might find yourself locked out! If your machine is a Vagrant machine, you can use the vagrant ssh command to become root. For non-Vagrant machines, you will need a non-root account that allows public-key logins and can su to root. You have been warned.

Next: Managing SSL certificates and keys

This post introduced the concept of using Chef to partially harden a web server. You reduced the number of loadable Apache modules to a minimum set, disabled unnecessary services and reduced the amount of useful information an attacker could obtain. You created a second role called base and assigned it two recipes, openssh and chef-client::delete_validation. The openssh recipe configures OpenSSH in a more restrictive manner by disabling password authentication, disabling root logins and preventing session forwarding. The delete_validation recipe removes the Chef validation key from the node after it is created, which removes a potential security risk.

In the next post, you will switch back to Apache. You will use Chef to perform one of the most challenging aspects of any server configuration: copying SSL keying materials to server nodes.

The DevOps Security Handbook: Building Security In With Chef, Part I

2013-10-01T16:18:00-04:00

Introduction

This is the first in a series of posts about Chef, an infrastructure automation platform. The goal of this series is to describe how to build a reasonably secure Apache web server. By using Chef, we can quickly and efficiently build identical web servers with assurance that they will work the same way, every time, and have the security properties we want.

You will build this server in stages. The server will ultimately contain the following elements:

Apache 2 HTTP web server, with minimal modules and a virtual host defined for serving website content
A limited user account whose home directory contains the website content. The account only accepts SSH remote logins that use public-key authentication. The Apache virtual host’s document root will point to a subdirectory of the account’s home
A user group whose name matches the user account name, and which contains the user as its only member
Hardened configuration with minimized services, synchronized time, intrusion prevention, and other security characteristics

For purposes of testing, the server will be spun up as a virtual machine on your local workstation. You will use VirtualBox VMs for this purpose.

This first post will describe how to set up a basic test infrastructure that uses Chef. You will set up the Chef workstation and server account, create an Apache server role and a test environment, set up a virtual machine, and build your first node. The web server will not do much, and it will not be especially secure — at least not initially. Subsequent posts will gradually add more security components. By adding security features gradually, you will learn how to use Chef. As a side effect, you will learn how Chef’s philosophy of “convergence” makes it easy to gradually massage your nodes into the states you want. This is important when adding Chef to servers that already exist.

Getting started

In order to demonstrate how Chef works, you will need a virtual machine to play with. To create one, you will use Vagrant to instantiate a new VirtualBox VM. Our goal is to create a VM that you can boot and access on your laptop for testing purposes. After you do that, you will bootstrap it with Chef so that you can configure and manage it.

Some prerequisites. You will need to download and install:

VirtualBox from Oracle, which creates and manages guest virtual machines.
Vagrant, which creates, manages, and destroys VirtualBox VM images from the command line.
Git, the ubiquitous version-control system that will allow you to “check in” your Chef repository and manage its versions as you create the server.
Chef 11.x workstation software, which is where all of the magic happens.
Ruby 1.9.3 or higher

Chef works best on Unix- and Linux-based systems. I used a Mac to prepare this guide. But my instructions are largely platform independent; as long as you have a Linux- or BSD-based workstation, or a Mac, you should be in good shape.

OpsCode’s QuickStart guide does a fine job explaining how to do the initial preparatory steps in their Workstation setup page. OpsCode recommends that you install a Ruby version manager. I use RVM myself, although the documentation (in the Advanced tab) recommends RBENV. Open up OpsCode’s QuickStart guide and do everything on Page 1. It should take you about 5 minutes.

Next, you need to create an Enterprise Chef account, and download the starter package using the Enterprise Chef web interface. Page 2 of the documentation page explains how to do this. The free version of Enterprise Chef supports up to five nodes, which is perfect for our purposes. After you sign up and create an account, create a new Organization and download the “Starter Kit” as described on QuickStart Page 2. Follow the instructions on this page all the way up to the “Create a Simple Cookbook” section. Once you have done that, you have configured your Chef workstation properly.

A word about the “Starter Kit.” The Starter Kit is a zipped bundle that contains a sample Chef repository directory structure and, crucially, a private key for your workstation, which Chef calls a “client.” When you expand the Starter Kit, it will unpack into a directory called chef-repo. This is your Chef repository, and you should move it somewhere useful. I put mine in ~/workspace, which is where I keep all of my dev stuff, but you can put it anywhere you like.

Using the Chef workstation tools, you create and edit Chef roles, environments, cookbooks and other items locally on your workstation. When you want to push new versions out to your nodes, you use Knife to upload them to the Enterprise Chef server. When you upload, Knife uses the client’s private key to authenticate with the Enterprise Chef server.

With the initial setup stuff out of the way, let’s start getting into the fun stuff.

Creating sample server run-lists, roles and environments

I have found the OpsCode QuickStart documentation to be quite well-written. But it only gets you so far, and it leaves out some important steps for using Chef in a more serious way. Let’s take this opportunity to stray from the OpsCode documentation a bit and lay down some additional foundation-work for building the web server. In particular, let’s set up some initial run-lists, roles and environments for your test VM.

Some background. Chef “converges” nodes into their desired states by applying a “run-list” of recipes to each node. The run-list of recipes (Apache2, NTPD, user creation, etc.) that apply can be specified in several ways. The quickest and most direct way is to specify the node’s run-list of recipes when the node is initially bootstrapped with Chef; that is, when the Chef agent (chef-client) is initially installed on the node. Bootstrapping the node configuration is done using Knife, and the syntax looks like this:

knife bootstrap tester.local --run-list "recipe[apt],recipe[apache2]" -E testing

I have omitted some of the syntax for the sake of simplicity; don’t try running this. There are important concepts to understand here. The bootstrap command causes the chef-client application to be installed on the node. The chef-client is essentially an agent. It configures and installs software based on instructions (“recipes”) it receives from the Chef server. Notice the run-list parameter: it indicates that the APT and Apache2 recipes will be applied to node tester.local. What this means is that when chef-client is bootstrapped onto the node, the APT and Apache packages will be downloaded, installed and configured as well.

Notice also the -E parameter. This means that tester.local should be assigned to an environment called testing, which you will define in a minute. By “environment,” Chef means a group of nodes that typically correspond to a stage of development, for example “testing,” “staging,” or “production.” Let’s create the testing environment now. Type:

knife environment create testing

…and type or paste the following JSON contents into the file:

{
  "name": "testing",
  "description": "Test environment",
  "cookbook_versions": {
  },
  "json_class": "Chef::Environment",
  "chef_type": "environment",
  "default_attributes": {
  },
  "override_attributes": {
  }
}

Nothing tricky here — just a simple JSON file with a few attributes in it. The default_attributes and override_attributes items can be used to supply variables to the recipes that are unique to the testing environment, for example, debug settings or dummy passwords. You will leave these blank for now because they don’t apply in this case.
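To make the default-versus-override idea concrete, here is a minimal Ruby sketch of how a lower-precedence set of attributes might be layered under a higher-precedence one. It is a simplification under stated assumptions: real Chef has more precedence levels than the two shown, the deep_merge helper is my own, and the attribute values are hypothetical.

```ruby
# Simplified illustration of attribute layering; NOT Chef's actual implementation.
# Real Chef has more precedence levels than the two shown here.

# Recursively merge two hashes; values in `winner` take precedence.
def deep_merge(base, winner)
  base.merge(winner) do |_key, old_val, new_val|
    if old_val.is_a?(Hash) && new_val.is_a?(Hash)
      deep_merge(old_val, new_val)
    else
      new_val
    end
  end
end

# Hypothetical attribute sets, for illustration only.
cookbook_defaults     = { "apache" => { "listen_ports" => ["80", "443"], "timeout" => 300 } }
environment_overrides = { "apache" => { "listen_ports" => ["8080"] } }

effective = deep_merge(cookbook_defaults, environment_overrides)
puts effective.inspect
```

The override replaces only the key it names; everything else falls through from the defaults, which is why an empty override_attributes block is harmless.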

As I mentioned, there are several ways to assign run-list items to nodes. Direct assignment of recipes during bootstrapping, shown in the edited knife bootstrap command above, is the easiest way. But that won’t scale if you have multiple nodes that must be configured identically. It makes more sense, instead, to create a role, which allows common run-lists to be defined for groups of machines that do the same thing. Instead of bootstrapping with a specific run-list of recipes, you can bootstrap with roles. When you use a role, Chef looks up (dereferences, if you will) the run-list for the role and applies all of the recipes it contains, along with any custom attributes. You can think of roles as a type of pointer.
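The “pointer” analogy can be sketched in a few lines of Ruby. This is a toy model, not how the Chef server resolves roles: the roles table and the expand_run_list helper are hypothetical, and the base role is included only to show more than one role being expanded.

```ruby
# Toy model of run-list expansion; the role data is hypothetical and the
# real lookup happens on the Chef server.
roles = {
  "webserver" => ["recipe[apt]", "recipe[apache2]"],
  "base"      => ["recipe[openssh]"]
}

# Expand a run-list, replacing each role[...] entry with that role's recipes.
def expand_run_list(run_list, roles, seen = [])
  run_list.flat_map do |item|
    if (m = item.match(/\Arole\[(.+)\]\z/))
      name = m[1]
      raise ArgumentError, "unknown role: #{name}" unless roles.key?(name)
      raise ArgumentError, "circular role: #{name}" if seen.include?(name)
      expand_run_list(roles[name], roles, seen + [name])
    else
      [item] # plain recipes pass through untouched
    end
  end.uniq
end

expanded = expand_run_list(["role[base]", "role[webserver]"], roles)
puts expanded.inspect
```

Changing the recipes listed under a role changes what every node bootstrapped with that role receives, which is the scaling benefit described above.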

Let’s create a new role called webserver. In it you will add the components needed to run your website. Type:

knife role create webserver

…and supply these contents:

{
  "name": "webserver",
  "description": "Web server for my.org",
  "json_class": "Chef::Role",
  "default_attributes": {
  },
  "override_attributes": {
    "apache": {
      "listen_ports": [ "80" ]
    }
  },
  "chef_type": "role",
  "run_list": [
    "recipe[apt]",
    "recipe[apache2]"
  ],
  "env_run_lists": {
  }
}

Notice that the run_list attribute contains the apache2 recipe, similar to what you used in the initial bootstrap command. The listen_ports override attribute tells the apache2 cookbook to configure Apache to listen just on port 80. You will learn more about override attributes in a future post. But if you are curious about how the cookbook works, and about the various attributes you can use to customize Apache’s configuration, see OpsCode’s online documentation. Notice also the apt recipe; this is required because Debian’s APT package updater is how Apache is actually installed onto the node.
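Hand-edited role files are easy to get subtly wrong; a stray trailing comma, for instance, makes the JSON invalid. The following Ruby sketch shows one way you might sanity-check a role before handing it to Knife. The role_problems helper and its rules are my own assumptions for illustration, not Chef’s validation logic.

```ruby
require "json"

# Parse role JSON and return a list of problems (empty when it looks sane).
# These checks are illustrative assumptions, not Chef's own validation rules.
def role_problems(text)
  role = JSON.parse(text)
  return ["top level must be a JSON object"] unless role.is_a?(Hash)

  problems = []
  problems << "missing name" unless role["name"].is_a?(String)
  Array(role["run_list"]).each do |item|
    ok = item.is_a?(String) && item.match?(/\A(recipe|role)\[[^\[\]]+\]\z/)
    problems << "bad run_list entry: #{item}" unless ok
  end
  problems
rescue JSON::ParserError => e
  ["invalid JSON: #{e.message}"]
end

good = '{"name": "webserver", "run_list": ["recipe[apt]", "recipe[apache2]"]}'
bad  = '{"name": "webserver", "run_list": ["recipe[apt]", "recipe[apache2]",]}'

puts role_problems(good).inspect
puts role_problems(bad).first
```

The second document differs from the first only by a comma before the closing bracket, which Ruby’s JSON parser rejects by default.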

To bootstrap using roles instead of directly specifying recipes, you would use the following syntax (some details omitted):

knife bootstrap tester.local --run-list "role[webserver]" -E testing

Again, don’t type this in, because it won’t work without some additional syntax; you will get to it soon enough.

Let’s complete the initial Chef setup. So far, you have created a sample test environment called testing, and a sample server role called webserver. To complete the initial setup, you need to do two more things: download the actual cookbooks that Chef will apply to the node, and upload the cookbooks to the Chef server so that any node assigned the role can retrieve them. The cookbooks we need are apt (required to install Apache), and apache2 (Apache itself).

To install the apache2 cookbook, type:

knife cookbook site install apache2

This command looks up the apache2 cookbook on the Opscode community cookbook site and causes it to be downloaded to your workstation. You will see a series of output messages showing the progress of the download, followed by a completion message when it succeeds. While you are at it, go ahead and install the apt cookbook too.

After downloading both, commit your current Chef repo to Git:

git add .
git commit -m "Added Apache and APT cookbooks."

Then upload your cookbooks to the Chef server:

knife cookbook upload --all

It might seem a little strange to have to upload the cookbooks to the Chef server. After all, they are managed centrally from the community cookbook site. Why can’t roles simply reference the cookbooks stored there, instead of needing to make copies? Frankly, I am not too sure why this is the case. I suspect Chef works this way so that cookbooks and recipes can be hacked up when needed. Regardless, you must upload cookbooks to Chef server after you update them. If you don’t, the Chef client on any nodes you create will continue to use outdated recipes.

Backing up Chef server data

Because you are using Enterprise Chef, your nodes, roles, environments and data bags are stored on the server — not locally. While I trust OpsCode to keep their servers up and available, I like to keep copies of important data on my client so that I have a record of them, and can version them with Git. You should, too.

To do that, you will need to install the backup-export Knife plugin, part of the Knife Hacks package. Installing it means copying a single plugin file from GitHub into your local Knife plugin directory, ~/.chef/plugins/knife, creating the directory if necessary. A few quick commands should do the trick:

mkdir -p ~/.chef/plugins/knife
curl -L https://raw.github.com/stevendanna/knife-hacks/master/plugins/backup_export.rb > ~/.chef/plugins/knife/backup_export.rb

Change back to your chef-repo directory and issue the following command:

knife backup export

You’ll see output similar to this:

Backing up nodes
Backing up nodes tester.local
Backing up roles
Backing up roles webserver
Backing up data bags
Backing up environments
Backing up environments testing

By default, backups are stored in .chef/chef_server_backup. You can change this by modifying the chef_server_backup_dir entry in .chef/knife.rb, but there’s no obvious benefit to doing that here. It is sufficient simply to have them present in the Chef repo directory, because they can be checked into Git using the usual familiar git add . and git commit steps. Go ahead and do that now.

If you have gotten this far, your initial Chef setup is complete. Now, let’s create a test machine.

Creating a virtual machine for testing

Change to your Chef repo directory. Create a new file Vagrantfile with these contents, or edit the existing one so that it matches this:

# -*- mode: ruby -*-
# vi: set ft=ruby :
Vagrant.configure("2") do |config|
  config.vm.box = "opscode-ubuntu-12.04-i386"
  config.vm.box_url = "https://opscode-vm.s3.amazonaws.com/vagrant/opscode_ubuntu-12.04-i386_provisionerless.box"
  config.vm.hostname = "tester.local"
  config.vm.define :tester do |t|
  end
  config.vm.network "private_network", ip: "192.168.56.2"
  config.vm.provider :virtualbox do |vb|
    vb.gui = false
    vb.name = "tester.local"
  end
end

Vagrantfile’s job is to tell Vagrant how to set up the test VM. If you have used Vagrant before, you will notice that this Vagrantfile is shorter than the default file Vagrant supplies. Here’s what it does:

Downloads an Ubuntu 12.04 base box (essentially, a virtual machine image) from OpsCode’s repository on Amazon
Creates a VirtualBox VM based on the machine image
Gives the VM the network name tester.local. This is the name that the Unix command hostname will return when you log into it
Names the Vagrant machine tester. This is the name used to start, stop and destroy the VM with Vagrant commands such as vagrant halt tester.
Names the VirtualBox VM and its image directory tester.local. By default, the name is based on the directory that contains Vagrantfile, plus a timestamp suffix. The vb.name property inside the config.vm.provider block overrides the default so that it matches the host name.
Configures the VM’s networking interface to use a private network address 192.168.56.2. This will allow us to start the VM and see it on our workstation, but the VM won’t be accessible from the outside.
Specifies that when you boot the VM, it will be booted in headless mode; the VirtualBox GUI won’t be displayed.

That is all you need to instantiate a new VM on our workstation. Next, edit your workstation’s /etc/hosts file and add a line that points to the VM using the private IP address and name tester.local:

192.168.56.2    tester.local

Great. Now, let’s go ahead and actually create the VM. From the command line in the same directory as Vagrantfile, type:

vagrant up

Vagrant will look by default in the same directory for Vagrantfile, and having found it, will create the VM according to the contents of the file. You will see output similar to the following:

Bringing machine 'tester' up with 'virtualbox' provider...
[tester] Importing base box 'opscode-ubuntu-12.04-i386'...
[tester] Matching MAC address for NAT networking...
[tester] Setting the name of the VM...
[tester] Clearing any previously set forwarded ports...
[tester] Creating shared folders metadata...
[tester] Clearing any previously set network interfaces...
[tester] Preparing network interfaces based on configuration...
[tester] Forwarding ports...
[tester] -- 22 => 2222 (adapter 1)
[tester] Booting VM...
[tester] Waiting for VM to boot. This can take a few minutes.
[tester] VM booted and ready for use!
[tester] Setting hostname...
[tester] Configuring and enabling network interfaces...
[tester] Mounting shared folders...
[tester] -- /vagrant

The entire process should take between 30 seconds and a minute if the base box is already cached on your workstation. If not, the first time you run vagrant up, Vagrant will need to download the machine image from Amazon.

You can verify that the new test VM is up by pinging tester.local and verifying that it responds:

Tweety:chef-repo arj$ ping tester.local
PING tester (192.168.56.2): 56 data bytes
64 bytes from 192.168.56.2: icmp_seq=0 ttl=64 time=0.582 ms
64 bytes from 192.168.56.2: icmp_seq=1 ttl=64 time=0.638 ms
…

Typing vagrant status will also indicate that the VM is up and running:

Tweety:chef-repo arj$ vagrant status
Current machine states:

tester                    running (virtualbox)

The VM is running. To stop this VM, you can run `vagrant halt` to
shut it down forcefully, or you can run `vagrant suspend` to simply
suspend the virtual machine. In either case, to restart it again,
simply run `vagrant up`.

You can repeat this process as often as you like by halting and destroying the VM, then running vagrant up again to recreate it:

vagrant halt tester
vagrant destroy tester

If you would like to verify that the VM is really up, you can SSH into the box using the username vagrant and password vagrant. You can also use the command vagrant ssh which does the same thing.

Note: by default, base boxes used with Vagrant ship with a pre-installed SSH public/private key pair that is used for SSHing into VMs it creates. These base boxes also ship with default vagrant/vagrant credentials. This configuration is not secure. For testing purposes on your local workstation this should not be a problem, because we have configured the VM to use host-only networking. It cannot be accessed from outside the workstation. But production servers should not use Vagrant with its default configuration.

Bootstrapping the virtual machine with Chef

So far, so good. You have successfully created a test virtual machine, but it isn’t much good to us yet because it doesn’t have Chef on it. Until it does, you cannot manage it.

It is (finally!) time to “bootstrap” the VM using Knife. This installs the chef-client agent on the node, and registers the new node with the Chef server. Type in the following:

knife bootstrap tester.local --ssh-user vagrant --ssh-password vagrant --run-list "role[webserver]" -E testing --sudo

Voilà! Assuming you did everything as described, Chef will SSH into the box, download and install Chef client onto it, and begin converging the node into its desired state; in this case, installing and configuring Apache.
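If you find yourself bootstrapping test nodes repeatedly, it can help to assemble the command from its parts rather than retyping it. The Ruby sketch below is one hedged way to do that. It uses only the flags that appear in the bootstrap command above; the bootstrap_args helper and its parameter names are hypothetical.

```ruby
require "shellwords"

# Build the argument list for knife bootstrap. Only flags that appear in the
# command above are used; the helper name and parameters are hypothetical.
def bootstrap_args(host, run_list:, environment:, ssh_user:, ssh_password:)
  [
    "knife", "bootstrap", host,
    "--ssh-user", ssh_user,
    "--ssh-password", ssh_password,
    "--run-list", run_list.join(","),
    "-E", environment,
    "--sudo"
  ]
end

args = bootstrap_args(
  "tester.local",
  run_list: ["role[webserver]"],
  environment: "testing",
  ssh_user: "vagrant",
  ssh_password: "vagrant"
)

# Shellwords.join escapes shell-sensitive characters such as [ and ],
# so the printed line can be pasted into a terminal.
puts Shellwords.join(args)
# To run it without going through a shell, one could call: system(*args)
```

Keeping the arguments as an array avoids quoting mistakes around the run-list, whose square brackets are special characters in most shells.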

Immediately after you hit Enter, a long list of output lines should appear. These should resemble the following:

Bootstrapping Chef on tester.local
tester.local --2013-09-29 03:20:41--  https://www.opscode.com/chef/install.sh
tester.local 
tester.local Resolving www.opscode.com (www.opscode.com)... 
tester.local 184.106.28.82
tester.local 
tester.local Connecting to www.opscode.com (www.opscode.com)|184.106.28.82|:443... 
tester.local connected.
tester.local 
tester.local HTTP request sent, awaiting response... 
tester.local 200 OK

followed by

tester.local Starting Chef Client, version 11.6.0
tester.local 
tester.local resolving cookbooks for run list: ["apt", "apache2"]
tester.local 
tester.local Synchronizing Cookbooks:
tester.local 
tester.local   - apt
tester.local 
tester.local   - apache2
tester.local 
tester.local Compiling Cookbooks...

and then a series of lines that indicate that APT and Apache have been installed. The last lines indicate that Apache has been installed and restarted, and that the resources on the box have been updated:

tester.local Recipe: apache2::default
tester.local 
tester.local   * service[apache2] action restart
tester.local 
tester.local 
tester.local     - restart service service[apache2]
tester.local 
tester.local 
tester.local 
tester.local Chef Client finished, 28 resources updated
tester.local

If you see output similar to this, and no errors, it means that you have successfully converged your first node. Congratulations! Excellent work.

You can verify that the web server is up by pointing your browser at the address http://tester.local. It should return a “Forbidden” message because we have not actually provided any HTML pages for Apache to serve up. But that is evidence enough that Apache is actually working.

Next: Adding security to the box

This post covered the basics of how to get going with Chef. You have installed the Chef workstation software and its supporting components: Git, Ruby, VirtualBox and Vagrant. You have created a sample role called webserver and assigned two sample recipes, apache2 and apt, to it. You created a virtual machine called tester with the domain name tester.local and bootstrapped Chef onto it, placing it under Chef control.

In the next post, you will begin doing more useful work. I’ll describe how to fine-tune the Apache installation. We will also begin increasing the security of the machine.

This post was updated October 1, 2013 to change the hostname used in the examples from tester to tester.local. It was updated on October 2, 2013 to remove references to the half-configured SSL support for Apache; I’ll cover this more fully in a future post.

Building Security In Using Chef

2013-09-23T00:02:00-04:00

Lately I have been spending a lot of time with a new best friend. This new friend is reliable; he does everything according to plan and always exactly the same way. The results are exactly the same every time, too. And he speaks to me in a language that I understand — the language of food.

I am not talking about a new buddy gourmand, about a pal I go out to restaurants with, or about a super-reliable project manager. My new best friend is a technology called Chef, made by OpsCode.

Chef, along with Puppet and CFEngine, is a flexible toolset for building infrastructure. The Chef mantra is “infrastructure as code,” which means simply that you can build infrastructure (servers and workstations) the same way every time. Chef has important implications for security because, by using it, you can ensure that your nodes have exactly the security properties you want by “baking them in” to what Chef calls “cookbooks,” the core component. I’ll come back to the security implications in a few minutes, but in the meantime I should explain what a cookbook is.

Cookbooks are packages that define how packages, applications or system functions should be built and configured. Cookbooks exist for Apache, NTP, user and group account creation, and just about every common application you can imagine. At a file level, cookbooks are basically composed of property files, templates and clever glue-code. The cookbook’s job is to declare required packages and dependencies; provide templates for configuration files that need to be modified; and provide Ruby code that sets up the packages, configures things or does whatever is needed to achieve the desired result. The process for building nodes is similar to how developers build code.

Typical Chef workflow

Here’s what a typical project workflow looks like. With Chef, you:

Create a new developer project using Git
Download one or more cookbooks for the applications or services you want to manage; or, you create new ones from scratch
Modify the properties associated with each cookbook as needed
Upload the modified cookbooks and/or properties to the master Chef server
“Bootstrap” new nodes from standard machine images, for example a generic CentOS VM. The bootstrap process injects Chef agents onto the new nodes and then…
“Converge” each new node into the desired state by downloading the required cookbooks and properties (the “run list”) that apply, and then running all of the cookbook recipes in the run-lists

I’ve glossed over quite a few things here, but the overall strategy is that the Chef agent transforms the node into the state you want. Sometimes this takes multiple passes through the run-list, although the Chef agent is generally smart enough to figure out how to manage dependencies without intervention. That is why Chef uses the term “converge” to describe how the node morphs into the desired state. Nodes need not be clones of each other, and indeed Chef can be injected into existing systems long after they are created. One might say that the Chef philosophy is exactly the opposite of the traditional “golden image” concept where every system is an exact copy of every other. It is more correct to say that with Chef, every package and application within scope (that is, those you have created cookbooks for) is configured in exactly the way you expect. Chef stresses idempotency, a fancy way of saying that you can execute the run-list repeatedly, on one node or many, and get the same result every time. For the curious, Sean O’Meara provides an excellent overview of Chef on his blog.
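Convergence and idempotency are easier to see in code than in prose. Here is a minimal Ruby sketch of the idea, assuming a node’s state can be modeled as a simple hash. The converge! helper and the resource names are hypothetical, and chef-client itself is not implemented this way.

```ruby
# Toy model of convergence; resource names and states are hypothetical,
# and this is not how chef-client is implemented internally.

# Bring `current` in line with `desired`, changing only what differs.
# Returns the number of resources that had to be updated.
def converge!(current, desired)
  updated = 0
  desired.each do |resource, state|
    next if current[resource] == state # already correct: do nothing
    current[resource] = state          # otherwise fix it
    updated += 1
  end
  updated
end

desired = {
  "package[apache2]" => "installed",
  "service[apache2]" => "running",
  "service[ntp]"     => "running"
}
# An existing node that is only partly in the desired state.
node = { "package[apache2]" => "installed", "service[apache2]" => "stopped" }

first  = converge!(node, desired)
second = converge!(node, desired)
puts "first run: #{first} updated, second run: #{second} updated"
```

The first pass touches only the two resources that were out of line; the second pass finds nothing to do. That is the property that lets you add Chef to a server that already exists.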

Chef tools for cooking in the kitchen

Chef includes several components that work together to produce consistent results every time:

Knife, a command-line workhorse that you use to create, download, edit and upload cookbooks, clients, nodes, roles and environments. Clients are the workstations that edit Chef configurations. Nodes are the machines that Chef produces. Roles are run-lists of cookbooks and configs for a common purpose, for example, a role called “webserver” with cookbooks and properties for Apache, PHP, CGI, and your company’s standard HTML chrome. Environments are variations on either global configurations or roles for specific situations, for example “development,” “production,” etc.
Chef-server, which serves as a master repository for your cookbooks, property files, and lists of clients, nodes and environments. You can set up your own server by downloading and running the community open source version. OpsCode also provides a hosted option called Enterprise Chef, which is free when used with five nodes or fewer.
Public/private keys, which allow clients, nodes and servers to authenticate each other without needing passwords. When your initial account is created on Chef server via the web GUI, the server creates a key-pair. The private half is added to a zipped download bundle that is expanded on the client into a directory. The client directory is then checked into Git (keying materials are not checked in). Whenever Knife is executed, it uses the private key to authenticate with the server first. The client bundle also includes a “validation key,” which is copied to new nodes at the time of creation. This validation key is used to initiate a key-exchange process with the server to create a node-specific key, after which point the validation key can be removed from the node.
Resource providers, which perform tasks listed in cookbook scripts. These providers allow Chef cookbook commands to remain relatively abstracted from the underlying OS commands. For example, the package resource provider invokes the package managers on various systems (yum for Red Hat or CentOS, apt for Debian-style Linuxes, etc.). Creative combinations enable interesting results: for example, you can populate directories on target nodes with Git checkout contents. If you had previously versioned website page contents, creating an up-to-date static webserver can be done automatically by causing it to pull the latest content from the master repo, which is a neat trick.
Community site, which hosts cookbooks from OpsCode and third parties, saving you the trouble of writing your own cookbooks. The Apache cookbook, for example, is extremely complete and allows for flexible customization. I have not finished fooling around with it yet in my own experiments, but the properties files allow for quite a bit of hardening; you can specify which Apache modules to include and exclude, create website aliases, map directories and do many of the things that old Apache-tuners like me have been doing by hand for years. As you might expect, the degree of configurability for any particular cookbook varies greatly depending on the skill of the author and amount of iterative refinement the cookbook recipes have received over time.
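The public/private key item above describes authentication without passwords. The Ruby sketch below illustrates the general idea using the standard openssl library: the client signs a request with its private key, and the server verifies it with the matching public key. It is an assumption-laden illustration and does not reproduce Chef’s actual request-signing protocol; the request text is hypothetical.

```ruby
require "openssl"

# Sketch of key-based authentication in general; this does NOT reproduce
# Chef's actual request-signing protocol or header format.
client_key  = OpenSSL::PKey::RSA.new(2048) # private half stays on the client
server_copy = client_key.public_key        # the server holds only the public half

request   = "GET /nodes/tester.local" # hypothetical request text
signature = client_key.sign(OpenSSL::Digest.new("SHA256"), request)

# The server checks the signature using only the public key.
genuine  = server_copy.verify(OpenSSL::Digest.new("SHA256"), signature, request)
tampered = server_copy.verify(OpenSSL::Digest.new("SHA256"), signature, request + "x")
puts "genuine: #{genuine}, tampered: #{tampered}"
```

No shared secret ever crosses the wire, which is why losing the private key (and not a password) is the thing to worry about.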

In addition to its own components, Chef also makes good use of a few other key tools that you might be familiar with, chiefly:

Git, the distributed version control system at the epicenter of the DevOps movement. When you create a new Chef project, the first thing you usually do is commit the new project into a local Git repository. At that point, you can easily create and link to a remote repository so that changes to the project are appropriately versioned centrally. As noted above, client-side keying materials are not automatically versioned; they are part of the default .gitignore file initially downloaded from the server.
Vagrant, a command-line utility for managing virtual machines. Vagrant allows you to download and cache a pristine community machine image, which can be quickly spun up, bootstrapped with Chef, and destroyed. The default VM image type is Oracle’s VirtualBox, but Vagrant can also manage VMware, Amazon and Rackspace images. With VirtualBox images, Vagrant can also manage networking settings so that it is easy to create test machines on your laptop. Using Chef and Vagrant together, for example, I was able to create a new virtual machine, bootstrap it with Chef, and converge it to a desired Apache state in about 30 seconds.

Implications for security

So, why is a security guy like me fooling around with Chef, and what are the implications for security? Here’s what I like about it:

Infrastructure as code. I really like how you can create and manage machines essentially as code. I do a fair amount of programming as an after-hours “professional hobby,” so it is great to be able to use some of the same tools and languages (notably Git and Ruby) here also.
Clever crypto. The mutual authentication system using naked public/private keys is clever. I’ve always felt that for the sorts of things Chef does, certificates would be too heavyweight and too much bother. While it is true that the client-side private key is not, by default, protected with a password, one can easily be added. The no-password default, however, does strike a nice balance of making it easy to communicate with Chef server without needing to worry too much. As long as the client node is protected, subversion isn’t a huge worry.
Stepwise assimilation. I like how Chef can be added to an existing machine so that it can be massaged into the desired state. When I have a little more time, I plan to perfect my Apache cookbook adaptations and converge my existing securitymetrics.org server into it. That would allow me to quickly recreate the web-server parts of the site if it got 0wned. I keep a rather long list of anal-retentive instructions for hand-tuning Apache, Mailman, Logwatch, SSHD, etc. I intend to gradually move each of these items under Chef control. Gradual assimilation is nice, because it is easier for most organizations to implement than big-gulp “golden image” projects.
Baked-in security possibilities. As you might imagine, the ability to converge nodes into predictable and known states is Chef’s strong point. If you are a security professional who believes in Building Security In (“Mr McGraw, white courtesy phone…”), Chef gives you powerful tools in service of that goal. Through Chef cookbooks, services and applications can be minimized. Key exposures can be limited via existing cookbooks or through custom ones that you may create.

Key caveats when working with Chef

So, that’s what I like about Chef. However, Chef has some important limitations that security professionals must keep in mind:

Chef’s frame of reference is that of an Agile developer, not that of a system administrator or a security pro. Cookbooks and recipes, and infrastructure-as-code are powerful metaphors, but they are different from those used by traditional configuration management tools. There is no concept of a CMDB other than in a very loose sense: the Chef server data and any projects managed by Git. Using Chef effectively requires you to think like a developer. In companies where Agile or Lean has taken root, and where development and operations are tightly coupled in a common workflow, this is a plus. But shops that aren’t fully wedded to the DevOps philosophy are likely to find Chef’s mindset a little alien.
Chef’s learning curve is steep and can lock you in. Chef property files and cookbook scripts are nothing more than stock Ruby files arranged in specific directory layouts and used in specific ways. Consequently, mastering Chef requires one to learn a bit of Ruby. Personally, I’ve found Ruby easier to learn than Perl or Bash (neither one of which I like very much). It allows me to express intent more simply and in a more compact fashion. What it means is that if you are a security or infrastructure professional who wants to build security in, you will have to roll up your sleeves a bit and learn a new language. Your investment in learning Chef and Ruby will lead to increased lock-in, which is usually a good thing. Certainly, it is better than the alternatives — rat’s nests of Perl, Bash, wikis and READMEs.
Chef’s documentation is average for open projects, with the pluses and minuses this implies. OpsCode offers a licensing and support model similar to other hybrid companies: the source code is freely available for most components; licensing is generous and corporate-friendly (Apache 2.0 license); and a vibrant community helps newbies ascend the initial learning curves. If you want support you have to pay. For those who want to self-support, documentation is on par with, but not dramatically better than, many open source projects: it covers basic use cases well, but minor deviations from potted plots cause hiccups. In my own experiments, for example, a server node wasn’t converging as it should have because chef-client wasn’t running as root. Error messages were cryptic and shed no light on the cause. Attempting to reinitialize the master workstation client made matters worse because I erased my private keys. I eventually figured out what was going on, but only through logical deduction rather than consulting the documentation.
Chef is server-centric, and won’t help you converge state on other types of devices, such as routers, load-balancers or databases. For those whose ambitions extend to automating the configuration of entire virtual or physical environments, you will need to bolster Chef with other tools. That isn’t necessarily a minus, but it does mean that Chef is only good at the things it is meant to be good at. It won’t be the only tool in your bag.

Alternatives to Chef

As Sean’s blog points out, Chef is not the only game in town. Puppet serves a similar role for many companies, and its design philosophy is close to that of Chef. Both were inspired by CFEngine. I chose to experiment with Chef because I felt it had more polish and refinement than Puppet. I have no idea whether this is actually true or not. At a certain point, it does not matter. Whether you like Chef, Puppet or CFEngine, the point is to try them out and see where it takes you. I am quite pleased so far with Chef and look forward to using it more with my own projects. I will post more details in future blog posts.

If you are a security or infrastructure professional who is working with Chef or similar tools, I would love to hear about your experiences. Add a comment or send me an email!

New Web Adventures with Heroku

2013-08-26T08:55:00-04:00

Many ardent followers of this blog know that among other things, one of my professional hobbies is application development. I am a “weekend programmer.” I always have a side project or two going, but do not professionally program (much) as part of my day job. That’s not necessarily for lack of talent (cough), but for lack of desire to make my living from it. That said, as the CTO of a cloud security software company, it’s rather good to know how software is built these days. As a bonus, by staying close to dev via a hobby or two, I can relate better to my colleagues who actually do make their living from programming.

I have been programming most of my life. I learned to program around age 11 on time-sharing systems, and then later, on the Apple II+ and the PDP 11-44. My high-school computer science team was nationally ranked, usually #1 or #2 in the country, and I scored well enough on my high school Advanced Placement (AP) test — 5 on a 5 scale — to waive all of my college science requirements. I could have majored in computer science but chose economics and political science instead, with huge dollops of Japanese and architecture on the side. I took just one computer science class in college — for fun — as a senior. It was CS 201, the hard-core freshman course for future majors. The course focused on Lisp. I hated it, frankly, because Lisp is a weird language, and because all of my 201 classmates were jumping with both feet into their future majors and left me breathing their dust. As a result, I found myself — for the first time in my life — on the ass end of the grading curve. That aside, I got my first consulting gigs after college as a programmer, and have kept coding, on-and-off, ever since.

A few of my “weekend projects” have been more than that. For example, in the early 2000s I became enamored of Java 2 Enterprise Edition (J2EE) in general, and with a Java-based wiki software package (JSPWiki) in particular. I didn’t like JSPWiki’s security model and volunteered to re-write it. That was fun. Five years later, I had contributed about 100,000 lines of code, added LDAP and database authentication, re-written the authorization system, given it a new front end based on Stripes, and helped incubate it into a top-level Apache Software Foundation project. By that time, I had essentially become the co-lead on the project with my colleague Janne Jalkanen. But I found that I was no longer using the software day-to-day, and other life priorities intervened (marriage and a job change). So I retired from JSPWiki.

More recently, I have indulged interests in two areas: mobile development — iOS in particular — and Dev Ops. In the mobile realm, I have been working on-and-off on an ambitious (too ambitious?) productivity app that will address a need that everyone has.

On Dev Ops, I have lately been fooling around with some of the build-automation and hosting frameworks. Heroku is my current preoccupation. It combines server-side build automation with hosting. What that means is that developers can write code in their language of choice using a Heroku-mandated directory structure and packaging specification. Developers check in their code to Heroku’s servers using Git. When the code is checked in, Heroku packages the app based on the packaging document — called a Procfile — and deploys it in one or more web servers, depending on how much scalability the customer pays for. Heroku also offers a pre-configured private SQL database (Postgres), which developers can use for application data storage. As a bonus, Heroku offers a downloadable set of command-line utilities called Toolbelt that allow the server-side environment to be simulated and tested on the client. Best of all, Heroku offers compute time on one server — which they call a dyno — more-or-less for free, assuming the cycles don’t exceed a relatively generous threshold.
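As a rough illustration of what such an app can look like, here is a minimal sketch in Python using only the standard library. The file name `app.py`, the function names and the fallback port are my own assumptions for illustration. The two Heroku-specific facts it relies on are that a Procfile line takes the form `web: <command>`, and that Heroku tells the web process which port to listen on through the `PORT` environment variable.

```python
# Minimal sketch of a web process for a Heroku-style platform.
# A matching Procfile would contain one line such as:  web: python app.py
# (the file name app.py is an assumption made for this example).
import os
from wsgiref.simple_server import make_server


def app(environ, start_response):
    """A tiny WSGI application that answers every request with plain text."""
    body = b"Hello from a dyno\n"
    headers = [("Content-Type", "text/plain"),
               ("Content-Length", str(len(body)))]
    start_response("200 OK", headers)
    return [body]


def listen_port(env=os.environ, default=5000):
    """Read the port from the PORT variable, falling back for local runs."""
    return int(env.get("PORT", default))


def main():
    """Serve forever on all interfaces at the platform-assigned port."""
    make_server("0.0.0.0", listen_port(), app).serve_forever()

# When deploying, call main() at the bottom of this file. It is left uncalled
# here so the sketch can be imported and tried out without blocking.
```

Nothing in the code knows about servers, load balancers or deployment scripts; the platform supplies the port and everything around it.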

From the developer’s perspective, Heroku is pretty great. Code is effortlessly deployed when it’s checked in. One simply pushes the latest code to Heroku using the usual method — git push heroku master. Heroku’s server-side hook detects the check-in, builds the deployment bundle (what Heroku calls a slug) and deploys it on the dyno(s) in a few seconds. All of the most popular server-side development stacks are supported: JEE, Ruby + Rails, Ruby + Sinatra, Java + Play, Scala, Node.js, Clojure, Python + Django and many more. There’s a Postgres database in the cloud that is pre-provisioned for each application and is just “there” waiting to be used. A vibrant ecosystem allows third parties to offer NoSQL, monitoring and other services. Server scaling is merely a question of pulling out a credit card and buying some more incremental compute cycles. The documentation is simple and clear. And Heroku’s command-line Toolbelt tools make everything very, very easy and quick.

What it all means is that developers can create, deploy, test and use low-volume web applications without spending a dime. Other than typing a few initial Toolbelt commands, everything else is done using their everyday workhorse, Git. The infrastructure is completely abstracted so that pushing an app out to the Internet is as simple as typing the words git push.

All of which ought to be terrifying for security managers.

From the big picture perspective, Heroku represents a complete rethink, and outsourcing of, the entire application development stack. That it can all be done for free — at least, for the first hit — means that Heroku and providers that offer similar stacks (CloudBees, Joyent, Engine Yard) create a natural alternative to traditional IT for prototyping, experimentation, and possibly, deployment.

We can take this one step further. What Heroku and services like it mean is that in the future, IT will remain relevant only if it can continue to engender respect with developers. If IT insists on being a roadblock — for example, if it can’t or won’t buy prototyping servers fast enough, imposes uninformed mandates about “company standard” frameworks, continues to require CVS (shudder) or SVN (wince) rather than Git, or breaks out in hives at the mention of this newfangled Hadoop thingy — it will create economic incentives for developers to look elsewhere. By “IT” I mean it as an aggregate entity — the architect rule-setters, security-gate-keepers and purse-string-holders that collectively and emergent-ly determine how applications are made and where they run.

In my next post, I’ll offer some perspective on some of my experiments with the Play Framework, a radical re-think of Java that offers a compelling alternative to traditional JEE applications. My occasional correspondent and Twitter friend Rob Williams turned me on to Play. It can be deployed quickly and simply on Heroku. I’ll have some observations shortly.

First Look at Stephen Few’s “Information Dashboard Design, Second Edition”

2013-08-13T23:01:00-04:00

Thirty years ago, a polymath prophet named Edward Tufte self-published an incendiary book, The Visual Display of Quantitative Information. It forever changed how a certain species of white-collar professional viewed the world. As a DNA-tested, confirmed member of the species homo visualis, I can tell you that his book, and successors such as Envisioning Information, taught me how to create strong, effective statistical graphics. Tufte introduced the concepts of chart junk, the data-to-ink ratio, small multiples and sparklines. He argued forcefully and persuasively that designers of statistical graphics need not condescend to their audiences. And perhaps most important, he inspired a generation of authors, professionals and scientists — call them “Tuftees” — to strive for simplicity, clarity and honesty in their representations of data.

Indeed, in my book Security Metrics: Replacing Fear, Uncertainty, and Doubt, I wrote an entire 40-page chapter on how to graphically present security data. That chapter owes everything to Tufte. I mention my own book not out of a desire to gratuitously promote it (not that there’s anything wrong with that), but because in the 2nd edition of Stephen Few’s Information Dashboard Design: Displaying Data for At-A-Glance Monitoring I can sense exactly why and how Mr Few was driven to write his own book about visualization.

In my case, I felt compelled to summarize quickly everything I had learned about effective graphical techniques, because I wanted to help security professionals create exhibits that weren’t awful. After I put fingers to keyboard to write the chapter, though, I found it hard to stop writing. No treatment of security metrics would be complete without an honest discussion of visualization techniques, and that took space and length to do well. Cranky about the state of graphical practice in my own industry, and lacking decent models to point others at, I decided to build some of my own, often imperfect, models. (Really cranky, too: after re-reading chapter 6, it’s a wonder Addison-Wesley let me publish the book at all!) In short, pissyness led to something productive.

You can smell the same faint alternating whiffs of frustration and hope in Mr Few’s book, too. He’s my kind of cranky. He’s a Tuftee. The first half of the book, about 110 sparse pages, focuses on what not to do when designing dashboards. Dozens of examples of bad dashboards fill the first hundred pages. I can only imagine the nightmare of getting screenshot copyright clearances from the vendors whose products he made examples of.

But despite the sport he has with the screenshots, Information Dashboard Design also grounds practitioners in the basics. Few defines a dashboard as:

…a visual display of the most important information needed to achieve one or more objectives, consolidated and arranged on a single screen so that the information can be monitored at a glance.

That is nicely said. Building on this fundamental definition, the first half of the book covers these additional topics:

Clarifying the Vision: What is a dashboard? Why do we use them?
Thirteen common mistakes in dashboard design: exactly what you’d imagine; this is a regular rogues gallery
Assessing what’s needed: what people need when they see a dashboard
Fundamental considerations: how frequency of use, screen sizes, and data types influence dashboard design
Tapping into the power of visual perception: how we can use what we know about cognition to improve perception of dashboards
Achieving eloquence through simplicity: A Tufte-inspired discussion of maximizing the data/ink ratio, and of getting rid of filler and unnecessary ornamentation
Advantages of graphs: why pictures are worth a thousand words

The remainder of the book covers putting theory into practice. I have not read these chapters yet, but am looking forward to them.

If you are a Tuftee, you won’t find much in the first half of Mr Few’s book that breaks new ground. At least, not as of 2013. But then again, in 2006, this book was a big deal. It was well-received, sold well enough to merit a second edition, and has been widely cited since.

I admire Mr Few very much for writing this book. I don’t get the impression that he was a graphic designer by training. Nor does he appear to have an economics or statistics degree — indeed, I can’t find a résumé or LinkedIn profile anywhere. And he’s not a programmer. Not that any of that matters. Few is clearly a fanatic; he won’t change his mind, and won’t change the subject.

The principles in this book don’t apply just to dashboards, however. Every business professional who creates any kind of chart or exhibit can benefit from this book. I can say that with a high level of confidence — and I haven’t even gotten to the really good bits yet.

Stay tuned for my review of the second half of the book.

TIA Panel: M2M and Cybersecurity: What does success as an industry look like?

2013-06-04T00:00:00-04:00

This is the nominal text of panel remarks I delivered at the Telecommunications Industry Association’s M2M & Cybersecurity Workshop on June 4th, 2013. The objective of the panel was to discuss the following topic:

Define a cohesive vision for a secure, reliable and economically viable machine network. What are the key objectives and what level of risk can be tolerated?

Good afternoon. I am Andrew Jaquith, the CTO of SilverSky, a leading cloud security provider. It’s great to talk to you today. You may not know SilverSky, so first, a little about us and our qualifications:

SilverSky protects our customers’ most important information. We manage customers’ email and collaboration, secure their data with our security software, and monitor their infrastructure for compromises, all from our cloud.
We have 6000 customers, mostly in the private sector, including 1800 in the most risk averse and security sensitive industry there is: financial services
We filter 50 million emails a day, and analyze 425 million security events
We protect half a trillion in banking and financial assets
We have a growing presence in telecommunications and communications service providers. We partner with Cbeyond, Telepacific, NTT, Windstream and — thanks to an acquisition we are announcing tomorrow — with XO and Peak10

While we aren’t a device maker or carrier ourselves, we see a large volume of network traffic and security events every day. We see a lot of activity and have a perspective on what’s going on in the private sector.

Let’s talk about machine-to-machine (M2M). M2M means any digital, network-connected device that is part of a larger system. The “M” in M2M means something with an IP address. Everything from ATM machines to smartphones to copiers to energy grid sensors to that networked refrigerator we’ve all been predicting — at least, ever since MIT networked its soda machine in the early 1990s. I remember as a young pup around 1993 when Novell predicted that one day, there would be 1 billion network-connected devices. That prediction seemed audacious then; it is merely quaint now.

The “2M” part of M2M means that connected thing is a node in a larger network, and that the communications are only partially directed by a human. A consumer, for example, might own a mobile phone. They will surf the web, buy their kids gifts on Amazon, and play Words with Friends with, well, their friends. There’s nothing about these activities that is different or more interesting than what we have seen on the PC. However, all of the supporting services underpinning the mobile experience — cellular data communications, background telemetry, push notifications, carrier updates — that is all M2M traffic. So are the networked soda machine replenishment signals, SCADA traffic, the cellular tower updates, etc. These are not initiated by humans; these are all machines talking to machines.

The reason we are all here: talking about security in M2M. We are here, I think, because so much of what we experience and take for granted every day relies on networking; that is, the “2M” part. Increasingly, all of that networking is under the covers, and not directly perceived or controlled by the consumer or end customer. It is of paramount importance, therefore, that we can trust the networks, devices, clouds and data that underpin the M2M economy. We need to trust the things that filter the water we drink, transmit the power we consume, and connect us to other people.

What does success at securing M2M look like? With something as diverse as M2M, one cannot easily articulate a “vision” for security. There is no single “system” one can articulate a vision for. It’s a “system” in the same way that health care is a system: fragmented, partly analog, few standards, and filled with many parties with competing interests.

But the need is clear. Risks abound across the system. A popular grey-hat security research project, for example, has released automated exploits for SCADA systems from Rockwell, GE, Schneider, Siemens and many others — making them relatively easy for attackers to weaponize and use on a large scale. Scary. And these are people who are supposed to be our friends. Then there are those who are not our friends: nation-states such as China, Russia and Iran, which have funded large offensive cyber-warfare teams. It is certain that M2M systems are on the target list. Rounding out the list of threat actors are the usual criminal gangs, unsavory hackers, miscreants, attention-seekers, pirates and — arguably the worst of the bunch — Mr Murphy (as in Murphy’s Law).

So, defining a vision for M2M is arguably a fool’s errand. That said, if I could suggest one big hairy audacious goal for M2M security, it would be this:

The absence of surprise

“Surprise” in the context of M2M means disrupted business, theft of service, successful attacks on critical infrastructure, civil unrest, loss of life or livelihood, theft of secrets or corrupted data. Drilling down a bit more, “absence of surprise” implies four other goals. It implies:

Designing for failure: having compensating processes for dealing with compromises
Designing for resilience: making it possible to diagnose, upgrade in the field, and have robust functions in less-than-optimal environments
Eternal vigilance: having a strategy for continuous monitoring; for incident handling, and for response activities (often neglected)
Risk management: eyes-open knowledge of what adverse events are acceptable, and how frequently they can be tolerated

Let me illustrate by example. Ten years ago I helped design a security subsystem for some hardware devices due to be deployed by one of the most zealous and security conscious organizations around. This organization would do just about anything to ensure that their mission was achieved, that their devices were not compromised, and that they were as protected as possible from the threat posed by attackers. No, I’m not talking about the military, the CIA, or the NSA. I’m talking about cable TV.

The job was to design Comcast’s next-generation conditional access system (called DCAS aka True Thru-Way). What was the goal? To design a bulletproof CAS that would securely deliver any programming of the customer’s choice, so they could get anything they wanted and paid for. But — and this is important — not what they didn’t pay for. Also: nobody else could get the programming without paying either. The system we designed had three key features designed to advance this goal:

Device integrity: keep the device in a known state. This implied that we needed not just a way of keeping a set-top box (STB) from being tampered with, but a way of knowing when it was being tampered with.
Content protection: require encryption between the cable network head end and the STBs. A strategy for hardening the box. Creating a cryptographic “key ladder” with long-lived session keys and ephemeral ones, so that compromising a more frequently used key meant a finite window of time for the compromise. We also needed “secure elements” on the box that would be “personalized” for each unit.
Device updates: develop a way of revving the local STB firmware and updates. That implied having a “root of trust” derived from keys that were managed centrally. We know from watching the experiences of DirecTV (and today, Apple) with “hackers” that adversarial warfare with determined opponents makes defenders stronger.
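The "key ladder" idea in the list above can be sketched in a few lines of Python. This is a toy illustration, not the DCAS design: the use of HMAC-SHA256 as the derivation function, the label strings, the device IDs and the notion of a numbered "period" are all my own assumptions. What it shows is the shape of the ladder: a centrally held root key, a per-box key derived from it, and short-lived keys derived from the per-box key.

```python
# Toy key ladder. HMAC-SHA256 as the derivation function is an assumption
# made for illustration; it is not a description of any real CAS.
import hashlib
import hmac


def derive(parent_key: bytes, label: str) -> bytes:
    """Derive a child key from a parent key and a text label."""
    return hmac.new(parent_key, label.encode("utf-8"), hashlib.sha256).digest()


def device_key(root_key: bytes, device_id: str) -> bytes:
    """Long-lived key, personalized for one box."""
    return derive(root_key, "device:" + device_id)


def period_key(dev_key: bytes, period: int) -> bytes:
    """Short-lived key, valid for a single numbered time period."""
    return derive(dev_key, "period:%d" % period)


# Placeholder root key; a real one would be random and never leave secure hardware.
root = bytes(32)

box_a = device_key(root, "STB-0001")  # hypothetical device IDs
box_b = device_key(root, "STB-0002")
k_today = period_key(box_a, 100)
k_tomorrow = period_key(box_a, 101)
```

Because each rung is derived one-way from the rung above it, recovering `k_today` exposes one period on one box. It yields neither `k_tomorrow` nor another box's key, assuming the derivation function holds up.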

What this meant in our case: lots of crypto. Serious review and iteration. Willingness to learn through evolution. Knowing that you have to walk a fine line between security robustness, flexibility, usability and ability to manage at scale in the field. Perhaps most important: all of the design decisions were informed by an acceptance by Comcast of exactly how hard it ought to be for a pirate to pop a box and get free TV. How hard should it really be, and what would the company tolerate? Also, Comcast defined which “tail risks” they wanted to avoid. That is, what does catastrophic failure look like? In this case, just for example, Comcast wanted to make sure that other than stealing the topmost root key — which was made very, very difficult — no mass compromise was possible; an attacker would have to go box-by-box.

This should give you an idea of what is required to build devices with high levels of security, where that security supports the business goal. For a more modern example, look at Apple’s iPhone. That is a great example of fairly robust security and usability. Fifteen years ago, if I told you that you would see the rise of a consumer computing platform with over 500 million units deployed, where the entire platform includes trusted boot, mandatory access control, full device encryption, mandatory application screening, mandatory application signing from a central authority, a vibrant developer scene, and very little (essentially zero) malware, and one that doesn’t drive customers batty — indeed is one heck of a pleasure to use — you’d say I was nuts. But yet that’s what we have. I don’t advocate Apple’s model per se. But it illustrates one way to try to accomplish many goals, and do them all well enough that the net risk to consumers is very low.

On the other side, look at what happened with Stuxnet. The attack was essentially via USB stick plus a stealthy worm that attacked Siemens SCADA systems used to control and monitor centrifuges for enriching uranium. This system runs a variant of Windows. Very few of the ideal security characteristics one would like to see in a robust, secure embedded operating system were in place in this case. (Arguably in the case of Stuxnet this was a feature rather than a bug.)

My wish for the industries that are involved in M2M, looping back to my original comment, is that we design collectively and individually for the absence of surprise. Any surprises you get should be those you expect… And then, of course, they aren’t surprises. They fall into the category of what Donald Rumsfeld memorably called “known unknowns.” Our eyes are wide open, based on enlightened economic self interest. In addition, I would hope that we have enough eyes wide open that many of the “unknown unknowns” are imagined as well.

That won’t be good enough in all cases, though. In closing, we will need to consider incentives to swing the calculus to align economic self interest with good security outcomes. Speaking as a trained economist who works in the security field (and who programs to relax), almost all security failures are rooted in perverse economic incentives. Our goal ought to be to align incentives so we get better outcomes. In my view, everything should be on the table: software security liability for manufacturers, legal shielding for sharing of security data and incidents, promotion of industry standards and inclusion of these standards in purchasing guidelines, and, in cases where the risks demand it, regulation or legislation.

If we do all of these things, we will have successfully used our collective imaginations to identify, reduce, or willingly accept the M2M risks we face, both today and in the future.

Thanks very much for listening.

“Everything was green. Mulally thought that was odd for a company losing billions.”

2013-02-21T00:00:00-05:00

I have been a fan of the Ford Motor Company ever since I was a boy. There’s no rational reason for it, but then again, experts tell us that brand preferences are formed at very early ages. Somewhere around the age of 10 or so I decided I liked Ford cars. My first car after college was a 1993 Ford Taurus, which I later gave to my sister when I moved overseas. My second car was a 1998 Ford Contour. I changed to a nice little Honda Civic five years ago; at the time, the domestics weren’t looking so great. But I still have a certain patriotic wistfulness about Fords, and probably always will have. For this reason, I’ve been watching Ford’s recovery with interest.

Most people know that Ford didn’t take a dime in government money during the Great Recession. It was the only one of the Big Three automakers that did not. Much of the credit for this belongs to Alan Mulally, the CEO of Ford. He made the gutsy and prescient decision to take out a $23 billion loan two years before the recession hit. He used absolutely everything as collateral to get it, including the iconic blue oval logo. Since that time, Ford sold off its troubled and de-focusing Jaguar, Volvo and Aston-Martin luxury brands, built a terrific new line of fuel-efficient cars, whittled the number of car “platforms” it used globally down to just a few, and steadily increased its car quality.

Ford’s near-death experience — and subsequent rejuvenation — have been the subject of many case studies. A good short one, and the impetus for this post, is “An Insider’s View of the Ford Story” from the Ross School of Business at the University of Michigan. In it, Ford COO Mark Fields tells a wonderful anecdote that most of us can relate to:

At a weekly business status meeting early in Mulally’s tenure, charts from top executives didn’t indicate the company was in any trouble. Ford uses a color code for topics — green for good, yellow for a potential issue, red for a problem — and everything was green. Mulally thought that odd for a company losing billions.

Meanwhile Fields, then president of the Americas, had an issue with a product launch that year. The new Edge had a liftgate problem that threatened to delay its critical debut.

“I said, ‘Code it red,’ and they said, ‘Are you sure you want to do that?’,” Fields said. “I said, ‘This is what Alan wants. Let’s go for it.’”

Finally it was Fields’ turn — Edge launch: bright red. “I could feel the chairs move away from the table,” said Fields. “I said we have a problem, and I’d love to have help from manufacturing and quality to help resolve it. Alan turns to me and starts clapping. The next week, everybody’s chart was like a rainbow.”

By all accounts, Alan Mulally is a no-BS guy who does not fear hearing bad news. Indeed, everything I’ve read suggests that he encourages his staff to bring problems to the surface so that they can be discussed dispassionately and dealt with. Crucially, he encourages his team to do this without finger-pointing. At Ford, this has helped break through the factionalism that had traditionally plagued the company. As Fields puts it, “Working together has been so crucial for us to get through a very difficult time and work through our issues on our own.”

As described in an older CNN Money story, establishing trust and a culture of openness was a big change. But there’s no doubt that the ultimate referee is Mulally:

There are no pre-meetings or briefing books. “They don’t bring their big books anymore because I’m not going to grind them with as many questions as I can to humiliate them,” Mulally says. “We’ll see them next week. We don’t take action – I’m going to see you next week.” No BlackBerrys are allowed, and no side conversations either – Mulally is insistent about that. “If somebody starts to talk or they don’t respect each other, the meeting just stops. They know I’ve removed vice presidents because they couldn’t stop talking because they thought they were so damn important.”

Ford’s success isn’t solely due to the leadership qualities of the CEO, of course. Building better quality cars, after all, is the point of the whole exercise. That, the company has done well. But I love this story because it shows how setting the “tone at the top” matters, and that having a positive culture of problem-solving can (literally) pay dividends all around.

Bully for BlackBerry. But Is It Too Late?

2013-02-15T00:00:00-05:00

Last week Research In Motion announced three things:

It had renamed itself to BlackBerry
It would soon ship two new BlackBerry 10-compatible devices, the Q10 (with keyboard) and Z10 (touchscreen only)
It had shipped the new BlackBerry Enterprise Service, version 10

These three announcements, taken together, signaled the end of a long period of frustration for customers, employees and shareholders. After a wait of nearly three years, BlackBerry, indeed, delivered the goods. Reviewers of the Q10 and Z10 have generally been impressed; these are solid handsets. Ditto for the BlackBerry 10 operating system. BlackBerry Enterprise Service 10 includes updated software for managing BlackBerry 10 devices and PlayBooks (the BlackBerry Device Service); a new bundled, reskinned, version of the Ubitexx MDM software it acquired in 2011 for managing iOS and Android devices (now called the Universal Device Service); and, an updated version of the server software for routing data to older BlackBerry devices using the “classic” BlackBerry network infrastructure (BlackBerry Enterprise Server). BES 10 also includes an updated version of BlackBerry Enterprise Server Express (aka “the cheap one”) for customers who don’t need all of the power and complexity that the non-Express version offers.

At least one reviewer intoned that with its new offerings, enterprises now had a true BlackBerry BYOD (bring-your-own-device) solution. It seems that the wait may have been worth it.

Left unanswered, though, is the existential question of whether it matters.

Recitation of the facts makes for lamentable reading. The company’s share of new handset sales in North America was just under 3% for the most recent quarter, down from 35% just five years ago. Apple came out of nowhere to become the US’ biggest handset maker, something BlackBerry had no answer for. The Android juggernaut, led by Samsung, seems poised to take most of the rest of the market, leaving BlackBerry and Microsoft-aligned handset makers such as Nokia to fight for the scraps. As reported by Asymco’s inimitable Horace Dediu, Apple and Samsung together accounted for 103% of the smartphone industry’s profits — a number greater than 100% because competitors (RIM and others) lost money. The question isn’t so much whether BlackBerry will suddenly exit the market — partial annuity businesses are nearly impossible to kill. The question is, can it continue to make products that it can sell at a profit?
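For anyone puzzled by a share of profits above 100%, the arithmetic is easy to reproduce. The figures below are invented round numbers chosen only to make the sum come out cleanly; they are not Asymco's data.

```python
# Invented round numbers for illustration only; these are not Asymco's figures.
profits = {"Apple": 70.0, "Samsung": 33.0, "Everyone else": -3.0}

# Losses elsewhere shrink the industry total below the leaders' combined profit.
industry_total = sum(profits.values())
top_two = profits["Apple"] + profits["Samsung"]
top_two_share = 100 * top_two / industry_total

print("Industry total:", industry_total)
print("Apple + Samsung share of profits: %.0f%%" % top_two_share)
```

The leaders' combined profit is divided by a total that the losers have dragged down, so the ratio lands above 100%.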

The path to profitability starts with great products, which in ideal circumstances lead to a virtuous cycle of desire, demand, scale, continued cost decreases and increased pricing power. That assumes that the company executes well. It hasn’t in the past. The PlayBook, for example, contained the germ of a good idea but didn’t take root with buyers for lots of reasons. Among them was an asinine and insulting advertising campaign that failed to ignite interest (alternately: “We’re the only tablet that delivers the whole Internet because we have Adobe Flash!” and “Amateur Hour is Over”). The PlayBook also initially omitted key features, such as the ability to do on its own the thing that BlackBerry customers prized most: get email.

BlackBerry also stubbornly clung to the idea that it needed to be a network provider, years after TCP/IP over cellular had become commonplace. What was once a compelling competitive advantage has turned into disadvantage: an extra cost for customers to bear, and a single point of failure. For context, see something I wrote on the Forrester analyst blog three years ago:

The BlackBerry was introduced in 1999 as a two-way pager on steroids. Back then, TCP/IP over GSM (and other wireless networks) was just a pipe dream. RIM implemented a system by which all traffic is collected from the mobile networks of the sender, funneled through RIM servers and then routed back onto the recipient’s mobile networks and pushed to the handset. In essence, RIM — rather than the Interwebs — provided the routing capabilities needed to ensure that mail and messages are delivered. That was necessary, and worked well, when Internet data plans were not universally available. It gave BlackBerry instant push e-mail and guaranteed delivery. And critically, it was a competitive advantage that no other wireless vendor had.

And then, last year, in the wake of the pervasive RIM network outages that swept the globe, I noted the following on the SilverSky (née Perimeter E-Security) blog:

Data plans that provide TCP/IP over wireless carrier networks are now ubiquitous, nullifying a key RIM advantage. Moreover, push email protocols such as ActiveSync are licensed by the two Post-PC device leaders, Apple and Google. ActiveSync isn’t as good as BlackBerry push email, but it is good enough for most businesses. But in spite of the ubiquity of TCP/IP-over-wireless, RIM continues to do its own thing. Essentially, when you choose BlackBerry, you are making a bet that RIM’s reliability will be better than that of your wireless carrier’s data service. That might have been a safe assumption five years ago, but [with the recent outages] it isn’t any longer.

Fast-forward to last week’s announcements. BlackBerry has directly addressed the network issue with the new handsets in two ways. First, to BlackBerry’s great credit, they are allowing the new Q10 and Z10 devices to get email, calendars and contacts over ActiveSync, using regular carrier data networks. Even better, the BES management server appears to be able to communicate with BB 10-compatible handsets over the carrier data networks, too. In short: BlackBerry has signaled its intent to make the classic RIM data service — yesterday’s network — optional. The company has not played this up, for obvious reasons. But it’s a big win for customers. I’ve already gotten inquiries from several SilverSky customers who want to scrap their expensive data plans and use the new devices in ActiveSync-only configurations.

This sounds like good news, but is it too good to be true? Some early reviews suggest that BlackBerry’s ActiveSync feature support isn’t complete, with one reviewer finding himself unable to invite people to meetings, for example. On the CrackBerry boards, many confused addicts can’t figure out whether a separate data plan really is needed for devices that are managed by BES. BlackBerry’s website and documents are maddeningly vague.

Worse, BlackBerry’s three administration components (BlackBerry Device Service, Universal Device Service, and BlackBerry Enterprise Server) are not well integrated. Well, they are “integrated” in the sense that they are all part of the “BlackBerry Enterprise Service” 10 family, and some user interface elements are shared. But administrators must read three different manuals. There are also components called BlackBerry Management Studio, BlackBerry Mobile Fusion Studio and BlackBerry Enterprise Server Express, which have their own interfaces as well. Confused? Old-hand BES-heads probably aren’t. But suppose you are an enterprise CIO who is considering, or has already purchased, a product from an MDM arriviste such as MobileIron, AirWatch or FiberLink. Would you “leap backwards” and consolidate everything with BlackBerry’s new services? You might, or you might decide that the company’s management tools still aren’t ready. Next to the seductive “one-console, one-policy, all-device” visions that the arrivistes are painting, BlackBerry’s offering looks complex and parochial. Why suffer through Jackson Pollock when you can run away with a Renoir?

BlackBerry, then, has five challenges as I see it. It must:

Stanch the bleeding in its traditional customer base. The new Q10 and Z10 handsets and BB 10 operating system should ensure that wavering customers give the new products a long look. That’s step one. Keeping up a rapid cadence for operating system upgrades, third-party app availability and new handset models: that is step two. Step three will be to convert existing customers as quickly as possible to the new operating system and server software, locking them in for another few product cycles.
Eliminate any perceived or actual dependencies on the classic RIM network. BlackBerry needs to show customers that the traditional RIM routing and transport service isn’t needed any more, and that it has fully embraced ActiveSync as an alternative mail protocol. It must do this to eliminate the notion that its products are expensive to operate in addition to stodgy (although the “stodgy” perception will be lessened by the new handsets). This will necessarily reduce network and BES revenues but will increase sales of handsets. It is, in effect, a form of cannibalization. BlackBerry executives need to adopt the same attitude the late Steve Jobs did when explaining to shareholders why Apple didn’t fear the prospect of iPads cannibalizing Mac revenues: “if you don’t cannibalize yourself, someone else will.”
Offer a compelling, unified alternative to MDM products. MDM vendors offer the promise of being able to define a single IT policy once, and apply it across all devices regardless of make or model. The reality is different, though; the industry’s dirty secret is that most of the MDM products are focused almost exclusively on ActiveSync platforms, and iOS in particular. BlackBerry management features are usually paper-thin. That is partly because BlackBerry doesn’t offer good APIs, and partly because customers need help most with ActiveSync; the BlackBerrys they have are company-owned and well-managed. With focus, BlackBerry could offer a first-class management experience across all device types. But (and this is important) ActiveSync support can’t be half-assed.
Offer “something more” to CIOs. The preceding three recommendations will at best stabilize BlackBerry’s declining share of the enterprise mobility market. To grow it robustly, the company needs to offer something other handset makers and MDM vendors can’t. Some suggestions: (1) extend the “Balance” data labeling technology to iOS and Android; (2) introduce a bulletproof proxy platform that does for ActiveSync what BES did for RIM devices and what Blue Coat did for web security (think: attachment management and stripping; bandwidth management; content inspection; time and location controls; APIs third parties can use, etc); (3) unveil a cross-device, encrypted mobile cloud backup and sharing network (think: a more secure iCloud + DropBox on steroids). These might not be the right sorts of “something more”; but regardless, the focus should be on differentiating versus handset makers and MDM vendors.
Attract consumers again. Although BlackBerry likes to talk about its success in developing markets (Exhibit A: Nigeria), substantial revenue growth won’t come unless it can succeed in developed markets. Consumers, who substantially outnumber business customers, are the key. BlackBerry could do many things to increase its consumer share, ranging from incremental to radical. The most radical step would be to ride the coattails of a consumer-friendly OS by shifting platforms yet again, to Android or, better, to Windows Phone. Less radical steps include building products that target demanding “prosumer” segments such as photography, design or programming; bribing popular app makers to develop for BlackBerry first; or… well, I’m at a bit of a loss here. BlackBerry will never be cool. But being seen as “reliable,” “fast” and “trustworthy” might be enough.

BlackBerry must do most of these things (say, four out of five) well in order to grow again. If the company does not, it will continue to shrink, slowly ceding shelf space to Apple and Samsung. It must make BlackBerry executives crazy to think that enterprises are willing to grapple with consumer-grade devices instead of soldiering on with their trusty, secure BlackBerrys. It must make them crazier still to know that they must put to pasture one of their finest inventions, the RIM network. And yet, that is the world we live in. One where “good-enough” security and management has beaten great, where TCP/IP-over-cellular has supplanted proprietary networks, and where the black-and-white of company-owned has been replaced by the gray of BYOD.

It is late in the day for BlackBerry. It’s not too late, however, for the company to pull more rabbits out of its toque. I hope it will.

Four Things To Like About Obama's Executive Order on Cyber-Security... and Four to Dislike

2013-02-14T00:00:00-05:00

During his State of the Union Address on Tuesday night, President Obama announced an Executive Order on Cyber-Security. The full text is available in many places, including Wired. I’d urge you to read it in full; it is short and well-written, as you might expect anything coming from this president (or his staff) to be.

The Order directs DHS to notify private companies in “critical infrastructure” sectors of any impending attacks by extending the Enhanced CyberSecurity Services program. To promote greater information-sharing, the Order provides a “safe harbor” to companies that share information with DHS. It directs the National Institute of Standards and Technology (NIST) to create a new “Cyber-Security Framework” to reduce risk in critical industries. And to evaluate the success of the program, the Order includes a series of regularly recurring opportunities to review and recommend new actions to take.

Understand that the President signed the Order because of the lack of a Congressional alternative. Last year’s two dueling cyber-security bills died in session due to partisan wrangling. Republican senator John McCain objected to the initial bipartisan proposal, the CyberSecurity Act, because of the idea that government has a role to play in setting standards, which it clearly does. McCain’s alternative bill, the SECURE IT Act, preserved the CyberSecurity Act’s focus on information-sharing but watered down any additional regulatory oversight. The Order more closely resembles the McCain bill, if only by the necessity that the Order cannot ask agencies to do anything beyond what existing laws allow.

I reviewed the Executive Order and found a lot to like in it. But it’s lacking in important ways, too. Here’s what I liked:

The scope of the proposed Cyber-Security Framework is comprehensive. The Framework will ostensibly “help owners and operators of critical infrastructure to identify, assess and manage cyber risk.” It will identify areas of improvement that can be addressed by the private sector, identify methods for reducing risk, and will recommend ways that companies can measure their success at implementing their programs. This is good. Critical infrastructure companies, particularly those in comparative security backwaters like utilities, need all of the help they can get.
Materials shared by private sector are shielded from discovery. Section 5c of the order states that “Information submitted voluntarily in accordance with 6 U.S.C. 133 by private entities under this order shall be protected from disclosure to the fullest extent permitted by law.” What that means is that any information shared with the government can’t be obtained under a FOIA request, for example. The information could still be discovered in a private suit.
NIST’s Cyber-Security Framework will incorporate industry standards. I have a lot of respect for the work NIST does. I know and have worked with many people in the agency. NIST also regularly collaborates with outside organizations such as the Center for Internet Security (CIS) and SANS. These groups are doing good work as clearing-houses for effective practices. It’s good to see the President explicitly ask NIST to “incorporate voluntary consensus standards and industry best practices to the fullest extent possible.”
The Order offers wiggle-room to define what industries are “critical.” The specific sectors covered by the Order are not mentioned in the text, but the scope defines critical infrastructure as “systems and assets, whether physical or virtual, so vital to the United States that the incapacity or destruction of such systems and assets would have a debilitating impact on security, national economic security, [or] national public health or safety.” One can easily imagine that the critical sectors are likely to include energy, utilities, and financial services. But transportation, pharma, and the defense-industrial base would qualify, too, depending on how the President and his advisors see things. We’ll see how this evolves, but having flexibility here is important.

The President’s Cyber-Security Order matters because it puts a stake in the ground in the absence of legislation. It recommends many fine things that we need more of, notably information sharing. But the Order also disappoints because it misses opportunities to do more. Some shortcomings are due to natural limits imposed on the Executive branch. The President cannot propose new regulation, for example. Others are failures of vision. Here are the four problem areas I see:

Participation by private companies is voluntary. The Order directs DHS to initiate an information-sharing program with industry to give them advance warning of attacks, and to obtain relevant information from target companies. The Order also asks DHS to create “incentives designed to promote participation” in the program and to analyze whether those incentives have been effective. That sounds like a tacit admission that they won’t be effective. To be fair to the President, he has no power under existing laws to compel participation; by definition, he must rely on incentives, persuasion, and motherhood-and-apple-pie instincts. Come to think of it, maybe he should send private sector CEOs… apple pies. Until legislation is passed that mandates participation, apple pies might be the best he can do.
Private companies that might have security insights aren’t included. Many large security companies have a significant amount of operational visibility into the day-to-day risks and attacks in critical infrastructure sectors. These include managed security services providers such as Symantec, IBM/ISS, Verizon, Dell SecureWorks and my company. They also include software and security companies that underpin large parts of the “trust infrastructure” that we all rely on, companies like RSA Security, Symantec (née VeriSign), Microsoft and Apple. Although one could file this in the “be careful what you wish for” category, it would seem odd that companies that control the keys to the many critical infrastructure kingdoms, or have visibility about what goes in or out of them, would not be in scope.
Wrong-headed emphasis on technology neutrality. The Order takes pains to emphasize that any guidance issued by NIST should be technology-neutral so that companies can “benefit from a competitive market for products and services.” Never mind that this sentence makes no sense. The whole sentiment seems wrong to me, because cyber-security is one area where government should make specific recommendations about technologies. It is a fact that some technologies are better and safer choices than others. Divorcing the guidance from the technology turns NIST’s efforts into a big “process” exercise. Process is good, but fixing things is better. All the guidance in the world isn’t going to stop your wide-open Windows NT 3.5 SCADA systems from being owned if they haven’t been patched since 1995. I don’t want NIST to “name and shame” or “pick winners and losers,” but it should be prescriptive where possible. That’s not a technology-neutral activity.
The framework will take too long to develop. NIST won’t finish its draft Cyber-Security Framework until mid-October. The final framework won’t come until February 2014. NIST offers plenty of vendor-neutral, technology-neutral guidance already, covering everything from risk assessment to metrics. It seems to me that existing materials could be easily re-packaged for critical industries without much effort. Let’s hope the dates are sandbagged, and that we will see drafts sooner than October.

Overall, though, President Obama’s Executive Order for Improving Critical Infrastructure Cyber-Security is an important step forward. Let’s hope it prods Congress into passing something more permanent, prescriptive, and durable, with the regulatory powers DHS needs to get the job done.

Note: this article also appears on my company blog at silversky.com.

Moving securitymetrics.org to Octopress

2013-02-04T00:00:00-05:00

Soon, I will be moving the securitymetrics.org website to a simpler, secure and more usable system — the same platform that powers Markerbench. It should be done in time for Mini-Metricon (March 1st, 2013).

Some background. When I started securitymetrics.org in 2004 with Dan Geer and Kevin Soo Hoo, I had visions of making a cool, collaborative website for it. I liked the “wiki” philosophy (simple markup, text-based) and thought it would work well as a lightweight collaboration platform. I was, at the time, the co-author/co-architect of JSPWiki, an open source JEE-based wiki package. Because it contained a lot of my code, went the reasoning, I’d know just whose throat to choke when things went wrong. I could customize the server in all sorts of ways when I wanted to. And as a side benefit, I’d get to demonstrate my mad enterprise skillz by using JSPWiki’s four-tier architecture (client, web server, app server, database). That was the theory.

But a few things happened between 2004 and now:

Competing content-management systems — like Wordpress and Drupal — got better and better. With bigger dev teams came more features. The wiki software hasn’t kept pace — no social integration, for example.
Spammers overran my wiki self-registration system. As a result, I was forced to disable additional registrations, rendering securitymetrics.org not very collaborative at all.
I discovered I hated maintaining and upgrading that four-tier architecture I had been so proud of.
The wiki server’s memory leaks meant it needed monthly reboots — if I remembered. If I didn’t, it meant lots of downtime.
The threat environment got a lot nastier, leaving me leery of running ANY kind of content management server, never mind a wiki server.
I got married. No explanation needed here, I suspect.

It has been clear for a while that securitymetrics.org needed something better. Something simpler, more secure and (preferably) social. After a long period of on-and-off searching, I found something pretty close to ideal: Octopress.

As I’ve written before, Octopress is a static website generator; it is derived from Jekyll. You can think of it as a website “compiler.” You feed it text files, and it emits lovely HTML, CSS and JavaScript with all of the modern goodies built-in: discussion threads and comments via Disqus, social integration via Twitter, Google and Facebook “like” buttons, search via Google Simple Site Search, and site traffic analysis via Google Analytics. The best part is that all of the components are 100% static. There are no application servers, no databases, no users, and no user input accepted or stored. All of the dynamic behaviors result from loading up various bits of client-side JavaScript to invoke other people’s stuff. That makes the attack surface pretty close to zero. And, because everything is static, all that static stuff can be WAN-accelerated until it screams, and stored just about anywhere (Amazon S3, GitHub Pages, Heroku) without worry.

What does this mean for securitymetrics.org?

We will switch over to the new site at or around Mini-Metricon on March 1st.
The new website will be better looking and much better organized. I’ve edited and re-organized ALL of the Metricon content, for example, while I was at it.
I will be the sole “publisher” of the website, at least for the time being, but welcome all collaborators.
We will have a new workflow for people who want to write on the securitymetrics blog. It will involve Markdown and DropBox.
After Mini-Metricon I will probably move the mailing list to a new, faster provider (first things first: website, then listserv).

For those of you who will be attending Mini-Metricon, I am looking forward to showing you the new site.

All Andy's Posts Now on Markerbench

2013-01-29T00:00:00-05:00

As part of a continuing experiment with static blogging, I have moved all of my historical blog posts from securitymetrics.org to Markerbench.com. Everything is now here, including the somewhat notorious essay Escaping the Hamster Wheel of Pain, which introduced a certain rodent-related metaphor to the security trade and served as the introduction to my book, “Security Metrics: Replacing Fear, Uncertainty and Doubt”.

For the curious, here’s some background on why I moved everything here:

The securitymetrics.org site has for many years been running on JSPWiki, a Java Enterprise Edition (JEE) application that uses “wiki text” as a markup language. It has served me well, and I am proud to have been one of the platform’s primary authors. However, my (older) deployed version of JSPWiki has suffered from a slow memory leak that has required me to restart the web app container about once per month. I have also had to disable site registration and commenting features, due to the lack of a reasonably-bulletproof spam filtering system. That meant that securitymetrics.org had become essentially a static website. So, why go to the trouble and expense of hosting it on a complex web app server? Now that I’ve gotten the hang of Octopress, a Jekyll-based static web publishing system, the time was right to make the move to a much simpler alternative. And because I had used securitymetrics.org as a personal blog, it seemed like a good idea to move all of the bloggish-type posts here.

Moving everything to Octopress means that I now write in Markdown, John Gruber’s elegant markup language. Among other things, that gives me the ability to use my preferred writing application, iA Writer, a beautiful writer’s tool that synchronizes with iCloud and hence, with all my devices. Markdown is a simpler markup language than wiki text; similar in many respects but with some key differences:

Headings: in Markdown, you create a new heading this way: # (Heading 1), ## (Heading 2), etc., whereas in wiki text you use !!!, !!, and ! respectively. I find the Markdown syntax a little more economical.
Emphasis: in Markdown, you emphasize text by surrounding it with _ (for italics) and __ (for bold). In wiki text you use '' for italics and __ for bold. Not a big difference, but logically, it makes sense that more emphasis means more characters to type (2 for bold versus 1 for italics).
Code blocks and inline code: In Markdown you indent text by four or more spaces to indicate a code block, and you indicate inline code by enclosing the text in back ticks (`). In wiki text you enclose the text with triple and double curly braces, respectively: {{{ ... }}} and {{ ... }}.
Hyperlinks: You create links in Markdown with the hyperlinked text enclosed in square brackets, and the link itself in parentheses; for example, the snippet [link](http://www.markerbench.com) links to my blog. In wiki text, you use square brackets with the text and link separated by a pipe, e.g., [link | http://www.markerbench.com]. Both are lightweight enough for my needs; the Markdown syntax is slightly easier.
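To make the mapping concrete, here is a minimal Python sketch of a line-by-line converter for the differences listed above. It is an illustration only, not the conversion script actually used for the migration; the function name `wiki_to_markdown` is hypothetical, and the sketch assumes that wiki bold needs no change and that multi-line code blocks are handled elsewhere.

```python
import re

def wiki_to_markdown(line: str) -> str:
    """Convert one line of JSPWiki-style markup to Markdown (partial sketch)."""
    # Headings: '!!!' is the largest wiki heading, so it maps to a single '#'.
    heading = re.match(r"^(!{1,3})\s*(.*)$", line)
    if heading:
        level = 4 - len(heading.group(1))  # !!! -> 1, !! -> 2, ! -> 3
        return "#" * level + " " + heading.group(2)
    # Inline code: {{text}} -> `text`. Triple braces (block code) are skipped.
    line = re.sub(r"(?<!\{)\{\{([^{}]+?)\}\}(?!\})", r"`\1`", line)
    # Italics: ''text'' -> _text_
    line = re.sub(r"''(.+?)''", r"_\1_", line)
    # Links: [text | url] -> [text](url)
    line = re.sub(r"\[([^\]|]+?)\s*\|\s*([^\]]+?)\s*\]", r"[\1](\2)", line)
    return line

print(wiki_to_markdown("!! Background"))
print(wiki_to_markdown("see [link | http://www.markerbench.com]"))
```

Working one line at a time keeps the rules easy to check by eye; anything that spans lines, such as block code or lists, would need a pass of its own.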

I’ll miss JSPWiki’s neat table syntax, which allowed you to create tables simply and cleanly (|| head 1 || head 2 || head 3 for header rows and | cell 1 | cell 2 | cell 3 for regular rows), but you can create tables in Markdown simply by passing through HTML. That’s ok with me; I haven’t seen a clean way to do tables on any platform. Supposedly-“WYSIWYG” editors — such as the one in Wordpress — mess tables up regularly. Writing them out manually is a little more work, but not too much.
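As a rough illustration of "passing through HTML," the Python sketch below turns wiki-style table rows of the kind just described into an HTML table that can be pasted into a Markdown file. The function name `wiki_table_to_html` is made up for this example, and the sketch assumes every cell is non-empty and contains no pipe characters.

```python
from html import escape

def wiki_table_to_html(lines):
    """Build an HTML table from '|| head ||' and '| cell |' style rows."""
    rows = []
    for line in lines:
        line = line.strip()
        if line.startswith("||"):
            tag, sep = "th", "||"   # header row
        elif line.startswith("|"):
            tag, sep = "td", "|"    # regular row
        else:
            continue                # not a table row; ignore it
        # Splitting leaves an empty string before the leading separator.
        # Dropping blanks also drops genuinely empty cells (a sketch limitation).
        cells = [c.strip() for c in line.split(sep) if c.strip()]
        row = "".join(f"<{tag}>{escape(c)}</{tag}>" for c in cells)
        rows.append(f"  <tr>{row}</tr>")
    return "\n".join(["<table>", *rows, "</table>"])

print(wiki_table_to_html([
    "|| head 1 || head 2 || head 3",
    "| cell 1 | cell 2 | cell 3",
]))
```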

Apart from the syntax itself, one difference between Octopress and JSPWiki is the way post metadata is stored. In JSPWiki, metadata such as the author name is stored in .properties files; last-modified times are whenever the file was last touched. In Octopress, one stores this and other information in the Markdown document itself, right at the top, in YAML syntax. For example, here’s the YAML for this post:

---
title: All Andy's Posts Now on Markerbench
created_at: 2013-01-29 07:00:00 -0500
layout: post
categories:
- blog
- applications
comments: true
---

Pretty simple stuff; you need to remember a few basic rules. Beyond the syntax, though, I like the simplicity of the system as a whole. By turning the blog into something that I “publish” with simple command-line invocations, I get rid of a lot of headaches. Instead of worrying about a web application that I need to maintain, upgrade, and secure, I only need to worry about my writing. (And the occasional GitHub update.)
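To show how little machinery the metadata block needs, here is a small Python sketch that separates the block between the two --- lines from the body of a post. It is not how Octopress itself reads posts (that job belongs to a real YAML parser); the function name `split_front_matter` is hypothetical, and the sketch only understands flat "key: value" pairs and simple "- item" lists like the ones shown above.

```python
def split_front_matter(text):
    """Return (metadata, body) for a post that starts with a '---' block."""
    lines = text.splitlines()
    stripped = [line.strip() for line in lines]
    if not stripped or stripped[0] != "---":
        return {}, text
    try:
        end = stripped.index("---", 1)  # closing delimiter
    except ValueError:
        return {}, text                 # no closing delimiter: treat as plain text
    meta, current_key = {}, None
    for line in stripped[1:end]:
        if line.startswith("- ") and isinstance(meta.get(current_key), list):
            meta[current_key].append(line[2:].strip())
        elif ":" in line:
            key, _, value = line.partition(":")
            current_key = key.strip()
            value = value.strip()
            # A key with no value introduces a list on the following lines.
            meta[current_key] = value if value else []
    body = "\n".join(lines[end + 1:])
    return meta, body

sample = "---\ntitle: Example\ncategories:\n- blog\n---\nHello."
print(split_front_matter(sample))
```

All values come back as plain strings (so `comments: true` yields the string "true"), which is enough for a sketch but is one reason to prefer a proper YAML library in practice.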

Being the industrious-lazy sort, to move the blog posts, I created a little Ruby script that munged the wiki markup and produced decent Markdown. All of the posts needed a little work done to them, mostly to fix a few bullet issues my script didn’t account for, and to assign categories to each post. At some point I will post the script to GitHub, as soon as I get the hang of that. For the sake of completeness, I will also likely cross-post a few notable essays from my work blog.

After I have tinkered a bit more with the Markerbench site, I’ll go ahead and move http://securitymetrics.org to Octopress as well, right in time for Metricon 8! I may, or may not, put the site contents on GitHub pages, as I have with this blog. Regardless, making it a totally static website makes it simple to host and scale. In addition, moving the rest of the securitymetrics.org site will allow me to use a much less expensive hosting plan. The only thing I’ll need a hosting subscription for, at that point, is to host the securitymetrics.org mailing list.

Paving Over the Proprietary Web: The Java Security Bigger Picture

2013-01-21T00:00:00-05:00

Perhaps you’ve heard about the recently disclosed Java 7 zero-day exploit. The flaw allows a remote attacker to take complete control of a computer. It has been incorporated into many exploit kits. The Department of Homeland Security regards the Java exploit as sufficiently serious to recommend “disabling Java in web browsers until adequate updates are available.” Oracle’s fixes… aren’t.

Many of my colleagues at other security firms have spilled a lot of ink describing why this particular Java exploit is bad. It is indeed that bad; Apple, for example, has forced down an update that blocks the Java 7 plugin from executing in the browser at all, at least until Oracle is able to distribute an update. If you are in the habit of keeping Java switched on in your browser, you should turn it off — of course. But that isn’t always possible. Client-side Java, for example, powers GoToMeeting. Many other companies — including my own — rely on client-side Java for critical functions. So one cannot simply rip it out, or mandate that it be banned. Reality has a habit of messing up the best-intended recommendations. But make no mistake, at some point very soon Java on the client needs to go. CIOs, please take note.

Client-side Java is part of the web’s proprietary past, and its time is ending. That proprietary past also includes ActiveX and Flash, two other technologies that saw widespread adoption in the early 2000s. That all three of these technologies came of age at roughly the same time isn’t a coincidence; they all filled gaps in the web experience. ActiveX was Microsoft’s way of adding native client functionality to a then-crude web experience; client-side Java (Swing, Java Web Start etc) did the same. Flash and its cousin ShockWave provided smooth video and animations.

Since 2005, though, the native web has changed dramatically, and for the better. HTML 5, CSS and JavaScript toolkits have been the major catalysts of a revolution in web design. The canvas element added to HTML 5, for example, allowed standards-compliant browsers to draw shapes, create and fill paths, and animate objects. This, plus the video element, freed designers from needing Flash. Cascading Style Sheets (CSS) Levels 2 and 3 gave designers increasingly pixel-perfect control over the placement and appearance of content — a task made even easier with CSS pre-processors such as LESS and Sass, and with kitted CSS assemblies such as Twitter Bootstrap. On the JavaScript front, third-generation toolkits such as jQuery made it simple to make websites dynamic and responsive. You can do all of these things for free, without needing to buy any of the various Studios from Adobe or Microsoft.

The slow-motion revolution in how the Web is made means that the raison d’être for proprietary web technologies is going away. Like a lumbering concrete mixer, HTML5 and JavaScript are slowly paving over the parts of the web that had previously been occupied by Flash, ActiveX and Java. Ironically, the vendors of these proprietary technologies have, in their own ways, added limestone, clay and water to the paving machine.

Microsoft, for example, turned an entire generation of web developers against it with its long, and ultimately fruitless, resistance against robust CSS support in Internet Explorer. Although modern versions of IE are highly standards-compliant, Internet Explorer did not pass the CSS Acid3 test until September 2011. Any web developer who has been working with CSS for more than 5 years can probably regale you with stories of massive hacks needed to allow older Microsoft browsers to work with standards-based websites.

The roots of Adobe Flash’s decline are a little different. Nothing was “broken” with Flash, functionally speaking¹. Two related events resulted in a decline in Flash usage: Steve Jobs’ public refusal to add Flash support to the iPhone and successor iOS devices; and Google’s decision to convert its vast library of YouTube clips to HTML 5-compatible WebM and H.264 formats.

These actions, plus the increasing viability and efficiency of WebM and H.264, meant that you didn’t need Flash video any longer. This has clear implications for customers. For customer-facing websites, you can (and should strongly consider) retiring Flash video in favor of H.264. This is a quick win; the re-encoding process is relatively quick and painless. That said, the need is not as urgent as it is with Java. Adobe’s security team (under the leadership of my former @stake colleague Brad Arkin) has upped the tempo of bug fixes, adopted auto-update, and is taking security seriously enough that Flash has become less risky than it had been. Still, if you could remove a dependency on a third-party component that needs to be maintained and updated in addition to the base operating system, why wouldn’t you?

Java, on the other hand, is simply a mess. From a pure features perspective, Java’s caretaker parent, Oracle, no longer employs the kind and number of Java engineers that will keep it up-to-date, never mind put it back on the cutting edge. Most of the Java engineers and visionaries such as James Gosling, Josh Bloch, Tim Bray, Amy Fowler, and Adam Bosworth (the people I learned from and looked up to while I was learning Java J2EE) left long ago for greener pastures. Although server-side Java is still widely used, nobody I know would consider it for greenfield development for use with a browser.²

From a security standpoint, it is hard to see why Oracle would be Johnny-on-the-spot with security fixes. As my other (!) former @stake colleague David Litchfield has pointed out, the company doesn’t have the best track record on security. We can reasonably assume that fixing client-side Java security holes isn’t anywhere near the top of Oracle’s priority list. And even if it becomes so because screaming customers demand it, legacy products get legacy engineers. That’s just the way it is.

The same goes for Microsoft’s ActiveX. Developers don’t use it for new web-based projects, and the company has for several years recommended that developers use other technologies³ to make dynamic websites. The risks associated with ActiveX continue to be high, no doubt because ActiveX controls are basically chunks of native code written by various vendors of varying skill, remotely triggered by websites that may or may not be under the user’s control. (What could go wrong with that?) To be sure, Microsoft has done as much as any vendor in the industry to set the standard for responsible and secure development practices. Over the years, they have responded relatively quickly to the various ActiveX security issues that have popped up. But as with client-side Java, it’s legacy technology maintained by legacy engineers.

It is much, much easier to talk about how the slow-moving concrete machine that is the modern web (HTML 5, CSS, and JavaScript) will slowly pave over the proprietary web. It is harder to state with confidence what it will mean for security. However, one may hazard a few guesses. The decline of these three technologies should increase the overall level of security over time. Logic dictates that a browser festooned with fewer proprietary plugins is a more secure browser. Put differently: migrating older websites to CSS, HTML 5 and JavaScript will have the effect of concentrating the attack surface by reducing the number of parties who must defend that surface. Over time, the broad public ought to be better served by having Apple or Google or Microsoft be responsible for the entire web browsing experience, including security.

But in the short term, it won’t be so clean. Based on vulnerability counts, an imprecise metric at best, the “younger guys” don’t score well. For example, the US National Vulnerability Database shows that the WebKit browsing engine had over 198 disclosed vulnerabilities last year. Internet Explorer? Just 61. Meanwhile, ActiveX, Java and Flash had 73, 169, and 67, respectively. I draw no conclusions from these data other than the simplest one: increased use of native browser capabilities is likely to increase risk in the short term, even as the decreased use of proprietary technologies decreases it over the longer term. At some point the two lines will cross and we will all be better off.

In the meantime, the cement truck keeps rumbling.

¹ Functionality aside, Flash’s security track record has been poor for a while.

² Java development is alive and well on the Android platform, of course.

³ It’s fair to say that Microsoft has been all over the place on this subject over the last 10 years: DHTML, XAML/Silverlight, and now Windows 8-style apps.

Review of Gene Kim’s novel, “The Phoenix Project”

2013-01-17T00:00:00-05:00

Over the Christmas holidays, I read an advance copy of Gene Kim’s first novel, “The Phoenix Project.” Gene’s co-authors were Kevin Behr and George Spafford. It was a better read than I was expecting. It is about 350 pages. Here’s my review.

The book aims to describe how to bring TQM and “lean” (as in, “manufacturing”) disciplines to IT. Although TQM is especially important in the context of operations, the book shows how “systems thinking” that spans the development and IT operations organizations, and reaches upstream into finance, sales and marketing, is critically important for technology-reliant companies. Because all but the most hidebound companies rely on IT to run (and transform) their businesses, the lessons in this book are generalizable to every company.

The book is less of a dramatic novel than a disguised set of parables about IT and DevOps, using a fictional company, crisis and characters to illustrate them. Most lessons are imparted by a mysterious “board member” named Erik, who serves as a combination Greek chorus, Zen master, drill sergeant and mentor to the main protagonist, a generically named midrange IT operations manager called Bill. Bill receives a battlefield promotion into a role he is unprepared for while, all around him, systems are crashing and strategically important projects are running horrifically off track. Most of us who have been in IT roles for a while can relate.

The authors are at their best when they focus on the technology and related (often broken) processes that permeate most companies’ IT organizations. The protagonist Bill’s evolution as a manager from fire-fighting and free-fall, to productivity and proactivity, is well-done. As he embraces his new job, he slowly begins to understand the four types of work: business projects, IT projects, changes, and unplanned work. He also learns the tools he needs to increase his ability to do all four types of work using techniques such as simple change management meetings, kanban boards, monitoring processes and targeted, cross-functional “tiger teams” that automate the ability to deliver finished work more quickly.

It is very clear that the authors have a soft spot for the Agile and Lean styles of application development, the hallmarks of which include short two-week development “sprints,” tight feedback loops between operations and development, and continuous integration workflows that build, test and deploy finished systems in minutes or hours rather than in weeks. The skills and tools needed to successfully implement these types of processes are loosely called “DevOps,” and they are what have enabled companies like Flickr and Twitter to innovate quickly. By the end of the book, Bill’s company has embraced this thinking, too. Although there is a certain amount of hand-wavey “and then a miracle occurred” glossing over of how Bill and his formerly bedraggled band of IT misfits manage to pull it all off, it’s fun to watch.

In contrast to the care and attention spent on describing IT processes, the characters through which the various lessons are imparted are paper-thin; we don’t really get to know them all that well, and it is not obvious why we should care about their fates. Like in a typical Tarantino film, the dialogue is mostly one voice spoken by multiple characters. I don’t think that is a major flaw, though; most readers are not going to read Gene’s book expecting another Nabokov, or even the Second Coming of Crichton.

Overall, the book succeeds in its goal of communicating complex processes by way of extended example. If you want to understand the most important ways in which modern, technology-reliant companies need to transform their IT organizations, this book offers a valuable introduction. If I had one criticism, it is that I found myself wishing that the “lessons” from each chapter were summarized at the end of each major section, textbook-style. And although I had some trouble getting through the first 20 pages or so, which bog down with exposition and character development, the rest of the book flew by.

I read “The Phoenix Project” in essentially one sitting with a break for lunch. I liked it much more than I thought I would, and recommend it.

Outsource your web risks with a static website

2013-01-08T00:00:00-05:00

A few weeks ago I put together my annual Predictions blog post for the coming year. In that post and accompanying webinar, I suggested five emerging risk areas that CISOs need to pay attention to in the coming year. These are:

CISOs will wrestle with the risks of “as-a-Service” platforms
Android’s security issues will force CISOs to take action
Cloud application vendors will compete on metrics
California will become the de facto privacy regulator
Your password policy will undergo a major overhaul

Of these, prediction numbers 1 and 3 both relate to cloud services, and to the security thereof. The first one, about Platform-as-a-Service (PaaS), was by far the one that I spent the most time thinking about. That is because PaaS is the one that CISOs have the least amount of control over. It is the sneakiest. From a CISO’s standpoint, knowing that large parts of your developer toolchain (source code repository, test VMs) and runtime environment (web servers, databases) are sitting out there in “the cloud” is scary, not least because these parts didn’t exactly go through the traditional procurement channel. Even worse, your typical IT security auditor isn’t really going to know what to do with PaaS, other than slap hands on face, Macaulay Culkin-style, and make a beeline for the exit.

However, in this post I will describe one way in which the use of pervasive — and free — platform cloud services can actually reduce risk. That may sound ridiculous on its face, but I offer one worked example that proves the point: static websites.

Static websites are exactly what you might think: websites whose content is entirely static. The server never executes any business logic: it simply retrieves whatever is asked for and serves it up to the client. By “whatever is asked for” we mean plain-old HTML, images, cascading style sheets, JavaScript or anything else that makes up the website. Static websites depend on four principles:

Websites are “compiled” offline on a workstation, and “published” by uploading to the server
Servers execute no code, and only serve static resources
All dynamic features execute on the browser via JavaScript
Third party services provide “outsourced” commenting, social and analytics features via JavaScript, which are removed from the server’s areas of concern

Because the web server is not doing anything other than serving up static resources, it can be dumb as a bag of hammers, and locked down within an inch of its life. Even better, the simplicity of the server results in a radically reduced attack surface; there are no “user accounts,” no databases, and no middleware. Because the server does not need to, and indeed cannot, accept any user input, no application code needs to be audited.
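
To make the “dumb as a bag of hammers” point concrete, here is a minimal sketch in Python of the only decision a static server has to make: which file, if any, corresponds to the requested path. The function name `resolve_static` and the `index.html` default are my own assumptions for illustration; this is not how any particular web server is implemented.

```python
from pathlib import Path
from typing import Optional


def resolve_static(root: Path, url_path: str) -> Optional[Path]:
    """Map a request path to a file under root, or None if nothing is served.

    Hypothetical helper for illustration only. The query string is thrown
    away because a static server has no code that could act on user input.
    """
    root = root.resolve()
    relative = url_path.split("?", 1)[0].lstrip("/") or "index.html"
    candidate = (root / relative).resolve()
    if candidate.is_dir():
        candidate = candidate / "index.html"
    # Refuse anything that escapes the published directory (e.g. "../secrets").
    if root not in candidate.parents:
        return None
    return candidate if candidate.is_file() else None
```

Everything a visitor sends beyond the path is ignored, which is why there is no application code to audit: the whole “application” is a file lookup plus a check that the file lives inside the published directory.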

A new wrinkle on an old idea

Static websites are not new. They have been around for a long, long time: as long as the World Wide Web. In fact, prior to the invention of CGI and early server-side scripting languages like ColdFusion, all websites were static. But in the mid-1990s, developers began adding server-side languages and scripting frameworks to make websites more dynamic. These include PHP, ASP, JSP, and more recently, server-side JavaScript implementations like Node.js.

Developers have also increased the number of components that collaborate server-side. In the early days, simple static websites required only a web server. But modern dynamic web applications are composed from many architectural components in addition to the web servers themselves. These additions include the various server-side scripting languages, plus application servers, application code, databases, and directories. And that is just for the simple applications. Even the humblest business website that serves up nothing more than corporate information from a content management system (CMS) needs most of these components. A site like that needs a web server (for example, Apache), scripting language (PHP), content management system (Drupal), database (MySQL) and a directory for authentication (Active Directory). That’s five components, and collectively they aren’t doing all that much.

In contrast to the complexity of modern web applications, static websites turn back the clock on the web. The static website philosophy mixes old-school web publishing and new-school DevOps. If you want an example of old school, look at my friend Dan Geer’s website or a representative posting on it. Dan’s site is just text; no flashiness, and no graphics. On its face, Dan’s site and mine are similar in one major respect. Both offer the same thing: static resources served up by a dumb server.

Why not compile your website instead?

Modern static websites differ from their old-school cousins in two ways. First, the highly automated, explicitly developer-centric processes used to produce them feature many of the same tools used to produce code. Authors write posts using plain text editors rather than a WYSIWYG editor or CMS. They “check in” their posts into code versioning repositories the same way they check in their code. After the post or page is ready to publish, a designated DevOps person — perhaps the author — types a few commands to “compile” the site and upload it. Some static website aficionados have automated the process completely: one simply saves a new version of a post to a designated directory, and the website compiler automatically checks the page into GitHub, regenerates the site from scratch, and publishes it to the web server.
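
The “compile” step described above can be sketched in a few lines of Python. This is a toy under stated assumptions, not how any real static site generator works internally: posts are plain `.txt` files whose first line is the title, blank lines separate paragraphs, and the names `render_post`, `compile_site` and `PAGE_TEMPLATE` are hypothetical. Real tools add Markdown, templates and themes on top of the same idea.

```python
import html
from pathlib import Path

# Hypothetical page template; a real generator would load themes from disk.
PAGE_TEMPLATE = """<!DOCTYPE html>
<html>
<head><meta charset="utf-8"><title>{title}</title></head>
<body>
<h1>{title}</h1>
{body}
</body>
</html>
"""


def render_post(text: str) -> str:
    """Turn a plain-text post into a complete HTML page.

    Assumed format: the first line is the title, and blank lines
    separate paragraphs. All content is HTML-escaped.
    """
    title, _, rest = text.strip().partition("\n")
    paragraphs = [p.strip() for p in rest.split("\n\n") if p.strip()]
    body = "\n".join(f"<p>{html.escape(p)}</p>" for p in paragraphs)
    return PAGE_TEMPLATE.format(title=html.escape(title), body=body)


def compile_site(source_dir: Path, output_dir: Path) -> list[Path]:
    """Regenerate every page from scratch and return the files written."""
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for post in sorted(source_dir.glob("*.txt")):
        target = output_dir / f"{post.stem}.html"
        page = render_post(post.read_text(encoding="utf-8"))
        target.write_text(page, encoding="utf-8")
        written.append(target)
    return written
```

Publishing is then nothing more than copying the output directory to the server. All of the work happens offline on the author’s workstation, and the server only ever sees finished files.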

The second difference between modern static websites and their old-school cousins is the inclusion of dynamic features by deliberately “outsourcing” them to other, usually free, service providers. Instructions on the static web page cause JavaScript code to be loaded and executed, which communicates with the provider’s service and provides the illusion of dynamic behavior. This allows site owners to include modern features that would ordinarily require server-side code. Years ago, if you needed to analyze website visitor traffic, you would install WebTrends on the server. Today, you just pop in a couple of lines of JavaScript for Google Analytics (free). If you wanted commenting features with protections against spam, you needed an application that had a back-end database and a decent anti-spam filter, like WordPress. Now, you can simply embed Disqus (which is also free). Or suppose you wanted to allow visitors to recommend and share items on your website. Traditionally, you’d need to create a web form, hook it up to an email server, and create scripts to send recommendations via email. Now, all you need are a few JavaScript statements to load up Facebook’s “Like” button, Twitter tweet and follow buttons, or Google’s +1.

Dynamic features aren’t the only parts of the website that can be outsourced. The underlying web servers can be, too. For example, GitHub provides a free service called GitHub Pages that allows developers to upload HTML and other static resources. These are served up just like a website. Amazon S3 provides a similar service. For low-volume websites like this one (ha!), Amazon S3 is completely free.

Outsourcing risk

Static websites are simple, and require just one architectural component: a web server. By contrast, the typical corporate website that does nothing more than serve up company information, and forward leads to Salesforce, nonetheless requires five. The simpler website is better because it is less complex, and less complex is good.

But that is not the only advantage static websites have. Modern web applications aren’t just complex, but risky as well. They typically need to reach beyond the DMZ’s back-side firewall to resources inside the company; for example, to a database or three, or to an Active Directory forest. These additional network connections confer a corresponding amount of risk. Then there are the setup, operations and maintenance tasks. Each architectural component needs to be configured, hardened, horizontally scaled, patched and monitored, forever.

But when you create a static website, most of the complexity goes away, along with the cost and risk associated with each. If you choose to outsource the remaining architectural component — the web server — to a third party, that goes away too. Why not let the fine folks at Amazon, Heroku or GitHub configure, harden, scale, patch and monitor the web server? They are likely to be better at it than you are.

Simplifying your architecture by eliminating complexity (and outsourcing the web server) eliminates a huge amount of security risk by cutting the attack surface nearly to zero. But the outsourcing of dynamic features such as user tracking, commenting, social sharing, and analytics has another side-effect. Because the web server processes no user-generated content, a whole class of application and data-related security risks goes away. Cross-site scripting, SQL injection, parameter tampering, and the rest of the OWASP Top Ten are no longer worries. You don’t have any potential data breach obligations because you don’t keep any data. There is nothing to steal.

Of course, although you no longer need to worry about application and data-related security risks, your outsourced comment-management service (Disqus) still does. Disqus, Facebook, Twitter and the other providers your client-side JavaScript links to still need to police their members for spam, fraud, impersonation, and identity theft. They need to secure their JavaScript APIs and web applications. But you don’t need to do any of these things any more; with a static website, you have essentially outsourced your risks to them. Indeed, it is more correct to say that you have transferred them.

A few security risks remain. Access to servers that host static content must be controlled. If you manage those servers, you need to manage the SSH keys or passwords used for uploading content. And you should probably restrict the number of people allowed to operate the website compiler machinery to a few. And of course, you also need to worry about, um… a bunch of other, er… important stuff, like for example… let’s see…

Honestly, I can’t think of anything else. Lock down the web servers and make sure only the right people can compile and post. That shouldn’t be too hard, right?

Static websites aren’t for everybody. They still require a certain amount of developer savoir-faire, and they won’t reduce the need to build genuine web applications for business units. You can’t build a static e-commerce site or anything stateful, for example. But if you are a security-conscious company that just needs an online presence, static websites might be just the thing.

If you disagree, feel free to let me know in the Comments section below. I’m using Disqus, of course!

Coda: the making of this web page

I became interested in static websites several months ago when I read a few stray articles about the concept. But it wafted past me like so much second-hand smoke; I didn’t really inhale. However, after I did my Predictions webinar in December, I began spending more time digging into the capabilities of “new school” free-ish web service providers such as Heroku and GitHub. At a holiday party, my friend, neighbor and Drupal guru Stephan pointed out that these days it is pretty easy for a motivated developer to assemble a complete app infrastructure more-or-less for free.

A few weeks later, to support one of my professional hobbies, I opened a repository on GitHub. Shortly thereafter, I read a few more articles about static blogging and started connecting the dots. I decided it would be fun to create my own static website to prove the concept. But to make it interesting, I wanted to create something representative of what most people would want. That meant that it needed to have the typical kinds of things you would expect, such as commenting features and social integration. I downloaded and started experimenting with two static blogging packages with a lot of buzz, Nanoc and Jekyll.

Both implement the “website compiler” strategy: you customize some templates, write a few posts in Markdown and then type a few commands to generate the site and upload the contents to a web server. After starting with Nanoc and finding myself a little frustrated, I moved on to Jekyll. I was halfway through my proof-of-concept with Jekyll when I discovered Jekyll Bootstrap, a more kitted and polished version of Jekyll that didn’t have the some-assembly-required feeling. But finally, I discovered Octopress. It too is based on Jekyll, but includes pre-configured support for Google Analytics, Disqus, Facebook, Twitter and Google+. In short, exactly what I wanted.

So I got to work getting a feel for the software, started drafting this post, and after about a day or two of after-hours work, things looked good. I needed to find a place to host the blog and decided on GitHub Pages, which is part of my GitHub account. While I was at it, I created Google Analytics and Disqus accounts. All pretty easy to do. Octopress worked pretty nicely once I got over a few self-imposed obstacles. What you see here, on this page, is a totally out-of-the-box standard version of Octopress, with nothing more than a few titles and text properties changed. [Author’s note: as of February 1, 2013, Markerbench is no longer out-of-the-box; I now build it using a brilliant Twitter Bootstrap-derived theme created by Adrian Artiles.]

With a little more effort, maybe someday I’ll be able to make something as nice-looking as the Trail of Bits website. One can but dream, no? [Author’s note: it turned out to be a fairly straightforward weekend project.]

As for this blog post: I initially set out to write something very silly about how cool it was to try my hand at this. But the post kept getting longer. Whoops.

“Every time you perform arithmetic operations on ordinal numbers, God kills a kitten”

2008-02-19T00:00:00-05:00

I was reading Rich Bejtlich’s blog today, and came across that quote from a commenter known only as JimmyTheGeek. Wonderfully funny, and spot on.

Passwords-O-Plenty

2008-02-05T00:00:00-05:00

Before the holidays I ran a quick, three-question survey of the securitymetrics.org mailing list membership about the number of passwords people use. Here are the results, drawn from 51 responses (not bad, considering the list membership is about 400 people). I’d promised the respondents that I’d share the results… so here they are.

Securitymetrics.org Quickie Survey: Online Credentials

1. How many online accounts do you manage, in total? How many “sensitive” accounts do you maintain?

By “account” I mean a public or private website, server or network that you log in to, for which you maintain a password or other credential. For example, a password or application entry in an OS X Keychain could be considered an account.

For purposes of this question, “sensitive accounts” means ones that you would consider problematic if they were compromised. Typically, these could be accounts that keep credit card information, manage your 401k details, or contain employment details.

Results (n=51):

Metric	All accounts	Sensitive accounts
Mean	60.7 accounts	20.6 accounts
Standard deviation	55.0	29.7
Min	3	0
First quartile	23.5	6
Median	40	15
Third quartile	72.5	25
Max	207	207
Mode	40	20

Comments: I draw 3 conclusions from these figures.

First, people have lots of accounts to keep track of — on average.
That said, the quartiles and median show that respondents skew towards the “conservative case”; that is, most don’t tend to maintain too many accounts. A few crazy outliers (like me) are pushing the average number up.
Third, the ratio of sensitive-to-non-sensitive accounts stays fairly constant across quartiles, ranging from 26-38%. In other words: of all of the account passwords people maintain, it’s a fair bet that about a third of them will be “sensitive.”
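
For anyone who wants to check the arithmetic behind the third conclusion, here is a small Python sketch that recomputes the sensitive share from the figures in the results table. The variable names are mine; the numbers are copied from the table, with “upper quartile” standing for the row just above the median (72.5 and 25).

```python
# Figures copied from the survey results table (n=51).
all_accounts = {"first quartile": 23.5, "median": 40, "upper quartile": 72.5}
sensitive_accounts = {"first quartile": 6, "median": 15, "upper quartile": 25}


def sensitive_share(label: str) -> float:
    """Percentage of accounts at this point in the distribution that are sensitive."""
    return 100 * sensitive_accounts[label] / all_accounts[label]


shares = {label: round(sensitive_share(label), 1) for label in all_accounts}
# shares == {"first quartile": 25.5, "median": 37.5, "upper quartile": 34.5}

# The same ratio computed from the two means (20.6 and 60.7 accounts).
mean_share = round(100 * 20.6 / 60.7, 1)  # 33.9
```

Rounded to whole percentages, the quartile shares span roughly 26% to 38%, and the ratio of the means lands at about a third, which is where the “about a third of them will be sensitive” rule of thumb comes from.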

I’d also note that the survey base is self-selected — in the sense that it’s the members of this list. Most of us are professional paranoids, right? Not sure if that means that the average user is worse off than the respondent base (more passwords to keep track of) or better off. Regardless, I’d say it does confirm what I already knew: we’re drowning in passwords. Further insights or armchair-psychology comments welcome.

2. What is your primary coping strategy for managing your online accounts?

I keep all of my passwords the same: 10%
I write everything down on paper: 12%
I use a form-filler product, like Apple’s Keychain, and use random passwords: 12%
No particular strategy: 20%
Other: 47%

Comments: I can’t draw too many conclusions from the responses to this question, because I asked it badly. Considering that my day job is as an analyst, you’d think I would’ve asked this question in a way that got better answers. :)

3. Do you like the idea of surveying securitymetrics.org members about security practices?

Yes: This is a good idea: 92%
No: I’ve got enough spam as it is: 8%

Comments: Everyone seems to like the idea of surveying the membership more often. Cool! I’ve asked mailing list members to suggest ideas for future surveys.

Note: I’ve proposed that we spend some time on the subject of community-building at this year’s Mini-Metricon at RSA. More on this later… Betsy Nichols is going to put up a blog entry about Mini-Metricon on the website later today.

Retired Comedians and Missed Opportunities

2008-01-31T00:00:00-05:00

There’s this old joke about a comedians’ retirement home that goes something like this:

An aging comedian decides to retire to a community that has just other comedians living in it. On his first day there, he goes down to lunch, and there’s a bunch of retired fellow comics sitting around the table.

The conversation they’re having puzzles the man a bit. One of the comics at the table yells out, “12!” and everybody just dies laughing. Then another one says, “44!” and three of them laugh so hard they roll straight out of their chairs and onto the floor.

When a lull in the conversation comes, the new guy introduces himself, and asks, “Hey, what’s going on? What’s so funny about yelling out numbers?”

One of the comics says, “Oh, you’re the new kid on the block, eh? Here’s what’s going on. We’ve all been retired for many years. We’ve been telling and re-telling the same old jokes for so long, we’ve assigned them all numbers. To save time, instead of telling the joke again, we just say the number!”

“Wow,” says the new guy. “I’ve never seen that before. That’s pretty cool. Mind if I join you?”

“Sure,” the other comic says, and beckons him to sit down.

The new guy is eager to fit in. So five minutes later, he yells out, “28!” NOBODY laughs — you could’ve heard a pin drop.

His voice quavering, the new guy asks, “What’s wrong? Isn’t number 28 a good joke too?”

“Sure it is,” pipes in the other comic. “But it’s all about the delivery!”

I mention this because I can’t stand Jeff Jones’ quarterly festivals of FUD. I could complain yet again, and in detail, about how dumb vulnerability-counting is, why the methodology is flawed, why it has limited bearing on security, how the system is easily gamed, why it’s colored by Jeff’s obvious agenda, and why it’s a tragedy that Microsoft does not do what it should, namely mine the world’s most complete bug databases and code repositories for truly compelling information about code quality and application security metrics.

But I won’t do that again. Like these comics, I’m just going to yell out the shorthand.

“Jeff Jones.”

Note that I’m not laughing.
