September 4th, 2008
Microsoft finally earns a passing grade (barely) for WGA
Microsoft launched its Windows Genuine Advantage (WGA) anti-piracy program in early summer 2006. Its first year was, to put it charitably, a disaster. An epic fail. A big fat F on the year’s report card.
A certain amount of error is inevitable in any activation and registration system, but the error rates were clearly too high when WGA first rolled out. In an interview last week, Microsoft WGA director Alex Kochis tacitly acknowledged that fact, pointing out that “we’ve made major strides in the accuracy of the program” in the past two years.
How bad was it? Users began suffering unpleasant consequences almost immediately, including system failures and false positives that flagged perfectly legitimate Windows copies as “non-genuine.” I wrote about WGA and its problems extensively throughout 2006 and 2007. In August 2006, I performed an exhaustive survey of problem reports from Microsoft’s own WGA support forum and discovered that “42% of the people who experienced problems with WGA and reported those problems to Microsoft’s public forums during that period were actually running Genuine Microsoft Windows.”
There was another wave of failures in October 2006, and the first reports of Vista validation problems appeared in February 2007. I met with managers of the WGA program several times in early 2007 and we discussed how they were responding to these issues. To their credit, they made major changes in support policies, back-end systems, and the online experience. But in August 2007, just as the WGA program appeared to be running smoothly at long last, “human error” caused a WGA server failure, with an estimated 12,000 legitimate customers affected. Most of the glaring bugs in the system had been worked out by then: when I examined forum reports from December 2006, I found that the failure rate had dropped from 42% to 22%. That failure rate was still too high to rate anything higher than a D-.
The August 2007 outage inspired a wave of rethinking and re-engineering at Microsoft to ensure that this sort of problem couldn’t happen again, Kochis says. “We needed to think about what the impact to the customer was so that we minimize negative impact on customers. In response, we put in place what we call a ‘circuit breaker.’” According to Kochis, the systems are now monitored continuously in real time, through automated systems and by engineers. “If we detect anything that’s happening in response to our automated and human monitoring, one of the first things we do is evaluate pulling the breaker, which will [respond to] any system that calls in for validation and either use the last validation status for that system or just pass that system for that moment in time.” In effect, any time an anomaly in the system is detected, the result defaults in the customer’s favor, declaring the system “genuine,” at least until the next check.
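To make that fail-open behavior concrete, here is a minimal Python sketch of the idea as Kochis describes it. It is only an illustration under stated assumptions and says nothing about how Microsoft actually built its system: the `ValidationService` class, the `check` callback, and the `breaker_pulled` flag are all hypothetical names. The sketch assumes that when the breaker is pulled, a system gets its last recorded status if one exists and is otherwise passed as genuine.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class ValidationService:
    """Hypothetical validation front end with a manual circuit breaker."""

    # Hypothetical stand-in for the real back-end check: system ID -> genuine?
    check: Callable[[str], bool]
    # Set to True by monitoring (automated or human) when an anomaly is seen.
    breaker_pulled: bool = False
    # Last validation result recorded for each system that has called in.
    last_status: Dict[str, bool] = field(default_factory=dict)

    def validate(self, system_id: str) -> bool:
        if self.breaker_pulled:
            # Breaker pulled: skip the back-end check entirely. Reuse the
            # last recorded status, or pass the system if none is recorded.
            return self.last_status.get(system_id, True)
        # Normal operation: run the real check and remember the result.
        result = self.check(system_id)
        self.last_status[system_id] = result
        return result


# Usage: a toy back end that rejects one known-bad ID.
service = ValidationService(check=lambda system_id: system_id != "known-bad")
service.validate("pc-1")          # normal check, result is recorded
service.breaker_pulled = True     # anomaly detected, pull the breaker
service.validate("never-seen")    # no history, so the system is passed
```

The point of the design is that while the breaker is pulled, the back-end check is never consulted, so a failure in that back end cannot newly flag a customer as non-genuine.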
Kochis also says the WGA group has revamped its internal processes to make them more responsive to issues that might affect Windows customers. “We do drills,” he told me, “many, many drills. And we get better every time. We’ve had some real events, too, [although] none have been as significant as the [August 2007] server outage. They’ve been invisible or transparent to end users or customers.” The biggest test of the “circuit breaker” system came in January 2008, when two undersea cables in the Mediterranean were severed, disrupting Internet service over much of the Middle East and Europe, including some of Microsoft’s busiest call centers.
“We learned about it very quickly and later that same day, we had a plan pulled together that would enable us to provide support for customers in a number of different ways. We did whatever we could to reduce call volume at that time. In Egypt, we have a call center that services a number of languages, including those in Europe. So one of the first things we did was have people on airplanes flying [from Egypt] to a call center in Germany so we could redirect phone traffic there and have local language support. Likewise, support calls for Spanish-speaking customers were routed to Latin America.
“Our online activation systems were also affected,” Kochis notes. “We actually pulled the circuit breaker in that situation, so that we would minimize call volume. All systems passed, none failed, until we were ready with our rerouting process.”
If that incident had happened a year earlier, the impact on activation and validation systems would have been catastrophic. With the new systems in place, there was literally no discernible impact. I’ve been monitoring WGA longer and more closely than anyone outside of Microsoft, and in the year since the August 2007 server outage, I have seen no reports of even brief failures in the WGA system. (One report at Ars Technica in July turned out to be a false alarm: the incident shut down the telephone-based activation system for about 90 minutes but left WGA untouched.) That doesn’t mean WGA is working perfectly today. There’s still plenty of room for improvement, as I note in the conclusion of this report.
Back in 2006, the percentage of people affected by WGA failures and glitches was unacceptably high. Microsoft richly earned a big fat F in WGA in its freshman year. And 2007 was only a little better. Although the embarrassing conflicts with third-party software that falsely triggered WGA alerts in its early days had mostly been vanquished, the server outage of August 2007 clearly served as a wake-up call.
So the question is, two years later, has Microsoft finally gotten WGA right? Or at least good enough?
For the answers, I went back to the same rich data source I used in the original August 2006 report and for a follow-up in December 2006: Microsoft’s own WGA support forums. When I did the earlier study, Windows Vista had not yet launched, so all reports involved Windows XP. Today, two years later, there are separate WGA support forums for XP and Vista, and I looked at both of them. Back in 2006, I counted data for a 15-day period, August 1-15, and tallied 137 support requests directly related to product activation, validation, or WGA “non-genuine” messages. For the 2008 version, I used a larger sample, examining every thread on the two WGA forums that was started between August 1 and August 26.
Ed Bott is an award-winning technology writer with more than two decades' experience writing for mainstream media outlets and online publications. See his full profile and disclosure of his industry affiliations.
