<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Moserware</title>
    <description></description>
    <link>http://www.moserware.com/</link>
    <atom:link href="http://feeds.feedburner.com/Moserware" rel="self" type="application/rss+xml"/>
    <pubDate>Wed, 21 Oct 2015 13:35:17 +0000</pubDate>
    <lastBuildDate>Wed, 21 Oct 2015 13:35:17 +0000</lastBuildDate>
    <generator>Jekyll v2.4.0</generator>
    
      <item>
        <title>Back from the Future Bugs</title>
        <description>&lt;p&gt;(In honor of this &lt;a href=&quot;http://www.usatoday.com/topic/29c22590-f1d9-4cf5-870a-0b06b1b77218/back-to-the-future/&quot;&gt;quasi-historic day&lt;/a&gt;, I wanted to share my most memorable bug.)&lt;/p&gt;

&lt;p&gt;Ordinary software bugs are merely annoying but straightforward to find and fix. Legendary bugs are the ones that &lt;a href=&quot;https://en.wikipedia.org/wiki/Heisenbug&quot;&gt;actively resist being found&lt;/a&gt; and make you question your sanity for prolonged periods of time. &lt;/p&gt;

&lt;p&gt;In early 2011, I was working on Kaggle’s submission handling code. &lt;a href=&quot;https://www.kaggle.com/&quot;&gt;Kaggle&lt;/a&gt; hosts competitions where the goal is to make the best predictions on a dataset. For example, given an image of a person’s retina, &lt;a href=&quot;https://www.kaggle.com/c/diabetic-retinopathy-detection&quot;&gt;determine if they’re suffering damage from diabetes&lt;/a&gt;. You upload your predictions and they get evaluated against the known (but private) solution in order to compute your score which shows up on a &lt;a href=&quot;https://www.kaggle.com/c/diabetic-retinopathy-detection/leaderboard&quot;&gt;leaderboard&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;Long after deploying the submission handling code, a user emailed us saying that his score didn’t appear on the leaderboard. When I went to investigate, I saw his score was indeed on the leaderboard. I tried to reproduce his issue in production and I couldn’t. I painstakingly verified everything in a local debugger session and it worked perfectly without any issue. Given that everything seemed to be working fine, I closed the issue because I couldn’t reproduce it.&lt;/p&gt;

&lt;p&gt;Much time passed without any further problems. &lt;/p&gt;

&lt;p&gt;And then, it happened again: another user reported that their score didn’t appear on the leaderboard. This time I was convinced that there was some genuine edge case causing the issue. I once again carefully went through the whole submission process in a debugger and couldn’t replicate it. At this point, I was sure it was some weird database issue. Perhaps some obscure exception was thrown that caused the database to never update. I added some checks and exception handling code and the problem seemed to go away.&lt;/p&gt;

&lt;p&gt;But in June of 2014, it came back with a vengeance. A new developer on the team was handling support requests at the time and received several reports of submissions not showing up on the leaderboard. When he explained the bug report, I was immediately haunted by memories of it. I recounted the mystery of this bug and how I couldn’t reproduce the issue. In frustration, I suggested that he add code to manually force an update to the leaderboard to work around this bug.&lt;/p&gt;

&lt;p&gt;I was disgusted by my own suggestion. It wasn’t a fix, it was duct-taping around the issue. But soon after offering this advice I had the forehead-slapping moment: &lt;strong&gt;different clocks can have different times&lt;/strong&gt;!&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/back-from-the-future-bugs/DifferentServerTimes.png&quot; width=&quot;720&quot; height=&quot;316&quot; /&gt;&lt;/p&gt;

&lt;p&gt;It’s embarrassingly obvious in hindsight. &lt;/p&gt;

&lt;p&gt;Here’s what happened:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;When the submission was created, I set its status to “pending” and set its timestamp to the current time &lt;em&gt;of the web server&lt;/em&gt;. &lt;/li&gt;
  &lt;li&gt;Another machine dequeued the submission, calculated its score, and then sent the result back.&lt;/li&gt;
  &lt;li&gt;The web server then updated the status of the submission which caused a stored procedure in the database to update the leaderboard based on all the submissions up to the current time &lt;em&gt;of the database server&lt;/em&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I could never reproduce this problem locally because my local web server and database server were on the same physical machine with the same clock, so the results were always consistent. Further, most of the time the clocks on the web server and database server in production were carefully synchronized over the network to within a few milliseconds of each other.&lt;/p&gt;

&lt;p&gt;This bug only surfaced when:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;The database server’s clock had drifted a second or two &lt;em&gt;behind&lt;/em&gt; the web server’s clock.&lt;/li&gt;
  &lt;li&gt;Processing the submission took less time than the amount of drift between the two machines.&lt;/li&gt;
  &lt;li&gt;The database recalculated the leaderboard based on all submissions up to &lt;em&gt;its current time&lt;/em&gt; and thus ignored the submission that was in its perceived future.&lt;/li&gt;
  &lt;li&gt;No subsequent submissions happened that would have forced another leaderboard recalculation based on the then current time.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Once I understood it was a clock drift bug, there was a very simple fix: always use the same clock. In this case, we chose to always use the same database clock and haven’t had a problem with it in well over a million submissions since then.&lt;/p&gt;
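
&lt;p&gt;The failure and the fix can be sketched in a few lines of Python. This is a hypothetical illustration rather than Kaggle’s actual code: the &lt;code&gt;Submission&lt;/code&gt; class, the &lt;code&gt;leaderboard&lt;/code&gt; function, the two-second drift, and the one-second scoring time are all assumptions made up for the example. The filter uses &lt;code&gt;operator.le&lt;/code&gt;, which is simply the function form of “less than or equal to”.&lt;/p&gt;

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from operator import le


@dataclass
class Submission:
    user: str
    score: float
    created_at: datetime  # stamped by whichever clock the caller reads


def leaderboard(submissions, db_now):
    """Keep only submissions created no later than the database's current time."""
    return [s for s in submissions if le(s.created_at, db_now)]


# Assumed scenario: the database clock runs 2 seconds behind the web server.
web_now = datetime(2014, 6, 1, 12, 0, 2)
db_now = web_now - timedelta(seconds=2)
scoring_time = timedelta(seconds=1)  # scoring finishes faster than the drift

# Buggy: the timestamp comes from the web server's clock.
buggy = Submission("alice", 0.91, created_at=web_now)
# Fixed: the timestamp comes from the same clock that later filters the leaderboard.
fixed = Submission("alice", 0.91, created_at=db_now)

# When scoring finishes, the database clock has advanced by scoring_time.
db_at_recalc = db_now + scoring_time

missing = leaderboard([buggy], db_at_recalc)
present = leaderboard([fixed], db_at_recalc)
print(len(missing), len(present))  # prints: 0 1
```

&lt;p&gt;Stamped by the web server, the submission looks like it is from the future and is filtered out; stamped by the database’s own clock, it shows up.&lt;/p&gt;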

&lt;p&gt;In hindsight, it was a distributed systems rookie mistake. I had painfully rediscovered &lt;a href=&quot;https://en.wikipedia.org/wiki/Segal%27s_law&quot;&gt;Segal’s law&lt;/a&gt;: “A man with a watch knows what time it is. A man with two watches is never sure.”&lt;/p&gt;

&lt;p&gt;In addition to the simple solution we chose of always using the same physical clock, you can get around this problem by using fancier techniques like &lt;a href=&quot;https://en.wikipedia.org/wiki/Vector_clock&quot;&gt;vector clocks&lt;/a&gt; or TrueTime in Google’s &lt;a href=&quot;http://static.googleusercontent.com/media/research.google.com/en//archive/spanner-osdi2012.pdf&quot;&gt;Spanner database&lt;/a&gt; that uses GPS synchronized clocks to carefully keep track of the uncertainty of the current time in order to provide transactional consistency across the planet.&lt;/p&gt;

&lt;p&gt;As we increasingly write software that executes across multiple machines, it’s important to have &lt;em&gt;some&lt;/em&gt; strategy for handling clock drift (even if it’s less than a second). Because, if you don’t, you too might be bitten by bugs caused by data that’s coming back… from the future.&lt;/p&gt;

</description>
        <pubDate>Wed, 21 Oct 2015 16:29:00 +0000</pubDate>
        <link>http://www.moserware.com/2015/10/back-from-the-future-bugs.html</link>
        <guid isPermaLink="true">http://www.moserware.com/2015/10/back-from-the-future-bugs.html</guid>
        
        
      </item>
    
      <item>
        <title>Life, Death, and Splitting Secrets</title>
        <description>&lt;p&gt;(&lt;strong&gt;Summary&lt;/strong&gt;: I created &lt;a href=&quot;https://github.com/moserware/SecretSplitter&quot;&gt;a program&lt;/a&gt; to help back up important data like your master password in case something happens to you. By splitting your secret into pieces, it provides a circuit breaker against a single point of failure. I’m giving it away as a free open source program with the hope that others might find it useful in addressing this aspect of our lives. Feel free to &lt;a href=&quot;https://github.com/moserware/SecretSplitter/releases/latest&quot;&gt;use the program&lt;/a&gt; and follow along with just the screenshots below or read all sections of this post if you want more context.)&lt;/p&gt;

&lt;h2 id=&quot;background&quot;&gt;Background&lt;/h2&gt;

&lt;p&gt;I just couldn’t do it.&lt;/p&gt;

&lt;p&gt;&lt;img alt=&quot;Grandma and Jeff&quot; title=&quot;Grandma and Jeff&quot; src=&quot;/assets/life-death-and-splitting-secrets/Grandma_and_Jeff_320.jpg&quot; align=&quot;right&quot; style=&quot;border:0; margin: 0px 0px 15px 15px; display: inline&quot; /&gt;My grandma died at this time last year from a stroke. She was a great woman. I still miss her. In that emotional last week, I was reminded of great memories with her and the fragility of life. I was also reminded about important documents that I still didn’t have.&lt;/p&gt;

&lt;p&gt;When something happens to you, be it death or incapacitation, there are some important steps that need to occur that can be greatly assisted by legal documents. For example:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;An &lt;a href=&quot;http://en.wikipedia.org/wiki/Advance_health_care_directive&quot;&gt;advance health care directive&lt;/a&gt; (aka “Living Will”) specifies what actions should (or shouldn’t) be taken with regards to your healthcare if you’re no longer able to make decisions for yourself.&lt;/li&gt;
  &lt;li&gt;A &lt;a href=&quot;http://en.wikipedia.org/wiki/Power_of_attorney#Durable_power_of_attorney&quot;&gt;durable power of attorney&lt;/a&gt; allows you to designate someone to legally act as you if you become incapacitated.&lt;/li&gt;
  &lt;li&gt;A &lt;a href=&quot;http://en.wikipedia.org/wiki/Will_(law)&quot;&gt;last will and testament&lt;/a&gt; allows you to legally assign caregivers for &lt;a href=&quot;http://en.wikipedia.org/wiki/Minor_(law)&quot; title=&quot;Typically 18 and younger.&quot;&gt;minor children&lt;/a&gt; as well as designate where you’d like your possessions to go.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;My grandma had these and it helped reduce stress and anxiety in this difficult time. We knew what she would have wanted and these documents helped legally enforce that.&lt;/p&gt;

&lt;p&gt;I had assumed that these documents were expensive and time-consuming to create. Furthermore, as a guy in my 20’s, death still seems like &lt;a href=&quot;http://quotationsbook.com/quote/10024/&quot; title=&quot;“Death is a distant rumor to the young.” - Andy Rooney (1919 - 2011)&quot;&gt;a distant rumor&lt;/a&gt;. As a Christian, I’m &lt;a href=&quot;http://www.biblegateway.com/passage/?search=Philippians%201:21&amp;amp;version=ESV&quot;&gt;not overly concerned&lt;/a&gt; &lt;a href=&quot;http://www.biblegateway.com/passage/?search=1%20Corinthians%2015:54-57&amp;amp;version=ESV&quot;&gt;about death itself&lt;/a&gt;, but my grandma’s death reminded me that these documents are not really for me, but rather the people I’d leave behind. I knew that if something happened to me, I’d potentially be leaving behind a mess, and that concern of irresponsibility compelled me to investigate what I could do.&lt;/p&gt;

&lt;p&gt;It turns out that creating these documents is essentially a matter of filling out a form template. I &lt;a href=&quot;http://www.amazon.com/gp/product/B004DLCQZ4/ref=as_li_ss_tl?ie=UTF8&amp;amp;tag=moserware-20&amp;amp;linkCode=as2&amp;amp;camp=217145&amp;amp;creative=399369&amp;amp;creativeASIN=B004DLCQZ4&quot; title=&quot;I used Quicken Willmaker 2011 Premium edition. I liked the premium edition because it came with a lot of extra books that made for fun reading. The 2012 version will probably be available soon.&quot;&gt;bought a program&lt;/a&gt; that made it about as easy as preparing taxes online. In most cases, you just need disinterested third parties, such as friends or coworkers, to witness you signing them to make them fully legal. At most, you might have to get them notarized or filed in your county for a small fee.&lt;/p&gt;

&lt;p&gt;One of the steps involved in filling out the “Information for Caregivers and Survivors” document is to list “&lt;a href=&quot;http://www.nolo.com/legal-encyclopedia/help-executor-secured-places-passwords-29669.html&quot;&gt;Secured Places and Passwords&lt;/a&gt;.” It’s a helpful section that your &lt;a href=&quot;http://en.wikipedia.org/wiki/Executor&quot;&gt;executor&lt;/a&gt; can turn to if something happened to you in order to do things like unlock your cell phone or access your online accounts. Sure, your survivors might be able to use legal force to get access without it, but only after months of &lt;a href=&quot;https://mail.google.com/support/bin/answer.py?answer=14300&quot;&gt;sending official documentation&lt;/a&gt;. That’s a lot of hassle to put someone through. Also, it’s very likely that a lot of important things will be missed and no one would ever know they existed.&lt;/p&gt;

&lt;p&gt;It’s &lt;a href=&quot;http://research.microsoft.com/apps/pubs/?id=80436&quot; title=&quot;“The Rational Rejection of Security Advice by Users” provides some interesting counterpoints to security advice out there.&quot;&gt;probably rational&lt;/a&gt; to just write your passwords down and put them in a safe which your executor knows the location of and can access in a timely manner. Alternatively, you could pay for an attorney or a &lt;a href=&quot;http://mashable.com/2010/10/11/social-media-after-death/&quot;&gt;third-party service&lt;/a&gt; and leave your password list with them. However, this seemed like it would cause a maintenance problem, especially as I might add or update my passwords frequently. These options would also force me to trust someone I haven’t known for a long time. Most importantly, the thought of writing down my passwords on a piece of paper, even if it was in a relatively safe place, went against every fiber of my security being.&lt;/p&gt;

&lt;p&gt;I just couldn’t do it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;DISCLAIMER&lt;/strong&gt;: The above simple approaches are probably fine and have worked for a lot of people over the years. &lt;strong&gt;If you’re comfortable with these basic approaches, by all means use them and ignore this post&lt;/strong&gt;. These simpler approaches have fewer moving parts and are easy to understand. However, if you want a little more security, or need to liven up this process with a little spy novel-esque fun, read on.&lt;/p&gt;

&lt;h2 id=&quot;the-modern-password--encryption-problem&quot;&gt;The Modern Password &amp;amp; Encryption Problem&lt;/h2&gt;

&lt;p&gt;As an online citizen, you don’t want to be that person. You know, the one whose password was so &lt;a href=&quot;http://blogs.wsj.com/digits/2010/12/13/the-top-50-gawker-media-passwords/&quot; title=&quot;If nothing else, promise me that none of your passwords are on this list!&quot;&gt;easy to guess&lt;/a&gt; that his email account was broken into and who “&lt;a href=&quot;http://www.nbclosangeles.com/news/tech/Email-Scams-83600577.html&quot;&gt;wrote&lt;/a&gt;” to you saying that he decided to go to Europe on a whim this past weekend but now needs you to wire him money right now and he’ll explain everything later: &lt;em&gt;that&lt;/em&gt; guy.&lt;/p&gt;

&lt;p&gt;You’ve learned that passwords like “thunder”, “thunder56”, and even “L0u|&amp;gt;Thund3r” are terrible because they’re &lt;a href=&quot;http://www.wired.com/politics/security/commentary/securitymatters/2007/01/72458?currentPage=all&quot; title=&quot;Password recovery tools are pretty good these days.&quot;&gt;easily guessed&lt;/a&gt;. You now know that the most important aspect of a password is its &lt;a href=&quot;http://xkcd.com/936/&quot; title=&quot;“correct horse battery staple” is a start, but character variation and padding help a lot&quot;&gt;length&lt;/a&gt; combined with &lt;a href=&quot;https://www.grc.com/haystack.htm&quot; title=&quot;Steve Gibson’s Password Haystacks page is worth at least a quick glance.&quot;&gt;basic padding and character variation&lt;/a&gt; such as “/* Thunder is coming! */”, “I hear &lt;em&gt;thunder&lt;/em&gt;!”, or “1.big.BOOM@thunder.mil”.&lt;/p&gt;

&lt;p&gt;In fact, you’re probably clever enough that you don’t create or remember most of your passwords anymore. You use a &lt;a href=&quot;http://en.wikipedia.org/wiki/Password_manager&quot;&gt;password manager&lt;/a&gt; like &lt;a href=&quot;https://lastpass.com/&quot;&gt;LastPass&lt;/a&gt; or &lt;a href=&quot;http://keepass.info/&quot;&gt;KeePass&lt;/a&gt; to automatically generate and store unique and completely random passwords for all of your accounts. This has simplified your life so that you only have to remember your “master password” that will get you into where you keep all the rest of your usernames and passwords.&lt;/p&gt;

&lt;p&gt;&lt;img alt=&quot;Skeleton Key&quot; height=&quot;320&quot; src=&quot;/assets/life-death-and-splitting-secrets/450px-Llave_bronce%5b1%5d_320.jpg&quot; align=&quot;left&quot; style=&quot;border:0; margin: 0px 15px 15px 0px; display: inline&quot; width=&quot;240&quot; /&gt;&lt;/p&gt;

&lt;p&gt;You also understand that your email account credentials are a “&lt;a href=&quot;http://www.codinghorror.com/blog/2008/06/please-give-us-your-email-password.html&quot; title=&quot;It was especially sad in the Web’s early days when so many sites asked for your email login to effectively spam your contacts. It’s just inexcusable that some sites still do today.&quot;&gt;skeleton key&lt;/a&gt;” for almost everything else due to the widespread use of simple password reset emails. For this very reason, you probably realize that it’s critical to &lt;a href=&quot;http://googleblog.blogspot.com/2011/06/ensuring-your-information-is-safe.html&quot; title=&quot;If you do use Gmail, really consider enabling this for your own safety and to prevent yourself from being *that* guy.&quot;&gt;protect your email login with “two-factor” authentication&lt;/a&gt;. That is, your email account should at least be protected by:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Something you know (your password) &lt;em&gt;and&lt;/em&gt;&lt;/li&gt;
  &lt;li&gt;Something you have (your cellphone), that creates or receives a one-time use code when you want to login.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;On top of all of this, you try your best to follow the trusty advice that your passwords should be ones that nobody could guess and you never ever &lt;a href=&quot;http://www.schneier.com/blog/archives/2005/06/write_down_your.html&quot; title=&quot;Actually, it’s probably reasonable to write them down and keep them in your wallet&quot;&gt;write them&lt;/a&gt; &lt;a href=&quot;http://blog.jgc.org/2010/12/write-your-passwords-down.html&quot;&gt;down&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;But what if something happens to you? If you’ve done everything “right,” then your master password and all your second factor details go with you.&lt;/p&gt;

&lt;p&gt;And then there are your encrypted files. Maybe you’re keeping a &lt;a href=&quot;http://www.youtube.com/watch?feature=player_embedded&amp;amp;v=R4vkVHijdQk&quot; title=&quot;You could use an encrypted journal or a separate email account. I wonder if “dear.sophie.lee@gmail.com” had two-factor authentication enabled on it. I mean, what happens if she writes back too early?&quot;&gt;private journal&lt;/a&gt; for your children to read when they grow up. Perhaps you’re living in some spy novel life where you’re worried that people will take you out to prevent something you know from being discovered. Wherever you fall on the spectrum, what do you do with such encrypted data?&lt;/p&gt;

&lt;p&gt;Modern encryption is a bit scary because it’s so good. If you use a decent encryption program with a good password/key, then it’s very likely that no one, &lt;a href=&quot;http://www.extremetech.com/computing/105931-full-disk-encryption-is-too-good-says-us-intelligence-agency&quot;&gt;not even a major government&lt;/a&gt;, could decrypt the file even after hundreds of years. Encryption is great for keeping prying eyes out, but it could sadden survivors that you want to have access to your data. The thought of something being lost forever might make you almost yearn for the days when you just put everything into a good safe that’s rated by how many &lt;a href=&quot;http://en.wikipedia.org/wiki/Safe#Class_TL-15&quot; title=&quot;For example, a TL-15 safe will resist abuse for about 15 minutes from people who know what they’re doing.&quot;&gt;minutes&lt;/a&gt; it might slow somebody down.&lt;/p&gt;

&lt;p&gt;On a much lighter note, the “something” that happens to you doesn’t have to be so grim. Maybe you had a really relaxing three week vacation and now you can’t remember the exact keyboard combination of your password. Given that our brains have to &lt;a href=&quot;http://www.radiolab.org/2007/jun/07/eternal-sunshine-of-the-spotless-rat/&quot; title=&quot;Start listening at 16:45 to find out more about this interesting idea.&quot;&gt;recreate memories each time we recall something&lt;/a&gt;, it’s possible that you could stress yourself out so much trying to remember your password that you effectively “forget” it. What do you do then?&lt;/p&gt;

&lt;p&gt;When you put all your eggs into a password manager basket, you really want to &lt;a href=&quot;http://herbison.com/herbison/broken_eggs_watch.html&quot; title=&quot;Whether it was Carnegie or Twain, the phrase “Put all your eggs in one basket and --- WATCH THAT BASKET!” is some good advice.&quot;&gt;watch that basket&lt;/a&gt;. Fortunately, creating a basic plan isn’t that hard.&lt;/p&gt;

&lt;h2 id=&quot;a-proposed-solution&quot;&gt;A Proposed Solution&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;http://en.wikipedia.org/wiki/Permissive_Action_Link&quot;&gt;&lt;img alt=&quot;Example nuclear launch keys&quot; height=&quot;320&quot; src=&quot;/assets/life-death-and-splitting-secrets/Nuclear_missile_launch_keys%5b1%5d_320.jpg&quot; align=&quot;right&quot; style=&quot;border:0; margin: 0px 0px 15px 15px; display: inline&quot; width=&quot;212&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let’s borrow an &lt;a href=&quot;http://www.biblegateway.com/passage/?search=Numbers%2035:30&amp;amp;version=ESV&quot; title=&quot;For example, the 2-3 witnesses concept appears several times in the Bible.&quot;&gt;ancient&lt;/a&gt; yet incredibly useful idea: if it’s really important to get your facts right about something, be sure to have at least two or three witnesses. This is especially true concerning matters of life and death but it also comes up when protecting really valuable things.&lt;/p&gt;

&lt;p&gt;By the 20th century, this “&lt;a href=&quot;http://en.wikipedia.org/wiki/Two-man_rule&quot;&gt;two-man rule&lt;/a&gt;” was implemented in hardware to protect nuclear missiles from being launched by a lone rogue person without proper authorization. The main vault at &lt;a href=&quot;http://en.wikipedia.org/wiki/United_States_Bullion_Depository#Construction_and_security&quot; title=&quot;Also known as the “United States Bullion Depository”&quot;&gt;Fort Knox&lt;/a&gt; is locked by multiple combinations such that no single person is entrusted with all of them. On the Internet, the master key for protecting the new secure domain name system (&lt;a href=&quot;http://en.wikipedia.org/wiki/Domain_Name_System_Security_Extensions&quot;&gt;DNSSEC&lt;/a&gt;) &lt;a href=&quot;http://www.schneier.com/blog/archives/2010/07/dnssec_root_key.html&quot;&gt;is split among 7 people from 6 different countries&lt;/a&gt; such that at least 5 people are needed to reconstruct it in the event of an Internet catastrophe.&lt;/p&gt;

&lt;p&gt;If this idea is good enough for protecting nuclear weapons, the Fort Knox vault, and one of the most critical security aspects on the Internet, it’s probably good enough for your password list. Besides, it can make a somewhat uncomfortable process a little more fun.&lt;/p&gt;

&lt;p&gt;Let’s start with a simple example. Let’s say that your master password is “1.big.BOOM@thunder.mil”. You could just write it out on a piece of paper and then use scissors to cut it up. This would work if you wanted to split it among 2 people, but it has some notable downsides:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;It doesn’t work if you want redundancy (i.e. any 2 of 3 people being able to reconstruct it)&lt;/li&gt;
  &lt;li&gt;Each piece would tell you something about the password and thus has value on its own. Ideally, we’d like the pieces to be worthless unless a threshold of people came together.&lt;/li&gt;
  &lt;li&gt;It doesn’t really work for more complicated scenarios like requiring 5 of 7 people.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Fortunately, some clever math can fix these issues and give you this ability for free. I created a program called &lt;a href=&quot;https://github.com/moserware/SecretSplitter&quot;&gt;SecretSplitter&lt;/a&gt; to automate all of this to hopefully make the whole process painless.&lt;/p&gt;
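
&lt;p&gt;To give a feel for the kind of math involved, here is a toy sketch of one well-known threshold scheme, Shamir’s secret sharing, in Python. It is an illustration under stated assumptions, not SecretSplitter’s implementation: the &lt;code&gt;split&lt;/code&gt; and &lt;code&gt;recover&lt;/code&gt; functions, the choice of prime, and the share format are all made up for this example and will not interoperate with the program. The idea is to hide the secret as the constant term of a random polynomial and hand out points on that polynomial; with a threshold of 2, the polynomial is a line, so any two points determine it while a single point is consistent with every possible secret.&lt;/p&gt;

```python
import secrets

# Assumption for this sketch: a Mersenne prime far larger than the demo secret.
PRIME = 2**521 - 1


def split(secret, threshold, shares):
    """Split an integer secret into `shares` points; any `threshold` recover it."""
    # Random polynomial of degree threshold-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    points = []
    for x in range(1, shares + 1):
        y = sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME
        points.append((x, y))
    return points


def recover(points):
    """Lagrange-interpolate the polynomial at x = 0 to get the secret back."""
    secret = 0
    for j, (xj, yj) in enumerate(points):
        num, den = 1, 1
        for m, (xm, _) in enumerate(points):
            if m != j:
                num = (num * -xm) % PRIME
                den = (den * (xj - xm)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yj * num * pow(den, -1, PRIME)) % PRIME
    return secret


password = "1.big.BOOM@thunder.mil"
as_int = int.from_bytes(password.encode("utf-8"), "big")
pieces = split(as_int, threshold=2, shares=3)

# Any two of the three pieces are enough to rebuild the password.
got = recover([pieces[0], pieces[2]])
recovered = got.to_bytes((got.bit_length() + 7) // 8, "big").decode("utf-8")
print(recovered)
```

&lt;p&gt;Raising &lt;code&gt;threshold&lt;/code&gt; and &lt;code&gt;shares&lt;/code&gt; to 5 and 7 gives the “5 of 7” arrangement with no other changes, which is exactly what cutting up a piece of paper can’t do.&lt;/p&gt;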

&lt;p&gt;Let’s say you want to require at least 2 witnesses to agree that something happened to you before your secret is available. You also want to build in redundancy such that &lt;em&gt;any&lt;/em&gt; pair of people can find out your password. For this scenario, you can keep the default settings and press the “split” button:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/assets/life-death-and-splitting-secrets/SplitMessageSpecifyMessageThresholdAndShares.png&quot; title=&quot;Specify the message “1.big.BOOM@thunder.mil”&quot;&gt;&lt;img src=&quot;/assets/life-death-and-splitting-secrets/SplitMessageSpecifyMessageThresholdAndShares_576.png&quot; alt=&quot;Specifying message&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You’ll get this list of split pieces:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/assets/life-death-and-splitting-secrets/SplitMessageShares.png&quot; title=&quot;List of message shares&quot;&gt;&lt;img src=&quot;/assets/life-death-and-splitting-secrets/SplitMessageShares_576.png&quot; alt=&quot;List of message shares&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Notice that each piece is twice as long as your original message (about twice the size of a package tracking number). This is by design.&lt;/p&gt;

&lt;p&gt;Now comes the hard part: you have to select three people you trust. You should have high confidence in anyone you’d entrust with a secret piece. It’s easy to get caught up in &lt;a href=&quot;http://xkcd.com/538/&quot;&gt;gee-whiz cryptography&lt;/a&gt; and miss fundamentals: you ultimately have to &lt;a href=&quot;http://cm.bell-labs.com/who/ken/trust.html&quot; title=&quot;“Reflections on Trusting Trust” is a fascinating read about the fundamentals of security.&quot;&gt;trust something&lt;/a&gt;, especially with important matters. SecretSplitter provides a trust circuit breaker just in case (because even well-meaning people can &lt;a href=&quot;http://abcnews.go.com/WN/president-bill-clinton-lost-nuclear-codes-office-book/story?id=11930878&quot; title=&quot;Like the nuclear biscuit&quot;&gt;lose&lt;/a&gt; &lt;a href=&quot;http://www.theatlantic.com/politics/archive/2010/10/why-clintons-losing-the-nuclear-biscuit-was-really-really-bad/65009/&quot; title=&quot;Thankfully it wasn’t needed&quot;&gt;important things&lt;/a&gt;). The splitting process adds a bit of complexity, but so do real circuit breakers. If you trust no one, then you can’t have anyone help you if something happens.&lt;/p&gt;

&lt;p&gt;For demonstration purposes, let’s say you trust 3 people.&lt;/p&gt;

&lt;p&gt;You now have to distribute these secret pieces. You could do all sorts of clever things like &lt;a href=&quot;http://www.hulu.com/watch/24493/back-to-the-future-part-ii-letter-from-doc&quot; title=&quot;Like Doc did to Marty in “Back to the Future Part II”&quot;&gt;send letters to people that will be delivered far in the future&lt;/a&gt; or read them over the phone. However, distributing them in person is a pretty good option:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/assets/life-death-and-splitting-secrets/CreateShareEnvelope.JPG&quot;&gt;&lt;img src=&quot;/assets/life-death-and-splitting-secrets/CreateShareEnvelope_576.JPG&quot; alt=&quot;Creating an envelope with a share&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It can make the upcoming holiday table discussions even more fun:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/assets/life-death-and-splitting-secrets/ShareHandoff.JPG&quot;&gt;&lt;img src=&quot;/assets/life-death-and-splitting-secrets/ShareHandoff_576.JPG&quot; alt=&quot;Handing over the envelope with the secret piece&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let’s pretend that something happened to you. Two of the three family members that you gave pieces to would come together and agree that “something” indeed has happened to you. What happens now?&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/assets/life-death-and-splitting-secrets/TwoEnvelopesOpened.JPG&quot;&gt;&lt;img src=&quot;/assets/life-death-and-splitting-secrets/TwoEnvelopesOpened_576.JPG&quot; alt=&quot;Two opened envelopes with secret shares&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Well, either you included a note with each secret piece or you emailed them previously with instructions that they’d just need to download and run this small program. The pair comes together at a laptop and they each type their piece in quickly and then press “Recover”:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/assets/life-death-and-splitting-secrets/RecoverMessageWithTypo.png&quot;&gt;&lt;img src=&quot;/assets/life-death-and-splitting-secrets/RecoverMessageWithTypo_576.png&quot; alt=&quot;Typing in secret shares with a typo&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Oops… they typed so quickly that they mixed up one of the digits. The program tells them where to look:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/assets/life-death-and-splitting-secrets/RecoverMessageTypoWarning.png&quot;&gt;&lt;img src=&quot;/assets/life-death-and-splitting-secrets/RecoverMessageTypoWarning_576.png&quot; alt=&quot;Warning about typo&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;They fix the typo and press recover again:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/assets/life-death-and-splitting-secrets/RecoverMessageTypoFixed.png&quot;&gt;&lt;img src=&quot;/assets/life-death-and-splitting-secrets/RecoverMessageTypoFixed_576.png&quot; alt=&quot;Fixed the typo&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And immediately they see:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/assets/life-death-and-splitting-secrets/RecoveredMessage.png&quot;&gt;&lt;img src=&quot;/assets/life-death-and-splitting-secrets/RecoveredMessage_576.png&quot; alt=&quot;Recovered message&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Password recovered! They could now use this master password to log into your password manager where you’ve stored further details.&lt;/p&gt;

&lt;p&gt;This “message” approach is useful if you have a small amount of data such as a password that you could write on a piece of paper. One downside is that each piece is twice the size of the text message. If your message becomes much larger then it will no longer be feasible to type it in manually.&lt;/p&gt;

&lt;p&gt;One alternative approach is to bundle together all of your important files into a zip file:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/assets/life-death-and-splitting-secrets/CompressedFileExample.png&quot;&gt;&lt;img src=&quot;/assets/life-death-and-splitting-secrets/CompressedFileExample_576.png&quot; alt=&quot;Example of a compressed file contents&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To split this file, you’d click the “Create” tab and then find the file, set the number of shares and click “Save”:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/assets/life-death-and-splitting-secrets/SplitFileSpecifyFileAndShares.png&quot;&gt;&lt;img src=&quot;/assets/life-death-and-splitting-secrets/SplitFileSpecifyFileAndShares_576.png&quot; alt=&quot;Splitting up a file&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You’ll then be told:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/assets/life-death-and-splitting-secrets/SplitFileSaveMessageBox.png&quot;&gt;&lt;img src=&quot;/assets/life-death-and-splitting-secrets/SplitFileSaveMessageBox_576.png&quot; alt=&quot;MessageBox asking you to save the file&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And then you pick where to save the encrypted file:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/assets/life-death-and-splitting-secrets/SplitFileSaveDialog.png&quot;&gt;&lt;img src=&quot;/assets/life-death-and-splitting-secrets/SplitFileSaveDialog_576.png&quot; alt=&quot;Save file dialog&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Finally, you’ll see this screen:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/assets/life-death-and-splitting-secrets/SplitFileShares.png&quot;&gt;&lt;img src=&quot;/assets/life-death-and-splitting-secrets/SplitFileShares_576.png&quot; alt=&quot;Split file pieces&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This creates a slightly more complicated scenario because you now have 2 things to share: the secret pieces and the encrypted file with all your data. The encrypted file doesn’t have to be secret at all. You can safely email it to people that have a secret piece:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/assets/life-death-and-splitting-secrets/SplitFileEmail.png&quot;&gt;&lt;img src=&quot;/assets/life-death-and-splitting-secrets/SplitFileEmail_576.png&quot; alt=&quot;Sending the fun email&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now, if something happens to you, they’d run the program, and type in two shares and press “Recover”:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/assets/life-death-and-splitting-secrets/RecoverFileShares.png&quot;&gt;&lt;img src=&quot;/assets/life-death-and-splitting-secrets/RecoverFileShares_576.png&quot; alt=&quot;Entering in file shares&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It’ll then tell them:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/assets/life-death-and-splitting-secrets/RecoverFileSpecifyEncryptedFileMessageBox.png&quot;&gt;&lt;img src=&quot;/assets/life-death-and-splitting-secrets/RecoverFileSpecifyEncryptedFileMessageBox_576.png&quot; alt=&quot;Specify encrypted file MessageBox&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;They’d then go to their email and search for the email from you that includes your encrypted file:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/assets/life-death-and-splitting-secrets/RecoverEmailSearch.png&quot;&gt;&lt;img src=&quot;/assets/life-death-and-splitting-secrets/RecoverEmailSearch_576.png&quot; alt=&quot;Searching email&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then they’d find the single message (or the latest one if you sent out updates) and download your encrypted attachment:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/assets/life-death-and-splitting-secrets/RecoverEmailFound.png&quot;&gt;&lt;img src=&quot;/assets/life-death-and-splitting-secrets/RecoverEmailFound_576.png&quot; alt=&quot;Found email&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;They’d then go back to the program to open it up:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/assets/life-death-and-splitting-secrets/RecoverFileOpen.png&quot;&gt;&lt;img src=&quot;/assets/life-death-and-splitting-secrets/RecoverFileOpen_576.png&quot; alt=&quot;Opening the file&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;and then they’d see a message to be careful where they saved it:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/assets/life-death-and-splitting-secrets/RecoverFileSafetyWarning.png&quot;&gt;&lt;img src=&quot;/assets/life-death-and-splitting-secrets/RecoverFileSafetyWarning_576.png&quot; alt=&quot;Will you keep the data safe?&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;and then they’d save it:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/assets/life-death-and-splitting-secrets/SaveDecryptedFile.png&quot;&gt;&lt;img src=&quot;/assets/life-death-and-splitting-secrets/SaveDecryptedFile_576.png&quot; alt=&quot;Save decrypted&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;They’d then be asked if they want to open the decrypted file, to which they’d say “Yes”:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/assets/life-death-and-splitting-secrets/RecoverFileOpenDecryptedFileMessageBox.png&quot;&gt;&lt;img src=&quot;/assets/life-death-and-splitting-secrets/RecoverFileOpenDecryptedFileMessageBox_576.png&quot; alt=&quot;Open decrypted file?&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now they can see everything:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/assets/life-death-and-splitting-secrets/CompressedFileExample.png&quot;&gt;&lt;img src=&quot;/assets/life-death-and-splitting-secrets/CompressedFileExample_576.png&quot; alt=&quot;Example of a compressed file contents&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It might sound complicated, but if you’re familiar with the process, it might only take a minute. If you’re not tech savvy and have never done it before and type slowly, it might take 30 minutes. In either case, it’s faster than having to drive to your home and search around for a folder and it contains everything you wanted people to know (especially when things are time sensitive).&lt;/p&gt;

&lt;p&gt;That’s it! Your master password and important data are now backed up. The risk is distributed: if any one piece is compromised (i.e. gets lost or misplaced), you can have everyone else destroy their secret piece and nothing will be leaked. Also, the program has an advanced feature that lets you save the file encryption key. This feature allows you to send out updated encrypted files that can be decrypted with the pieces you’ve already established in person.&lt;/p&gt;

&lt;p&gt;SecretSplitter implements a “(t,n) &lt;a href=&quot;http://en.wikipedia.org/wiki/Threshold_cryptosystem&quot;&gt;threshold cryptosystem&lt;/a&gt;” which can be thought of as a mathematical generalization of the physical two-man rule. The idea is that you split up a secret into pieces (called “shares”) and require at least a threshold of “t” shares to be present in order to recover the secret. If you have fewer than “t” shares, you gain no information about the secret. Whatever threshold you use, it’s really important that each “shareholder” know the threshold number of shares.&lt;/p&gt;

&lt;p&gt;You can be quite creative in setting the threshold and distributing shares. For example, you can trust your spouse more by giving her more shares than anyone else. The key idea is that &lt;strong&gt;a share is an atomic unit of trust&lt;/strong&gt;. You can give more than one unit of trust to a person, but you can never give less.&lt;/p&gt;

&lt;p&gt;Another important practical concern is that you should consider adding redundancy to any threshold system. This is easily achieved by creating more shares than the threshold number. The reason is that if you’re going out of your way to use a threshold system, then you probably want to make sure you have a backup plan in case one or more of the shares are unavailable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;IMPORTANT LEGAL NOTE&lt;/strong&gt;: It’s tempting to keep everything, including the important directives and your will, in only electronic form (even when they’re signed). Unfortunately, most states require the original signed documents to be considered legal and most courts will not accept a copy. For this reason, you should still have the paper originals somewhere such as a fireproof safe. However, be careful where you put the originals: although it might sound convenient to put them in a bank safety deposit box, there’s usually a rather long waiting period before a bank can legally provide access to your box to a survivor, so don’t put any time sensitive items there. My recommendation at the current time would be to include copies of the signed originals in your encrypted file and also include detailed instructions on where the originals are located and how to access them.&lt;/p&gt;

&lt;h2 id=&quot;how-it-works&quot;&gt;How It Works&lt;/h2&gt;

&lt;p&gt;Given the sensitive nature of the data being protected, I wanted to make sure I understood every part of the mathematics involved and literally every bit of the encrypted file. You’re more than welcome to just use the program without fully understanding the details, but I encourage people to verify my math and code if you’re able and curious.&lt;/p&gt;

&lt;p&gt;To get started, recall that computers work with &lt;a href=&quot;http://en.wikipedia.org/wiki/Bit&quot;&gt;bits&lt;/a&gt;: 1’s and 0’s that can represent anything. For example, the &lt;a href=&quot;http://en.wikipedia.org/wiki/UTF-8&quot;&gt;most popular way of encoding text&lt;/a&gt; will encode “thunder” in binary as&lt;/p&gt;

&lt;p&gt;01110100 01101000 01110101 01101110 01100100 01100101 01110010&lt;/p&gt;

&lt;p&gt;We can write this more efficiently using &lt;a href=&quot;http://en.wikipedia.org/wiki/Hexadecimal&quot;&gt;hexadecimal&lt;/a&gt; notation as: 74 68 75 6E 64 65 72. We can also treat this entire sequence of bits as a single 55 bit number whose decimal representation just happens to be 32,765,950,870,971,762. In fact, &lt;em&gt;any&lt;/em&gt; piece of data can be converted to a single number.&lt;/p&gt;
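
&lt;p&gt;As a quick check of the numbers above, here is a minimal Python sketch of the same conversion. The variable names are only illustrative; the byte-to-integer step uses Python’s built-in big-endian conversion.&lt;/p&gt;

```python
text = "thunder"
data = text.encode("utf-8")          # the 7 bytes that represent the text

# The same bytes shown as binary and as hexadecimal.
print(" ".join(format(b, "08b") for b in data))
print(" ".join(format(b, "02X") for b in data))   # 74 68 75 6E 64 65 72

# Treat the whole byte sequence as one big-endian integer.
number = int.from_bytes(data, "big")
print(number)                # 32765950870971762
print(number.bit_length())   # 55

# The conversion is reversible, so no information is lost.
byte_count = (number.bit_length() + 7) // 8
back = number.to_bytes(byte_count, "big").decode("utf-8")
print(back)                  # thunder
```

&lt;p&gt;The number is 55 bits rather than 56 because the leading bit of the first byte is zero.&lt;/p&gt;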

&lt;p&gt;Now that we have a single number, let’s go back to your algebra class and remember the equation for a &lt;a href=&quot;http://en.wikipedia.org/wiki/Line_(geometry)&quot;&gt;line&lt;/a&gt;:  y=mx+b.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/assets/life-death-and-splitting-secrets/LineShowingIntercept.png&quot;&gt;&lt;img src=&quot;/assets/life-death-and-splitting-secrets/LineShowingIntercept_576.png&quot; alt=&quot;Line showing intercept&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this equation, “b” is the “&lt;a href=&quot;http://en.wikipedia.org/wiki/Y-intercept&quot;&gt;y-intercept&lt;/a&gt;”, which is where the line crosses the y-axis. The “m” value is the &lt;a href=&quot;http://en.wikipedia.org/wiki/Slope&quot;&gt;slope&lt;/a&gt; and represents how steep the line is (i.e. its “&lt;a href=&quot;http://en.wikipedia.org/wiki/Grade_(slope)&quot;&gt;grade&lt;/a&gt;” if it were a hill).&lt;/p&gt;

&lt;p&gt;This is all the core math you need to understand splitting secrets. In our particular case, our secret message is always represented by the y-intercept (i.e. “b” in y=mx+b). We want to create a line that will go through this point. Recall that a line could go through this point at any angle. The slope (i.e. “m” in y=mx+b) will direct us where it goes. For things to work securely, the slope must be a random number.&lt;/p&gt;

&lt;p&gt;Although we use large numbers in practice for security reasons, let’s keep it simple here. Let’s say our secret number is “7” and our random slope is “3.” These choices generate this line:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/assets/life-death-and-splitting-secrets/Line3xp7.png&quot;&gt;&lt;img src=&quot;/assets/life-death-and-splitting-secrets/Line3xp7_576.png&quot; alt=&quot;y=3x+7&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With this equation, we can generate an infinite number of points on the line. For example, we can pick the first three points: (1, 10), (2, 13), and (3, 16):&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/assets/life-death-and-splitting-secrets/Line3points.png&quot;&gt;&lt;img src=&quot;/assets/life-death-and-splitting-secrets/Line3points_576.png&quot; alt=&quot;3 points&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can see that if you had any two of these points, you could find the y-intercept.&lt;/p&gt;

&lt;p&gt;It’s critical to realize that having just one of these points gives us no useful information about the line. However, having any other point on the line would allow us to use a ruler and draw a straight line to the y-intercept and thus reveal the secret (we could also work it out algebraically). Each point represents a secret piece or “share” and has a unique “x” and “y” value.&lt;/p&gt;
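
&lt;p&gt;The “work it out algebraically” route can be sketched in a few lines of Python. This uses the toy numbers from above (secret 7, slope 3) and ordinary fractions; the function names are made up for illustration and this is not how SecretSplitter itself does the arithmetic.&lt;/p&gt;

```python
from fractions import Fraction

SECRET = 7   # the y-intercept "b" in y=mx+b
SLOPE = 3    # the random slope "m"


def make_share(x):
    """Return the point on the line at the given x value."""
    return (x, SLOPE * x + SECRET)


def recover_intercept(first, second):
    """Find where the line through two points crosses the y-axis."""
    (x1, y1), (x2, y2) = first, second
    slope = Fraction(y2 - y1, x2 - x1)
    return y1 - slope * x1


shares = [make_share(x) for x in (1, 2, 3)]
print(shares)                                    # [(1, 10), (2, 13), (3, 16)]
print(recover_intercept(shares[0], shares[1]))   # 7
print(recover_intercept(shares[0], shares[2]))   # 7
print(recover_intercept(shares[1], shares[2]))   # 7
```

&lt;p&gt;Any pair of shares gives the same answer, which is why handing out a third point adds redundancy without lowering the threshold.&lt;/p&gt;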

&lt;p&gt;The mathematically fascinating part about this idea is that a line is just a simple &lt;a href=&quot;http://en.wikipedia.org/wiki/Polynomial&quot;&gt;polynomial&lt;/a&gt; (curve) and this technique works for polynomials of arbitrarily large &lt;a href=&quot;http://en.wikipedia.org/wiki/Polynomial#Degree&quot;&gt;degrees&lt;/a&gt;. For example, a second degree polynomial is a &lt;a href=&quot;http://en.wikipedia.org/wiki/Parabola&quot;&gt;parabola&lt;/a&gt; that requires 3 unique points to completely define it (one more than a line). Its equation is of the form y=ax^2 + bx + c. In our case “c” is the y-intercept and “a” and “b” are random as in y = 2x^2 + 3x + 7:&lt;/p&gt;

&lt;p&gt;Given this equation, we can generate as many “shares” as we’d like: (1,12), (2,21), (3,34), (4,51), etc.&lt;/p&gt;

&lt;p&gt;Keep in mind that a parabola requires three points to uniquely define it. If you just had two points, as in (1,12) and (2,21), you could create an infinite number of parabolas going through these points and thus have infinite choices for what the y-intercept (i.e. your secret) could be:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/assets/life-death-and-splitting-secrets/Parabola6Curves.png&quot;&gt;&lt;img src=&quot;/assets/life-death-and-splitting-secrets/Parabola6Curves_576.png&quot; alt=&quot;6 parabolas going through the same two points&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;However, a third point will define the parabola and its y-intercept exactly:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/assets/life-death-and-splitting-secrets/ParabolaSingleCurve.png&quot;&gt;&lt;img src=&quot;/assets/life-death-and-splitting-secrets/ParabolaSingleCurve_576.png&quot; alt=&quot;Unique parabola&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
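
&lt;p&gt;Here is a small Python sketch of the same idea using the example parabola y = 2x^2 + 3x + 7. It finds the y-intercept with Lagrange interpolation over ordinary fractions, and then shows that with only two real shares every guess at a third point yields a different “secret.” The function name and the guessed values are illustrative assumptions.&lt;/p&gt;

```python
from fractions import Fraction


def intercept(points):
    """Evaluate the unique polynomial through the points at x = 0."""
    total = Fraction(0)
    for i, (xi, yi) in enumerate(points):
        weight = Fraction(1)
        for j, (xj, _) in enumerate(points):
            if i != j:
                weight *= Fraction(-xj, xi - xj)
        total += yi * weight
    return total


shares = [(x, 2 * x * x + 3 * x + 7) for x in (1, 2, 3, 4)]
print(shares)                  # [(1, 12), (2, 21), (3, 34), (4, 51)]
print(intercept(shares[:3]))   # 7
print(intercept(shares[1:]))   # 7

# With only (1, 12) and (2, 21), each guess for a third point fits a
# different parabola and therefore a different y-intercept.
for guess in (30, 34, 40):
    print(guess, intercept([(1, 12), (2, 21), (3, guess)]))
```

&lt;p&gt;The three guesses produce intercepts of 3, 7 and 13, so two shares alone do not narrow down the secret.&lt;/p&gt;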

&lt;p&gt;You’ve just learned that splitting a secret that requires three people is just a matter of creating a parabola. Requiring more people is just a matter of creating a higher-degree polynomial such as a &lt;a href=&quot;http://en.wikipedia.org/wiki/Cubic_function&quot;&gt;cubic&lt;/a&gt; or &lt;a href=&quot;http://en.wikipedia.org/wiki/Quartic_function&quot;&gt;quartic&lt;/a&gt; polynomial. If you understand this basic idea, the rest is just details:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Instead of using numbers, we translate the data to a big polynomial &lt;a href=&quot;http://en.wikipedia.org/wiki/GF(2)&quot;&gt;with binary coefficients&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Instead of using middle school algebra, we use a “&lt;a href=&quot;http://en.wikipedia.org/wiki/Finite_field&quot;&gt;finite field&lt;/a&gt;.” This helps keep results about the same size as the input and adds some security.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Don’t be intimidated by these changes. The core ideas are the same as the basic case. The only noticeable difference is that you have to think of operations like multiplication and division in a more abstract way. For details, check out my source code’s use of &lt;a href=&quot;https://github.com/moserware/SecretSplitter/blob/1b54b72a87d4bdcc5c84b12b36f17fca382d551d/SecretSplitter/Algebra/FiniteFieldPolynomial.cs#L40&quot;&gt;Horner’s scheme&lt;/a&gt; for evaluating polynomials, &lt;a href=&quot;https://github.com/moserware/SecretSplitter/blob/1b54b72a87d4bdcc5c84b12b36f17fca382d551d/SecretSplitter/Algebra/FiniteFieldPolynomial.cs#L63&quot;&gt;peasant multiplication&lt;/a&gt;, &lt;a href=&quot;https://github.com/moserware/SecretSplitter/blob/1b54b72a87d4bdcc5c84b12b36f17fca382d551d/SecretSplitter/Algebra/IrreduciblePolynomial.cs#L12&quot;&gt;irreducible polynomials&lt;/a&gt; &lt;a href=&quot;http://math.stackexchange.com/questions/14787/finding-irreducible-polynomials-over-gf2-with-the-fewest-terms&quot;&gt;with the fewest terms&lt;/a&gt;, &lt;a href=&quot;https://github.com/moserware/SecretSplitter/blob/1b54b72a87d4bdcc5c84b12b36f17fca382d551d/SecretSplitter/Algebra/LagrangeInterpolator.cs#L22&quot;&gt;Lagrange polynomial interpolation&lt;/a&gt; to find the y-intercept, and using &lt;a href=&quot;https://github.com/moserware/SecretSplitter/blob/1b54b72a87d4bdcc5c84b12b36f17fca382d551d/SecretSplitter/Algebra/FiniteFieldPolynomial.cs#L106&quot;&gt;Euclidean inverses&lt;/a&gt; for division.&lt;/p&gt;
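
&lt;p&gt;To make the finite field version less abstract, here is a toy Python sketch that splits a single byte. It is not SecretSplitter’s code: it assumes the small field GF(2^8) with the modulus x^8 + x^4 + x^3 + x + 1 (SecretSplitter sizes its field to the secret), it finds inverses by repeated multiplication rather than the Euclidean method, and all function names are made up for illustration.&lt;/p&gt;

```python
import secrets

MODULUS = 0x11B      # x^8 + x^4 + x^3 + x + 1, irreducible over GF(2)
FIELD_SIZE = 256     # GF(2^8): every element fits in one byte


def gf_mul(a, b):
    """Peasant multiplication: shift and add, where adding is XOR."""
    product = 0
    while b:
        if b % 2:                # lowest bit of b is set
            product ^= a
        a *= 2                   # multiply a by x
        if a // FIELD_SIZE:      # degree reached 8, so reduce by the modulus
            a ^= MODULUS
        b //= 2
    return product


def gf_inv(a):
    """Inverse as a^254, since a^255 == 1 for every nonzero element."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse")
    result = 1
    for _ in range(FIELD_SIZE - 2):
        result = gf_mul(result, a)
    return result


def evaluate(coefficients, x):
    """Horner's scheme; coefficients run from highest degree to constant."""
    acc = 0
    for c in coefficients:
        acc = gf_mul(acc, x) ^ c
    return acc


def split(secret, threshold, share_count):
    """Hide one byte as the constant term of a random polynomial."""
    coefficients = [secrets.randbelow(FIELD_SIZE) for _ in range(threshold - 1)]
    coefficients.append(secret)
    return [(x, evaluate(coefficients, x)) for x in range(1, share_count + 1)]


def recover(shares):
    """Lagrange interpolation at x = 0 (subtraction is also XOR here)."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        term = yi
        for j, (xj, _) in enumerate(shares):
            if i != j:
                term = gf_mul(term, gf_mul(xj, gf_inv(xi ^ xj)))
        secret ^= term
    return secret


shares = split(0x74, threshold=2, share_count=3)   # 0x74 is the "t" in "thunder"
print(recover(shares[:2]) == 0x74)   # True
print(recover(shares[1:]) == 0x74)   # True
```

&lt;p&gt;Every share stays one byte long no matter what the random coefficients are, which is the “results about the same size as the input” property mentioned above.&lt;/p&gt;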

&lt;p&gt;Again, it probably sounds more complicated than it really is. At its core, it’s simple. This technique is formally known as a &lt;a href=&quot;http://securespeech.cs.cmu.edu/reports/shamirturing.pdf&quot; title=&quot;See “How to Share a Secret” by Adi Shamir&quot;&gt;Shamir Secret Sharing Scheme&lt;/a&gt; and it was discovered in the 1970’s.&lt;/p&gt;

&lt;p&gt;I didn’t want to invent anything new unless I felt I absolutely had to. There was already a good tool called “&lt;a href=&quot;http://point-at-infinity.org/ssss/&quot;&gt;ssss-split&lt;/a&gt;” that generates shares similar to how I wanted. This program adds a special twist by scrambling the resulting y-intercept point and therefore adds an extra layer of protection. Since this program was already the de facto standard, I wanted to be fully compatible with it. To make sure I was compatible, I had to copy its method of “diffusing” (i.e. scrambling) the bits using the public domain &lt;a href=&quot;http://en.wikipedia.org/wiki/XTEA&quot;&gt;XTEA algorithm&lt;/a&gt;. However, to ensure complete fidelity, I had to look at the source code. The only problem was that it was originally released under the &lt;a href=&quot;http://www.gnu.org/copyleft/gpl.html&quot;&gt;GNU General Public License&lt;/a&gt; (GPL) and it used &lt;a href=&quot;http://en.wikipedia.org/wiki/GNU_Multiple_Precision_Arithmetic_Library&quot;&gt;a GPL library for working with large numbers&lt;/a&gt;. My goal was to make my implementation as open as I could, so I asked the author if I could look at his code to derive my own implementation that I’d release under the more permissive &lt;a href=&quot;http://www.opensource.org/licenses/mit-license.php&quot;&gt;MIT license&lt;/a&gt; and he graciously allowed me to do this.&lt;/p&gt;

&lt;p&gt;To prove the compatibility, you can use the &lt;a href=&quot;http://point-at-infinity.org/ssss/demo.html&quot;&gt;ssss-split demo page&lt;/a&gt; and paste the results &lt;a href=&quot;https://github.com/moserware/SecretSplitter/releases/latest&quot;&gt;into SecretSplitter&lt;/a&gt; and it’ll work just fine. In addition, I &lt;a href=&quot;https://github.com/moserware/SecretSplitter/releases&quot;&gt;created command line programs from scratch&lt;/a&gt; that are fully compatible with ssss-split and ssss-combine.&lt;/p&gt;

&lt;p&gt;After some basic usability testing, I decided to make one small adjustment. The “ssss-split” command allows you to attach a prefix that it ignores. I wanted to add a special prefix that would tell what type of share it was (i.e. a message or a file) as well as a &lt;a href=&quot;http://en.wikipedia.org/wiki/SHA-1&quot;&gt;simple checksum&lt;/a&gt; because with all those digits it’s easy to mistype one.&lt;/p&gt;

&lt;p&gt;Now, you can understand all the pieces of the long share:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/assets/life-death-and-splitting-secrets/ShareComponents.png&quot;&gt;&lt;img src=&quot;/assets/life-death-and-splitting-secrets/ShareComponents_576.png&quot; alt=&quot;Share components&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In theory, you could “encrypt” a large file directly using this technique. In practice, it doesn’t work well because each share would be huge and not something you’d be able to write down by hand or say over the phone, even using the &lt;a href=&quot;http://en.wikipedia.org/wiki/NATO_phonetic_alphabet&quot;&gt;phonetic alphabet&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;For lots of data, we use a hybrid approach: encrypt the file using standard file encryption with a random key and then split the small “key” into pieces.&lt;/p&gt;

&lt;p&gt;For file encryption, I again didn’t want to invent anything new. I decided to use the &lt;a href=&quot;http://tools.ietf.org/html/rfc4880&quot;&gt;OpenPGP Message Format&lt;/a&gt;, the same format used by &lt;a href=&quot;http://en.wikipedia.org/wiki/Pretty_Good_Privacy&quot;&gt;PGP&lt;/a&gt; and &lt;a href=&quot;http://www.gnupg.org/&quot;&gt;GNU Privacy Guard&lt;/a&gt; (GPG). I didn’t want to have to worry about licensing restrictions or including a &lt;a href=&quot;http://www.bouncycastle.org/&quot; title=&quot;Like Bouncy Castle&quot;&gt;third-party library&lt;/a&gt;, so I wrote my own implementation from scratch that did exactly what I wanted. I &lt;a href=&quot;http://commondatastorage.googleapis.com/rhuang/rfc4880.mobi&quot; title=&quot;I&#39;m a bit embarrassed to admit I read it on my Kindle by the beach. On the subject, I must admit that RFC2MOBI is a great free app for converting text-based RFCs to Kindle MOBI files. It does a remarkably decent job.&quot;&gt;read RFC4880&lt;/a&gt; and started sketching out what I needed to do. A few bug fixes later and I had a working implementation that was able to interoperate with GPG. To simplify my implementation, I only support a limited subset of features:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;I always use &lt;a href=&quot;http://en.wikipedia.org/wiki/Advanced_Encryption_Standard&quot;&gt;AES&lt;/a&gt; with a 256-bit key for encryption, even if users select a smaller effective key size. This means that users can pick any size key they want and thus balance security and share length. I picked AES because it’s strong and &lt;a href=&quot;http://www.moserware.com/2009/09/stick-figure-guide-to-advanced.html&quot;&gt;understandable with stick figures&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;The actual file encryption key is always a &lt;a href=&quot;http://tools.ietf.org/html/rfc4880#section-3.7.1.3&quot;&gt;hashed, salted, and stretched version&lt;/a&gt; of the reconstructed shares text.&lt;/li&gt;
  &lt;li&gt;The encrypted file has an &lt;a href=&quot;http://tools.ietf.org/html/rfc4880#section-5.13&quot;&gt;integrity protection packet&lt;/a&gt; to detect if the file has been modified and ensure it was decrypted correctly.&lt;/li&gt;
&lt;/ol&gt;
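
&lt;p&gt;The “hashed, salted, and stretched” step in the second item refers to the iterated and salted string-to-key method in RFC 4880 section 3.7.1.3. Below is a rough Python sketch of that method as I read the RFC. The SHA-1 hash, the coded count of 96, and the sample passphrase are assumptions chosen for illustration, not settings taken from SecretSplitter.&lt;/p&gt;

```python
import hashlib
import os


def decode_count(coded_count):
    """Expand the one-byte coded count from RFC 4880 into an octet count."""
    return (16 + coded_count % 16) * 2 ** (coded_count // 16 + 6)


def iterated_salted_s2k(passphrase, salt, coded_count, key_length, hash_name="sha1"):
    """Derive key_length bytes by repeatedly hashing salt + passphrase."""
    data = salt + passphrase
    # The full salt + passphrase is always hashed at least once.
    octet_count = max(decode_count(coded_count), len(data))
    full_repeats, remainder = divmod(octet_count, len(data))

    digest_size = hashlib.new(hash_name).digest_size
    context_count = -(-key_length // digest_size)   # ceiling division

    key = b""
    for index in range(context_count):
        context = hashlib.new(hash_name)
        context.update(b"\x00" * index)   # context i is preloaded with i zero bytes
        for _ in range(full_repeats):
            context.update(data)
        context.update(data[:remainder])
        key += context.digest()
    return key[:key_length]


salt = os.urandom(8)   # RFC 4880 uses an 8-byte salt
key = iterated_salted_s2k(b"hypothetical recovered secret", salt, 96, 32)
print(len(key))        # 32 bytes, enough for an AES-256 key
```

&lt;p&gt;Because a SHA-1 digest is only 20 bytes, a 32-byte key needs two hash contexts; the second is preloaded with a single zero byte so that it produces different output from the first.&lt;/p&gt;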

&lt;p&gt;Since I used common formats, you can verify the correctness of the generated files using a Linux shell. You can also create files using the shell and have them interoperate with SecretSplitter. I included &lt;a href=&quot;https://github.com/moserware/SecretSplitter/blob/master/Compatibility.txt&quot;&gt;a sample of how to do this with the source code&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;help-wanted--future-possibilities&quot;&gt;Help Wanted / Future Possibilities&lt;/h2&gt;

&lt;p&gt;SecretSplitter still looks and feels like a prototype. There are lots of possible improvements that could be made:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Secret splitting is a relatively complicated idea. In &lt;a href=&quot;http://www.amazon.com/gp/product/0470474246/ref=as_li_ss_tl?ie=UTF8&amp;amp;tag=moserware-20&amp;amp;linkCode=as2&amp;amp;camp=217145&amp;amp;creative=399369&amp;amp;creativeASIN=0470474246&quot;&gt;Cryptography Engineering&lt;/a&gt;, the authors write “secret sharing schemes are rarely used because they are too complex. They are complex to implement, but more importantly, they are complex to administrate and operate.” Although I tried to simplify the user experience for broad use, it could still use some user experience enhancements to simplify it further.&lt;/li&gt;
  &lt;li&gt;I wrote it in C# for the .NET platform because that is what I’m most familiar with (and it has some built-in powerful primitives like BigIntegers, AES, and hash functions). I suspect that an HTML5 version using JavaScript, a nice interface, and coming from a trusted domain would get much broader usage. In addition, since this is a problem that affects everyone, having great internationalization support would be a nice touch. It also would be nice to have a polished look with a good logo and other graphics.&lt;/li&gt;
  &lt;li&gt;You could use more &lt;a href=&quot;http://en.wikipedia.org/wiki/Verifiable_secret_sharing&quot;&gt;elaborate secret sharing schemes&lt;/a&gt; than what I implemented in SecretSplitter. I considered these, but ultimately wanted to use a technique that was already compatible with widely deployed tools. I also considered enhancing shares with &lt;a href=&quot;http://en.wikipedia.org/wiki/Time-based_One-time_Password_Algorithm&quot;&gt;two-factor&lt;/a&gt; support or using &lt;a href=&quot;http://en.wikipedia.org/wiki/Public_key_infrastructure&quot;&gt;existing public key infrastructure&lt;/a&gt;, but decided that added too much complexity. Perhaps it’s possible to incorporate these in a good design.&lt;/li&gt;
  &lt;li&gt;It’d be neat if this scheme or something similar to it was integrated into LastPass and KeePass as a core feature.&lt;/li&gt;
  &lt;li&gt;Obviously the shares themselves are long. I tried making them shorter but the downsides outweighed the upsides. Perhaps it could be better. Also, a compelling graphically designed share card might make it more fun for broader use. The long length is somewhat of a safety mechanism that prevents people from memorizing with a quick glance. Also, it discourages overhasty use much like &lt;a href=&quot;http://vimeo.com/5735591&quot; title=&quot;Although, as this video demonstrates a hammer allows for quick access. However, at least you’d be making a conscious decision at that point.&quot;&gt;freezing a credit card&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;I kept the codes in a format that would be easy to write as well as read over the phone. I used a simple character set that avoids ambiguities like “O” vs “0”. One additional strategy could be to embed the share as a &lt;a href=&quot;http://qrcodenet.codeplex.com/&quot;&gt;QR code&lt;/a&gt; or something similar. I didn’t pursue this approach in favor of simplicity, but this could be an option.&lt;/li&gt;
  &lt;li&gt;Really paranoid people might want to back up their encrypted file to paper. &lt;a href=&quot;http://www.codinghorror.com/blog/2009/07/the-paper-data-storage-option.html&quot;&gt;This is possible&lt;/a&gt;, but I’m not sure if it should belong inside the program itself.&lt;/li&gt;
  &lt;li&gt;It’d be good to have suggestions on how to exchange shares or perhaps borrow ideas from PGP &lt;a href=&quot;http://en.wikipedia.org/wiki/Key_signing_party&quot;&gt;key signing parties&lt;/a&gt;. I suspect that if secret splitting were to become popular, then “&lt;a href=&quot;http://en.wikipedia.org/wiki/Web_of_trust&quot;&gt;web of trust&lt;/a&gt;” scenarios would naturally occur (i.e. “I’ll hold your secret share if you hold mine”).&lt;/li&gt;
  &lt;li&gt;It’d be fun to compile a list of non-obvious uses for SecretSplitter to share with others. For example, it could make for interesting scavenger hunt clues.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If you’d like to donate your time to any of the above ideas, I’d encourage you to just give it a go. You don’t have to ask for my permission but it would be nice if you posted your results somewhere or left a comment to this post. You can use my code for whatever purpose you’d like. My only hope is that you might get some benefit out of it.&lt;/p&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;SecretSplitter is just a tool that gives another option for backing up very sensitive information by splitting it up into pieces. It’s not a full solution, only a tool. By relying on people I trust instead of &lt;a href=&quot;http://mashable.com/2010/10/11/social-media-after-death/&quot; title=&quot;Besides, I don&#39;t want to have to worry about a third-party company “dying” before I do.&quot;&gt;a third-party company&lt;/a&gt;, it helped me remove one excuse I had for not preparing somewhat unpleasant but important documents that we should all probably have. I still don’t have this all figured out, but writing SecretSplitter helped me get started.&lt;/p&gt;

&lt;p&gt;If you’re young, don’t have any &lt;a href=&quot;http://en.wikipedia.org/wiki/Minor_(law)&quot;&gt;minor children&lt;/a&gt;, and don’t care at all what happens to your stuff, then you could run some mental actuarial model and convince yourself that the probability of you or your survivors needing these documents or password recovery procedure anytime soon is low, but you’re not given any guarantees.&lt;/p&gt;

&lt;p&gt;At the very least, it’s a good idea to make sure all of your financial assets and life insurance policies have a named beneficiary and perhaps at least one alternate. You can also declare things like organ donor preferences on your driver’s license instead of making declarations in other documents. It’s also a good idea to have an “&lt;a href=&quot;http://en.wikipedia.org/wiki/In_case_of_emergency&quot; title=&quot;In Case of Emergency&quot;&gt;ICE&lt;/a&gt;” entry in your cell phone. However, going the extra step and making very basic final documents doesn’t require that much more work. Besides, once you have baseline documents, keeping them fresh is just a matter of occasional updates due to life events.&lt;/p&gt;

&lt;p&gt;The increasing digitization of our lives means that more personal things will only be stored digitally. From our journals to email to videos to health records, all of this will eventually only exist digitally and likely hidden behind passwords. This future needs some safety net for backing up sensitive things in a safe and accessible way.&lt;/p&gt;

&lt;p&gt;Not everything needs to be backed up. There are lots of files, usernames and passwords that don’t really matter. Don’t include those. SecretSplitter was built with the assumption that everything that really mattered could be stored in a file small enough to email to others. This helps you focus and pare down to what really matters.&lt;/p&gt;

&lt;p&gt;It’s also good to have a healthy dose of common sense. Instead of holding out a secret until after your death, maybe you should get that resolved today. You’ll probably live better. My general view is that these final “secrets” should be mostly boring by just containing account details and credentials.&lt;/p&gt;

&lt;p&gt;Finally, on a more personal level, I think it’s healthy to be reminded about our own mortality at least once every year or so. It’s a helpful reminder of how much a gift every day is and helps focus what we do and not worry about things that don’t matter.&lt;/p&gt;

&lt;p&gt;If a little bit of fancy math can help you sleep better at night, well then, I’d consider it a success.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Special thanks to B. Poettering for creating the original&lt;/em&gt; &lt;a href=&quot;http://point-at-infinity.org/ssss/&quot;&gt;&lt;em&gt;ssss&lt;/em&gt;&lt;/a&gt; &lt;em&gt;program and allowing me to clone its format.&lt;/em&gt;&lt;/p&gt;
</description>
        <pubDate>Mon, 21 Nov 2011 08:43:00 +0000</pubDate>
        <link>http://www.moserware.com/2011/11/life-death-and-splitting-secrets.html</link>
        <guid isPermaLink="true">http://www.moserware.com/2011/11/life-death-and-splitting-secrets.html</guid>
        
        
      </item>
    
      <item>
        <title>Notes from porting C# code to PHP</title>
        <description>&lt;p&gt;(&lt;strong&gt;Summary&lt;/strong&gt;: I ported my TrueSkill implementation from &lt;a href=&quot;http://github.com/moserware/Skills&quot;&gt;C#&lt;/a&gt; to PHP and &lt;a href=&quot;http://github.com/moserware/PHPSkills&quot; title=&quot;Patches welcome :)&quot;&gt;posted it on GitHub&lt;/a&gt;. It was my first real encounter with PHP and I learned a few things.)&lt;/p&gt;

&lt;p&gt;I braced for the worst. &lt;a href=&quot;http://php.net/download-logos.php&quot;&gt;&lt;img style=&quot;display: inline; margin-left: 15px; margin-right: 0px; border:0px&quot; align=&quot;right&quot; src=&quot;/assets/notes-from-porting-c-code-to-php/1000px-PHP-logo.svg_200.png&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After years of hearing &lt;a href=&quot;http://www.codinghorror.com/blog/2008/05/php-sucks-but-it-doesnt-matter.html&quot; title=&quot;Jeff Atwood&#39;s: &#39;PHP Sucks, But It Doesn&#39;t Matter.&#39; Jeff has gone on record many times bemoaning the PHP language.&quot;&gt;negative&lt;/a&gt; &lt;a href=&quot;http://stackoverflow.com/questions/309300/defend-php-convince-me-it-isnt-horrible&quot; title=&quot;Stack Overflow Question: &#39;Defend PHP; convince me it isn&#39;t horrible&#39;&quot;&gt;things&lt;/a&gt; about PHP, I had been led to believe that touching it would rot my brain. Ok, maybe that’s a &lt;em&gt;bit&lt;/em&gt; much, but its &lt;a href=&quot;http://thisdeveloperslife.com/post/1270441885/1-0-5-homerun&quot; title=&quot;In the &#39;Homerun&#39; episode of &#39;This Developer&#39;s Life&#39;, David Heinemeier Hansson mentioned that one of the reasons why he switched to Ruby and created Rails was that he basically thought PHP (and Java) were beyond hope.&quot;&gt;reputation&lt;/a&gt; had me believe it was full of &lt;a href=&quot;http://www.softwarebyrob.com/2006/11/17/single-important-rule-retaining-software-developers/&quot; title=&quot;To quote Paul Graham: &#39;Not every kind of hard is good. There is good pain and bad pain. You want the kind of pain you get from going running, not the kind you get from stepping on a nail. A difficult problem could be good for a designer, but a fickle client or unreliable materials would not be.&#39; The basic idea is that bad problems just wear you out without giving you any benefit or insight.&quot;&gt;bad problems&lt;/a&gt;. Even the &lt;a href=&quot;http://www.mailchimp.com/blog/ewww-you-use-php/#more-10515&quot; title=&quot;The guys at MailChimp recently wrote about how they&#39;re having some difficulties hiring programmers because their site is in PHP. 
This is probably indicative of a larger trend, especially among alpha geeks.&quot;&gt;cool kids&lt;/a&gt; &lt;a href=&quot;http://news.ycombinator.com/item?id=1818954&quot; title=&quot;I think some of the general attitude can be summed up by this quote by pilif on Hacker News: &#39;While I really hate some aspects of PHP by now and I would love to have a Ruby or Python codebase to work with instead, rewriting all of this is out of the question.&#39; which I can respect.&quot;&gt;had&lt;/a&gt; &lt;a href=&quot;http://www.reddit.com/r/programming/comments/dutgs/ewww_you_use_php/&quot; title=&quot;Selected comment from skillet-thief on Reddit: &#39;PHP hinders you on a lot of levels: the community has such a wide range of skill levels, including a huge class of users who mostly know how to install and uninstall and reinstall until something works; code reuse is much harder than in other languages because there is a lot of bad code out there, the good code is packaged in a way that makes it hard to share (as a stand-alone tool a lot of times). Abstractions are generally harder to make too. There were no real anonymous functions until very recently.&#39;&quot;&gt;issues&lt;/a&gt; with PHP. But I thought that it couldn’t be too bad because there was &lt;a href=&quot;http://www.facebook.com/&quot; title=&quot;Formerly known as thefacebook&quot;&gt;that one website&lt;/a&gt; that gets a few hits using a &lt;a href=&quot;http://github.com/facebook/hiphop-php/wiki&quot; title=&quot;I think that Zuckerberg&#39;s usage of PHP is similar to most people&#39;s in that it was easy to get started. Throw in lots of programmers and bam! You have a large codebase and a ship that&#39;s not feasible to rewrite. This probably justified the whole HipHop compiler rather than a rewrite. This is similar to FogBugz programmers using Wasabi to avoid rewriting VBScript code.&quot;&gt;dialect of it&lt;/a&gt;. 
When &lt;a href=&quot;http://kaggle.com/&quot;&gt;Kaggle&lt;/a&gt; offered to sponsor a port of my &lt;a href=&quot;http://www.moserware.com/2010/03/computing-your-skill.html&quot;&gt;TrueSkill&lt;/a&gt; &lt;a href=&quot;http://github.com/moserware/Skills&quot;&gt;C# code&lt;/a&gt; to PHP, I thought I’d finally have my first real encounter with PHP.&lt;/p&gt;

&lt;h2 id=&quot;php-echo-disclaimer-&quot;&gt;&lt;?php echo &quot;Disclaimer:&quot;; ?&gt;&lt;/h2&gt;

&lt;p&gt;To make the port quick, I kept most of the design and class structure from my C# implementation. This led to a less-than-optimal result since PHP really &lt;a href=&quot;http://michaelkimsal.com/blog/php-is-not-object-oriented/&quot; title=&quot;Yes, it has the &#39;class&#39; keyword, but that was bolted on relatively late and wasn&#39;t the primary focus in PHP&#39;s design.&quot;&gt;isn’t object-oriented&lt;/a&gt;. I didn’t do a deep dive on redesigning it in the native PHP way. I stuck with the philosophy that you can &lt;a href=&quot;http://queue.acm.org/detail.cfm?id=1039535&quot; title=&quot;The classic phrase is: &#39;You can write Fortran in any language.&#39; By not catering to PHP&#39;s strengths, I might have brought too much C#-ness to PHP without better factoring things.&quot;&gt;write quasi-C# in any language&lt;/a&gt;. Also, I didn’t use any of the web and database features that motivate most people to choose PHP in the first place. In other words, I didn’t cater to PHP’s &lt;a href=&quot;http://stackoverflow.com/questions/694246/how-is-php-done-the-right-way&quot;&gt;specialty&lt;/a&gt;, so my reflections are probably an unfair and biased comparison as I was not using PHP the way it was intended. I &lt;a href=&quot;http://www.lessonsoffailure.com/developers/language-flamewars-blub-paradox/&quot;&gt;expect&lt;/a&gt; that I missed tons of great things about PHP.&lt;/p&gt;

&lt;p&gt;Personal disclaimers aside, even PHP book authors don’t claim that it’s the nicest language. Instead, they highlight the language’s popularity. I sort of got the feeling that people mainly choose PHP in lieu of languages like C# because of its &lt;a href=&quot;http://www.tiobe.com/index.php/paperinfo/tpci/PHP.html&quot;&gt;current popularity&lt;/a&gt; and its perception of having a lower upfront cost, especially among cash-strapped startups. Matt Doyle, author of &lt;a href=&quot;http://www.amazon.com/gp/product/0470413964?ie=UTF8&amp;amp;tag=moserware-20&amp;amp;linkCode=as2&amp;amp;camp=1789&amp;amp;creative=390957&amp;amp;creativeASIN=0470413964&quot;&gt;Beginning PHP 5.3&lt;/a&gt;, wrote the following while comparing PHP to other languages:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;&lt;a href=&quot;http://www.amazon.com/gp/product/0470413964?ie=UTF8&amp;amp;tag=moserware-20&amp;amp;linkCode=as2&amp;amp;camp=1789&amp;amp;creative=390957&amp;amp;creativeASIN=0470413964&quot; title=&quot;Beginning PHP 5.3&quot;&gt;&lt;img align=&quot;right&quot; border=&quot;0&quot; src=&quot;/assets/notes-from-porting-c-code-to-php/BeginningPHPBookCover.jpg&quot; /&gt;&lt;/a&gt;&lt;img src=&quot;http://www.assoc-amazon.com/e/ir?t=moserware-20&amp;amp;l=as2&amp;amp;o=1&amp;amp;a=0470413964&quot; alt=&quot;&quot; /&gt;“Many would argue that C# is a nicer, better-organized language to program in than PHP, although C# is arguably harder to learn. Another advantage of ASP.NET is that C# is a compiled language, which generally means it runs faster than PHP’s interpreted scripts (although PHP compilers are available).” - p.5&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;He continued:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;“ASP and ASP.NET have a couple of other disadvantages compared to PHP. First of all, they have a commercial license, which can mean spending additional money on server software, and hosting is often more expensive as a result. Secondly, ASP and ASP.NET are fairly heavily tied to the Windows platform, whereas the other technologies in this list are much more cross-platform.” - p.5&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Next, he hinted that Ruby might eventually replace PHP’s reign:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;“Like Python, Ruby is another general-purpose language that has gained a lot of traction with Web developers in recent years. This is largely due to the excellent Ruby on Rails application framework, which uses the Model-View-Controller (MVC) pattern, along with Ruby’s extensive object-oriented programming features, to make it easy to build a complete Web application very quickly. As with Python, Ruby is fast becoming a popular choice among Web developers, but for now, PHP is much more popular.” - p.6&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;He then elaborated on why PHP might be popular today:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;“[T]his middle ground partly explains the popularity of PHP. The fact that you don’t need to learn a framework or import tons of libraries to do basic Web tasks makes the language easy to learn and use. On the other hand, if you need the extra functionality of libraries and frameworks, they’re there for you.” - p.7&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Fair enough. However, to really understand the language, I needed to dive in personally and experience it firsthand. I took notes during the dive about some of the things that stuck out.&lt;/p&gt;

&lt;h2 id=&quot;the-goodhttpwwwamazoncomgpproduct0596517742ieutf8tagmoserware-20linkcodeas2camp1789creative390957creativeasin0596517742-this-section-title-comes-from-the-subtitle-of-my-favorite-javascript-book-parts&quot;&gt;The &lt;a href=&quot;http://www.amazon.com/gp/product/0596517742?ie=UTF8&amp;amp;tag=moserware-20&amp;amp;linkCode=as2&amp;amp;camp=1789&amp;amp;creative=390957&amp;amp;creativeASIN=0596517742&quot; title=&quot;This section title comes from the subtitle of my favorite JavaScript book&quot;&gt;Good&lt;/a&gt; Parts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;It’s relatively easy to learn and get started with PHP. As a C# developer, I was able to pick up PHP in a few hours after a brief overview of the syntax from &lt;a href=&quot;http://www.amazon.com/gp/product/0470413964?ie=UTF8&amp;amp;tag=moserware-20&amp;amp;linkCode=as2&amp;amp;camp=1789&amp;amp;creative=390957&amp;amp;creativeASIN=0470413964&quot;&gt;a book&lt;/a&gt;. Also, PHP has some decent &lt;a href=&quot;http://php.net/manual/en/index.php&quot;&gt;online help&lt;/a&gt;. &lt;/li&gt;
  &lt;li&gt;PHP is available on almost all web hosts these days at no extra charge (in contrast with ASP.NET hosting). I can’t emphasize this enough because it’s a reason why I would still consider writing a small website in it. &lt;/li&gt;
  &lt;li&gt;I was pleasantly surprised to have unit test support with &lt;a href=&quot;http://www.phpunit.de/&quot;&gt;PHPUnit&lt;/a&gt;. This made me feel at home and made it easier to develop and debug code. &lt;/li&gt;
  &lt;li&gt;It’s very easy and reasonable to create a website in PHP using techniques like Model-View-Controller (MVC) designs that separate the view from the actual database model. The language doesn’t seem to pose any hindrance to this. &lt;/li&gt;
  &lt;li&gt;PHP has a “&lt;a href=&quot;http://php.net/manual/en/language.oop5.static.php&quot;&gt;static&lt;/a&gt;” keyword that is sort of like a static version of a “this” reference. This was useful in creating a quasi-static “subclass” of my “&lt;a href=&quot;http://github.com/moserware/PHPSkills/blob/master/Skills/Numerics/Range.php&quot;&gt;Range&lt;/a&gt;” class for validating &lt;a href=&quot;http://github.com/moserware/PHPSkills/blob/master/Skills/PlayersRange.php&quot;&gt;player&lt;/a&gt; and &lt;a href=&quot;http://github.com/moserware/PHPSkills/blob/master/Skills/TeamsRange.php&quot;&gt;team&lt;/a&gt; sizes. This feature is formally known as &lt;a href=&quot;http://en.wikipedia.org/wiki/Late_static_binding#Late_static&quot;&gt;late static binding&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
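
&lt;p&gt;To make the late static binding point concrete, here is a minimal sketch. The class names &lt;code&gt;RangeSketch&lt;/code&gt; and &lt;code&gt;PlayersRangeSketch&lt;/code&gt; are hypothetical and are not the actual PHPSkills classes; the only assumption is PHP 5.3 or later, where &lt;code&gt;new static&lt;/code&gt; is available:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-php&quot; data-lang=&quot;php&quot;&gt;&lt;span class=&quot;x&quot;&gt;&amp;lt;?php &lt;/span&gt;
&lt;span class=&quot;x&quot;&gt;// Hypothetical stand-in for a Range-like base class. &lt;/span&gt;
&lt;span class=&quot;x&quot;&gt;class RangeSketch &lt;/span&gt;
&lt;span class=&quot;x&quot;&gt;{ &lt;/span&gt;
&lt;span class=&quot;x&quot;&gt;    private $_min; &lt;/span&gt;

&lt;span class=&quot;x&quot;&gt;    public function __construct($min) &lt;/span&gt;
&lt;span class=&quot;x&quot;&gt;    { &lt;/span&gt;
&lt;span class=&quot;x&quot;&gt;        $this-&amp;gt;_min = $min; &lt;/span&gt;
&lt;span class=&quot;x&quot;&gt;    } &lt;/span&gt;

&lt;span class=&quot;x&quot;&gt;    public function getMin() &lt;/span&gt;
&lt;span class=&quot;x&quot;&gt;    { &lt;/span&gt;
&lt;span class=&quot;x&quot;&gt;        return $this-&amp;gt;_min; &lt;/span&gt;
&lt;span class=&quot;x&quot;&gt;    } &lt;/span&gt;

&lt;span class=&quot;x&quot;&gt;    // new static builds whichever class the call was made on; &lt;/span&gt;
&lt;span class=&quot;x&quot;&gt;    // new self would always build a RangeSketch. &lt;/span&gt;
&lt;span class=&quot;x&quot;&gt;    public static function atLeast($min) &lt;/span&gt;
&lt;span class=&quot;x&quot;&gt;    { &lt;/span&gt;
&lt;span class=&quot;x&quot;&gt;        return new static($min); &lt;/span&gt;
&lt;span class=&quot;x&quot;&gt;    } &lt;/span&gt;
&lt;span class=&quot;x&quot;&gt;} &lt;/span&gt;

&lt;span class=&quot;x&quot;&gt;// The subclass inherits the static factory without redefining it. &lt;/span&gt;
&lt;span class=&quot;x&quot;&gt;class PlayersRangeSketch extends RangeSketch &lt;/span&gt;
&lt;span class=&quot;x&quot;&gt;{ &lt;/span&gt;
&lt;span class=&quot;x&quot;&gt;} &lt;/span&gt;

&lt;span class=&quot;x&quot;&gt;$range = PlayersRangeSketch::atLeast(2); &lt;/span&gt;
&lt;span class=&quot;x&quot;&gt;echo get_class($range); // PlayersRangeSketch&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The factory is written once in the base class, yet a call through the subclass yields an instance of the subclass. That is the behavior that makes a quasi-static “subclass” workable.&lt;/p&gt;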

&lt;h2 id=&quot;the-when-in-romehttpenwiktionaryorgwikiwheninromedoastheromansdo-si-fueris-romae-romano-vivito-more-si-fueris-alibi-vivito-sicut-ibi-parts&quot;&gt;The “&lt;a href=&quot;http://en.wiktionary.org/wiki/when_in_Rome,_do_as_the_Romans_do&quot; title=&quot;Si fueris Romae, Romano vivito more; Si fueris alibi, vivito sicut ibi.&quot;&gt;When in Rome&lt;/a&gt;…” Parts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Class names use PascalCase while functions tend to use lowerCamelCase, as in Java, whereas C# tends to use PascalCase for both. In addition, .NET in general seems to have &lt;a href=&quot;http://www.moserware.com/2008/12/private-life-of-public-api.html&quot;&gt;more universally accepted naming conventions&lt;/a&gt; than PHP has. &lt;/li&gt;
  &lt;li&gt;PHP variables have a ‘$’ prefix which makes variables stick out: &lt;/li&gt;
&lt;/ul&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-php&quot; data-lang=&quot;php&quot;&gt;&lt;span class=&quot;x&quot;&gt;function increment($someNumber) &lt;/span&gt;
&lt;span class=&quot;x&quot;&gt;{ &lt;/span&gt;
&lt;span class=&quot;x&quot;&gt;    $result = $someNumber + 1; &lt;/span&gt;
&lt;span class=&quot;x&quot;&gt;    return $result; &lt;/span&gt;
&lt;span class=&quot;x&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This convention was probably copied from &lt;a href=&quot;http://en.wikipedia.org/wiki/Perl#Data_types&quot;&gt;Perl’s scalar variable sigil&lt;/a&gt;. This makes sense because PHP &lt;a href=&quot;http://en.wikipedia.org/wiki/PHP#History&quot;&gt;was originally&lt;/a&gt; a set of Perl scripts intended to be a simpler Perl.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;You access class members and functions using an arrow operator (“-&amp;gt;”) like C++ instead of the C#/Java dot notation (“.”). That is, in PHP you say &lt;/li&gt;
&lt;/ul&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-php&quot; data-lang=&quot;php&quot;&gt;&lt;span class=&quot;x&quot;&gt;$someClass-&amp;gt;someMethod()&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;instead of &lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c#&quot; data-lang=&quot;c#&quot;&gt;&lt;span class=&quot;n&quot;&gt;someClass&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;someMethod&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;ul&gt;
  &lt;li&gt;The arguments in a “&lt;a href=&quot;http://php.net/manual/en/control-structures.foreach.php&quot;&gt;foreach&lt;/a&gt;” statement are reversed from what C# uses. In PHP, you write: &lt;/li&gt;
&lt;/ul&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-php&quot; data-lang=&quot;php&quot;&gt;&lt;span class=&quot;x&quot;&gt;foreach($allItems as $currentItem) { ... }&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;instead of the C# way: &lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c#&quot; data-lang=&quot;c#&quot;&gt;&lt;span class=&quot;k&quot;&gt;foreach&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;currentItem&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;allItems&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;...&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;One advantage to the PHP way is its special syntax that makes iterating through key/value pairs in a map easier: &lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-php&quot; data-lang=&quot;php&quot;&gt;&lt;span class=&quot;x&quot;&gt;foreach($someArray as $key =&amp;gt; $value) { ... }&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;vs. the C# way of something like this: &lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c#&quot; data-lang=&quot;c#&quot;&gt;&lt;span class=&quot;k&quot;&gt;foreach&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;var&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pair&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;someDictionary&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; 
&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;// use pair.Key and pair.Value &lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;ul&gt;
  &lt;li&gt;The “=&amp;gt;” operator in PHP denotes a map entry as in &lt;/li&gt;
&lt;/ul&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-php&quot; data-lang=&quot;php&quot;&gt;&lt;span class=&quot;x&quot;&gt;$numbers = array(1 =&amp;gt; &#39;one&#39;, 2 =&amp;gt; &#39;two&#39;, ...)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;In C#, the arrow “=&amp;gt;” is instead used for a lightweight &lt;a href=&quot;http://msdn.microsoft.com/en-us/library/bb308966.aspx#csharp3.0overview_topic7&quot;&gt;lambda expression syntax&lt;/a&gt;: &lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c#&quot; data-lang=&quot;c#&quot;&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;To define the rough equivalent of the PHP array, you’d have to write this in C# &lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c#&quot; data-lang=&quot;c#&quot;&gt;&lt;span class=&quot;kt&quot;&gt;var&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;numbers&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Dictionary&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;string&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;&amp;gt;{&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;m&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;one&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;},&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;m&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;two&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;};&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;On the one hand, the PHP notation for maps is cleaner, but it comes at the cost of having no lightweight lambda syntax (more on that later). &lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;PHP has some “&lt;a href=&quot;http://php.net/manual/en/language.oop5.magic.php&quot;&gt;magical methods&lt;/a&gt;” such as “&lt;a href=&quot;http://www.php.net/manual/en/language.oop5.decon.php#language.oop5.decon.constructor&quot;&gt;__construct&lt;/a&gt;” and “&lt;a href=&quot;http://www.php.net/manual/en/language.oop5.magic.php#language.oop5.magic.tostring&quot;&gt;__toString&lt;/a&gt;” for the equivalent of C#’s &lt;a href=&quot;http://msdn.microsoft.com/en-us/library/ms173115.aspx&quot;&gt;constructor&lt;/a&gt; and &lt;a href=&quot;http://msdn.microsoft.com/en-us/library/system.object.tostring.aspx&quot;&gt;ToString&lt;/a&gt; functionality. I like C#’s approach here, but I’m biased.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;the-ok-i-guess-parts&quot;&gt;The “Ok, &lt;em&gt;I guess&lt;/em&gt;” Parts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;The free &lt;a href=&quot;http://netbeans.org/features/php/index.html&quot;&gt;NetBeans IDE for PHP&lt;/a&gt; is pretty &lt;a href=&quot;http://stackoverflow.com/questions/6166/any-good-php-ide-preferably-free-or-cheap/6169#6169&quot; title=&quot;I first learned about NetBeans through the StackOverflow question &#39;Any good PHP IDE, preferably free or cheap?&#39;&quot;&gt;decent&lt;/a&gt; for writing PHP code. Using it in conjunction with PHP’s &lt;a href=&quot;http://www.xdebug.org/&quot;&gt;XDebug&lt;/a&gt; debugger functionality is a must. After my initial attempts at writing code with a &lt;a href=&quot;http://www.flos-freeware.ch/notepad2.html&quot;&gt;basic notepad&lt;/a&gt;, I found NetBeans to be a very capable editor. My only real complaint with it is that I had some occasional cases where the editor would lock up and the debugger wouldn’t support things like watching variables. That said, it’s still good for being a free editor. &lt;/li&gt;
  &lt;li&gt;By default, PHP passes function arguments by value, and that includes arrays, which get copied rather than shared the way C#’s reference-type collections are. This probably caused the &lt;a href=&quot;http://github.com/moserware/PHPSkills/commit/4c7cfef8d6c602e733f47965a59676080a81f860&quot; title=&quot;As you can tell by my many git commits, it took awhile to figure this out... and I still probably missed something.&quot;&gt;most&lt;/a&gt; &lt;a href=&quot;http://github.com/moserware/PHPSkills/commit/803a0816a84879ebfa651ec975664c6ba2f7b93f&quot;&gt;difficulty&lt;/a&gt; with the port. Complicating things further is that &lt;a href=&quot;http://www.php.net/manual/en/language.references.return.php&quot; title=&quot;They&#39;re more like symlinks on a filesystem than pointers&quot;&gt;PHP references are not like references in other languages&lt;/a&gt;. For example, using references usually incurs a performance penalty since extra work is required. &lt;/li&gt;
  &lt;li&gt;You &lt;a href=&quot;http://bugs.php.net/bug.php?id=47872&quot;&gt;can’t&lt;/a&gt; import types via namespaces alone like you can in C# (and Java for that matter). In PHP, you have to import each type manually: &lt;/li&gt;
&lt;/ul&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-php&quot; data-lang=&quot;php&quot;&gt;&lt;span class=&quot;x&quot;&gt;use Moserware\Skills\FactorGraphs\ScheduleLoop; &lt;/span&gt;
&lt;span class=&quot;x&quot;&gt;use Moserware\Skills\FactorGraphs\ScheduleSequence; &lt;/span&gt;
&lt;span class=&quot;x&quot;&gt;use Moserware\Skills\FactorGraphs\ScheduleStep; &lt;/span&gt;
&lt;span class=&quot;x&quot;&gt;use Moserware\Skills\FactorGraphs\Variable;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;whereas in C# you can just say: &lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c#&quot; data-lang=&quot;c#&quot;&gt;&lt;span class=&quot;k&quot;&gt;using&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;Moserware.Skills.FactorGraphs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;PHP’s way makes things explicit and I can see that viewpoint, but it was a bit of a surprising requirement given how PHP usually required less syntax.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;PHP lacks support for C#-like &lt;a href=&quot;http://msdn.microsoft.com/en-us/library/512aeb7t(v=VS.100).aspx&quot;&gt;generics&lt;/a&gt;. On the one hand, I missed the generic type safety and performance benefits, but on the other hand it forced me to redesign some classes to not have an army of angle brackets (e.g. compare &lt;a href=&quot;http://github.com/moserware/Skills/blob/master/Skills/FactorGraphs/Factor.cs&quot;&gt;this class in C#&lt;/a&gt; to &lt;a href=&quot;http://github.com/moserware/PHPSkills/blob/master/Skills/FactorGraphs/Factor.php&quot;&gt;its PHP equivalent&lt;/a&gt;). &lt;/li&gt;
  &lt;li&gt;You have to manually call your parent class’s constructor in PHP if you want that feature: &lt;/li&gt;
&lt;/ul&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-php&quot; data-lang=&quot;php&quot;&gt;&lt;span class=&quot;x&quot;&gt;class BaseClass &lt;/span&gt;
&lt;span class=&quot;x&quot;&gt;{ &lt;/span&gt;
&lt;span class=&quot;x&quot;&gt;    function __construct() { ... } &lt;/span&gt;
&lt;span class=&quot;x&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;x&quot;&gt;class DerivedClass extends BaseClass &lt;/span&gt;
&lt;span class=&quot;x&quot;&gt;{ &lt;/span&gt;
&lt;span class=&quot;x&quot;&gt;    function __construct() &lt;/span&gt;
&lt;span class=&quot;x&quot;&gt;    { &lt;/span&gt;
&lt;span class=&quot;x&quot;&gt;        // this line is optional, but if you omit it, the BaseClass constructor will *not* be called &lt;/span&gt;
&lt;span class=&quot;x&quot;&gt;        parent::__construct(); &lt;/span&gt;
&lt;span class=&quot;x&quot;&gt;    } &lt;/span&gt;
&lt;span class=&quot;x&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This gives you more flexibility, but it doesn’t enforce C#-like assumptions that your parent class’s constructor was called.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;PHP doesn’t seem to have the concept of an implicit “$this” inside of a class. This forces you to always qualify class member variables with $this: &lt;/li&gt;
&lt;/ul&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-php&quot; data-lang=&quot;php&quot;&gt;&lt;span class=&quot;x&quot;&gt;class SomeClass &lt;/span&gt;
&lt;span class=&quot;x&quot;&gt;{ &lt;/span&gt;
&lt;span class=&quot;x&quot;&gt;    private $_someLocalVariable; &lt;/span&gt;
&lt;span class=&quot;x&quot;&gt;    function someMethod() &lt;/span&gt;
&lt;span class=&quot;x&quot;&gt;    { &lt;/span&gt;
&lt;span class=&quot;x&quot;&gt;        $someMethodVariable = $this-&amp;gt;_someLocalVariable + 1; &lt;/span&gt;
&lt;span class=&quot;x&quot;&gt;        ... &lt;/span&gt;
&lt;span class=&quot;x&quot;&gt;    } &lt;/span&gt;
&lt;span class=&quot;x&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;I put this in the “OK” category because some C# developers &lt;a href=&quot;http://blogs.msdn.com/b/omars/archive/2004/02/05/67687.aspx&quot;&gt;prefer&lt;/a&gt; to always be explicit about specifying “this” as well.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;PHP allows you to specify the type of some (but not all) kinds of function arguments: &lt;/li&gt;
&lt;/ul&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-php&quot; data-lang=&quot;php&quot;&gt;&lt;span class=&quot;x&quot;&gt;function myFunction(SomeClass $someClass, array $someArray, $someString) &lt;/span&gt;
&lt;span class=&quot;x&quot;&gt;{ &lt;/span&gt;
&lt;span class=&quot;x&quot;&gt;    ... &lt;/span&gt;
&lt;span class=&quot;x&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This is called “&lt;a href=&quot;http://php.net/manual/en/language.oop5.typehinting.php&quot;&gt;type hinting&lt;/a&gt;.” It seems that it is designed for enforcing API contracts instead of general IDE help as it actually causes a &lt;a href=&quot;http://stackoverflow.com/questions/3580628/is-type-hinting-helping-the-performance-of-php-scripts/3580660#3580660&quot;&gt;decrease in performance&lt;/a&gt;.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;PHP doesn’t have the concept of &lt;a href=&quot;http://msdn.microsoft.com/en-us/netframework/aa904594.aspx&quot;&gt;LINQ&lt;/a&gt;, but it does support some similar functional-like concepts like &lt;a href=&quot;http://php.net/manual/en/function.array-map.php&quot;&gt;array_map&lt;/a&gt; and &lt;a href=&quot;http://www.php.net/manual/en/function.array-reduce.php&quot;&gt;array_reduce&lt;/a&gt;. &lt;/li&gt;
  &lt;li&gt;PHP has support for &lt;a href=&quot;http://php.net/manual/en/functions.anonymous.php&quot;&gt;anonymous functions&lt;/a&gt; by using the “&lt;code&gt;function($arg1, ...){}&lt;/code&gt;” syntax. This is sort of reminiscent of how C# did the same thing in version 2.0 where you had to type out “&lt;a href=&quot;http://msdn.microsoft.com/en-us/library/0yw3tz5k(v=VS.100).aspx&quot;&gt;delegate&lt;/a&gt;.” C# 3.0 simplified this with a lighter weight version (e.g. “&lt;code&gt;x =&amp;gt; x*x&lt;/code&gt;”). I’ve found that this seemingly tiny change “isn’t about doing the same thing faster, it allows me to work in a &lt;a href=&quot;http://www.youtube.com/watch?v=4XpnKHJAok8#t=54m47s&quot; title=&quot;The quote comes from Linus talking about how git&#39;s speed changes how you work. The full quote is: &#39;that is the kind of performance that changes how you work. It’s no longer doing the same thing faster, it’s allowing you to work in a completely different manner.&#39;&quot;&gt;completely different manner&lt;/a&gt;” by employing functional concepts without thinking. It’s sort of a shame PHP didn’t elevate this concept with concise syntax. When C#’s lambda syntax was introduced in 3.0, it made me want to use them much more often. PHP’s lack of something similar is a strong discourager to the functional style and is a lesson that &lt;a href=&quot;http://herbsutter.com/2010/10/07/c-and-beyond-session-lambdas-lambdas-everywhere/&quot;&gt;C++ guys have recently learned&lt;/a&gt;. &lt;/li&gt;
  &lt;li&gt;Item 4 of &lt;a href=&quot;http://www.php.net/license/index.php#faq-lic&quot;&gt;the PHP license&lt;/a&gt; states:&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
  &lt;p&gt;Products derived from this software may not be called “PHP”, nor may “PHP” appear in their name, without prior written permission from group@php.net. You may indicate that your software works in conjunction with PHP by saying “Foo for PHP” instead of calling it “PHP Foo” or “phpfoo”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This explains why you see carefully worded names like “&lt;a href=&quot;http://developers.facebook.com/blog/post/358&quot;&gt;HipHop for PHP&lt;/a&gt;” rather than something like “php2cpp.” This technically doesn’t stop you from having a project with the PHP name in it (e.g. &lt;a href=&quot;http://www.phpunit.de/&quot;&gt;PHPUnit&lt;/a&gt;) so long as the official PHP code is not included in it. However, it’s clear that the PHP group is trying to clean up its name from tarnished projects like &lt;a href=&quot;http://en.wikipedia.org/wiki/PHP-Nuke&quot;&gt;PHP-Nuke&lt;/a&gt;. I understand their frustration, but this leads to an official preference for names like &lt;a href=&quot;http://www.zope.org/&quot;&gt;Zope&lt;/a&gt; and &lt;a href=&quot;http://www.smarty.net/&quot;&gt;Smarty&lt;/a&gt; that seem to be less clear on what the project actually does. This position would be like Microsoft declaring that you couldn’t use the “#” suffix or the “Implementation Running On .Net (&lt;a href=&quot;http://stackoverflow.com/questions/1194309/why-are-many-ports-of-languages-to-net-prefixed-with-iron&quot;&gt;Iron&lt;/a&gt;)” prefix in your project name (but maybe that would lead to more creativity?).&lt;/p&gt;
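
&lt;p&gt;Going back to the &lt;code&gt;array_map&lt;/code&gt;, &lt;code&gt;array_reduce&lt;/code&gt; and anonymous function points in this section, here is a rough sketch of how they combine. It sums the squares of a small list; the variable names are made up for illustration and assume PHP 5.3 or later:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-php&quot; data-lang=&quot;php&quot;&gt;&lt;span class=&quot;x&quot;&gt;&amp;lt;?php &lt;/span&gt;
&lt;span class=&quot;x&quot;&gt;$numbers = array(1, 2, 3); &lt;/span&gt;

&lt;span class=&quot;x&quot;&gt;// Roughly the C# lambda x =&amp;gt; x * x, spelled out with the function keyword. &lt;/span&gt;
&lt;span class=&quot;x&quot;&gt;$squares = array_map(function ($x) { return $x * $x; }, $numbers); &lt;/span&gt;

&lt;span class=&quot;x&quot;&gt;// Fold the squares into a single total, starting from 0. &lt;/span&gt;
&lt;span class=&quot;x&quot;&gt;$sumOfSquares = array_reduce($squares, function ($total, $x) { return $total + $x; }, 0); &lt;/span&gt;

&lt;span class=&quot;x&quot;&gt;echo $sumOfSquares; // 14&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;For comparison, the C# version with LINQ would be something like &lt;code&gt;numbers.Select(x =&amp;gt; x * x).Sum()&lt;/code&gt;. The PHP version works, but the &lt;code&gt;function&lt;/code&gt; and &lt;code&gt;return&lt;/code&gt; keywords make up most of each callback.&lt;/p&gt;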

&lt;h2 id=&quot;the-frustratinghttpwwwjoelonsoftwarecomuibookchaptersfog0000000057html-like-joel-mentions-in-this-post-from-2000-tiny-frustrations-add-up-to-a-really-bad-experience-parts&quot;&gt;The &lt;a href=&quot;http://www.joelonsoftware.com/uibook/chapters/fog0000000057.html&quot; title=&quot;Like Joel mentions in this post from 2000, tiny frustrations add up to a really bad experience&quot;&gt;Frustrating&lt;/a&gt; Parts:&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;As someone who’s primarily worked with a statically typed language for the past 15 years, I prefer the upfront compiler errors and warnings that C# offers and agree with &lt;a href=&quot;http://en.wikipedia.org/wiki/Anders_Hejlsberg&quot;&gt;Anders Hejlsberg&lt;/a&gt;’s &lt;a href=&quot;http://www.se-radio.net/2008/05/episode-97-interview-anders-hejlsberg/&quot; title=&quot;The quote begins around 35:45&quot;&gt;philosophy&lt;/a&gt;:&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
  &lt;p&gt;“I think one of the reasons that languages like Ruby for example (or Python) are becoming popular is really in many ways in spite of the fact that they are not typed… but because of the fact that they [have] very good metaprogramming support. I don’t see a lot of downsides to static typing other than the fact that it may not be practical to put in place, and it &lt;em&gt;is&lt;/em&gt; harder to put in place and therefore takes longer for us to get there with static typing, but once you do have static typing. I mean, gosh, you know, like hey – the compiler is going to report the errors before the space shuttle flies instead of whilst it’s flying, that’s a good thing!”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;But more dynamic languages like PHP have their supporters. For example, &lt;a href=&quot;http://en.wikipedia.org/wiki/Douglas_Crockford&quot;&gt;Douglas Crockford&lt;/a&gt; &lt;a href=&quot;http://video.yahoo.com/watch/111596/1710658&quot; title=&quot;See video starting at the -18:14 mark&quot;&gt;raves&lt;/a&gt; about JavaScript’s dynamic aspects:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;“I found over the years of working with JavaScript… I used to be of the religion that said ‘Yeah, absolutely brutally strong type systems. Figure it all out at compile time.’ I’ve now been converted to the other camp. I’ve found that the expressive power of JavaScript is so great. I’ve not found that I’ve lost anything in giving up the early protection [of statically compiled code]”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I still haven’t seen where Crockford is coming from given my recent work with PHP. Personally, I think that given C# 4.0’s optional support of &lt;a href=&quot;http://msdn.microsoft.com/en-us/library/dd264736.aspx&quot;&gt;dynamic&lt;/a&gt; objects, the lines between the two worlds are grayer and that with C# you get the best of both worlds, but I’m probably biased here.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;You don’t have to define &lt;a href=&quot;http://www.php.net/manual/en/language.variables.basics.php&quot;&gt;variables&lt;/a&gt; in PHP. This reduces some coding “&lt;a href=&quot;http://msdn.microsoft.com/en-us/magazine/dd419655.aspx&quot; title=&quot;There&#39;s a lot of talk out there about ceremony vs essence.&quot;&gt;ceremony&lt;/a&gt;” to get to the essence of your code, but I think it removes a &lt;a href=&quot;http://podcasts.pragprog.com/2007-10/michael-nygard-interview.mp3&quot; title=&quot;Quote is at 3:46 - &#39;We should have shock absorbers and circuit breakers so that [our systems] can be resilient to failure.&#39;&quot;&gt;shock absorber/circuit-breaker&lt;/a&gt; that can be built into the language. This “feature” &lt;a href=&quot;http://github.com/moserware/PHPSkills/commit/fa10d276d6121f390b930b655a66edd9376e114e#L0L24&quot;&gt;turned my typo into a bug&lt;/a&gt; and led to a runtime error. Fortunately, options like &lt;a href=&quot;http://php.net/manual/en/errorfunc.configuration.php&quot;&gt;E_NOTICE&lt;/a&gt; can catch these, but it caught me off guard. Thankfully, NetBeans’ auto-completion saved me from most of these types of errors. &lt;/li&gt;
  &lt;li&gt;PHP has built-in support for associative arrays, but you &lt;a href=&quot;http://php.net/manual/en/language.types.array.php&quot;&gt;can’t use objects as keys&lt;/a&gt; or else you’ll get an “Illegal Offset Type” error. Because my C# API heavily relied on this ability and I didn’t want to redesign the structure, I &lt;a href=&quot;http://github.com/moserware/PHPSkills/blob/master/Skills/HashMap.php&quot;&gt;created my own hashmap&lt;/a&gt; that supports object keys. This omission tended to reinforce the belief that &lt;a href=&quot;http://michaelkimsal.com/blog/php-is-not-object-oriented/&quot;&gt;PHP is not really object oriented&lt;/a&gt;. That said, I’m probably missing something and did it wrong. &lt;/li&gt;
  &lt;li&gt;PHP &lt;a href=&quot;http://bugs.php.net/bug.php?id=9331&amp;amp;edit=1&quot;&gt;doesn’t support operator overloading&lt;/a&gt;. This made my &lt;a href=&quot;http://github.com/moserware/PHPSkills/blob/master/Skills/Numerics/GaussianDistribution.php&quot;&gt;GaussianDistribution&lt;/a&gt; and &lt;a href=&quot;http://github.com/moserware/PHPSkills/blob/master/Skills/Numerics/Matrix.php&quot;&gt;Matrix&lt;/a&gt; classes a little harder to work with by having to invent explicit names for the operators. &lt;/li&gt;
  &lt;li&gt;PHP lacks support for a C#-like &lt;a href=&quot;http://msdn.microsoft.com/en-us/library/x9fsa0sw(v=VS.100).aspx&quot;&gt;property syntax&lt;/a&gt;. Having to write getters and setters made me feel like I was back programming in Java again. &lt;/li&gt;
  &lt;li&gt;My code ran &lt;a href=&quot;http://twitter.com/GregB/status/27244912213&quot;&gt;slower in PHP&lt;/a&gt;. To be fair, most of the performance problem was in &lt;a href=&quot;http://github.com/moserware/PHPSkills/blob/master/Skills/Numerics/Matrix.php&quot;&gt;my horribly naive matrix implementation&lt;/a&gt;, which could be improved. Regardless, it seems that larger sites deal with PHP’s performance problem by writing critical parts in compiled languages &lt;a href=&quot;http://news.ycombinator.com/item?id=1820451&quot;&gt;like C/C++&lt;/a&gt; or by using caching layers such as &lt;a href=&quot;http://en.wikipedia.org/wiki/Memcached&quot;&gt;memcached&lt;/a&gt;. One interesting observation is that the performance issue isn’t really with the &lt;a href=&quot;http://en.wikipedia.org/wiki/Zend_Engine&quot;&gt;Zend Engine&lt;/a&gt; per se but rather the semantics of the PHP language itself. Haiping Zhao on the HipHop for PHP team &lt;a href=&quot;http://www.youtube.com/watch?v=p5S1K60mhQU#t=51m44s&quot; title=&quot;From the Stanford lecture on HipHop for PHP&quot;&gt;gave a good overview of the issue&lt;/a&gt;:&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
  &lt;p&gt;“Around the time that we started the [HipHop for PHP] project, we absolutely looked into the Zend Engine. The first question you ask is ‘The Zend Engine must be terribly implemented. That’s why it’s slow, right?’ &lt;/p&gt;
&lt;/blockquote&gt;

&lt;blockquote&gt;
  &lt;p&gt;So we looked into the Zend Engine and tried different places, we looked at the hash functions to see if it’s sufficient and look some of the profiles the Zend Engine has and different parts of the Zend Engine. &lt;/p&gt;
&lt;/blockquote&gt;

&lt;blockquote&gt;
  &lt;p&gt;You finally realize that the Zend Engine is pretty compact. It just does what it promises. If you have that kind of semantics you just cannot avoid the dynamic function table, you cannot avoid the variable table, you just cannot avoid a lot of the things that they built… &lt;/p&gt;
&lt;/blockquote&gt;

&lt;blockquote&gt;
  &lt;p&gt;that’s the point that [you realize] PHP can also be called C++Script because the syntax is so similar then you ask yourself, ‘What is the difference between the speed of these two different languages and those are the items that are… different like the dynamic symbol lookup (it’s not present in C++), the weak typing is not present in C++, everything else is pretty much the same. The Zend Engine is very close to C implementation. The layer is very very thin. I don’t think we can blame the Zend Engine for the slowness PHP has.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That said, I don’t think that performance alone would stop me from using PHP. It’s good enough for most things. Furthermore, I’m sure optimizers could use tricks like what the &lt;a href=&quot;http://en.wikipedia.org/wiki/Dynamic_Language_Runtime&quot;&gt;DLR&lt;/a&gt; and &lt;a href=&quot;http://code.google.com/p/v8/&quot;&gt;V8&lt;/a&gt; use to squeeze out more performance. However, I think that in practice, there is a case of &lt;a href=&quot;http://en.wikipedia.org/wiki/Amdahl&#39;s_law&quot;&gt;diminishing returns&lt;/a&gt; where I/O (and not CPU time) typically becomes the limiting factor.&lt;/p&gt;

&lt;h2 id=&quot;parting-thoughts&quot;&gt;Parting Thoughts&lt;/h2&gt;

&lt;p&gt;Despite my brief encounter, I feel that I learned quite a bit and feel comfortable around PHP code now. I think my quick ramp-up highlights a core value of PHP: its simplicity. I did miss C#-like compiler warnings and type safety, but maybe that’s my own personal acquired taste. Although PHP &lt;em&gt;does&lt;/em&gt; have some &lt;a href=&quot;http://www.reddit.com/r/programming/comments/dst56/today_i_learned_about_php_variable_variables/c12n0w9&quot;&gt;dubious features&lt;/a&gt;, it’s not nearly as bad as some people make it out to be. I think that its simplicity makes it a very respectable choice for the type of things it was originally designed to do like &lt;a href=&quot;http://wordpress.org/extend/themes/&quot; title=&quot;e.g. Wordpress ones&quot;&gt;web templates&lt;/a&gt;. Although I still wouldn’t pick PHP as my &lt;a href=&quot;http://weblogs.asp.net/scottgu/archive/tags/MVC/default.aspx&quot;&gt;first choice&lt;/a&gt; as a general purpose web programming language, I can now look at its features in a much more balanced way.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;P.S.&lt;/strong&gt; I’d love to hear suggestions on how to improve my implementation and learn where I did something wrong. Please feel free to use &lt;a href=&quot;http://github.com/moserware/PHPSkills&quot;&gt;my PHP TrueSkill code&lt;/a&gt; and submit &lt;a href=&quot;http://help.github.com/pull-requests/&quot;&gt;pull requests&lt;/a&gt;. As always, feel free to fork the code and port it to another language like &lt;a href=&quot;http://github.com/nsp&quot;&gt;Nate Parsons&lt;/a&gt; did with his &lt;a href=&quot;http://github.com/nsp/JSkills&quot;&gt;JSkills Java port&lt;/a&gt;.&lt;/p&gt;
</description>
        <pubDate>Tue, 26 Oct 2010 08:34:00 +0000</pubDate>
        <link>http://www.moserware.com/2010/10/notes-from-porting-c-code-to-php.html</link>
        <guid isPermaLink="true">http://www.moserware.com/2010/10/notes-from-porting-c-code-to-php.html</guid>
        
        
      </item>
    
      <item>
        <title>Computing Your Skill</title>
        <description>&lt;p&gt;&lt;strong&gt;Summary&lt;/strong&gt;: I describe how the &lt;a href=&quot;http://research.microsoft.com/en-us/projects/trueskill/&quot;&gt;TrueSkill algorithm&lt;/a&gt; works using concepts you’re already familiar with. TrueSkill is used on &lt;a href=&quot;http://www.xbox.com/en-US/LIVE/&quot; title=&quot;I&#39;m actually not a gamer myself, I just like the math of their ranking algorithm :-)&quot;&gt;Xbox Live&lt;/a&gt; to rank and match players and it serves as a great way to understand how statistical machine learning is actually applied today. I’ve also created an &lt;a href=&quot;http://github.com/moserware/Skills&quot;&gt;open source project&lt;/a&gt; where I implemented TrueSkill three different times in increasing complexity and capability. In addition, I’ve created a &lt;a href=&quot;/assets/computing-your-skill/The%20Math%20Behind%20TrueSkill.pdf&quot; title=&quot;It&#39;s over 40 pages because I had a fun time with the equation editor.&quot;&gt;detailed supplemental math paper&lt;/a&gt; that works out equations that I gloss over here. Feel free to jump to sections that look interesting and ignore ones that seem boring. Don’t worry if this post seems a bit long, there are &lt;em&gt;lots&lt;/em&gt; of pictures.&lt;/p&gt;

&lt;h2 id=&quot;introduction&quot;&gt;Introduction&lt;/h2&gt;

&lt;p&gt;It seemed easy enough: I wanted to create a database to track the skill levels of my coworkers in &lt;a href=&quot;http://en.wikipedia.org/wiki/Chess&quot;&gt;chess&lt;/a&gt; and &lt;a href=&quot;http://en.wikipedia.org/wiki/Table_football&quot;&gt;foosball&lt;/a&gt;. I already knew that I wasn’t very good at foosball and would bring down better players. I was curious if an algorithm could do a better job at creating well-balanced matches. I also wanted to see if I was improving at chess. I knew I needed to have an easy way to collect results from everyone and then use an algorithm that would keep getting better with &lt;a href=&quot;http://www.facebook.com/techtalks#/video/video.php?v=644326502463&quot; title=&quot;Peter Norvig&#39;s &#39;Theorizing From Data&#39; talk is fantastic, I highly recommend it.&quot;&gt;more&lt;/a&gt; &lt;a href=&quot;http://research.microsoft.com/en-us/collaboration/fourthparadigm/&quot; title=&quot;Microsoft Research put out this interesting book on how massive amounts of data will dominate scientific discoveries.&quot;&gt;data&lt;/a&gt;. I was looking for a way to compress all that data and distill it down to some simple knowledge of how skilled people are. Based on some &lt;a href=&quot;http://bits.blogs.nytimes.com/2009/09/21/netflix-awards-1-million-prize-and-starts-a-new-contest/?ref=technology&quot; title=&quot;I think the lasting legacy of the Netflix prize is that if you make something interesting and put it online, it shouldn&#39;t be a surprise that you can get PhDs to work on it for a dollar an hour or less. 
There&#39;s probably a deep lesson there for most tech companies.&quot;&gt;previous&lt;/a&gt; &lt;a href=&quot;http://www.pbs.org/wgbh/nova/darpa/&quot; title=&quot;If you haven&#39;t seen it yet, you should check out the PBS NOVA episode that covered this.&quot;&gt;things&lt;/a&gt; that I had heard about, this seemed like a good fit for “&lt;a href=&quot;http://tv.theiet.org/technology/infopro/turing-2010.cfm&quot; title=&quot;If you want a friendly introduction to machine learning, especially how it&#39;s applied at Microsoft, then Christopher Bishop&#39;s 2010 Turing lecture is a fantastic high level overview.&quot;&gt;machine learning&lt;/a&gt;.”&lt;/p&gt;

&lt;p&gt;But, there’s a problem.&lt;/p&gt;

&lt;p&gt;Machine learning is a &lt;em&gt;hot&lt;/em&gt; area in Computer Science— but it’s intimidating. Like most subjects, there’s &lt;a href=&quot;http://measuringmeasures.com/blog/2010/3/12/learning-about-machine-learning-2nd-ed.html&quot;&gt;a lot&lt;/a&gt; &lt;a href=&quot;http://news.ycombinator.com/item?id=1055042&quot; title=&quot;There are lots of machine learning resources out there, unfortunately most of them scare off beginners.&quot;&gt;to learn&lt;/a&gt; to be an expert in the field. I didn’t need to go very deep; I just needed to understand enough to solve my problem. I found a link to &lt;a href=&quot;http://research.microsoft.com/apps/pubs/default.aspx?id=67956&quot;&gt;the paper&lt;/a&gt; describing the TrueSkill algorithm and I read it several times, but it didn’t make sense. It was only 8 pages long, but it seemed beyond my capability to understand. I felt dumb. Even so, I was too stubborn to give up. Jamie Zawinski &lt;a href=&quot;http://books.google.com/books?id=nneBa6-mWfgC&amp;amp;printsec=frontcover&amp;amp;dq=coders+at+work&amp;amp;ei=hVFeS6CSI5G2NJadyPQC&amp;amp;cd=1#v=onepage&amp;amp;q=%22Not%20knowing%20something%20doesn%27t%20mean%20you%27re%20dumb%22&amp;amp;f=false&quot; title=&quot;The quote comes from Coders @ Work&quot;&gt;said it well&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;“Not knowing something doesn’t mean you’re dumb— it just means you don’t know it.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I learned that the problem isn’t the difficulty of the ideas themselves, but rather that the ideas make too big of a jump from &lt;a href=&quot;http://www.ted.com/talks/arthur_benjamin_s_formula_for_changing_math_education.html&quot; title=&quot;If you&#39;re like most people, then top of your math career was calculus. Although it has interesting concepts, you probably don&#39;t use it anymore. You would have been far better off learning more about statistics to handle all the data you&#39;re faced with. Arthur Benjamin&#39;s 2009 TED talk goes into this.&quot;&gt;the math&lt;/a&gt; that &lt;a href=&quot;http://www.youtube.com/watch?v=TsvPE1EqwQ8&quot; title=&quot;We spend way too much time learning how to calculate, long-divide, integrate by parts, yadda yadda, instead of learning why you&#39;d want to do that or what it&#39;s actually useful for. In the era of Moore&#39;s law, you can bank on computers getting better at doing computational grunt work, but it&#39;s sad that you can&#39;t depend on the education system teaching kids how to take advantage of all that power. Although *slightly* biased towards using tools like Mathematica, this talk by Conrad Wolfram shares a similar viewpoint.&quot;&gt;we typically learn&lt;/a&gt; &lt;a href=&quot;http://news.ycombinator.com/item?id=1058584&quot; title=&quot;To prove this, start talking about even concepts in this blog post at your next party and look at the reaction.&quot;&gt;in school&lt;/a&gt;. This is sad because underneath the apparent complexity lie some beautiful concepts. In hindsight, the algorithm seems relatively simple, but it took me several months to arrive at that conclusion. My hope is that I can short-circuit the haphazard and slow process I went through and take you directly to the beauty of &lt;em&gt;understanding&lt;/em&gt; what’s inside the gem that is the TrueSkill algorithm.&lt;/p&gt;

&lt;h2 id=&quot;skill--probability-of-winning&quot;&gt;Skill ≈ Probability of Winning&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;http://en.wikipedia.org/wiki/File:Osaka07_D2A_Torri_Edwards.jpg&quot;&gt;&lt;img src=&quot;/assets/computing-your-skill/100M_dash_Osaka07_D2A_Torri_Edwards_320.jpg&quot; alt=&quot;Women runners in the 100 meter dash.&quot; title=&quot;World Athletics Championships 2007 in Osaka. Photo from Wikipedia by Eckhard Pecher. Used under the Creative Commons Attribution 2.5 Generic License&quot; /&gt;&lt;/a&gt;&lt;br /&gt;
Skill is tricky to measure. Being good at something takes &lt;a href=&quot;http://www.moserware.com/2008/03/what-does-it-take-to-become-grandmaster.html&quot;&gt;deliberate practice&lt;/a&gt; and sometimes a bit of luck. How do you measure that in a person? You could just ask someone if they’re skilled, but this would only give a rough approximation since people tend to be &lt;a href=&quot;http://en.wikipedia.org/wiki/Overconfidence_effect&quot; title=&quot;It&#39;s worth reading about the overconfidence effect if you haven&#39;t done it before&quot;&gt;overconfident&lt;/a&gt; in their ability. Perhaps a better question is “what would the &lt;a href=&quot;http://en.wikipedia.org/wiki/Units_of_measurement&quot; title=&quot;for example, meters, seconds, etc.&quot;&gt;units&lt;/a&gt; of skill be?” For something like the 100 meter dash, you could just average the number of seconds of several recent sprints. However, for a game like chess, it’s harder because all that’s really important is if you win, lose, or draw.&lt;/p&gt;

&lt;p&gt;It might make sense to just tally the total number of wins and losses, but this wouldn’t be fair to people that played a lot (or a little). Slightly better is to record the percent of games that you win. However, this wouldn’t be fair to people that &lt;a href=&quot;http://www.codinghorror.com/blog/archives/000961.html&quot; title=&quot;Jeff Atwood discussed the concept further.&quot;&gt;beat up on far worse players&lt;/a&gt; or players who got decimated but maybe learned a thing or two. The goal of most games is to win, but if you win &lt;em&gt;too&lt;/em&gt; much, then you’re probably not challenging yourself. Ideally, if all players won about half of their games, we’d say things are balanced. In this ideal scenario, everyone would have a near 50% win ratio, making it impossible to compare using that metric.&lt;/p&gt;

&lt;p&gt;Finding universal units of skill is too hard, so we’ll just give up and not use &lt;em&gt;any&lt;/em&gt; units. The only thing we really care about is roughly who’s better than whom and by how much. One way of doing this is coming up with a &lt;a href=&quot;http://en.wikipedia.org/wiki/Scale_%28social_sciences%29&quot; title=&quot;There&#39;s a lot of cool stuff you can do with scales, specifically things like the Thurstone Case V and Bradley-Terry models, but there just wasn&#39;t enough space to cover these in detail, so I&#39;m only going to passively mention them here, but encourage you to check them out.&quot;&gt;scale&lt;/a&gt; where each person has a unit-less number expressing their rating that you could use for comparison. If a player has a skill rating much higher than someone else, we’d expect them to win if they played each other.&lt;/p&gt;

&lt;p&gt;The key idea is that a single skill number is meaningless. What’s important is how that number compares with others. This is an important point worth repeating: &lt;strong&gt;skill only makes sense if it’s relative to something else&lt;/strong&gt;. We’d like to come up with a system that gives us numbers that are useful for comparing a person’s skill. In particular, we’d like to have a skill rating system that we could use to predict the probability of winning, losing, or drawing in matches based on a numerical rating.&lt;/p&gt;

&lt;p&gt;We’ll spend the rest of our time coming up with a system to calculate and update these skill numbers with the assumption that they can be used to determine the probability of an outcome.&lt;/p&gt;

&lt;h2 id=&quot;what-exactly-is-probability-anyway&quot;&gt;What Exactly is Probability Anyway?&lt;/h2&gt;

&lt;p&gt;You can learn about probability if you’re willing to flip a coin— &lt;em&gt;a lot&lt;/em&gt;. You flip a few times:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;http://www.flickr.com/photos/matthiasxc/3600131465/&quot;&gt;&lt;img src=&quot;/assets/computing-your-skill/pennyheads_400.jpg&quot; alt=&quot;Heads&quot; title=&quot;Photo by matthiasxc on Flickr. Used under the Creative Commons Attribution License&quot; /&gt;&lt;/a&gt;&lt;a href=&quot;http://www.flickr.com/photos/matthiasxc/3600131465/&quot;&gt;&lt;img src=&quot;/assets/computing-your-skill/pennyheads_400.jpg&quot; alt=&quot;Heads&quot; title=&quot;Photo by matthiasxc on Flickr. Used under the Creative Commons Attribution License&quot; /&gt;&lt;/a&gt;&lt;a href=&quot;http://www.flickr.com/photos/matthiasxc/3600942160/in/photostream/&quot;&gt;&lt;img src=&quot;/assets/computing-your-skill/pennytails_matthiasxc_400.jpg&quot; alt=&quot;Tails&quot; title=&quot;Photo by matthiasxc on Flickr. Used under the Creative Commons Attribution License&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Heads, heads, tails!&lt;/p&gt;

&lt;p&gt;Each flip has a &lt;a href=&quot;http://www.codingthewheel.com/archives/the-coin-flip-a-fundamentally-unfair-proposition&quot; title=&quot;It turns out that flipping a coin is actually biased towards the side that is face up when you flip it.&quot;&gt;seemingly&lt;/a&gt; random outcome. However, “random” usually means that you haven’t looked long enough to see a pattern emerge. If we take the total number of heads and divide it by the total number of flips, we see a very definite pattern emerge:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/assets/computing-your-skill/headspercentage.png&quot;&gt;&lt;img src=&quot;/assets/computing-your-skill/headspercentage_576.png&quot; alt=&quot;&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;But you knew that it was going to be a 50-50 chance &lt;em&gt;in the long run&lt;/em&gt;. When saying something is random, we often mean it’s bounded within some range. &lt;a href=&quot;http://www.flickr.com/photos/ladymixy-uk/4063190403/&quot; title=&quot;Photo is &#39;Wee!&#39; by &#39;M i x y&#39; on Flickr. Used under the Creative Commons Attribution License.&quot;&gt;&lt;img style=&quot;border:0; margin: 10px 0px 0px 15px; width: 320px; display: inline; height: 213px&quot; align=&quot;right&quot; src=&quot;/assets/computing-your-skill/target_ladymixy_uk_320.jpg&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It turns out that a better metaphor is to think of a bullseye that archers shoot at. Each arrow will land somewhere near that center. It would be extraordinary to see an arrow hit the bullseye exactly. Most of the arrows will seem to be randomly scattered around it. Although “random,” it’s far more likely that arrows will be near the target than, for example, way out in the woods (well, except if &lt;em&gt;I&lt;/em&gt; was the archer).&lt;/p&gt;

&lt;p&gt;This isn’t a new metaphor; the Greek word στόχος (stochos) refers to a stick set up to aim at. It’s where statisticians get the word &lt;a href=&quot;http://blogs.wnyc.org/radiolab/2009/06/15/stochasticity/&quot; title=&quot;besides, stow chass tick is just fun to pronounce&quot;&gt;stochastic&lt;/a&gt;: a fancy, but slightly more correct word than random. The distribution of arrows brings up another key point:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;All things are possible, but not all things are &lt;em&gt;probable&lt;/em&gt;.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Probability has &lt;a href=&quot;http://www.youtube.com/watch?v=3pRM4v0O29o#t=5m00s&quot; title=&quot;This is a great talk about the history of probability by Keith Devlin. This specific point comes up around the 5 minute mark.&quot;&gt;changed how ordinary people think&lt;/a&gt;, a feat that rarely happens in mathematics. The very idea that you could understand &lt;em&gt;anything&lt;/em&gt; about future outcomes is such a big leap in thought that it &lt;a href=&quot;http://www.amazon.com/gp/product/0465009107?ie=UTF8&amp;amp;tag=moserware-20&amp;amp;linkCode=as2&amp;amp;camp=1789&amp;amp;creative=390957&amp;amp;creativeASIN=0465009107&quot; title=&quot;This is the book that is described in the video of the previous link. It&#39;s a quick read and interesting to see how mathematics is really developed.&quot;&gt;baffled Blaise Pascal&lt;/a&gt;, one of the best mathematicians in history.&lt;/p&gt;

&lt;p&gt;In the summer of 1654, Pascal exchanged a &lt;a href=&quot;http://www.york.ac.uk/depts/maths/histstat/pascal.pdf&quot; title=&quot;You can read the letters here&quot;&gt;series of letters&lt;/a&gt; with &lt;a href=&quot;http://en.wikipedia.org/wiki/Pierre_de_Fermat&quot;&gt;Pierre de Fermat&lt;/a&gt;, another brilliant mathematician, concerning an “unfinished game.” Pascal wanted to know how to divide money among gamblers if they have to leave before the game is finished. Splitting the money fairly required some notion of the probability of each outcome if the game had been played until the end. This problem gave birth to the field of probability and laid the foundation for lots of fun things like life insurance, casino games, and scary &lt;a href=&quot;http://en.wikipedia.org/wiki/Derivative_%28finance%29&quot; title=&quot;Warren Buffett calls them financial weapons of mass destruction&quot;&gt;financial derivatives&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;But probability is more general than predicting the future— it’s a measure of your ignorance of something. It doesn’t matter if the event is set to happen in the future or if it happened months ago. All that matters is that &lt;em&gt;you lack knowledge in something&lt;/em&gt;. Just because we lack knowledge doesn’t mean we can’t do anything useful, but we’ll have to do a lot more coin flips to see it.&lt;/p&gt;

&lt;h2 id=&quot;aggregating-observations&quot;&gt;Aggregating Observations&lt;/h2&gt;

&lt;p&gt;The real magic happens when we aggregate a lot of observations. What would happen if you flipped a coin 1000 times and counted the number of heads? Lots of things are possible, but in my case I got 505 heads. That’s about half, so it’s not surprising. I can graph this by creating a bar chart and put all the possible outcomes (getting 0 to 1000 heads) on the bottom and the total number of times that I got that particular count of heads on the vertical axis. For 1 outcome of 505 total heads it would look like this:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/assets/computing-your-skill/totalheads1.png&quot;&gt;&lt;img src=&quot;/assets/computing-your-skill/totalheads1_576.png&quot; alt=&quot;&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Not too exciting. But what if we did it again? This time I got 518 heads. I can add that to the chart:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/assets/computing-your-skill/totalheads2.png&quot;&gt;&lt;img src=&quot;/assets/computing-your-skill/totalheads2_576.png&quot; alt=&quot;&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Doing it 8 more times gave me 489, 515, 468, 508, 492, 475, 511, and once again, I got 505. The chart now looks like this:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/assets/computing-your-skill/totalheads10.png&quot;&gt;&lt;img src=&quot;/assets/computing-your-skill/totalheads10_576.png&quot; alt=&quot;&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And after a billion times, a total of one &lt;em&gt;trillion&lt;/em&gt; flips, I got this:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/assets/computing-your-skill/totalheads1e9.png&quot;&gt;&lt;img src=&quot;/assets/computing-your-skill/totalheads1e9_576.png&quot; alt=&quot;&quot; title=&quot;In case you&#39;re wondering, I used a cryptographically strong random number generator and kept all my two CPU cores busy for a few hours running it as an idle job.&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In all the flips, I never got fewer than 407 total heads and I never got more than 600. Just for fun, we can zoom in on this region:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/assets/computing-your-skill/totalheads1e9_zoomed.png&quot;&gt;&lt;img src=&quot;/assets/computing-your-skill/totalheads1e9_zoomed_576.png&quot; alt=&quot;&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
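&lt;p&gt;If you’d like to try a (much) smaller version of this experiment yourself, here’s a minimal sketch in Python. The function names and the seed are just my choices for illustration, and it uses Python’s ordinary pseudo-random generator rather than a cryptographically strong one, so treat it as a toy and not a reproduction of the chart above.&lt;/p&gt;

```python
import random
from collections import Counter


def count_heads(flips, rng):
    # Each random bit stands for one fair coin flip; the 1 bits are the heads.
    return bin(rng.getrandbits(flips)).count("1")


def tally_sets(sets, flips=1000, seed=None):
    # Repeat the experiment `sets` times and tally how often each head count occurs.
    rng = random.Random(seed)
    return Counter(count_heads(flips, rng) for _ in range(sets))


# 10,000 sets of 1000 flips (the seed is arbitrary; it only makes the run repeatable).
tally = tally_sets(sets=10_000, seed=2010)
most_common_count, times_seen = tally.most_common(1)[0]
print(most_common_count, times_seen)
```

&lt;p&gt;Plotting &lt;code&gt;tally&lt;/code&gt; as a bar chart should give a rough, jagged version of the same shape. The more sets you run, the smoother it gets.&lt;/p&gt;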

&lt;p&gt;As we do more sets of flips, the &lt;a href=&quot;http://en.wikipedia.org/wiki/Binomial_distribution&quot; title=&quot;The jagged edges are actually part of a Binomial Distribution. This is discussed more in the accompanying math paper to this article.&quot;&gt;jagged edges&lt;/a&gt; smooth out to give us the famous “&lt;a href=&quot;http://en.wikipedia.org/wiki/Normal_distribution&quot;&gt;bell curve&lt;/a&gt;” that you’ve probably seen before. Math guys love to refer to it as a “&lt;a href=&quot;http://en.wikipedia.org/wiki/Gaussian_function&quot;&gt;Gaussian&lt;/a&gt;” curve because it was used by the German mathematician Carl Gauss in 1809 to investigate errors in astronomical data. He came up with an exact formula of what to expect if we flipped a coin an infinite number of times (so that we don’t have to). This is such a famous result that you can see the curve and its equation if you look closely at the middle of an old 10 Deutsche Mark banknote bearing Gauss’s face:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/assets/computing-your-skill/10_DM_Gauss_Cropped.jpg&quot;&gt;&lt;img src=&quot;/assets/computing-your-skill/10_DM_Gauss_Cropped_576.jpg&quot; alt=&quot;&quot; title=&quot;I wonder: what is the probability of having a mathematician on (legal) USA currency?&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Don’t miss the forest for all the flippin’ trees. The curve is showing you the density of all possible outcomes. By density, I mean how tall the curve gets at a certain point. For example, in counting the total number of heads out of 1000 flips, I expected that 500 total heads would be the most popular outcome and indeed it was. I saw 25,224,637 out of a billion sets that had exactly 500 heads. This works out to about 2.52% of all outcomes. In contrast, if we look at the bucket for 450 total heads, I only saw this happen 168,941 times, or roughly 0.017% of the time. This confirms your observation that the curve is denser, that is, &lt;em&gt;taller&lt;/em&gt; at the mean of 500 than further away at 450.&lt;/p&gt;
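&lt;p&gt;You can sanity-check those observed percentages without flipping anything. Assuming a fair coin, the chance of exactly &lt;em&gt;k&lt;/em&gt; heads in &lt;em&gt;n&lt;/em&gt; flips is the number of ways to pick which flips land heads divided by the 2&lt;sup&gt;n&lt;/sup&gt; equally likely sequences. Here’s a small sketch (the function name is mine, not from any library):&lt;/p&gt;

```python
from math import comb


def probability_of_heads(heads, flips=1000):
    # Ways to choose which flips land heads, divided by all equally likely sequences.
    return comb(flips, heads) / 2 ** flips


print(probability_of_heads(500))  # about 0.0252, i.e. roughly 2.52%
print(probability_of_heads(450))  # about 0.00017, i.e. roughly 0.017%
```

&lt;p&gt;Both values line up closely with what I counted in the billion sets.&lt;/p&gt;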

&lt;p&gt;This confirms the key point: &lt;strong&gt;all things are possible, but outcomes are not all equally probable&lt;/strong&gt;. There are &lt;a href=&quot;http://www.thisamericanlife.org/Radio_Episode.aspx?episode=398&quot; title=&quot;Here&#39;s a This American Life episode dedicated to longshots&quot;&gt;longshots&lt;/a&gt;. Professional athletes &lt;a href=&quot;http://www.gladwell.com/2000/2000_08_21_a_choking.htm&quot; title=&quot;It&#39;s interesting to read Gladwell&#39;s description of the difference between these two.&quot;&gt;panic or ‘choke’&lt;/a&gt;. The &lt;a href=&quot;http://en.wikipedia.org/wiki/Deep_Blue_%E2%80%93_Kasparov,_1997,_Game_6&quot; title=&quot;Ok, so Kasparov might have had a simple mistake in the last game, but given enough time with Moore&#39;s law, it was going to happen eventually, it just so happened that it was him.&quot;&gt;world’s best chess players have bad days&lt;/a&gt;. Additionally, tales about underdogs &lt;a href=&quot;http://www.youtube.com/watch?v=Hv8x9x5A49s&quot; title=&quot;I think the best part about Mine That Bird winning the Kentucky Derby in 2009 is that it took the TV announcer about 10 seconds to get the horse&#39;s name once it took the lead at the end.&quot;&gt;make us smile&lt;/a&gt;— the longer the odds the better. Unexpected outcomes happen, but there’s still a lot of predictability out there.&lt;/p&gt;

&lt;p&gt;It’s not just coin flips. The bell curve shows up in lots of places, from casino games, to the thickness of tree bark, to the measurements of a person’s IQ. Lots of people have looked at the world and have come up with Gaussian models. It’s easy to think of the world as one big, bell shaped playground.&lt;/p&gt;

&lt;p&gt;But the real world isn’t always Gaussian. History books are full of “&lt;a href=&quot;http://www.amazon.com/gp/product/1400063515?ie=UTF8&amp;amp;tag=moserware-20&amp;amp;linkCode=as2&amp;amp;camp=1789&amp;amp;creative=390957&amp;amp;creativeASIN=1400063515&quot; title=&quot;The story goes that people used to use the phrase &#39;black swan&#39; to have the same meaning as &#39;when pigs fly&#39; until black swans were actually discovered to exist.&quot;&gt;Black Swan&lt;/a&gt;” events. Stock market crashes and the invention of the computer are statistical outliers that Gaussian models tend not to predict well, but these events shock the world and forever change it. This type of reality isn’t covered by the bell curve, which Black Swan author &lt;a href=&quot;http://www.fooledbyrandomness.com/&quot;&gt;Nassim Taleb&lt;/a&gt; calls the “&lt;a href=&quot;http://books.google.com/books?id=YdOYmYA2TJYC&amp;amp;lpg=PA229&amp;amp;dq=%22the%20bell%20curve%20that%20great%20intellectual%20fraud%22&amp;amp;pg=PA229#v=onepage&amp;amp;q=%22the%20bell%20curve%20that%20great%20intellectual%20fraud%22&amp;amp;f=false&quot;&gt;Great Intellectual Fraud&lt;/a&gt;.” These events would have such low probability that no one would predict them actually happening. There’s a different view of randomness that is a fascinating playground of &lt;a href=&quot;http://en.wikipedia.org/wiki/Beno%C3%AEt_Mandelbrot&quot;&gt;Benoît Mandelbrot&lt;/a&gt; &lt;a href=&quot;http://www.amazon.com/gp/product/0465043577/ref=pd_lpo_k2_dp_sr_1?pf_rd_p=486539851&amp;amp;pf_rd_s=lpo-top-stripe-1&amp;amp;pf_rd_t=201&amp;amp;pf_rd_i=0465043550&amp;amp;pf_rd_m=ATVPDKIKX0DER&amp;amp;pf_rd_r=1J3P8AMPM2MT0QD3S5K3#noop&quot;&gt;and his fractals&lt;/a&gt; that better explain some of these events, but we will ignore all of this to keep things simple. We’ll acknowledge that the Gaussian view of the world isn’t &lt;em&gt;always&lt;/em&gt; right, any more than a map of the world is the actual terrain.&lt;/p&gt;

&lt;p&gt;The Gaussian worldview assumes everything will typically be some average value and then treats everything else as increasingly less likely “errors” as you exponentially drift away from the center (Gauss used the curve to measure &lt;em&gt;errors&lt;/em&gt; in astronomical data after all). However, it’s not fair to treat real observations from the world as “errors” any more than it is to say that a person is an “error” from the “average human” that is half male and half female. Some of these same problems can come up treating a person as having skill that is Gaussian. Disclaimers aside, we’ll go along with George Box’s &lt;a href=&quot;http://books.google.com/books?id=63v--IZrNtsC&amp;amp;lpg=PA61&amp;amp;dq=%22all%20models%20are%20wrong%22%20george%20box&amp;amp;pg=PA61#v=onepage&amp;amp;q=&amp;amp;f=false&quot; title=&quot;See the bottom of page 61 here, although he said it much earlier, at least in 1987. I first heard of this quote in a talk by Peter Norvig on the usefulness of even poor models given lots of data.&quot;&gt;view&lt;/a&gt; that “all models are wrong, but some models are useful.”&lt;/p&gt;

&lt;h2 id=&quot;gaussian-basics&quot;&gt;Gaussian Basics&lt;/h2&gt;

&lt;p&gt;Gaussian curves are completely described by two values:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;The mean (average) value which is often represented by the Greek letter μ (mu)&lt;/li&gt;
  &lt;li&gt;The standard deviation, represented by the Greek letter σ (sigma). This indicates how far apart the data is spread out.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In counting the total number of heads in 1000 flips, the mean was 500 and the standard deviation was &lt;a href=&quot;http://www.wolframalpha.com/input/?i=sqrt%281000*.5*%281-.5%29%29&quot; title=&quot;I go into this in more detail in the accompanying math paper.&quot;&gt;about 16&lt;/a&gt;. In general, 68% of the outcomes will be within ±1 standard deviation (e.g. 484-516 in the experiment), 95% within ±2 standard deviations (e.g. 468-532) and 99.7% within ±3 standard deviations (452-548):&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/assets/computing-your-skill/NormalDistributionWithPercentages.png&quot; title=&quot;I got the idea for this diagram from the Wikipedia article on the normal distribution. However, the color and look didn&#39;t match the rest of the post, so I recreated it in Excel.&quot;&gt;&lt;img src=&quot;/assets/computing-your-skill/NormalDistributionWithPercentages_576.png&quot; alt=&quot;&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;An important takeaway is that the bell curve allows for &lt;em&gt;all&lt;/em&gt; possibilities, but each possibility is most definitely not equally likely. The bell curve gives us a model to calculate how likely something should be given an average value and a spread. Notice how outcomes sharply become less probable as we drift further away from the mean value.&lt;/p&gt;
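&lt;p&gt;If you’d like to check those percentages yourself, here is a minimal Python sketch. The helper names &lt;code&gt;binomial_sigma&lt;/code&gt; and &lt;code&gt;fraction_within&lt;/code&gt; are made up for illustration. It uses the unrounded standard deviation (about 15.8) rather than the rounded 16 quoted above, so the printed ranges can differ from the text by a point.&lt;/p&gt;

```python
import math


def binomial_sigma(flips, p_heads=0.5):
    """Standard deviation of the number of heads in `flips` coin tosses."""
    return math.sqrt(flips * p_heads * (1 - p_heads))


def fraction_within(k):
    """Share of a Gaussian's area within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


mean = 1000 * 0.5
sigma = binomial_sigma(1000)

for k in (1, 2, 3):
    low, high = mean - k * sigma, mean + k * sigma
    print(f"within {k} sigma ({low:.0f}-{high:.0f}): {fraction_within(k):.1%}")
```

&lt;p&gt;The only real math here is the error function, which is what turns “k standard deviations” into “fraction of the area under the bell curve.”&lt;/p&gt;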

&lt;p&gt;While we’re looking at the Gaussian curve, it’s important to look at -3σ away from the mean on the left side. As you can see, &lt;em&gt;most&lt;/em&gt; of the area under the curve is to the right of this point. I mention this because &lt;strong&gt;the TrueSkill algorithm uses the -3σ mark as a (very) conservative estimate for your skill&lt;/strong&gt;. You’re probably better than this conservative estimate, and you’re most likely not worse than it. Therefore, it’s a stable number for comparing yourself to others and is useful for sorting a leaderboard.&lt;/p&gt;
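&lt;p&gt;To make the conservative estimate concrete, here is a small sketch that sorts a leaderboard by μ - 3σ. The players and their numbers are hypothetical and the function name &lt;code&gt;conservative_rating&lt;/code&gt; is my own; only the “mean minus three standard deviations” rule comes from the text above.&lt;/p&gt;

```python
def conservative_rating(mu, sigma):
    """Skill level the player is very likely to exceed: mean minus 3 sigma."""
    return mu - 3 * sigma


# Hypothetical players as (mean, standard deviation); not data from the post.
players = {
    "veteran": (30.0, 1.5),
    "newcomer": (33.0, 7.0),
    "regular": (27.0, 2.0),
}

leaderboard = sorted(
    players,
    key=lambda name: conservative_rating(*players[name]),
    reverse=True,
)
print(leaderboard)
```

&lt;p&gt;Notice that the newcomer has the highest mean but lands at the bottom: with so much uncertainty, the conservative estimate refuses to give them credit yet.&lt;/p&gt;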

&lt;h2 id=&quot;d-bell-curves-multivariate-gaussians&quot;&gt;3D Bell Curves: Multivariate Gaussians&lt;/h2&gt;

&lt;p&gt;A non-intuitive observation is that Gaussian distributions can occur in more than the two dimensions that we’ve seen so far. You can sort of think of a Gaussian in three dimensions as a mountain. Here’s an example:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/assets/computing-your-skill/Gaussian_3D_Circular.png&quot;&gt;&lt;img src=&quot;/assets/computing-your-skill/Gaussian_3D_Circular.png&quot; alt=&quot;&quot; title=&quot;In case you&#39;re wondering, I used GNU Plot to make this. See the accompanying math paper for more details.&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this plot, taller regions represent higher probabilities. As you can see, not all things are equally probable. The most probable value is the mean value that is right in the middle and then things sharply decline away from it.&lt;/p&gt;

&lt;p&gt;In maps of &lt;em&gt;real&lt;/em&gt; mountains, you often see a 2D contour plot where each line represents a different elevation (e.g. every 100 feet):&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/assets/computing-your-skill/PikesPeakTopoMap.png&quot; title=&quot;I took this snapshot from the 7.5-Minute Series Topographic Map of Pikes Peak Quadrangle from the U.S. Geological Survey (USGS). My wife and I went to Pikes Peak the day we landed in Colorado from Indianapolis. One thing is certain: I felt those elevation lines :). For best experiences, acclimate yourself for a few days and then go.&quot;&gt;&lt;img src=&quot;/assets/computing-your-skill/PikesPeakTopoMap_640.png&quot; alt=&quot;&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The closer the lines on the map, the sharper the inclines. You can do something similar for 2D representations of 3D Gaussians. In textbooks, you often just see a 2D representation that looks like this:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/assets/computing-your-skill/Gaussian_2D_Contour.png&quot;&gt;&lt;img src=&quot;/assets/computing-your-skill/Gaussian_2D_Contour_640.png&quot; alt=&quot;&quot; title=&quot;It&#39;s unfortunate that most books never show the 3D perspective, it&#39;s much easier to see where it comes from.&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is called an “isoprobability contour” plot. It’s just a fancy way of saying “things that have the same probability will be the same color.” Note that it’s still in three dimensions. In this case, the third dimension is color intensity instead of the height you saw on a surface plot earlier. I like to think of contour plots as treasure maps for playing the “you’re getting warmer…” game. In this case, black means “you’re cold,” red means “you’re getting warmer…,” and yellow means “you’re on fire!” which corresponds to the highest probability.&lt;/p&gt;

&lt;p&gt;See? Now you understand Gaussians and know that “&lt;a href=&quot;http://en.wikipedia.org/wiki/Multivariate_normal_distribution&quot;&gt;multivariate Gaussians&lt;/a&gt;” aren’t as scary as they sound.&lt;/p&gt;

&lt;h2 id=&quot;lets-talk-about-chess&quot;&gt;Let’s Talk About Chess&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;http://en.wikipedia.org/wiki/File:ChessSet.jpg&quot; title=&quot;&#39;Chess Set&#39; by Alan Light on Wikipedia, retouched by Andre Riemann. Licensed under the Creative Commons Attribution ShareAlike 3.0 License.&quot;&gt;&lt;img style=&quot;border:0; margin: 0px 0px 5px 10px; display: inline&quot; align=&quot;right&quot; src=&quot;/assets/computing-your-skill/ChessSet_160.jpg&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There’s still more to learn, but we’ll pick up what we need along the way. We already have enough tools to do something useful. To warm up, let’s talk about chess because ratings are well-defined there.&lt;/p&gt;

&lt;p&gt;In chess, a bright beginner is expected to have a rating around 1000. Keep in mind that ratings have no units; it’s just a number that is only meaningful when compared to someone else’s number. By &lt;a href=&quot;http://www.chessbase.com/newsdetail.asp?newsid=4326&quot; title=&quot;This 200 point class tradition was established by the Harkness system developed in the early 1950&#39;s. It was a popular precursor to the Elo system that we&#39;ll cover shortly.&quot;&gt;tradition&lt;/a&gt;, a difference of 200 indicates the better ranked player is expected to win 75% of the time. Again, nothing is special about the number 200, it was just chosen to be the difference needed to get a 75% win ratio and effectively defines a “class” of player.&lt;/p&gt;

&lt;p&gt;I’ve slowly been practicing and have a rating around 1200. This means that if I play a bright beginner with a rating of 1000, I’m expected to win three out of four games.&lt;/p&gt;

&lt;p&gt;We can start to visualize a match between me and a bright beginner by drawing two bell curves that have means of 1000 and 1200 respectively, with both having a standard deviation of 200:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/assets/computing-your-skill/bell_curves_of_bright_beginner_vs_jeff_before.png&quot;&gt;&lt;img src=&quot;/assets/computing-your-skill/bell_curves_of_bright_beginner_vs_jeff_before_576.png&quot; alt=&quot;&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The above graph shows what the ratings represent: they’re an indicator of how we’re &lt;em&gt;expected&lt;/em&gt; to perform if we play a game. The most likely performance is exactly what the rating is (the mean value). One non-obvious point is that you can &lt;a href=&quot;http://mathworld.wolfram.com/NormalDifferenceDistribution.html&quot; title=&quot;This subtraction idea is also covered more in the accompanying math paper.&quot;&gt;subtract two bell curves and get another bell curve&lt;/a&gt;. The new center is the difference of the means and the resulting curve is a bit wider than the previous curves. By taking my skill curve (red) and subtracting the beginner’s curve (blue), you’ll get this resulting curve (purple):&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/assets/computing-your-skill/bell_curves_difference.png&quot;&gt;&lt;img src=&quot;/assets/computing-your-skill/bell_curves_difference_576.png&quot; alt=&quot;&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Note that it’s centered at 1200 - 1000 = 200. Although interesting to look at on its own, the curve also gives some useful information. It represents all possible game outcomes between me and the beginner. The middle shows that I’m expected to be 200 points better. The far left side shows that there is a tiny chance that the beginner has a game where he plays as if he’s 700 points better than I am. The far right shows that there is a tiny chance that I’ll play as if I’m 1100 points better. The curve actually goes on forever in both directions, but the probability of those outcomes is so small that it’s effectively zero.&lt;/p&gt;

&lt;p&gt;As a player, you really only care about one very specific point on this curve: zero. Since I have a higher rating, I’m interested in all possible outcomes where the difference is positive. These are the outcomes where I’m expected to outperform the beginner. On the other hand, the beginner is keeping his eye on everything to the left of zero. These are the outcomes where the performance difference is negative, implying that he outperforms me.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/assets/computing-your-skill/performance_difference_shaded_to_zero.png&quot;&gt;&lt;img src=&quot;/assets/computing-your-skill/performance_difference_shaded_to_zero_576.png&quot; alt=&quot;&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can plug a few numbers into &lt;a href=&quot;http://www.wolframalpha.com/input/?i=CDF%5BNormalDistribution%5B200%2C+200+*+Sqrt%5B2%5D%5D%2C+0%5D&quot; title=&quot;For example: Wolfram Alpha or Excel&quot;&gt;a calculator&lt;/a&gt; and see that there is about a 24% probability that the performance difference will be negative, implying the beginner wins, and a 76% chance that the difference will be positive, meaning that I win. This is roughly the 75% that we were expecting for a 200 point difference.&lt;/p&gt;
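&lt;p&gt;If you’d rather not trust a web calculator, the same numbers can be sketched with Python’s standard library (3.8 or later). I’m assuming the two performances are independent, which is what lets &lt;code&gt;NormalDist&lt;/code&gt; subtract one curve from the other; the variable names are just for illustration.&lt;/p&gt;

```python
from statistics import NormalDist

me = NormalDist(mu=1200, sigma=200)
beginner = NormalDist(mu=1000, sigma=200)

# Subtracting independent Gaussians: the means subtract, the variances add,
# so the result is centered at 200 and is wider than either input curve.
difference = me - beginner

# Area to the left of zero = outcomes where the beginner outperforms me.
p_beginner_wins = difference.cdf(0)
p_i_win = 1 - p_beginner_wins

print(f"difference: mean={difference.mean:.0f}, sigma={difference.stdev:.1f}")
print(f"beginner wins: {p_beginner_wins:.1%}, I win: {p_i_win:.1%}")
```

&lt;p&gt;The width of the difference curve is 200 times the square root of 2 (about 283), which is the same value that appears in the calculator link above.&lt;/p&gt;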

&lt;p&gt;This has been a bit too concrete for my particular match with a beginner. We can generalize it by creating another curve where the horizontal axis represents the difference in player ratings and the vertical axis represents the total probability of winning given that rating difference:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/assets/computing-your-skill/cdf_chess_given_rating_difference.png&quot;&gt;&lt;img src=&quot;/assets/computing-your-skill/cdf_chess_given_rating_difference_640.png&quot; alt=&quot;&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As expected, having two players with equal ratings, and thus a rating difference of 0, implies the odds of winning are 50%. Likewise, if you look at the -200 mark, you see the curve is at the 24% that we calculated earlier. Similarly, +200 is at the 76% mark. This also shows that outcomes on the far left side are quite unlikely. For example, the odds of me winning a game against &lt;a href=&quot;http://en.wikipedia.org/wiki/Magnus_Carlsen&quot; title=&quot;Since Kasparov stopped playing professionally, Magnus is the top guy. Not surprisingly, Kasparov is now Magnus&#39;s teacher.&quot;&gt;Magnus Carlsen&lt;/a&gt;, who is at the top of the &lt;a href=&quot;http://ratings.fide.com/top.phtml?list=men&quot; title=&quot;The 19 year old Magnus was at the top of the FIDE leaderboard at the time of this writing (March 2010)&quot;&gt;chess leaderboard&lt;/a&gt; with a rating of 2813, would be at the -1613 mark (1200 - 2813) on this chart and have a probability near one in a &lt;em&gt;billion&lt;/em&gt;. I won’t hold my breath. (Actually, most chess groups use a slightly different curve, but the ideas are the same. See the &lt;a href=&quot;/assets/computing-your-skill/The%20Math%20Behind%20TrueSkill.pdf&quot;&gt;accompanying math paper&lt;/a&gt; for details.)&lt;/p&gt;

&lt;p&gt;All of these curves were probabilities of what &lt;em&gt;might&lt;/em&gt; happen, not what &lt;em&gt;actually&lt;/em&gt; happened. In actuality, let’s say I lost the game by some silly blunder (oops!). The question that the beginner wants to know is how much his rating will go up. It also makes sense that my rating will go down as a punishment for the loss. The harder question is just &lt;em&gt;how much&lt;/em&gt; should the ratings change?&lt;/p&gt;

&lt;p&gt;By winning, the beginner demonstrated that he was probably better than the 25% winning probability we thought he would have. One way of updating ratings is to imagine that each player bets a certain amount of his rating on each game. The amount of the bet is determined by the probability of the outcome. In addition, we decide how dramatic the ratings change should be for an individual game. If you believe the most recent game should count 100%, then you’d expect my rating to go down a lot and his to go up a lot. The decision of how much the most recent game should count leads to what chess guys call the multiplicative “K-factor.”&lt;/p&gt;

&lt;p&gt;The K-Factor is what we multiply a probability by to get the total amount of a rating change. It reflects the maximum possible change in a person’s rating. A reasonable choice of a weight is that the most recent game counts about 7% which leads to a K-factor of 24. New players tend to have more fluctuations than well-established players, so new players might get a K-Factor of 32 while grand masters have a K-factor around 10. Here’s how the K-Factor changes with respect to how much the latest game should count:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/assets/computing-your-skill/KFactorAlphaImpact.png&quot;&gt;&lt;img src=&quot;/assets/computing-your-skill/KFactorAlphaImpact_576.png&quot; alt=&quot;&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Using a K-Factor of 24 means that my rating will now be lowered to 1182 and the beginner’s will rise to 1018. Our curves are now closer together:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/assets/computing-your-skill/BeginnerVsJeffAfterUpdate.png&quot;&gt;&lt;img src=&quot;/assets/computing-your-skill/BeginnerVsJeffAfterUpdate_576.png&quot; alt=&quot;&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Note that our standard deviations never change. Here are the probabilities if we were to play again:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/assets/computing-your-skill/performance_difference_shaded_to_zero_after.png&quot;&gt;&lt;img src=&quot;/assets/computing-your-skill/performance_difference_shaded_to_zero_after_576.png&quot; alt=&quot;&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This method is known as the &lt;a href=&quot;http://en.wikipedia.org/wiki/Elo_rating_system&quot;&gt;Elo rating system&lt;/a&gt;, named after &lt;a href=&quot;http://en.wikipedia.org/wiki/Arpad_Elo&quot;&gt;Arpad Elo&lt;/a&gt;, the chess enthusiast who created it. It’s relatively simple to implement and most games that calculate skill end here.&lt;/p&gt;
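&lt;p&gt;Here is a rough sketch of that update for my unlucky game. It assumes the usual Elo form, where the rating moves by K times the difference between the actual score (1 for a win, 0 for a loss) and the expected score, and it uses the Gaussian curve from this post rather than the slightly different curve most chess groups use. The function names are hypothetical.&lt;/p&gt;

```python
from math import hypot
from statistics import NormalDist


def expected_score(rating, opponent, sigma=200):
    """Win probability under the Gaussian performance model used in this post."""
    difference = NormalDist(rating - opponent, hypot(sigma, sigma))
    return 1 - difference.cdf(0)


def elo_update(rating, opponent, score, k_factor=24):
    """Move the rating by K times (actual score minus expected score)."""
    return rating + k_factor * (score - expected_score(rating, opponent))


me_after = elo_update(1200, 1000, score=0)        # I lost
beginner_after = elo_update(1000, 1200, score=1)  # the beginner won

print(round(me_after), round(beginner_after))  # prints: 1182 1018
```

&lt;p&gt;Whatever I lose, the beginner gains, so the total number of rating points in the system stays the same.&lt;/p&gt;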

&lt;h2 id=&quot;i-thought-you-said-youd-talk-about-trueskill&quot;&gt;I Thought You Said You’d Talk About TrueSkill?&lt;/h2&gt;

&lt;p&gt;Everything so far has just been prerequisites to the main event; the TrueSkill paper assumes you’re already familiar with it. It was all sort of new to me, so it took a while to get comfortable with the Elo ideas. Although the Elo model will get you far, there are a few notable things it doesn’t handle well:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;&lt;strong&gt;Newbies&lt;/strong&gt; - In the Elo system, you’re typically assigned a “provisional” rating for the first 20 games. These games tend to have a higher K-factor associated with them in order to let the algorithm determine your skill faster before it’s slowed down by a non-provisional (and smaller) K-factor. We would like an algorithm that converges quickly onto a player’s true skill (get it?) to not waste their time having unbalanced matches. This means the algorithm should start giving reasonable approximations of skill within 5-10 games.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Teams&lt;/strong&gt; - Elo was explicitly designed for two players. Efforts to adapt it to work for multiple people on multiple teams have primarily been unsophisticated hacks. One such approach is to treat teams as individual players that duel against the other players on the opposing teams and then apply the average of the duels. This is the “duelling heuristic” mentioned in the TrueSkill paper. I implemented it in the &lt;a href=&quot;http://github.com/moserware/Skills&quot;&gt;accompanying project&lt;/a&gt;. It’s ok, but seems a bit too hackish and doesn’t converge well.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Draws&lt;/strong&gt; - Elo treats draws as a half win and half loss. This doesn’t seem fair because draws can tell you a lot. A draw implies you were evenly paired, whereas a win indicates you’re better but leaves you unsure how much better. Likewise, a loss indicates you did worse, but you don’t really know how much worse. So it seems that a draw is important to explicitly model.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The TrueSkill algorithm generalizes Elo by keeping track of two variables: your average (mean) skill &lt;em&gt;and&lt;/em&gt; the system’s uncertainty about that estimate (your standard deviation). It does this instead of relying on something like a fixed K-factor. Essentially, this gives the algorithm a dynamic K-factor. This addresses the newbie problem because it removes the need to have “provisional” games. In addition, it addresses the other problems in a nice statistical manner. Tracking these two values is so fundamental to the algorithm that Microsoft researchers informally referred to it as the μσ (mu-sigma) system until the marketing guys gave it the name TrueSkill.&lt;/p&gt;

&lt;p&gt;We’ll go into the details shortly, but it’s helpful to get a quick visual overview of what TrueSkill does. Let’s say we have Eric, an experienced player who has played a lot and established his rating over time. In addition, we have a newbie: Natalia.&lt;/p&gt;

&lt;p&gt;Here’s what their skill curves might look like before a game:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/assets/computing-your-skill/TrueSkillCurvesBeforeExample.png&quot;&gt;&lt;img src=&quot;/assets/computing-your-skill/TrueSkillCurvesBeforeExample_576.png&quot; alt=&quot;&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And after Natalia wins:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/assets/computing-your-skill/TrueSkillCurvesAfterExample.png&quot;&gt;&lt;img src=&quot;/assets/computing-your-skill/TrueSkillCurvesAfterExample_576.png&quot; alt=&quot;&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Notice how Natalia’s skill curve becomes narrower and taller (i.e. makes a big update) while Eric’s curve barely moves. This shows that the TrueSkill algorithm thinks that she’s probably better than Eric, but doesn’t know how much better. Although TrueSkill is a little more confident about Natalia’s mean after the game (i.e. it’s now taller in the middle), it’s still very uncertain. Looking at her updated bell curve shows that her skill could be between 15 and 50.&lt;/p&gt;

&lt;p&gt;The rest of this post will explain how calculations like this occurred and how much more complicated scenarios can occur. But to understand it well enough to implement it, we’ll need to learn a couple of new things.&lt;/p&gt;

&lt;h2 id=&quot;bayesian-probability&quot;&gt;Bayesian Probability&lt;/h2&gt;

&lt;p&gt;Most basic statistics classes focus on frequencies of events occurring. For example, the probability of getting a red marble when randomly drawing from a jar that has 3 red marbles and 7 blue marbles is 30%. Another example is that the probability of rolling two dice and getting a total of 7 is &lt;a href=&quot;http://www.wolframalpha.com/input/?i=probability+getting+7+two+dice&quot;&gt;about 17%&lt;/a&gt;. The key idea in both of these examples is that you can count each type of outcome and then compute the &lt;em&gt;frequency&lt;/em&gt; directly. Although helpful in calculating your odds at casino games, “frequentist” thinking is not that helpful with many practical applications, like finding your skill in a team.&lt;/p&gt;

&lt;p&gt;A different approach is to think of probability as degree of belief in something. The basic idea is that you have some &lt;strong&gt;prior belief&lt;/strong&gt; and then you observe some &lt;strong&gt;evidence&lt;/strong&gt; that updates your belief leaving you with an updated &lt;strong&gt;posterior&lt;/strong&gt; belief. As you might expect, learning about new evidence will typically make you more certain about your belief.&lt;/p&gt;

&lt;p&gt;Let’s assume that you’re trying to find a treasure on a map. The treasure could be anywhere on the map, but you have a hunch that it’s probably around the center of the map and increasingly less likely as you move away from the center. We could track the probability of finding the treasure using the 3D multivariate Gaussian we saw earlier:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/assets/computing-your-skill/Gaussian_3D_Circular.png&quot;&gt;&lt;img src=&quot;/assets/computing-your-skill/Gaussian_3D_Circular_576.png&quot; alt=&quot;&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now, let’s say that after studying a book about the treasure, you’ve learned that there’s a strong likelihood that the treasure is somewhere along the diagonal line on the map. Perhaps this was based on some secret clue. Your clue information doesn’t necessarily mean the treasure will be &lt;em&gt;exactly&lt;/em&gt; on that line, but rather that the treasure will most likely be near it. The &lt;strong&gt;likelihood function&lt;/strong&gt; might look like this in 3D:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/assets/computing-your-skill/Gaussian_3D_Likelihood.png&quot;&gt;&lt;img src=&quot;/assets/computing-your-skill/Gaussian_3D_Likelihood_576.png&quot; alt=&quot;&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We’d like to use our &lt;em&gt;prior&lt;/em&gt; information and this new &lt;em&gt;likelihood&lt;/em&gt; information to come up with a better &lt;em&gt;posterior&lt;/em&gt; guess of the treasure. It turns out that we can just multiply the prior and likelihood to obtain a posterior distribution that looks like this:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/assets/computing-your-skill/Gaussian_3D_Posterior.png&quot;&gt;&lt;img src=&quot;/assets/computing-your-skill/Gaussian_3D_Posterior_576.png&quot; alt=&quot;&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is giving us a smaller and more concentrated area to look at.&lt;/p&gt;

&lt;p&gt;If you look at most textbooks, you typically just see this information using 2D isoprobability contour plots that we learned about earlier. Here’s the same information in 2D:&lt;/p&gt;

&lt;p&gt;Prior:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/assets/computing-your-skill/Gaussian_2D_Prior.png&quot;&gt;&lt;img src=&quot;/assets/computing-your-skill/Gaussian_2D_Prior_576.png&quot; alt=&quot;&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Likelihood:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/assets/computing-your-skill/Gaussian_2D_Likelihood.png&quot;&gt;&lt;img src=&quot;/assets/computing-your-skill/Gaussian_2D_Likelihood_576.png&quot; alt=&quot;&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Posterior:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/assets/computing-your-skill/Gaussian_2D_Posterior.png&quot;&gt;&lt;img src=&quot;/assets/computing-your-skill/Gaussian_2D_Posterior_576.png&quot; alt=&quot;&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For fun, let’s say we found additional information saying the treasure is along the other diagonal with the following likelihood:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/assets/computing-your-skill/Gaussian_2D_Likelihood_Opposite_Direction.png&quot;&gt;&lt;img src=&quot;/assets/computing-your-skill/Gaussian_2D_Likelihood_Opposite_Direction_576.png&quot; alt=&quot;&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To incorporate this information, we’re able to &lt;a href=&quot;http://en.wikipedia.org/wiki/Conjugate_prior&quot; title=&quot;The fancy term for being able to do this is the &#39;conjugate prior&#39; since the prior and posterior are &#39;conjoined&#39; like twins. That is, they&#39;re of the same class of function.&quot;&gt;take our last posterior and make that the prior for the next iteration&lt;/a&gt; using the new likelihood information to get this updated posterior:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/assets/computing-your-skill/Gaussian_2D_Posterior_Updated.png&quot;&gt;&lt;img src=&quot;/assets/computing-your-skill/Gaussian_2D_Posterior_Updated_576.png&quot; alt=&quot;&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is a much more focused estimate than our original belief! We could iterate the procedure and potentially get an even smaller search area.&lt;/p&gt;
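&lt;p&gt;The same prior-times-likelihood loop can be sketched in a few lines if we flatten the treasure map down to a single axis. This is a simplification of the 2D pictures above, the numbers are hypothetical, and the function name &lt;code&gt;bayes_update&lt;/code&gt; is my own. It relies on the fact that the product of two Gaussians is (after normalizing) another Gaussian whose precision, 1/σ², is the sum of the two precisions.&lt;/p&gt;

```python
from math import sqrt


def bayes_update(prior, likelihood):
    """Multiply two Gaussians given as (mean, sigma); return the normalized product."""
    (m0, s0), (m1, s1) = prior, likelihood
    precision = 1 / s0**2 + 1 / s1**2  # precisions (1 / variance) add up
    mean = (m0 / s0**2 + m1 / s1**2) / precision  # precision-weighted average
    return mean, sqrt(1 / precision)


# Hypothetical numbers: a vague hunch about where the treasure sits along one
# axis of the map, followed by two clues.
belief = (0.0, 10.0)
for clue in [(4.0, 5.0), (3.0, 5.0)]:
    # Yesterday's posterior becomes today's prior.
    belief = bayes_update(belief, clue)
    print(f"posterior mean={belief[0]:.2f}, sigma={belief[1]:.2f}")
```

&lt;p&gt;Each pass shrinks σ, which is the one-dimensional version of the search area getting smaller in the plots.&lt;/p&gt;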

&lt;p&gt;&lt;a href=&quot;http://en.wikipedia.org/wiki/Thomas_Bayes&quot; title=&quot;Thomas Bayes (c. 1702 - 17 April 1761)&quot;&gt;&lt;img style=&quot;border:0;&quot; align=&quot;right&quot; src=&quot;/assets/computing-your-skill/Thomas_Bayes_220.gif&quot; /&gt;&lt;/a&gt;And that’s basically all there is to it. In TrueSkill, the buried treasure that we look for is a person’s skill. This approach to probability is called “Bayesian” because it was discovered by a Presbyterian minister in the 1700’s named &lt;a href=&quot;http://en.wikipedia.org/wiki/Thomas_Bayes&quot; title=&quot;More precisely, it was Bayes&#39; friend Richard Price who found this unpublished paper after Bayes&#39; death and saw that it was useful and then decided to publish it.&quot;&gt;Thomas Bayes&lt;/a&gt; who liked to dabble in math.&lt;/p&gt;

&lt;p&gt;The central ideas to Bayesian statistics are the prior, the likelihood, and the posterior. There’s detailed math that goes along with this and is in the &lt;a href=&quot;/assets/computing-your-skill/The%20Math%20Behind%20TrueSkill.pdf&quot;&gt;accompanying paper&lt;/a&gt;, but understanding these basic ideas is more important:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;“When you understand something, then you can find the math to express that understanding. The math doesn’t provide the understanding.”— &lt;a href=&quot;http://www.reddit.com/r/programming/comments/bblt4/lamport_when_you_understand_something_then_you/&quot;&gt;Lamport&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Bayesian methods have only recently become popular in the computer age because computers can quickly iterate through several tedious rounds of priors and posteriors. Bayesian methods have historically been popular inside of Microsoft Research (where TrueSkill was invented). Way &lt;a href=&quot;http://people.cs.ubc.ca/~murphyk/Bayes/la.times.html&quot;&gt;back in 1996&lt;/a&gt;, Bill Gates considered Bayesian statistics to be Microsoft Research’s secret sauce.&lt;/p&gt;

&lt;p&gt;As we’ll see later on, we can use the Bayesian approach to calculate a person’s skill. In general, it’s highly useful to update your belief based on previous evidence (e.g. your performance in previous games). This &lt;em&gt;usually&lt;/em&gt; works out well. However, sometimes “&lt;a href=&quot;http://www.amazon.com/gp/product/1400063515?ie=UTF8&amp;amp;tag=moserware-20&amp;amp;linkCode=as2&amp;amp;camp=1789&amp;amp;creative=390957&amp;amp;creativeASIN=1400063515&quot;&gt;Black Swans&lt;/a&gt;” are present. For example, &lt;a href=&quot;http://books.google.com/books?id=gWW4SkJjM08C&amp;amp;lpg=PR2&amp;amp;dq=black%20swan&amp;amp;pg=PA40#v=onepage&amp;amp;q=&amp;amp;f=false&quot; title=&quot;In general, this is called the Problem of Inductive Knowledge and is discussed in the book.&quot;&gt;a turkey&lt;/a&gt; using Bayesian inference would have a very specific posterior distribution of the kindness of a farmer who feeds it every day for 1000 days only to be surprised by a Thanksgiving event that was so many standard deviations away from the turkey’s mean belief that he never would have seen it coming. Skill has similar potential for a “Thanksgiving” event where an average player beats the best player in the world. We’ll acknowledge that small possibility, but ignore it to simplify things (and give the unlikely winner a great story for the rest of his life).&lt;/p&gt;

&lt;p&gt;TrueSkill claims that it is Bayesian, so you can be sure that there is going to be a concept of a prior and a likelihood in it— and there is. We’re getting closer, but we still need to learn a few more details.&lt;/p&gt;

&lt;h2 id=&quot;the-marginalized-but-not-forgotten-distribution&quot;&gt;The Marginalized, but Not Forgotten Distribution&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;http://www.flickr.com/photos/clairity/145758101/&quot; title=&quot;&#39;Running against the light&#39; by clairity on Flickr. Used under the Creative Commons Attribution License&quot;&gt;&lt;img style=&quot;border:0; margin: 0px 0px 0px 10px; width: 400px; display: inline; height: 300px&quot; align=&quot;right&quot; src=&quot;/assets/computing-your-skill/clarity_man_running_at_crosswalk_400.jpg&quot; /&gt;&lt;/a&gt;Next we need to learn about “&lt;a href=&quot;http://en.wikipedia.org/wiki/Marginal_distribution&quot;&gt;marginal distributions&lt;/a&gt;”, often just called “marginals.” Marginals are a way of distilling information to focus on what you care about. Imagine you have a table of sales for each month for the past year. Let’s say that you only care about total sales for the year. You could take out your calculator and add up all the sales in each month to get the total aggregate sales for the year. Since you care about this number and it wasn’t in the original report, you could add it in the &lt;em&gt;margin&lt;/em&gt; of the table. That’s roughly where “margin-al” got its name.&lt;/p&gt;

&lt;p&gt;Wikipedia has a great &lt;a href=&quot;http://en.wikipedia.org/wiki/Marginal_distribution&quot; title=&quot;This illustration came from the article on Marginal distribution that helped me finally get marginals&quot;&gt;illustration&lt;/a&gt; on the topic: consider a guy that ignores his mom’s advice and &lt;em&gt;never&lt;/em&gt; looks both ways when crossing the street. Even worse, he’s so engrossed in listening to his iPod that he doesn’t look &lt;em&gt;any&lt;/em&gt; way; he just always crosses.&lt;/p&gt;

&lt;p&gt;What’s the probability of him getting hit by a car at a specific intersection? Let’s simplify things by saying that it just depends on whether the light is red, yellow, or green.&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Light State&lt;/th&gt;
      &lt;th&gt;Red&lt;/th&gt;
      &lt;th&gt;Yellow&lt;/th&gt;
      &lt;th&gt;Green&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;Probability of getting hit given light state&lt;/td&gt;
      &lt;td&gt;1%&lt;/td&gt;
      &lt;td&gt;9%&lt;/td&gt;
      &lt;td&gt;90%&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;This is helpful, but it doesn’t tell us what we want. We also need to know how long the light stays a given color:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Light color&lt;/th&gt;
      &lt;th&gt;Red&lt;/th&gt;
      &lt;th&gt;Yellow&lt;/th&gt;
      &lt;th&gt;Green&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;% Time in Color&lt;/td&gt;
      &lt;td&gt;60%&lt;/td&gt;
      &lt;td&gt;10%&lt;/td&gt;
      &lt;td&gt;30%&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;There’s a bunch of probability data here that’s a bit overwhelming. If we join the probabilities together, we’ll have a “joint distribution” that’s just a big complicated system that tells us &lt;em&gt;too much&lt;/em&gt; information.&lt;/p&gt;

&lt;p&gt;We can start to distill this information down by multiplying the probability of getting hit under each light color by the fraction of time the light spends in that color, and then adding up the results:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Red&lt;/th&gt;
      &lt;th&gt;Yellow&lt;/th&gt;
      &lt;th&gt;Green&lt;/th&gt;
      &lt;th&gt;Total Probability of Getting Hit&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;1%*60% = 0.6%&lt;/td&gt;
      &lt;td&gt;9%*10% = 0.9%&lt;/td&gt;
      &lt;td&gt;90%*30% = 27%&lt;/td&gt;
      &lt;td&gt;28.5%&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;In the right &lt;em&gt;margin&lt;/em&gt; of the table we get the value that really matters to this guy. There’s a 28.5% &lt;em&gt;marginal probability&lt;/em&gt; of getting hit if the guy never looks for cars and just always crosses the street. We obtained it by “summing out” the individual components. That is, we simplified the problem by eliminating variables and we eliminated variables by just focusing on the total rather than the parts.&lt;/p&gt;
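
&lt;p&gt;The same “summing out” can be written as a few lines of Python. This is only a sketch of the arithmetic in the tables above; the variable names are made up for illustration.&lt;/p&gt;

```python
# Numbers come straight from the two tables above.
p_hit_given_light = {'red': 0.01, 'yellow': 0.09, 'green': 0.90}
p_light = {'red': 0.60, 'yellow': 0.10, 'green': 0.30}

# Joint probability of (light is this color AND he gets hit).
joint = {color: p_hit_given_light[color] * p_light[color] for color in p_light}

# Summing out the light color leaves the marginal probability of getting hit.
p_hit = sum(joint.values())

for color, p in joint.items():
    print(f'{color}: {p:.1%}')
print(f'marginal P(hit): {p_hit:.1%}')
```

&lt;p&gt;The last line prints the 28.5% from the right margin of the table.&lt;/p&gt;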

&lt;p&gt;This idea of marginalization is very general. The central question in this article is “computing your skill,” but your skill is complicated. When using Bayesian statistics, we often can’t observe something directly, so we have to come up with a probability distribution that’s more complicated and then “marginalize” it to get the distribution that we really want. We’ll need to marginalize your skill by doing a similar “summing-out” procedure as we did for the reckless guy above.&lt;/p&gt;

&lt;p&gt;But before we do that, we need to learn another technique to make calculations simpler.&lt;/p&gt;

&lt;h2 id=&quot;whats-a-factor-graph-and-why-do-i-care&quot;&gt;What’s a Factor Graph, and Why Do I Care?&lt;/h2&gt;

&lt;p&gt;Remember your algebra class when you worked with expressions like this?&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/assets/computing-your-skill/equation_not_factored.png&quot;&gt;&lt;img src=&quot;/assets/computing-your-skill/equation_not_factored_640.png&quot; alt=&quot;&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Your teacher showed you that you could simplify this by “factor-ing” out w, like this:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/assets/computing-your-skill/expression_factored.png&quot;&gt;&lt;img src=&quot;/assets/computing-your-skill/expression_factored_576.png&quot; alt=&quot;&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We often factor expressions to make them easier to understand and to simplify calculations. Let’s replace the variables above with w=4, x=1, y=2, and z=3.&lt;/p&gt;

&lt;p&gt;Let’s say the numbers on our calculator are circles and the operators are squares. We could come up with an “&lt;a href=&quot;http://msdn.microsoft.com/en-us/library/bb397951.aspx&quot;&gt;expression tree&lt;/a&gt;” to describe the calculation like this:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/assets/computing-your-skill/factor_graph_complicated_factorization.png&quot;&gt;&lt;img src=&quot;/assets/computing-your-skill/factor_graph_complicated_factorization_576.png&quot; alt=&quot;&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can tell how tedious this computation is by counting 11 “buttons” we’d have to push. We could also factor it like this:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/assets/computing-your-skill/factor_graph_complicated_simplified.png&quot;&gt;&lt;img src=&quot;/assets/computing-your-skill/factor_graph_complicated_simplified_576.png&quot; alt=&quot;&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This “factorization” has a total of 7 buttons, a savings of 4 buttons. It might not seem like much here, but factorizing is a big idea.&lt;/p&gt;
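
&lt;p&gt;Here is a small Python sketch of the button counting. The expression itself only appears in the images, so the form used here (w times each of x, y, and z, added together) is an assumption chosen because it matches the 11 and 7 button counts.&lt;/p&gt;

```python
w, x, y, z = 4, 1, 2, 3

# Assumed unfactored form: three multiplications and two additions.
unfactored = w * x + w * y + w * z
unfactored_buttons = ['4', '*', '1', '+', '4', '*', '2', '+', '4', '*', '3']

# Assumed factored form: one multiplication and two additions.
factored = w * (x + y + z)
factored_buttons = ['4', '*', '1', '+', '2', '+', '3']

print(unfactored, len(unfactored_buttons))
print(factored, len(factored_buttons))
```

&lt;p&gt;Both forms give the same answer, but the factored one gets there with fewer numbers and operators.&lt;/p&gt;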

&lt;p&gt;We face a similar problem of how to factor things when we’re looking to simplify a complicated probability distribution. We’ll soon see how your skill is composed of several “factors” in a joint distribution. We can simplify computations based on how variables are related to these factors. We’ll break up the joint distribution into a bunch of factors on a graph. &lt;strong&gt;This graph that links factors and variables is called a “factor graph.”&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The key idea about a factor graph is that we represent the marginal conditional probabilities as variables and then represent each major function of those variables as a “factor.” We’ll take advantage of how the graph “factorizes” and imagine that each factor is a node on a network that’s optimized for efficiency. A key efficiency trick is that factor nodes send “messages” to other nodes. These messages help simplify further marginal computations. The “message passing” is very important and thus will be highlighted with arrows in the upcoming graphs; gray arrows represent messages going “down” the graph and black show messages coming “up” the graph.&lt;/p&gt;

&lt;p&gt;The accompanying &lt;a href=&quot;http://github.com/moserware/Skills&quot;&gt;code&lt;/a&gt; and &lt;a href=&quot;/assets/computing-your-skill/The%20Math%20Behind%20TrueSkill.pdf&quot;&gt;math paper&lt;/a&gt; go into details about exactly how this happens, but it’s important to realize the high level idea first. That is, we want to look at all the factors that go into creating the likelihood function for updating a person’s skill based on a game outcome. Representing this information in a factor graph helps us see how things are related.&lt;/p&gt;

&lt;p&gt;Now that we have all the foundational concepts, we’re ready for the main event: the TrueSkill factor graph!&lt;/p&gt;

&lt;h2 id=&quot;enough-chess-lets-rank-something-harder&quot;&gt;Enough Chess, Let’s Rank Something Harder!&lt;/h2&gt;

&lt;p&gt;The TrueSkill algorithm is Bayesian because it’s composed of a prior multiplied by a likelihood. I’ve highlighted these two components in the sample factor graph from the TrueSkill paper that looks scary at first glance:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/assets/computing-your-skill/TrueSkillFullFactorgraph.png&quot;&gt;&lt;img src=&quot;/assets/computing-your-skill/TrueSkillFullFactorgraph_720.png&quot; alt=&quot;&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This factor graph shows the outcome of a match that had 3 teams all playing against each other. The first team (on the left) only has one player, but this player was able to defeat both of the other teams. The second team (in the middle) had two players and this team tied the third team (on the right) that had just one player.&lt;/p&gt;

&lt;p&gt;In TrueSkill, we just care about a player’s marginal skill. However, as is often the case with Bayesian models, we have to explicitly model other things that impact the variable we care about. We’ll briefly cover each factor (more details are in the &lt;a href=&quot;http://github.com/moserware/Skills&quot;&gt;code&lt;/a&gt; and &lt;a href=&quot;/assets/computing-your-skill/The%20Math%20Behind%20TrueSkill.pdf&quot;&gt;math paper&lt;/a&gt;).&lt;/p&gt;

&lt;h2 id=&quot;factor-1-what-do-we-already-know-about-your-skill&quot;&gt;Factor #1: What Do We Already Know About Your Skill?&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;/assets/computing-your-skill/Layer1_priors.png&quot;&gt;&lt;img src=&quot;/assets/computing-your-skill/Layer1_priors_576.png&quot; alt=&quot;&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The first factor starts the whole process. It’s where we get a player’s previous skill level from somewhere (e.g. a player database). At this point, we add some uncertainty to your skill’s standard deviation. This keeps game dynamics interesting and prevents the standard deviation from hitting zero, since the rest of the algorithm will make it smaller (the whole point is to learn about you and become more certain).&lt;/p&gt;
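
&lt;p&gt;As a rough sketch of that “add some uncertainty” step: if the extra uncertainty is modeled as an independent Gaussian term, the variances add. The function name, the name &lt;code&gt;tau&lt;/code&gt;, and the example values below are assumptions for illustration, not taken from the accompanying code.&lt;/p&gt;

```python
import math

def add_dynamics(sigma, tau):
    # Variances of independent Gaussian terms add, so the widened
    # standard deviation is sqrt(sigma^2 + tau^2).
    return math.sqrt(sigma ** 2 + tau ** 2)

# Assumed illustrative values: a brand-new player's sigma and a small
# per-game amount of added uncertainty.
sigma_before = 25 / 3
sigma_after = add_dynamics(sigma_before, tau=25 / 300)

print(sigma_before, sigma_after)
```

&lt;p&gt;The result is always at least as large as &lt;code&gt;tau&lt;/code&gt;, so the standard deviation going into a match can never reach zero.&lt;/p&gt;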

&lt;p&gt;There is a factor and a variable for each player. Each factor is a function that remembers a player’s previous skill. Each variable node holds the current value of a player’s skill. I say “current” because this is the value that we’ll want to know about after the whole algorithm is completed. Note that the message arrow on the factor only goes one way; we never go back to the prior factor. It just gets things going. However, we will come back to the variable.&lt;/p&gt;

&lt;p&gt;But we’re getting ahead of ourselves.&lt;/p&gt;

&lt;h2 id=&quot;factor-2-how-are-you-going-to-perform&quot;&gt;Factor #2: How Are You Going To Perform?&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;/assets/computing-your-skill/Layer2_likelihood.png&quot;&gt;&lt;img src=&quot;/assets/computing-your-skill/Layer2_likelihood_576.png&quot; alt=&quot;&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Next, we add in beta (β). You can think of beta as the number of skill points to guarantee about an 80% chance of winning. The TrueSkill inventors &lt;a href=&quot;http://www.microsoft.com/downloads/details.aspx?FamilyID=1acc9bf7-920d-477b-a7b1-4945b3cb04dd&amp;amp;DisplayLang=en&quot; title=&quot;This occurred during a GameFest 2007 presentation. Although this talk gets cut short due to an audio problem, it&#39;s pretty good at giving an overview.&quot;&gt;refer&lt;/a&gt; to beta as defining the length of a “skill chain.”&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/assets/computing-your-skill/BetaSkillChainIllustration.png&quot; title=&quot;The faceless people and chain in this picture came from the Open Clip Art project and are in the public domain. The idea for this image came from Ralf Herbrich&#39;s 2007 GameFest presentation that I linked to in the previous link.&quot;&gt;&lt;img src=&quot;/assets/computing-your-skill/BetaSkillChainIllustration_640.png&quot; alt=&quot;&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The skill chain is composed of the worst player on the far left and the best player on the far right. Each subsequent person on the skill chain is “beta” points better and has an 80% win probability against the weaker player. This means that a small beta value indicates a high-skill game (e.g. &lt;a href=&quot;http://en.wikipedia.org/wiki/Go_%2528game%2529&quot;&gt;Go&lt;/a&gt;) since smaller differences in points lead to the 80%:20% ratio. Likewise, a game based on chance (e.g. &lt;a href=&quot;http://en.wikipedia.org/wiki/Uno_%2528card_game%2529&quot;&gt;Uno&lt;/a&gt;) is a low-skill game that would have a higher beta and smaller skill chain.&lt;/p&gt;
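
&lt;p&gt;To get a feel for the skill chain, here is a simplified Python sketch. It assumes both skills are known exactly, ignores draws, and treats each performance as the skill plus Gaussian noise with standard deviation beta; the function names and the beta value are illustrative assumptions.&lt;/p&gt;

```python
import math

def normal_cdf(x):
    # Cumulative distribution function of the standard normal.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def win_probability(skill_gap, beta):
    # The difference of two independent performances, each with noise
    # of standard deviation beta, has standard deviation sqrt(2) * beta.
    return normal_cdf(skill_gap / (math.sqrt(2.0) * beta))

beta = 25 / 6  # assumed illustrative value
for links in range(4):
    print(links, round(win_probability(links * beta, beta), 3))
```

&lt;p&gt;Under these assumptions, one link in the chain (a gap of exactly beta points) works out to a win probability of roughly 76%, which is in the neighborhood of the “about 80%” rule of thumb, and each additional link pushes the stronger player closer to a sure win.&lt;/p&gt;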

&lt;h2 id=&quot;factor-3-how-is-your-team-going-to-perform&quot;&gt;Factor #3: How is Your Team Going to Perform?&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;/assets/computing-your-skill/Layer3_team_sum.png&quot;&gt;&lt;img src=&quot;/assets/computing-your-skill/Layer3_team_sum_576.png&quot; alt=&quot;&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now we’re ready for one of the most controversial aspects of TrueSkill: computing the performance of a team as a whole. In TrueSkill, we assume the team’s performance is the sum of each team member’s performance. I say that it’s “controversial” because some members of the team probably work harder than others. Additionally, sometimes special dynamics occur that make the sum greater than the parts. However, we’ll fight the urge to make it much more complicated and heed &lt;a href=&quot;http://www.forecastingprinciples.com/files/pdf/Makridakia-The%20M3%20Competition.pdf&quot; title=&quot;See page 452, second column, item a&quot;&gt;Makridakis’s advice&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;“Statistically sophisticated or complex methods do not necessarily provide more accurate forecasts than simpler ones”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;One cool thing about this factor is that you can weight each team member’s contribution by the amount of time that they played. For example, if two players are on a team but each player only played half of the time (e.g. &lt;a href=&quot;http://en.wiktionary.org/wiki/tag_team&quot;&gt;a tag team&lt;/a&gt;), then we would treat them differently than if these two players played the entire time. This is officially known as “partial play.” Xbox game titles report the percentage of time a player was active in a game under the “X_PROPERTY_PLAYER_PARTIAL_PLAY_PERCENTAGE” property that is recorded for each player (it defaults to 100%). This information is used by TrueSkill to perform a fairer update. I implemented this feature in the &lt;a href=&quot;http://github.com/moserware/Skills&quot;&gt;accompanying source code&lt;/a&gt;.&lt;/p&gt;
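
&lt;p&gt;A minimal Python sketch of a weighted team sum, assuming each player’s performance is an independent Gaussian and that the partial play percentage is used directly as the weight. The function name and sample numbers are made up for illustration; see the accompanying source code for the real implementation.&lt;/p&gt;

```python
def team_performance(players, beta):
    # players: list of (mean, sigma, weight) tuples, where weight is the
    # fraction of the game played (1.0 means the whole game).
    mean = sum(w * mu for mu, sigma, w in players)
    # A weighted sum of independent Gaussians scales each variance by w^2.
    variance = sum(w ** 2 * (sigma ** 2 + beta ** 2) for mu, sigma, w in players)
    return mean, variance

beta = 25 / 6  # assumed illustrative value
full_time = team_performance([(25.0, 8.0, 1.0), (30.0, 4.0, 1.0)], beta)
tag_team = team_performance([(25.0, 8.0, 0.5), (30.0, 4.0, 0.5)], beta)

print(full_time)
print(tag_team)
```

&lt;p&gt;With both players on the field the whole time the team mean is the plain sum of the two skills; as a tag team, each player contributes only half of theirs.&lt;/p&gt;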

&lt;h2 id=&quot;factor-4-howd-your-team-compare&quot;&gt;Factor #4: How’d Your Team Compare?&lt;/h2&gt;

&lt;p&gt;Next, we compare team performances in pairs. We do this by subtracting team performances to come up with pairwise differences:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/assets/computing-your-skill/Layer4_Team_Diff.png&quot;&gt;&lt;img src=&quot;/assets/computing-your-skill/Layer4_Team_Diff_576.png&quot; alt=&quot;&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is similar to what we did earlier with Elo and subtracting curves to get a new curve.&lt;/p&gt;
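
&lt;p&gt;In code, subtracting two Gaussian team performances might look like the sketch below. It assumes the teams are independent and listed in finishing order; the numbers are invented.&lt;/p&gt;

```python
def performance_difference(team_a, team_b):
    # Each team is a (mean, variance) pair describing its performance.
    mean_a, var_a = team_a
    mean_b, var_b = team_b
    # The means subtract, but the uncertainties still add.
    return mean_a - mean_b, var_a + var_b

# Made-up team performances, ordered from first place to last place.
teams = [(30.0, 50.0), (27.0, 80.0), (26.0, 45.0)]

# Compare each team with the one that finished right behind it.
diffs = [performance_difference(a, b) for a, b in zip(teams, teams[1:])]
print(diffs)
```

&lt;p&gt;Three teams produce two pairwise differences, matching the two difference variables in the factor graph.&lt;/p&gt;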

&lt;h2 id=&quot;factor-5-how-should-we-interpret-the-team-differences&quot;&gt;Factor #5: How Should We Interpret the Team Differences?&lt;/h2&gt;

&lt;p&gt;The bottom of the factor graph contains a comparison factor based on the team performance differences we just calculated:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/computing-your-skill/Layer5_Diff_Comparison.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The comparison depends on whether the pairwise difference was considered a “win” or a “draw.” Obviously, this depends on the rules of the game. It’s important to realize that TrueSkill only cares about these two types of results. TrueSkill doesn’t care if you won by a little or a lot; the only thing that matters is whether you won. Additionally, in TrueSkill we imagine that there is a buffer of space called a “draw margin” where performances are equivalent. For example, in Olympic swimming, two swimmers can “draw” because their times are equivalent to 0.01 seconds even though the times differ by several thousandths of a second. In this case, the “draw margin” is relatively small, around 0.005 seconds. Draws are very common in chess at the grandmaster level, so the draw margin would be much greater there.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The output of the comparison factor directly relates to how much your skill’s mean and standard deviation will change&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The exact math involved in this factor &lt;a href=&quot;http://research.microsoft.com/apps/pubs/default.aspx?id=74554&quot; title=&quot;Ok, so it&#39;s quite complicated&quot;&gt;is complicated&lt;/a&gt;, but the core idea is simple:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Expected outcomes cause small updates because the algorithm already had a good guess of your skill.&lt;/li&gt;
  &lt;li&gt;Unexpected outcomes (upsets) cause larger updates to make the algorithm more likely to predict the outcome in the future.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The &lt;a href=&quot;/assets/computing-your-skill/The%20Math%20Behind%20TrueSkill.pdf&quot;&gt;accompanying math paper&lt;/a&gt; goes into detail, but conceptually you can think of the performance difference as a number on the bottom (x-axis) of a graph. It represents the difference between the expected winner and the expected loser. A large negative number indicates a big upset (e.g. an underdog won) and a large positive number means the expected person won. The exact update of your skill’s mean will depend on the probability of a draw, but you can get a feel for it by looking at this graph:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/assets/computing-your-skill/VWinFunctionWithDrawProbabilities.png&quot;&gt;&lt;img src=&quot;/assets/computing-your-skill/VWinFunctionWithDrawProbabilities_576.png&quot; alt=&quot;&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Similarly, the update to a skill’s standard deviation (i.e. uncertainty) depends on how expected the outcome was. An expected outcome shrinks the uncertainty by a small amount (e.g. we already knew it was going to happen). Conversely, an unexpected outcome shrinks the standard deviation more because it was new information that we didn’t already have:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/assets/computing-your-skill/WWinFunctionWithDrawProbabilities.png&quot;&gt;&lt;img src=&quot;/assets/computing-your-skill/WWinFunctionWithDrawProbabilities_576.png&quot; alt=&quot;&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
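
&lt;p&gt;For the curious, the shapes of these two curves for the “win” case can be reproduced with a short Python sketch. It uses the usual truncated-Gaussian correction terms, with &lt;code&gt;t&lt;/code&gt; as the performance difference and &lt;code&gt;eps&lt;/code&gt; as the draw margin, both assumed to be already scaled by the standard deviation of the difference. The function names are illustrative, and this naive form is not numerically safe for very large negative values of &lt;code&gt;t&lt;/code&gt;.&lt;/p&gt;

```python
import math

def pdf(x):
    # Standard normal density.
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def cdf(x):
    # Standard normal cumulative distribution function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def v_win(t, eps):
    # Drives the size of the update to the mean.
    return pdf(t - eps) / cdf(t - eps)

def w_win(t, eps):
    # Drives how much the variance shrinks.
    v = v_win(t, eps)
    return v * (v + t - eps)

# Big upset, even match, and expected win (no draw margin).
for t in (-3.0, 0.0, 3.0):
    print(t, round(v_win(t, 0.0), 3), round(w_win(t, 0.0), 3))
```

&lt;p&gt;Both values are large for the upset and close to zero for the expected win, which is the behavior the two graphs show.&lt;/p&gt;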

&lt;p&gt;One problem with this comparison factor is that we use some fancy math that just makes an approximation (a good approximation, but still an approximation). We’ll refine the approximation in the next step.&lt;/p&gt;

&lt;h2 id=&quot;the-inner-schedule-iterate-iterate-iterate&quot;&gt;The Inner Schedule: Iterate, Iterate, Iterate!&lt;/h2&gt;

&lt;p&gt;We can make a better approximation of the team difference factors by passing around the messages that keep getting updated in the following loop:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/assets/computing-your-skill/Layer_Iterate_Inner.png&quot;&gt;&lt;img src=&quot;/assets/computing-your-skill/Layer_Iterate_Inner_576.png&quot; alt=&quot;&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After a few iterations of this loop, the changes will be less dramatic and we’ll arrive at stable values for each marginal.&lt;/p&gt;

&lt;h2 id=&quot;enough-already-give-me-my-new-rating&quot;&gt;Enough Already! Give Me My New Rating!&lt;/h2&gt;

&lt;p&gt;Once the inner schedule has stabilized the values at the bottom of the factor graph, we can reverse the direction of each factor and propagate messages back up the graph. These reverse messages are represented by black arrows in the graph of each factor. &lt;strong&gt;Each player’s new skill rating will be the value of that player’s skill marginal variable once messages have reached the top of the factor graph.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;By default, we give everyone a “full” skill update which is the result of the above procedure. However, there are times when a game title might want to not make the match outcome count much because of less optimal playing conditions (e.g. there was a lot of network lag during the game). Games can do this with a “partial update” that is just a way to apply only a fraction of the full update. Game titles specify this via the X_PROPERTY_PLAYER_SKILL_UPDATE_WEIGHTING_FACTOR variable. I implemented this feature in the &lt;a href=&quot;http://github.com/moserware/Skills/blob/master/Skills/PartialPlay.cs&quot;&gt;accompanying source code&lt;/a&gt; and describe it in the &lt;a href=&quot;/assets/computing-your-skill/The%20Math%20Behind%20TrueSkill.pdf&quot;&gt;math paper&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;results&quot;&gt;Results&lt;/h2&gt;

&lt;p&gt;There are some more details left, but we’ll stop for now. The &lt;a href=&quot;/assets/computing-your-skill/The%20Math%20Behind%20TrueSkill.pdf&quot;&gt;accompanying math paper&lt;/a&gt; and &lt;a href=&quot;http://github.com/moserware/Skills&quot;&gt;source code&lt;/a&gt; fill in most of the missing pieces. One of the best ways to learn the details is to implement TrueSkill yourself. Feel free to create a port of the &lt;a href=&quot;http://github.com/moserware/Skills&quot;&gt;accompanying project&lt;/a&gt; in your favorite language and share it with the world. Writing your own implementation will help solidify all the concepts presented here.&lt;/p&gt;

&lt;p&gt;The most rewarding part of implementing the TrueSkill algorithm is to see it work well in practice. My coworkers have commented on how it’s almost “eerily” accurate at computing the right skill for everyone relatively quickly. After several months of playing foosball, the top of the leaderboard (sorted by TrueSkill: the mean minus 3 standard deviations) was very stable. Recently, a very good player started playing and is now the #2 player. Here’s a graph of the most recent changes in TrueSkill for the top 5 (of around 40) foosball players:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/assets/computing-your-skill/MostRecentFoosballTrueSkill.png&quot;&gt;&lt;img src=&quot;/assets/computing-your-skill/MostRecentFoosballTrueSkill_720.png&quot; alt=&quot;&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;(Note: Look how quickly the system detected how good this new #2 player is even though his win ratio is right at 50%)&lt;/p&gt;
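
&lt;p&gt;Sorting a leaderboard by the mean minus 3 standard deviations takes only a few lines. This Python sketch uses invented players and numbers, not the actual foosball data.&lt;/p&gt;

```python
def conservative_rating(mean, sigma):
    # A value the player's true skill is very likely to exceed.
    return mean - 3 * sigma

# Made-up players: (name, mean, standard deviation).
players = [('veteran', 30.0, 1.5), ('newcomer', 33.0, 4.0), ('regular', 27.0, 1.0)]

leaderboard = sorted(
    players,
    key=lambda p: conservative_rating(p[1], p[2]),
    reverse=True,
)

for name, mean, sigma in leaderboard:
    print(name, conservative_rating(mean, sigma))
```

&lt;p&gt;Notice that the newcomer has the highest mean but still ranks last here, because the system isn’t yet certain about them. As more games shrink the standard deviation, they can climb quickly.&lt;/p&gt;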

&lt;p&gt;Another interesting aspect of implementing TrueSkill is that it has raised an awareness of ratings among players. People that otherwise wouldn’t have played together now occasionally play each other because they know they’re similarly matched and will have a good game. One advantage of TrueSkill is that it’s not that big of a deal to lose to a much better player, so it’s still ok to have unbalanced games. In addition, having ratings has been a good way to judge if you’re improving in ability with a new shot technique in foosball or learning more chess theory.&lt;/p&gt;

&lt;h2 id=&quot;fun-things-from-here&quot;&gt;Fun Things from Here&lt;/h2&gt;

&lt;p&gt;The obvious direction to go from here is to add more games to the system and see if TrueSkill handles them equally well. Given that TrueSkill is the default ranking system on Xbox Live, this will probably work out well. Another direction is to see if there’s a big difference in TrueSkill based on position in a team (e.g. midfield vs. goalie in foosball). Given TrueSkill’s sound statistics based on ranking and matchmaking, you might even have some success in using it to decide between several options. You could have each option be a “player” and decide each “match” based on your personal whims of the day. If nothing else, this would be an interesting way to pick your next vacation spot or even your child’s name.&lt;/p&gt;

&lt;p&gt;If you broaden the scope of your search to using the ideas that we’ve learned along the way, there are a lot more applications. Microsoft’s &lt;a href=&quot;http://videolectures.net/nipsworkshops09_graepel_pmlca/&quot;&gt;AdPredictor&lt;/a&gt; (i.e. the part that delivers relevant ads on &lt;a href=&quot;http://www.bing.com/&quot;&gt;Bing&lt;/a&gt;) was created by the TrueSkill team and uses similar math, but is a different application.&lt;/p&gt;

&lt;p&gt;As for me, it was rewarding to work with an algorithm that has fun social applications as well as picking up machine learning tidbits along the way. It’s too bad all of that didn’t help me hit the top of any of the leaderboards.&lt;/p&gt;

&lt;p&gt;Oh well, it’s been a fun journey. I’d love to hear if you dived into the algorithm after reading this and would especially appreciate any updates to &lt;a href=&quot;http://github.com/moserware/Skills&quot;&gt;my code&lt;/a&gt; or other language forks.&lt;/p&gt;

&lt;p&gt;Links:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/assets/computing-your-skill/The%20Math%20Behind%20TrueSkill.pdf&quot;&gt;The Math Behind TrueSkill&lt;/a&gt; - A math-filled paper that fills in some of the details left out of this post.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;http://github.com/moserware/Skills&quot;&gt;Moserware.Skills&lt;/a&gt; Project on GitHub - My full implementation of Elo and TrueSkill in C#. Please feel free to create your own language forks.&lt;/li&gt;
  &lt;li&gt;Microsoft’s online &lt;a href=&quot;http://research.microsoft.com/en-us/projects/trueskill/calculators.aspx&quot;&gt;TrueSkill Calculators&lt;/a&gt; - Allows you to play with the algorithm without having to download anything. My implementation matches the results of these calculators.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;Special thanks to &lt;a href=&quot;http://research.microsoft.com/en-us/people/rherb/&quot;&gt;Ralf Herbrich&lt;/a&gt;, &lt;a href=&quot;http://research.microsoft.com/en-us/um/people/minka/&quot;&gt;Tom Minka&lt;/a&gt;, and &lt;a href=&quot;http://research.microsoft.com/en-us/people/thoreg/&quot;&gt;Thore Graepel&lt;/a&gt; on the &lt;a href=&quot;http://research.microsoft.com/en-us/projects/trueskill/&quot;&gt;TrueSkill&lt;/a&gt; team at &lt;a href=&quot;http://research.microsoft.com/en-us/labs/cambridge/default.aspx&quot;&gt;Microsoft Research Cambridge&lt;/a&gt; for their help in answering many of my detailed questions about their fascinating algorithm.&lt;/em&gt;&lt;/p&gt;
</description>
        <pubDate>Thu, 18 Mar 2010 08:33:00 +0000</pubDate>
        <link>http://www.moserware.com/2010/03/computing-your-skill.html</link>
        <guid isPermaLink="true">http://www.moserware.com/2010/03/computing-your-skill.html</guid>
        
        
      </item>
    
      <item>
        <title>A Stick Figure Guide to the Advanced Encryption Standard (AES)</title>
        <description>&lt;p&gt;&lt;strong&gt;(A play in 4 acts. Please feel free to exit along with the stage character that best represents you. Take intermissions as you see fit. Click on the stage if you have a hard time seeing it. If you get bored, you can &lt;a href=&quot;http://github.com/moserware/AES-Illustrated&quot;&gt;jump to the code&lt;/a&gt;. Most importantly, enjoy the show!)&lt;/strong&gt;&lt;/p&gt;

&lt;h2 id=&quot;act-1-once-upon-a-time&quot;&gt;Act 1: Once Upon a Time…&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;/assets/stick-figure-guide-to-advanced/aes_act_1_scene_01_intro_1100.png&quot;&gt;&lt;img src=&quot;/assets/stick-figure-guide-to-advanced/aes_act_1_scene_01_intro_576.png&quot; title=&quot;I handle petabytes of data every day. From encrypting juicy Top Secret intelligence to boring packets bound for your WiFi router, I do it all!&quot; alt=&quot;intro&quot; style=&quot;border: 2px solid; margin: 5px;&quot; /&gt;&lt;/a&gt;&lt;br /&gt;
&lt;a href=&quot;/assets/stick-figure-guide-to-advanced/aes_act_1_scene_02_sad_1100.png&quot;&gt;&lt;img src=&quot;/assets/stick-figure-guide-to-advanced/aes_act_1_scene_02_sad_576.png&quot; title=&quot;...and still no one seems to care about me or my story.&quot; alt=&quot;sad&quot; style=&quot;border: 2px solid; margin: 5px;&quot; /&gt;&lt;/a&gt;&lt;br /&gt;
&lt;a href=&quot;/assets/stick-figure-guide-to-advanced/aes_act_1_scene_03_cinderella_1100.png&quot;&gt;&lt;img src=&quot;/assets/stick-figure-guide-to-advanced/aes_act_1_scene_03_cinderella_576.png&quot; title=&quot;I&#39;ve got a better-than-Cinderella story as I made my way to become king of the block cipher world.&quot; alt=&quot;aes act 1 scene 03 cinderella&quot; style=&quot;border: 2px solid; margin: 5px;&quot; /&gt;&lt;/a&gt;&lt;br /&gt;
&lt;a href=&quot;/assets/stick-figure-guide-to-advanced/aes_act_1_scene_04_started_1100.png&quot;&gt;&lt;img src=&quot;/assets/stick-figure-guide-to-advanced/aes_act_1_scene_04_started_576.png&quot; title=&quot;Whoah! You&#39;re still there. You want to hear it? Well let&#39;s get started...&quot; alt=&quot;aes act 1 scene 04 started&quot; style=&quot;border: 2px solid; margin: 5px;&quot; /&gt;&lt;/a&gt;&lt;br /&gt;
&lt;a href=&quot;/assets/stick-figure-guide-to-advanced/aes_act_1_scene_05_judge_1100.png&quot;&gt;&lt;img src=&quot;/assets/stick-figure-guide-to-advanced/aes_act_1_scene_05_judge_576.png&quot; title=&quot;Once upon a time, there was no good way for people outside secret agencies to judge good crypto.&quot; alt=&quot;aes act 1 scene 05 judge&quot; style=&quot;border: 2px solid; margin: 5px;&quot; /&gt;&lt;/a&gt;&lt;br /&gt;
&lt;a href=&quot;/assets/stick-figure-guide-to-advanced/aes_act_1_scene_06_nbs_decree_1100.png&quot;&gt;&lt;img src=&quot;/assets/stick-figure-guide-to-advanced/aes_act_1_scene_06_nbs_decree_576.png&quot; title=&quot;A decree went throughout the land to find a good, secure, algorithm.&quot; alt=&quot;aes act 1 scene 06 nbs decree&quot; style=&quot;border: 2px solid; margin: 5px;&quot; /&gt;&lt;/a&gt;&lt;br /&gt;
&lt;a href=&quot;/assets/stick-figure-guide-to-advanced/aes_act_1_scene_07_lucifer_1100.png&quot;&gt;&lt;img src=&quot;/assets/stick-figure-guide-to-advanced/aes_act_1_scene_07_lucifer_576.png&quot; title=&quot;One worthy competitor named Lucifer came forward.&quot; alt=&quot;aes act 1 scene 07 lucifer&quot; style=&quot;border: 2px solid; margin: 5px;&quot; /&gt;&lt;/a&gt;&lt;br /&gt;
&lt;a href=&quot;/assets/stick-figure-guide-to-advanced/aes_act_1_scene_08_anoint_des_1100.png&quot;&gt;&lt;img src=&quot;/assets/stick-figure-guide-to-advanced/aes_act_1_scene_08_anoint_des_576.png&quot; title=&quot;After being modified by the National Security Agency (NSA), he was anointed as the Data Encryption Standard (DES).&quot; alt=&quot;aes act 1 scene 08 anoint des&quot; style=&quot;border: 2px solid; margin: 5px;&quot; /&gt;&lt;/a&gt;&lt;br /&gt;
&lt;a href=&quot;/assets/stick-figure-guide-to-advanced/aes_act_1_scene_09_des_ruled_1100.png&quot;&gt;&lt;img src=&quot;/assets/stick-figure-guide-to-advanced/aes_act_1_scene_09_des_ruled_576.png&quot; title=&quot;DES ruled in the land for over 20 years. Academics studied him intently. For the first time, there was something specific to look at. The modern field of cryptography was born.&quot; alt=&quot;aes act 1 scene 09 des ruled&quot; style=&quot;border: 2px solid; margin: 5px;&quot; /&gt;&lt;/a&gt;&lt;br /&gt;
&lt;a href=&quot;/assets/stick-figure-guide-to-advanced/aes_act_1_scene_10_des_defeated_1100.png&quot;&gt;&lt;img src=&quot;/assets/stick-figure-guide-to-advanced/aes_act_1_scene_10_des_defeated_576.png&quot; title=&quot;Over the years, many attackers challenged DES. He was defeated in several battles.&quot; alt=&quot;aes act 1 scene 10 des defeated&quot; style=&quot;border: 2px solid; margin: 5px;&quot; /&gt;&lt;/a&gt;&lt;br /&gt;
&lt;a href=&quot;/assets/stick-figure-guide-to-advanced/aes_act_1_scene_11_triple_des_1100.png&quot;&gt;&lt;img src=&quot;/assets/stick-figure-guide-to-advanced/aes_act_1_scene_11_triple_des_576.png&quot; title=&quot;The only way to stop the attacks was to use DES 3 times in a row to form Triple-DES. This worked, but it was awfully slow.&quot; alt=&quot;aes act 1 scene 11 triple des&quot; style=&quot;border: 2px solid; margin: 5px;&quot; /&gt;&lt;/a&gt;&lt;br /&gt;
&lt;a href=&quot;/assets/stick-figure-guide-to-advanced/aes_act_1_scene_12_nist_decree_1100.png&quot;&gt;&lt;img src=&quot;/assets/stick-figure-guide-to-advanced/aes_act_1_scene_12_nist_decree_576.png&quot; title=&quot;Another decree went out...&quot; alt=&quot;aes act 1 scene 12 nist decree&quot; /&gt;&lt;/a&gt;&lt;br /&gt;
&lt;a href=&quot;/assets/stick-figure-guide-to-advanced/aes_act_1_scene_13_rallied_1100.png&quot;&gt;&lt;img src=&quot;/assets/stick-figure-guide-to-advanced/aes_act_1_scene_13_rallied_576.png&quot; title=&quot;This call rallied the crypto wizards to develop something better.&quot; alt=&quot;aes act 1 scene 13 rallied&quot; style=&quot;border: 2px solid; margin: 5px;&quot; /&gt;&lt;/a&gt;&lt;br /&gt;
&lt;a href=&quot;/assets/stick-figure-guide-to-advanced/aes_act_1_scene_14_rijndael_1100.png&quot;&gt;&lt;img src=&quot;/assets/stick-figure-guide-to-advanced/aes_act_1_scene_14_rijndael_576.png&quot; title=&quot;My creators, Vincent Rijmen and Joan Daemen, were among these crypto wizards. They combined their last names to give me my birth name: Rijndael.&quot; alt=&quot;aes act 1 scene 14 rijndael&quot; style=&quot;border: 2px solid; margin: 5px;&quot; /&gt;&lt;/a&gt;&lt;br /&gt;
&lt;a href=&quot;/assets/stick-figure-guide-to-advanced/aes_act_1_scene_15_vote_1100.png&quot;&gt;&lt;img src=&quot;/assets/stick-figure-guide-to-advanced/aes_act_1_scene_15_vote_576.png&quot; title=&quot;Everyone got together to vote and...&quot; alt=&quot;aes act 1 scene 15 vote&quot; style=&quot;border: 2px solid; margin: 5px;&quot; /&gt;&lt;/a&gt;&lt;br /&gt;
&lt;a href=&quot;/assets/stick-figure-guide-to-advanced/aes_act_1_scene_16_won_1100.png&quot;&gt;&lt;img src=&quot;/assets/stick-figure-guide-to-advanced/aes_act_1_scene_16_won_576.png&quot; title=&quot;I won!!&quot; alt=&quot;aes act 1 scene 16 won&quot; style=&quot;border: 2px solid; margin: 5px;&quot; /&gt;&lt;/a&gt;&lt;br /&gt;
&lt;a href=&quot;/assets/stick-figure-guide-to-advanced/aes_act_1_scene_17_intel_1100.png&quot;&gt;&lt;img src=&quot;/assets/stick-figure-guide-to-advanced/aes_act_1_scene_17_intel_576.png&quot; title=&quot;...and now I&#39;m the new king of the crypto world. You can find me everywhere. Intel is even putting native instructions for me in their next chip to make me smokin&#39; fast!&quot; alt=&quot;aes act 1 scene 17 intel&quot; style=&quot;border: 2px solid; margin: 5px;&quot; /&gt;&lt;/a&gt;&lt;br /&gt;
&lt;a href=&quot;/assets/stick-figure-guide-to-advanced/aes_act_1_scene_18_crypto_question_1100.png&quot;&gt;&lt;img src=&quot;/assets/stick-figure-guide-to-advanced/aes_act_1_scene_18_crypto_question_576.png&quot; title=&quot;AES: Any questions? Audience guy: Nice story and all, but how does crypto work?&quot; alt=&quot;aes act 1 scene 18 crypto question&quot; style=&quot;border: 2px solid; margin: 5px;&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;act-2-crypto-basics&quot;&gt;Act 2: Crypto Basics&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;/assets/stick-figure-guide-to-advanced/aes_act_2_scene_01_three_big_ideas_1100.png&quot;&gt;&lt;img src=&quot;/assets/stick-figure-guide-to-advanced/aes_act_2_scene_01_three_big_ideas_576.png&quot; title=&quot;Great question! You only need to know 3 big ideas to understand crypto.&quot; alt=&quot;aes act 2 scene 01 three big ideas&quot; style=&quot;border: 2px solid; margin: 5px;&quot; /&gt;&lt;/a&gt;&lt;br /&gt;
&lt;a href=&quot;/assets/stick-figure-guide-to-advanced/aes_act_2_scene_02_confusion_1100.png&quot;&gt;&lt;img src=&quot;/assets/stick-figure-guide-to-advanced/aes_act_2_scene_02_confusion_576.png&quot; title=&quot;Big Idea #1: Confusion - It&#39;s a good idea to obscure the relationship between your real message and your encrypted message. An example of this confusion is the trusty ol&#39; Caesar Cipher.&quot; alt=&quot;aes act 2 scene 02 confusion&quot; style=&quot;border: 2px solid; margin: 5px;&quot; /&gt;&lt;/a&gt;&lt;br /&gt;
&lt;a href=&quot;/assets/stick-figure-guide-to-advanced/aes_act_2_scene_03_diffusion_1100.png&quot;&gt;&lt;img src=&quot;/assets/stick-figure-guide-to-advanced/aes_act_2_scene_03_diffusion_576.png&quot; title=&quot;Big Idea #2: Diffusion - It&#39;s also a good idea to spread out the message. An example of this diffusion is a simple column transposition.&quot; alt=&quot;aes act 2 scene 03 diffusion&quot; style=&quot;border: 2px solid; margin: 5px;&quot; /&gt;&lt;/a&gt;&lt;br /&gt;
&lt;a href=&quot;/assets/stick-figure-guide-to-advanced/aes_act_2_scene_04_key_secrecy_1100.png&quot;&gt;&lt;img src=&quot;/assets/stick-figure-guide-to-advanced/aes_act_2_scene_04_key_secrecy_576.png&quot; title=&quot;Big Idea #3: Secrecy Only in the Key - After thousands of years, we learned that it&#39;s a bad idea to assume that no one knows how your method works. Someone will eventually find that out.&quot; alt=&quot;aes act 2 scene 04 key secrecy&quot; style=&quot;border: 2px solid; margin: 5px;&quot; /&gt;&lt;/a&gt;&lt;br /&gt;
&lt;a href=&quot;/assets/stick-figure-guide-to-advanced/aes_act_2_scene_05_aes_details_question_1100.png&quot;&gt;&lt;img src=&quot;/assets/stick-figure-guide-to-advanced/aes_act_2_scene_05_aes_details_question_576.png&quot; title=&quot;AES: Does that answer your question? Some audience guy: That helps, but was too general. How do *you* work?&quot; alt=&quot;aes act 2 scene 05 aes details question&quot; style=&quot;border: 2px solid; margin: 5px;&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;act-3-details&quot;&gt;Act 3: Details&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;/assets/stick-figure-guide-to-advanced/aes_act_3_scene_01_sign_this_1100.png&quot;&gt;&lt;img src=&quot;/assets/stick-figure-guide-to-advanced/aes_act_3_scene_01_sign_this_576.png&quot; title=&quot;AES: I&#39;d be happy to tell you how I work, but you have to sign this first. Some audience guy: Uh... what&#39;s that?&quot; alt=&quot;aes act 3 scene 01 sign this&quot; style=&quot;border: 2px solid; margin: 5px;&quot; /&gt;&lt;/a&gt;&lt;br /&gt;
&lt;a href=&quot;/assets/stick-figure-guide-to-advanced/aes_act_3_scene_02_agreement_1100.png&quot;&gt;&lt;img src=&quot;/assets/stick-figure-guide-to-advanced/aes_act_3_scene_02_agreement_576.png&quot; title=&quot;Foot-Shooting Prevention Agreement: I, (your name), promise that once I see how simple AES really is, I will *not* implement it in production code even though it would be really fun. This agreement shall be in effect until the undersigned creates a meaningful interpretive dance that compares and contrasts cache-based, timing, and other side channel attacks and their countermeasures. (Signature) (Date)&quot; alt=&quot;aes act 3 scene 02 agreement&quot; style=&quot;border: 2px solid; margin: 5px;&quot; /&gt;&lt;/a&gt;&lt;br /&gt;
&lt;a href=&quot;/assets/stick-figure-guide-to-advanced/aes_act_3_scene_03_state_matrix_1100.png&quot;&gt;&lt;img src=&quot;/assets/stick-figure-guide-to-advanced/aes_act_3_scene_03_state_matrix_576.png&quot; title=&quot;I take your data and load it into this 4x4 square.&quot; alt=&quot;aes act 3 scene 03 state matrix&quot; style=&quot;border: 2px solid; margin: 5px;&quot; /&gt;&lt;/a&gt;&lt;br /&gt;
&lt;a href=&quot;/assets/stick-figure-guide-to-advanced/aes_act_3_scene_04_initial_round_1100.png&quot;&gt;&lt;img src=&quot;/assets/stick-figure-guide-to-advanced/aes_act_3_scene_04_initial_round_576.png&quot; title=&quot;The initial round has me xor each input byte with the corresponding byte of the first round key.&quot; alt=&quot;aes act 3 scene 04 initial round&quot; style=&quot;border: 2px solid; margin: 5px;&quot; /&gt;&lt;/a&gt;&lt;br /&gt;
&lt;a href=&quot;/assets/stick-figure-guide-to-advanced/aes_act_3_scene_05_xor_tribute_1100.png&quot;&gt;&lt;img src=&quot;/assets/stick-figure-guide-to-advanced/aes_act_3_scene_05_xor_tribute_576.png&quot; title=&quot;A Tribute to XOR: There&#39;s a simple reason why I use xor to apply the key and in other spots: it&#39;s fast and cheap - a quick bit flipper. It uses minimal hardware and can be done in parallel since no pesky carry bits are needed.&quot; alt=&quot;aes act 3 scene 05 xor tribute&quot; style=&quot;border: 2px solid; margin: 5px;&quot; /&gt;&lt;/a&gt;&lt;br /&gt;
&lt;a href=&quot;/assets/stick-figure-guide-to-advanced/aes_act_3_scene_06_key_expansion_part_1_1100.png&quot;&gt;&lt;img src=&quot;/assets/stick-figure-guide-to-advanced/aes_act_3_scene_06_key_expansion_part_1_576.png&quot; title=&quot;Key Expansion: Part 1 - I need lots of keys for use in later rounds. I derive all of them from the initial key using a simple mixing technique that&#39;s really fast. Despite its critics, it&#39;s good enough.&quot; alt=&quot;aes act 3 scene 06 key expansion part 1&quot; style=&quot;border: 2px solid; margin: 5px;&quot; /&gt;&lt;/a&gt;&lt;br /&gt;
&lt;a href=&quot;/assets/stick-figure-guide-to-advanced/aes_act_3_scene_07_key_expansion_part_2a_1100.png&quot;&gt;&lt;img src=&quot;/assets/stick-figure-guide-to-advanced/aes_act_3_scene_07_key_expansion_part_2a_576.png&quot; title=&quot;Key Expansion: Part 2a - 1. I take the last column of the previous round key and move the top byte to the bottom. 2. Next, I run each byte through a substitution box that will map it to something else.&quot; alt=&quot;aes act 3 scene 07 key expansion part 2a&quot; style=&quot;border: 2px solid; margin: 5px;&quot; /&gt;&lt;/a&gt;&lt;br /&gt;
&lt;a href=&quot;/assets/stick-figure-guide-to-advanced/aes_act_3_scene_08_key_expansion_part_2b_1100.png&quot;&gt;&lt;img src=&quot;/assets/stick-figure-guide-to-advanced/aes_act_3_scene_08_key_expansion_part_2b_576.png&quot; title=&quot;Key Expansion: Part 2b - 3. I then xor the column with a round constant that is different for each round. 4. Finally, I xor it with the first column of the previous round key.&quot; alt=&quot;aes act 3 scene 08 key expansion part 2b&quot; style=&quot;border: 2px solid; margin: 5px;&quot; /&gt;&lt;/a&gt;&lt;br /&gt;
&lt;a href=&quot;/assets/stick-figure-guide-to-advanced/aes_act_3_scene_09_key_expansion_part_3_1100.png&quot;&gt;&lt;img src=&quot;/assets/stick-figure-guide-to-advanced/aes_act_3_scene_09_key_expansion_part_3_576.png&quot; title=&quot;Key Expansion: Part 3 - The other columns are super-easy, I just xor the previous column with the same column of the previous round.&quot; alt=&quot;aes act 3 scene 09 key expansion part 3&quot; style=&quot;border: 2px solid; margin: 5px;&quot; /&gt;&lt;/a&gt;&lt;br /&gt;
&lt;a href=&quot;/assets/stick-figure-guide-to-advanced/aes_act_3_scene_10_intermediate_round_start_1100.png&quot;&gt;&lt;img src=&quot;/assets/stick-figure-guide-to-advanced/aes_act_3_scene_10_intermediate_round_start_576.png&quot; title=&quot;Next, I start the intermediate rounds. A round is just a series of steps that I repeat several times. The number of repetitions depends on the size of the key.&quot; alt=&quot;aes act 3 scene 10 intermediate round start&quot; style=&quot;border: 2px solid; margin: 5px;&quot; /&gt;&lt;/a&gt;&lt;br /&gt;
&lt;a href=&quot;/assets/stick-figure-guide-to-advanced/aes_act_3_scene_11_substitute_bytes_1100.png&quot;&gt;&lt;img src=&quot;/assets/stick-figure-guide-to-advanced/aes_act_3_scene_11_substitute_bytes_576.png&quot; title=&quot;Applying Confusion: Substitute Bytes - I use confusion (Big Idea #1) to obscure the relationship of each byte. I put each byte into a substitution box (sbox), which will map it to a different byte.&quot; alt=&quot;aes act 3 scene 11 substitute bytes&quot; style=&quot;border: 2px solid; margin: 5px;&quot; /&gt;&lt;/a&gt;&lt;br /&gt;
&lt;a href=&quot;/assets/stick-figure-guide-to-advanced/aes_act_3_scene_12_shift_rows_1100.png&quot;&gt;&lt;img src=&quot;/assets/stick-figure-guide-to-advanced/aes_act_3_scene_12_shift_rows_576.png&quot; title=&quot;Applying Diffusion: Part 1 (Shift Rows) - Next, I shift the rows to the left and then wrap them around the other side.&quot; alt=&quot;aes act 3 scene 12 shift rows&quot; style=&quot;border: 2px solid; margin: 5px;&quot; /&gt;&lt;/a&gt;&lt;br /&gt;
&lt;a href=&quot;/assets/stick-figure-guide-to-advanced/aes_act_3_scene_13_mix_columns_1100.png&quot;&gt;&lt;img src=&quot;/assets/stick-figure-guide-to-advanced/aes_act_3_scene_13_mix_columns_576.png&quot; title=&quot;Applying Diffusion: Part 2 (Mix Columns) - I take each column and mix up the bits in it.&quot; alt=&quot;aes act 3 scene 13 mix columns&quot; style=&quot;border: 2px solid; margin: 5px;&quot; /&gt;&lt;/a&gt;&lt;br /&gt;
&lt;a href=&quot;/assets/stick-figure-guide-to-advanced/aes_act_3_scene_14_add_round_key_1100.png&quot;&gt;&lt;img src=&quot;/assets/stick-figure-guide-to-advanced/aes_act_3_scene_14_add_round_key_576.png&quot; title=&quot;Applying Key Secrecy: Add Round Key - At the end of each round, I apply the next round key with an xor.&quot; alt=&quot;aes act 3 scene 14 add round key&quot; style=&quot;border: 2px solid; margin: 5px;&quot; /&gt;&lt;/a&gt;&lt;br /&gt;
&lt;a href=&quot;/assets/stick-figure-guide-to-advanced/aes_act_3_scene_15_final_round_1100.png&quot;&gt;&lt;img src=&quot;/assets/stick-figure-guide-to-advanced/aes_act_3_scene_15_final_round_576.png&quot; title=&quot;In the final round, I skip the Mix Columns step since it wouldn&#39;t increase security and would just slow things down.&quot; alt=&quot;aes act 3 scene 15 final round&quot; style=&quot;border: 2px solid; margin: 5px;&quot; /&gt;&lt;/a&gt;&lt;br /&gt;
&lt;a href=&quot;/assets/stick-figure-guide-to-advanced/aes_act_3_scene_16_more_rounds_the_merrier_1100.png&quot;&gt;&lt;img src=&quot;/assets/stick-figure-guide-to-advanced/aes_act_3_scene_16_more_rounds_the_merrier_576.png&quot; title=&quot;...and that&#39;s it. Each round I do makes the bits more confused and diffused. It also has the key impact them. The more rounds, the merrier!&quot; alt=&quot;aes act 3 scene 16 more rounds the merrier&quot; style=&quot;border: 2px solid; margin: 5px;&quot; /&gt;&lt;/a&gt;&lt;br /&gt;
&lt;a href=&quot;/assets/stick-figure-guide-to-advanced/aes_act_3_scene_17_tradeoffs_1100.png&quot;&gt;&lt;img src=&quot;/assets/stick-figure-guide-to-advanced/aes_act_3_scene_17_tradeoffs_576.png&quot; title=&quot;Determining the number of rounds always involves several tradeoffs.&quot; alt=&quot;aes act 3 scene 17 tradeoffs&quot; style=&quot;border: 2px solid; margin: 5px;&quot; /&gt;&lt;/a&gt;&lt;br /&gt;
&lt;a href=&quot;/assets/stick-figure-guide-to-advanced/aes_act_3_scene_18_security_margin_1100.png&quot;&gt;&lt;img src=&quot;/assets/stick-figure-guide-to-advanced/aes_act_3_scene_18_security_margin_576.png&quot; title=&quot;When I was being developed, a clever guy was able to find a shortcut path through 6 rounds. That&#39;s not good! If you look carefully, you&#39;ll see that each bit of a round&#39;s output depends on every bit from two rounds ago. To increase this diffusion avalanche, I added 4 extra rounds. This is my security margin.&quot; alt=&quot;aes act 3 scene 18 security margin&quot; style=&quot;border: 2px solid; margin: 5px;&quot; /&gt;&lt;/a&gt;&lt;br /&gt;
&lt;a href=&quot;/assets/stick-figure-guide-to-advanced/aes_act_3_scene_19_in_pictures_1100.png&quot;&gt;&lt;img src=&quot;/assets/stick-figure-guide-to-advanced/aes_act_3_scene_19_in_pictures_576.png&quot; title=&quot;So in pictures, we have this...&quot; alt=&quot;aes act 3 scene 19 in pictures&quot; style=&quot;border: 2px solid; margin: 5px;&quot; /&gt;&lt;/a&gt;&lt;br /&gt;
&lt;a href=&quot;/assets/stick-figure-guide-to-advanced/aes_act_3_scene_20_decrypting_1100.png&quot;&gt;&lt;img src=&quot;/assets/stick-figure-guide-to-advanced/aes_act_3_scene_20_decrypting_576.png&quot; title=&quot;Decrypting means doing everything in reverse.&quot; alt=&quot;aes act 3 scene 20 decrypting&quot; style=&quot;border: 2px solid; margin: 5px;&quot; /&gt;&lt;/a&gt;&lt;br /&gt;
&lt;a href=&quot;/assets/stick-figure-guide-to-advanced/aes_act_3_scene_21_modes_1100.png&quot;&gt;&lt;img src=&quot;/assets/stick-figure-guide-to-advanced/aes_act_3_scene_21_modes_576.png&quot; title=&quot;One last tidbit: I shouldn&#39;t be used as-is, but rather as a building block to a decent mode.&quot; alt=&quot;aes act 3 scene 21 modes&quot; style=&quot;border: 2px solid; margin: 5px;&quot; /&gt;&lt;/a&gt;&lt;br /&gt;
&lt;a href=&quot;/assets/stick-figure-guide-to-advanced/aes_act_3_scene_22_questions_what_really_happens_1100.png&quot;&gt;&lt;img src=&quot;/assets/stick-figure-guide-to-advanced/aes_act_3_scene_22_questions_what_really_happens_576.png&quot; title=&quot;AES: Make sense? Did that answer your question? Some audience guy: Almost... except you just waved your hands and used weird analogies. What really happens?&quot; alt=&quot;aes act 3 scene 22 questions what really happens&quot; style=&quot;border: 2px solid; margin: 5px;&quot; /&gt;&lt;/a&gt;&lt;br /&gt;
&lt;a href=&quot;/assets/stick-figure-guide-to-advanced/aes_act_3_scene_23_math_1100.png&quot;&gt;&lt;img src=&quot;/assets/stick-figure-guide-to-advanced/aes_act_3_scene_23_math_576.png&quot; title=&quot;AES: Another great question! It&#39;s not hard, but... it involves a little... math. Some audience guy: I&#39;m game. Bring it on!!&quot; alt=&quot;aes act 3 scene 23 math&quot; style=&quot;border: 2px solid; margin: 5px;&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
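
&lt;p&gt;For readers who prefer code to stick figures, a few of the round steps from Act 3 can be roughly sketched in Python. This is only an illustration under stated assumptions: the function names (&lt;code&gt;load_state&lt;/code&gt;, &lt;code&gt;add_round_key&lt;/code&gt;, &lt;code&gt;shift_rows&lt;/code&gt;) are made up for this sketch, the column-by-column loading order follows FIPS-197, and the substitution box, Mix Columns, and key expansion steps are left out. In keeping with the Foot-Shooting Prevention Agreement, it is not meant for production use.&lt;/p&gt;

```python
# Illustrative sketch only (not production crypto); all names are hypothetical.

def load_state(block):
    # FIPS-197 fills the 4x4 square column by column:
    # state[row][col] = block[row + 4 * col]
    return [[block[row + 4 * col] for col in range(4)] for row in range(4)]


def add_round_key(state, round_key):
    # xor every state byte with the matching round key byte
    return [[s ^ k for s, k in zip(state_row, key_row)]
            for state_row, key_row in zip(state, round_key)]


def shift_rows(state):
    # row r rotates left by r positions and wraps around the other side
    return [row[r:] + row[:r] for r, row in enumerate(state)]


state = load_state(list(range(16)))
print(state[1])              # [1, 5, 9, 13]
print(shift_rows(state)[1])  # [5, 9, 13, 1]

# Applying the same round key twice restores the original state,
# which is why decryption can reuse xor for this step.
key = load_state([0xAA] * 16)
print(add_round_key(add_round_key(state, key), key) == state)  # True
```

&lt;p&gt;The sketch returns new lists instead of changing the state in place, which keeps each step easy to check by hand against the scenes above.&lt;/p&gt;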

&lt;h2 id=&quot;act-4-math&quot;&gt;Act 4: Math!&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;/assets/stick-figure-guide-to-advanced/aes_act_4_scene_01_algebra_class_1100.png&quot;&gt;&lt;img src=&quot;/assets/stick-figure-guide-to-advanced/aes_act_4_scene_01_algebra_class_576.png&quot; title=&quot;Let&#39;s go back to your algebra class...&quot; alt=&quot;aes act 4 scene 01 algebra class&quot; style=&quot;border: 2px solid; margin: 5px;&quot; /&gt;&lt;/a&gt;&lt;br /&gt;
&lt;a href=&quot;/assets/stick-figure-guide-to-advanced/aes_act_4_scene_02_reviewing_the_basics_1100.png&quot;&gt;&lt;img src=&quot;/assets/stick-figure-guide-to-advanced/aes_act_4_scene_02_reviewing_the_basics_576.png&quot; title=&quot;Reviewing the Basics&quot; alt=&quot;aes act 4 scene 02 reviewing the basics&quot; style=&quot;border: 2px solid; margin: 5px;&quot; /&gt;&lt;/a&gt;&lt;br /&gt;
&lt;a href=&quot;/assets/stick-figure-guide-to-advanced/aes_act_4_scene_03_algebra_coefficients_1100.png&quot;&gt;&lt;img src=&quot;/assets/stick-figure-guide-to-advanced/aes_act_4_scene_03_algebra_coefficients_576.png&quot; title=&quot;We&#39;ll change things slightly. In the old way, coefficients could get as big as we wanted. In the new way, they can only be 0 or 1.&quot; alt=&quot;aes act 4 scene 03 algebra coefficients&quot; style=&quot;border: 2px solid; margin: 5px;&quot; /&gt;&lt;/a&gt;&lt;br /&gt;
&lt;a href=&quot;/assets/stick-figure-guide-to-advanced/aes_act_4_scene_04_remember_multiplication_growth_1100.png&quot;&gt;&lt;img src=&quot;/assets/stick-figure-guide-to-advanced/aes_act_4_scene_04_remember_multiplication_growth_576.png&quot; title=&quot;Remember how multiplication could make things grow fast?&quot; alt=&quot;aes act 4 scene 04 remember multiplication growth&quot; style=&quot;border: 2px solid; margin: 5px;&quot; /&gt;&lt;/a&gt;&lt;br /&gt;
&lt;a href=&quot;/assets/stick-figure-guide-to-advanced/aes_act_4_scene_05_cant_go_bigger_1100.png&quot;&gt;&lt;img src=&quot;/assets/stick-figure-guide-to-advanced/aes_act_4_scene_05_cant_go_bigger_576.png&quot; title=&quot;With the new addition, things are simpler, but the x^13 is still too big. Let&#39;s make it so we can&#39;t go bigger than x^7. How can we do that?&quot; alt=&quot;aes act 4 scene 05 cant go bigger&quot; style=&quot;border: 2px solid; margin: 5px;&quot; /&gt;&lt;/a&gt;&lt;br /&gt;
&lt;a href=&quot;/assets/stick-figure-guide-to-advanced/aes_act_4_scene_06_clock_math_1100.png&quot;&gt;&lt;img src=&quot;/assets/stick-figure-guide-to-advanced/aes_act_4_scene_06_clock_math_576.png&quot; title=&quot;We use our friend, clock math, to do this. Just add things up and do long division. Keep a close watch on the remainder.&quot; alt=&quot;aes act 4 scene 06 clock math&quot; style=&quot;border: 2px solid; margin: 5px;&quot; /&gt;&lt;/a&gt;&lt;br /&gt;
&lt;a href=&quot;/assets/stick-figure-guide-to-advanced/aes_act_4_scene_07_clock_math_polynomials_1100.png&quot;&gt;&lt;img src=&quot;/assets/stick-figure-guide-to-advanced/aes_act_4_scene_07_clock_math_polynomials_576.png&quot; title=&quot;We can do clock math with polynomials. Instead of dividing by 12, my creators told me to use m(x) = x^8 + x^4 + x^3 + x + 1. Let&#39;s say we wanted to multiply x * b(x) where b(x) has coefficients b7...b0&quot; alt=&quot;aes act 4 scene 07 clock math polynomials&quot; style=&quot;border: 2px solid; margin: 5px;&quot; /&gt;&lt;/a&gt;&lt;br /&gt;
&lt;a href=&quot;/assets/stick-figure-guide-to-advanced/aes_act_4_scene_08_divide_by_mx_1100.png&quot;&gt;&lt;img src=&quot;/assets/stick-figure-guide-to-advanced/aes_act_4_scene_08_divide_by_mx_576.png&quot; title=&quot;We divide it by m(x) = x^8 + x^4 + x^3 + x + 1 and take the remainder&quot; alt=&quot;aes act 4 scene 08 divide by mx&quot; style=&quot;border: 2px solid; margin: 5px;&quot; /&gt;&lt;/a&gt;&lt;br /&gt;
&lt;a href=&quot;/assets/stick-figure-guide-to-advanced/aes_act_4_scene_09_logarithms_1100.png&quot;&gt;&lt;img src=&quot;/assets/stick-figure-guide-to-advanced/aes_act_4_scene_09_logarithms_576.png&quot; title=&quot;Now we&#39;re ready for the hardest blast from the past: logarithms. After logarithms, everything else is cake! Logarithms let us turn multiplication into addition.&quot; alt=&quot;aes act 4 scene 09 logarithms&quot; style=&quot;border: 2px solid; margin: 5px;&quot; /&gt;&lt;/a&gt;&lt;br /&gt;
&lt;a href=&quot;/assets/stick-figure-guide-to-advanced/aes_act_4_scene_10_using_logarithms_1100.png&quot;&gt;&lt;img src=&quot;/assets/stick-figure-guide-to-advanced/aes_act_4_scene_10_using_logarithms_576.png&quot; title=&quot;We can use logarithms in our new world. Instead of using 10 as the base, we can use the simple polynomial of x + 1 and watch the magic unravel.&quot; alt=&quot;aes act 4 scene 10 using logarithms&quot; style=&quot;border: 2px solid; margin: 5px;&quot; /&gt;&lt;/a&gt;&lt;br /&gt;
&lt;a href=&quot;/assets/stick-figure-guide-to-advanced/aes_act_4_scene_11_polynomial_as_byte_1100.png&quot;&gt;&lt;img src=&quot;/assets/stick-figure-guide-to-advanced/aes_act_4_scene_11_polynomial_as_byte_576.png&quot; title=&quot;Why bother with all of this math? Encryption deals with bits and bytes, right? Well, there&#39;s one last connection: a 7th degree polynomial can be represented in exactly 1 byte since the new way uses only 0 or 1 for coefficients.&quot; alt=&quot;aes act 4 scene 11 polynomial as byte&quot; style=&quot;border: 2px solid; margin: 5px;&quot; /&gt;&lt;/a&gt;&lt;br /&gt;
&lt;a href=&quot;/assets/stick-figure-guide-to-advanced/aes_act_4_scene_12_byte_operations_1100.png&quot;&gt;&lt;img src=&quot;/assets/stick-figure-guide-to-advanced/aes_act_4_scene_12_byte_operations_576.png&quot; title=&quot;With bytes, polynomial addition becomes a simple xor. We can use our logarithm skills to make a table for speedy multiplication.&quot; alt=&quot;aes act 4 scene 12 byte operations&quot; style=&quot;border: 2px solid; margin: 5px;&quot; /&gt;&lt;/a&gt;&lt;br /&gt;
&lt;a href=&quot;/assets/stick-figure-guide-to-advanced/aes_act_4_scene_13_byte_inverses_1100.png&quot;&gt;&lt;img src=&quot;/assets/stick-figure-guide-to-advanced/aes_act_4_scene_13_byte_inverses_576.png&quot; title=&quot;Since we know how to multiply, we can find the inverse polynomial byte for each byte. This is the byte that will undo/invert the polynomial back to 1. There are only 255 of them, so we can use brute force to find them.&quot; alt=&quot;aes act 4 scene 13 byte inverses&quot; style=&quot;border: 2px solid; margin: 5px;&quot; /&gt;&lt;/a&gt;&lt;br /&gt;
&lt;a href=&quot;/assets/stick-figure-guide-to-advanced/aes_act_4_scene_14_sbox_math_1100.png&quot;&gt;&lt;img src=&quot;/assets/stick-figure-guide-to-advanced/aes_act_4_scene_14_sbox_math_576.png&quot; title=&quot;Now we can understand the mysterious s-box. It takes a byte &#39;a&#39; and applies two functions. The first is &#39;g&#39; which just finds the byte inverse. The second is &#39;f&#39; which intentionally makes the math uglier to foil attackers.&quot; alt=&quot;aes act 4 scene 14 sbox math&quot; style=&quot;border: 2px solid; margin: 5px;&quot; /&gt;&lt;/a&gt;&lt;br /&gt;
&lt;a href=&quot;/assets/stick-figure-guide-to-advanced/aes_act_4_scene_15_round_constants_1100.png&quot;&gt;&lt;img src=&quot;/assets/stick-figure-guide-to-advanced/aes_act_4_scene_15_round_constants_576.png&quot; title=&quot;We can also understand those crazy round constants in the key expansion. I get them by starting with 1 and then keep multiplying by &#39;x&#39;&quot; alt=&quot;aes act 4 scene 15 round constants&quot; style=&quot;border: 2px solid; margin: 5px;&quot; /&gt;&lt;/a&gt;&lt;br /&gt;
&lt;a href=&quot;/assets/stick-figure-guide-to-advanced/aes_act_4_scene_16_mix_columns_math_1100.png&quot;&gt;&lt;img src=&quot;/assets/stick-figure-guide-to-advanced/aes_act_4_scene_16_mix_columns_math_576.png&quot; title=&quot;Mix Columns is the hardest. I treat each column as a polynomial. I then use our new multiply method to multiply it by a specially crafted polynomial and then take the remainder after dividing by x^4 + 1. This all simplifies to a matrix multiply.&quot; alt=&quot;aes act 4 scene 16 mix columns math&quot; style=&quot;border: 2px solid; margin: 5px;&quot; /&gt;&lt;/a&gt;&lt;br /&gt;
&lt;a href=&quot;/assets/stick-figure-guide-to-advanced/aes_act_4_scene_17_crib_sheet_1100.png&quot;&gt;&lt;img src=&quot;/assets/stick-figure-guide-to-advanced/aes_act_4_scene_17_crib_sheet_576.png&quot; title=&quot;AES Crib Sheet (Handy for Memorizing)&quot; alt=&quot;aes act 4 scene 17 crib sheet&quot; style=&quot;border: 2px solid; margin: 5px;&quot; /&gt;&lt;/a&gt;&lt;br /&gt;
&lt;a href=&quot;/assets/stick-figure-guide-to-advanced/aes_act_4_scene_18_got_it_now_1100.png&quot;&gt;&lt;img src=&quot;/assets/stick-figure-guide-to-advanced/aes_act_4_scene_18_got_it_now_576.png&quot; title=&quot;Only audience guy left: Whoa... I think I get it now. It&#39;s relatively simple once you grok the pieces. Thanks for explaining it. I gotta go now.  AES: My pleasure. Come back anytime!&quot; alt=&quot;aes act 4 scene 18 got it now&quot; style=&quot;border: 2px solid; margin: 5px;&quot; /&gt;&lt;/a&gt;&lt;br /&gt;
&lt;a href=&quot;/assets/stick-figure-guide-to-advanced/aes_act_4_scene_19_so_much_more_1100.png&quot;&gt;&lt;img src=&quot;/assets/stick-figure-guide-to-advanced/aes_act_4_scene_19_so_much_more_576.png&quot; title=&quot;But there&#39;s so much more to talk about: my resistance to linear and differential cryptanalysis, my Wide Trail Strategy, impractical related-key attacks, and... so much more... but no one is left.&quot; alt=&quot;aes act 4 scene 19 so much more&quot; style=&quot;border: 2px solid; margin: 5px;&quot; /&gt;&lt;/a&gt;&lt;br /&gt;
&lt;a href=&quot;/assets/stick-figure-guide-to-advanced/aes_act_4_scene_20_gotta_go_1100.png&quot;&gt;&lt;img src=&quot;/assets/stick-figure-guide-to-advanced/aes_act_4_scene_20_gotta_go_576.png&quot; title=&quot;Oh well... there&#39;s some boring router traffic that needs to be encrypted. Gotta go!&quot; alt=&quot;aes act 4 scene 20 gotta go&quot; style=&quot;border: 2px solid; margin: 5px;&quot; /&gt;&lt;/a&gt;&lt;br /&gt;
&lt;a href=&quot;/assets/stick-figure-guide-to-advanced/aes_act_4_scene_21_the_end_1100.png&quot;&gt;&lt;img src=&quot;/assets/stick-figure-guide-to-advanced/aes_act_4_scene_21_the_end_576.png&quot; title=&quot;The End&quot; alt=&quot;aes act 4 scene 21 the end&quot; style=&quot;border: 2px solid; margin: 5px;&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
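
&lt;p&gt;The byte math from Act 4 can be roughly sketched in Python as well. Treat this as a reading aid under stated assumptions rather than a reference implementation: the function names (&lt;code&gt;xtime&lt;/code&gt;, &lt;code&gt;gf_mul&lt;/code&gt;, &lt;code&gt;gf_inverse&lt;/code&gt;, &lt;code&gt;round_constants&lt;/code&gt;) are chosen for this sketch, the reduction polynomial m(x) is written as the constant 0x11B, and multiplication uses a plain shift-and-xor loop instead of the logarithm table from the comic.&lt;/p&gt;

```python
# Illustrative sketch only (not production crypto); all names are hypothetical.

def xtime(b):
    # Multiply the polynomial byte b by x, then take the remainder after
    # dividing by m(x) = x^8 + x^4 + x^3 + x + 1, written here as 0x11B.
    b = b * 2
    if b // 256 == 1:  # an x^8 term appeared, so subtract (xor) m(x)
        b ^= 0x11B
    return b


def gf_mul(a, b):
    # Shift-and-add multiplication; adding polynomial bytes is just xor.
    product = 0
    while b:
        if b % 2 == 1:
            product ^= a
        a = xtime(a)
        b //= 2
    return product


def gf_inverse(a):
    # Brute force over the 255 nonzero bytes, as the comic suggests.
    # Zero has no inverse, so this sketch only handles nonzero input.
    return next(c for c in range(1, 256) if gf_mul(a, c) == 1)


def round_constants(count):
    # Start with 1 and keep multiplying by x.
    rc, constants = 1, []
    for _ in range(count):
        constants.append(rc)
        rc = xtime(rc)
    return constants


print(hex(gf_mul(0x57, 0x83)))  # 0xc1, the worked example in FIPS-197
print([hex(rc) for rc in round_constants(10)])
print(hex(gf_inverse(0x53)))
```

&lt;p&gt;A table-driven version, like the logarithm table in the comic, would replace the loop in &lt;code&gt;gf_mul&lt;/code&gt; with lookups; the loop is used here only because it is shorter to read.&lt;/p&gt;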

&lt;h2 id=&quot;epilogue&quot;&gt;Epilogue&lt;/h2&gt;

&lt;p&gt;I created a heavily commented AES/Rijndael implementation to go along with this post and &lt;a href=&quot;http://github.com/moserware/AES-Illustrated&quot;&gt;put it on GitHub&lt;/a&gt;. In keeping with the Foot-Shooting Prevention Agreement, it shouldn’t be used for production code, but it should be helpful in seeing exactly where all the numbers came from in this play. Several resources were useful in creating this:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;http://www.amazon.com/gp/product/3540425802?ie=UTF8&amp;amp;tag=moserware-20&amp;amp;linkCode=as2&amp;amp;camp=1789&amp;amp;creative=390957&amp;amp;creativeASIN=3540425802&quot;&gt;&lt;img align=&quot;right&quot; src=&quot;/assets/stick-figure-guide-to-advanced/DesignOfRijndael.jpg&quot; style=&quot;MARGIN: 20px;&quot; /&gt;The Design of Rijndael&lt;/a&gt; is &lt;em&gt;the&lt;/em&gt; book on the subject, written by the Rijndael creators. It was helpful in understanding specifics, especially the math (although some parts were beyond me). It’s also where I got the math notation and graphical representation in the left and right corners of the scenes describing the layers (&lt;a href=&quot;http://en.wikipedia.org/wiki/Advanced_Encryption_Standard#The_SubBytes_step&quot;&gt;SubBytes&lt;/a&gt;, &lt;a href=&quot;http://en.wikipedia.org/wiki/Advanced_Encryption_Standard#The_ShiftRows_step&quot;&gt;ShiftRows&lt;/a&gt;, &lt;a href=&quot;http://en.wikipedia.org/wiki/Advanced_Encryption_Standard#The_MixColumns_step&quot;&gt;MixColumns&lt;/a&gt;, and &lt;a href=&quot;http://en.wikipedia.org/wiki/Advanced_Encryption_Standard#The_AddRoundKey_step&quot;&gt;AddRoundKey&lt;/a&gt;). &lt;/li&gt;
  &lt;li&gt;The &lt;a href=&quot;http://www.csrc.nist.gov/publications/fips/fips197/fips-197.pdf&quot;&gt;FIPS-197&lt;/a&gt; specification formally defines AES and provides a good overview. &lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;http://www.amazon.com/gp/product/0140067485?ie=UTF8&amp;amp;tag=moserware-20&amp;amp;linkCode=as2&amp;amp;camp=1789&amp;amp;creative=390957&amp;amp;creativeASIN=0140067485&quot;&gt;The Puzzle Palace&lt;/a&gt;, especially &lt;a href=&quot;http://cryptome.org/nsa-v-all.htm&quot;&gt;chapter 9&lt;/a&gt;, was helpful while creating Act 1. For more on how the NSA modified DES, see &lt;a href=&quot;http://catless.ncl.ac.uk/Risks/6.01.html#subj4&quot;&gt;this&lt;/a&gt;. &lt;/li&gt;
  &lt;li&gt;More on Intel’s (and now AMD’s) inclusion of native AES instructions can be found &lt;a href=&quot;http://en.wikipedia.org/wiki/AES_instruction_set&quot;&gt;here&lt;/a&gt; and in detail &lt;a href=&quot;http://software.intel.com/en-us/articles/advanced-encryption-standard-aes-instructions-set/&quot;&gt;here&lt;/a&gt;. &lt;/li&gt;
  &lt;li&gt;Other helpful resources include &lt;a href=&quot;http://en.wikipedia.org/wiki/Advanced_Encryption_Standard&quot;&gt;Wikipedia&lt;/a&gt;, &lt;a href=&quot;http://www.samiam.org/rijndael.html&quot;&gt;Sam Trenholme’s AES math series&lt;/a&gt;, and &lt;a href=&quot;http://www.cs.bc.edu/~straubin/cs381-05/blockciphers/rijndael_ingles2004.swf&quot;&gt;this animation&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Please leave a comment if you notice something that can be better explained.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Update #1&lt;/strong&gt;: Several scenes were updated to fix some errors mentioned in the comments. &lt;br /&gt;
&lt;strong&gt;Update #2&lt;/strong&gt;: By request, I’ve created a slide show presentation of this play in both &lt;a href=&quot;/assets/stick-figure-guide-to-advanced/A%20Stick%20Figure%20Guide%20to%20the%20Advanced%20Encryption%20Standard%20%28AES%29.pptx&quot;&gt;PowerPoint&lt;/a&gt; and &lt;a href=&quot;/assets/stick-figure-guide-to-advanced/A%20Stick%20Figure%20Guide%20to%20the%20Advanced%20Encryption%20Standard%20%28AES%29.pdf&quot;&gt;PDF&lt;/a&gt; formats. I’ve licensed them under the &lt;a href=&quot;http://creativecommons.org/licenses/by/3.0/&quot;&gt;Creative Commons Attribution License&lt;/a&gt; so that you can use them as you see fit. If you’re teaching a class, consider giving extra credit to any student giving a worthy interpretive dance rendition in accordance with the Foot-Shooting Prevention Agreement.&lt;/p&gt;
</description>
        <pubDate>Tue, 22 Sep 2009 08:12:00 +0000</pubDate>
        <link>http://www.moserware.com/2009/09/stick-figure-guide-to-advanced.html</link>
        <guid isPermaLink="true">http://www.moserware.com/2009/09/stick-figure-guide-to-advanced.html</guid>
        
        
      </item>
    
      <item>
        <title>Just Enough MBA to Be a Programmer</title>
        <description>&lt;p&gt;There’s that awkward moment in your software development life when you realize that most of the people in your company &lt;em&gt;aren’t&lt;/em&gt; programmers. Scanning your address book reveals Marketing, Sales, Accounting, Human Resources, and yes, the “business people” with their Masters of Business Administration (MBAs).&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;http://www.amazon.com/gp/product/0060799072?ie=UTF8&amp;amp;tag=moserware-20&amp;amp;linkCode=as2&amp;amp;camp=1789&amp;amp;creative=390957&amp;amp;creativeASIN=0060799072&quot;&gt;&lt;img style=&quot;margin: 0px 0px 0px 10px;&quot; src=&quot;http://ecx.images-amazon.com/images/I/51SXTGSSDDL._SL160_.jpg&quot; align=&quot;right&quot; /&gt;&lt;/a&gt;I’ve always been curious about what MBAs really do. In my weaker moments, I’ve even thought that the only reason people got an MBA was to demand a higher salary or to “move up the corporate ladder” into some management job. What did these MBA ninjas actually learn in school? Would having an MBA help me better understand how I affected my company’s bottom line? Although I had the curiosity, I never acted on it. This changed when &lt;a href=&quot;http://chadfowler.com/&quot; title=&quot;Chad Fowler&quot;&gt;another programmer&lt;/a&gt; &lt;a href=&quot;http://www.amazon.com/gp/product/1934356344?ie=UTF8&amp;amp;tag=moserware-20&amp;amp;linkCode=as2&amp;amp;camp=1789&amp;amp;creative=390957&amp;amp;creativeASIN=1934356344&quot; title=&quot;see page 53 of The Passionate Programmer&quot;&gt;recommended&lt;/a&gt; that I read &lt;a href=&quot;http://www.amazon.com/gp/product/0060799072?ie=UTF8&amp;amp;tag=moserware-20&amp;amp;linkCode=as2&amp;amp;camp=1789&amp;amp;creative=390957&amp;amp;creativeASIN=0060799072&quot; title=&quot;The Ten-Day MBA&quot;&gt;The Ten-Day MBA&lt;/a&gt; by &lt;a href=&quot;http://www.harpercollins.com/authors/18530/Steven_A_Silbiger/index.aspx&quot;&gt;Steven Silbiger&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Sure, I knew that no one would anoint me with a real MBA at the end of the book any more than watching &lt;a href=&quot;http://ocw.mit.edu/OcwWeb/web/home/home/index.htm&quot; title=&quot;MIT&#39;s Open Courseware&quot;&gt;MIT&lt;/a&gt; lectures online would make me an MIT grad. Besides, going to a &lt;a href=&quot;http://www.hbs.edu/mba/&quot; title=&quot;... like the Harvard Business School&quot;&gt;nice&lt;/a&gt; &lt;a href=&quot;http://www.wharton.upenn.edu/mba/&quot; title=&quot;or Wharton&quot;&gt;MBA&lt;/a&gt; &lt;a href=&quot;http://www.gsb.stanford.edu/mba/&quot; title=&quot;... or Stanford&quot;&gt;school&lt;/a&gt; is more about being around other motivated people and professors. The real value in having an MBA is in applying the concepts, not the concepts themselves.&lt;/p&gt;

&lt;p&gt;Disclaimers aside, I was determined to read the book and take notes on what a programmer should know about an MBA.&lt;/p&gt;

&lt;h2 id=&quot;day-1---marketing&quot;&gt;Day 1 - Marketing&lt;/h2&gt;

&lt;p&gt;Every developer painfully learns that technology doesn’t win on its own. At best, it just &lt;a href=&quot;http://video.google.com/videoplay?docid=-6909078385965257294&quot; title=&quot;Fast forward to 2:50 to hear Seth talk about this&quot;&gt;gives you a shot at marketing&lt;/a&gt;. Marketing is proof that software doesn’t sell itself, &lt;a href=&quot;http://blog.fairsoftware.net/2009/07/09/good-programmers-dont-need-no-marketing/&quot; title=&quot;Good programmers don&#39;t need no marketing by Alain Raynaud&quot;&gt;no matter how good it is&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;A software company might have a &lt;a href=&quot;http://www.birds-eye.net/definition/m/mrd-market_requirements_document.shtml&quot;&gt;Marketing Requirements Document&lt;/a&gt; (MRD) that outlines what the next version will contain. This usually is the result of a standard marketing analysis that the book outlined:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Consumer Analysis - Who are they? What do they want? How many different segments of people do you have? Is the buyer of your product different than the user? (The book gives the example that women buy the majority of men’s socks and underwear, thus it’s good to market appropriately).&lt;/li&gt;
  &lt;li&gt;Market Analysis - How big is your target market? Is it new? Is it growing? Where is the product in the life cycle?&lt;/li&gt;
  &lt;li&gt;Competitive Analysis - How do your &lt;a href=&quot;http://en.wikipedia.org/wiki/SWOT_analysis&quot;&gt;Strengths, Weaknesses, Opportunities, and Threats&lt;/a&gt; (SWOTs) compare to your competition?&lt;/li&gt;
  &lt;li&gt;Distribution Analysis - What “channels” does your company use to reach your customer? Who are the intermediate players (e.g. the Apple Store, Amazon.com, etc)? What cuts do they take? What are their motivations?&lt;/li&gt;
  &lt;li&gt;Plan the Marketing Mix - How will you differentiate your products? How will you place it, promote it, and price it?&lt;/li&gt;
  &lt;li&gt;Determine the Economics - How long will it take before you break even? What are your fixed costs vs. marginal costs? (Thankfully software has a low marginal cost)&lt;/li&gt;
  &lt;li&gt;Revise - Tweak and repeat as needed.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;One big marketing theme is to “own a word in the consumer’s mind”:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;If you establish one benefit in the consumer’s mind, the consumer may attribute other positives as well to your product. FedEx means “overnight delivery.” Only one company can own a word and it is tough to change it once it’s established… The easiest way to own a word is to be first. Consumers tend to stick with products that work for them. Kleenex cleans runny noses. p.26&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This explains why your family still uses &lt;a href=&quot;http://www.mapquest.com/&quot; title=&quot;MapQuest&quot;&gt;MapQuest&lt;/a&gt; despite your repeated attempts to show them how much better &lt;a href=&quot;http://maps.google.com/&quot; title=&quot;Google Maps&quot;&gt;Google Maps&lt;/a&gt; is. It’s also helpful if your product name matches what it does. “&lt;a href=&quot;http://www.drano.com/&quot;&gt;Drano&lt;/a&gt;” is easier to remember than a “Web 2.0” name like &lt;a href=&quot;http://www.qoop.com/&quot;&gt;Qoop&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I was surprised to learn that the popular online advertising term &lt;a href=&quot;http://en.wikipedia.org/wiki/Cost_per_impression&quot;&gt;Cost per Thousand&lt;/a&gt; (CPM) is a relatively old term that has long existed in print media. In general, the more targeted a group is, the higher the CPM is. This explains why a programming ad on &lt;a href=&quot;http://stackoverflow.com/&quot;&gt;Stack Overflow&lt;/a&gt; can probably fetch a better CPM than the same ad on a site like &lt;a href=&quot;http://www.pandora.com/&quot;&gt;Pandora&lt;/a&gt;, even though programmers use both.&lt;/p&gt;

&lt;p&gt;Marketing people typically have their reasons for doing things that frustrate us. For example, if your software will take a long time to get through a distribution channel or marketing foresees a long customer buying process, they might begin to “market” your code long before a beta is available in the hope that it’ll be ready by the time the customer is ready to buy.&lt;/p&gt;

&lt;p&gt;Sometimes marketing has to make an extreme choice. When GTE faced rebuilding its tarnished brand in the 1990’s, it was probably a clever marketing person who suggested that they &lt;a href=&quot;http://blip.tv/file/319044/&quot; title=&quot;gave up on fixing their brand&quot;&gt;give up on fixing their brand name&lt;/a&gt; and re-brand themselves as &lt;a href=&quot;http://en.wikipedia.org/wiki/Verizon_Communications&quot;&gt;Verizon&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Despite all the good advice, I was disappointed by the book’s lack of coverage of the Apple/BMW style of “marketing” &lt;a href=&quot;http://video.google.com/videoplay?docid=-6909078385965257294&amp;amp;ei=P5NiSvPbFIb2qAL9jLS-DA&amp;amp;q=seth+godin&quot; title=&quot;Watch at 20:27&quot;&gt;the engineering department can do&lt;/a&gt; by creating a remarkable product. Creating a product that allows users to quickly jump over the “&lt;a href=&quot;http://headrush.typepad.com/creating_passionate_users/2006/03/how_to_be_an_ex.html&quot;&gt;suck threshold&lt;/a&gt;” is just one example where a programmer can make a tremendous “marketing” contribution.&lt;/p&gt;

&lt;h2 id=&quot;day-2---ethics&quot;&gt;Day 2 - Ethics&lt;/h2&gt;

&lt;p&gt;Ethics seems easy to understand: “&lt;a href=&quot;http://www.biblegateway.com/passage/?search=Luke%206:31&quot; title=&quot;Do to others as you would have them do to you&quot;&gt;Do to others as you would have them do to you&lt;/a&gt;.” The hard part is realizing how the “others” are affected by your actions. Others include customers, executives, shareholders, suppliers, employees (and their families), the government, the planet, and the “future generations.”&lt;/p&gt;

&lt;p&gt;Unfortunately, when simplicity is lost, &lt;a href=&quot;http://en.wikipedia.org/wiki/Sarbanes-Oxley_Act&quot; title=&quot;Sarbanes-Oxley Act&quot;&gt;Sarbanes-Oxley Act&lt;/a&gt;s are found.&lt;/p&gt;

&lt;h2 id=&quot;day-3---accounting&quot;&gt;Day 3 - Accounting&lt;/h2&gt;

&lt;p&gt;In theory, accounting is simple. Just answer these questions about your &lt;a href=&quot;http://www.answers.com/topic/accounting-entity&quot; title=&quot;entity&quot;&gt;entity&lt;/a&gt;/business:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;What does a company own? &lt;/li&gt;
  &lt;li&gt;How much does a company owe others? &lt;/li&gt;
  &lt;li&gt;How well did a company’s operations perform? &lt;/li&gt;
  &lt;li&gt;How does the company get the cash to fund itself? - p.72&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you get nothing else out of accounting, know how to read a balance sheet. Although Microsoft CEO &lt;a href=&quot;http://en.wikipedia.org/wiki/Steve_Ballmer&quot; title=&quot;Steve Ballmer&quot;&gt;Steve Ballmer&lt;/a&gt; dropped out of Stanford’s MBA program to become employee #24, he &lt;a href=&quot;http://ecorner.stanford.edu/authorMaterialInfo.html?mid=2242&quot; title=&quot;The quote begins around 10:50&quot;&gt;knew balance sheets were important&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;In 1980, I came in to “be a business person” whatever that meant. Didn’t know much. Frankly all I’d ever really done is interview for jobs and market brownie mix. I wasn’t exactly well credentialed. I’d taken the first year at Stanford Business School so &lt;em&gt;I can read a balance sheet – that was pretty important&lt;/em&gt;. We didn’t have that much money back then so there wasn’t much to read. But anyway those lessons were important.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Balance sheets are simple to follow:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;As the name implies, the balance sheet is a “balance” sheet. The fundamental equation that rules over accounting balance is: &lt;br /&gt;
Assets (A) = Liabilities (L) + Owners’ Equity (OE) &lt;br /&gt;
What you own (assets) equals the total of what you borrowed (liabilities) and what you have invested (equity) to pay for it. This equation or “identity” explains &lt;em&gt;everything&lt;/em&gt; that happens in the accounting records of a company over time. Remember it! - p.83&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;For example, your work computer is a company asset (which explains the “asset” tag on it). Your company created an equal liability to pay for it. When your company started, the founders gave up some of their money to increase the new company’s cash assets (left side) in exchange for stock in the company (right side).&lt;/p&gt;

&lt;p&gt;For example, we can read &lt;a href=&quot;http://www.google.com/finance?q=NASDAQ:GOOG&amp;amp;fstype=ii&quot;&gt;Google’s balance sheet&lt;/a&gt; for the first quarter of 2009 and see:&lt;/p&gt;

&lt;p&gt;Assets = $33.51 Billion &lt;br /&gt;
Liabilities = $3.66 Billion &lt;br /&gt;
Owners’ Equity = $29.85 Billion (which includes $14.98 Billion in “retained earnings” that Google is keeping for growth rather than giving it back to the owners of its 315.75 million shares)&lt;/p&gt;

&lt;p&gt;Sure enough, everything “balances”:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/just-enough-mba-to-be-programmer/GoogleBalances.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;From basic data, we can derive a bunch of helpful ratios to see how healthy Google is:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Liquidity/Current Ratio = (Current Assets / Current Liabilities) = 33.51 / 3.66 = 9.14 (Greater than 1 means there’s room to pay for liabilities)&lt;/li&gt;
  &lt;li&gt;Financial Leverage = (Total Liabilities + Owners’ Equity) / OE = (3.66 + 29.85) / 29.85 = 1.12 (Greater than 2 indicates a company is using a lot of debt to operate)&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;http://en.wikipedia.org/wiki/Return_on_equity&quot; title=&quot;Return on Equity&quot;&gt;Return on Equity&lt;/a&gt; = (Net Income / Owners’ Equity) = 1.42 / 29.85 = 4.77% (Which indicates how efficiently the company is using shareholder equity)&lt;/li&gt;
  &lt;li&gt;… and &lt;a href=&quot;http://www.reuters.com/finance/stocks/ratios?symbol=GOOG.O&amp;amp;rpc=66&quot; title=&quot;many more&quot;&gt;many more&lt;/a&gt; …&lt;/li&gt;
&lt;/ol&gt;
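
&lt;p&gt;Here’s a minimal Python sketch of those ratios, assuming the rounded figures quoted above (in billions of dollars). The variable names are just for illustration, and because the inputs are rounded, the last digit may differ slightly from the numbers in the list.&lt;/p&gt;

```python
# Figures quoted above for Google, first quarter of 2009, in billions of dollars.
assets = 33.51
liabilities = 3.66
owners_equity = 29.85
net_income = 1.42

# The accounting identity: assets = liabilities + owners' equity.
balances = round(liabilities + owners_equity, 2) == assets

# This sketch plugs in the same totals that the list above uses.
liquidity = assets / liabilities
financial_leverage = (liabilities + owners_equity) / owners_equity
return_on_equity = net_income / owners_equity

print("balances:", balances)
print("liquidity:", round(liquidity, 2))
print("financial leverage:", round(financial_leverage, 2))
print("return on equity (%):", round(return_on_equity * 100, 2))
```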

&lt;h2 id=&quot;day-4---organizational-behavior&quot;&gt;Day 4 - Organizational Behavior&lt;/h2&gt;

&lt;p&gt;The whole purpose of Organizational Behavior (OB) is to get you to think before you act around people. You want to motivate people? OB has an equation for that:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Motivation = Expectation of Work will lead to Performance * Expectation Performance will lead to Reward * Value of Reward.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Feel free to tweak the variables as you see fit. You can &lt;a href=&quot;http://en.wikipedia.org/wiki/Management_by_objectives&quot;&gt;Manage by Objective&lt;/a&gt; (MBO) where you set goals and then get out of the way or you can &lt;a href=&quot;http://www.futurecents.com/mainmbwa.htm&quot;&gt;Manage by Walking Around&lt;/a&gt; (MBWA) where you play a more active role in day-to-day execution. The best choice depends on your environment and culture. You might need to mix the two. Remember that we humans are delicate creatures with our own wants and desires. Be careful.&lt;/p&gt;

&lt;h2 id=&quot;day-5---quantitative-analysis&quot;&gt;Day 5 - Quantitative Analysis&lt;/h2&gt;

&lt;p&gt;Quantitative Analysis (QA) explains why Excel has so many functions that I’d never heard of. A core idea is that “a dollar today is worth more than a dollar received in the future.” (p.173).&lt;/p&gt;

&lt;p&gt;Imagine that someone promises to pay you a dollar in a year if you give them money now. What is that worth to you today? Obviously, it depends on how much you trust them to pay you back. The more you trust them, the more you’re willing to give them now. Similarly, the less you trust them, the more you might “discount” that dollar in the future because they’re tying up money that could be used for better investments. The rate you apply is called the “discount” or “hurdle” rate. Having a 10% discount rate means that the dollar in the future has a &lt;a href=&quot;http://en.wikipedia.org/wiki/Net_present_value&quot;&gt;net present value&lt;/a&gt; of $0.91 today:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;$1 * (1 + 10%)&lt;sup&gt;-1&lt;/sup&gt; = $0.91&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This simple idea has lots of consequences. For example, let’s oversimplify things and say that you can spend $2,000 today to buy and maintain a server that will last for 3 years or you can lock in a price with Amazon for that same server for $800 a year for the same 3 years. A naïve person would just see that $2000 is less than $2400, but a QA person that assigns a 10% discount rate would see:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/just-enough-mba-to-be-programmer/AmazonServerCost.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;… and come to the conclusion that it’s about $10 cheaper, &lt;em&gt;in today’s dollars&lt;/em&gt;, to have Amazon maintain the server.&lt;/p&gt;
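
&lt;p&gt;The same comparison can be sketched in a few lines of Python. This assumes each $800 payment lands at the end of its year, which is the assumption that reproduces the “about $10” figure. The function names are made up for this example.&lt;/p&gt;

```python
def present_value(amount, rate, years):
    # Discount a single future payment back to today's dollars.
    return amount / (1 + rate) ** years


def present_value_of_payments(payment, rate, periods):
    # Assumes one equal payment at the end of each year, years 1..periods.
    return sum(present_value(payment, rate, year) for year in range(1, periods + 1))


rate = 0.10
buy_now = 2000.0
rent_in_todays_dollars = present_value_of_payments(800.0, rate, 3)
savings = buy_now - rent_in_todays_dollars

print(round(present_value(1.0, rate, 1), 2))  # the dollar a year from now
print(round(rent_in_todays_dollars, 2))
print(round(savings, 2))
```

&lt;p&gt;With these inputs the three years of payments come to roughly $1,989 in today’s dollars. Change the discount rate or the timing of the payments and the answer can flip.&lt;/p&gt;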

&lt;p&gt;You can also do the inverse calculation. Assume you’re Amazon and that server costs you $1800 today and you can get someone to pay you $800 a year for it for 3 years. What is your &lt;a href=&quot;http://en.wikipedia.org/wiki/Internal_rate_of_return&quot;&gt;internal rate of return&lt;/a&gt; for this investment?&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/just-enough-mba-to-be-programmer/IRR.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Here we see an internal rate of return of about 16% on the server.&lt;/p&gt;
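
&lt;p&gt;If you don’t have a spreadsheet handy, the internal rate of return is just the discount rate at which the net present value comes out to zero. Here’s a rough Python sketch that finds it with Newton’s method. The function names, the starting guess, and the fixed step count are my own choices for illustration, and payments are again assumed to arrive at the end of each year.&lt;/p&gt;

```python
def npv(rate, cash_flows):
    # cash_flows[0] happens today; cash_flows[t] arrives at the end of year t.
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))


def npv_slope(rate, cash_flows):
    # Derivative of npv with respect to the rate.
    return sum(-t * cf / (1 + rate) ** (t + 1) for t, cf in enumerate(cash_flows))


def internal_rate_of_return(cash_flows, guess=0.10, steps=50):
    # Newton's method: repeatedly step toward the rate where NPV is zero.
    rate = guess
    for _ in range(steps):
        rate -= npv(rate, cash_flows) / npv_slope(rate, cash_flows)
    return rate


server = [-1800.0, 800.0, 800.0, 800.0]
irr = internal_rate_of_return(server)
print(round(irr * 100, 1))
```

&lt;p&gt;Newton’s method behaves well here because there is one outlay followed by only positive payments. Cash flows that change sign more than once can have several rates that zero out the NPV, so treat this as a sketch rather than a general solver.&lt;/p&gt;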

&lt;p&gt;We could also use the time value of money to include valuing users. Early adopters of eBay and Twitter were worth more per user than late adopters because the early ones were more likely to tell their friends who hadn’t used the service and thus attract more new people.&lt;/p&gt;

&lt;h2 id=&quot;day-6---finance&quot;&gt;Day 6 - Finance&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;http://en.wikipedia.org/wiki/Finance&quot; title=&quot;Finance&quot;&gt;Finance&lt;/a&gt; blends time, money, and risk.&lt;/p&gt;

&lt;p&gt;To start, a business needs a structure that gives it some capital. Popular options include:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;http://en.wikipedia.org/wiki/Sole_proprietorship&quot; title=&quot;Sole Proprietorships&quot;&gt;Sole Proprietorships&lt;/a&gt; - An individual or a married couple. You are effectively your business. All earnings are treated as personal income and taxed appropriately. You take in all the profits but also have unlimited liability. You can’t divide the company up. It’s simple, but the downside is that it makes it hard to raise money. &lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;http://en.wikipedia.org/wiki/Partnership&quot; title=&quot;Partnerships&quot;&gt;Partnerships&lt;/a&gt; - Involves more people than a proprietorship. Several people come together and can be &lt;a href=&quot;http://en.wikipedia.org/wiki/General_partnership&quot; title=&quot;general partners&quot;&gt;general partners&lt;/a&gt; (each having unlimited liability) or &lt;a href=&quot;http://en.wikipedia.org/wiki/Limited_partnership&quot; title=&quot;limited partners&quot;&gt;limited partners&lt;/a&gt; (liable up to the investment). As a partner, you pay taxes on your percentage of the business’s income on your personal taxes. &lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;http://en.wikipedia.org/wiki/Corporation&quot; title=&quot;Corporations&quot;&gt;Corporations&lt;/a&gt; - Effectively you give birth to a new legal entity that is distinct from the &lt;a href=&quot;http://en.wikipedia.org/wiki/Shareholder&quot; title=&quot;shareholders&quot;&gt;shareholders&lt;/a&gt;. Most large companies are “&lt;a href=&quot;http://en.wikipedia.org/wiki/C_Corporation&quot; title=&quot;C Corporations&quot;&gt;C Corporations&lt;/a&gt;” and have a &lt;a href=&quot;http://en.wikipedia.org/wiki/Double_taxation&quot; title=&quot;double taxation&quot;&gt;double taxation&lt;/a&gt; issue where the corporation’s income &lt;a href=&quot;http://en.wikipedia.org/wiki/Corporate_tax&quot; title=&quot;is taxed&quot;&gt;is taxed&lt;/a&gt; and the dividends it issues to shareholders &lt;a href=&quot;http://en.wikipedia.org/wiki/Dividend_tax&quot; title=&quot;are taxed&quot;&gt;are taxed&lt;/a&gt; as well. If you have a smaller company with fewer than 100 shareholders, you may qualify for “&lt;a href=&quot;http://en.wikipedia.org/wiki/S_corp&quot; title=&quot;S Corporation&quot;&gt;S Corporation&lt;/a&gt;” status. S Corporations usually don’t pay income tax and instead rely on shareholders to pay the associated tax on their percentage of the income. This tends to give S Corporations the legal liability benefit of corporation status and the single taxation benefit of partnerships.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Corporations issue stock to raise money. Stock entitles the holder to a &lt;a href=&quot;http://en.wikipedia.org/wiki/Residual&quot; title=&quot;residual&quot;&gt;residual&lt;/a&gt; claim on earnings and assets after other debt obligations have been met. One obvious question is “what’s a good stock price?” This has a lot of factors, such as a company’s growth potential and the company’s earnings. Popular metrics include a company’s ratio of its stock price divided by its earnings (&lt;a href=&quot;http://en.wikipedia.org/wiki/P/E_ratio&quot; title=&quot;P/E ratio&quot;&gt;P/E ratio&lt;/a&gt;). Higher P/E ratios tend to indicate that shareholders have higher expectations the company will grow and eventually make more money in the future. Some examples:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Company&lt;/th&gt;
      &lt;th&gt;P/E Ratio&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;a href=&quot;http://www.google.com/finance?q=GOOG&quot; title=&quot;Google&quot;&gt;Google&lt;/a&gt;&lt;/td&gt;
      &lt;td&gt;31.47&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;a href=&quot;http://www.google.com/finance?q=MSFT&quot; title=&quot;Microsoft&quot;&gt;Microsoft&lt;/a&gt;&lt;/td&gt;
      &lt;td&gt;13.98&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;a href=&quot;http://www.google.com/finance?q=NASDAQ%3AAMZN&quot; title=&quot;Amazon.com&quot;&gt;Amazon.com&lt;/a&gt;&lt;/td&gt;
      &lt;td&gt;54.98&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;After you’ve raised some capital, you should think carefully about how you’ll spend it. There are many ways to do this. The &lt;a href=&quot;http://en.wikipedia.org/wiki/Payback_period&quot; title=&quot;Payback Period Method&quot;&gt;Payback Period Method&lt;/a&gt; has you calculate how long it’ll take to recover your investment. The shorter the payback period, the less risky the investment is. For example, adding RAM is so cheap that the productivity boost has a short payback period. In contrast, completely rewriting a huge codebase might &lt;a href=&quot;http://www.joelonsoftware.com/articles/fog0000000069.html&quot; title=&quot;put your company out of business&quot;&gt;put your company out of business&lt;/a&gt; before you get your money back.&lt;/p&gt;

&lt;p&gt;Another approach is to use the &lt;a href=&quot;http://www.answers.com/topic/net-present-value-method&quot; title=&quot;Net Present Value Method&quot;&gt;Net Present Value Method&lt;/a&gt; to see how much the investment will return over its lifetime in terms of today’s dollars. Once you determine the discount factor to reflect the risk, you only consider investments that have a positive Net Present Value.&lt;/p&gt;

&lt;h2 id=&quot;day-7---operations&quot;&gt;Day 7 - Operations&lt;/h2&gt;

&lt;p&gt;Operations is about making stuff. Popular operations guys include &lt;a href=&quot;http://en.wikipedia.org/wiki/Frederick_Winslow_Taylor&quot; title=&quot;Frederick Taylor&quot;&gt;Frederick Taylor&lt;/a&gt; from the late 1800’s, who is famous for breaking up tasks into small pieces and walking around factories with a stopwatch to find the “one right way” of doing them. &lt;a href=&quot;http://en.wikipedia.org/wiki/Elton_Mayo&quot; title=&quot;Elton Mayo&quot;&gt;Elton Mayo&lt;/a&gt;’s bold claim was that caring about your employees mattered. Productivity could hold up even if you &lt;a href=&quot;http://en.wikipedia.org/wiki/Hawthorne_effect&quot; title=&quot;make terrible working conditions&quot;&gt;made working conditions terrible&lt;/a&gt;, as long as the employees were otherwise treated well and felt important.&lt;/p&gt;

&lt;p&gt;Although some MBAs might use some programming techniques like optimizing flow-charts to improve operations, you’re more likely to see factory techniques used when managing programmers. Oversimplifying things, software development is a factory that turns &lt;a href=&quot;http://www.joelonsoftware.com/articles/fog0000000074.html&quot; title=&quot;capital into code&quot;&gt;capital into code&lt;/a&gt;. To this end, you’ll often see popular manufacturing processes like Toyota’s &lt;a href=&quot;http://en.wikipedia.org/wiki/Kanban&quot; title=&quot;Kanban&quot;&gt;Kanban&lt;/a&gt; method of using visual cards to control workflow &lt;a href=&quot;http://www.infoq.com/articles/hiranabe-lean-agile-kanban&quot; title=&quot;sneaking into our offices&quot;&gt;making their way&lt;/a&gt; into our world as “new” or “agile” software methodologies.&lt;/p&gt;

&lt;h2 id=&quot;day-8---economics&quot;&gt;Day 8 - Economics&lt;/h2&gt;

&lt;p&gt;Economics is the magic that allows me to write software in exchange for steak burritos. As &lt;a href=&quot;http://en.wikipedia.org/wiki/Adam_Smith&quot; title=&quot;Adam Smith&quot;&gt;Adam Smith&lt;/a&gt; &lt;a href=&quot;http://en.wikipedia.org/wiki/The_Wealth_of_Nations&quot; title=&quot;realized&quot;&gt;realized&lt;/a&gt;, society as a whole becomes “wealthier” when we seek division of labor to specialize and do something well rather than trying to do everything ourselves poorly.&lt;/p&gt;

&lt;p&gt;At a &lt;a href=&quot;http://en.wikipedia.org/wiki/Microeconomics&quot; title=&quot;micro level&quot;&gt;micro level&lt;/a&gt;, economics is a simple matter of supply equals demand. When you look at the larger/&lt;a href=&quot;http://en.wikipedia.org/wiki/Macroeconomics&quot; title=&quot;macro economies&quot;&gt;macro economies&lt;/a&gt;, more complicated equations pop up like &lt;a href=&quot;http://en.wikipedia.org/wiki/Equation_of_exchange&quot; title=&quot;this one&quot;&gt;this one&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;&lt;a href=&quot;http://en.wikipedia.org/wiki/Money_supply&quot; title=&quot;Money&quot;&gt;Money&lt;/a&gt; × &lt;a href=&quot;http://en.wikipedia.org/wiki/Velocity_of_money&quot; title=&quot;Velocity&quot;&gt;Velocity&lt;/a&gt; = &lt;a href=&quot;http://en.wikipedia.org/wiki/Price_level&quot; title=&quot;Price Level&quot;&gt;Price Level&lt;/a&gt; × &lt;a href=&quot;http://en.wikipedia.org/wiki/Measures_of_national_income_and_output&quot; title=&quot;Real Gross National Product&quot;&gt;Real Gross National Product&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This equation shows that it’s important that money is moving around (e.g. isn’t hidden under your mattress) and that prices are stable or have reasonable growth.&lt;/p&gt;

&lt;p&gt;One of the best things about the economics of software is that it has really low marginal costs (e.g. the cost to copy it). With processors, bandwidth, and storage all roughly following &lt;a href=&quot;http://en.wikipedia.org/wiki/Moore%27s_Law&quot; title=&quot;Moore&#39;s Law&quot;&gt;Moore’s Law&lt;/a&gt; exponential curves, the capacity is doubling every 18 - 24 months which implies that the cost for a fixed amount is falling by half over the same period.&lt;/p&gt;
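
&lt;p&gt;To get a feel for how quickly that compounds, here’s a small Python sketch. It assumes a clean exponential with a fixed halving period, which real hardware prices only roughly follow, and the function name is made up for this example.&lt;/p&gt;

```python
def relative_cost(months_elapsed, halving_period_months):
    # Fraction of the original price that a fixed amount of capacity costs
    # after months_elapsed, if the cost halves once every halving period.
    return 0.5 ** (months_elapsed / halving_period_months)


six_years = 72
fast = relative_cost(six_years, 18)  # four halvings
slow = relative_cost(six_years, 24)  # three halvings
print(fast, slow)
```

&lt;p&gt;Under those assumptions, a fixed amount of capacity costs somewhere between one sixteenth and one eighth of its original price after six years.&lt;/p&gt;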

&lt;p&gt;As Chris Anderson points out in his book &lt;a href=&quot;http://www.scribd.com/doc/17135767/FREE-full-book-by-Chris-Anderson&quot; title=&quot;Free&quot;&gt;Free&lt;/a&gt;, it can sometimes make sense to round these increasingly lower marginal costs down to zero and make money in different ways such as advertising or selling complements. It’s hard to find other industries that have as many economic freedoms as software.&lt;/p&gt;

&lt;h2 id=&quot;day-9---strategy&quot;&gt;Day 9 - Strategy&lt;/h2&gt;

&lt;p&gt;Strategy should be simple: have a &lt;a href=&quot;http://www.ted.com/talks/seth_godin_on_sliced_bread.html&quot; title=&quot;remarkable product&quot;&gt;remarkable product&lt;/a&gt; &lt;a href=&quot;http://www.paulgraham.com/startuplessons.html&quot; title=&quot;that people want&quot;&gt;that people want&lt;/a&gt;. &lt;a href=&quot;http://www.amazon.com/gp/product/1590597214?ie=UTF8&amp;amp;tag=moserware-20&amp;amp;linkCode=as2&amp;amp;camp=1789&amp;amp;creative=390957&amp;amp;creativeASIN=1590597214&quot; title=&quot;Bad things happen&quot;&gt;Bad things happen&lt;/a&gt; if you don’t do this. It’s especially helpful if you have a &lt;a href=&quot;http://en.wikipedia.org/wiki/Cash_cow&quot; title=&quot;cash cow&quot;&gt;cash cow&lt;/a&gt; you can milk for lots of money to fund new initiatives. For example, Google makes so much money &lt;a href=&quot;http://en.wikipedia.org/wiki/AdSense&quot; title=&quot;AdSense&quot;&gt;from ads&lt;/a&gt; that it can have &lt;a href=&quot;http://news.ycombinator.com/item?id=699460&quot; title=&quot;this&quot;&gt;this&lt;/a&gt; &lt;a href=&quot;http://mashable.com/2009/07/11/google-equation/&quot; title=&quot;strategy&quot;&gt;strategy&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Revenue = Amount of Web Pages Viewed&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Google’s strategy of getting you to view lots of pages (which conveniently have Google ads on them) explains a lot of what it does, from wanting to &lt;a href=&quot;http://googleblog.blogspot.com/2009/06/lets-make-web-faster.html&quot; title=&quot;speed up the web&quot;&gt;speed up the web&lt;/a&gt;, to making a free &lt;a href=&quot;http://en.wikipedia.org/wiki/Android_%28operating_system%29&quot; title=&quot;phone os&quot;&gt;phone OS&lt;/a&gt;, to creating a ton of &lt;a href=&quot;http://en.wikipedia.org/wiki/List_of_Google_products&quot; title=&quot;free services&quot;&gt;free services to keep you hooked on the web&lt;/a&gt;. Google really doesn’t &lt;em&gt;care&lt;/em&gt; what you do so long as you enjoy it and take in the targeted ads.&lt;/p&gt;

&lt;p&gt;The book tended to focus on more traditional forms of strategy such as “cost leadership”, “differentiation”, and “focus on the customer” as well as applying &lt;a href=&quot;http://oyc.yale.edu/economics/game-theory/&quot; title=&quot;game theory&quot;&gt;lessons&lt;/a&gt; from the famous &lt;a href=&quot;http://en.wikipedia.org/wiki/Prisoner%27s_dilemma&quot; title=&quot;prisoner&#39;s dilemma&quot;&gt;prisoner’s dilemma&lt;/a&gt;. I acknowledge that these are important as well, but I think that at its core, strategy can be simple.&lt;/p&gt;

&lt;h2 id=&quot;day-10---minicourses&quot;&gt;Day 10 - Minicourses&lt;/h2&gt;

&lt;p&gt;The book ended with “minicourses” in areas relevant to business such as:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Property (&lt;a href=&quot;http://en.wikipedia.org/wiki/Real_estate&quot; title=&quot;real estate&quot;&gt;real estate&lt;/a&gt;, &lt;a href=&quot;http://en.wikipedia.org/wiki/Patent&quot; title=&quot;patents&quot;&gt;patents&lt;/a&gt;, &lt;a href=&quot;http://en.wikipedia.org/wiki/Copyright&quot; title=&quot;copyright&quot;&gt;copyright&lt;/a&gt;, etc) &lt;/li&gt;
  &lt;li&gt;Leadership (e.g. schools want to create ‘leaders’ because they’ll be better future donors).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Although those were interesting, the section I enjoyed the most was on business law.&lt;/p&gt;

&lt;p&gt;In our jobs, we often bump into legal matters. We face &lt;a href=&quot;http://en.wikipedia.org/wiki/Software_license_agreement&quot; title=&quot;End User License Agreements&quot;&gt;End User License Agreements&lt;/a&gt; (EULAs) and &lt;a href=&quot;http://en.wikipedia.org/wiki/Non-disclosure_agreement&quot; title=&quot;Non-Disclosure Agreements&quot;&gt;Non-Disclosure Agreements&lt;/a&gt; (NDAs) that we rarely read and often don’t fully understand. It was interesting to see that any proper &lt;a href=&quot;http://en.wikipedia.org/wiki/Contract&quot; title=&quot;contract&quot;&gt;contract&lt;/a&gt; must meet the following four conditions to be valid:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Capacity of Parties - Parties must have legal authorization and be mentally capable to enter into the agreement.&lt;/li&gt;
  &lt;li&gt;Mutual Agreement (Assent) or Meeting of the Minds - There must be a valid offer and an acceptance.&lt;/li&gt;
  &lt;li&gt;Consideration Given - Value must be given for the promise to be enforceable.&lt;/li&gt;
  &lt;li&gt;Legality - You can’t enforce a contract dealing with illegal goods or actions.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;When bad things happen, it can sometimes escalate to a “legal action” which has a &lt;a href=&quot;http://en.wikipedia.org/wiki/Template:Civil_procedure_%28United_States%29&quot; title=&quot;standard procedure&quot;&gt;standard procedure&lt;/a&gt; involving steps you sometimes hear in the news:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Jurisdiction - For a court to hear a case, it must have “jurisdiction” over the case and the power to bind the parties to the decision.&lt;/li&gt;
  &lt;li&gt;Pleadings - The paperwork to start the trial process. The plaintiff (π) files a complaint asserting that the defendant (Δ) has done something wrong and requests a punishment or remedy.&lt;/li&gt;
  &lt;li&gt;Discovery - Lawyers gather witnesses and evidence before a trial. Each side is allowed to see the evidence held by the other side.&lt;/li&gt;
  &lt;li&gt;Pretrial Conference - The lawyers and judge try to focus the case on the most important issues. This is also a good time for out-of-court settlements if possible.&lt;/li&gt;
  &lt;li&gt;Trial - Occurs before the court. The jury decides the factual disputes. The case can be thrown out by the judge with a “summary judgment” if it has no merit.&lt;/li&gt;
  &lt;li&gt;Jury Instruction by the Judge and the Verdict - The judge instructs the jury about the relevant law involved and the jury makes its decision about the facts and penalty within its authority.&lt;/li&gt;
  &lt;li&gt;Posttrial Motions - Options include asking for a retrial if an error of law or procedure occurred (e.g. jury misconduct).&lt;/li&gt;
  &lt;li&gt;Appeal - Each party in a lawsuit is entitled to one appeal at an appellate court where they can file a written brief with arguments for a new trial.&lt;/li&gt;
  &lt;li&gt;Secure or Enforce the Judgment - Send the person to jail and/or collect money.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;While the short overview was intriguing, it reinforced my belief that it’s important to have &lt;a href=&quot;http://wiki.answers.com/Q/What_is_the_difference_between_a_lawyer_and_an_attorney&quot; title=&quot;an attorney or a lawyer&quot;&gt;an attorney or a lawyer&lt;/a&gt; when it comes to legal matters. At the very least, they usually have malpractice insurance if things go really bad.&lt;/p&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;The &lt;a href=&quot;http://www.amazon.com/gp/product/0060799072?ie=UTF8&amp;amp;tag=moserware-20&amp;amp;linkCode=as2&amp;amp;camp=1789&amp;amp;creative=390957&amp;amp;creativeASIN=0060799072&quot;&gt;Ten-Day MBA&lt;/a&gt; helped me move from being &lt;a href=&quot;http://en.wikipedia.org/wiki/Four_stages_of_competence&quot; title=&quot;unconscious incompetence&quot;&gt;unconsciously incompetent&lt;/a&gt; about business administration to becoming consciously incompetent in just a few days. I think that alone made it worth the time. I don’t have aspirations to get a real MBA, but I now have more respect for those who do.&lt;/p&gt;

&lt;p&gt;And now, back to programming…&lt;/p&gt;
</description>
        <pubDate>Mon, 20 Jul 2009 08:00:00 +0000</pubDate>
        <link>http://www.moserware.com/2009/07/just-enough-mba-to-be-programmer.html</link>
        <guid isPermaLink="true">http://www.moserware.com/2009/07/just-enough-mba-to-be-programmer.html</guid>
        
        
      </item>
    
      <item>
        <title>The First Few Milliseconds of an HTTPS Connection</title>
        <description>&lt;p&gt;Convinced from spending hours reading &lt;a href=&quot;http://www.amazon.com/Tuscan-Whole-Milk-Gallon-128/product-reviews/B00032G1S0/ref=dp_top_cm_cr_acr_txt?ie=UTF8&amp;amp;showViewpoints=1&quot;&gt;rave reviews&lt;/a&gt;, Bob eagerly clicked “Proceed to Checkout” for his gallon of &lt;a href=&quot;http://www.amazon.com/gp/product/B00032G1S0?ie=UTF8&amp;amp;tag=moserware-20&amp;amp;linkCode=as2&amp;amp;camp=1789&amp;amp;creative=390957&amp;amp;creativeASIN=B00032G1S0&quot;&gt;Tuscan Whole Milk&lt;/a&gt; and…&lt;/p&gt;

&lt;p&gt;Whoa! What just happened?&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/first-few-milliseconds-of-https/securitysymbols.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;In the 220 milliseconds that flew by, a lot of interesting stuff happened to make Firefox change the address bar color and put a lock in the lower right corner. With the help of &lt;a href=&quot;http://www.wireshark.org/&quot;&gt;Wireshark&lt;/a&gt;, my favorite network tool, and a slightly modified debug build of Firefox, we can see &lt;em&gt;exactly&lt;/em&gt; what’s going on.&lt;/p&gt;

&lt;p&gt;By agreement of &lt;a href=&quot;http://tools.ietf.org/html/rfc2818&quot;&gt;RFC 2818&lt;/a&gt;, Firefox knew that “https” meant it should connect to &lt;a href=&quot;http://tools.ietf.org/html/rfc2818#section-2.3&quot;&gt;port 443&lt;/a&gt; at Amazon.com:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/first-few-milliseconds-of-https/httpsport.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Most people associate HTTPS with &lt;a href=&quot;http://en.wikipedia.org/wiki/Secure_Sockets_Layer&quot;&gt;SSL&lt;/a&gt; (Secure Sockets Layer) which was &lt;a href=&quot;http://www.mozilla.org/projects/security/pki/nss/history.html&quot;&gt;created by Netscape in the mid 90’s&lt;/a&gt;. This is becoming less true over time. As Netscape lost market share, SSL’s maintenance moved to the Internet Engineering Task Force (&lt;a href=&quot;http://en.wikipedia.org/wiki/IETF&quot;&gt;IETF&lt;/a&gt;). The first post-Netscape version was re-branded as Transport Layer Security (&lt;a href=&quot;http://en.wikipedia.org/wiki/Secure_Sockets_Layer&quot;&gt;TLS&lt;/a&gt;) 1.0 which &lt;a href=&quot;http://tools.ietf.org/html/rfc2246&quot;&gt;was released&lt;/a&gt; in January 1999. It’s rare to see true “SSL” traffic given that TLS has been around for 10 years.&lt;/p&gt;

&lt;h2 id=&quot;client-hello&quot;&gt;Client Hello&lt;/h2&gt;

&lt;p&gt;TLS wraps all traffic in “records” of different types. We see that the first byte out of our browser is the hex byte 0x16 = 22 which &lt;a href=&quot;http://www.iana.org/assignments/tls-parameters/tls-parameters.xhtml&quot;&gt;means&lt;/a&gt; that this is a “handshake” record:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/first-few-milliseconds-of-https/clienthellowithannotations.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The next two bytes are 0x0301, which indicates that this is a version 3.1 record. In other words, TLS 1.0 is essentially SSL 3.1.&lt;/p&gt;

&lt;p&gt;The handshake record is broken out into several messages. The first is our “Client Hello” message (0x01). There are a few important things here:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Random:&lt;br /&gt;
&lt;img src=&quot;/assets/first-few-milliseconds-of-https/randomclientbytes.png&quot; alt=&quot;&quot; /&gt; &lt;br /&gt;
There are four bytes representing the current Coordinated Universal Time (&lt;a href=&quot;http://en.wikipedia.org/wiki/Coordinated_Universal_Time&quot;&gt;UTC&lt;/a&gt;) in the Unix epoch format, which is the number of seconds since January 1, 1970. In this case, 0x4a2f07ca. It’s followed by 28 random bytes. This will be used later on. &lt;/li&gt;
  &lt;li&gt;Session ID:&lt;br /&gt;
&lt;img src=&quot;/assets/first-few-milliseconds-of-https/sessionid.png&quot; alt=&quot;&quot; /&gt; &lt;br /&gt;
Here it’s empty/null. If we had previously connected to Amazon.com a few seconds ago, we could potentially resume a session and avoid a full handshake. &lt;/li&gt;
  &lt;li&gt;Cipher Suites:&lt;br /&gt;
&lt;img src=&quot;/assets/first-few-milliseconds-of-https/ciphersuites.png&quot; alt=&quot;&quot; /&gt; &lt;br /&gt;
This is a list of all of the encryption algorithms that the browser is willing to support. Its top pick is a very strong choice of “&lt;a href=&quot;http://en.wikipedia.org/wiki/Secure_Sockets_Layer&quot;&gt;TLS&lt;/a&gt;_&lt;a href=&quot;http://en.wikipedia.org/wiki/Elliptic_Curve_Diffie-Hellman&quot;&gt;ECDHE&lt;/a&gt;_&lt;a href=&quot;http://en.wikipedia.org/wiki/Elliptic_Curve_DSA&quot;&gt;ECDSA&lt;/a&gt;_WITH_&lt;a href=&quot;http://en.wikipedia.org/wiki/Advanced_Encryption_Standard&quot;&gt;AES&lt;/a&gt;_256_&lt;a href=&quot;http://en.wikipedia.org/wiki/Block_cipher_modes_of_operation#Cipher-block_chaining_.28CBC.29&quot;&gt;CBC&lt;/a&gt;_&lt;a href=&quot;http://en.wikipedia.org/wiki/SHA_hash_functions#SHA-0_and_SHA-1&quot;&gt;SHA&lt;/a&gt;” followed by 33 others that it’s willing to accept. Don’t worry if none of that makes sense. We’ll find out later that Amazon doesn’t pick our first choice anyway. &lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;http://tools.ietf.org/html/rfc4366#section-3.1&quot;&gt;server_name extension&lt;/a&gt;:&lt;br /&gt;
&lt;img src=&quot;/assets/first-few-milliseconds-of-https/server_name.png&quot; alt=&quot;&quot; /&gt; &lt;br /&gt;
This is a way to tell Amazon.com that our browser is trying to reach &lt;a href=&quot;https://www.amazon.com/&quot;&gt;https://www.amazon.com/&lt;/a&gt;. This is really convenient because our TLS handshake occurs long before any HTTP traffic. HTTP has a &lt;a href=&quot;http://tools.ietf.org/html/rfc2616#section-14.23&quot;&gt;“Host” header&lt;/a&gt; which allows cost-cutting Internet hosting companies to pile hundreds of websites onto a single IP address. SSL has traditionally required a different IP for each site, but this extension allows the server to respond with the appropriate certificate that the browser is looking for. If nothing else, this extension should allow an extra week or so of IPv4 addresses.&lt;/li&gt;
&lt;/ul&gt;
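
&lt;p&gt;To make the byte values above concrete, here is a minimal Python sketch that interprets the record type, the record version, and the Client Hello timestamp. It is not taken from Firefox or Wireshark. The function name &lt;code&gt;decode_hello_prefix&lt;/code&gt; and the small lookup table are made up for illustration; only the three input values come from the capture.&lt;/p&gt;

```python
from datetime import datetime, timezone

# A subset of the record content types from the IANA TLS parameters registry.
CONTENT_TYPES = {
    20: "change_cipher_spec",
    21: "alert",
    22: "handshake",
    23: "application_data",
}


def decode_hello_prefix(content_type, version, gmt_unix_time):
    """Interpret the record type, record version and Client Hello timestamp."""
    major, minor = divmod(version, 256)  # 0x0301 splits into (3, 1)
    when = datetime.fromtimestamp(gmt_unix_time, tz=timezone.utc)
    return CONTENT_TYPES.get(content_type, "unknown"), (major, minor), when


kind, version, when = decode_hello_prefix(0x16, 0x0301, 0x4A2F07CA)
print(kind, version, when.isoformat())
```

&lt;p&gt;Run against the values from this session, the timestamp should decode to a moment in June 2009, which matches when the capture was taken.&lt;/p&gt;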

&lt;h2 id=&quot;server-hello&quot;&gt;Server Hello&lt;/h2&gt;

&lt;p&gt;Amazon.com replies with a handshake record that’s a massive two packets in size (2,551 bytes). The record has version bytes of 0x0301 meaning that Amazon agreed to our request to use TLS 1.0. This record has three sub-messages with some interesting data:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;“Server Hello” Message (2): &lt;br /&gt;
&lt;img src=&quot;/assets/first-few-milliseconds-of-https/serverhello.png&quot; alt=&quot;&quot; /&gt; 
    &lt;ul&gt;
      &lt;li&gt;We get the server’s four byte Unix epoch time representation and its 28 random bytes that will be used later. &lt;/li&gt;
      &lt;li&gt;A 32 byte session ID in case we want to reconnect without a big handshake. &lt;/li&gt;
      &lt;li&gt;Of the 34 cipher suites we offered, Amazon picked “TLS_RSA_WITH_RC4_128_MD5” (0x0004). This means that it will use the “&lt;a href=&quot;http://en.wikipedia.org/wiki/RSA&quot;&gt;RSA&lt;/a&gt;” &lt;a href=&quot;http://en.wikipedia.org/wiki/Public-key_cryptography&quot;&gt;public key&lt;/a&gt; algorithm to verify certificate signatures and exchange keys, the &lt;a href=&quot;http://en.wikipedia.org/wiki/RC4&quot;&gt;RC4&lt;/a&gt; encryption algorithm to encrypt data, and the &lt;a href=&quot;http://en.wikipedia.org/wiki/MD5&quot;&gt;MD5&lt;/a&gt; hash function to verify the contents of messages. We’ll cover these in depth later on. I personally think Amazon had selfish reasons for choosing this cipher suite. Of the ones on the list, it was the one that was least CPU intensive to use so that Amazon could crowd more connections onto each of their servers. A much less likely possibility is that they wanted to pay special tribute to &lt;a href=&quot;http://en.wikipedia.org/wiki/Ronald_L._Rivest&quot;&gt;Ron Rivest&lt;/a&gt;, who created all three of these algorithms.&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Certificate Message (11):&lt;br /&gt;
&lt;img src=&quot;/assets/first-few-milliseconds-of-https/certificatemessage.png&quot; alt=&quot;&quot; /&gt; 
    &lt;ul&gt;
      &lt;li&gt;This message takes a whopping 2,464 bytes and is the certificate that the client can use to validate Amazon’s identity. It isn’t anything fancy. You can view most of its contents in your browser:&lt;br /&gt;
&lt;img src=&quot;/assets/first-few-milliseconds-of-https/AmazonBasicCertInfo.png&quot; alt=&quot;&quot; /&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;“Server Hello Done” Message (14):&lt;br /&gt;
&lt;img src=&quot;/assets/first-few-milliseconds-of-https/serverhellodone.png&quot; alt=&quot;&quot; /&gt; 
    &lt;ul&gt;
      &lt;li&gt;This is a zero byte message that tells the client that it’s done with the “Hello” process and indicates that the server won’t be asking the client for a certificate.&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ol&gt;

&lt;h2 id=&quot;checking-out-the-certificate&quot;&gt;Checking out the Certificate&lt;/h2&gt;

&lt;p&gt;The browser has to &lt;a href=&quot;http://web.archive.org/web/20090614041808id_/http://www.koders.com/c/fid340AB659241B7C717B5B3E0095BBA4245FCE34FD.aspx#L862&quot;&gt;figure out&lt;/a&gt; if it should trust Amazon.com. In this case, it’s using certificates. It looks at Amazon’s certificate and &lt;a href=&quot;http://web.archive.org/web/20090614041813id_/http://www.koders.com/c/fid9207CD3EB61F5F08E38858D14997264BEDB5B62C.aspx#L1091&quot;&gt;sees&lt;/a&gt; that the current time is between the “not before” time of August 26, 2008 and the “not after” time of August 27, 2009. It also &lt;a href=&quot;http://web.archive.org/web/20090614041813id_/http://www.koders.com/c/fid9207CD3EB61F5F08E38858D14997264BEDB5B62C.aspx?s=CERT_CheckCertValidTimes#L1211&quot;&gt;checks&lt;/a&gt; to make sure that the certificate’s public key is authorized for exchanging secret keys.&lt;/p&gt;

&lt;p&gt;Why should we trust this certificate?&lt;/p&gt;

&lt;p&gt;Attached to the certificate is a “signature” that is just a really long number in &lt;a href=&quot;http://en.wikipedia.org/wiki/Endianness#Big-endian&quot;&gt;big-endian&lt;/a&gt; format:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/first-few-milliseconds-of-https/AmazonCertSigned.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Anyone could have sent us these bytes. Why should we trust this signature? To answer that question, we need to make a speedy detour into &lt;a href=&quot;http://en.wikipedia.org/wiki/Donald_in_Mathmagic_Land&quot;&gt;mathemagic land&lt;/a&gt;:&lt;/p&gt;

&lt;h2 id=&quot;interlude-a-short-not-too-scary-guide-to-rsa&quot;&gt;Interlude: A Short, Not &lt;em&gt;Too&lt;/em&gt; Scary, Guide to RSA&lt;/h2&gt;

&lt;p&gt;People &lt;a href=&quot;http://stackoverflow.com/questions/575561/do-programmers-have-to-be-good-in-mathematics-closed&quot;&gt;sometimes wonder&lt;/a&gt; if math has any relevance to programming. Certificates give a very practical example of applied math. Amazon’s certificate tells us that we should use the RSA algorithm to check the signature. &lt;a href=&quot;http://en.wikipedia.org/wiki/RSA&quot;&gt;RSA&lt;/a&gt; was created in the 1970’s by MIT professors &lt;a href=&quot;http://people.csail.mit.edu/rivest/&quot;&gt;Ron &lt;em&gt;R&lt;/em&gt;ivest&lt;/a&gt;, &lt;a href=&quot;http://en.wikipedia.org/wiki/Adi_Shamir&quot;&gt;Adi &lt;em&gt;S&lt;/em&gt;hamir&lt;/a&gt;, and &lt;a href=&quot;http://en.wikipedia.org/wiki/Leonard_Adleman&quot;&gt;Len &lt;em&gt;A&lt;/em&gt;dleman&lt;/a&gt; who found a &lt;a href=&quot;http://people.csail.mit.edu/rivest/Rsapaper.pdf&quot;&gt;clever way&lt;/a&gt; to combine ideas spanning &lt;a href=&quot;http://en.wikipedia.org/wiki/Extended_Euclidean_algorithm&quot;&gt;2000&lt;/a&gt; &lt;a href=&quot;http://en.wikipedia.org/wiki/Chinese_remainder_theorem&quot;&gt;years&lt;/a&gt; &lt;a href=&quot;http://en.wikipedia.org/wiki/Fermat%27s_little_theorem&quot;&gt;of&lt;/a&gt; &lt;a href=&quot;http://en.wikipedia.org/wiki/Euler_totient_function&quot;&gt;math&lt;/a&gt; development to come up with a &lt;a href=&quot;http://mathworld.wolfram.com/RSAEncryption.html&quot;&gt;beautifully simple algorithm&lt;/a&gt;:&lt;/p&gt;

&lt;p&gt;You &lt;a href=&quot;http://en.wikipedia.org/wiki/Primality_test&quot;&gt;pick&lt;/a&gt; two huge prime numbers “p” and “q.” Multiply them to get “n = p*q.” Next, you pick a small public &lt;a href=&quot;http://en.wikipedia.org/wiki/Exponentiation&quot;&gt;exponent&lt;/a&gt; “e” which is the “encryption exponent” and &lt;a href=&quot;http://en.wikipedia.org/wiki/Modular_multiplicative_inverse&quot;&gt;a specially crafted inverse&lt;/a&gt; of “e” called “d” as the “decryption exponent.” You then &lt;strong&gt;make “n” and “e” public and keep “d” as secret as you possibly can&lt;/strong&gt; and then throw away “p” and “q” (or keep them as secret as “d”). It’s really important to remember that “e” and “d” are inverses of each other.&lt;/p&gt;

&lt;p&gt;Now, if you have some message, you just need to interpret its bytes as a number “M.” If you want to “encrypt” a message to create a “ciphertext”, you’d calculate:&lt;/p&gt;

&lt;p&gt;C ≡ M&lt;sup&gt;e&lt;/sup&gt; (mod n)&lt;/p&gt;

&lt;p&gt;This means that you multiply “M” by itself “e” times. The “mod n” means that we only take the remainder (i.e. the “&lt;a href=&quot;http://en.wikipedia.org/wiki/Modular_arithmetic&quot;&gt;modulus&lt;/a&gt;”) when dividing by “n.” For example, 11 AM + 3 hours ≡ 2 (PM) (mod 12 hours). The recipient knows “d” which allows them to invert the message to recover the original message:&lt;/p&gt;

&lt;p&gt;C&lt;sup&gt;d&lt;/sup&gt; ≡ (M&lt;sup&gt;e&lt;/sup&gt;)&lt;sup&gt;d&lt;/sup&gt; ≡ M&lt;sup&gt;e*d&lt;/sup&gt; ≡ M&lt;sup&gt;1&lt;/sup&gt; ≡ M (mod n)&lt;/p&gt;

&lt;p&gt;Just as interesting is that the person with “d” can “sign” a document by raising a message “M” to the “d” exponent:&lt;/p&gt;

&lt;p&gt;M&lt;sup&gt;d&lt;/sup&gt; ≡ S (mod n)&lt;/p&gt;

&lt;p&gt;This works because the “signer” makes public “S”, “M”, “e”, and “n.” Anyone can verify the signature “S” with a simple calculation:&lt;/p&gt;

&lt;p&gt;S&lt;sup&gt;e&lt;/sup&gt; ≡ (M&lt;sup&gt;d&lt;/sup&gt;)&lt;sup&gt;e&lt;/sup&gt; ≡ M&lt;sup&gt;d*e&lt;/sup&gt; ≡ M&lt;sup&gt;e*d&lt;/sup&gt; ≡ M&lt;sup&gt;1&lt;/sup&gt; ≡ M (mod n)&lt;/p&gt;

&lt;p&gt;Public key cryptography algorithms like RSA are often called “asymmetric” algorithms because the encryption key (in our case, “e”) is not equal to (i.e. “symmetric” with) the decryption key “d”. Reducing everything “mod n” makes it impossible to use the easy techniques that we’re used to such as normal &lt;a href=&quot;http://en.wikipedia.org/wiki/Logarithm&quot;&gt;logarithms&lt;/a&gt;. The magic of RSA works because you can calculate/encrypt C ≡ M&lt;sup&gt;e&lt;/sup&gt; (mod n) &lt;a href=&quot;http://en.wikipedia.org/wiki/Modular_exponentiation&quot;&gt;very quickly&lt;/a&gt;, but it is &lt;em&gt;really hard&lt;/em&gt; to calculate/decrypt C&lt;sup&gt;d&lt;/sup&gt; ≡ M (mod n) without knowing “d.” The best known way to recover “d” from the public values is &lt;a href=&quot;http://en.wikipedia.org/wiki/Integer_factorization&quot;&gt;factoring&lt;/a&gt; “n” back into its “p” and “q”, which is a &lt;a href=&quot;http://en.wikipedia.org/wiki/NP_%28complexity%29&quot;&gt;tough problem&lt;/a&gt;.&lt;/p&gt;
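
&lt;p&gt;The four formulas above can be tried out with a toy key. This is only a sketch with deliberately tiny primes that I picked for illustration (61 and 53); real keys use primes that are hundreds of digits long, and real messages are padded first. The one step the text glosses over is how “d” is crafted: in the textbook construction it is the inverse of “e” modulo (p - 1) * (q - 1), which the sketch calls &lt;code&gt;phi&lt;/code&gt;.&lt;/p&gt;

```python
# Toy RSA with tiny, illustrative primes. Never use numbers this small for real.
p, q = 61, 53
n = p * q                  # public modulus: 3233
phi = (p - 1) * (q - 1)    # 3120; needed only to derive d, then kept secret
e = 17                     # public exponent; must share no factor with phi
d = pow(e, -1, phi)        # private exponent via modular inverse (Python 3.8+)

M = 65                     # the message, already read as a number smaller than n

C = pow(M, e, n)           # encrypt with the public key:  C = M^e mod n
recovered = pow(C, d, n)   # decrypt with the private key: C^d mod n

S = pow(M, d, n)           # sign with the private key:    S = M^d mod n
verified = pow(S, e, n)    # verify with the public key:   S^e mod n

print(n, d, C, recovered, S, verified)
```

&lt;p&gt;Both &lt;code&gt;recovered&lt;/code&gt; and &lt;code&gt;verified&lt;/code&gt; come back equal to “M”, which is exactly the “e” and “d” undo each other property that the rest of this post relies on. The three-argument &lt;code&gt;pow&lt;/code&gt; is also why encryption is fast: it reduces “mod n” at every step instead of building an astronomically large intermediate number.&lt;/p&gt;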

&lt;h2 id=&quot;verifying-signatures&quot;&gt;Verifying Signatures&lt;/h2&gt;

&lt;p&gt;The big thing to keep in mind with RSA in the real world is that all of the numbers involved have to be &lt;em&gt;big&lt;/em&gt; to make things really hard to break using the &lt;a href=&quot;http://en.wikipedia.org/wiki/General_number_field_sieve&quot;&gt;best algorithms that we have&lt;/a&gt;. How big? Amazon.com’s certificate was “signed” by “VeriSign Class 3 Secure Server CA.” From the certificate, we see that this VeriSign modulus “n” is 2048 bits long which has this 617 digit base-10 representation:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;
1890572922 9464742433 9498401781 6528521078 8629616064 
3051642608 4317020197 7241822595 6075980039 8371048211 
4887504542 4200635317 0422636532 2091550579 0341204005 
1169453804 7325464426 0479594122 4167270607 6731441028 
3698615569 9947933786 3789783838 5829991518 1037601365 
0218058341 7944190228 0926880299 3425241541 4300090021 
1055372661 2125414429 9349272172 5333752665 6605550620 
5558450610 3253786958 8361121949 2417723618 5199653627 
5260212221 0847786057 9342235500 9443918198 9038906234 
1550747726 8041766919 1500918876 1961879460 3091993360 
6376719337 6644159792 1249204891 7079005527 7689341573 
9395596650 5484628101 0469658502 1566385762 0175231997 
6268718746 7514321
&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;(Good luck trying to find “p” and “q” from this “n” - if you could, you could generate real-looking VeriSign certificates.)&lt;/p&gt;

&lt;p&gt;VeriSign’s “e” is 2&lt;sup&gt;16&lt;/sup&gt; + 1 = 65537. Of course, they keep their “d” value secret, probably on a safe hardware device protected by retinal scanners and armed guards. Before signing, VeriSign checked the validity of the contents that Amazon.com claimed on its certificate using a real-world “handshake” that involved &lt;a href=&quot;http://www.verisign.com/ssl/ssl-information-center/ssl-basics/index.html#a7&quot;&gt;looking at several of their business documents&lt;/a&gt;. Once VeriSign was satisfied with the documents, they used the &lt;a href=&quot;http://en.wikipedia.org/wiki/SHA_hash_functions#SHA-0_and_SHA-1&quot;&gt;SHA-1&lt;/a&gt; hash algorithm to get a hash value of the certificate that had all the claims. In Wireshark, the full certificate shows up as the “signedCertificate” part:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/first-few-milliseconds-of-https/signedcertificate.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;It’s sort of a misnomer since it actually means that those are the bytes that the signer is &lt;em&gt;going to sign&lt;/em&gt; and not the bytes that already include a signature.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/first-few-milliseconds-of-https/certsignature.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The actual signature, “S”, is simply called “encrypted” in Wireshark. If we raise “S” to VeriSign’s public “e” exponent of 65537 and then take the remainder when divided by the modulus “n”, we get this “decrypted” signature hex value:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;
0001FFFFFFFFFFFF FFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFF 
FFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFF 
FFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFF 
FFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFF 
FFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFF 
FFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFF 
FFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFF 
FFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFF 
FFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFF 
FFFFFFFF00302130 0906052B0E03021A 05000414C19F8786 
871775C60EFE0542 E4C2167C830539DB
&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;http://tools.ietf.org/html/rfc2313#page-9&quot;&gt;Per the PKCS #1 v1.5 standard&lt;/a&gt;, the first byte is “00” and it “ensures that the encryption block, [when] converted to an integer, is less than the modulus.” The second byte of “01” indicates that this is a private key operation (i.e. it’s a signature). This is followed by a lot of “FF” bytes that are used to pad the result to make sure that it’s big enough. The padding is terminated by a “00” byte. It’s followed by “30 21 30 09 06 05 2B 0E 03 02 1A 05 00 04 14” which is the &lt;a href=&quot;http://tools.ietf.org/html/rfc3447#page-43&quot;&gt;PKCS #1 v2.1 way&lt;/a&gt; of specifying the &lt;a href=&quot;http://en.wikipedia.org/wiki/SHA_hash_functions#SHA-0_and_SHA-1&quot;&gt;SHA-1&lt;/a&gt; hash algorithm. The last 20 bytes are the SHA-1 hash digest of the bytes in “signedCertificate.”&lt;/p&gt;
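
&lt;p&gt;As a rough illustration of that layout, the following Python sketch walks the decrypted block and pulls out the trailing digest. The function name &lt;code&gt;extract_sha1_digest&lt;/code&gt; is hypothetical and this is not how NSS does it. A production verifier has to be much stricter than this (for example, by rebuilding the entire expected block and comparing it byte for byte), so treat it purely as a reading aid.&lt;/p&gt;

```python
# The DigestInfo prefix that identifies SHA-1, as quoted in the text above.
SHA1_PREFIX = bytes.fromhex("3021300906052B0E03021A05000414")


def extract_sha1_digest(block):
    """Check the 00 01 FF..FF 00 layout and return the trailing 20 byte digest."""
    if block[:2] != b"\x00\x01":
        raise ValueError("not a signature (type 01) block")
    end_of_padding = block.index(b"\x00", 2)  # first 00 byte after the header
    padding = block[2:end_of_padding]
    if len(padding) == 0 or padding != b"\xff" * len(padding):
        raise ValueError("padding must be a run of FF bytes")
    digest_info = block[end_of_padding + 1:]
    if not digest_info.startswith(SHA1_PREFIX):
        raise ValueError("unexpected hash algorithm prefix")
    digest = digest_info[len(SHA1_PREFIX):]
    if len(digest) != 20:
        raise ValueError("a SHA-1 digest is 20 bytes long")
    return digest


# Rebuild the 256 byte value shown above: 00 01, 218 FF bytes, 00, prefix, digest.
block = bytes.fromhex(
    "0001" + "FF" * 218 + "00"
    + "3021300906052B0E03021A05000414"
    + "C19F8786871775C60EFE0542E4C2167C830539DB"
)
print(len(block), extract_sha1_digest(block).hex())
```

&lt;p&gt;The block is 256 bytes because it has to be exactly as long as the 2048 bit modulus. The digest that comes out is the value that has to match an independently computed SHA-1 hash of the “signedCertificate” bytes.&lt;/p&gt;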

&lt;p&gt;Since the decrypted value &lt;a href=&quot;http://www.matasano.com/log/558/public-key-signature-forgery-collected/&quot;&gt;is properly formatted&lt;/a&gt; and the last bytes are the same hash value that we can calculate independently, we can assume that whoever knew “VeriSign Class 3 Secure Server CA”’s private key “signed” it. We implicitly trust that only VeriSign knows the private key “d.”&lt;/p&gt;

&lt;p&gt;We can repeat the process to verify that “VeriSign Class 3 Secure Server CA”’s certificate was signed by VeriSign’s “Class 3 Public Primary Certification Authority.”&lt;/p&gt;

&lt;p&gt;But why should we trust &lt;em&gt;that&lt;/em&gt;? There are no more levels on the trust chain.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/first-few-milliseconds-of-https/BuiltInCertificateHierarchy.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The top “VeriSign Class 3 Public Primary Certification Authority” was signed by &lt;em&gt;itself&lt;/em&gt;. This certificate has been built into Mozilla products as an implicitly trusted good certificate since version &lt;a href=&quot;http://bonsai.mozilla.org/cvslog.cgi?file=mozilla/security/nss/lib/ckfw/builtins/certdata.txt&amp;amp;rev=NSS_3_12_2_WITH_CKBI_1_73_RTM&amp;amp;mark=1.51&quot;&gt;1.4 of certdata.txt&lt;/a&gt; in the Network Security Services (&lt;a href=&quot;http://www.mozilla.org/projects/security/pki/nss/&quot;&gt;NSS&lt;/a&gt;) library. It was checked-in on September 6, 2000 by Netscape’s Robert Relyea with the following comment:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;“Make the framework compile with the rest of NSS. Include a ‘live’ certdata.txt with those certs we have permission to push to open source (additional certs will be added as we get permission from the owners).”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This decision has had a relatively long impact since the certificate has a validity range of January 28, 1996 - August 1, 2028.&lt;/p&gt;

&lt;p&gt;As Ken Thompson explained so well in his “&lt;a href=&quot;http://www.ece.cmu.edu/~ganger/712.fall02/papers/p761-thompson.pdf&quot;&gt;Reflections on Trusting Trust&lt;/a&gt;”, you ultimately have to implicitly trust somebody. There is no way around this problem. In this case, we’re implicitly trusting that Robert Relyea made a good choice. We also hope that &lt;a href=&quot;http://www.mozilla.org/projects/security/certs/policy/&quot;&gt;Mozilla’s built-in certificate policy&lt;/a&gt; is reasonable for the other built-in certificates.&lt;/p&gt;

&lt;p&gt;One thing to keep in mind here is that all these certificates and signatures were simply used to form a trust chain. On the public Internet, VeriSign’s root certificate is implicitly trusted by Firefox long before you go to any website. In a company, you can create your own root certificate authority (CA) that you can install on everyone’s machine.&lt;/p&gt;

&lt;p&gt;Alternatively, you can get around having to pay companies like VeriSign and avoid certificate trust chains altogether. Certificates are used to establish trust by using a trusted third-party (in this case, VeriSign). If you have a secure means of sharing a secret “key”, such as whispering a long password into someone’s ear, then you can use that pre-shared key (PSK) to establish trust. There are extensions to TLS to allow this, such as &lt;a href=&quot;http://tools.ietf.org/html/rfc4279&quot;&gt;TLS-PSK&lt;/a&gt;, and my personal favorite, &lt;a href=&quot;http://tools.ietf.org/html/rfc5054&quot;&gt;TLS with Secure Remote Password (SRP) extensions&lt;/a&gt;. Unfortunately, these extensions aren’t nearly as widely deployed and supported, so they’re usually not practical. Additionally, these alternatives impose a burden that we have to have some other secure means of communicating the secret that’s more cumbersome than what we’re trying to establish with TLS (otherwise, why wouldn’t we use &lt;em&gt;that&lt;/em&gt; for everything?).&lt;/p&gt;

&lt;p&gt;One final check that we need to do is to verify that the host name on the certificate is what we expected. &lt;a href=&quot;http://www.linkedin.com/in/nelsonbolyard&quot;&gt;Nelson Bolyard&lt;/a&gt;’s comment in the &lt;a href=&quot;http://web.archive.org/web/20090614041758id_/http://www.koders.com/c/fid1C807D78F4E4CA73466FEEAA78EA9F0B2D618199.aspx#L260&quot;&gt;SSL_AuthCertificate function&lt;/a&gt; explains why:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c--&quot; data-lang=&quot;c++&quot;&gt;&lt;span class=&quot;cm&quot;&gt;/* cert is OK. This is the client side of an SSL connection.&lt;/span&gt;
&lt;span class=&quot;cm&quot;&gt; * Now check the name field in the cert against the desired hostname.&lt;/span&gt;
&lt;span class=&quot;cm&quot;&gt; * NB: This is our only defense against Man-In-The-Middle (MITM) attacks! &lt;/span&gt;
&lt;span class=&quot;cm&quot;&gt; */&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This check helps protect against a &lt;a href=&quot;http://en.wikipedia.org/wiki/Man-in-the-middle_attack&quot;&gt;man-in-the-middle&lt;/a&gt; attack because we are implicitly trusting that the people on the certificate trust chain wouldn’t do something bad, like sign a certificate claiming to be from Amazon.com unless it actually was Amazon.com. If an attacker is able to modify your DNS server by using a technique like &lt;a href=&quot;http://en.wikipedia.org/wiki/DNS_cache_poisoning&quot;&gt;DNS cache poisoning&lt;/a&gt;, you might be fooled into thinking you’re at a trusted site (like Amazon.com) because the address bar will look normal. This last check implicitly trusts certificate authorities to stop these bad things from happening.&lt;/p&gt;

&lt;h2 id=&quot;pre-master-secret&quot;&gt;Pre-Master Secret&lt;/h2&gt;

&lt;p&gt;We’ve verified some claims about Amazon.com and know its public encryption exponent “e” and modulus “n.” Anyone listening in on the traffic can know this as well (as evidenced because we are using Wireshark captures). Now we need to create a random secret key that an eavesdropper/attacker can’t figure out. This isn’t as easy as it sounds. In 1996, researchers figured out that &lt;a href=&quot;http://en.wikipedia.org/wiki/Netscape_Navigator&quot;&gt;Netscape Navigator&lt;/a&gt; 1.1 was &lt;a href=&quot;http://www.cs.berkeley.edu/~daw/papers/ddj-netscape.html&quot;&gt;using only three sources&lt;/a&gt; to seed their pseudo-random number generator (&lt;a href=&quot;http://en.wikipedia.org/wiki/Pseudorandom_number_generator&quot;&gt;PRNG&lt;/a&gt;). The sources were: the time of day, the process id, and the parent process id. As the researchers showed, these “random” sources aren’t that random and were relatively easy to figure out.&lt;/p&gt;

&lt;p&gt;Since everything else was derived from these three “random” sources, it was possible to “break” the SSL “security” in 25 seconds on a 1996 era machine. If you still don’t believe that finding randomness is hard, just &lt;a href=&quot;http://www.schneier.com/blog/archives/2008/05/random_number_b.html&quot;&gt;ask the Debian OpenSSL maintainers&lt;/a&gt;. If you mess it up, all the security built on top of it is suspect.&lt;/p&gt;

&lt;p&gt;On Windows, random numbers used for cryptographic purposes are generated by calling the &lt;a href=&quot;http://msdn.microsoft.com/en-us/library/aa379942%28VS.85%29.aspx&quot;&gt;CryptGenRandom function&lt;/a&gt; that hashes bits &lt;a href=&quot;http://blogs.msdn.com/michael_howard/archive/2005/01/14/353379.aspx#353493&quot;&gt;sampled from over 125 sources&lt;/a&gt;. Firefox uses this function along with some bits derived from &lt;a href=&quot;http://web.archive.org/web/20090614041823id_/http://www.koders.com/c/fidBC778BD3666AA64522D1FD4F4EC3331E44B4D204.aspx?s=RNG_GetNoise&quot;&gt;its own function&lt;/a&gt; to seed its &lt;a href=&quot;http://web.archive.org/web/20090614041833id_/http://www.koders.com/c/fidD184CA9064625C0ADF48025F3FA0588FCD664057.aspx&quot;&gt;pseudo-random number generator&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The 48 byte “pre-master secret” random value that’s generated isn’t used directly, but it’s very important to keep it secret since a lot of things are derived from it. Not surprisingly, Firefox makes it hard to find out this value. I had to compile a debug version and set the &lt;a href=&quot;http://web.archive.org/web/20090614041829id_/http://www.koders.com/c/fidCFCD763A9E0B2BEF3FB9D4D6C17B4094CBF21548.aspx#L2092&quot;&gt;SSLDEBUGFILE&lt;/a&gt; and &lt;a href=&quot;http://web.archive.org/web/20090614041829id_/http://www.koders.com/c/fidCFCD763A9E0B2BEF3FB9D4D6C17B4094CBF21548.aspx#L2101&quot;&gt;SSLTRACE&lt;/a&gt; environment variables to see it.&lt;/p&gt;

&lt;p&gt;In this particular session, the pre-master secret showed up in the SSLDEBUGFILE as:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;
4456: SSL[131491792]: Pre-Master Secret [Len: 48] 
03 01 bb 7b 08 98 a7 49 de e8 e9 b8 91 52 ec 81 ...{...I.....R.. 
4c c2 39 7b f6 ba 1c 0a b1 95 50 29 be 02 ad e6 L.9{......P).... 
ad 6e 11 3f 20 c4 66 f0 64 22 57 7e e1 06 7a 3b .n.? .f.d&quot;W~..z;
&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Note that it’s not completely random. The first two bytes are, &lt;a href=&quot;http://tools.ietf.org/html/rfc2246#page-44&quot;&gt;by convention&lt;/a&gt;, the TLS version (03 01).&lt;/p&gt;

&lt;h2 id=&quot;trading-secrets&quot;&gt;Trading Secrets&lt;/h2&gt;

&lt;p&gt;We now need to get this secret value over to Amazon.com. By Amazon’s wishes of “TLS_RSA_WITH_RC4_128_MD5”, we will use RSA to do this. You &lt;em&gt;could&lt;/em&gt; make your input message equal to just the 48 byte pre-master secret, but the Public Key Cryptography Standard (PKCS) #1, version 1.5 RFC &lt;a href=&quot;http://tools.ietf.org/html/rfc2313#page-8&quot;&gt;tells us&lt;/a&gt; that we should pad these bytes with &lt;em&gt;random&lt;/em&gt; data to make the input equal to exactly the size of the modulus (1024 bits/128 bytes). This makes it harder for an attacker to determine our pre-master secret. It also gives us one last chance to protect ourselves in case we did something really bone-headed, like reusing the same secret. If we reused the key, the eavesdropper would likely see a different value placed on the network due to the random padding.&lt;/p&gt;

&lt;p&gt;Again, Firefox makes it hard to see these random values. I had to insert debugging statements into &lt;a href=&quot;http://web.archive.org/web/20090614041803id_/http://www.koders.com/c/fid1EB31A222A560045DBF9EC54457A1E0339825D58.aspx#L190&quot;&gt;the padding function&lt;/a&gt; to see what was going on:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c&quot; data-lang=&quot;c&quot;&gt;&lt;span class=&quot;n&quot;&gt;wrapperHandle&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;fopen&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;plaintextpadding.txt&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;a&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;fprintf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;wrapperHandle&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;PLAINTEXT = &amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;for&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;modulusLen&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;++&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;fprintf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;wrapperHandle&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;%02X &amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;block&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]);&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;fprintf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;wrapperHandle&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\r\n&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;fclose&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;wrapperHandle&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;In this session, the full padded value was:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;
00 02 12 A3 EA B1 65 D6 81 6C 13 14 13 62 10 53 23 B3 96 85 FF 24
FA CC 46 11 21 24 A4 81 EA 30 63 95 D4 DC BF 9C CC D0 2E DD 5A A6 
41 6A 4E 82 65 7D 70 7D 50 09 17 CD 10 55 97 B9 C1 A1 84 F2 A9 AB 
EA 7D F4 CC 54 E4 64 6E 3A E5 91 A0 06 00 03 01 BB 7B 08 98 A7 49 
DE E8 E9 B8 91 52 EC 81 4C C2 39 7B F6 BA 1C 0A B1 95 50 29 BE 02 
AD E6 AD 6E 11 3F 20 C4 66 F0 64 22 57 7E E1 06 7A 3B
&lt;/code&gt;&lt;/p&gt;
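
&lt;p&gt;The layout of this block (00 02, nonzero random padding, a 00 separator, then the 48 byte secret that starts with the 03 01 version bytes) can be sketched in Python. This is a minimal illustration and not Firefox’s code; the function name and the placeholder secret are assumptions made for the example.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;import os

def pkcs1_v15_pad(message, modulus_len):
    # Block layout: 00 || 02 || nonzero random padding || 00 || message
    if len(message) &gt; modulus_len - 11:
        raise ValueError('message too long for this modulus')
    pad_len = modulus_len - 3 - len(message)
    padding = bytearray()
    while len(padding) != pad_len:
        byte = os.urandom(1)
        if byte != b'\x00':  # padding bytes must be nonzero
            padding += byte
    return b'\x00\x02' + bytes(padding) + b'\x00' + message

# Hypothetical 48 byte pre-master secret: version bytes plus 46 random bytes.
secret = b'\x03\x01' + os.urandom(46)
block = pkcs1_v15_pad(secret, 128)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;With a 128 byte modulus and a 48 byte secret, this leaves 77 bytes of random padding, which matches the position of the “00 03 01” bytes in the dump above.&lt;/p&gt;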

&lt;p&gt;Firefox took this value and &lt;a href=&quot;http://web.archive.org/web/20090614041754id_/http://www.koders.com/c/fid1B0E0F62F1B3DB6D7272F0BD781A1609D76FE6FE.aspx#L312&quot;&gt;calculated&lt;/a&gt; “C ≡ M&lt;sup&gt;e&lt;/sup&gt; (mod n)” to get the value we see in the “&lt;a href=&quot;http://tools.ietf.org/html/rfc2246#page-43&quot;&gt;Client Key Exchange&lt;/a&gt;” record:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/first-few-milliseconds-of-https/clientkeyexchange.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
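
&lt;p&gt;As a toy illustration of that formula, Python’s built-in three-argument &lt;code&gt;pow&lt;/code&gt; does modular exponentiation. The tiny textbook key below (p = 61, q = 53) is an assumption for the example and has nothing to do with Amazon’s real 1024 bit key.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;# Toy RSA key, far too small for real use.
p, q = 61, 53
n = p * q   # modulus, 3233
e = 17      # public exponent
d = 2753    # private exponent: (e * d) % ((p - 1) * (q - 1)) == 1

m = 65                    # stands in for the padded block as an integer
c = pow(m, e, n)          # encrypt: C = M^e mod n
recovered = pow(c, d, n)  # decrypt: M = C^d mod n
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;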

&lt;p&gt;Finally, Firefox sent out one last unencrypted message, a “&lt;a href=&quot;http://tools.ietf.org/html/rfc2246#page-24&quot;&gt;Change Cipher Spec&lt;/a&gt;” record:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/first-few-milliseconds-of-https/clientchangecipherspec.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;This is Firefox’s way of telling Amazon that it’s going to start using the agreed upon secret to encrypt its next message.&lt;/p&gt;

&lt;h2 id=&quot;deriving-the-master-secret&quot;&gt;Deriving the Master Secret&lt;/h2&gt;

&lt;p&gt;If we’ve done everything correctly, both sides (and only those sides) now know the 48 byte (384 bit) pre-master secret. There’s a slight trust issue here from Amazon’s perspective: the pre-master secret contains only bits that were generated by the client; it doesn’t take into account anything from the server or anything we said earlier. We’ll fix that by computing the “master secret.” &lt;a href=&quot;http://tools.ietf.org/html/rfc2246#page-47&quot;&gt;Per the spec&lt;/a&gt;, this is done by calculating:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c&quot; data-lang=&quot;c&quot;&gt;&lt;span class=&quot;n&quot;&gt;master_secret&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;PRF&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pre_master_secret&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; 
                    &lt;span class=&quot;s&quot;&gt;&amp;quot;master secret&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; 
                    &lt;span class=&quot;n&quot;&gt;ClientHello&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;random&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ServerHello&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;random&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The “pre_master_secret” is the secret value we sent earlier. The “master secret” is simply a string whose &lt;a href=&quot;http://en.wikipedia.org/wiki/ASCII&quot;&gt;ASCII&lt;/a&gt; bytes (e.g. “6d 61 73 74 65 72 …”) are used. We then concatenate the random values that were sent in the ClientHello and ServerHello (from Amazon) messages that we saw at the beginning.&lt;/p&gt;

&lt;p&gt;The PRF is the “Pseudo-Random Function” that’s also &lt;a href=&quot;http://tools.ietf.org/html/rfc2246#page-11&quot;&gt;defined in the spec&lt;/a&gt; and is quite clever. It combines the secret, the ASCII label, and the seed data we give it by using the keyed-Hash Message Authentication Code (&lt;a href=&quot;http://en.wikipedia.org/wiki/HMAC&quot;&gt;HMAC&lt;/a&gt;) versions of both &lt;a href=&quot;http://en.wikipedia.org/wiki/MD5&quot;&gt;MD5&lt;/a&gt; and &lt;a href=&quot;http://en.wikipedia.org/wiki/SHA_hash_functions#SHA-0_and_SHA-1&quot;&gt;SHA-1&lt;/a&gt; hash functions. Half of the secret is sent to each hash function, and the two outputs are xor’d together. It’s clever because it is quite resistant to attack, even in the face of &lt;a href=&quot;http://www.win.tue.nl/hashclash/rogue-ca/&quot;&gt;weaknesses in MD5&lt;/a&gt; &lt;a href=&quot;http://www.schneier.com/blog/archives/2005/02/sha1_broken.html&quot;&gt;and SHA-1&lt;/a&gt;. This process can feed back on itself and iterate for as long as needed to generate as many bytes as we want.&lt;/p&gt;
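
&lt;p&gt;A minimal Python sketch of the TLS 1.0 PRF, following the definition in RFC 2246, might look like the following. The function names and the all-zero placeholder inputs are assumptions for illustration; they are not the values from this session.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;import hashlib
import hmac

def p_hash(hash_name, secret, seed, length):
    # A(0) = seed, A(i) = HMAC(secret, A(i-1)); the output is
    # HMAC(secret, A(1) + seed) + HMAC(secret, A(2) + seed) + ...
    digest_size = hashlib.new(hash_name).digest_size
    rounds = -(-length // digest_size)  # ceiling division
    output = b''
    a = seed
    for _ in range(rounds):
        a = hmac.new(secret, a, hash_name).digest()
        output += hmac.new(secret, a + seed, hash_name).digest()
    return output[:length]

def tls10_prf(secret, label, seed, length):
    # First half of the secret keys MD5, second half keys SHA-1
    # (the halves share one byte when the secret length is odd).
    half = (len(secret) + 1) // 2
    s1, s2 = secret[:half], secret[len(secret) - half:]
    md5_stream = p_hash('md5', s1, label + seed, length)
    sha1_stream = p_hash('sha1', s2, label + seed, length)
    return bytes(x ^ y for x, y in zip(md5_stream, sha1_stream))

# Placeholder inputs, not the real session values.
pre_master_secret = b'\x03\x01' + bytes(46)
client_random = bytes(32)
server_random = bytes(32)
master_secret = tls10_prf(pre_master_secret, b'master secret',
                          client_random + server_random, 48)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;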

&lt;p&gt;Following this procedure, we obtain a 48 byte “master secret” of&lt;/p&gt;

&lt;p&gt;&lt;code&gt;
4C AF 20 30 8F 4C AA C5 66 4A 02 90 F2 AC 10 00 39 DB 1D E0 1F CB 
E0 E0 9D D7 E6 BE 62 A4 6C 18 06 AD 79 21 DB 82 1D 53 84 DB 35 A7 
1F C1 01 19
&lt;/code&gt;&lt;/p&gt;

&lt;h2 id=&quot;generating-lots-of-keys&quot;&gt;Generating Lots of Keys&lt;/h2&gt;

&lt;p&gt;Now that both sides have the “master secret”, the spec &lt;a href=&quot;http://tools.ietf.org/html/rfc2246#page-21&quot;&gt;shows us&lt;/a&gt; how we can derive all the session keys we need by using the PRF to create a “key block” that we will pull data from:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;
key_block = PRF(SecurityParameters.master_secret, 
                &quot;key expansion&quot;, 
                SecurityParameters.server_random + 
                SecurityParameters.client_random);
&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;The bytes from “key_block” are used to populate the following:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;
client_write_MAC_secret[SecurityParameters.hash_size]
server_write_MAC_secret[SecurityParameters.hash_size]
client_write_key[SecurityParameters.key_material_length]
server_write_key[SecurityParameters.key_material_length]
client_write_IV[SecurityParameters.IV_size]
server_write_IV[SecurityParameters.IV_size]
&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Since we’re using a &lt;a href=&quot;http://en.wikipedia.org/wiki/Stream_cipher&quot;&gt;stream cipher&lt;/a&gt; instead of a &lt;a href=&quot;http://en.wikipedia.org/wiki/Block_cipher&quot;&gt;block cipher&lt;/a&gt; like the Advanced Encryption Standard (&lt;a href=&quot;http://en.wikipedia.org/wiki/Advanced_Encryption_Standard&quot;&gt;AES&lt;/a&gt;), we don’t need the Initialization Vectors (&lt;a href=&quot;http://en.wikipedia.org/wiki/Initialization_vector&quot;&gt;IV&lt;/a&gt;s). Therefore, we just need one Message Authentication Code (&lt;a href=&quot;http://en.wikipedia.org/wiki/Message_authentication_code&quot;&gt;MAC&lt;/a&gt;) key for each side, 16 bytes (128 bits) each, since the specified MD5 hash digest size is 16 bytes. In addition, the RC4 cipher uses a 16 byte (128 bit) key, and each side needs its own. All told, we need 2×16 + 2×16 = 64 bytes from the key block.&lt;/p&gt;

&lt;p&gt;Running the PRF, we get these values:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;
client_write_MAC_secret = 80 B8 F6 09 51 74 EA DB 29 28 EF 6F 9A B8 81 B0 
server_write_MAC_secret = 67 7C 96 7B 70 C5 BC 62 9D 1D 1F 4A A6 79 81 61 
client_write_key = 32 13 2C DD 1B 39 36 40 84 4A DE E5 6C 52 46 72 
server_write_key = 58 36 C4 0D 8C 7C 74 DA 6D B7 34 0A 91 B6 8F A7
&lt;/code&gt;&lt;/p&gt;
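
&lt;p&gt;Carving up the key block is just slicing in a fixed order. Here is a small Python sketch; the helper name is made up for the example, and the 64 byte key block is simply the four values above concatenated in the spec’s order.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;def split_key_block(key_block, mac_len=16, key_len=16):
    # Order comes from the spec; the IVs are skipped because RC4 needs none.
    names = ['client_write_MAC_secret', 'server_write_MAC_secret',
             'client_write_key', 'server_write_key']
    sizes = [mac_len, mac_len, key_len, key_len]
    keys, offset = {}, 0
    for name, size in zip(names, sizes):
        keys[name] = key_block[offset:offset + size]
        offset += size
    return keys

key_block = bytes.fromhex(
    '80B8F6095174EADB2928EF6F9AB881B0'
    '677C967B70C5BC629D1D1F4AA6798161'
    '32132CDD1B393640844ADEE56C524672'
    '5836C40D8C7C74DA6DB7340A91B68FA7')
keys = split_key_block(key_block)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;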

&lt;h2 id=&quot;prepare-to-be-encrypted&quot;&gt;Prepare to be Encrypted!&lt;/h2&gt;

&lt;p&gt;The last handshake message the client sends out is the “&lt;a href=&quot;http://tools.ietf.org/html/rfc2246#page-46&quot;&gt;Finished message&lt;/a&gt;.” This is a clever message that proves that no one tampered with the handshake and it proves that we know the key. The client takes all bytes from all handshake messages and puts them into a “handshake_messages” buffer. We then calculate 12 bytes of “verify_data” using the pseudo-random function (PRF) with our master secret, the label “client finished”, and an MD5 and SHA-1 hash of “handshake_messages”:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;
verify_data = PRF(master_secret, 
                  &quot;client finished&quot;, 
                  MD5(handshake_messages) + 
                  SHA-1(handshake_messages)
                 ) [12]
&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;We take the result and add a handshake header byte “0x14” to indicate “finished” and length bytes “00 00 0c” to indicate that we’re sending 12 bytes of verify data. Then, like all future encrypted messages, we need to make sure the decrypted contents haven’t been tampered with. Since our cipher suite in use is TLS_RSA_WITH_RC4_128_MD5, this means we use the MD5 hash function.&lt;/p&gt;

&lt;p&gt;Some people get paranoid when they hear MD5 because it has some weaknesses. I certainly don’t advocate using it as-is. However, TLS is smart in that it doesn’t use MD5 directly, but rather the &lt;a href=&quot;http://en.wikipedia.org/wiki/HMAC&quot;&gt;HMAC&lt;/a&gt; version of it. This means that instead of using MD5(m) directly, we calculate:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;
HMAC_MD5(Key, m) = MD5((Key ⊕ opad) ++ MD5((Key ⊕ ipad) ++ m))
&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;(The ⊕ means &lt;a href=&quot;http://en.wikipedia.org/wiki/Exclusive_or&quot;&gt;XOR&lt;/a&gt;, ++ means concatenate, “opad” is the bytes “5c 5c … 5c”, and “ipad” is the bytes “36 36 … 36”).&lt;/p&gt;
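
&lt;p&gt;To make the formula concrete, here is a hand-rolled HMAC-MD5 in Python that can be compared against the standard library’s &lt;code&gt;hmac&lt;/code&gt; module. The key handling (hashing long keys, zero-padding short ones to the 64 byte MD5 block size) comes from the HMAC definition in RFC 2104 and isn’t shown in the formula above. Treat it as a sketch for understanding; real code should just call the library.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;import hashlib
import hmac

def hmac_md5(key, message):
    block_size = 64  # MD5 works on 64 byte blocks
    if len(key) &gt; block_size:
        key = hashlib.md5(key).digest()   # long keys are hashed first
    key = key.ljust(block_size, b'\x00')  # short keys are zero-padded
    inner_key = bytes(b ^ 0x36 for b in key)  # Key xor ipad
    outer_key = bytes(b ^ 0x5C for b in key)  # Key xor opad
    inner = hashlib.md5(inner_key + message).digest()
    return hashlib.md5(outer_key + inner).digest()

# Sanity check against the standard library implementation.
assert hmac_md5(b'key', b'message') == hmac.new(b'key', b'message', 'md5').digest()
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;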

&lt;p&gt;In particular, we calculate:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;
HMAC_MD5(client_write_MAC_secret, 
         seq_num + 
         TLSCompressed.type + 
         TLSCompressed.version + 
         TLSCompressed.length + 
         TLSCompressed.fragment);
&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;As you can see, we include a sequence number (“seq_num”) along with attributes of the plaintext message (here it’s called “TLSCompressed”). The sequence number foils attackers who might try to take a previously encrypted message and insert it midstream. If this occurred, the sequence numbers would definitely be different than what we expected. This also protects us from an attacker dropping a message.&lt;/p&gt;
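
&lt;p&gt;A rough Python sketch of that calculation is below. The field widths (an 8 byte sequence number, a 1 byte type, a 2 byte version, and a 2 byte length) follow the TLS 1.0 spec; the function name and the payload are placeholders for the example.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;import hashlib
import hmac
import struct

def record_mac(mac_secret, seq_num, content_type, version, fragment):
    # seq_num (8 bytes) + type (1) + version (2) + length (2), big-endian
    header = struct.pack('!QBBBH', seq_num, content_type,
                         version[0], version[1], len(fragment))
    return hmac.new(mac_secret, header + fragment, hashlib.md5).digest()

mac_secret = bytes.fromhex('80B8F6095174EADB2928EF6F9AB881B0')
fragment = b'hypothetical record payload'
mac_0 = record_mac(mac_secret, 0, 22, (3, 1), fragment)
mac_1 = record_mac(mac_secret, 1, 22, (3, 1), fragment)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Even though the payload is identical, &lt;code&gt;mac_0&lt;/code&gt; and &lt;code&gt;mac_1&lt;/code&gt; differ because the sequence number is part of the MAC input, so a replayed record would fail the check.&lt;/p&gt;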

&lt;p&gt;All that’s left is to encrypt these bytes.&lt;/p&gt;

&lt;h2 id=&quot;rc4-encryption&quot;&gt;RC4 Encryption&lt;/h2&gt;

&lt;p&gt;Our negotiated cipher suite was TLS_RSA_WITH_RC4_128_MD5. This tells us that we need to use &lt;a href=&quot;http://people.csail.mit.edu/rivest/faq.html&quot;&gt;Ron’s Code&lt;/a&gt; #4 (&lt;a href=&quot;http://en.wikipedia.org/wiki/RC4&quot; title=&quot;RC4&quot;&gt;RC4&lt;/a&gt;) to encrypt the traffic. &lt;a href=&quot;http://en.wikipedia.org/wiki/Ron_Rivest&quot;&gt;Ron Rivest&lt;/a&gt; developed the RC4 algorithm to generate random bytes based on a key of up to 256 bytes. The algorithm is so simple you can actually memorize it in a few minutes.&lt;/p&gt;

&lt;p&gt;RC4 begins by creating a 256-byte “S” byte array and populating it with 0 to 255. You then iterate over the array by mixing in bytes from the key. You do this to create a state machine that is used to generate “random” bytes. To generate a random byte, we shuffle around the “S” array.&lt;/p&gt;

&lt;p&gt;Put graphically, it looks like this:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/first-few-milliseconds-of-https/RC4.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;To encrypt a byte, we &lt;a href=&quot;http://en.wikipedia.org/wiki/Exclusive_or&quot;&gt;xor&lt;/a&gt; this pseudo-random byte with the byte we want to encrypt. Remember that xor’ing a bit with 1 causes it to flip. Since we’re generating random numbers, on average the xor will flip half of the bits. This random bit flipping is effectively how we encrypt data. As you can see, it’s not very complicated and thus it runs quickly. I think that’s why Amazon chose it.&lt;/p&gt;
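
&lt;p&gt;The whole algorithm fits in a few lines of Python. This is a sketch for understanding only (the function names are my own, and RC4 shouldn’t be used in new designs):&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;def rc4_keystream(key):
    # Key scheduling: start with 0..255 and mix in the key bytes.
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    # Generation: keep shuffling S and emit one byte per step.
    i = j = 0
    while True:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        yield s[(s[i] + s[j]) % 256]

def rc4(key, data):
    # Encryption and decryption are the same xor with the keystream.
    return bytes(d ^ k for d, k in zip(data, rc4_keystream(key)))

ciphertext = rc4(b'Key', b'Plaintext')
assert rc4(b'Key', ciphertext) == b'Plaintext'
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;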

&lt;p&gt;Recall that we have a “client_write_key” and a “server_write_key.” This means we need to create two RC4 instances: one to encrypt what our browser sends and the other to decrypt what the server sends us.&lt;/p&gt;

&lt;p&gt;The first few random bytes out of the “client_write” RC4 instance are “7E 20 7A 4D FE FB 78 A7 33 …” If we xor these bytes with the unencrypted header and verify message bytes of “14 00 00 0C 98 F0 AE CB C4 …”, we’ll get what appears in the encrypted portion that we can see in Wireshark:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/first-few-milliseconds-of-https/clientencryptedkeyexchange.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
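
&lt;p&gt;The xor for the nine bytes quoted above is easy to check by hand or with a couple of lines of Python (only the bytes given in the text are used here; the rest of the record is elided in the article):&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;keystream = bytes.fromhex('7E207A4DFEFB78A733')
plaintext = bytes.fromhex('1400000C98F0AECBC4')
ciphertext = bytes(p ^ k for p, k in zip(plaintext, keystream))
print(ciphertext.hex(' ').upper())  # 6A 20 7A 41 66 0B D6 6C F7
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;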

&lt;p&gt;The server does almost the same thing. It sends out a “Change Cipher Spec” and then a “Finished Message” that covers all handshake messages, including the &lt;em&gt;decrypted&lt;/em&gt; version of the client’s “Finished Message.” Consequently, this proves to the client that the server was able to successfully decrypt our message.&lt;/p&gt;

&lt;h2 id=&quot;welcome-to-the-application-layer&quot;&gt;Welcome to the Application Layer!&lt;/h2&gt;

&lt;p&gt;Now, 220 milliseconds after we started, we’re finally ready for the application layer. We can now send normal HTTP traffic that’ll be encrypted by the TLS layer with the client RC4 write instance, and decrypt incoming traffic with the server RC4 write instance. In addition, the TLS layer will check each record for tampering by computing the HMAC_MD5 hash of the contents.&lt;/p&gt;

&lt;p&gt;At this point, the handshake is over. Our TLS record’s content type is now 23 (0x17). Encrypted traffic begins with “17 03 01” which indicate the record type and TLS version. These bytes are followed by our encrypted size, which includes the HMAC hash.&lt;/p&gt;
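
&lt;p&gt;As a quick illustration, the five byte record header can be packed like this in Python. The helper name and the length value of 325 are made up for the example.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;import struct

def record_header(content_type, version, length):
    # type (1 byte) + version major/minor (2 bytes) + length (2 bytes, big-endian)
    return struct.pack('!BBBH', content_type, version[0], version[1], length)

# 23 = application data, (3, 1) = TLS 1.0, 325 = hypothetical encrypted size
header = record_header(23, (3, 1), 325)
print(header.hex(' ').upper())  # 17 03 01 01 45
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;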

&lt;p&gt;Encrypting the plaintext of:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;GET /gp/cart/view.html/ref=pd_luc_mri HTTP/1.1 
Host: www.amazon.com 
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.0.10) Gecko/2009060911 Minefield/3.0.10 (.NET CLR 3.5.30729) 
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8 
Accept-Language: en-us,en;q=0.5 
Accept-Encoding: gzip,deflate 
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7 
Keep-Alive: 300 
Connection: keep-alive 
...
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;will give us the bytes we see on the wire:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/first-few-milliseconds-of-https/firstclientappdataencrypted.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The only other interesting fact is that the sequence number increases on each record: it’s now 1 (and the next record will be 2, etc).&lt;/p&gt;

&lt;p&gt;The server does the same type of thing on its side using the server_write_key. We see its response, including the tell-tale application data header:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/first-few-milliseconds-of-https/firstserverappdata.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Decrypting this gives us:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;HTTP/1.1 200 OK 
Date: Wed, 10 Jun 2009 01:09:30 GMT 
Server: Server 
... 
Cneonction: close 
Transfer-Encoding: chunked
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;which is a normal HTTP reply that includes a non-descriptive “Server: Server” header and a misspelled “&lt;a href=&quot;http://www.nextthing.org/archives/2005/08/07/fun-with-http-headers&quot;&gt;Cneonction: close&lt;/a&gt;” header coming from Amazon’s load balancers.&lt;/p&gt;

&lt;p&gt;TLS is just below the application layer. The HTTP server software can act as if it’s sending unencrypted traffic. The only change is that it writes to a library that does all the encryption. &lt;a href=&quot;http://www.openssl.org/&quot;&gt;OpenSSL&lt;/a&gt; is a popular open-source library for TLS.&lt;/p&gt;

&lt;p&gt;The connection will stay open while both sides send and receive encrypted data until either side sends out a “&lt;a href=&quot;http://tools.ietf.org/html/rfc2246#page-25&quot;&gt;closure alert&lt;/a&gt;” message and then closes the connection. If we reconnect shortly after disconnecting, we can re-use the negotiated keys (if the server still has them cached) without using public key operations; otherwise, we do a completely new full handshake.&lt;/p&gt;

&lt;p&gt;It’s important to realize that application data records can be &lt;em&gt;anything&lt;/em&gt;. The only reason “HTTPS” is special is because the web is so popular. There are lots of other TCP/IP based protocols that ride on top of TLS. For example, TLS is used by &lt;a href=&quot;http://tools.ietf.org/html/rfc4217&quot;&gt;FTPS&lt;/a&gt; and &lt;a href=&quot;http://tools.ietf.org/html/rfc3207&quot;&gt;secure extensions to SMTP&lt;/a&gt;. It’s certainly better to use TLS than inventing your own solution. Additionally, you’ll benefit from a protocol that has withstood careful &lt;a href=&quot;http://tools.ietf.org/html/rfc5246#appendix-F&quot;&gt;security analysis&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;and-were-done&quot;&gt;… And We’re Done!&lt;/h2&gt;

&lt;p&gt;The very readable &lt;a href=&quot;http://tools.ietf.org/html/rfc5246&quot;&gt;TLS RFC&lt;/a&gt; covers many more details that were missed here. We covered just one single path in our observation of the 220 millisecond dance between Firefox and Amazon’s server. Quite a bit of the process was affected by the TLS_RSA_WITH_RC4_128_MD5 Cipher Suite selection that Amazon made with its ServerHello message. It’s a reasonable choice that slightly favors speed over security.&lt;/p&gt;

&lt;p&gt;As we saw, if someone could secretly factor Amazon’s “n” modulus into its respective “p” and “q”, they could effectively decrypt all “secure” traffic until Amazon changes their certificate. Amazon counter-balances this concern with a short one year duration certificate:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/first-few-milliseconds-of-https/amazoncertvalidity.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;One of the cipher suites that was offered was “TLS_DHE_RSA_WITH_AES_256_CBC_SHA” which uses the &lt;a href=&quot;http://en.wikipedia.org/wiki/Diffie-Hellman_key_exchange&quot;&gt;Diffie-Hellman key exchange&lt;/a&gt; that has a nice property of “&lt;a href=&quot;http://en.wikipedia.org/wiki/Perfect_forward_secrecy&quot;&gt;forward secrecy&lt;/a&gt;.” This means that if someone cracked the key exchange for one session, they’d be no better off when trying to decrypt any other session. One downside to this algorithm is that it requires more math with big numbers, and thus is a little more computationally taxing on a busy server. The “Advanced Encryption Standard” (&lt;a href=&quot;http://en.wikipedia.org/wiki/Advanced_Encryption_Standard&quot;&gt;AES&lt;/a&gt;) algorithm was present in many of the suites that we offered. It’s different from RC4 in that it works on 16 byte “blocks” at a time rather than a single byte. Since its key can be up to 256 bits, many consider this to be more secure than RC4.&lt;/p&gt;

&lt;p&gt;In just 220 milliseconds, two endpoints on the Internet came together, provided enough credentials to trust each other, set up encryption algorithms, and started to send encrypted traffic.&lt;/p&gt;

&lt;p&gt;And to think, all of this just so Bob can buy milk.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;UPDATE:&lt;/strong&gt; I wrote a program that walks through the handshake steps mentioned in this article. &lt;a href=&quot;http://github.com/moserware/TLS-1.0-Analyzer/tree/master&quot;&gt;I posted it to GitHub&lt;/a&gt;.&lt;/p&gt;
</description>
        <pubDate>Wed, 10 Jun 2009 08:57:00 +0000</pubDate>
        <link>http://www.moserware.com/2009/06/first-few-milliseconds-of-https.html</link>
        <guid isPermaLink="true">http://www.moserware.com/2009/06/first-few-milliseconds-of-https.html</guid>
        
        
      </item>
    
      <item>
        <title>Using Obscure Windows COM APIs in .NET</title>
        <description>&lt;p&gt;Most native Windows APIs are simple to call from .NET. For example, if you need to do something special when showing a window, you can use the &lt;a href=&quot;http://msdn.microsoft.com/en-us/library/ms633548%28VS.85%29.aspx&quot;&gt;ShowWindow&lt;/a&gt; API using &lt;a href=&quot;http://en.wikipedia.org/wiki/Platform_Invocation_Services&quot;&gt;Platform Invocation Services&lt;/a&gt; (P/Invoke) like this:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c#&quot; data-lang=&quot;c#&quot;&gt;&lt;span class=&quot;na&quot;&gt;[DllImport(&amp;quot;user32.dll&amp;quot;)]&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;extern&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;bool&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;ShowWindow&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;IntPtr&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;hWnd&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nCmdShow&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;When you call this function, here’s roughly what happens:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;The CLR calls &lt;a href=&quot;http://msdn.microsoft.com/en-us/library/ms684175%28VS.85%29.aspx&quot;&gt;LoadLibrary&lt;/a&gt; on the file (e.g. “user32.dll”)&lt;/li&gt;
  &lt;li&gt;The CLR then calls &lt;a href=&quot;http://msdn.microsoft.com/en-us/library/ms683212%28VS.85%29.aspx&quot;&gt;GetProcAddress&lt;/a&gt; on the function name (e.g. “ShowWindow”) to get the address of where the function is located.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For the most part, it just magically works. If we had used a function like “MessageBox”, the CLR would notice that it doesn’t exist and would then pick between the ANSI version (e.g. “MessageBoxA”) or the Unicode version (e.g. “MessageBoxW”).&lt;/p&gt;

&lt;p&gt;With the address in hand, it’s easy to &lt;a href=&quot;http://en.wikipedia.org/wiki/Branch_%28computer_science%29&quot;&gt;jump&lt;/a&gt; to it and you’re all set. Simple and easy.&lt;/p&gt;

&lt;p&gt;I was expecting a simple API like this when I was investigating how to register my program as the default handler for “.wav” files on Vista. In the pre-Vista days, most programs would write directly into a registry key for the file extension (e.g. “&lt;a href=&quot;http://msdn.microsoft.com/en-us/library/ms724475%28VS.85%29.aspx&quot;&gt;HKEY_CLASSES_ROOT&lt;/a&gt;\.wav”) and move on. Problems come when your program wants to register itself as a handler for a “popular” extension like .MP3 or .HTM. Some programs go into an all out arms race with other programs in a fight of wills to make sure they keep the extension.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/using-obscure-windows-com-apis-in-net/SetFileAssociations.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;In Windows Vista and later, Microsoft wants us to use the new “&lt;a href=&quot;http://msdn.microsoft.com/en-us/library/bb756951.aspx&quot;&gt;Default Programs&lt;/a&gt;” feature. The idea is that you register what file extensions your program supports in the registry and then a nice UI allows people to easily pick which of those extensions they want to associate with your program. Digging around the documentation led me to discover that the bulk of the functionality was exposed via the &lt;a href=&quot;http://msdn.microsoft.com/en-us/library/bb776332.aspx&quot;&gt;IApplicationAssociationRegistration&lt;/a&gt; COM interface.&lt;/p&gt;

&lt;p&gt;Ah, COM.&lt;/p&gt;

&lt;p&gt;Over the years, I’ve tried to keep my distance from it. This irrational fear came from wizards that “next, next, finish”’d your way into thousands of lines of inscrutable code. It took me years of passing glances to &lt;a href=&quot;http://www.moserware.com/2008/01/finally-understanding-com-after.html&quot;&gt;finally understand its basics&lt;/a&gt;. Even then, when I needed to use it from .NET, I’d right click on my project references and click “Add Reference”:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/using-obscure-windows-com-apis-in-net/AddComReference.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;I’d pick the library I needed and then somehow I could use the types as if they were .NET objects. I didn’t ask further questions and moved on.&lt;/p&gt;

&lt;p&gt;Unfortunately, IApplicationAssociationRegistration was nowhere to be found on the “Add Reference” list since it doesn’t seem to have a registered type library associated with it. Using my basic COM knowledge, I knew that if I wanted to use it I would need to know the interface identifier (IID) as well as a class identifier (CLSID) that pointed to a concrete implementation.&lt;/p&gt;

&lt;p&gt;Following the MSDN documentation, I knew I’d probably find success in shobjidl.idl:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/using-obscure-windows-com-apis-in-net/InterfaceInformation.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Sure enough, shobjidl.idl was sitting in my “C:\Program Files\Microsoft SDKs\Windows\&lt;a href=&quot;http://www.microsoft.com/downloads/details.aspx?FamilyID=e6e1c3df-a74f-4207-8586-711ebe331cdc&amp;amp;displaylang=en&quot;&gt;v6.1&lt;/a&gt;\Include” directory and had this interface definition:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c--&quot; data-lang=&quot;c++&quot;&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;object&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;uuid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;4e530&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;b0a&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;e611&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;c77&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a3ac&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;9031&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;d022281b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;pointer_default&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;unique&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;helpstring&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;Protocol URL and Extension File Application&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;interface&lt;/span&gt; &lt;span class=&quot;nl&quot;&gt;IApplicationAssociationRegistration&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;IUnknown&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;HRESULT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;QueryCurrentDefault&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
     &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;in&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;string&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;LPCWSTR&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pszQuery&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
     &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;in&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ASSOCIATIONTYPE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;atQueryType&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
     &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;in&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ASSOCIATIONLEVEL&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;alQueryLevel&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
     &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;string&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;LPWSTR&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ppszAssociation&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;p&quot;&gt;...&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;A little further down was the declaration for the concrete class (coclass) and its associated class id (CLSID):&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c--&quot; data-lang=&quot;c++&quot;&gt;&lt;span class=&quot;c1&quot;&gt;// CLSID_ApplicationAssociationRegistration&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;uuid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;591209&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;c7&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;767&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;42&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;b2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;9f&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ba&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;44&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ee4615f2c7&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;coclass&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ApplicationAssociationRegistration&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;interface&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;IApplicationAssociationRegistration&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;In the IDL, we also see the definitions for the enums that the functions use:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c&quot; data-lang=&quot;c&quot;&gt;&lt;span class=&quot;k&quot;&gt;typedef&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;v1_enum&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;enum&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tagASSOCIATIONLEVEL&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;AL_MACHINE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;AL_EFFECTIVE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;AL_USER&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ASSOCIATIONLEVEL&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;typedef&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;v1_enum&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;enum&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tagASSOCIATIONTYPE&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;AT_FILEEXTENSION&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;AT_URLPROTOCOL&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;AT_STARTMENUCLIENT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;AT_MIMETYPE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ASSOCIATIONTYPE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Getting this to work in .NET was surprisingly easy. The basic idea is that the CLR has to have just enough information to find the types:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;The “&lt;a href=&quot;http://msdn.microsoft.com/en-us/library/system.runtime.interopservices.comimportattribute.aspx&quot;&gt;ComImportAttribute&lt;/a&gt;” is almost as simple to use as &lt;a href=&quot;http://msdn.microsoft.com/en-us/library/system.runtime.interopservices.dllimportattribute.aspx&quot;&gt;DllImportAttribute&lt;/a&gt;. In addition, you need to use the &lt;a href=&quot;http://msdn.microsoft.com/en-us/library/system.runtime.interopservices.guidattribute.aspx&quot;&gt;GuidAttribute&lt;/a&gt; to specify the gigantic GUIDs.&lt;/li&gt;
  &lt;li&gt;You use the “&lt;a href=&quot;http://msdn.microsoft.com/en-us/library/system.runtime.interopservices.interfacetypeattribute.aspx&quot;&gt;InterfaceTypeAttribute&lt;/a&gt;” to specify the basic interface(s) that the interface you’re importing uses. In COM, all interfaces derive from &lt;a href=&quot;http://msdn.microsoft.com/en-us/library/ms680509.aspx&quot;&gt;IUnknown&lt;/a&gt;. If the interface supports scripting then it implements &lt;a href=&quot;http://msdn.microsoft.com/en-us/library/ms221608.aspx&quot;&gt;IDispatch&lt;/a&gt;. If you provide a speedy C++ way of accessing your interface (e.g. &lt;a href=&quot;http://blogs.msdn.com/oldnewthing/archive/2004/02/05/68017.aspx&quot;&gt;vtable definition&lt;/a&gt;) and the scripting IDispatch interface, you’ve got a “&lt;a href=&quot;http://msdn.microsoft.com/en-us/library/aa366807%28VS.85%29.aspx&quot;&gt;dual&lt;/a&gt;” interface.&lt;/li&gt;
  &lt;li&gt;You need to translate the parameter types to their .NET equivalents. This is an incredibly mechanical process that’s straightforward. If there is a chance that the underlying bits are different between COM and .NET (e.g. they’re not &lt;a href=&quot;http://msdn.microsoft.com/en-us/library/75dwhxf7.aspx&quot;&gt;blittable&lt;/a&gt;) then you need to use the &lt;a href=&quot;http://msdn.microsoft.com/en-us/library/system.runtime.interopservices.marshalasattribute.aspx&quot;&gt;MarshalAsAttribute&lt;/a&gt; to tell the CLR how to convert the types as necessary.&lt;/li&gt;
  &lt;li&gt;You need to remember that COM reports errors by returning HRESULTs instead of throwing exceptions like .NET does. By default, the CLR treats the last OUT parameter in the IDL as the return value (it helps if it’s marked “retval”). Therefore, you can act as if the function really returns its last parameter and the CLR will automatically check the HRESULT and throw a corresponding .NET exception as needed.&lt;/li&gt;
  &lt;li&gt;Optionally, and perhaps most controversially, you’re free to de-&lt;a href=&quot;http://en.wikipedia.org/wiki/Hungarian_notation&quot;&gt;Hungarianize&lt;/a&gt; the parameter names and PascalCase the enum names to make them much friendlier looking to people in .NET. It’s optional since it might confuse people who use the MSDN documentation and expect the original names.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In a minute or so, I translated the definitions and gladly got rid of the Hungarian prefixes by converting parameter names of “pszQuery” to just “query.” I also converted all the enums and removed their unnecessary prefixes. The end result was this:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c#&quot; data-lang=&quot;c#&quot;&gt;&lt;span class=&quot;na&quot;&gt;[ComImport]&lt;/span&gt;
&lt;span class=&quot;na&quot;&gt;[Guid(&amp;quot;4e530b0a-e611-4c77-a3ac-9031d022281b&amp;quot;)]&lt;/span&gt;
&lt;span class=&quot;na&quot;&gt;[InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;internal&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;interface&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;IApplicationAssociationRegistration&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;   
&lt;span class=&quot;na&quot;&gt; [return: MarshalAs(UnmanagedType.LPWStr)]&lt;/span&gt;
 &lt;span class=&quot;kt&quot;&gt;string&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;QueryCurrentDefault&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;MarshalAs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;UnmanagedType&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;LPWStr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)]&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;string&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;query&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                           &lt;span class=&quot;n&quot;&gt;AssociationType&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;queryType&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                           &lt;span class=&quot;n&quot;&gt;AssociationLevel&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;queryLevel&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;na&quot;&gt; [return: MarshalAs(UnmanagedType.Bool)]&lt;/span&gt;
 &lt;span class=&quot;kt&quot;&gt;bool&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;QueryAppIsDefault&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;([&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;MarshalAs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;UnmanagedType&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;LPWStr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)]&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;string&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;query&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                        &lt;span class=&quot;n&quot;&gt;AssociationType&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;queryType&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                        &lt;span class=&quot;n&quot;&gt;AssociationLevel&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;queryLevel&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;na&quot;&gt;                        [MarshalAs(UnmanagedType.LPWStr)]&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;string&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;appRegistryName&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;na&quot;&gt; [return: MarshalAs(UnmanagedType.Bool)]&lt;/span&gt;
 &lt;span class=&quot;kt&quot;&gt;bool&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;QueryAppIsDefaultAll&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;AssociationLevel&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;queryLevel&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;na&quot;&gt;                           [MarshalAs(UnmanagedType.LPWStr)]&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;string&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;appRegistryName&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
 &lt;span class=&quot;k&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;SetAppAsDefault&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;([&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;MarshalAs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;UnmanagedType&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;LPWStr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)]&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;string&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;appRegistryName&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;na&quot;&gt;                      [MarshalAs(UnmanagedType.LPWStr)]&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;string&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;set&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                      &lt;span class=&quot;n&quot;&gt;AssociationType&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;setType&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
 &lt;span class=&quot;k&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;SetAppAsDefaultAll&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;([&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;MarshalAs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;UnmanagedType&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;LPWStr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)]&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;string&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;appRegistryName&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
 &lt;span class=&quot;k&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;ClearUserAssociations&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;();&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
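
&lt;p&gt;The translated enums aren’t listed in this post, so here is a minimal sketch of what they might look like, assuming the IDL ordering is kept and the “AL_” and “AT_” prefixes are simply dropped. Only “FileExtension” and “Effective” appear in the usage code in this post; the other member names are assumed PascalCased forms:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c#&quot; data-lang=&quot;c#&quot;&gt;// Sketch only: the member order has to match the IDL because the values are implicit (0, 1, 2, ...).
// [v1_enum] asks for a 32-bit enum, which lines up with the default C# underlying type of int.
internal enum AssociationLevel
{
    Machine,   // AL_MACHINE (assumed name)
    Effective, // AL_EFFECTIVE
    User,      // AL_USER (assumed name)
}

internal enum AssociationType
{
    FileExtension,   // AT_FILEEXTENSION
    UrlProtocol,     // AT_URLPROTOCOL (assumed name)
    StartMenuClient, // AT_STARTMENUCLIENT (assumed name)
    MimeType,        // AT_MIMETYPE (assumed name)
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;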

&lt;p&gt;Importing the concrete class that implements the interface was just a matter of specifying its CLSID:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c#&quot; data-lang=&quot;c#&quot;&gt;&lt;span class=&quot;na&quot;&gt;[ComImport]&lt;/span&gt;
&lt;span class=&quot;na&quot;&gt;[Guid(&amp;quot;591209c7-767b-42b2-9fba-44ee4615f2c7&amp;quot;)]&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;internal&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;ApplicationAssociationRegistration&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
 &lt;span class=&quot;c1&quot;&gt;// coclass is implemented by the runtime callable wrapper&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;With all of that goo out of the way, you can use the interface like a normal .NET type:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c#&quot; data-lang=&quot;c#&quot;&gt;&lt;span class=&quot;kt&quot;&gt;var&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;aa&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ApplicationAssociationRegistration&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;();&lt;/span&gt;
&lt;span class=&quot;kt&quot;&gt;var&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;iaar&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;IApplicationAssociationRegistration&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;aa&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;kt&quot;&gt;string&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;myCurrentMp3Player&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;iaar&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;QueryCurrentDefault&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;.mp3&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;AssociationType&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;FileExtension&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;AssociationLevel&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Effective&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Behind the scenes, the &lt;a href=&quot;http://msdn.microsoft.com/en-us/library/8bwh56xe.aspx&quot;&gt;runtime callable wrapper&lt;/a&gt; has to do something like this:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Load in ole32.dll where COM functions reside.&lt;/li&gt;
  &lt;li&gt;Call &lt;a href=&quot;http://msdn.microsoft.com/en-us/library/ms678543.aspx&quot;&gt;CoInitialize&lt;/a&gt; to initialize COM.&lt;/li&gt;
  &lt;li&gt;Look up your CLSID and IID in the registry under HKEY_CLASSES_ROOT and find their associated DLL (in our case, “shell32.dll”)&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;http://msdn.microsoft.com/en-us/library/ms684007%28VS.85%29.aspx&quot;&gt;Create a factory&lt;/a&gt; for your class.&lt;/li&gt;
  &lt;li&gt;Use the factory to &lt;a href=&quot;http://msdn.microsoft.com/en-us/library/ms682215%28VS.85%29.aspx&quot;&gt;create an instance&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Call &lt;a href=&quot;http://msdn.microsoft.com/en-us/library/ms682521%28VS.85%29.aspx&quot;&gt;QueryInterface&lt;/a&gt; to get the specific interface we want (e.g. IApplicationAssociationRegistration)&lt;/li&gt;
  &lt;li&gt;Get a pointer to the function we want using the &lt;a href=&quot;http://blogs.msdn.com/oldnewthing/archive/2004/02/05/68017.aspx&quot;&gt;vtable&lt;/a&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;After all that, we &lt;em&gt;finally&lt;/em&gt; have a place to jump to like we did with P/Invoke.&lt;/p&gt;

&lt;p&gt;Why bother with all of this? One reason is that Microsoft has a huge legacy investment in C and C++ in Windows. There’s no compelling reason for them to rewrite things in .NET. A natural consequence is that the C++ code that implements their latest APIs will be exposed using COM for the foreseeable future. Recently, Microsoft has gone ahead and published .NET COM &lt;a href=&quot;http://code.msdn.microsoft.com/Windows7Taskbar/Release/ProjectReleases.aspx?ReleaseId=2246&quot;&gt;wrappers&lt;/a&gt; for some of the popular new APIs like the &lt;a href=&quot;http://windowsteamblog.com/blogs/developers/archive/2009/04/23/consuming-the-contents-of-windows-7-libraries.aspx&quot;&gt;Libraries feature&lt;/a&gt; in Windows 7. With just a little work, you don’t have to wait on Microsoft to do this for you.&lt;/p&gt;

&lt;p&gt;Given that .NET was designed as a successor to COM, it’s no surprise that Microsoft has made interoperability with it very seamless. The &lt;a href=&quot;http://msdn.microsoft.com/en-us/library/8bwh56xe.aspx&quot;&gt;runtime callable wrapper&lt;/a&gt; does a good job of hiding most of the messier details. The garbage collector handles much of the bookkeeping involved with memory management that used to be the bane of COM programming. The runtime is very aware of typical COM semantics of when to allocate and free memory. It’s not always perfect. Sometimes you can be pre-emptive and force your COM object to be cleaned up via &lt;a href=&quot;http://msdn.microsoft.com/en-us/library/system.runtime.interopservices.marshal.releasecomobject.aspx&quot;&gt;Marshal.ReleaseComObject&lt;/a&gt; so you don’t have to wait on the garbage collector, but you should &lt;a href=&quot;http://blogs.msdn.com/cbrumme/archive/2003/04/16/51355.aspx&quot;&gt;be careful&lt;/a&gt;.&lt;/p&gt;
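
&lt;p&gt;As a rough sketch of that pre-emptive cleanup, a small helper (the names “ComCleanup” and “ReleaseIfComObject” are hypothetical) can wrap the call so that it’s safe to use on any object. It assumes nothing else still needs the same wrapper, which is exactly the situation the “be careful” warning is about:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c#&quot; data-lang=&quot;c#&quot;&gt;using System.Runtime.InteropServices;

internal static class ComCleanup
{
    // Hypothetical helper: drops the runtime callable wrapper's reference
    // right away instead of waiting on the garbage collector.
    public static void ReleaseIfComObject(object candidate)
    {
        if (candidate == null)
        {
            return;
        }

        // Marshal.ReleaseComObject throws for ordinary managed objects, so check first.
        if (Marshal.IsComObject(candidate))
        {
            Marshal.ReleaseComObject(candidate);
        }
    }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;You would typically call it from a finally block once you’re completely done with the object.&lt;/p&gt;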

&lt;p&gt;I just presented the basics of what I learned to get my job done. There’s a lot more out there for more advanced scenarios. I’ve found the &lt;a href=&quot;http://www.theserverside.net/tt/articles/showarticle.tss?id=ComAndDotNetInterop_Book&quot;&gt;free book&lt;/a&gt; “COM and .NET Interop” by Andrew Troelsen to be helpful.&lt;/p&gt;

&lt;p&gt;There are plenty of obscure Windows APIs out there for the taking. Enjoy!&lt;/p&gt;
</description>
        <pubDate>Fri, 24 Apr 2009 08:37:00 +0000</pubDate>
        <link>http://www.moserware.com/2009/04/using-obscure-windows-com-apis-in-net.html</link>
        <guid isPermaLink="true">http://www.moserware.com/2009/04/using-obscure-windows-com-apis-in-net.html</guid>
        
        
      </item>
    
      <item>
        <title>How .NET Regular Expressions Really Work</title>
        <description>&lt;p&gt;Remember when you first tried to parse text?&lt;/p&gt;

&lt;p&gt;My early BASIC programs were littered with &lt;code&gt;IF&lt;/code&gt; statements that dissected strings using &lt;code&gt;LEFT$&lt;/code&gt;, &lt;code&gt;RIGHT$&lt;/code&gt;, &lt;code&gt;MID$&lt;/code&gt;, &lt;code&gt;TRIM$&lt;/code&gt;, and &lt;code&gt;UCASE$&lt;/code&gt;. It took me hours to write a program that parsed a simple text file. Just trying to support whitespace and mixed casing was enough to drive me crazy.&lt;/p&gt;

&lt;p&gt;Years later when I started programming in Java, I discovered the &lt;a href=&quot;http://java.sun.com/j2se/1.4.2/docs/api/java/util/StringTokenizer.html&quot;&gt;StringTokenizer&lt;/a&gt; class. I thought it was a huge leap forward. I no longer had to worry about whitespace. However, I still had to use functions like “substring” and “toUpperCase”, but I thought that was as good as it could get.&lt;/p&gt;

&lt;p&gt;And then one day I found &lt;a href=&quot;http://www.regular-expressions.info/quickstart.html&quot;&gt;regular&lt;/a&gt; &lt;a href=&quot;http://www.dijksterhuis.org/regular-expressions-in-csharp-the-basics/&quot;&gt;expressions&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I almost cried when I realized that I could replace parsing code that took me hours to write with a simple regular expression. It still took me several years to become comfortable with &lt;a href=&quot;http://www.addedbytes.com/cheat-sheets/regular-expressions-cheat-sheet/&quot;&gt;the syntax&lt;/a&gt;, but the learning curve was worth the power obtained.&lt;/p&gt;

&lt;p&gt;And yet with all of this love, I still had this nagging suspicion that I was doing it wrong. After &lt;a href=&quot;http://www.moserware.com/2009/01/wetware-refactorings.html&quot;&gt;reading Pragmatic Thinking and Learning&lt;/a&gt;, I was determined to try to imagine what life was like inside the code I wrote. But I just couldn’t connect with a regular expression.&lt;/p&gt;

&lt;p&gt;The last straw came recently when I was trying to help a &lt;a href=&quot;http://www.aaronlerch.com/blog/&quot;&gt;coworker&lt;/a&gt; craft a regex to properly handle name/value string pairs with escaped strings. In the end, our regex worked, but I felt that it was duct-taped together. I knew there was a better way.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;http://www.amazon.com/gp/product/0596528124?ie=UTF8&amp;amp;tag=moserware-20&amp;amp;linkCode=as2&amp;amp;camp=1789&amp;amp;creative=390957&amp;amp;creativeASIN=0596528124&quot;&gt;&lt;img style=&quot;margin: 20px&quot; src=&quot;/assets/how-net-regular-expressions-really-work/masteringregex_200.jpg&quot; align=&quot;left&quot; /&gt;&lt;/a&gt; I picked up a copy of Jeffrey Friedl’s book “&lt;a href=&quot;http://www.amazon.com/gp/product/0596528124?ie=UTF8&amp;amp;tag=moserware-20&amp;amp;linkCode=as2&amp;amp;camp=1789&amp;amp;creative=390957&amp;amp;creativeASIN=0596528124&quot;&gt;Mastering Regular Expressions&lt;/a&gt;” and couldn’t put it down. In less than a week, I had flown through 400+ pages and had finally started to feel like I understood how regular expressions worked. I finally had a sense for what backtracking really meant and I had a better idea for how a regex could go &lt;a href=&quot;http://www.regular-expressions.info/catastrophic.html&quot;&gt;catastrophically&lt;/a&gt; out of control.&lt;/p&gt;

&lt;p&gt;I had extremely high hopes for chapter 9 which covered the .NET regular expression “flavor.” Since I work with .NET every day, I thought this would be the best chapter. I did learn a few things like how to properly use &lt;a href=&quot;http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regexoptions.aspx&quot;&gt;RegexOptions.ExplicitCapture&lt;/a&gt;, how to use the &lt;a href=&quot;http://msdn.microsoft.com/en-us/library/ewy2t5e0.aspx&quot;&gt;special per-match replacement sequences&lt;/a&gt; that &lt;a href=&quot;http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.replace.aspx&quot;&gt;Regex.Replace&lt;/a&gt; offers, how to &lt;a href=&quot;http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.compiletoassembly.aspx&quot;&gt;save compiled regular expressions to a DLL&lt;/a&gt;, and how to &lt;a href=&quot;http://weblogs.asp.net/whaggard/archive/2005/02/20/377025.aspx&quot;&gt;match balanced parentheses&lt;/a&gt; – a feat that’s theoretically not possible with a regex. Despite learning all of this in the chapter, I still didn’t feel that I could “connect” with the very .NET regular expression engine that I know and love.&lt;/p&gt;

&lt;p&gt;To be fair, the vast benefit of the book comes from the first six chapters that deal with how regular expressions work &lt;em&gt;in general&lt;/em&gt; since regex implementations share many ideas. The book laid a solid foundation, but I wanted more.&lt;/p&gt;

&lt;p&gt;I wanted to stop all my hand-waving at regular expressions and actually understand how they &lt;em&gt;really&lt;/em&gt; work.&lt;/p&gt;

&lt;p&gt;I knew I wanted to drill into the code. Although tools like &lt;a href=&quot;http://www.red-gate.com/products/reflector/&quot;&gt;Reflector&lt;/a&gt; are amazing, I knew I wanted to see the actual code. It’s fairly easy now to &lt;a href=&quot;http://weblogs.asp.net/scottgu/archive/2008/01/16/net-framework-library-source-code-now-available.aspx&quot;&gt;step into the framework source code&lt;/a&gt; in the debugger. Unlike &lt;a href=&quot;http://www.moserware.com/2008/09/how-do-locks-lock.html&quot;&gt;understanding the details of locking&lt;/a&gt;, which had me dive into C++ and x86 assembly, it was refreshing to see that the .NET regular expression engine was written entirely in C#.&lt;/p&gt;

&lt;p&gt;I decided to use a really simple regular expression and search string and then follow it from cradle to grave. If you’d like to follow along at home, I’ve &lt;a href=&quot;https://web.archive.org/web/20090402061020id_/http://www.koders.com/csharp/fid7F5AE3CBB76E9E51E24DEA0DB54B86C173369E88.aspx&quot;&gt;linked&lt;/a&gt; to relevant lines in the .NET regular expression source code.&lt;/p&gt;

&lt;p&gt;My very simple regex consisted of looking for a basic URL:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c#&quot; data-lang=&quot;c#&quot;&gt;&lt;span class=&quot;kt&quot;&gt;string&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;textToSearch&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;Welcome to http://www.moserware.com/!&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;kt&quot;&gt;string&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;regexPattern&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;@&amp;quot;http://([^\s/]+)/?&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;Match&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;m&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Regex&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Match&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;textToSearch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;regexPattern&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt; 
&lt;span class=&quot;n&quot;&gt;Console&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;WriteLine&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;Full uri = &amp;#39;{0}&amp;#39;&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;m&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Value&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;Console&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;WriteLine&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;Host =&amp;#39;{0}&amp;#39;&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;m&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Groups&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;m&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;].&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Value&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Our journey begins at &lt;a href=&quot;http://msdn.microsoft.com/en-us/library/0z2heewz.aspx&quot;&gt;Regex.Match&lt;/a&gt; where we &lt;a href=&quot;https://web.archive.org/web/20090402061020id_/http://www.koders.com/csharp/fid7F5AE3CBB76E9E51E24DEA0DB54B86C173369E88.aspx#L113&quot;&gt;check an internal cache&lt;/a&gt; of the &lt;a href=&quot;http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.cachesize.aspx&quot;&gt;past 15&lt;/a&gt; regex values to see if there’s a match for:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&quot;0:ENU:http://([^\\s/]+)/?&quot;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This is a compact representation of:&lt;/p&gt;

&lt;pre&gt;&lt;a href=&quot;http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regexoptions.aspx&quot;&gt;RegexOptions&lt;/a&gt; : &lt;a href=&quot;http://msdn.microsoft.com/en-us/library/system.globalization.cultureinfo.threeletterwindowslanguagename.aspx&quot;&gt;Culture&lt;/a&gt; : Regex pattern&lt;/pre&gt;
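
&lt;p&gt;To make that layout concrete, here is a small sketch that assembles a string of the same shape from public APIs. It is not the framework’s own code; the class and variable names are made up for illustration:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c#&quot; data-lang=&quot;c#&quot;&gt;using System;
using System.Globalization;
using System.Text.RegularExpressions;

static class CacheKeySketch
{
    static void Main()
    {
        RegexOptions options = RegexOptions.None;
        string pattern = @&quot;http://([^\s/]+)/?&quot;;

        // Options as a number, then the culture's three-letter Windows name, then the pattern.
        string key = ((int)options).ToString(CultureInfo.InvariantCulture)
            + &quot;:&quot; + CultureInfo.CurrentCulture.ThreeLetterWindowsLanguageName
            + &quot;:&quot; + pattern;

        Console.WriteLine(key);
    }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;On a machine whose current culture is US English, this should print the same key as above, just with a single backslash since the console doesn’t escape it the way the debugger does.&lt;/p&gt;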

&lt;p&gt;The regex doesn’t find this in the cache, so it starts &lt;a href=&quot;https://web.archive.org/web/20090402013015id_/http://www.koders.com/csharp/fidC88A6970F260F6826C679E703634322F3C553827.aspx#L224&quot;&gt;scanning the pattern&lt;/a&gt;. Note that &lt;a href=&quot;https://web.archive.org/web/20090402013015id_/http://www.koders.com/csharp/fidC88A6970F260F6826C679E703634322F3C553827.aspx#L21&quot;&gt;out of respect for the authors&lt;/a&gt;, our regex pattern doesn’t have any comments or whitespace in it:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c#&quot; data-lang=&quot;c#&quot;&gt;&lt;span class=&quot;c1&quot;&gt;// It would be nice to get rid of the comment modes, since the &lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// ScanBlank() calls are just kind of duct-taped in.&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;We &lt;a href=&quot;https://web.archive.org/web/20090402013015id_/http://www.koders.com/csharp/fidC88A6970F260F6826C679E703634322F3C553827.aspx#L265&quot;&gt;start&lt;/a&gt; creating an internal tree representation of the regex by &lt;a href=&quot;https://web.archive.org/web/20090402013015id_/http://www.koders.com/csharp/fidC88A6970F260F6826C679E703634322F3C553827.aspx#L1869&quot;&gt;adding&lt;/a&gt; a multi-character (aka “&lt;a href=&quot;https://web.archive.org/web/20090401224614id_/http://www.koders.com/csharp/fid0C0231291E4A7914C135C0A730D5A85182F872EB.aspx#L83&quot;&gt;Multi&lt;/a&gt;”) node to contain the “http://” part. Next, we see that the scanner made it to the first real capture:&lt;/p&gt;

&lt;pre&gt;http://&lt;span style=&quot;font-weight: bold&quot;&gt;([^\s/]+)&lt;/span&gt;/?&lt;/pre&gt;

&lt;p&gt;This capture contains a &lt;a href=&quot;https://web.archive.org/web/20090402012958id_/http://www.koders.com/csharp/fid14AB8BA02EE8A6DBA830F1DCC147C2B17F0B3DE0.aspx&quot;&gt;character class&lt;/a&gt; that says that we don’t want to match whitespace or a forward slash. It is converted into an obscure six-character string:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&quot;\x1\x2\x1\x2F\x30\x64&quot;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Later we’ll see why it had to all fit in one string, but for now we can use a &lt;a href=&quot;https://web.archive.org/web/20090402012958id_/http://www.koders.com/csharp/fid14AB8BA02EE8A6DBA830F1DCC147C2B17F0B3DE0.aspx#L16&quot;&gt;helpful comment&lt;/a&gt; to decode each character:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Offset&lt;/th&gt;
      &lt;th&gt;Hex Value&lt;/th&gt;
      &lt;th&gt;Meaning&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;0&lt;/td&gt;
      &lt;td&gt;0x01&lt;/td&gt;
      &lt;td&gt;The set should be negated&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;1&lt;/td&gt;
      &lt;td&gt;0x02&lt;/td&gt;
      &lt;td&gt;There are two characters in the character part of the set&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;2&lt;/td&gt;
      &lt;td&gt;0x01&lt;/td&gt;
      &lt;td&gt;There is one Unicode category&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;3&lt;/td&gt;
      &lt;td&gt;0x2F&lt;/td&gt;
      &lt;td&gt;Inclusive lower-bound of the character set. It’s a ‘/’ in Unicode&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;4&lt;/td&gt;
      &lt;td&gt;0x30&lt;/td&gt;
      &lt;td&gt;Exclusive upper-bound of the character set. It’s a ‘0’ in Unicode&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;5&lt;/td&gt;
      &lt;td&gt;0x64&lt;/td&gt;
      &lt;td&gt;This is a magic number that means the “&lt;a href=&quot;https://web.archive.org/web/20090402012958id_/http://www.koders.com/csharp/fid14AB8BA02EE8A6DBA830F1DCC147C2B17F0B3DE0.aspx#L63&quot;&gt;Space&lt;/a&gt;” category.&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
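
&lt;p&gt;The table can be read straight off the string. This sketch hard-codes the layout of this one example (a single range followed by a single category) rather than being a general decoder, and the class and variable names are made up for illustration:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c#&quot; data-lang=&quot;c#&quot;&gt;using System;

static class SetStringSketch
{
    static void Main()
    {
        // The encoded character class from above, one char per table row.
        string set = &quot;\u0001\u0002\u0001\u002F\u0030\u0064&quot;;

        bool negated = set[0] == 1;      // offset 0: negation flag
        int rangeCharCount = set[1];     // offset 1: chars used by the range part
        int categoryCharCount = set[2];  // offset 2: chars used by the category part
        char lowerInclusive = set[3];    // offset 3: first char inside the range
        char upperExclusive = set[4];    // offset 4: first char past the range
        int categoryCode = set[5];       // offset 5: the category marker

        Console.WriteLine(&quot;Negated: {0}&quot;, negated);
        Console.WriteLine(&quot;Range chars: {0}, category chars: {1}&quot;, rangeCharCount, categoryCharCount);
        Console.WriteLine(&quot;Range: '{0}' up to (but excluding) '{1}'&quot;, lowerInclusive, upperExclusive);
        Console.WriteLine(&quot;Category code: 0x{0:X2}&quot;, categoryCode);
    }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;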

&lt;p&gt;Before I realized that this string had meaning, I was utterly confused.&lt;/p&gt;

&lt;p&gt;As we continue scanning, we &lt;a href=&quot;https://web.archive.org/web/20090402013015id_/http://www.koders.com/csharp/fidC88A6970F260F6826C679E703634322F3C553827.aspx#L373&quot;&gt;find a ‘+’ quantifier&lt;/a&gt;:&lt;/p&gt;

&lt;pre&gt;http://([^\s/]&lt;span style=&quot;font-size:180%; font-weight:bold;&quot;&gt;+&lt;/span&gt;)/?&lt;/pre&gt;

&lt;p&gt;This is noted as a &lt;a href=&quot;https://web.archive.org/web/20090401224614id_/http://www.koders.com/csharp/fid0C0231291E4A7914C135C0A730D5A85182F872EB.aspx#L71&quot;&gt;Oneloop&lt;/a&gt; node since it’s a “loop” of what came before (e.g. the character class set). It has arguments of 1 and &lt;a href=&quot;http://msdn.microsoft.com/en-us/library/system.int32.maxvalue.aspx&quot;&gt;Int32.MaxValue&lt;/a&gt; to denote 1 or more matches. We see that the next character isn’t a ‘?’, so we can assert this is &lt;a href=&quot;https://web.archive.org/web/20090402013015id_/http://www.koders.com/csharp/fidC88A6970F260F6826C679E703634322F3C553827.aspx#L406&quot;&gt;not a lazy match&lt;/a&gt; which means it’s a &lt;a href=&quot;http://en.wikipedia.org/wiki/Regular_expression#Lazy_quantification&quot;&gt;greedy&lt;/a&gt; match.&lt;/p&gt;

&lt;p&gt;The first &lt;a href=&quot;https://web.archive.org/web/20090402013015id_/http://www.koders.com/csharp/fidC88A6970F260F6826C679E703634322F3C553827.aspx#L301&quot;&gt;group is recorded&lt;/a&gt; when we hit the ‘)’ character. At the end of the pattern, we note a &lt;a href=&quot;https://web.archive.org/web/20090401224614id_/http://www.koders.com/csharp/fid0C0231291E4A7914C135C0A730D5A85182F872EB.aspx#L79&quot;&gt;One&lt;/a&gt; (character) node for the ‘/’ and we see it’s followed by a ‘&lt;a href=&quot;https://web.archive.org/web/20090402013015id_/http://www.koders.com/csharp/fidC88A6970F260F6826C679E703634322F3C553827.aspx#L368&quot;&gt;?&lt;/a&gt;’ which is just &lt;a href=&quot;https://web.archive.org/web/20090401224614id_/http://www.koders.com/csharp/fid0C0231291E4A7914C135C0A730D5A85182F872EB.aspx#L551&quot;&gt;another quantifier&lt;/a&gt;, this time with a minimum of 0 and a maximum of 1.&lt;/p&gt;
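&lt;p&gt;To make the quantifier bookkeeping concrete, here is a minimal Python sketch (not the .NET parser) of how a single-character quantifier could be turned into a minimum, a maximum and a lazy flag. The names BOUNDS and read_quantifier are made up for illustration, and brace quantifiers like {2,5} are deliberately left out.&lt;/p&gt;

```python
# Hypothetical sketch; this is not the .NET RegexParser code.
INT32_MAX = 2**31 - 1  # same value as Int32.MaxValue

# Minimum and maximum repetition counts for each one-character quantifier.
BOUNDS = {"*": (0, INT32_MAX), "+": (1, INT32_MAX), "?": (0, 1)}


def read_quantifier(pattern, pos):
    """Return (minimum, maximum, lazy) for the quantifier at pattern[pos]."""
    low, high = BOUNDS[pattern[pos]]
    # A '?' directly after a quantifier makes it lazy; otherwise it is greedy.
    # The slice is empty (and so not equal to '?') at the end of the pattern.
    lazy = pattern[pos + 1 : pos + 2] == "?"
    return low, high, lazy


pattern = r"http://([^\s/]+)/?"
plus = read_quantifier(pattern, pattern.index("+"))
optional = read_quantifier(pattern, len(pattern) - 1)
```

&lt;p&gt;For our pattern, the ‘+’ comes back as a greedy 1 to Int32.MaxValue loop and the trailing ‘?’ as a greedy 0 to 1 loop, which lines up with the nodes described above.&lt;/p&gt;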

&lt;p&gt;All those nodes come together to give us this “&lt;a href=&quot;https://web.archive.org/web/20090429025846id_/http://www.koders.com/csharp/fidECC3E02EC33C0F92A3E24574A0673340C8A22B2C.aspx&quot;&gt;RegexTree&lt;/a&gt;:”&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/how-net-regular-expressions-really-work/RegexParseTree.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;We still need to &lt;a href=&quot;https://web.archive.org/web/20090402061020id_/http://www.koders.com/csharp/fid7F5AE3CBB76E9E51E24DEA0DB54B86C173369E88.aspx#L133&quot;&gt;convert the tree to code&lt;/a&gt; that the regular expression “machine” can execute later. The bulk of the work is done by an aptly named &lt;a href=&quot;https://web.archive.org/web/20090401224618id_/http://www.koders.com/csharp/fidC6F7EB8E11A6BF080CFA281BEEE003B5FAAB4AD9.aspx#L223&quot;&gt;RegexCodeFromRegexTree&lt;/a&gt; function that has a decent &lt;a href=&quot;https://web.archive.org/web/20090401224618id_/http://www.koders.com/csharp/fidC6F7EB8E11A6BF080CFA281BEEE003B5FAAB4AD9.aspx#L212&quot;&gt;comment&lt;/a&gt;:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c#&quot; data-lang=&quot;c#&quot;&gt;&lt;span class=&quot;cm&quot;&gt;/*&lt;/span&gt;
&lt;span class=&quot;cm&quot;&gt; * The top level RegexCode generator. It does a depth-first walk &lt;/span&gt;
&lt;span class=&quot;cm&quot;&gt; * through the tree and calls EmitFragment to emits code before &lt;/span&gt;
&lt;span class=&quot;cm&quot;&gt; * and after each child of an interior node, and at each leaf. &lt;/span&gt;
&lt;span class=&quot;cm&quot;&gt; * &lt;/span&gt;
&lt;span class=&quot;cm&quot;&gt; * It runs two passes, first to count the size of the generated &lt;/span&gt;
&lt;span class=&quot;cm&quot;&gt; * code, and second to generate the code. &lt;/span&gt;
&lt;span class=&quot;cm&quot;&gt; * &lt;/span&gt;
&lt;span class=&quot;cm&quot;&gt; * &amp;lt;CONSIDER&amp;gt;we need to time it against the alternative, which is &lt;/span&gt;
&lt;span class=&quot;cm&quot;&gt; * to just generate the code and grow the array as we go.&amp;lt;/CONSIDER&amp;gt;&lt;/span&gt;
&lt;span class=&quot;cm&quot;&gt; */&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;I love the anonymous “CONSIDER” comment and would have had a similar reaction. Instead of using an &lt;a href=&quot;http://msdn.microsoft.com/en-us/library/system.collections.arraylist.aspx&quot;&gt;ArrayList&lt;/a&gt; or &lt;a href=&quot;http://msdn.microsoft.com/en-us/library/6sh2ey19.aspx&quot;&gt;List&lt;/a&gt;&amp;lt;int&amp;gt; to store the op codes, which can automatically resize as needed, the code diligently goes through the entire RegexTree &lt;a href=&quot;https://web.archive.org/web/20090401224618id_/http://www.koders.com/csharp/fidC6F7EB8E11A6BF080CFA281BEEE003B5FAAB4AD9.aspx#L248&quot;&gt;&lt;em&gt;twice&lt;/em&gt;&lt;/a&gt;. The class is peppered with “&lt;a href=&quot;https://web.archive.org/web/20090401224618id_/http://www.koders.com/csharp/fidC6F7EB8E11A6BF080CFA281BEEE003B5FAAB4AD9.aspx#L126&quot;&gt;if(_counting)&lt;/a&gt;” expressions that just increase a counter by the size they will use in the next pass.&lt;/p&gt;
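&lt;p&gt;Here is a rough Python sketch of the two approaches the comment is weighing. It is not the .NET code: the TwoPassEmitter class and both generate functions are hypothetical, and the only borrowed detail is the idea of a counting flag that makes the first pass measure instead of write.&lt;/p&gt;

```python
# Hypothetical sketch of "count first, then fill" versus "grow as we go".
class TwoPassEmitter:
    def __init__(self):
        self.counting = True  # first pass only measures
        self.size = 0
        self.codes = []
        self.pos = 0

    def emit(self, *values):
        if self.counting:
            # Pass 1: remember how much room this instruction will need.
            self.size += len(values)
            return
        # Pass 2: write into the array that was sized exactly in pass 1.
        for value in values:
            self.codes[self.pos] = value
            self.pos += 1


def generate_two_pass(fragments):
    emitter = TwoPassEmitter()
    for fragment in fragments:  # pass 1: count
        emitter.emit(*fragment)
    emitter.codes = [0] * emitter.size
    emitter.counting = False
    for fragment in fragments:  # pass 2: generate
        emitter.emit(*fragment)
    return emitter.codes


def generate_growing(fragments):
    # The alternative from the CONSIDER note: one pass, resizable list.
    codes = []
    for fragment in fragments:
        codes.extend(fragment)
    return codes


fragments = [(31,), (12, 0), (40,)]  # made-up input: opcode then operands
```

&lt;p&gt;Both functions produce the same array. The only difference is whether the tree is walked once or twice.&lt;/p&gt;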

&lt;p&gt;As predicted by the comment, the bulk of the work is done by the 250-line switch statement that makes up the &lt;a href=&quot;https://web.archive.org/web/20090401224618id_/http://www.koders.com/csharp/fidC6F7EB8E11A6BF080CFA281BEEE003B5FAAB4AD9.aspx#L305&quot;&gt;EmitFragment function&lt;/a&gt;. This function breaks up RegexTree “fragments” and converts them to a simpler &lt;a href=&quot;https://web.archive.org/web/20090401224645id_/http://www.koders.com/csharp/fidF4B2B64D471D5B7401063DE2054CB33F28BDA026.aspx&quot;&gt;RegexCode&lt;/a&gt;. The first fragment is:&lt;/p&gt;

&lt;pre&gt;
EmitFragment(nodetype=&lt;a href=&quot;https://web.archive.org/web/20090401224618id_/http://www.koders.com/csharp/fidC6F7EB8E11A6BF080CFA281BEEE003B5FAAB4AD9.aspx#L450&quot;&gt;RegexNode.Capture | BeforeChild&lt;/a&gt;, 
             node=[&lt;a href=&quot;https://web.archive.org/web/20090401224614id_/http://www.koders.com/csharp/fid0C0231291E4A7914C135C0A730D5A85182F872EB.aspx#L113&quot;&gt;RegexNode.Capture&lt;/a&gt;, Group=0, Length=-1], 
             childIndex=0)&lt;/pre&gt;

&lt;p&gt;This is shorthand for emitting the RegexCode that should come before the children of the top level “&lt;a href=&quot;https://web.archive.org/web/20090401224614id_/http://www.koders.com/csharp/fid0C0231291E4A7914C135C0A730D5A85182F872EB.aspx#L113&quot;&gt;RegexNode.Capture&lt;/a&gt;” node that represents group 0 and that goes until the end of the string (i.e. has length -1). The last 0 means that it’s the 0th child of the parent node (this is sort of meaningless since it has no parent). The subsequent calls walk the rest of the tree:&lt;/p&gt;

&lt;pre&gt;
EmitFragment(&lt;a href=&quot;https://web.archive.org/web/20090401224618id_/http://www.koders.com/csharp/fidC6F7EB8E11A6BF080CFA281BEEE003B5FAAB4AD9.aspx#L321&quot;&gt;RegexNode.Concatenate | BeforeChild&lt;/a&gt;, [&lt;a href=&quot;https://web.archive.org/web/20090401224614id_/http://www.koders.com/csharp/fid0C0231291E4A7914C135C0A730D5A85182F872EB.aspx#L108&quot;&gt;RegexNode.Concatenate&lt;/a&gt;], childIndex=0)
EmitFragment(&lt;a href=&quot;https://web.archive.org/web/20090401224618id_/http://www.koders.com/csharp/fidC6F7EB8E11A6BF080CFA281BEEE003B5FAAB4AD9.aspx#L522&quot;&gt;RegexNode.Multi&lt;/a&gt;, [&lt;a href=&quot;https://web.archive.org/web/20090401224614id_/http://www.koders.com/csharp/fid0C0231291E4A7914C135C0A730D5A85182F872EB.aspx#L83&quot;&gt;RegexNode.Multi&lt;/a&gt;, string=&quot;http://&quot;], childIndex=0)
EmitFragment(&lt;a href=&quot;https://web.archive.org/web/20090401224618id_/http://www.koders.com/csharp/fidC6F7EB8E11A6BF080CFA281BEEE003B5FAAB4AD9.aspx#L322&quot;&gt;RegexNode.Concatenate | AfterChild&lt;/a&gt;, [&lt;a href=&quot;https://web.archive.org/web/20090401224614id_/http://www.koders.com/csharp/fid0C0231291E4A7914C135C0A730D5A85182F872EB.aspx#L108&quot;&gt;RegexNode.Concatenate&lt;/a&gt;], childIndex=0)
EmitFragment(&lt;a href=&quot;https://web.archive.org/web/20090401224618id_/http://www.koders.com/csharp/fidC6F7EB8E11A6BF080CFA281BEEE003B5FAAB4AD9.aspx#L321&quot;&gt;RegexNode.Concatenate | BeforeChild&lt;/a&gt;, [&lt;a href=&quot;https://web.archive.org/web/20090401224614id_/http://www.koders.com/csharp/fid0C0231291E4A7914C135C0A730D5A85182F872EB.aspx#L108&quot;&gt;RegexNode.Concatenate&lt;/a&gt;], childIndex=1)
EmitFragment(&lt;a href=&quot;https://web.archive.org/web/20090401224618id_/http://www.koders.com/csharp/fidC6F7EB8E11A6BF080CFA281BEEE003B5FAAB4AD9.aspx#L450&quot;&gt;RegexNode.Capture | BeforeChild&lt;/a&gt;, [&lt;a href=&quot;https://web.archive.org/web/20090401224614id_/http://www.koders.com/csharp/fid0C0231291E4A7914C135C0A730D5A85182F872EB.aspx#L113&quot;&gt;RegexNode.Capture&lt;/a&gt;, Group=1, Length=-1], childIndex=0)
EmitFragment(&lt;a href=&quot;https://web.archive.org/web/20090401224618id_/http://www.koders.com/csharp/fidC6F7EB8E11A6BF080CFA281BEEE003B5FAAB4AD9.aspx#L513&quot;&gt;RegexNode.SetLoop&lt;/a&gt;, [&lt;a href=&quot;https://web.archive.org/web/20090401224614id_/http://www.koders.com/csharp/fid0C0231291E4A7914C135C0A730D5A85182F872EB.aspx#L73&quot;&gt;RegexNode.SetLoop&lt;/a&gt;, min=1, max=Int32.MaxValue], childIndex=0)
EmitFragment(&lt;a href=&quot;https://web.archive.org/web/20090401224618id_/http://www.koders.com/csharp/fidC6F7EB8E11A6BF080CFA281BEEE003B5FAAB4AD9.aspx#L454&quot;&gt;RegexNode.Capture | AfterChild&lt;/a&gt;, [&lt;a href=&quot;https://web.archive.org/web/20090401224614id_/http://www.koders.com/csharp/fid0C0231291E4A7914C135C0A730D5A85182F872EB.aspx#L113&quot;&gt;RegexNode.Capture&lt;/a&gt;, Group=1, Length=-1], childIndex=0)
EmitFragment(&lt;a href=&quot;https://web.archive.org/web/20090401224618id_/http://www.koders.com/csharp/fidC6F7EB8E11A6BF080CFA281BEEE003B5FAAB4AD9.aspx#L322&quot;&gt;RegexNode.Concatenate | AfterChild&lt;/a&gt;, [&lt;a href=&quot;https://web.archive.org/web/20090401224614id_/http://www.koders.com/csharp/fid0C0231291E4A7914C135C0A730D5A85182F872EB.aspx#L108&quot;&gt;RegexNode.Concatenate&lt;/a&gt;], childIndex=1)
EmitFragment(&lt;a href=&quot;https://web.archive.org/web/20090401224618id_/http://www.koders.com/csharp/fidC6F7EB8E11A6BF080CFA281BEEE003B5FAAB4AD9.aspx#L321&quot;&gt;RegexNode.Concatenate | BeforeChild&lt;/a&gt;, [&lt;a href=&quot;https://web.archive.org/web/20090401224614id_/http://www.koders.com/csharp/fid0C0231291E4A7914C135C0A730D5A85182F872EB.aspx#L108&quot;&gt;RegexNode.Concatenate&lt;/a&gt;], childIndex=2)
EmitFragment(&lt;a href=&quot;https://web.archive.org/web/20090401224618id_/http://www.koders.com/csharp/fidC6F7EB8E11A6BF080CFA281BEEE003B5FAAB4AD9.aspx#L503&quot;&gt;RegexNode.Oneloop&lt;/a&gt;, [&lt;a href=&quot;https://web.archive.org/web/20090401224614id_/http://www.koders.com/csharp/fid0C0231291E4A7914C135C0A730D5A85182F872EB.aspx#L71&quot;&gt;RegexNode.Oneloop&lt;/a&gt;, min=0, max=1, character=&#39;/&#39;], childIndex=0)
EmitFragment(&lt;a href=&quot;https://web.archive.org/web/20090401224618id_/http://www.koders.com/csharp/fidC6F7EB8E11A6BF080CFA281BEEE003B5FAAB4AD9.aspx#L322&quot;&gt;RegexNode.Concatenate | AfterChild&lt;/a&gt;, [&lt;a href=&quot;https://web.archive.org/web/20090401224614id_/http://www.koders.com/csharp/fid0C0231291E4A7914C135C0A730D5A85182F872EB.aspx#L108&quot;&gt;RegexNode.Concatenate&lt;/a&gt;], childIndex=2)
EmitFragment(&lt;a href=&quot;https://web.archive.org/web/20090401224618id_/http://www.koders.com/csharp/fidC6F7EB8E11A6BF080CFA281BEEE003B5FAAB4AD9.aspx#L454&quot;&gt;RegexNode.Capture | AfterChild&lt;/a&gt;, [&lt;a href=&quot;https://web.archive.org/web/20090401224614id_/http://www.koders.com/csharp/fid0C0231291E4A7914C135C0A730D5A85182F872EB.aspx#L113&quot;&gt;RegexNode.Capture&lt;/a&gt;, Group=0, Length=-1], childIndex=0)
&lt;/pre&gt;
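&lt;p&gt;The order of those calls falls out of a plain depth-first walk. The Python sketch below is an illustration rather than the .NET implementation: the tuple-based tree and the walk function are made up, and the node names are just labels copied from the trace above.&lt;/p&gt;

```python
# Hypothetical sketch of the depth-first walk; not the .NET source.
# A node is (kind, children). Leaves have an empty child list.
TREE = (
    "Capture",  # group 0
    [
        (
            "Concatenate",
            [
                ("Multi", []),  # the literal "http://"
                ("Capture", [("Setloop", [])]),  # group 1
                ("Oneloop", []),  # the optional trailing '/'
            ],
        )
    ],
)


def walk(node, trace):
    kind, children = node
    if not children:
        # Leaves get a single call, with child index 0.
        trace.append((kind, 0))
        return
    for index, child in enumerate(children):
        # Interior nodes get a call before and after each child.
        trace.append((kind + " | BeforeChild", index))
        walk(child, trace)
        trace.append((kind + " | AfterChild", index))


trace = []
walk(TREE, trace)
```

&lt;p&gt;Running it yields 13 entries in the same order as the EmitFragment calls listed above, starting and ending with the group 0 capture.&lt;/p&gt;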

&lt;p&gt;The reward for all this work is an integer array that describes the RegexCode “op codes” and their arguments. You can see that some instructions like “&lt;a href=&quot;https://web.archive.org/web/20090401224645id_/http://www.koders.com/csharp/fidF4B2B64D471D5B7401063DE2054CB33F28BDA026.aspx#L43&quot;&gt;Setrep&lt;/a&gt;” take a string argument. These arguments point to offsets in a string table. This is why it was critical to pack everything about a set into the obscure string we saw earlier. It was the only way to pass that information to the instruction.&lt;/p&gt;

&lt;p&gt;Decoding the code array, we see:&lt;/p&gt;

&lt;table cellspacing=&quot;0&quot; cellpadding=&quot;2&quot; border=&quot;1&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td valign=&quot;top&quot; width=&quot;92&quot;&gt;Index&lt;/td&gt;&lt;td valign=&quot;top&quot; width=&quot;138&quot;&gt;Instruction&lt;/td&gt;&lt;td valign=&quot;top&quot; width=&quot;148&quot;&gt;Op Code/Argument&lt;/td&gt;&lt;td valign=&quot;top&quot; width=&quot;158&quot;&gt;String Table Reference&lt;/td&gt;&lt;td valign=&quot;top&quot; width=&quot;174&quot;&gt;Description&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td valign=&quot;top&quot; width=&quot;93&quot;&gt;0&lt;/td&gt;&lt;td valign=&quot;top&quot; width=&quot;138&quot;&gt;&lt;a href=&quot;https://web.archive.org/web/20090401224645id_/http://www.koders.com/csharp/fidF4B2B64D471D5B7401063DE2054CB33F28BDA026.aspx#L73&quot;&gt;Lazybranch&lt;/a&gt;&lt;/td&gt;&lt;td valign=&quot;top&quot; width=&quot;148&quot;&gt;23&lt;/td&gt;&lt;td valign=&quot;top&quot; width=&quot;158&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td valign=&quot;top&quot; width=&quot;174&quot; rowspan=&quot;2&quot;&gt;Lazily branch to the &lt;a href=&quot;https://web.archive.org/web/20090401224645id_/http://www.koders.com/csharp/fidF4B2B64D471D5B7401063DE2054CB33F28BDA026.aspx#L91&quot;&gt;Stop&lt;/a&gt; instruction at offset 21.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td valign=&quot;top&quot; width=&quot;93&quot;&gt;1&lt;/td&gt;&lt;td valign=&quot;top&quot; width=&quot;138&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td valign=&quot;top&quot; width=&quot;148&quot;&gt;21&lt;/td&gt;&lt;td valign=&quot;top&quot; width=&quot;157&quot;&gt;&amp;#160;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td valign=&quot;top&quot; width=&quot;93&quot;&gt;2&lt;/td&gt;&lt;td valign=&quot;top&quot; width=&quot;138&quot;&gt;&lt;a href=&quot;https://web.archive.org/web/20090401224645id_/http://www.koders.com/csharp/fidF4B2B64D471D5B7401063DE2054CB33F28BDA026.aspx#L81&quot;&gt;Setmark&lt;/a&gt;&lt;/td&gt;&lt;td valign=&quot;top&quot; width=&quot;149&quot;&gt;31&lt;/td&gt;&lt;td 
valign=&quot;top&quot; width=&quot;156&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td valign=&quot;top&quot; width=&quot;174&quot;&gt;Push our current state onto a stack in case we need to backtrack later.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td valign=&quot;top&quot; width=&quot;93&quot;&gt;3&lt;/td&gt;&lt;td valign=&quot;top&quot; width=&quot;138&quot;&gt;&lt;a href=&quot;https://web.archive.org/web/20090401224645id_/http://www.koders.com/csharp/fidF4B2B64D471D5B7401063DE2054CB33F28BDA026.aspx#L57&quot;&gt;Multi&lt;/a&gt;&lt;/td&gt;&lt;td valign=&quot;top&quot; width=&quot;149&quot;&gt;12&lt;/td&gt;&lt;td valign=&quot;top&quot; width=&quot;156&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td valign=&quot;top&quot; width=&quot;174&quot; rowspan=&quot;2&quot;&gt;Perform a multi-character match of string table item 0 which is &#39;http://&#39;.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td valign=&quot;top&quot; width=&quot;93&quot;&gt;4&lt;/td&gt;&lt;td valign=&quot;top&quot; width=&quot;139&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td valign=&quot;top&quot; width=&quot;149&quot;&gt;0&lt;/td&gt;&lt;td valign=&quot;top&quot; width=&quot;156&quot;&gt;&quot;http://&quot;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td valign=&quot;top&quot; width=&quot;93&quot;&gt;5&lt;/td&gt;&lt;td valign=&quot;top&quot; width=&quot;138&quot;&gt;&lt;a href=&quot;https://web.archive.org/web/20090401224645id_/http://www.koders.com/csharp/fidF4B2B64D471D5B7401063DE2054CB33F28BDA026.aspx#L81&quot;&gt;Setmark&lt;/a&gt;&lt;/td&gt;&lt;td valign=&quot;top&quot; width=&quot;149&quot;&gt;31&lt;/td&gt;&lt;td valign=&quot;top&quot; width=&quot;156&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td valign=&quot;top&quot; width=&quot;174&quot;&gt;Push our current state onto a stack in case we need to backtrack later.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td valign=&quot;top&quot; width=&quot;94&quot;&gt;6&lt;/td&gt;&lt;td valign=&quot;top&quot; width=&quot;138&quot;&gt;&lt;a 
href=&quot;https://web.archive.org/web/20090401224645id_/http://www.koders.com/csharp/fidF4B2B64D471D5B7401063DE2054CB33F28BDA026.aspx#L43&quot;&gt;Setrep&lt;/a&gt;&lt;/td&gt;&lt;td valign=&quot;top&quot; width=&quot;149&quot;&gt;2&lt;/td&gt;&lt;td valign=&quot;top&quot; width=&quot;155&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td valign=&quot;top&quot; width=&quot;174&quot; rowspan=&quot;3&quot;&gt;Perform a set repetition match of length 1 on the set stored at string table position 1, which represents [^\s/].&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td valign=&quot;top&quot; width=&quot;94&quot;&gt;7&lt;/td&gt;&lt;td valign=&quot;top&quot; width=&quot;138&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td valign=&quot;top&quot; width=&quot;149&quot;&gt;1&lt;/td&gt;&lt;td valign=&quot;top&quot; width=&quot;155&quot;&gt;&amp;quot;\x1\x2\x1\x2F\x30\x64&amp;quot;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td valign=&quot;top&quot; width=&quot;94&quot;&gt;8&lt;/td&gt;&lt;td valign=&quot;top&quot; width=&quot;138&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td valign=&quot;top&quot; width=&quot;149&quot;&gt;1&lt;/td&gt;&lt;td valign=&quot;top&quot; width=&quot;155&quot;&gt;&amp;#160;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td valign=&quot;top&quot; width=&quot;94&quot;&gt;9&lt;/td&gt;&lt;td valign=&quot;top&quot; width=&quot;138&quot;&gt;&lt;a href=&quot;https://web.archive.org/web/20090401224645id_/http://www.koders.com/csharp/fidF4B2B64D471D5B7401063DE2054CB33F28BDA026.aspx#L47&quot;&gt;Setloop&lt;/a&gt;&lt;/td&gt;&lt;td valign=&quot;top&quot; width=&quot;149&quot;&gt;5&lt;/td&gt;&lt;td valign=&quot;top&quot; width=&quot;155&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td valign=&quot;top&quot; width=&quot;174&quot; rowspan=&quot;3&quot;&gt;Match the set [^\s/] in a loop at most Int32.MaxValue times.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td valign=&quot;top&quot; width=&quot;94&quot;&gt;10&lt;/td&gt;&lt;td valign=&quot;top&quot; width=&quot;138&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td valign=&quot;top&quot; 
width=&quot;149&quot;&gt;1&lt;/td&gt;&lt;td valign=&quot;top&quot; width=&quot;155&quot;&gt;&amp;quot;\x1\x2\x1\x2F\x30\x64&amp;quot;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td valign=&quot;top&quot; width=&quot;94&quot;&gt;11&lt;/td&gt;&lt;td valign=&quot;top&quot; width=&quot;138&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td valign=&quot;top&quot; width=&quot;149&quot;&gt;2147483647&lt;/td&gt;&lt;td valign=&quot;top&quot; width=&quot;155&quot;&gt;&amp;#160;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td valign=&quot;top&quot; width=&quot;94&quot;&gt;12&lt;/td&gt;&lt;td valign=&quot;top&quot; width=&quot;138&quot;&gt;&lt;a href=&quot;https://web.archive.org/web/20090401224645id_/http://www.koders.com/csharp/fidF4B2B64D471D5B7401063DE2054CB33F28BDA026.aspx#L82&quot;&gt;Capturemark&lt;/a&gt;&lt;/td&gt;&lt;td valign=&quot;top&quot; width=&quot;149&quot;&gt;32&lt;/td&gt;&lt;td valign=&quot;top&quot; width=&quot;155&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td valign=&quot;top&quot; width=&quot;174&quot; rowspan=&quot;3&quot;&gt;Capture into group #1, the string between the mark set by the last Setmark and the current position.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td valign=&quot;top&quot; width=&quot;94&quot;&gt;13&lt;/td&gt;&lt;td valign=&quot;top&quot; width=&quot;138&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td valign=&quot;top&quot; width=&quot;149&quot;&gt;1&lt;/td&gt;&lt;td valign=&quot;top&quot; width=&quot;155&quot;&gt;&amp;#160;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td valign=&quot;top&quot; width=&quot;94&quot;&gt;14&lt;/td&gt;&lt;td valign=&quot;top&quot; width=&quot;138&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td valign=&quot;top&quot; width=&quot;149&quot;&gt;-1&lt;/td&gt;&lt;td valign=&quot;top&quot; width=&quot;155&quot;&gt;&amp;#160;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td valign=&quot;top&quot; width=&quot;94&quot;&gt;15&lt;/td&gt;&lt;td valign=&quot;top&quot; width=&quot;138&quot;&gt;&lt;a 
href=&quot;https://web.archive.org/web/20090401224645id_/http://www.koders.com/csharp/fidF4B2B64D471D5B7401063DE2054CB33F28BDA026.aspx#L45&quot;&gt;Oneloop&lt;/a&gt;&lt;/td&gt;&lt;td valign=&quot;top&quot; width=&quot;149&quot;&gt;3&lt;/td&gt;&lt;td valign=&quot;top&quot; width=&quot;155&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td valign=&quot;top&quot; width=&quot;174&quot; rowspan=&quot;3&quot;&gt;Match Unicode character 47 (a &#39;/&#39;) in a loop for a maximum of 1 time.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td valign=&quot;top&quot; width=&quot;94&quot;&gt;16&lt;/td&gt;&lt;td valign=&quot;top&quot; width=&quot;138&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td valign=&quot;top&quot; width=&quot;149&quot;&gt;47&lt;/td&gt;&lt;td valign=&quot;top&quot; width=&quot;155&quot;&gt;&amp;#160;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td valign=&quot;top&quot; width=&quot;94&quot;&gt;17&lt;/td&gt;&lt;td valign=&quot;top&quot; width=&quot;138&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td valign=&quot;top&quot; width=&quot;149&quot;&gt;1&lt;/td&gt;&lt;td valign=&quot;top&quot; width=&quot;155&quot;&gt;&amp;#160;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td valign=&quot;top&quot; width=&quot;94&quot;&gt;18&lt;/td&gt;&lt;td valign=&quot;top&quot; width=&quot;138&quot;&gt;&lt;a href=&quot;https://web.archive.org/web/20090401224645id_/http://www.koders.com/csharp/fidF4B2B64D471D5B7401063DE2054CB33F28BDA026.aspx#L82&quot;&gt;Capturemark&lt;/a&gt;&lt;/td&gt;&lt;td valign=&quot;top&quot; width=&quot;149&quot;&gt;32&lt;/td&gt;&lt;td valign=&quot;top&quot; width=&quot;155&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td valign=&quot;top&quot; width=&quot;174&quot; rowspan=&quot;3&quot;&gt;Capture into group #0, the contents between the first Setmark instruction and the current position.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td valign=&quot;top&quot; width=&quot;94&quot;&gt;19&lt;/td&gt;&lt;td valign=&quot;top&quot; width=&quot;138&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td valign=&quot;top&quot; width=&quot;149&quot;&gt;0&lt;/td&gt;&lt;td 
valign=&quot;top&quot; width=&quot;155&quot;&gt;&amp;#160;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td valign=&quot;top&quot; width=&quot;94&quot;&gt;20&lt;/td&gt;&lt;td valign=&quot;top&quot; width=&quot;138&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td valign=&quot;top&quot; width=&quot;149&quot;&gt;-1&lt;/td&gt;&lt;td valign=&quot;top&quot; width=&quot;155&quot;&gt;&amp;#160;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td valign=&quot;top&quot; width=&quot;94&quot;&gt;21&lt;/td&gt;&lt;td valign=&quot;top&quot; width=&quot;139&quot;&gt;&lt;a href=&quot;https://web.archive.org/web/20090401224645id_/http://www.koders.com/csharp/fidF4B2B64D471D5B7401063DE2054CB33F28BDA026.aspx#L91&quot;&gt;Stop&lt;/a&gt;&lt;/td&gt;&lt;td valign=&quot;top&quot; width=&quot;151&quot;&gt;40&lt;/td&gt;&lt;td valign=&quot;top&quot; width=&quot;160&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td valign=&quot;top&quot; width=&quot;174&quot;&gt;Stop the regex.&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;
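&lt;p&gt;If you want to check the decoding by hand, a tiny disassembler is enough. This Python sketch is not part of .NET. The opcode numbers and the number of operands per instruction are simply read off the table above, so treat them as assumptions that only cover the instructions our regex uses.&lt;/p&gt;

```python
# Hypothetical sketch; opcode values and operand counts come from the table
# in this article, not from a complete list of RegexCode instructions.
OPCODES = {
    23: ("Lazybranch", 1),
    31: ("Setmark", 0),
    12: ("Multi", 1),
    2: ("Setrep", 2),
    5: ("Setloop", 2),
    32: ("Capturemark", 2),
    3: ("Oneloop", 2),
    40: ("Stop", 0),
}

# The integer array from the table, index 0 through 21.
CODES = [23, 21, 31, 12, 0, 31, 2, 1, 1, 5, 1, 2147483647,
         32, 1, -1, 3, 47, 1, 32, 0, -1, 40]


def disassemble(codes):
    listing = []
    pos = 0
    while len(codes) > pos:
        name, operand_count = OPCODES[codes[pos]]
        operands = codes[pos + 1 : pos + 1 + operand_count]
        listing.append((pos, name, operands))
        pos += 1 + operand_count  # jump to the next opcode
    return listing


listing = disassemble(CODES)
```

&lt;p&gt;Each entry holds the offset, the instruction name and its operands, so the first one is Lazybranch with the single operand 21 and the last one is Stop at offset 21.&lt;/p&gt;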

&lt;p&gt;We can now see that our regex has turned into a simple “program” that will be executed later.&lt;/p&gt;

&lt;h2 id=&quot;prefix-optimizations&quot;&gt;Prefix Optimizations&lt;/h2&gt;

&lt;p&gt;We could stop here, but we’d miss the fun “optimizations.” With our pattern and search string, the optimizations will actually slow things down, but the code generator is oblivious to that. The basic idea behind prefix optimizations is to quickly jump to where the match &lt;em&gt;might&lt;/em&gt; start. It does this by &lt;a href=&quot;https://web.archive.org/web/20090401224618id_/http://www.koders.com/csharp/fidC6F7EB8E11A6BF080CFA281BEEE003B5FAAB4AD9.aspx#L289&quot;&gt;using&lt;/a&gt; a &lt;a href=&quot;https://web.archive.org/web/20090402044607id_/http://www.koders.com/csharp/fid4F4894401F2873F7C00CC3CF96F851ED3ED10D69.aspx&quot;&gt;RegexFCD&lt;/a&gt; class that I’m guessing stands for “Regex First Character Descriptor.”&lt;/p&gt;

&lt;p&gt;With our regex, the &lt;a href=&quot;https://web.archive.org/web/20090402044607id_/http://www.koders.com/csharp/fid4F4894401F2873F7C00CC3CF96F851ED3ED10D69.aspx#L53&quot;&gt;FirstChars&lt;/a&gt; function &lt;a href=&quot;https://web.archive.org/web/20090402044607id_/http://www.koders.com/csharp/fid4F4894401F2873F7C00CC3CF96F851ED3ED10D69.aspx#L462&quot;&gt;notices our “http://” ‘Multi’ node&lt;/a&gt; and &lt;a href=&quot;https://web.archive.org/web/20090402044607id_/http://www.koders.com/csharp/fid4F4894401F2873F7C00CC3CF96F851ED3ED10D69.aspx#L524&quot;&gt;determines&lt;/a&gt; that any match must start with an ‘h’. If we had alternations, the first character of each alternation would be added to make a limited set of potential first characters. With this optimization alone, we can skip all characters in the text that aren’t in this approved “white list” of first characters without having to execute any of the above RegexCode.&lt;/p&gt;

&lt;p&gt;But wait… there’s an even trickier optimization! The optimizer &lt;a href=&quot;https://web.archive.org/web/20090401224618id_/http://www.koders.com/csharp/fidC6F7EB8E11A6BF080CFA281BEEE003B5FAAB4AD9.aspx#L295&quot;&gt;discovers&lt;/a&gt; that the first thing the regex must match is a simple string literal: &lt;a href=&quot;https://web.archive.org/web/20090402044607id_/http://www.koders.com/csharp/fid4F4894401F2873F7C00CC3CF96F851ED3ED10D69.aspx#L106&quot;&gt;a ‘Multi’ node&lt;/a&gt;. This means that we can use the &lt;a href=&quot;https://web.archive.org/web/20090402044617id_/http://www.koders.com/csharp/fidA16EF1E737BCF735FD1DE4D39E0E1AD9851FC2A7.aspx&quot;&gt;RegexBoyerMoore&lt;/a&gt; class which applies the &lt;a href=&quot;http://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string_search_algorithm&quot;&gt;Boyer-Moore&lt;/a&gt; search algorithm.&lt;/p&gt;

&lt;p&gt;The key insight is that we don’t have to check each character of the text. We only need to look at the character of the text that lines up with the last character of the pattern to see if it’s even worth checking the rest.&lt;/p&gt;

&lt;p&gt;For example, if our sample text is “Welcome to http://www.moserware.com/!” and we’re searching for “http://” which is 7 characters, we first look at the 7th character of the text which is ‘e’. Since ‘e’ is not the 7th character of what we’re looking for (which is a ‘/’), we know that there couldn’t possibly be a match ending here. We also don’t need to bother checking the previous 6 characters because there isn’t even an ‘e’ in what we’re looking for. The tricky part is what to do if what we find &lt;em&gt;is&lt;/em&gt; in the string that we’re trying to match, but it isn’t the last ‘/’ character.&lt;/p&gt;

&lt;p&gt;The specifics are handled in a &lt;a href=&quot;https://web.archive.org/web/20090402044617id_/http://www.koders.com/csharp/fidA16EF1E737BCF735FD1DE4D39E0E1AD9851FC2A7.aspx#L91&quot;&gt;straightforward&lt;/a&gt; &lt;a href=&quot;https://web.archive.org/web/20090402044617id_/http://www.koders.com/csharp/fidA16EF1E737BCF735FD1DE4D39E0E1AD9851FC2A7.aspx#L166&quot;&gt;way&lt;/a&gt; with some minor optimizations to reduce memory needs given 65,000+ possible Unicode characters. For each character, the maximum possible skip is calculated.&lt;/p&gt;

&lt;p&gt;For “http://”, we come up with this skip table:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Character&lt;/th&gt;
      &lt;th&gt;Characters to skip ahead&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;/&lt;/td&gt;
      &lt;td&gt;0&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;:&lt;/td&gt;
      &lt;td&gt;2&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;h&lt;/td&gt;
      &lt;td&gt;6&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;p&lt;/td&gt;
      &lt;td&gt;3&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;t&lt;/td&gt;
      &lt;td&gt;4&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;all others&lt;/td&gt;
      &lt;td&gt;7&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;This table tells us that if we find an ‘e’ then we can skip ahead 7 characters without even checking the previous 6 characters. If we find a ‘p’, then we can skip ahead at least 3 characters before performing a full check, and if we find a ‘/’ then we could be on the last character and need to check the other characters (i.e. skip ahead 0).&lt;/p&gt;
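&lt;p&gt;The skip table above can be rebuilt in a few lines. This Python sketch is an illustration of the table only, not a port of RegexBoyerMoore, and the function names are made up. The skip for a character is the distance from its rightmost occurrence in the pattern to the end of the pattern, and anything not in the pattern gets the full pattern length.&lt;/p&gt;

```python
# Hypothetical sketch that reproduces the skip table shown above.
def build_skip_table(pattern):
    last = len(pattern) - 1
    # Later occurrences overwrite earlier ones, so every character ends up
    # with the distance from its rightmost occurrence to the pattern's end.
    return {ch: last - index for index, ch in enumerate(pattern)}


def skip_for(table, pattern, ch):
    # Characters that never appear in the pattern allow a full-length skip.
    return table.get(ch, len(pattern))


PATTERN = "http://"
table = build_skip_table(PATTERN)
```

&lt;p&gt;For “http://” this gives 0 for ‘/’, 2 for ‘:’, 3 for ‘p’, 4 for ‘t’, 6 for ‘h’ and 7 for everything else.&lt;/p&gt;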

&lt;p&gt;There is one more optimization that &lt;a href=&quot;https://web.archive.org/web/20090402044607id_/http://www.koders.com/csharp/fid4F4894401F2873F7C00CC3CF96F851ED3ED10D69.aspx#L133&quot;&gt;looks for anchors&lt;/a&gt;, but none apply to our regex, so it’s ignored.&lt;/p&gt;

&lt;p&gt;We’re done! We made it to the end of the &lt;a href=&quot;https://web.archive.org/web/20090401224618id_/http://www.koders.com/csharp/fidC6F7EB8E11A6BF080CFA281BEEE003B5FAAB4AD9.aspx#L302&quot;&gt;RegexWriter phase&lt;/a&gt;. The “&lt;a href=&quot;https://web.archive.org/web/20090401224645id_/http://www.koders.com/csharp/fidF4B2B64D471D5B7401063DE2054CB33F28BDA026.aspx&quot;&gt;RegexCode&lt;/a&gt;” internal representation consists of these critical parts:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;The regex code we created.&lt;/li&gt;
  &lt;li&gt;The string table derived from the regex that the code uses (e.g. our “Multi” and “Setrep” instructions have string table references).&lt;/li&gt;
  &lt;li&gt;The maximum size of our backtracking stack. (Ours is 7; this will make more sense later.)&lt;/li&gt;
  &lt;li&gt;A mapping of named captures to their group numbers. (We don’t have any in our regex, so this is empty.)&lt;/li&gt;
  &lt;li&gt;The total number of captures. (We have 2.)&lt;/li&gt;
  &lt;li&gt;The RegexBoyerMoore prefix that we calculated. (This applies to us since we have a string literal at the start.)&lt;/li&gt;
  &lt;li&gt;The possible first characters in our prefix. (In our case, we calculated this to be an ‘h’.)&lt;/li&gt;
  &lt;li&gt;Our anchors. (We don’t have any.)&lt;/li&gt;
  &lt;li&gt;An indicator whether this should be a RightToLeft match. (In our case, we use the default which is false.)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Every regex passes through this step. It applies to our measly regex with a code size of 21 as much as it does to a gnarly &lt;a href=&quot;http://tools.ietf.org/html/rfc2822#section-3.4.1&quot;&gt;RFC2822&lt;/a&gt; &lt;a href=&quot;http://www.regular-expressions.info/email.html&quot;&gt;compliant regex&lt;/a&gt; that has 175. These nine items completely describe &lt;em&gt;everything&lt;/em&gt; that we’ll do with our regex and they never change.&lt;/p&gt;

&lt;h2 id=&quot;in-need-of-an-interpreter&quot;&gt;In need of an interpreter&lt;/h2&gt;

&lt;p&gt;Now that we have the RegexCode, the &lt;a href=&quot;http://msdn.microsoft.com/en-us/library/twcw2f1c.aspx&quot;&gt;Match&lt;/a&gt; method will &lt;a href=&quot;https://web.archive.org/web/20090402061020id_/http://www.koders.com/csharp/fid7F5AE3CBB76E9E51E24DEA0DB54B86C173369E88.aspx#L919&quot;&gt;run&lt;/a&gt; and create a &lt;a href=&quot;http://www.koders.com/csharp/fidABFA3D15F7A596443DCE29D6AE984F1192048031.aspx&quot;&gt;RegexRunner&lt;/a&gt; which is the “driver” for the regex matching process. Since we didn’t specify the “Compiled” flag, we’ll use the &lt;a href=&quot;http://www.koders.com/csharp/fidE76CE858561A50AF7A1D9030DC8F2F4D6DEF839D.aspx&quot;&gt;RegexInterpreter&lt;/a&gt; runner.&lt;/p&gt;

&lt;p&gt;Before the interpreter starts &lt;a href=&quot;http://www.koders.com/csharp/fidABFA3D15F7A596443DCE29D6AE984F1192048031.aspx#L81&quot;&gt;scanning&lt;/a&gt;, it &lt;a href=&quot;http://www.koders.com/csharp/fidE76CE858561A50AF7A1D9030DC8F2F4D6DEF839D.aspx#L364&quot;&gt;notices&lt;/a&gt; that we have a valid Boyer-Moore prefix optimization and it &lt;a href=&quot;https://web.archive.org/web/20090402044617id_/http://www.koders.com/csharp/fidA16EF1E737BCF735FD1DE4D39E0E1AD9851FC2A7.aspx#L269&quot;&gt;uses it&lt;/a&gt; to quickly locate the start of the regex:&lt;/p&gt;

&lt;table cellspacing=&quot;0&quot; cellpadding=&quot;2&quot; border=&quot;1&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td valign=&quot;top&quot;&gt;Index&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;0&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;1&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;2&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;3&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;4&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;5&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;6&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;7&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;8&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;9&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;10&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;11&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;12&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;13&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;14&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;15&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;16&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;17&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;18&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;19&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;20&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;21&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;22&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;23&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;24&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;25&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;26&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;27&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;28&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;29&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;30&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;31&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;32&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;33&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;34&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;35&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;36&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td valign=&quot;top&quot;&gt;Character&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;W&lt;/td&gt;&lt;td 
valign=&quot;top&quot;&gt;e&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;l&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;c&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;o&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;m&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;e&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;t&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;o&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;h&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;t&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;t&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;p&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;:&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;/&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;/&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;w&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;w&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;w&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;.&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;m&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;o&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;s&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;e&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;r&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;w&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;a&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;r&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;e&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;.&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;c&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;o&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;m&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;/&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;!&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td valign=&quot;top&quot;&gt;Scan Order&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td 
valign=&quot;top&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;1&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;9&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;8&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;2 &amp;amp; 7&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;6&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;5&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;4&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;3&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td valign=&quot;top&quot;&gt;&amp;#160;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;

&lt;p&gt;It first looks at the 7th character and finds an ‘e’ instead of the ‘/’ that it wanted. The skip table tells it that ‘e’ isn’t in any possible match, so it jumps ahead 7 more characters where it finds a ‘t’. The skip table &lt;a href=&quot;https://web.archive.org/web/20090402044617id_/http://www.koders.com/csharp/fidA16EF1E737BCF735FD1DE4D39E0E1AD9851FC2A7.aspx#L318&quot;&gt;tells it&lt;/a&gt; to jump ahead 4 more characters where it &lt;em&gt;finally&lt;/em&gt; finds the ‘/’ it wanted. It then &lt;a href=&quot;https://web.archive.org/web/20090402044617id_/http://www.koders.com/csharp/fidA16EF1E737BCF735FD1DE4D39E0E1AD9851FC2A7.aspx#L326&quot;&gt;verifies&lt;/a&gt; that this is the last character of our “http://” prefix. With a valid prefix found, we &lt;a href=&quot;https://web.archive.org/web/20090402044622id_/http://www.koders.com/csharp/fidABFA3D15F7A596443DCE29D6AE984F1192048031.aspx#L187&quot;&gt;prepare for a match&lt;/a&gt; in case we’re lucky and the rest of the regex matches.&lt;/p&gt;
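
&lt;p&gt;To make the skipping concrete, here is a minimal sketch of a bad-character skip scan in the Boyer-Moore-Horspool style. It is a simplification for illustration only, not the engine’s actual implementation, and the names PrefixScanSketch and IndexOfPrefix are made up for this example. On our search string it performs the same two skips (7, then 4) described above before verifying the prefix from right to left:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c#&quot; data-lang=&quot;c#&quot;&gt;using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of System.Text.RegularExpressions.
static class PrefixScanSketch
{
    // Returns the index where prefix starts in text, or -1 if it is absent.
    static int IndexOfPrefix(string text, string prefix)
    {
        int m = prefix.Length;

        // Skip table: for every prefix character except the last one, how far
        // the window may slide when that character sits under the window&#39;s end.
        var skip = new Dictionary&amp;lt;char, int&amp;gt;();
        for (int i = 0; i &amp;lt; m - 1; i++)
        {
            skip[prefix[i]] = m - 1 - i;
        }

        int end = m - 1; // index in text aligned with the prefix&#39;s last character
        while (end &amp;lt; text.Length)
        {
            // Compare right to left, starting at the end of the window.
            int t = end;
            int p = m - 1;
            while (p &amp;gt;= 0 &amp;amp;&amp;amp; text[t] == prefix[p])
            {
                t--;
                p--;
            }
            if (p &amp;lt; 0)
            {
                return t + 1; // the whole prefix matched
            }

            int shift;
            if (!skip.TryGetValue(text[end], out shift))
            {
                shift = m; // this character appears nowhere in the prefix
            }
            Console.WriteLine(&quot;&#39;{0}&#39; at {1}: skip {2}&quot;, text[end], end, shift);
            end += shift;
        }
        return -1;
    }

    static void Main()
    {
        int start = IndexOfPrefix(&quot;Welcome to http://www.moserware.com/!&quot;, &quot;http://&quot;);
        Console.WriteLine(&quot;prefix starts at {0}&quot;, start);
    }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Running the sketch reports a skip of 7 at the ‘e’ (index 6), a skip of 4 at the ‘t’ (index 13), and then a prefix starting at offset 11.&lt;/p&gt;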

&lt;p&gt;The bulk of the interpreter is in its “&lt;a href=&quot;https://web.archive.org/web/20090401214129id_/http://www.koders.com/csharp/fidE76CE858561A50AF7A1D9030DC8F2F4D6DEF839D.aspx#L403&quot;&gt;Go&lt;/a&gt;” method, which is a 700-line switch statement that interprets the RegexCode we created earlier. The only interesting part is that the interpreter maintains two stacks to save its state in case it needs to backtrack and abandon a path it took. The “run &lt;strong&gt;s&lt;/strong&gt;tack” records where in the search string an operation begins while the “run &lt;strong&gt;t&lt;/strong&gt;rack” records the RegexCode instruction that could potentially backtrack. Any time there is a chance that the interpreter could go down a wrong path, it pushes its state onto these stacks so that it can potentially try something else later.&lt;/p&gt;

&lt;p&gt;On our string, the following instructions execute:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;&lt;a href=&quot;https://web.archive.org/web/20090401214129id_/http://www.koders.com/csharp/fidE76CE858561A50AF7A1D9030DC8F2F4D6DEF839D.aspx#L430&quot;&gt;Lazybranch&lt;/a&gt; - This is a branch that is “lazy.” It will only occur if we fail and have to backtrack to this instruction. In case there are problems, we push 11 (the string offset to the start of “http://”) onto the “run &lt;strong&gt;s&lt;/strong&gt;tack” and 0 (the RegexCode offset for this instruction) onto the “run &lt;strong&gt;t&lt;/strong&gt;rack.” The branch is to code offset 21 which is the “Stop” instruction.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://web.archive.org/web/20090401214129id_/http://www.koders.com/csharp/fidE76CE858561A50AF7A1D9030DC8F2F4D6DEF839D.aspx#L441&quot;&gt;Setmark&lt;/a&gt; - We save our position in case we have to backtrack.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://web.archive.org/web/20090401214129id_/http://www.koders.com/csharp/fidE76CE858561A50AF7A1D9030DC8F2F4D6DEF839D.aspx#L824&quot;&gt;Multi&lt;/a&gt; - A multi-character match. The string to match is at offset 0 in the string table (which is “http://”).&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://web.archive.org/web/20090401214129id_/http://www.koders.com/csharp/fidE76CE858561A50AF7A1D9030DC8F2F4D6DEF839D.aspx#L441&quot;&gt;Setmark&lt;/a&gt; - Another position save in case of a backtrack. Since the Multi code succeeded, we push our “run &lt;strong&gt;s&lt;/strong&gt;tack” offset of 18 (the start of “www.”) and our “run &lt;strong&gt;t&lt;/strong&gt;rack” code position of 5.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://web.archive.org/web/20090401214129id_/http://www.koders.com/csharp/fidE76CE858561A50AF7A1D9030DC8F2F4D6DEF839D.aspx#L883&quot;&gt;Setrep&lt;/a&gt; - Loads the “\x1\x2\x1\x2F\x30\x64” set representation at offset 1 in the string table that we calculated earlier. It reads an operand from the execution stack telling it to verify that the set &lt;strong&gt;rep&lt;/strong&gt;eats exactly once. It then calls &lt;a href=&quot;https://web.archive.org/web/20090402012958id_/http://www.koders.com/csharp/fid14AB8BA02EE8A6DBA830F1DCC147C2B17F0B3DE0.aspx#L815&quot;&gt;CharInClassRecursive&lt;/a&gt;, which does the following:
    &lt;ol&gt;
      &lt;li&gt;It sees that the first character, ‘w’, is not in the character range [‘/’, ‘0’). This check corresponds to the ‘/’ in the “[^\s/]” part of the regex.&lt;/li&gt;
      &lt;li&gt;It next tries &lt;a href=&quot;https://web.archive.org/web/20090402012958id_/http://www.koders.com/csharp/fid14AB8BA02EE8A6DBA830F1DCC147C2B17F0B3DE0.aspx#L874&quot;&gt;CharInCategory&lt;/a&gt; which notes that ‘w’ is part of the “LowercaseLetter” &lt;a href=&quot;http://msdn.microsoft.com/en-us/library/system.globalization.unicodecategory.aspx&quot;&gt;UnicodeCategory&lt;/a&gt;. The magic number 0x64 in our set tells us to &lt;a href=&quot;https://web.archive.org/web/20090402012958id_/http://www.koders.com/csharp/fid14AB8BA02EE8A6DBA830F1DCC147C2B17F0B3DE0.aspx#L890&quot;&gt;do&lt;/a&gt; a &lt;a href=&quot;http://msdn.microsoft.com/en-us/library/t809ektx.aspx&quot;&gt;Char.IsWhiteSpace&lt;/a&gt; check on it. This too fails.&lt;/li&gt;
      &lt;li&gt;Although both checks fail, the interpreter sees that it needs to &lt;a href=&quot;https://web.archive.org/web/20090402012958id_/http://www.koders.com/csharp/fid14AB8BA02EE8A6DBA830F1DCC147C2B17F0B3DE0.aspx#L828&quot;&gt;flip the result&lt;/a&gt; since it is a negated (^) set. This makes the character class match succeed.&lt;/li&gt;
    &lt;/ol&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://web.archive.org/web/20090401214129id_/http://www.koders.com/csharp/fidE76CE858561A50AF7A1D9030DC8F2F4D6DEF839D.aspx#L948&quot;&gt;Setloop&lt;/a&gt; - A “loop” instruction is like a “rep” one except that it isn’t forced to match anything. In our case, we see that we loop for a maximum of Int32.MaxValue times on the same set we saw in “Setrep.” Here you can see that the code generation phase turned the “+” in “[^\s/]+” of the regex into a Setrep of 1 followed by a Setloop. This is equivalent to “[^\s/][^\s/]*”. The loop keeps chomping characters until it finds the ‘/’ which causes it to call &lt;a href=&quot;https://web.archive.org/web/20090401214129id_/http://www.koders.com/csharp/fidE76CE858561A50AF7A1D9030DC8F2F4D6DEF839D.aspx#L321&quot;&gt;BackwardNext()&lt;/a&gt; which sets the current position to just before the final ‘/’.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://web.archive.org/web/20090401214129id_/http://www.koders.com/csharp/fidE76CE858561A50AF7A1D9030DC8F2F4D6DEF839D.aspx#L470&quot;&gt;CaptureMark&lt;/a&gt; - Here we start capturing group 1 by popping the “run &lt;strong&gt;s&lt;/strong&gt;tack” which gives us 18. Our current offset is 35. We &lt;a href=&quot;https://web.archive.org/web/20090402044622id_/http://www.koders.com/csharp/fidABFA3D15F7A596443DCE29D6AE984F1192048031.aspx#L354&quot;&gt;capture&lt;/a&gt; the string between these two positions, “www.moserware.com”, and &lt;a href=&quot;https://web.archive.org/web/20090402044622id_/http://www.koders.com/csharp/fidABFA3D15F7A596443DCE29D6AE984F1192048031.aspx#L330&quot;&gt;keep it&lt;/a&gt; for later use in case the entire regex succeeds.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://web.archive.org/web/20090401214129id_/http://www.koders.com/csharp/fidE76CE858561A50AF7A1D9030DC8F2F4D6DEF839D.aspx#L900&quot;&gt;Oneloop&lt;/a&gt; - Here we do a loop at most one time that will check for the ‘/’ character. It succeeds.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://web.archive.org/web/20090401214129id_/http://www.koders.com/csharp/fidE76CE858561A50AF7A1D9030DC8F2F4D6DEF839D.aspx#L470&quot;&gt;CaptureMark&lt;/a&gt; - We capture into group 0 the value between the offset on the “run &lt;strong&gt;s&lt;/strong&gt;tack”, which is 11 (the start of “http://”), and the last character of the string at offset 36. The string between these offsets is “http://www.moserware.com/”.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://web.archive.org/web/20090401214129id_/http://www.koders.com/csharp/fidE76CE858561A50AF7A1D9030DC8F2F4D6DEF839D.aspx#L414&quot;&gt;Stop&lt;/a&gt; - We’re done executing RegexCode and can stop the interpreter.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Since we stopped with successful captures, the Match is declared &lt;a href=&quot;http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.group.success.aspx&quot;&gt;a success&lt;/a&gt;. Sure enough, if we look at our console window, we see:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;Full uri = &#39;http://www.moserware.com/&#39; 
Host =&#39;www.moserware.com&#39;
&lt;/code&gt;&lt;/pre&gt;

&lt;h2 id=&quot;backtracking-down-unhappy-paths&quot;&gt;Backtracking Down Unhappy Paths&lt;/h2&gt;

&lt;p&gt;I can hear the cursing shouts of ^#!@.*#!$ from the regex mob coming towards me. They’re miffed that I used a toy regular expression with a pathetically easy search text that didn’t do anything “interesting.”&lt;/p&gt;

&lt;p&gt;The mob really shouldn’t be that worried. We already have all the essential tools we need to understand how things work.&lt;/p&gt;

&lt;p&gt;One common issue that you have to deal with in a “real” regular expression is backtracking.&lt;/p&gt;

&lt;p&gt;Let’s say you have a search text and pattern like this:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c#&quot; data-lang=&quot;c#&quot;&gt;&lt;span class=&quot;kt&quot;&gt;string&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;text&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;This text has 1 digit in it&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; 
&lt;span class=&quot;kt&quot;&gt;string&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pattern&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;@&amp;quot;.*\d&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Regex&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Match&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;text&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pattern&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;You’d recognize the parse tree:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/how-net-regular-expressions-really-work/RegexParseTreeDotStarDigit.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The only new thing about it is that the ‘.’ pattern was translated into a “Notone” node that matches anything except one particular character (in our case, a line feed). We see that the set follows the same obscure but compact representation as before. The one new detail there is that ‘\x09’ is the magic number that represents all Unicode digits (which the &lt;a href=&quot;http://www.moserware.com/2008/02/does-your-code-pass-turkey-test.html&quot;&gt;Turkey Test&lt;/a&gt; showed is more than just [0-9]).&lt;/p&gt;

&lt;p&gt;It’s painful to watch the regex interpreter work so hard for this match. The “.*” puts it &lt;a href=&quot;https://web.archive.org/web/20090401214129id_/http://www.koders.com/csharp/fidE76CE858561A50AF7A1D9030DC8F2F4D6DEF839D.aspx#L924&quot;&gt;in a Notoneloop&lt;/a&gt; that goes right to the end of the string since it doesn’t find a line feed (‘\n’). It then looks for &lt;a href=&quot;https://web.archive.org/web/20090401214129id_/http://www.koders.com/csharp/fidE76CE858561A50AF7A1D9030DC8F2F4D6DEF839D.aspx#L817&quot;&gt;the Set&lt;/a&gt; that represents “\d” and it fails. It has no choice but to backtrack by executing the “&lt;a href=&quot;https://web.archive.org/web/20090401214129id_/http://www.koders.com/csharp/fidE76CE858561A50AF7A1D9030DC8F2F4D6DEF839D.aspx#L973&quot;&gt;RegexCode.Notoneloop | RegexCode.Back&lt;/a&gt;” composite instruction which backtracks one character by resetting the “run &lt;strong&gt;t&lt;/strong&gt;rack” to be the Set instruction again, but this time it will start one character earlier.&lt;/p&gt;

&lt;p&gt;Even in our insanely simple search string, the interpreter has to backtrack by executing “&lt;a href=&quot;https://web.archive.org/web/20090401214129id_/http://www.koders.com/csharp/fidE76CE858561A50AF7A1D9030DC8F2F4D6DEF839D.aspx#L973&quot;&gt;RegexCode.Notoneloop | RegexCode.Back&lt;/a&gt;” and retesting &lt;a href=&quot;https://web.archive.org/web/20090401214129id_/http://www.koders.com/csharp/fidE76CE858561A50AF7A1D9030DC8F2F4D6DEF839D.aspx#L817&quot;&gt;the Set&lt;/a&gt; a total of &lt;em&gt;thirteen times&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;An almost identical process would occur if we had used a lazy regular expression like “.*?\d”. The difference is that it executes a “&lt;a href=&quot;https://web.archive.org/web/20090401214129id_/http://www.koders.com/csharp/fidE76CE858561A50AF7A1D9030DC8F2F4D6DEF839D.aspx#L1004&quot;&gt;Notonelazy&lt;/a&gt;” instruction and then gets caught up in a “&lt;a href=&quot;https://web.archive.org/web/20090401214129id_/http://www.koders.com/csharp/fidE76CE858561A50AF7A1D9030DC8F2F4D6DEF839D.aspx#L1050&quot;&gt;RegexCode.Notonelazy | RegexCode.Back&lt;/a&gt;” backtrack and &lt;a href=&quot;https://web.archive.org/web/20090401214129id_/http://www.koders.com/csharp/fidE76CE858561A50AF7A1D9030DC8F2F4D6DEF839D.aspx#L817&quot;&gt;Set match&lt;/a&gt; attempt that happens &lt;em&gt;fourteen times&lt;/em&gt;. Each iteration of the loop causes the “&lt;a href=&quot;https://web.archive.org/web/20090401214129id_/http://www.koders.com/csharp/fidE76CE858561A50AF7A1D9030DC8F2F4D6DEF839D.aspx#L1004&quot;&gt;Notonelazy&lt;/a&gt;” instruction to add one more character instead of removing one as the “&lt;a href=&quot;https://web.archive.org/web/20090401214129id_/http://www.koders.com/csharp/fidE76CE858561A50AF7A1D9030DC8F2F4D6DEF839D.aspx#L924&quot;&gt;Notoneloop&lt;/a&gt;” instruction had to. This is typical:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;In situations where the decision is between “make an attempt” and “skip an attempt,” as with items governed by quantifiers, the engine always chooses to first &lt;em&gt;make&lt;/em&gt; the attempt for &lt;em&gt;greedy&lt;/em&gt; quantifiers, and to first &lt;em&gt;skip&lt;/em&gt; the attempt for &lt;em&gt;lazy&lt;/em&gt; (non-greedy) ones. &lt;em&gt;&lt;a href=&quot;http://www.amazon.com/gp/product/0596528124?ie=UTF8&amp;amp;tag=moserware-20&amp;amp;linkCode=as2&amp;amp;camp=1789&amp;amp;creative=390957&amp;amp;creativeASIN=0596528124&quot;&gt;Mastering Regular Expressions&lt;/a&gt;&lt;/em&gt;, p.159&lt;/p&gt;
&lt;/blockquote&gt;
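
&lt;p&gt;The thirteen and fourteen counts can be checked by hand with a tiny simulation. This is only a sketch of the greedy “give one back” and lazy “take one more” behavior for this particular text and pattern; it does not use or model the real RegexInterpreter, and the class name is made up:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c#&quot; data-lang=&quot;c#&quot;&gt;using System;

// Hypothetical illustration; assumes the text contains at least one digit.
static class BacktrackCountSketch
{
    static void Main()
    {
        string text = &quot;This text has 1 digit in it&quot;;

        // Greedy .*\d : the .* first consumes the whole line, then gives back
        // one character at a time until \d can match at the current position.
        int pos = text.Length;
        int greedyBacktracks = 0;
        while (pos == text.Length || !char.IsDigit(text[pos]))
        {
            pos--;
            greedyBacktracks++;
        }

        // Lazy .*?\d : the .*? first consumes nothing, then takes one more
        // character each time \d fails at the current position.
        pos = 0;
        int lazyExtensions = 0;
        while (!char.IsDigit(text[pos]))
        {
            pos++;
            lazyExtensions++;
        }

        Console.WriteLine(&quot;greedy backtracks: {0}&quot;, greedyBacktracks); // 13
        Console.WriteLine(&quot;lazy extensions: {0}&quot;, lazyExtensions);     // 14
    }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The digit sits at offset 14 of a 27 character string, so the greedy version has to hand back the 13 characters after it, while the lazy version has to step over the 14 characters before it.&lt;/p&gt;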

&lt;p&gt;If we had a little more empathy for the regex interpreter, we would have written “[^\d]*\d” and avoided all the backtracking, but it wouldn’t have shown this common error.&lt;/p&gt;

&lt;p&gt;Alternations such as “hello|world” are handled with backtracking. Before each alternative is attempted, the current position is saved on the “run &lt;strong&gt;t&lt;/strong&gt;rack” and “run &lt;strong&gt;s&lt;/strong&gt;tack.” If the alternative fails, the regex engine resets the position to what it was before the alternative was tried and the next alternative is attempted.&lt;/p&gt;

&lt;p&gt;Now, we can even understand how more advanced concepts like &lt;a href=&quot;http://www.regular-expressions.info/atomic.html&quot;&gt;atomic grouping&lt;/a&gt; work. If we use a regex like:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;\w+:
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;to match the names of email headers as in:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;Subject: Hello World!
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Things will work well. The problem will come when we try to match against&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;Subject
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;We already know that there is going to be backtracking since “\w+” will match the whole string and then the interpreter will give characters back one at a time as it desperately tries to match a ‘:’. If we used atomic grouping, as in:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;(?&amp;gt;\w+):
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;We would see that the generated RegexCode has two extra instructions of &lt;a href=&quot;https://web.archive.org/web/20090401214129id_/http://www.koders.com/csharp/fidE76CE858561A50AF7A1D9030DC8F2F4D6DEF839D.aspx#L701&quot;&gt;Setjump&lt;/a&gt; and &lt;a href=&quot;https://web.archive.org/web/20090401214129id_/http://www.koders.com/csharp/fidE76CE858561A50AF7A1D9030DC8F2F4D6DEF839D.aspx#L723&quot;&gt;Forejump&lt;/a&gt; in it. These instructions tell the interpreter to do unconditional jumps after matching the “\w+”. As &lt;a href=&quot;https://web.archive.org/web/20090401214129id_/http://www.koders.com/csharp/fidE76CE858561A50AF7A1D9030DC8F2F4D6DEF839D.aspx#L723&quot;&gt;the comment&lt;/a&gt; for “Forejump” indicates, these unconditional jumps will “zap backtracking state” and be much more efficient for a failed match since backtracking won’t occur.&lt;/p&gt;
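
&lt;p&gt;A quick way to convince yourself that the atomic group changes only the amount of work, not the answer, is to run both patterns over both inputs. This sketch uses only the public Regex.Match API; the class name is made up:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c#&quot; data-lang=&quot;c#&quot;&gt;using System;
using System.Text.RegularExpressions;

static class AtomicGroupSketch
{
    static void Main()
    {
        string[] inputs = { &quot;Subject: Hello World!&quot;, &quot;Subject&quot; };
        string[] patterns = { @&quot;\w+:&quot;, @&quot;(?&amp;gt;\w+):&quot; };

        foreach (string input in inputs)
        {
            foreach (string pattern in patterns)
            {
                // Same result either way; only the backtracking differs.
                Match m = Regex.Match(input, pattern);
                Console.WriteLine(&quot;{0} on [{1}]: Success={2}, Value=[{3}]&quot;,
                    pattern, input, m.Success, m.Value);
            }
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Both patterns capture “Subject:” from the first input and both fail on the second. The atomic version simply gives up sooner on the failing input because it has thrown away the positions it could have backtracked to.&lt;/p&gt;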

&lt;h2 id=&quot;loose-ends&quot;&gt;Loose Ends&lt;/h2&gt;

&lt;p&gt;There are some minor details left. The first time you use any regex, a &lt;a href=&quot;https://web.archive.org/web/20090402012958id_/http://www.koders.com/csharp/fid14AB8BA02EE8A6DBA830F1DCC147C2B17F0B3DE0.aspx#L102&quot;&gt;lot&lt;/a&gt; &lt;a href=&quot;https://web.archive.org/web/20090402012958id_/http://www.koders.com/csharp/fid14AB8BA02EE8A6DBA830F1DCC147C2B17F0B3DE0.aspx#L260&quot;&gt;of&lt;/a&gt; &lt;a href=&quot;https://web.archive.org/web/20090402012958id_/http://www.koders.com/csharp/fid14AB8BA02EE8A6DBA830F1DCC147C2B17F0B3DE0.aspx#L358&quot;&gt;work&lt;/a&gt; goes on initializing all the &lt;a href=&quot;http://msdn.microsoft.com/en-us/library/20bw873z.aspx&quot;&gt;character classes&lt;/a&gt; that are stored as static variables. If you just timed a single Regex, your numbers would be highly skewed by this process.&lt;/p&gt;

&lt;p&gt;Another common issue is whether you should use the RegexOptions.Compiled flag. Compiling is handled by the &lt;a href=&quot;https://web.archive.org/web/20090429030946id_/http://www.koders.com/csharp/fid7CC2751EC539A3CCF3A96A3D82E38D6E6D7B79F3.aspx#L44&quot;&gt;RegexCompiler&lt;/a&gt; class. The interesting aspects of the IL code generation are handled exactly like the interpreter, as indicated by &lt;a href=&quot;https://web.archive.org/web/20090429030946id_/http://www.koders.com/csharp/fid7CC2751EC539A3CCF3A96A3D82E38D6E6D7B79F3.aspx#L1554&quot;&gt;this comment&lt;/a&gt;:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c#&quot; data-lang=&quot;c#&quot;&gt;&lt;span class=&quot;cm&quot;&gt;/* &lt;/span&gt;
&lt;span class=&quot;cm&quot;&gt; * The main translation function. It translates the logic for a single opcode at &lt;/span&gt;
&lt;span class=&quot;cm&quot;&gt; * the current position. The structure of this function exactly mirrors &lt;/span&gt;
&lt;span class=&quot;cm&quot;&gt; * the structure of the inner loop of RegexInterpreter.Go(). &lt;/span&gt;
&lt;span class=&quot;cm&quot;&gt; * &lt;/span&gt;
&lt;span class=&quot;cm&quot;&gt; * The C# code from RegexInterpreter.Go() that corresponds to each case is &lt;/span&gt;
&lt;span class=&quot;cm&quot;&gt; * included as a comment. &lt;/span&gt;
&lt;span class=&quot;cm&quot;&gt; * &lt;/span&gt;
&lt;span class=&quot;cm&quot;&gt; * Note that since we&amp;#39;re generating code, we can collapse many cases that are &lt;/span&gt;
&lt;span class=&quot;cm&quot;&gt; * dealt with one-at-a-time in RegexIntepreter. We can also unroll loops that &lt;/span&gt;
&lt;span class=&quot;cm&quot;&gt; * iterate over constant strings or sets. &lt;/span&gt;
&lt;span class=&quot;cm&quot;&gt; */&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;We can see that there &lt;em&gt;is&lt;/em&gt; some optimization in the generated code. The downside is that we have to generate all of the code regardless of whether we end up using it. The interpreter only uses what it needs. Additionally, unless we use &lt;a href=&quot;http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.compiletoassembly.aspx&quot;&gt;Regex.CompileToAssembly&lt;/a&gt; to save the compiled code to a DLL, we’ll end up doing the entire process of creating the parse tree, RegexCode, and code generation at runtime.&lt;/p&gt;

&lt;p&gt;Thus, for most cases, it seems that RegexOptions.Compiled isn’t worth the effort. But it’s good to keep in mind that there are exceptions when performance is critical and your regex can benefit from it (otherwise, why have the option at all?).&lt;/p&gt;
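
&lt;p&gt;If you want to measure this for your own pattern, a rough harness along these lines is a reasonable starting point. It is only a sketch: the iteration count is arbitrary, the class and method names are made up, and the numbers will vary by machine and runtime. It separates the one-time construction cost from the steady-state matching cost, and runs a throwaway pass first so that the static initialization mentioned above doesn’t skew the results:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c#&quot; data-lang=&quot;c#&quot;&gt;using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

static class CompiledTimingSketch
{
    // Times construction plus repeated matching for one set of options.
    static void Measure(string label, RegexOptions options)
    {
        string text = &quot;This text has 1 digit in it&quot;;
        const int iterations = 100000; // arbitrary count chosen for illustration

        Stopwatch build = Stopwatch.StartNew();
        Regex regex = new Regex(@&quot;.*\d&quot;, options);
        regex.IsMatch(text); // first use pays any one-time setup cost
        build.Stop();

        Stopwatch run = Stopwatch.StartNew();
        for (int i = 0; i &amp;lt; iterations; i++)
        {
            regex.IsMatch(text);
        }
        run.Stop();

        Console.WriteLine(&quot;{0}: build and first use {1} ms, {2} matches {3} ms&quot;,
            label, build.ElapsedMilliseconds, iterations, run.ElapsedMilliseconds);
    }

    static void Main()
    {
        Measure(&quot;warm-up&quot;, RegexOptions.None);     // absorbs static initialization
        Measure(&quot;interpreted&quot;, RegexOptions.None);
        Measure(&quot;compiled&quot;, RegexOptions.Compiled);
    }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;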

&lt;p&gt;Another option is &lt;a href=&quot;http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regexoptions.aspx&quot;&gt;RegexOptions&lt;/a&gt;.IgnoreCase, which makes everything case-insensitive. The vast majority of the process stays the same. The only difference is that all instructions that compare characters will convert each &lt;a href=&quot;http://msdn.microsoft.com/en-us/library/system.char.aspx&quot;&gt;System.Char&lt;/a&gt; to lower case, mostly using the &lt;a href=&quot;http://msdn.microsoft.com/en-us/library/xt041c19.aspx&quot;&gt;Char.ToLower&lt;/a&gt; method. This sounds reasonable, but it’s not quite perfect. For example, in Koine Greek, the word for “&lt;a href=&quot;http://www.blueletterbible.org/lang/lexicon/lexicon.cfm?Strongs=G4597&amp;amp;t=NKJV&quot;&gt;moth&lt;/a&gt;” goes from uppercase to lowercase like this:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/how-net-regular-expressions-really-work/mothUpperAndLower.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;That is, in Greek, when a “sigma” (Σ) appears in lowercase at the end of a word, it uses a &lt;a href=&quot;http://blogs.msdn.com/michkap/archive/2005/05/26/421987.aspx&quot;&gt;different letter&lt;/a&gt; (ς) than if it appeared anywhere else (σ). RegexOptions.IgnoreCase can’t handle cases that need more context than a single &lt;a href=&quot;http://msdn.microsoft.com/en-us/library/system.char.aspx&quot;&gt;System.Char&lt;/a&gt; even though the string comparison functions can handle this. Consider this example:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c#&quot; data-lang=&quot;c#&quot;&gt;&lt;span class=&quot;kt&quot;&gt;string&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;mothLower&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;σής&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;kt&quot;&gt;string&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;mothUpper&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;mothLower&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ToUpper&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;();&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;// &amp;quot;ΣΉΣ&amp;quot;&lt;/span&gt;
&lt;span class=&quot;kt&quot;&gt;bool&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;stringsAreEqualIgnoreCase&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;mothUpper&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Equals&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mothLower&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;StringComparison&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CurrentCultureIgnoreCase&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;  &lt;span class=&quot;c1&quot;&gt;// true &lt;/span&gt;
&lt;span class=&quot;kt&quot;&gt;bool&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;stringsAreEqualRegex&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Regex&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;IsMatch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mothLower&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;mothUpper&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;RegexOptions&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;IgnoreCase&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;// false&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This also means that .NET’s Regex won’t do well with characters outside the &lt;a href=&quot;http://en.wikipedia.org/wiki/Basic_Multilingual_Plane&quot;&gt;Basic Multilingual Plane&lt;/a&gt; that need to be represented by more than one &lt;a href=&quot;http://msdn.microsoft.com/en-us/library/system.char.aspx&quot;&gt;System.Char&lt;/a&gt; as a “&lt;a href=&quot;http://msdn.microsoft.com/en-us/library/8k5611at.aspx&quot;&gt;surrogate pair&lt;/a&gt;.”&lt;/p&gt;
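
&lt;p&gt;Here is a small sketch of what that looks like in practice. It uses U+1D11E (the musical G clef) purely as an example of a character outside the Basic Multilingual Plane; any other supplementary character would behave the same way:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c#&quot; data-lang=&quot;c#&quot;&gt;using System;
using System.Text.RegularExpressions;

static class SurrogateSketch
{
    static void Main()
    {
        // One character to a human, but two System.Char values (a surrogate
        // pair) inside a .NET string.
        string clef = char.ConvertFromUtf32(0x1D11E);

        Console.WriteLine(clef.Length);                         // 2
        Console.WriteLine(Regex.Matches(clef, &quot;.&quot;).Count); // 2
        Console.WriteLine(Regex.IsMatch(clef, &quot;^.$&quot;));     // False
    }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The ‘.’ matches each half of the pair separately, so a pattern that expects “exactly one character” doesn’t match a string that displays as exactly one character.&lt;/p&gt;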

&lt;p&gt;I bring all of these “cases” up because it obviously troubled one of the Regex programmers who wrote this &lt;a href=&quot;https://web.archive.org/web/20090402013015id_/http://www.koders.com/csharp/fidC88A6970F260F6826C679E703634322F3C553827.aspx#L1859&quot;&gt;comment&lt;/a&gt; &lt;em&gt;&lt;a href=&quot;https://web.archive.org/web/20090402044617id_/http://www.koders.com/csharp/fidA16EF1E737BCF735FD1DE4D39E0E1AD9851FC2A7.aspx#L64&quot;&gt;twice&lt;/a&gt;&lt;/em&gt;:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c#&quot; data-lang=&quot;c#&quot;&gt;&lt;span class=&quot;c1&quot;&gt;// We do the ToLower character by character for consistency.  With surrogate chars, doing &lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// a ToLower on the entire string could actually change the surrogate pair.  This is more correct &lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// linguistically, but since Regex doesn&amp;#39;t support surrogates, it&amp;#39;s more important to be &lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// consistent.&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;You can tell the author was fully anticipating the &lt;a href=&quot;http://code.logos.com/blog/2008/07/net_regular_expressions_and_unicode.html&quot;&gt;bug reports&lt;/a&gt; that eventually came as a result of this decision. Unfortunately, due to the way the code is structured, changing this behavior would take a hefty overhaul of the engine and would require a massive amount of regression testing. I’m guessing this is the reason why it won’t be coming in a service pack anytime soon.&lt;/p&gt;

&lt;p&gt;The last interesting option that affects most of the code is RegexOptions.RightToLeft. For the most part, this affects where the searching starts and how a “&lt;a href=&quot;https://web.archive.org/web/20090401214129id_/http://www.koders.com/csharp/fidE76CE858561A50AF7A1D9030DC8F2F4D6DEF839D.aspx#L229&quot;&gt;bump&lt;/a&gt;” is applied. When the engine wants to move forward or get the characters to the “right”, it checks this option to see if it should move +1 or -1 character from the current position. It’s a simple idea, but it is implemented with many “&lt;a href=&quot;https://web.archive.org/web/20090401214129id_/http://www.koders.com/csharp/fidE76CE858561A50AF7A1D9030DC8F2F4D6DEF839D.aspx#L247&quot;&gt;if(!runrtl)&lt;/a&gt;” statements spread throughout the code.&lt;/p&gt;
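
&lt;p&gt;The visible effect of the option is easy to see with a made-up input string. The same pattern finds a different first match depending on which end the scan starts from:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c#&quot; data-lang=&quot;c#&quot;&gt;using System;
using System.Text.RegularExpressions;

static class RightToLeftSketch
{
    static void Main()
    {
        string text = &quot;a1b2c3&quot;; // sample input chosen for illustration

        // Default scan starts at the left and finds the first digit.
        Console.WriteLine(Regex.Match(text, @&quot;\d&quot;).Value); // 1

        // RightToLeft starts at the end of the string and moves backwards.
        Console.WriteLine(Regex.Match(text, @&quot;\d&quot;, RegexOptions.RightToLeft).Value); // 3
    }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;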

&lt;p&gt;Finally, you might be interested in how &lt;a href=&quot;http://www.mono-project.com/Main_Page&quot;&gt;Mono&lt;/a&gt;’s regular expression engine compares with Microsoft’s. The good news is that its code &lt;a href=&quot;http://anonsvn.mono-project.com/viewvc/trunk/mcs/class/System/System.Text.RegularExpressions/&quot;&gt;is also available&lt;/a&gt; online. In general, Mono’s implementation is very similar. Here are some of the (minor) differences:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Mono’s parse tree has a similar shape, but it uses more strongly typed classes. For example, sets such as [^\s/] are given their own class rather than encoded as a single string. &lt;/li&gt;
  &lt;li&gt;The Boyer-Moore prefix optimization is done in the &lt;a href=&quot;http://anonsvn.mono-project.com/viewvc/trunk/mcs/class/System/System.Text.RegularExpressions/quicksearch.cs?view=log&quot;&gt;QuickSearch&lt;/a&gt; class. It is calculated at run-time and is only used if the search string is longer than 5 characters. &lt;/li&gt;
  &lt;li&gt;The regex machine doesn’t have a separate string table for referencing strings like “http://”. Each character is passed in as an argument to the instruction.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;Weighing in around 14,000 lines of code, .NET’s regular expression engine takes a while to digest. After getting over the shock of its size, it was relatively straightforward to understand. Seeing the real source code, with its occasional funny comments, provided insight that Reflector simply couldn’t offer. In the end, we see that a .NET regular expression pattern is simply a compact representation for its internal RegexCode machine language.&lt;/p&gt;

&lt;p&gt;This whole process has allowed me to finally connect with regular expressions and give them a splash of empathy. Seeing the horror of backtracking first hand in the debugger was enough for me to want to do everything in my power to get rid of it. Following the translation process down to the RegexCode level clued me into how my regex pattern will actually execute. Feeling the wind fly by a regex using the Boyer-Moore prefix optimization has encouraged me to do whatever I can to put string literals at the front of a pattern.&lt;/p&gt;

&lt;p&gt;It’s all these little things that add up to a blazingly fast regular expression.&lt;/p&gt;
</description>
        <pubDate>Mon, 16 Mar 2009 07:47:00 +0000</pubDate>
        <link>http://www.moserware.com/2009/03/how-net-regular-expressions-really-work.html</link>
        <guid isPermaLink="true">http://www.moserware.com/2009/03/how-net-regular-expressions-really-work.html</guid>
        
        
      </item>
    
      <item>
        <title>Rebooting Computing: Why?</title>
        <description>&lt;p&gt;Have you seen “&lt;a href=&quot;http://www.youtube.com/watch?v=yOOJzQRJfIw&quot; title=&quot;See 01:25 of this video for a further explanation&quot;&gt;The Most Famous Chart in Computer Science Education&lt;/a&gt;?” The exact numbers and data sources vary, but the curve always looks similar:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/rebooting-computing-why/CollegeBoundSeniorEnrollmentsInComputingFromSAT.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;I &lt;a href=&quot;http://www.cra.org/CRN/issues/0801.pdf&quot;&gt;used&lt;/a&gt; &lt;a href=&quot;http://professionals.collegeboard.com/profdownload/Total_Group_Report.pdf&quot;&gt;data&lt;/a&gt; from college-bound seniors who indicated on their &lt;a href=&quot;http://en.wikipedia.org/wiki/SAT&quot;&gt;SAT&lt;/a&gt; that they intended to major in “Computer and Information Sciences and Support Services.” The curve tends to peak between 1999 and 2001 and then you see a huge decline that has just begun to bottom out to numbers less than &lt;em&gt;half&lt;/em&gt; their peak value.&lt;/p&gt;

&lt;p&gt;Some people like to &lt;a href=&quot;http://www.youtube.com/watch?v=yOOJzQRJfIw&quot;&gt;explain away&lt;/a&gt; this drop on another curve:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/assets/rebooting-computing-why/NASDAQ19972009.png&quot; title=&quot;Data from Yahoo! Finance on the NASDAQ index from 1997 - 2009&quot;&gt;&lt;img src=&quot;/assets/rebooting-computing-why/NASDAQ19972009.png&quot; alt=&quot;&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Although there was a correlation of computer science enrollment and the stock market, you’ll see that the curves diverge around 2003. A popular belief is that the bursting &lt;a href=&quot;http://en.wikipedia.org/wiki/Dot-com_bubble&quot;&gt;dot-com bubble&lt;/a&gt; scared some potential students, but then by 2003 parents thought &lt;a href=&quot;http://www.cato.org/pub_display.php?pub_id=2692&quot;&gt;most software jobs were being offshored&lt;/a&gt; and encouraged their kids to pick a different field.&lt;/p&gt;

&lt;p&gt;Some jobs &lt;em&gt;did&lt;/em&gt; go to &lt;a href=&quot;http://en.wikipedia.org/wiki/Bangalore&quot;&gt;Bangalore&lt;/a&gt;, but the total number of jobs &lt;a href=&quot;http://www.acm.org/globalizationreport/&quot;&gt;actually&lt;/a&gt; &lt;a href=&quot;http://www.bls.gov/news.release/ecopro.t06.htm&quot;&gt;grew&lt;/a&gt; in the US, even beyond their 1999 levels. There are still &lt;a href=&quot;http://www.bls.gov/oco/ocos267.htm#outlook&quot;&gt;excellent job&lt;/a&gt; &lt;a href=&quot;http://www.cra.org/govaffairs/sargent_adequacy_of_S-EW.ppt&quot;&gt;prospects&lt;/a&gt; for the long term. Even in hard times like the 1970’s recession, companies like Apple and Microsoft &lt;a href=&quot;http://www.paulgraham.com/badeconomy.html&quot;&gt;were founded&lt;/a&gt;. In 10 years, we’ll know of several great companies that got their start in the &lt;a href=&quot;http://en.wikipedia.org/wiki/Global_financial_crisis_of_2008&quot;&gt;current financial crisis&lt;/a&gt;. It’s unfortunate how students and their parents have been misled about the reality.&lt;/p&gt;

&lt;p&gt;We’re faced with a pipeline problem. The fresh-outs you’ll be looking to hire in 10 years are in middle school right now. Are you doing anything to woo them to a career in computing? What do you say to a bright young girl to at least &lt;em&gt;consider&lt;/em&gt; looking at computer science?&lt;/p&gt;

&lt;p&gt;I could start with my story. As a kid, I was captivated by how I could think of an idea for a program and then have a computer execute it exactly. It was as if I could put part of my mind inside the computer. My parents and friends thought this was magic. It &lt;em&gt;was&lt;/em&gt; magic, but it was a magic I could understand. There are always cool new things being computed. It was magical to see how &lt;a href=&quot;http://en.wikipedia.org/wiki/IBM_Deep_Blue#Deep_Blue_versus_Kasparov&quot;&gt;Deep Blue beat Kasparov&lt;/a&gt; long before I began to understand the beauty of &lt;a href=&quot;http://www.computerhistory.org/chess/main.php?sec=thm-42f15cec6680f&quot;&gt;how it worked&lt;/a&gt;. Every day I listen to &lt;a href=&quot;http://en.wikipedia.org/wiki/MP3&quot;&gt;MP3&lt;/a&gt;s that were created by a compression algorithm that gets rid of sound that my ear can’t hear. Companies like Walmart sift terabytes of data to predict market demand so they can send extra &lt;a href=&quot;http://www.nytimes.com/2004/11/14/business/yourmoney/14wal.html&quot;&gt;strawberry Pop-Tarts&lt;/a&gt; when hurricanes are approaching. On a personal level, new algorithms predict with high accuracy what types of &lt;a href=&quot;http://en.wikipedia.org/wiki/Netflix_Prize&quot;&gt;movies we’ll like&lt;/a&gt;. But that’s just one tiny sliver of what’s out there. Each person can have their own unique experience. There’s a lot of great computing going on and the demand is only increasing.&lt;/p&gt;

&lt;p&gt;Sometimes you’ll hear people honestly hesitate about a career in computing because they fear they’ll “sit in front of a computer all day.” The sad irony is that this thinking causes people to go into fields like accounting, graphic design, marketing, or hundreds of other fields where they spend just as much time in front of a computer. The difference is that they’ll spend most of their time using applications like Outlook, Word, or Excel and often have less fun than the software developers that are creating these programs. Moreover, working in the computing fields will give someone a chance to create future interfaces that don’t have everyone in front of giant screens all day.&lt;/p&gt;

&lt;p&gt;This isn’t to say that there aren’t boring software development jobs, but there are also plenty of great jobs. &lt;a href=&quot;http://www.codinghorror.com/blog/archives/001202.html&quot;&gt;It’s a great career&lt;/a&gt;. I sometimes feel a little guilty that I can work in a field that I enjoy. As a field, we create software that powers &lt;a href=&quot;http://www.inin.com/&quot;&gt;business communications&lt;/a&gt; and &lt;a href=&quot;http://www.facebook.com/&quot;&gt;connects you with friends&lt;/a&gt;. Software will play a pivotal role in &lt;a href=&quot;http://www.pacificbiosciences.com/index.php&quot;&gt;gene sequencing and analyzing&lt;/a&gt; that will usher in customized medical treatments and drugs. Software will drive many great innovations of the future.&lt;/p&gt;

&lt;p&gt;It’s sad that kids aren’t even given a chance to see the breadth and excitement of our field. I don’t blame them. On the whole, we’re doing a terrible job broadcasting our image.&lt;/p&gt;

&lt;p&gt;Professors often teach computer science as if it is some sterile thing with nothing new in it. This is just not true; we’re in our adolescence. Sure, we bumble around and do silly things at times, but it also means we’re &lt;em&gt;growing&lt;/em&gt;. We’re in an incredible time. &lt;a href=&quot;http://robotics.stanford.edu/%7Esahami/bio.html&quot;&gt;Mehran Sahami&lt;/a&gt; captured this well at the &lt;a href=&quot;http://www.youtube.com/watch?v=nWRGPxSNnag&quot;&gt;conclusion of his intro class at Stanford&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;“Think about the time that you’re living in. &lt;a href=&quot;http://en.wikipedia.org/wiki/Donald_Knuth&quot;&gt;Don Knuth&lt;/a&gt;, who is considered the father of Computer Science is still alive and he’s in this department. It’s &lt;em&gt;sort of like you’re geometers and you’re living in the time of &lt;a href=&quot;http://en.wikipedia.org/wiki/Euclid&quot;&gt;Euclid&lt;/a&gt;&lt;/em&gt;… It’s all happening now. Don’t think of this stuff as dead people who did this stuff and it just happened and now you’re forced to do it. You’re living in it.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Mehran is one of the few teachers that do a great job of &lt;a href=&quot;http://academicearth.org/lectures/life-after-programming-methodology&quot;&gt;sharing&lt;/a&gt; the new and exciting possibilities of computer science. Too often you see teachers that claim that computer science is some small box that is exactly what they’re teaching.&lt;/p&gt;

&lt;p&gt;When I’ve had the honor of having conversations with computing pioneers &lt;a href=&quot;http://www.codinghorror.com/blog/archives/001213.html&quot;&gt;like Alan Kay&lt;/a&gt;, I often hear how their teachers in the 60’s would admit that they didn’t understand the full possibilities of computing, and they wanted their students to do better. The early &lt;a href=&quot;http://en.wikipedia.org/wiki/Defense_Advanced_Research_Projects_Agency&quot;&gt;ARPA&lt;/a&gt; community with &lt;a href=&quot;http://www.moserware.com/2008/05/who-is-this-licklider-guy.html&quot;&gt;J.C.R. Licklider&lt;/a&gt; at the helm is a great example of this style.&lt;/p&gt;

&lt;p&gt;Licklider encouraged &lt;em&gt;and funded&lt;/em&gt; wild and imaginative ideas that caused a huge boom in computer science back in the 60’s and early 70’s. Unfortunately, this slowed down in the 1980s as funders became more conservative, took fewer risks, and as a result got more incremental improvements rather than something big. I think this is why Alan Kay has difficulty &lt;a href=&quot;http://stackoverflow.com/questions/432922/significant-new-inventions-in-computing-since-1980&quot;&gt;finding significant new inventions in computing since 1980&lt;/a&gt;. Alan’s statement sounds crazy until you see just how much of what we use was started before 1980. &lt;a href=&quot;http://stackoverflow.com/questions/432922/significant-new-inventions-in-computing-since-1980#433063&quot;&gt;Some&lt;/a&gt; will point to the web and the browser as a huge new invention, but even &lt;a href=&quot;http://en.wikipedia.org/wiki/Marc_Andreessen&quot;&gt;Marc Andreessen&lt;/a&gt;, &lt;a href=&quot;http://www.npr.org/templates/story/story.php?storyId=96089391&quot; title=&quot;Fast forward to 22:30, and you&#39;ll hear Judy quote this.&quot;&gt;speaking&lt;/a&gt; on how he was able to create the first graphical web browser, &lt;a href=&quot;http://www.npr.org/templates/story/story.php?storyId=96089391&quot; title=&quot;Fast forward to 22:30, and you&#39;ll hear Judy quote this.&quot;&gt;revealed&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;“I was able to do it so quickly because it was the &lt;em&gt;icing on the cake that had been baking for 30 years&lt;/em&gt;.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Indeed it had. In the mid-60’s, some of the baking started when Licklider funded &lt;a href=&quot;http://en.wikipedia.org/wiki/Douglas_Engelbart&quot;&gt;Doug Engelbart&lt;/a&gt;’s &lt;a href=&quot;http://en.wikipedia.org/wiki/On-Line_System&quot;&gt;oN-Line System&lt;/a&gt; that &lt;a href=&quot;http://video.google.com/videoplay?docid=-8734787622017763097&quot;&gt;amazingly&lt;/a&gt; had hypertext links and was operated by a mouse. &lt;a href=&quot;http://en.wikipedia.org/wiki/Leonard_Kleinrock&quot;&gt;Len Kleinrock&lt;/a&gt;’s Ph.D. proposal on packet theory in 1961 gave him a great start that ultimately led to his team sending the first message over ARPAnet in &lt;a href=&quot;http://www.docstoc.com/docs/2187564/Microsoft-PowerPoint---Len-Kleinrocks-Brief-History-of-the&quot;&gt;1969&lt;/a&gt;. &lt;a href=&quot;http://en.wikipedia.org/wiki/Vint_Cerf&quot;&gt;Vint Cerf&lt;/a&gt; and &lt;a href=&quot;http://en.wikipedia.org/wiki/Bob_Kahn&quot;&gt;Bob Kahn&lt;/a&gt; had already &lt;a href=&quot;http://www.cs.princeton.edu/courses/archive/fall06/cos561/papers/cerf74.pdf&quot;&gt;published&lt;/a&gt; a paper on &lt;a href=&quot;http://en.wikipedia.org/wiki/Internet_protocol_suite&quot;&gt;TCP/IP&lt;/a&gt;, the bedrock of the Internet protocols, by 1974. All of these technologies were well refined and in production by the time &lt;a href=&quot;http://en.wikipedia.org/wiki/Tim_Berners-Lee&quot;&gt;Tim Berners-Lee&lt;/a&gt; created HTTP in the early 90’s to which Andreessen would add a graphical front end.&lt;/p&gt;

&lt;p&gt;Are we willing to fund long-term “wild” and “crazy” ideas &lt;em&gt;today&lt;/em&gt; to create Internet-sized future results? We’ve been too focused on short-term results. It’s not just academia; most companies focus on short-term results that dismiss the fact that computing is a young field and miss what really matters:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;“Many HR departments haven’t figured this out yet, but in reality, It’s less important to know Java, Ruby, .NET, or the iPhone SDK. There’s always going to be a new technology or a new version of an existing technology to be learned. &lt;strong&gt;The technology itself isn’t as important; it’s the constant learning that counts&lt;/strong&gt;.” - &lt;a href=&quot;http://www.amazon.com/gp/product/1934356050?ie=UTF8&amp;amp;tag=moserware-20&amp;amp;linkCode=as2&amp;amp;camp=1789&amp;amp;creative=390957&amp;amp;creativeASIN=1934356050&quot;&gt;&lt;em&gt;Pragmatic Thinking and Learning&lt;/em&gt;&lt;/a&gt;, p.145&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2 id=&quot;bummer-now-what&quot;&gt;Bummer, Now What?&lt;/h2&gt;

&lt;p&gt;Last March, I was given the opportunity to be on &lt;a href=&quot;http://rebootingcomputing.com/teams/about.html&quot;&gt;a design team&lt;/a&gt; to start tackling some of these &lt;a href=&quot;http://rebootingcomputing.org/manifesto.html&quot;&gt;problems&lt;/a&gt;. We knew that we couldn’t change the whole field and that some people would want to keep it the way it is. But we also knew that we had to do something; we didn’t want to settle for the status quo.&lt;/p&gt;

&lt;p&gt;Our primary task was to plan a “summit” of the best people from academia, government, and industry and get them all in one room so that we could get a good sample of the entire field. We didn’t want to let people have the chance to point fingers outside and say it was somebody else’s problem.&lt;/p&gt;

&lt;p&gt;After working on the basic concept for the summit, we needed to give it a name. I had enjoyed many side discussions of the great days of Licklider, &lt;a href=&quot;http://en.wikipedia.org/wiki/Xerox_PARC&quot;&gt;PARC&lt;/a&gt;, and the early culture that accomplished great things. Sort of as a joke, I thought that we needed to “reboot” computer science to get rid of the cruft that had accumulated over time and get back to the excitement when the field was brand new. After some discussion, we decided to change “computer science” to a broader field of “computing” and use the “magic and beauty” of computer science to be the driver of the “rebooting.”&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;http://rebootingcomputing.org/&quot;&gt;&lt;img src=&quot;/assets/rebooting-computing-why/rebootingcomputinglogo.png&quot; alt=&quot;&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Only later would I realize the rebooting metaphor could be stretched a bit further. We can “reboot” the field without throwing away the good parts just as an operating system can reboot and depend on its valuable &lt;a href=&quot;http://en.wikipedia.org/wiki/Non-volatile_memory&quot;&gt;non-volatile&lt;/a&gt; memory being preserved. Rebooting doesn’t mean that we’ll go down the same crufty path (e.g. perhaps we have better “drivers” now). Most importantly, the &lt;a href=&quot;http://rebootingcomputing.org/&quot;&gt;domain name&lt;/a&gt; was available so we ran with it.&lt;/p&gt;

&lt;p&gt;After nine months of planning and inviting over 220 people, we had our summit in January at the &lt;a href=&quot;http://www.computerhistory.org/&quot;&gt;Computer History Museum&lt;/a&gt;. It was the first time that such a broad representation of the computing field came to work together in the same room.&lt;/p&gt;

&lt;p&gt;The summit was guided by the &lt;a href=&quot;http://appreciativeinquiry.case.edu/&quot;&gt;Appreciative Inquiry&lt;/a&gt; process. It’s a technique that has you work in small teams to discover a positive core of what’s giving the field life and then uses that to start dreaming of a better future. As the three days unfolded, we made it to the “design” phase where we kicked off several projects that fell into three rough categories:&lt;/p&gt;

&lt;p&gt;Education&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;K-8 Fundamentals (Creating engaging introductions of computing fundamentals at the elementary level) &lt;/li&gt;
  &lt;li&gt;Project/Problem Based Learning for Grades 7-14 &lt;/li&gt;
  &lt;li&gt;CS in K-12: Essential Subject (Determining a path to get computing essentials introduced in the K-12 curriculum) &lt;/li&gt;
  &lt;li&gt;Recruiting CS Teachers (Significantly increase the number of computer science teachers) &lt;/li&gt;
  &lt;li&gt;National Curriculum for Multi-Disciplinary Collaboration &lt;/li&gt;
  &lt;li&gt;International Educational Repository (that would include classroom activities and ideas for the K-16 level)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Outreach&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;http://www.labrats.org/&quot;&gt;LabRats&lt;/a&gt; (Build learning communities that include after school activities focused on areas like computer science) &lt;/li&gt;
  &lt;li&gt;Recruiting Women &amp;amp; Minorities into CS &lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;http://www.imageofcomputing.com/&quot;&gt;Image of Computing&lt;/a&gt; (Sort of like a marketing campaign to show how computing is changing the world) &lt;/li&gt;
  &lt;li&gt;Tools for Fun and Beauty (Providing software tools for people sharing the fun and beauty of computing) &lt;/li&gt;
  &lt;li&gt;Relevant Computer Science Intersecting with Socially Relevant Projects (e.g. providing infrastructure in third world countries or disaster response scenarios) &lt;/li&gt;
  &lt;li&gt;Defining Future Computing Requirements for IT Service Verticals (e.g. Health Care, Financial, Government)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Internal Growth&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Parallel Worlds Initiative (Leverage the boom in &lt;a href=&quot;http://en.wikipedia.org/wiki/Multi-core&quot;&gt;multi-core&lt;/a&gt; to drive new areas of thinking in computing) &lt;/li&gt;
  &lt;li&gt;Open Artifacts (e.g. create hardware and software systems that can easily be inspected, understood, assembled, disassembled and reused in new ways to allow for exploration) &lt;/li&gt;
  &lt;li&gt;Rediscovering Computing “Gems” (Revisit ideas from the past that might have been previously abandoned because they were infeasible but now might be possible) &lt;/li&gt;
  &lt;li&gt;Computing Field Guide (Create a resource to show the breadth of computing to both novices and experts)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I joined the Computing Field Guide team. I think it’ll be fun to see if we can come up with a way to leverage many of the great existing resources out there and learn some of the breadth of the field as a result.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/rebooting-computing-why/RebootingGroup.jpg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;In the end, the three-day summit was just the beginning of a long journey. It was a bit chaotic and some were disappointed we didn’t fix everything right then or do more, but I think the summit gave a good context for the issues our field faces. My best memories include all of the great people that I met and being able to engage in some great conversations.&lt;/p&gt;

&lt;p&gt;Steve Jobs said “&lt;a href=&quot;http://www.youtube.com/watch?v=D1R-jKKp3NA&quot; title=&quot;See 4:45&quot;&gt;you can’t connect the dots looking forward; you can only connect them looking backwards&lt;/a&gt;.” I think that holds true for this summit. There are still a lot of dots between here and &lt;em&gt;there&lt;/em&gt;; wherever “there” might be. With a lot of hard work, I’m confident that the dots will connect somehow. There’s an interesting road ahead and I want to be a part of it. We need to start baking cakes for future generations to ice.&lt;/p&gt;

&lt;h2 id=&quot;your-turn&quot;&gt;Your Turn&lt;/h2&gt;

&lt;p&gt;What do you think of the current state of computer science? What is your dream for the future? What are some things that we can start now to improve the current situation? Are you involved with any existing effort? Would you like to join ours?&lt;/p&gt;

&lt;p&gt;I’d love to hear your thoughts.&lt;/p&gt;

&lt;p&gt;P.S. There are several blog posts by fellow “Rebooters” ([&lt;a href=&quot;http://www.cs.uni.edu/%7Ewallingf/blog/archives/monthly/2009-01.html#e2009-01-17T15_09_23.htm&quot;&gt;1&lt;/a&gt;] [&lt;a href=&quot;http://www.cs.uni.edu/%7Ewallingf/blog/archives/monthly/2009-01.html#e2009-01-19T09_53_10.htm&quot;&gt;2&lt;/a&gt;] [&lt;a href=&quot;http://www.cs.uni.edu/%7Ewallingf/blog/archives/monthly/2009-01.html#e2009-01-20T16_27_09.htm&quot;&gt;3&lt;/a&gt;] [&lt;a href=&quot;http://www.cs.uni.edu/%7Ewallingf/blog/archives/monthly/2009-01.html#e2009-01-21T07_55_01.htm&quot;&gt;4&lt;/a&gt;] [&lt;a href=&quot;http://www.cs.uni.edu/%7Ewallingf/blog/archives/monthly/2009-01.html#e2009-01-22T16_05_19.htm&quot;&gt;5&lt;/a&gt;] [&lt;a href=&quot;http://www.sdtimes.com/link/33201&quot;&gt;6&lt;/a&gt;] [&lt;a href=&quot;http://www.sdtimes.com/ZEICHICK_S_TAKE_REBOOTING_COMPUTER_SCIENCE_PART_2/33214&quot;&gt;7&lt;/a&gt;] [&lt;a href=&quot;http://geek-knitter.blogspot.com/2009/01/rebooting-computing.html&quot;&gt;8&lt;/a&gt;] [&lt;a href=&quot;http://techher.blogspot.com/2009/01/reboot-computing-2009.html&quot;&gt;9&lt;/a&gt;]). There are some &lt;a href=&quot;http://www.flickr.com/photos/cglusky/sets/72157612501297914/detail/&quot;&gt;pictures on flickr&lt;/a&gt; and a &lt;a href=&quot;http://www.rebootingcomputing.org/community/&quot;&gt;Rebooting Computing Community site&lt;/a&gt; that’s starting to have follow-on discussion and might eventually have videos of highlights from the summit. In addition, you might watch &lt;a href=&quot;http://www.youtube.com/watch?v=5a_pO3NYJl0&quot;&gt;this great talk&lt;/a&gt; by Dr. Peter Denning who has been leading this effort for over five years.&lt;/p&gt;
</description>
        <pubDate>Tue, 03 Feb 2009 12:15:00 +0000</pubDate>
        <link>http://www.moserware.com/2009/02/rebooting-computing-why.html</link>
        <guid isPermaLink="true">http://www.moserware.com/2009/02/rebooting-computing-why.html</guid>
        
        
      </item>
    
  </channel>
</rss>
