<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/atom10full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><feed xmlns="http://www.w3.org/2005/Atom" xmlns:openSearch="http://a9.com/-/spec/opensearch/1.1/" xmlns:blogger="http://schemas.google.com/blogger/2008" xmlns:georss="http://www.georss.org/georss" xmlns:gd="http://schemas.google.com/g/2005" xmlns:thr="http://purl.org/syndication/thread/1.0" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" gd:etag="W/&quot;A0INQHk6cCp7ImA9WhFSF00.&quot;"><id>tag:blogger.com,1999:blog-6800934446457898793</id><updated>2013-06-20T02:26:31.718-04:00</updated><category term="trueskill" /><category term="aes" /><title>Moserware</title><subtitle type="html">Jeff Moser's software development adventures.</subtitle><link rel="http://schemas.google.com/g/2005#feed" type="application/atom+xml" href="http://www.moserware.com/feeds/posts/default" /><link rel="alternate" type="text/html" href="http://www.moserware.com/" /><link rel="next" type="application/atom+xml" href="http://www.blogger.com/feeds/6800934446457898793/posts/default?start-index=4&amp;max-results=3&amp;redirect=false&amp;v=2" /><author><name>Jeff Moser</name><uri>http://www.blogger.com/profile/16074905903060665396</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="24" height="32" src="http://1.bp.blogspot.com/_Zfbv3mHcYrc/SLDM--5fn8I/AAAAAAAAA1w/EZtLwWvYhdI/S220/facebook+beard2.jpg" /></author><generator version="7.00" uri="http://www.blogger.com">Blogger</generator><openSearch:totalResults>45</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>3</openSearch:itemsPerPage><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/atom+xml" href="http://feeds.feedburner.com/Moserware" /><feedburner:info 
uri="moserware" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><geo:lat>39.95645</geo:lat><geo:long>-86.008729</geo:long><feedburner:emailServiceId>Moserware</feedburner:emailServiceId><feedburner:feedburnerHostname>http://feedburner.google.com</feedburner:feedburnerHostname><entry gd:etag="W/&quot;CUABRns_eip7ImA9WhRSGEU.&quot;"><id>tag:blogger.com,1999:blog-6800934446457898793.post-86529977875835325</id><published>2011-11-21T08:43:00.002-05:00</published><updated>2011-11-21T08:55:57.542-05:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-11-21T08:55:57.542-05:00</app:edited><title>Life, Death, and Splitting Secrets</title><content type="html">&lt;p&gt;(&lt;strong&gt;Summary&lt;/strong&gt;: I created &lt;a href="https://github.com/moserware/SecretSplitter"&gt;a program&lt;/a&gt; to help back up important data like your master password in case something happens to you. By splitting your secret into pieces, it provides a circuit breaker against a single point of failure. I&amp;#8217;m giving it away as a free open source program with the hope that others might find it useful in addressing this aspect of our lives. Feel free to &lt;a href="https://github.com/downloads/moserware/SecretSplitter/SecretSplitter.exe"&gt;use the program&lt;/a&gt; and follow along with just the screenshots below or read all sections of this post if you want more context.)&lt;/p&gt;&lt;h4&gt;Background&lt;/h4&gt;&lt;p&gt;I just couldn&amp;#8217;t do it.&lt;/p&gt;&lt;p&gt;&lt;img alt="Grandma and Jeff" title="Grandma and Jeff" src="http://3.bp.blogspot.com/-IBcyywvqnWA/TsEou9240BI/AAAAAAAAZBE/a5aM-FeH49I/s320/Grandma+and+Jeff.jpg" align="right" style="border:0; margin: 0px 0px 15px 15px; display: inline" /&gt;My grandma died at this time last year from a stroke. She was a great woman. I still miss her. In that emotional last week, I was reminded of great memories with her and the fragility of life. 
I was also reminded about important documents that I still didn&amp;#8217;t have.&lt;/p&gt;&lt;p&gt;When something happens to you, be it death or incapacitation, there are some important steps that need to occur that can be greatly assisted by legal documents.  For example:&lt;/p&gt;&lt;ol&gt;&lt;li&gt;An &lt;a href="http://en.wikipedia.org/wiki/Advance_health_care_directive"&gt;advance health care directive&lt;/a&gt; (aka &amp;#8220;Living Will&amp;#8221;) specifies what actions should (or shouldn&amp;#8217;t) be taken with regard to your healthcare if you&amp;#8217;re no longer able to make decisions for yourself.&lt;/li&gt;&lt;li&gt;A &lt;a href="http://en.wikipedia.org/wiki/Power_of_attorney#Durable_power_of_attorney"&gt; durable power of attorney&lt;/a&gt; allows you to designate someone to legally act as you if you become incapacitated.&lt;/li&gt;&lt;li&gt;A &lt;a href="http://en.wikipedia.org/wiki/Will_(law)" &gt;last will and testament&lt;/a&gt; allows you to legally assign caregivers for &lt;a href="http://en.wikipedia.org/wiki/Minor_(law)" title="Typically 18 and younger."&gt;minor children&lt;/a&gt; as well as designate where you'd like your possessions to go.&lt;/li&gt;&lt;/ol&gt;&lt;p&gt;My grandma had these and they helped reduce stress and anxiety in this difficult time. We knew what she would have wanted and these documents helped legally enforce that.&lt;/p&gt;&lt;p&gt;I had assumed that these documents were expensive and time-consuming to  create. Furthermore, as a guy in my 20s, death still seems like &lt;a href="http://quotationsbook.com/quote/10024/" title="&amp;#8220;Death is a distant rumor to the young.&amp;#8221; - Andy Rooney (1919 - 2011)"&gt;a distant rumor&lt;/a&gt;. 
As a Christian, I&amp;#8217;m &lt;a href="http://www.biblegateway.com/passage/?search=Philippians%201:21&amp;amp;version=ESV"&gt;not overly concerned&lt;/a&gt; &lt;a href="http://www.biblegateway.com/passage/?search=1%20Corinthians%2015:54-57&amp;amp;version=ESV"&gt;about death itself&lt;/a&gt;, but my grandma&amp;#8217;s death reminded me that these documents  are not really for me, but rather the people I&amp;#8217;d leave behind. I knew that if something  happened to me, I&amp;#8217;d potentially be leaving behind a mess, and that concern of  irresponsibility compelled me to investigate what I could do.&lt;/p&gt;&lt;p&gt;It turns out that creating these documents is essentially a matter of filling  out a form template. I &lt;a href="http://www.amazon.com/gp/product/B004DLCQZ4/ref=as_li_ss_tl?ie=UTF8&amp;amp;tag=moserware-20&amp;amp;linkCode=as2&amp;amp;camp=217145&amp;amp;creative=399369&amp;amp;creativeASIN=B004DLCQZ4" title="I used Quicken Willmaker 2011 Premium edition. I liked the premium edition because it came with a lot of extra books that made for fun reading. The 2012 version will probably be available soon."&gt; bought a program&lt;/a&gt; that made it about as easy as preparing taxes online. In  most cases, you just need disinterested third parties, such as friends or coworkers,  to witness you signing them to make them fully legal. 
At most, you might have to  get them notarized or filed in your county for a small fee.&lt;/p&gt;&lt;p&gt;One of the steps involved in filling out the &amp;#8220;Information for Caregivers  and Survivors&amp;#8221; document is to list &amp;#8220;&lt;a href="http://www.nolo.com/legal-encyclopedia/help-executor-secured-places-passwords-29669.html"&gt;Secured  Places and Passwords&lt;/a&gt;.&amp;#8221; It&amp;#8217;s a helpful section that your  &lt;a href="http://en.wikipedia.org/wiki/Executor"&gt;executor&lt;/a&gt; can turn  to if something happens to you in order to do things like unlock your cell phone  or access your online accounts. Sure, your survivors might be able to use legal force  to get access without it, but only after months of  &lt;a href="https://mail.google.com/support/bin/answer.py?answer=14300"&gt;sending official documentation&lt;/a&gt;.  That&amp;#8217;s a lot of hassle to put someone through. Also, it&amp;#8217;s very likely that  a lot of important things will be missed and no one would ever know they  existed.&lt;/p&gt;&lt;p&gt;It&amp;#8217;s &lt;a href="http://research.microsoft.com/apps/pubs/?id=80436" title="&amp;#8220;The Rational Rejection of Security Advice by Users&amp;#8221; provides some interesting counterpoints to security advice out there."&gt;probably  rational&lt;/a&gt; to just write your passwords down and put them in a safe which your  executor knows the location of and can access in a timely manner. Alternatively,  you could pay for an attorney or a  &lt;a href="http://mashable.com/2010/10/11/social-media-after-death/"&gt;third-party service&lt;/a&gt; and leave your password list with them.  However, this seemed like it would cause a maintenance problem, especially as I  might add or update my passwords frequently. These options would also force me to trust someone I haven&amp;#8217;t known for a long time. 
Most importantly, the thought of writing  down my passwords on a piece of paper, even if it was in a relatively safe place,  went against every fiber of my security being.&lt;/p&gt; &lt;p&gt;I just couldn&amp;#8217;t do it.&lt;/p&gt; &lt;p&gt;&lt;strong&gt;DISCLAIMER&lt;/strong&gt;: The above simple approaches are probably fine and have worked  for a lot of people over the years. &lt;strong&gt;If you&amp;#8217;re comfortable with these basic approaches,  by all means use them and ignore this post&lt;/strong&gt;. These simpler approaches have fewer moving  parts and are easy to understand. However, if you want a little more security, or need to liven up this process with a little spy novel-esque fun, read on.&lt;/p&gt; &lt;h4&gt;The Modern Password &amp;amp; Encryption Problem&lt;/h4&gt; &lt;p&gt;As an online citizen, you don&amp;#8217;t want to be that person. You know, the one  whose password was so &lt;a href="http://blogs.wsj.com/digits/2010/12/13/the-top-50-gawker-media-passwords/" title="If nothing else, promise me that none of your passwords are on this list!"&gt; easy to guess&lt;/a&gt; that his email account was broken into and who &amp;#8220;&lt;a href="http://www.nbclosangeles.com/news/tech/Email-Scams-83600577.html"&gt;wrote&lt;/a&gt;&amp;#8221;  to you saying that he decided to go to Europe on a whim this past weekend but now  needs you to wire him money immediately and he&amp;#8217;ll explain everything later: &lt;em&gt;that&lt;/em&gt; guy.&lt;/p&gt; &lt;p&gt;You&amp;#8217;ve learned that passwords like &amp;#8220;thunder&amp;#8221;, &amp;#8220;thunder56&amp;#8221;, and even &amp;#8220;L0u|&amp;gt;Thund3r&amp;#8221;  are terrible because they&amp;#8217;re &lt;a href="http://www.wired.com/politics/security/commentary/securitymatters/2007/01/72458?currentPage=all" title="Password recovery tools are pretty good these days."&gt; easily guessed&lt;/a&gt;. 
You now know that the most important aspect of a password is  its &lt;a href="http://xkcd.com/936/" title="&amp;#8220;correct horse battery staple&amp;#8221; is a start, but character variation and padding help a lot"&gt;length&lt;/a&gt; combined with &lt;a href="https://www.grc.com/haystack.htm" title="Steve Gibson&amp;#8217;s Password Haystacks page is worth at least a quick glance."&gt;basic padding and character variation&lt;/a&gt;  such as &amp;#8220;/* Thunder is coming! */&amp;#8221;, &amp;#8220;I hear &amp;lt;em&amp;gt;thunder&amp;lt;/em&amp;gt;!&amp;#8221;, or &amp;#8220;1.big.BOOM@thunder.mil&amp;#8221;.&lt;/p&gt; &lt;p&gt;In fact, you&amp;#8217;re probably clever enough that you don&amp;#8217;t create or remember  most of your passwords anymore. You use a &lt;a href="http://en.wikipedia.org/wiki/Password_manager"&gt;password manager&lt;/a&gt; like &lt;a href="https://lastpass.com/"&gt;LastPass&lt;/a&gt; or &lt;a href="http://keepass.info/"&gt;KeePass&lt;/a&gt;  to automatically generate and store unique and completely random passwords for all  of your accounts. This has simplified your life so that you only have to remember  your &amp;#8220;master password&amp;#8221; that will get you into where you keep all the rest of your  usernames and passwords.&lt;/p&gt; &lt;p&gt; &lt;img alt="Skeleton Key" height="320" src="http://3.bp.blogspot.com/-8NBM4ONkCGg/TsRkc_qt7QI/AAAAAAAAZG8/3jYrTXbui-8/s320/450px-Llave_bronce[1].jpg" align="left" style="border:0; margin: 0px 15px 15px 0px; display: inline" width="240"&gt;&lt;/p&gt; &lt;p&gt;You also understand that your email account credentials are a &amp;#8220;&lt;a href="http://www.codinghorror.com/blog/2008/06/please-give-us-your-email-password.html" title="It was especially sad in the Web&amp;#8217;s early day when so many sites asked for your email login to effectively spam your contacts. 
It&amp;#8217;s just inexcusable that some sites still do today."&gt;skeleton  key&lt;/a&gt;&amp;#8221; for almost everything else due to the widespread use of simple password  reset emails. For this very reason, you probably realize that it&amp;#8217;s critical to &lt;a href="http://googleblog.blogspot.com/2011/06/ensuring-your-information-is-safe.html" title="If you do use Gmail, really consider enabling this for your own safety and to prevent yourself from being *that* guy."&gt; protect your email login with &amp;#8220;two-factor&amp;#8221; authentication&lt;/a&gt;. That is, your email  account should at least be protected by:&lt;/p&gt; &lt;ol&gt;  &lt;li&gt;Something you know (your password) &lt;em&gt;and&lt;/em&gt; &lt;/li&gt;  &lt;li&gt;Something you have (your cellphone), that creates or receives a one-time   use code when you want to log in.&lt;/li&gt; &lt;/ol&gt; &lt;p&gt;On top of all of this, you try your best to follow the trusty advice that  your passwords should be ones that nobody could guess and you never ever &lt;a href="http://www.schneier.com/blog/archives/2005/06/write_down_your.html" title="Actually, it&amp;#8217;s probably reasonable to write them down and keep them in your wallet"&gt;write  them&lt;/a&gt; &lt;a href="http://blog.jgc.org/2010/12/write-your-passwords-down.html"&gt;down&lt;/a&gt;.&lt;/p&gt; &lt;p&gt;But what if something happens to you? If you&amp;#8217;ve done everything &amp;#8220;right,&amp;#8221;  then your master password and all your second factor details go with you.  &lt;/p&gt; &lt;p&gt;And then there are your encrypted files. Maybe you&amp;#8217;re keeping a &lt;a href="http://www.youtube.com/watch?feature=player_embedded&amp;amp;v=R4vkVHijdQk" title="You could use an encrypted journal or a separate email account. I wonder if &amp;#8220;dear.sophie.lee@gmail.com&amp;#8221; had two-factor authentication enabled on it. 
I mean, what happens if she writes back too early?"&gt; private journal&lt;/a&gt; for your children to read when they grow up. Perhaps you&amp;#8217;re  living in some spy novel life where you&amp;#8217;re worried that people will take you  out to prevent something you know from being discovered. Wherever you fall on the  spectrum, what do you do with such encrypted data?&lt;/p&gt; &lt;p&gt;Modern encryption is a bit scary because it&amp;#8217;s so good. If you use a decent  encryption program with a good password/key, then it&amp;#8217;s very likely that no one,  &lt;a href="http://www.extremetech.com/computing/105931-full-disk-encryption-is-too-good-says-us-intelligence-agency"&gt;not even a major government&lt;/a&gt;, could decrypt the file even after hundreds of years.  Encryption is great for keeping prying eyes out, but it could sadden survivors that  you want to have access to your data. The thought of something being lost forever might  make you almost yearn for the days when you just put everything into a good  safe that&amp;#8217;s rated by how many &lt;a href="http://en.wikipedia.org/wiki/Safe#Class_TL-15" title="For example, a TL-15 safe will resist abuse for about 15 minutes from people who know what they&amp;#8217;re doing."&gt;minutes&lt;/a&gt; it might slow  somebody down.&lt;/p&gt; &lt;p&gt;On a much lighter note, the &amp;#8220;something&amp;#8221; that happens to you doesn&amp;#8217;t have  to be so grim. Maybe you had a really relaxing three week vacation and now you can&amp;#8217;t  remember the exact keyboard combination of your password. 
Given that our brains  have to &lt;a href="http://www.radiolab.org/2007/jun/07/eternal-sunshine-of-the-spotless-rat/" title="Start listening at 16:45 to find out more about this interesting idea."&gt; recreate memories each time you recall something&lt;/a&gt;, it&amp;#8217;s possible  that you could stress yourself out so much trying to remember your password that  you effectively &amp;#8220;forget&amp;#8221; it. What do you do then? &lt;/p&gt; &lt;p&gt;When you put all your eggs into a password manager basket, you really want  to &lt;a href="http://herbison.com/herbison/broken_eggs_watch.html" title="Whether it was Carnegie or Twain, the phrase &amp;#8220;Put all your eggs in one basket and --- WATCH THAT BASKET!&amp;#8221; is some good advice."&gt;watch that basket&lt;/a&gt;.  Fortunately, creating a basic plan isn&amp;#8217;t that hard. &lt;/p&gt; &lt;h4&gt;A Proposed Solution&lt;/h4&gt; &lt;a href="http://en.wikipedia.org/wiki/Permissive_Action_Link"&gt; &lt;img alt="Example nuclear launch keys" height="320" src="http://2.bp.blogspot.com/-ndwAfLFbKIE/TsHnCT1FN0I/AAAAAAAAZBY/fILw-KRvtfY/s320/Nuclear_missile_launch_keys[1].jpg" align="right" style="border:0; margin: 0px 0px 15px 15px; display: inline" width="212"&gt;&lt;/a&gt; &lt;p&gt;Let&amp;#8217;s borrow an &lt;a href="http://www.biblegateway.com/passage/?search=Numbers%2035:30&amp;version=ESV" title="For example, the 2-3 witnesses concept appears several times in the Bible."&gt;ancient&lt;/a&gt; yet incredibly  useful idea: if it&amp;#8217;s really important to get your facts right about something, be  sure to have at least two or three witnesses. 
This is especially true concerning  matters of life and death but it also comes up when protecting really valuable things.&lt;/p&gt; &lt;p&gt;By the 20th century, this &amp;#8220;&lt;a href="http://en.wikipedia.org/wiki/Two-man_rule"&gt;two-man  rule&lt;/a&gt;&amp;#8221; was implemented in hardware to protect nuclear missiles  from being launched by a lone rogue person without proper authorization. The main  vault at &lt;a href="http://en.wikipedia.org/wiki/United_States_Bullion_Depository#Construction_and_security" title="Also known as the &amp;#8220;United States Bullion Depository&amp;#8221;"&gt; Fort Knox&lt;/a&gt; is locked by multiple combinations such that no single person is entrusted  with all of them. On the Internet, the master key for protecting the new secure  domain name system (&lt;a href="http://en.wikipedia.org/wiki/Domain_Name_System_Security_Extensions"&gt;DNSSEC&lt;/a&gt;) &lt;a href="http://www.schneier.com/blog/archives/2010/07/dnssec_root_key.html"&gt;is  split among 7 people from 6 different countries&lt;/a&gt; such that at least 5  people are needed to reconstruct it in the event of an Internet catastrophe.&lt;/p&gt; &lt;p&gt;If this idea is good enough for protecting nuclear weapons, the Fort Knox  vault, and one of the most critical security aspects on the Internet, it&amp;#8217;s probably  good enough for your password list. Besides, it can make a somewhat uncomfortable  process a little more fun.&lt;/p&gt; &lt;p&gt;Let&amp;#8217;s start with a simple example. Let&amp;#8217;s say that your master password  is &amp;#8220;1.big.BOOM@thunder.mil&amp;#8221;. You could  just write it out on a piece of paper and then use scissors to cut it up. This would  work if you wanted to split it among 2 people, but it has some notable downsides:&lt;/p&gt; &lt;ol&gt;  &lt;li&gt;It doesn&amp;#8217;t work if you want redundancy (i.e. 
any 2 of 3 people being   able to reconstruct it)&lt;/li&gt;  &lt;li&gt;Each piece would tell you something about the password and thus has   value on its own. Ideally, we&amp;#8217;d like the pieces to be worthless unless a threshold   of people came together.&lt;/li&gt;  &lt;li&gt;It doesn&amp;#8217;t really work for more complicated scenarios like requiring 5   of 7 people.&lt;/li&gt; &lt;/ol&gt; &lt;p&gt;Fortunately, some clever math can fix these issues and give you this ability  for free. I created a program called  &lt;a href="https://github.com/moserware/SecretSplitter"&gt;SecretSplitter&lt;/a&gt; to automate  all of this to hopefully make the whole process painless.&lt;/p&gt; &lt;p&gt;Let&amp;#8217;s say you want to require at least 2 witnesses to agree that something  happened to you before your secret is available. You also want to build in  redundancy such that &lt;em&gt;any&lt;/em&gt; pair of people can find out your password. For this scenario, you can keep the default settings and press the &amp;#8220;split&amp;#8221; button:&lt;/p&gt; &lt;p&gt; &lt;a href="http://4.bp.blogspot.com/-2key0wl4WCA/TsmS2sduePI/AAAAAAAAZIU/xU5sv04g6l0/s1600/SplitMessageSpecifyMessageThresholdAndShares.png" title="Specify the message &amp;#8220;1.big.BOOM@thunder.mil&amp;#8221;"&gt; &lt;img alt="Specifying message" height="353" src="http://4.bp.blogspot.com/-2key0wl4WCA/TsmS2sduePI/AAAAAAAAZIU/xU5sv04g6l0/s576/SplitMessageSpecifyMessageThresholdAndShares.png" width="576" /&gt;&lt;/a&gt;&lt;/p&gt; &lt;p&gt;You&amp;#8217;ll get this list of split pieces: &lt;/p&gt; &lt;p&gt; &lt;a href="http://3.bp.blogspot.com/-Or-lhAwWbYo/TsmfaToakII/AAAAAAAAZIg/zN2D719yU8s/s1600/SplitMessageShares.png" title="List of message shares"&gt; &lt;img alt="List of message shares" height="428" src="http://3.bp.blogspot.com/-Or-lhAwWbYo/TsmfaToakII/AAAAAAAAZIg/zN2D719yU8s/s576/SplitMessageShares.png" width="576"&gt;&lt;/a&gt;&lt;/p&gt; &lt;p&gt;Notice that each piece is twice as long as 
your original message (about  twice the size of a package tracking number). This is by design. &lt;/p&gt; &lt;p&gt;Now comes the hard part: you have to select three people you trust. You  should have high confidence in anyone you&amp;#8217;d entrust with a secret piece. It&amp;#8217;s easy to  get caught up in &lt;a href="http://xkcd.com/538/"&gt;gee-whiz cryptography&lt;/a&gt; and miss  fundamentals: you ultimately have to &lt;a href="http://cm.bell-labs.com/who/ken/trust.html" title="&amp;#8220;Reflections on Trusting Trust&amp;#8221; is a fascinating read about the fundamentals of security."&gt;trust something&lt;/a&gt;, especially  with important matters. SecretSplitter provides a trust circuit breaker just  in case (because even well-meaning people can &lt;a href="http://abcnews.go.com/WN/president-bill-clinton-lost-nuclear-codes-office-book/story?id=11930878" title="Like the nuclear biscuit"&gt; lose&lt;/a&gt; &lt;a href="http://www.theatlantic.com/politics/archive/2010/10/why-clintons-losing-the-nuclear-biscuit-was-really-really-bad/65009/" title="Thankfully it wasn&amp;#8217;t needed"&gt;important things&lt;/a&gt;). The splitting process adds a bit of complexity, but so do real circuit breakers.  If you trust no one, then you can&amp;#8217;t have anyone help you if something happens.&lt;/p&gt;&lt;p&gt; For  demonstration purposes, let&amp;#8217;s say you trust 3 people. &lt;/p&gt; &lt;p&gt;You now have to distribute these secret pieces. You could do all sorts  of clever things like &lt;a href="http://www.hulu.com/watch/24493/back-to-the-future-part-ii-letter-from-doc" title="Like Doc did to Marty in &amp;#8220;Back to the Future Part II&amp;#8221;"&gt; send letters to people that will be delivered far in the future&lt;/a&gt; or read them  over the phone. 
However, distributing them in person is a pretty good option:&lt;/p&gt; &lt;p&gt; &lt;a href="http://1.bp.blogspot.com/-GlUFOp6vyNM/TsSJI4i3XpI/AAAAAAAAZHs/54reXSGrrfA/s1600/CreateShareEnvelope.JPG"&gt; &lt;img alt="Creating an envelope with a share" height="432" src="http://1.bp.blogspot.com/-GlUFOp6vyNM/TsSJI4i3XpI/AAAAAAAAZHs/54reXSGrrfA/s576/CreateShareEnvelope.JPG" width="576"&gt;&lt;/a&gt;&lt;/p&gt; &lt;p&gt;It can make the upcoming holiday table discussions even more fun:&lt;/p&gt; &lt;p&gt; &lt;a href="http://3.bp.blogspot.com/-Tme-qGpMt3w/TsSIfYsPoYI/AAAAAAAAZHg/1VhIRxy_yCs/s1600/ShareHandoff.JPG"&gt; &lt;img alt="Handing over the envelope with the secret piece" height="432" src="http://3.bp.blogspot.com/-Tme-qGpMt3w/TsSIfYsPoYI/AAAAAAAAZHg/1VhIRxy_yCs/s576/ShareHandoff.JPG" width="576"&gt;&lt;/a&gt;&lt;/p&gt; &lt;p&gt;Let&amp;#8217;s pretend that something happened to you. Two of the three family members that you gave pieces to would  come together and agree that &amp;#8220;something&amp;#8221; indeed has happened to you. What happens  now?&lt;/p&gt; &lt;p&gt; &lt;a href="http://2.bp.blogspot.com/-bOPvXCcnFP0/TsSJkvTNn5I/AAAAAAAAZH4/jhNLjOp1r5k/s1600/TwoEnvelopesOpened.JPG"&gt; &lt;img alt="Two opened envelopes with secret shares" height="576" src="http://2.bp.blogspot.com/-bOPvXCcnFP0/TsSJkvTNn5I/AAAAAAAAZH4/jhNLjOp1r5k/s576/TwoEnvelopesOpened.JPG" width="432"&gt;&lt;/a&gt;&lt;/p&gt; &lt;p&gt;Well, either you included a note with each secret piece or you emailed  them previously with instructions that they&amp;#8217;d just need to download and run  this small program. 
The pair comes together at a laptop and they each type their piece in quickly and then press &amp;#8220;Recover&amp;#8221;:&lt;/p&gt; &lt;p&gt; &lt;a href="http://2.bp.blogspot.com/-x3K9S0BJ0bw/TsminW_VM1I/AAAAAAAAZIs/FV_MZWOj0Yc/s1600/RecoverMessageWithTypo.png"&gt; &lt;img alt="Typing in secret shares with a typo" height="355" src="http://2.bp.blogspot.com/-x3K9S0BJ0bw/TsminW_VM1I/AAAAAAAAZIs/FV_MZWOj0Yc/s576/RecoverMessageWithTypo.png" width="572"&gt;&lt;/a&gt;&lt;/p&gt; &lt;p&gt;Oops... they typed so quickly that they mixed up one of the digits. It  told us where to look:&lt;/p&gt; &lt;p&gt; &lt;a href="http://4.bp.blogspot.com/-eEgbphrpUvQ/TsmjFmbmHEI/AAAAAAAAZI4/5xqF-rX2MIo/s1600/RecoverMessageTypoWarning.png"&gt; &lt;img alt="Warning about typo" height="359" src="http://4.bp.blogspot.com/-eEgbphrpUvQ/TsmjFmbmHEI/AAAAAAAAZI4/5xqF-rX2MIo/s576/RecoverMessageTypoWarning.png" width="576"&gt;&lt;/a&gt;&lt;/p&gt; &lt;p&gt;They fix the typo and press recover again:&lt;/p&gt; &lt;p&gt; &lt;a href="http://1.bp.blogspot.com/-gKwkODAk2tk/TsmoVJV5ZqI/AAAAAAAAZJc/rNwb7aXqfnk/s1600/RecoverMessageTypoFixed.png"&gt; &lt;img alt="Fixed the typo" height="355" src="http://1.bp.blogspot.com/-gKwkODAk2tk/TsmoVJV5ZqI/AAAAAAAAZJc/rNwb7aXqfnk/s576/RecoverMessageTypoFixed.png" width="572"&gt;&lt;/a&gt;&lt;/p&gt; &lt;p&gt;And immediately they see:&lt;/p&gt; &lt;p&gt; &lt;a href="http://3.bp.blogspot.com/-WLjHmSNf0O4/Tsmo8cCS3XI/AAAAAAAAZJo/kGqfPTGLvE0/s1600/RecoveredMessage.png"&gt; &lt;img alt="Recovered message" height="355" src="http://3.bp.blogspot.com/-WLjHmSNf0O4/Tsmo8cCS3XI/AAAAAAAAZJo/kGqfPTGLvE0/s576/RecoveredMessage.png" width="572" /&gt;&lt;/a&gt;&lt;/p&gt; &lt;p&gt;Password recovered! They could now use this master password to log into  your password manager where you&amp;#8217;ve stored further details.&lt;/p&gt; &lt;p&gt;This &amp;#8220;message&amp;#8221; approach is useful if you have a small amount of data such as a  password that you could write on a piece of paper. 
One downside is that each piece is twice the size of the text message.  If your message becomes much larger then it will no longer be feasible to type it in manually.&lt;/p&gt; &lt;p&gt;One alternative approach is to bundle together all of your important files  into a zip file:&lt;/p&gt; &lt;p&gt; &lt;a href="http://1.bp.blogspot.com/-ipAnpuK26Hg/TsKAGXULYBI/AAAAAAAAZDE/Zqf6Wdmfklc/s1600/CompressedFileExample.png"&gt; &lt;img alt="Example of a compressed file contents" height="242" src="http://1.bp.blogspot.com/-ipAnpuK26Hg/TsKAGXULYBI/AAAAAAAAZDE/Zqf6Wdmfklc/s576/CompressedFileExample.png" width="296"&gt;&lt;/a&gt;&lt;/p&gt; &lt;p&gt;To split this file, you&amp;#8217;d click the &amp;#8220;Create&amp;#8221; tab and then find the file,  set the number of shares and click &amp;#8220;Save&amp;#8221;:&lt;/p&gt; &lt;p&gt; &lt;a href="http://3.bp.blogspot.com/-WZ9IT1nQ9_8/TsmrHgVFQhI/AAAAAAAAZJ0/SLsA5a_5dGg/s1600/SplitFileSpecifyFileAndShares.png"&gt; &lt;img alt="Splitting up a file" height="299" src="http://3.bp.blogspot.com/-WZ9IT1nQ9_8/TsmrHgVFQhI/AAAAAAAAZJ0/SLsA5a_5dGg/s576/SplitFileSpecifyFileAndShares.png" width="525" /&gt;&lt;/a&gt;&lt;/p&gt; &lt;p&gt;You&amp;#8217;ll then be told:&lt;/p&gt; &lt;p&gt; &lt;a href="http://3.bp.blogspot.com/-TOIejemR5HI/TsmsE7ng-XI/AAAAAAAAZKA/37dVpipm_TE/s1600/SplitFileSaveMessageBox.png"&gt; &lt;img alt="MessageBox asking you to save the file" height="171" src="http://3.bp.blogspot.com/-TOIejemR5HI/TsmsE7ng-XI/AAAAAAAAZKA/37dVpipm_TE/s576/SplitFileSaveMessageBox.png" width="431" /&gt;&lt;/a&gt;&lt;/p&gt; &lt;p&gt;And then you pick where to save the encrypted file:&lt;/p&gt; &lt;p&gt; &lt;a href="http://4.bp.blogspot.com/-K6EedSHTiSM/TsKDt1ngRVI/AAAAAAAAZDo/jXogw62iqQQ/s1600/SplitFileSaveDialog.png"&gt; &lt;img alt="Save file dialog" height="102" src="http://4.bp.blogspot.com/-K6EedSHTiSM/TsKDt1ngRVI/AAAAAAAAZDo/jXogw62iqQQ/s576/SplitFileSaveDialog.png" width="576"&gt;&lt;/a&gt;&lt;/p&gt; &lt;p&gt;Finally, you&amp;#8217;ll see 
this screen:&lt;/p&gt; &lt;p&gt; &lt;a href="http://4.bp.blogspot.com/-oENayRV6e_Y/TsmtUsGS1aI/AAAAAAAAZKM/HKs8T2qVpqk/s1600/SplitFileShares.png"&gt; &lt;img alt="Split file pieces" height="556" src="http://4.bp.blogspot.com/-oENayRV6e_Y/TsmtUsGS1aI/AAAAAAAAZKM/HKs8T2qVpqk/s576/SplitFileShares.png" width="557" /&gt;&lt;/a&gt;&lt;/p&gt; &lt;p&gt;This creates a slightly more complicated scenario because you now have 2 things  to share: the secret pieces and the encrypted file with all your data. The  encrypted file doesn&amp;#8217;t have to be secret at all. You can safely email it to people  that have a secret piece:&lt;/p&gt; &lt;p&gt; &lt;a href="http://4.bp.blogspot.com/-6tgMfuudCcE/TsKKnwFAokI/AAAAAAAAZEQ/er0vI1PDRGw/s1600/SplitFileEmail.png"&gt; &lt;img alt="Sending the fun email" height="365" src="http://4.bp.blogspot.com/-6tgMfuudCcE/TsKKnwFAokI/AAAAAAAAZEQ/er0vI1PDRGw/s576/SplitFileEmail.png" width="576"&gt;&lt;/a&gt;&lt;/p&gt; &lt;p&gt;Now, if something happens to you, they&amp;#8217;d run the program, and type in two  shares and press &amp;#8220;Recover&amp;#8221;:&lt;/p&gt; &lt;p&gt; &lt;a href="http://2.bp.blogspot.com/-2ycvvqebjac/Tsmu_SrtvXI/AAAAAAAAZKY/1RlJksqIe5E/s1600/RecoverFileShares.png"&gt; &lt;img alt="Entering in file shares" height="398" src="http://2.bp.blogspot.com/-2ycvvqebjac/Tsmu_SrtvXI/AAAAAAAAZKY/1RlJksqIe5E/s576/RecoverFileShares.png" width="525" /&gt;&lt;/a&gt;&lt;/p&gt; &lt;p&gt;It&amp;#8217;ll then tell them:&lt;/p&gt; &lt;p&gt; &lt;a href="http://4.bp.blogspot.com/-wTvIkMC4n50/TsmvzpNij-I/AAAAAAAAZKk/nlHumsCNVsQ/s1600/RecoverFileSpecifyEncryptedFileMessageBox.png"&gt; &lt;img alt="Specify encrypted file MessageBox" height="184" src="http://4.bp.blogspot.com/-wTvIkMC4n50/TsmvzpNij-I/AAAAAAAAZKk/nlHumsCNVsQ/s576/RecoverFileSpecifyEncryptedFileMessageBox.png" width="496" /&gt;&lt;/a&gt;&lt;/p&gt; &lt;p&gt;They&amp;#8217;d then go to their email and search for the email from you that includes  your encrypted file:&lt;/p&gt; &lt;p&gt; 
&lt;a href="http://2.bp.blogspot.com/-UIG-SCT_EMM/TsKM-la3y6I/AAAAAAAAZE0/e4ZXnzLP_DE/s1600/RecoverEmailSearch.png"&gt; &lt;img alt="Searching email" height="38" src="http://2.bp.blogspot.com/-UIG-SCT_EMM/TsKM-la3y6I/AAAAAAAAZE0/e4ZXnzLP_DE/s576/RecoverEmailSearch.png" width="370" /&gt;&lt;/a&gt;&lt;/p&gt; &lt;p&gt;Then they&amp;#8217;d find the single message (or the latest one if you sent out  updates) and download your encrypted attachment:&lt;/p&gt; &lt;p&gt; &lt;a href="http://2.bp.blogspot.com/-07Lvyv3E9OU/TsKOQcUYQtI/AAAAAAAAZFM/SaCiYXOISJs/s1600/RecoverEmailFound.png"&gt; &lt;img alt="Found email" height="29" src="http://2.bp.blogspot.com/-07Lvyv3E9OU/TsKOQcUYQtI/AAAAAAAAZFM/SaCiYXOISJs/s576/RecoverEmailFound.png" width="392" /&gt;&lt;/a&gt;&lt;/p&gt; &lt;p&gt;They&amp;#8217;d then go back to the program to open it up:&lt;/p&gt; &lt;p&gt; &lt;a href="http://1.bp.blogspot.com/-p17NqyGAuzQ/TsKPIS3DCJI/AAAAAAAAZFY/XuRjjlQ6qUI/s1600/RecoverFileOpen.png"&gt; &lt;img alt="Opening the file" height="71" src="http://1.bp.blogspot.com/-p17NqyGAuzQ/TsKPIS3DCJI/AAAAAAAAZFY/XuRjjlQ6qUI/s576/RecoverFileOpen.png" width="542" /&gt;&lt;/a&gt;&lt;/p&gt; &lt;p&gt;and then they&amp;#8217;d see a message to be careful where they saved it:&lt;/p&gt; &lt;p&gt; &lt;a href="http://1.bp.blogspot.com/-Boue_w0B6p4/TsmxNb_4voI/AAAAAAAAZKw/c4LwB75ZhxI/s1600/RecoverFileSafetyWarning.png"&gt; &lt;img alt="Will you keep the data safe?" 
height="199" src="http://1.bp.blogspot.com/-Boue_w0B6p4/TsmxNb_4voI/AAAAAAAAZKw/c4LwB75ZhxI/s576/RecoverFileSafetyWarning.png" width="490" /&gt;&lt;/a&gt;&lt;/p&gt; &lt;p&gt;and then they&amp;#8217;d save it:&lt;/p&gt; &lt;p&gt; &lt;a href="http://2.bp.blogspot.com/-7EBY2HrKshc/TsKQrF5NGsI/AAAAAAAAZF8/SFf_3srCKmw/s1600/SaveDecryptedFile.png"&gt; &lt;img alt="Save decrypted" height="96" src="http://2.bp.blogspot.com/-7EBY2HrKshc/TsKQrF5NGsI/AAAAAAAAZF8/SFf_3srCKmw/s576/SaveDecryptedFile.png" width="554" /&gt;&lt;/a&gt;&lt;/p&gt; &lt;p&gt;They'd then be asked if they want to open the decrypted file, to which they&amp;#8217;d answer &amp;#8220;Yes&amp;#8221;:&lt;/p&gt; &lt;p&gt; &lt;a href="http://2.bp.blogspot.com/-g8X74oYh63M/Tsmz9ED6cII/AAAAAAAAZK8/rj_RK1FIZUQ/s1600/RecoverFileOpenDecryptedFileMessageBox.png"&gt; &lt;img alt="Open decrypted file?" height="171" src="http://2.bp.blogspot.com/-g8X74oYh63M/Tsmz9ED6cII/AAAAAAAAZK8/rj_RK1FIZUQ/s576/RecoverFileOpenDecryptedFileMessageBox.png" width="342" /&gt;&lt;/a&gt;&lt;/p&gt; &lt;p&gt;Now they can see everything:&lt;/p&gt; &lt;p&gt; &lt;a href="http://1.bp.blogspot.com/-ipAnpuK26Hg/TsKAGXULYBI/AAAAAAAAZDE/Zqf6Wdmfklc/s1600/CompressedFileExample.png"&gt; &lt;img alt="Example of a compressed file contents" height="242" src="http://1.bp.blogspot.com/-ipAnpuK26Hg/TsKAGXULYBI/AAAAAAAAZDE/Zqf6Wdmfklc/s576/CompressedFileExample.png" width="296"&gt;&lt;/a&gt;&lt;/p&gt;&lt;p&gt;It might sound complicated, but if you&amp;#8217;re familiar with the process, it  might only take a minute. If you&amp;#8217;re not tech savvy and have never done it before  and type slowly, it might take 30 minutes. In either case, it&amp;#8217;s faster than having  to drive to your home and search around for a folder and it contains everything  you wanted people to know (especially when things are time sensitive).&lt;/p&gt; &lt;p&gt;That&amp;#8217;s it! Your master password and important data are now backed up. 
The  risk is distributed: if any one piece is compromised (i.e. gets lost or misplaced),  you can have everyone else destroy their secret piece and nothing will be leaked. Also,  the program has an advanced feature that lets you save the file encryption key. This feature  allows you to send out updated encrypted files that can be decrypted with the pieces  you&amp;#8217;ve already established in person.&lt;/p&gt; &lt;p&gt;SecretSplitter implements a &amp;#8220;(t,n) &lt;a href="http://en.wikipedia.org/wiki/Threshold_cryptosystem"&gt;threshold cryptosystem&lt;/a&gt;&amp;#8221;  which can be thought of as a mathematical generalization of the physical two-man  rule. The idea is that you split up a secret into pieces (called &amp;#8220;shares&amp;#8221;) and require  at least a threshold of &amp;#8220;t&amp;#8221; shares to be present in order to recover the secret.  If you have fewer than &amp;#8220;t&amp;#8221; shares, you gain no information about the secret. Whatever  threshold you use, it&amp;#8217;s really important that each &amp;#8220;shareholder&amp;#8221; know the threshold  number of shares.&lt;/p&gt; &lt;p&gt;You can be quite creative in setting the threshold and distributing shares.  For example, you can trust your spouse more by giving her more shares than anyone else. The key idea  is that &lt;strong&gt;a share is an atomic unit of trust&lt;/strong&gt;. You can give more than one unit of trust  to a person, but you can never give less.&lt;/p&gt; &lt;p&gt;Another important practical concern is that you should consider adding  redundancy to any threshold system. This is easily achieved by creating more shares  than the threshold number. 
The reason is that if you&amp;#8217;re going out of your way to  use a threshold system, then you probably want to make sure you have a backup plan  in case one or more of the shares are unavailable.&lt;/p&gt; &lt;p&gt;&lt;strong&gt;IMPORTANT LEGAL NOTE&lt;/strong&gt;: It&amp;#8217;s tempting to keep everything, including the important  directives and your will in only electronic form (even when they&amp;#8217;re signed). Unfortunately,  most states require the original signed documents to be considered legal and most courts will  not accept a copy. For this reason, you should still have the paper originals somewhere  such as a fireproof safe. However, be careful where you put the originals: although  it might sound convenient to put them in a bank safety deposit box, there&amp;#8217;s usually  a rather long waiting period before a bank can legally provide access to  your box to a survivor, so don&amp;#8217;t put any time sensitive items there. My recommendation  at the current time would be to include copies of the signed originals in your encrypted  file and also include detailed instructions on where the originals are located and  how to access them.&lt;/p&gt; &lt;h4&gt;How It Works&lt;/h4&gt; &lt;p&gt;Given the sensitive nature of the data being protected, I wanted to make  sure I understood every part of the mathematics involved and literally every bit  of the encrypted file. You&amp;#8217;re more than welcome to just use the program without  fully understanding the details, but I encourage people to verify my math and code  if you&amp;#8217;re able and curious. &lt;/p&gt; &lt;p&gt;To get started, recall that computers work with &lt;a href="http://en.wikipedia.org/wiki/Bit"&gt;bits&lt;/a&gt;: 1&amp;#8217;s and 0&amp;#8217;s that can represent  anything. 
For example, the &lt;a href="http://en.wikipedia.org/wiki/UTF-8"&gt;most popular  way of encoding text&lt;/a&gt; will encode &amp;#8220;thunder&amp;#8221; in binary as &lt;/p&gt; &lt;p&gt;01110100 01101000 01110101 01101110 01100100 01100101 01110010&lt;/p&gt; &lt;p&gt;We can write this more efficiently using &lt;a href="http://en.wikipedia.org/wiki/Hexadecimal"&gt;hexadecimal&lt;/a&gt; notation as:  74 68 75 6E 64 65 72. We can also treat this entire sequence of bits as a single  55 bit number whose decimal representation just happens to be 32,765,950,870,971,762.  In fact, &lt;em&gt;any&lt;/em&gt; piece of data can be converted to a single number.&lt;/p&gt; &lt;p&gt;Now that we have a single number, let&amp;#8217;s go back to your algebra class and  remember the equation for a &lt;a href="http://en.wikipedia.org/wiki/Line_(geometry)"&gt;line&lt;/a&gt;:&amp;nbsp; y=mx+b.&lt;/p&gt; &lt;p&gt; &lt;a href="http://4.bp.blogspot.com/-GxROrVUIspY/TsZ0uhWdD1I/AAAAAAAAZII/T2hs8dwPZGk/s1600/LineShowingIntercept.png"&gt; &lt;img alt="Line showing intercept" height="351" src="http://4.bp.blogspot.com/-GxROrVUIspY/TsZ0uhWdD1I/AAAAAAAAZII/T2hs8dwPZGk/s576/LineShowingIntercept.png" width="346"&gt;&lt;/a&gt;&lt;/p&gt; &lt;p&gt;In this equation, &amp;#8220;b&amp;#8221; is the &amp;#8220;&lt;a href="http://en.wikipedia.org/wiki/Y-intercept"&gt;y-intercept&lt;/a&gt;&amp;#8221;,  which is where the line crosses the y-axis. The &amp;#8220;m&amp;#8221; value is the  &lt;a href="http://en.wikipedia.org/wiki/Slope"&gt;slope&lt;/a&gt; and represents  how steep the line is (i.e. its &amp;#8220;&lt;a href="http://en.wikipedia.org/wiki/Grade_(slope)"&gt;grade&lt;/a&gt;&amp;#8221;  if it were a hill).&lt;br /&gt; &lt;/p&gt; &lt;p&gt;This is all the core math you need to understand splitting secrets. In  our particular case, our secret message is always represented by the y-intercept  (i.e. &amp;#8220;b&amp;#8221; in y=mx+b). We want to create a line that will go through this point.  
Recall that a line could go through this point at any angle. The slope (i.e. &amp;#8220;m&amp;#8221;  in y=mx+b) will direct us where it goes. For things to work securely, the slope  must be a random number.&lt;/p&gt; &lt;p&gt;Although we use large numbers in practice for security reasons, let&amp;#8217;s keep  it simple here. Let&amp;#8217;s say our secret number is &amp;#8220;7&amp;#8221; and our random slope is &amp;#8220;3.&amp;#8221;  These choices generate this line:&lt;/p&gt; &lt;p&gt; &lt;a href="http://4.bp.blogspot.com/-IasTDH3sDpU/TsMzQA6quLI/AAAAAAAAZGY/CracopxkEBY/s1600/Line3xp7.png"&gt; &lt;img alt="y=3x+7" height="369" src="http://4.bp.blogspot.com/-IasTDH3sDpU/TsMzQA6quLI/AAAAAAAAZGY/CracopxkEBY/s576/Line3xp7.png" width="440"&gt;&lt;/a&gt;&lt;/p&gt; &lt;p&gt;With this equation, we can generate an infinite number of points on the  line. For example, we can pick the first three points: (1, 10), (2,  13), and (3, 16):&lt;/p&gt; &lt;p&gt; &lt;a href="http://3.bp.blogspot.com/-PegkAHn5TSk/TsM1TG4VLLI/AAAAAAAAZGk/ZTZbRW5nmOU/s1600/Line3points.png"&gt; &lt;img alt="3 points" height="492" src="http://3.bp.blogspot.com/-PegkAHn5TSk/TsM1TG4VLLI/AAAAAAAAZGk/ZTZbRW5nmOU/s576/Line3points.png" width="576"&gt;&lt;/a&gt;&lt;/p&gt; &lt;p&gt;You can see that if you had any two of these points, you could find  the y-intercept.&lt;/p&gt; &lt;p&gt;It&amp;#8217;s critical to realize that having just one of these points gives us  no useful information about the line. However, having any other point on the line  would allow us to use a ruler and draw a straight line to the y-intercept and thus  reveal the secret (we could also work it out algebraically). 
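&lt;/p&gt; &lt;p&gt;As a rough check of the line example, here is a minimal Python sketch. It is not SecretSplitter&amp;#8217;s actual code (which works in a finite field rather than with ordinary integers), and the function names &lt;code&gt;make_line_shares&lt;/code&gt; and &lt;code&gt;recover_intercept&lt;/code&gt; are made up for illustration. It builds the three shares from y=3x+7 and recovers the y-intercept from any two of them:&lt;/p&gt;
&lt;pre&gt;
```python
from fractions import Fraction
from itertools import combinations


def make_line_shares(secret, slope, count):
    """Return points (x, y) on y = slope*x + secret for x = 1..count."""
    return [(x, slope * x + secret) for x in range(1, count + 1)]


def recover_intercept(p1, p2):
    """Find b in y = m*x + b from two distinct points on the line."""
    (x1, y1), (x2, y2) = p1, p2
    m = Fraction(y2 - y1, x2 - x1)  # slope between the two points
    return y1 - m * x1              # solve y1 = m*x1 + b for b


# Secret 7 and slope 3 are the toy values from the text; a real slope is random.
shares = make_line_shares(secret=7, slope=3, count=3)
print(shares)  # [(1, 10), (2, 13), (3, 16)]

for a, b in combinations(shares, 2):
    print(a, b, recover_intercept(a, b))  # the intercept is 7 for every pair
```
&lt;/pre&gt;
&lt;p&gt;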
Each point represents a secret piece or &amp;#8220;share&amp;#8221; and has a unique  &amp;#8220;x&amp;#8221; and &amp;#8220;y&amp;#8221; value.&lt;/p&gt; &lt;p&gt;The mathematically fascinating part about this idea is that a line is just  a simple &lt;a href="http://en.wikipedia.org/wiki/Polynomial"&gt;polynomial&lt;/a&gt; (curve)  and this technique works for polynomials of arbitrarily large &lt;a href="http://en.wikipedia.org/wiki/Polynomial#Degree"&gt;degrees&lt;/a&gt;. For example,  a second degree polynomial is a &lt;a href="http://en.wikipedia.org/wiki/Parabola"&gt; parabola&lt;/a&gt; that requires 3 unique points to completely define it (one more than  a line). Its equation is of the form y=ax^2 + bx + c. In our case &amp;#8220;c&amp;#8221; is the y-intercept  and &amp;#8220;a&amp;#8221; and &amp;#8220;b&amp;#8221; are random as in y = 2x^2 + 3x + 7:&lt;/p&gt; &lt;p&gt;Given this equation, we can generate as many &amp;#8220;shares&amp;#8221; as we&amp;#8217;d like: (1,12),  (2,21), (3,34), (4,51), etc.&lt;/p&gt; &lt;p&gt;Keep in mind that a parabola requires three points to uniquely define  it. If you just had two points, as in (1,12) and (2,21), you could create an infinite  number of parabolas going through these points and thus have infinite choices  for what the y-intercept (i.e. 
your secret) could be:&lt;/p&gt; &lt;p&gt; &lt;a href="http://2.bp.blogspot.com/-1isSzAsFj_o/TsPeiJxUYLI/AAAAAAAAZGw/L3Lz5K9abCA/s1600/Parabola6Curves.png"&gt; &lt;img alt="6 parabolas going through the same two points" height="387" src="http://2.bp.blogspot.com/-1isSzAsFj_o/TsPeiJxUYLI/AAAAAAAAZGw/L3Lz5K9abCA/s576/Parabola6Curves.png" width="503"&gt;&lt;/a&gt;&lt;/p&gt; &lt;p&gt;However, a third point will define the parabola and its y-intercept  exactly:&lt;/p&gt; &lt;p&gt; &lt;a href="http://4.bp.blogspot.com/-nZwrzYySLmY/TsRmAjsT2xI/AAAAAAAAZHI/Hj4Q9fwFZDI/s1600/ParabolaSingleCurve.png"&gt; &lt;img alt="Unique parabola" height="351" src="http://4.bp.blogspot.com/-nZwrzYySLmY/TsRmAjsT2xI/AAAAAAAAZHI/Hj4Q9fwFZDI/s576/ParabolaSingleCurve.png" width="498"&gt;&lt;/a&gt;&lt;/p&gt; &lt;p&gt;You&amp;#8217;ve just learned that splitting a secret that requires three people is just a matter of creating a parabola. Requiring more people is just a matter of creating a higher-degree polynomial such as a &lt;a href="http://en.wikipedia.org/wiki/Cubic_function"&gt;cubic&lt;/a&gt; or &lt;a href="http://en.wikipedia.org/wiki/Quartic_function"&gt;quartic&lt;/a&gt; polynomial. If you understand this basic idea,  the rest is just details:&lt;/p&gt; &lt;ol&gt;  &lt;li&gt;Instead of using numbers, we translate the data to a big polynomial  &lt;a href="http://en.wikipedia.org/wiki/GF(2)"&gt;with binary coefficients&lt;/a&gt;.&lt;/li&gt;  &lt;li&gt;Instead of using middle school algebra, we use a &amp;#8220;&lt;a href="http://en.wikipedia.org/wiki/Finite_field"&gt;finite   field&lt;/a&gt;.&amp;#8221; This helps keep results about the same size as the input and adds   some security.&lt;/li&gt; &lt;/ol&gt; &lt;p&gt;Don&amp;#8217;t be intimidated by these changes. The core ideas are the same as the  basic case. The only noticeable difference is that you have to think of operations  like multiplication and division in a more abstract way. 
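&lt;/p&gt; &lt;p&gt;To give a feel for what &amp;#8220;more abstract&amp;#8221; arithmetic looks like, here is a small Python sketch of the same parabola idea inside a finite field with binary coefficients. It is only an illustration under stated assumptions, not the SecretSplitter implementation: it uses a tiny 8-bit field with the reduction polynomial from AES (0x11B), fixed coefficients instead of random ones, and helper names (&lt;code&gt;gf_mul&lt;/code&gt;, &lt;code&gt;gf_inv&lt;/code&gt;, &lt;code&gt;evaluate&lt;/code&gt;, &lt;code&gt;recover_secret&lt;/code&gt;) that are invented here. Addition and subtraction both become XOR, multiplication is the peasant method, and the y-intercept is found by Lagrange interpolation at x=0:&lt;/p&gt;
&lt;pre&gt;
```python
MODULUS = 0x11B  # x^8 + x^4 + x^3 + x + 1, the AES polynomial (an assumption here)


def gf_mul(a, b):
    """Peasant multiplication in GF(2^8): adding is XOR, doubling may need a reduction."""
    result = 0
    while b != 0:
        if b % 2 == 1:      # lowest bit of b is set
            result ^= a
        b //= 2             # drop the lowest bit of b
        a *= 2              # multiply a by x
        if a // 256 == 1:   # a degree-8 term appeared, so reduce
            a ^= MODULUS
    return result


def gf_inv(a):
    """Inverse via a^254, since a^255 == 1 for every nonzero a in this field."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result


def evaluate(coefficients, x):
    """Horner's scheme; coefficients run from the highest degree down to the secret."""
    y = 0
    for c in coefficients:
        y = gf_mul(y, x) ^ c
    return y


def recover_secret(shares):
    """Lagrange interpolation at x = 0 (subtraction is also XOR in this field)."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        term = yi
        for j, (xj, _) in enumerate(shares):
            if i != j:
                term = gf_mul(term, gf_mul(xj, gf_inv(xi ^ xj)))
        secret ^= term
    return secret


# A "parabola" whose constant term is the secret 7; 2 and 3 stand in for random values.
coefficients = [2, 3, 7]
shares = [(x, evaluate(coefficients, x)) for x in range(1, 5)]
print(recover_secret(shares[:3]))  # 7
print(recover_secret(shares[1:]))  # 7
```
&lt;/pre&gt;
&lt;p&gt;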
For details, check out my source  code&amp;#8217;s use of &lt;a href="https://github.com/moserware/SecretSplitter/blob/1b54b72a87d4bdcc5c84b12b36f17fca382d551d/SecretSplitter/Algebra/FiniteFieldPolynomial.cs#L40"&gt;Horner&amp;#8217;s scheme&lt;/a&gt;  for evaluating polynomials, &lt;a href="https://github.com/moserware/SecretSplitter/blob/1b54b72a87d4bdcc5c84b12b36f17fca382d551d/SecretSplitter/Algebra/FiniteFieldPolynomial.cs#L63"&gt;peasant multiplication&lt;/a&gt;, &lt;a href="https://github.com/moserware/SecretSplitter/blob/1b54b72a87d4bdcc5c84b12b36f17fca382d551d/SecretSplitter/Algebra/IrreduciblePolynomial.cs#L12"&gt;irreducible polynomials&lt;/a&gt; &lt;a href="http://math.stackexchange.com/questions/14787/finding-irreducible-polynomials-over-gf2-with-the-fewest-terms"&gt; with the fewest terms&lt;/a&gt;,  &lt;a href="https://github.com/moserware/SecretSplitter/blob/1b54b72a87d4bdcc5c84b12b36f17fca382d551d/SecretSplitter/Algebra/LagrangeInterpolator.cs#L22"&gt;Lagrange polynomial interpolation&lt;/a&gt; to find the y-intercept, and using  &lt;a href="https://github.com/moserware/SecretSplitter/blob/1b54b72a87d4bdcc5c84b12b36f17fca382d551d/SecretSplitter/Algebra/FiniteFieldPolynomial.cs#L106"&gt;Euclidean inverses&lt;/a&gt; for division.&lt;/p&gt; &lt;p&gt;Again, it probably sounds more complicated than it really is. At its core,  it&amp;#8217;s simple. This technique is formally known as a &lt;a href="http://securespeech.cs.cmu.edu/reports/shamirturing.pdf" title="See &amp;#8220;How to Share a Secret&amp;#8221; by Adi Shamir"&gt;Shamir Secret  Sharing Scheme&lt;/a&gt; and it was discovered in the 1970&amp;#8217;s.&lt;/p&gt; &lt;p&gt;I didn&amp;#8217;t want to invent anything new unless I felt I absolutely had to.  There was already a good tool called &amp;#8220;&lt;a href="http://point-at-infinity.org/ssss/"&gt;ssss-split&lt;/a&gt;&amp;#8221; that generates  shares similar to how I wanted. 
This program adds a special twist by scrambling  the resulting y-intercept point and therefore adds an extra layer of protection. Since this program  was already the de-facto standard, I wanted to be fully compatible with it. To make  sure I was compatible, I had to copy its method of &amp;#8220;diffusing&amp;#8221; (i.e. scrambling)  the bits using the public domain &lt;a href="http://en.wikipedia.org/wiki/XTEA"&gt;XTEA  algorithm&lt;/a&gt;. However, to ensure complete fidelity, I had to look at the source  code. The only problem was that it was originally released under the &lt;a href="http://www.gnu.org/copyleft/gpl.html"&gt;GNU General Public  License&lt;/a&gt; (GPL) and it used  &lt;a href="http://en.wikipedia.org/wiki/GNU_Multiple_Precision_Arithmetic_Library"&gt;a GPL library for working with large numbers&lt;/a&gt;. My goal was to make my implementation as open as I could, so I asked  the author if I could look at his code to derive my own implementation that I&amp;#8217;d release  under the more permissive  &lt;a href="http://www.opensource.org/licenses/mit-license.php"&gt;MIT license&lt;/a&gt; and he graciously allowed me to do this.&lt;/p&gt; &lt;p&gt;To prove the compatibility, you can use the &lt;a href="http://point-at-infinity.org/ssss/demo.html"&gt;ssss-split demo page&lt;/a&gt; and  paste the results  &lt;a href="https://github.com/downloads/moserware/SecretSplitter/SecretSplitter.exe"&gt;into SecretSplitter&lt;/a&gt; and it&amp;#8217;ll work just fine. In addition, I  &lt;a href="https://github.com/moserware/SecretSplitter/downloads"&gt;created  command line programs from scratch&lt;/a&gt; that are fully compatible with ssss-split and ssss-combine.&lt;/p&gt; &lt;p&gt;After some basic usability testing, I decided to make one small adjustment. The &amp;#8220;ssss-split&amp;#8221;  command allows you to attach a prefix that it ignores. I wanted to add a special prefix that would  tell what type of share it was (i.e. 
a message or a file) as well as a &lt;a href="http://en.wikipedia.org/wiki/SHA-1"&gt;simple checksum&lt;/a&gt; because  with all those digits it&amp;#8217;s easy to mistype one. &lt;/p&gt; &lt;p&gt;Now, you can understand all the pieces of the long share:&lt;/p&gt; &lt;p&gt; &lt;a href="http://3.bp.blogspot.com/-z_DjRnzLXEo/TsRtr-yV1oI/AAAAAAAAZHU/pr4h62fFI_k/s1600/ShareComponents.png"&gt; &lt;img alt="Share components" height="77" src="http://3.bp.blogspot.com/-z_DjRnzLXEo/TsRtr-yV1oI/AAAAAAAAZHU/pr4h62fFI_k/s576/ShareComponents.png" width="576"&gt;&lt;/a&gt;&lt;/p&gt; &lt;p&gt;In theory, you could &amp;#8220;encrypt&amp;#8221; a large file directly using this technique.  In practice, it doesn&amp;#8217;t work well because each share would be huge and not something  you&amp;#8217;d be able to write down by hand or say over the phone, even using the &lt;a href="http://en.wikipedia.org/wiki/NATO_phonetic_alphabet"&gt;phonetic alphabet&lt;/a&gt;.&lt;/p&gt; &lt;p&gt;For lots of data, we use a hybrid approach: encrypt the file using standard  file encryption with a random key and then split the small &amp;#8220;key&amp;#8221; into pieces.&lt;/p&gt; &lt;p&gt;For file encryption, I again didn&amp;#8217;t want to invent anything new. I decided  to use the &lt;a href="http://tools.ietf.org/html/rfc4880"&gt;OpenPGP Message Format&lt;/a&gt;,  the same format used by &lt;a href="http://en.wikipedia.org/wiki/Pretty_Good_Privacy"&gt;PGP&lt;/a&gt; and &lt;a href="http://www.gnupg.org/"&gt;GNU Privacy Guard&lt;/a&gt; (GPG). I didn&amp;#8217;t want to have  to worry about licensing restrictions or including a &lt;a href="http://www.bouncycastle.org/" title="Like Bouncy Castle"&gt;third-party library&lt;/a&gt;, so I wrote my own  implementation from scratch that did exactly what I wanted. I &lt;a href="http://commondatastorage.googleapis.com/rhuang/rfc4880.mobi" title="I'm a bit embarrassed to admit I read it on my Kindle by the beach. 
On the subject, I must admit that RFC2MOBI is a great free app for converting text-based RFCs to Kindle MOBI files. It does a remarkably decent job."&gt;read RFC4880&lt;/a&gt;  and started sketching out what I needed to do. A few bug fixes later and I  had a working implementation that was able to interoperate with GPG. To simplify  my implementation, I only support a limited subset of features:&lt;/p&gt; &lt;ol&gt;  &lt;li&gt;I always use   &lt;a href="http://en.wikipedia.org/wiki/Advanced_Encryption_Standard"&gt;AES&lt;/a&gt; with a 256-bit key for encryption, even if users select a smaller   effective key size. This means that users can pick any size key they want and thus balance security and share length. I picked AES because it&amp;#8217;s strong and  &lt;a href="http://www.moserware.com/2009/09/stick-figure-guide-to-advanced.html"&gt;  understandable with stick figures&lt;/a&gt;.&lt;/li&gt;  &lt;li&gt;The actual file encryption key is always a  &lt;a href="http://tools.ietf.org/html/rfc4880#section-3.7.1.3"&gt;hashed, salted,   and stretched version&lt;/a&gt; of the reconstructed shares text.&lt;/li&gt;  &lt;li&gt;The encrypted file has an  &lt;a href="http://tools.ietf.org/html/rfc4880#section-5.13"&gt;integrity protection   packet&lt;/a&gt; to detect if the file has been modified and ensure it was decrypted correctly.&lt;/li&gt; &lt;/ol&gt; &lt;p&gt;Since I used common formats, you can verify the correctness of the generated  files using a Linux shell. You can also create files using the shell and  have them interoperate with SecretSplitter. I included  &lt;a href="https://github.com/moserware/SecretSplitter/blob/master/Compatibility.txt"&gt;a sample of how to do this  with the source code&lt;/a&gt;.&lt;/p&gt; &lt;h4&gt;Help Wanted / Future Possibilities&lt;/h4&gt; &lt;p&gt;SecretSplitter still looks and feels like a prototype. 
There are lots of  possible improvements that could be made:&lt;/p&gt; &lt;ol&gt;  &lt;li&gt;Secret splitting is a relatively complicated idea. In  &lt;a href="http://www.amazon.com/gp/product/0470474246/ref=as_li_ss_tl?ie=UTF8&amp;amp;tag=moserware-20&amp;amp;linkCode=as2&amp;amp;camp=217145&amp;amp;creative=399369&amp;amp;creativeASIN=0470474246"&gt;  Cryptography Engineering&lt;/a&gt;, the authors write &amp;#8220;secret sharing schemes are   rarely used because they are too complex. They are complex to implement, but   more importantly, they are complex to administrate and operate.&amp;#8221; &lt;br /&gt;  &lt;br /&gt;  Although I tried to simplify the user experience for broad use, it could still   use some user experience enhancements to simplify it further. &lt;/li&gt;  &lt;li&gt;I wrote it in C# for the .net platform because that is what I&amp;#8217;m most   familiar with (and it has some built-in powerful primitives like BigIntegers,   AES, and hash functions). I suspect that an HTML5 version using JavaScript, a   nice interface, and coming from a trusted domain would get much broader usage.   In addition, since this is a problem that affects everyone, having great internationalization   support would be a nice touch. It also would be nice to have a polished look with a good logo and other graphics.&lt;/li&gt;  &lt;li&gt;You could use more  &lt;a href="http://en.wikipedia.org/wiki/Verifiable_secret_sharing"&gt;elaborate secret   sharing schemes&lt;/a&gt; than what I implemented in SecretSplitter. I considered   these, but ultimately wanted to use a technique that was already compatible   with widely deployed tools. 
I also considered enhancing shares with  &lt;a href="http://www.google.com/url?q=http%3A%2F%2Fen.wikipedia.org%2Fwiki%2FTime-based_One-time_Password_Algorithm&amp;amp;sa=D&amp;amp;sntz=1&amp;amp;usg=AFQjCNEG4XPPcQbdiivr7kuRUBxExU6Aqw"&gt;  two-factor&lt;/a&gt; support or using  &lt;a href="http://en.wikipedia.org/wiki/Public_key_infrastructure"&gt;existing public   key infrastructure&lt;/a&gt;, but decided that added too much complexity. Perhaps   it&amp;#8217;s possible to incorporate these in a good design.&lt;/li&gt;  &lt;li&gt;It&amp;#8217;d be neat if this scheme or something similar to it was integrated   into LastPass and KeePass as a core feature. &lt;/li&gt;  &lt;li&gt;Obviously the shares themselves are long. I tried making them shorter   but the downsides outweighed the upsides. Perhaps it could be better. Also, a compelling   graphically designed share card might make it more fun for broader use. The long length   is somewhat of a safety mechanism that prevents people from memorizing with   a quick glance. Also, it discourages overhasty use much like &lt;a href="http://vimeo.com/5735591" title="Although, as this video demonstrates a hammer allows for quick access. However, at least you&amp;#8217;d be making a conscious decision at that point."&gt;freezing a credit card&lt;/a&gt;.&lt;/li&gt;  &lt;li&gt;I kept the codes in a format that would be easy to write as well as read over the phone. I used a simple character set that avoids ambiguities like &amp;#8220;O&amp;#8221; vs &amp;#8220;0&amp;#8221;.   One additional strategy could be to embed the share as a  &lt;a href="http://qrcodenet.codeplex.com/"&gt;QR code&lt;/a&gt; or something similar. I   didn&amp;#8217;t pursue this approach in favor of simplicity, but this could be an option.&lt;/li&gt;  &lt;li&gt;Really paranoid people might want to back up their encrypted file to   paper.  
&lt;a href="http://www.codinghorror.com/blog/2009/07/the-paper-data-storage-option.html"&gt;  This is possible&lt;/a&gt;, but I&amp;#8217;m not sure if it should belong inside the program   itself.&lt;/li&gt;  &lt;li&gt;It&amp;#8217;d be good to have suggestions on how to exchange shares or perhaps   borrow ideas from PGP  &lt;a href="http://en.wikipedia.org/wiki/Key_signing_party"&gt;key signing parties&lt;/a&gt;.   I suspect that if secret splitting were to become popular, then &amp;#8220;&lt;a href="http://en.wikipedia.org/wiki/Web_of_trust"&gt;web   of trust&lt;/a&gt;&amp;#8221; scenarios would naturally occur (i.e. &amp;#8220;I&amp;#8217;ll hold your secret share   if you hold mine&amp;#8221;).&lt;/li&gt;  &lt;li&gt;It&amp;#8217;d be fun to compile a list of non-obvious uses for SecretSplitter   to share with others. For example, it could make for interesting scavenger hunt   clues. &lt;/li&gt; &lt;/ol&gt; &lt;p&gt;If you&amp;#8217;d like to donate your time to any of the above ideas, I&amp;#8217;d encourage  you to just give it a go. You don&amp;#8217;t have to ask for my permission but it would be  nice if you posted your results somewhere or left a comment to this post. You can  use my code for whatever purpose you&amp;#8217;d like. My only hope is that you might get  some benefit out of it.&lt;/p&gt; &lt;h4&gt;Conclusion&lt;/h4&gt; &lt;p&gt;SecretSplitter is just a tool that gives another option for backing up  very sensitive information by splitting it up into pieces. It&amp;#8217;s not a full solution,  only a tool. By relying on people I trust instead of  &lt;a href="http://mashable.com/2010/10/11/social-media-after-death/" title="Besides, I don't want to have to worry about a third-party company &amp;#8220;dying&amp;#8221; before I do."&gt;a third-party company&lt;/a&gt;, it helped me remove one excuse I had for not  preparing somewhat unpleasant but important documents that we should all probably  have. 
I still don&amp;#8217;t have this all figured out, but writing SecretSplitter helped me get started. &lt;/p&gt; &lt;p&gt;If you&amp;#8217;re young, don&amp;#8217;t have any &lt;a href="http://en.wikipedia.org/wiki/Minor_(law)"&gt;minor children&lt;/a&gt;, and don&amp;#8217;t  care at all what happens to your stuff, then you could run some mental actuarial  model and convince yourself that the probability of you or your survivors needing  these documents or password recovery procedure anytime soon is low, but you&amp;#8217;re not  given any guarantees. &lt;/p&gt; &lt;p&gt;At the very least, it&amp;#8217;s a good idea to make sure all of your financial  assets and life insurance policies have a named beneficiary and perhaps at least  one alternate. You can also declare things like organ donor preferences on your  driver&amp;#8217;s license instead of making declarations in other documents. It&amp;#8217;s also a good  idea to have an &amp;#8220;&lt;a href="http://en.wikipedia.org/wiki/In_case_of_emergency" title="In Case of Emergency"&gt;ICE&lt;/a&gt;&amp;#8221;  entry in your cell phone. However, going the extra step and making very basic final  documents doesn&amp;#8217;t require that much more work. Besides, once you have baseline documents, keeping them fresh is just a matter of occasional updates due to life events.&lt;/p&gt; &lt;p&gt;The increasing digitization of our lives means that more personal things will only be stored digitally. From our journals to email to videos  to health records, all of this will eventually only exist digitally and likely hidden  behind passwords. This future needs some safety net for backing up sensitive things in  a safe and accessible way.&lt;/p&gt; &lt;p&gt;Not everything needs to be backed up. There are also lots of files,  usernames and passwords that don&amp;#8217;t really matter. Don&amp;#8217;t include those. 
SecretSplitter  was built with the assumption that everything that really mattered could be stored  in a file small enough to email to others. This helps focus and pare down to what  really matters.&lt;/p&gt; &lt;p&gt;It&amp;#8217;s also good to have a healthy dose of common sense. Instead of holding out a secret until after your death, maybe you should get  that resolved today. You&amp;#8217;ll probably live better. My general view is that these  final &amp;#8220;secrets&amp;#8221; should be mostly boring by just containing account details and  credentials.&lt;/p&gt; &lt;p&gt;Finally, on a more personal level, I think it&amp;#8217;s healthy to be reminded  about our own mortality at least once every year or so. It&amp;#8217;s a helpful reminder  of how much a gift every day is and helps focus what we do and not worry about things  that don&amp;#8217;t matter. &lt;/p&gt; &lt;p&gt;If a little bit of fancy math can help you sleep better at night, well  then, I&amp;#8217;d consider it a success.&lt;/p&gt; &lt;p&gt;&lt;em&gt;Special thanks to B. Poettering for creating the original &lt;/em&gt; &lt;a href="http://point-at-infinity.org/ssss/"&gt;&lt;em&gt;ssss&lt;/em&gt;&lt;/a&gt;&lt;em&gt; program and allowing me to  clone its format.&lt;/em&gt;&lt;/p&gt;&lt;div class="feedflare"&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/Moserware/~4/DgDYwGU8zrI" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://www.moserware.com/feeds/86529977875835325/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=6800934446457898793&amp;postID=86529977875835325" title="28 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6800934446457898793/posts/default/86529977875835325?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6800934446457898793/posts/default/86529977875835325?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/Moserware/~3/DgDYwGU8zrI/life-death-and-splitting-secrets.html" title="Life, Death, and Splitting Secrets" /><author><name>Jeff Moser</name><uri>http://www.blogger.com/profile/16074905903060665396</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="24" height="32" src="http://1.bp.blogspot.com/_Zfbv3mHcYrc/SLDM--5fn8I/AAAAAAAAA1w/EZtLwWvYhdI/S220/facebook+beard2.jpg" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://3.bp.blogspot.com/-IBcyywvqnWA/TsEou9240BI/AAAAAAAAZBE/a5aM-FeH49I/s72-c/Grandma+and+Jeff.jpg" height="72" width="72" /><thr:total>28</thr:total><feedburner:origLink>http://www.moserware.com/2011/11/life-death-and-splitting-secrets.html</feedburner:origLink></entry><entry gd:etag="W/&quot;D0UGQ3g8cCp7ImA9Wx5bEU0.&quot;"><id>tag:blogger.com,1999:blog-6800934446457898793.post-5145078661665722279</id><published>2010-10-26T08:34:00.003-04:00</published><updated>2010-10-26T11:00:22.678-04:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2010-10-26T11:00:22.678-04:00</app:edited><title>Notes from porting C# code to PHP</title><content type="html">&lt;p&gt;(&lt;strong&gt;Summary&lt;/strong&gt;: I ported 
my TrueSkill implementation from &lt;a href="http://github.com/moserware/Skills"&gt;C#&lt;/a&gt; to PHP and &lt;a title="Patches welcome :)" href="http://github.com/moserware/PHPSkills"&gt;posted it on GitHub&lt;/a&gt;. It was my first real encounter with PHP and I learned a few things.)&lt;/p&gt; &lt;p&gt;I braced for the worst. &lt;a href="http://php.net/download-logos.php"&gt;&lt;img style="display: inline; margin-left: 15px; margin-right: 0px; border:0px" align="right" src="http://4.bp.blogspot.com/_Zfbv3mHcYrc/TMMv2y_zyKI/AAAAAAAAXkA/JKpA8oOpCiE/s200/1000px-PHP-logo.svg.png"/&gt;&lt;/a&gt;&lt;/p&gt; &lt;p&gt;After years of hearing &lt;a title="Jeff Atwood's: 'PHP Sucks, But It Doesn't Matter.' Jeff has gone on record many times bemoaning the PHP language." href="http://www.codinghorror.com/blog/2008/05/php-sucks-but-it-doesnt-matter.html"&gt;negative&lt;/a&gt; &lt;a title="Stack Overflow Question: 'Defend PHP; convince me it isn't horrible'" href="http://stackoverflow.com/questions/309300/defend-php-convince-me-it-isnt-horrible"&gt;things&lt;/a&gt; about PHP, I had been led to believe that touching it would rot my brain. Ok, maybe that&amp;#8217;s a &lt;em&gt;bit&lt;/em&gt; much, but its &lt;a title="In the 'Homerun' episode of 'This Developer's Life', David Heinemeier Hansson mentioned that one of the reasons why he switched to Ruby and created Rails was that he basically thought PHP (and Java) were beyond hope." href="http://thisdeveloperslife.com/post/1270441885/1-0-5-homerun"&gt;reputation&lt;/a&gt; had me believe it was full of &lt;a href="http://www.softwarebyrob.com/2006/11/17/single-important-rule-retaining-software-developers/" title="To quote Paul Graham: 'Not every kind of hard is good. There is good pain and bad pain. You want the kind of pain you get from going running, not the kind you get from stepping on a nail. A difficult problem could be good for a designer, but a fickle client or unreliable materials would not be.' 
The basic idea is that bad problems just wear you out without giving you any benefit or insight."&gt;bad problems&lt;/a&gt;. Even the &lt;a href="http://www.mailchimp.com/blog/ewww-you-use-php/#more-10515" title="The guys at MailChimp recently wrote about how they're having some difficulties hiring programmers because their site is in PHP. This is probably indicative of a larger trend, especially among alpha geeks."&gt;cool kids&lt;/a&gt; &lt;a href="http://news.ycombinator.com/item?id=1818954" title="I think some of the general attitude can be summed up by this quote by pilif on Hacker News: 'While I really hate some aspects of PHP by now and I would love to have a Ruby or Python codebase to work with instead, rewriting all of this is out of the question.' which I can respect."&gt;had&lt;/a&gt; &lt;a href="http://www.reddit.com/r/programming/comments/dutgs/ewww_you_use_php/" title="Selected comment from skillet-thief on Reddit: 'PHP hinders you on a lot of levels: the community has such a wide range of skill levels, including a huge class of users who mostly know how to install and uninstall and reinstall until something works; code reuse is much harder than in other languages because there is a lot of bad code out there, the good code is packaged in a way that makes it hard to share (as a stand-alone tool a lot of times). Abstractions are generally harder to make too. There were no real anonymous functions until very recently.'"&gt;issues&lt;/a&gt; with PHP. But I thought that it couldn&amp;#8217;t be too bad because there was &lt;a href="http://www.facebook.com/" title="Formerly known as thefacebook"&gt;that one website&lt;/a&gt; that gets a few hits using a &lt;a href="http://github.com/facebook/hiphop-php/wiki" title="I think that Zuckerberg's usage of PHP is similar to most people's in that it was easy to get started. Throw in lots of programmers and bam! You have a large codebase and a ship that's not feasible to rewrite. 
This probably justified the whole HipHop compiler rather than a rewrite. This is similar to FogBugz programmers using Wasabi to avoid rewriting VBScript code."&gt;dialect of it&lt;/a&gt;. When &lt;a href="http://kaggle.com/"&gt;Kaggle&lt;/a&gt; offered to sponsor a port of my &lt;a href="http://www.moserware.com/2010/03/computing-your-skill.html"&gt;TrueSkill&lt;/a&gt; &lt;a href="http://github.com/moserware/Skills"&gt;C# code&lt;/a&gt; to PHP, I thought I&amp;#8217;d finally have my first real encounter with PHP.&lt;/p&gt; &lt;div&gt; &lt;pre&gt;&lt;br /&gt;&lt;span style="color: blue;"&gt;&amp;lt;?php&lt;/span&gt;&lt;span style="color: gray;"&gt; &lt;/span&gt;&lt;span style="color: green;"&gt;echo&lt;/span&gt;&lt;span style="color: gray;"&gt; &lt;/span&gt;&lt;span style="color: darkred;"&gt;"&lt;/span&gt;&lt;span style="color: red;"&gt;Disclaimer:&lt;/span&gt;&lt;span style="color: darkred;"&gt;"&lt;/span&gt;&lt;span style="color: gray;"&gt;; &lt;/span&gt;&lt;span style="color: blue;"&gt;?&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;/pre&gt;&lt;/div&gt; &lt;p&gt;To make the port quick, I kept most of the design and class structure from my C# implementation. This led to a less-than-optimal result since PHP really &lt;a href="http://michaelkimsal.com/blog/php-is-not-object-oriented/" title="Yes, it has the 'class' keyword, but that was bolted on relatively late and wasn't the primary focus in PHP's design."&gt;isn&amp;#8217;t object-oriented&lt;/a&gt;. I didn&amp;#8217;t do a deep dive on redesigning it in the native PHP way. I stuck with the philosophy that you can &lt;a href="http://queue.acm.org/detail.cfm?id=1039535" title="The classic phrase is: 'You can write Fortran in any language.' By not catering to PHP's strengths, I might have brought too much C#-ness to PHP without better factoring things."&gt;write quasi-C# in any language&lt;/a&gt;. Also, I didn&amp;#8217;t use any of the web and database features that motivate most people to choose PHP in the first place. 
In other words, I didn&amp;#8217;t cater to PHP&amp;#8217;s &lt;a href="http://stackoverflow.com/questions/694246/how-is-php-done-the-right-way"&gt;specialty&lt;/a&gt;, so my reflections are probably an unfair and biased comparison as I was not using PHP the way it was intended. I &lt;a href="http://www.lessonsoffailure.com/developers/language-flamewars-blub-paradox/"&gt;expect&lt;/a&gt; that I missed tons of great things about PHP.&lt;/p&gt; &lt;p&gt;Personal disclaimers aside, even PHP book authors don&amp;#8217;t claim that it&amp;#8217;s the nicest language. Instead, they highlight the language&amp;#8217;s popularity. I sort of got the feeling that people mainly choose PHP in lieu of languages like C# because of its &lt;a href="http://www.tiobe.com/index.php/paperinfo/tpci/PHP.html"&gt;current popularity&lt;/a&gt; and its perception of having a lower upfront cost, especially among cash-strapped startups. Matt Doyle, author of &lt;a href="http://www.amazon.com/gp/product/0470413964?ie=UTF8&amp;amp;tag=moserware-20&amp;amp;linkCode=as2&amp;amp;camp=1789&amp;amp;creative=390957&amp;amp;creativeASIN=0470413964"&gt;Beginning PHP 5.3&lt;/a&gt;, wrote the following while comparing PHP to other languages:&lt;/p&gt; &lt;blockquote&gt;&lt;a href="http://www.amazon.com/gp/product/0470413964?ie=UTF8&amp;amp;tag=moserware-20&amp;amp;linkCode=as2&amp;amp;camp=1789&amp;amp;creative=390957&amp;amp;creativeASIN=0470413964" title="Beginning PHP 5.3"&gt;&lt;img align="right" border="0" src="http://3.bp.blogspot.com/_Zfbv3mHcYrc/TMTf53q9ydI/AAAAAAAAXkI/ZvqNfJgzgaY/s1600/BeginningPHPBookCover.jpg"/&gt;&lt;/a&gt;&lt;img src="http://www.assoc-amazon.com/e/ir?t=moserware-20&amp;amp;l=as2&amp;amp;o=1&amp;amp;a=0470413964" width="1" height="1" border="0" alt="" style="border:none !important; margin:0px !important;"/&gt;&amp;#8220;Many would argue that C# is a nicer, better-organized language to program in than PHP, although C# is arguably harder to learn. 
Another advantage of ASP.NET is that C# is a compiled language, which generally means it runs faster than PHP&amp;#8217;s interpreted scripts (although PHP compilers are available).&amp;#8221; - p.5&lt;/blockquote&gt; &lt;p&gt;He continued:&lt;/p&gt; &lt;blockquote&gt;&amp;#8220;ASP and ASP.NET have a couple of other disadvantages compared to PHP. First of all, they have a commercial license, which can mean spending additional money on server software, and hosting is often more expensive as a result. Secondly, ASP and ASP.NET are fairly heavily tied to the Windows platform, whereas the other technologies in this list are much more cross-platform.&amp;#8221; - p.5&lt;/blockquote&gt; &lt;p&gt;Next, he hinted that Ruby might eventually end PHP&amp;#8217;s reign:&lt;/p&gt; &lt;blockquote&gt;&amp;#8220;Like Python, Ruby is another general-purpose language that has gained a lot of traction with Web developers in recent years. This is largely due to the excellent Ruby on Rails application framework, which uses the Model-View-Controller (MVC) pattern, along with Ruby&amp;#8217;s extensive object-oriented programming features, to make it easy to build a complete Web application very quickly. As with Python, Ruby is fast becoming a popular choice among Web developers, but for now, PHP is much more popular.&amp;#8221; - p.6&lt;/blockquote&gt; &lt;p&gt;He then elaborated on why PHP might be popular today:&lt;/p&gt; &lt;blockquote&gt;&amp;#8220;[T]his middle ground partly explains the popularity of PHP. The fact that you don&amp;#8217;t need to learn a framework or import tons of libraries to do basic Web tasks makes the language easy to learn and use. On the other hand, if you need the extra functionality of libraries and frameworks, they&amp;#8217;re there for you.&amp;#8221; - p.7&lt;/blockquote&gt; &lt;p&gt;Fair enough. However, to really understand the language, I needed to dive in personally and experience it firsthand. 
I took notes during the dive about some of the things that stuck out.&lt;/p&gt; &lt;h4&gt;The &lt;a href="http://www.amazon.com/gp/product/0596517742?ie=UTF8&amp;amp;tag=moserware-20&amp;amp;linkCode=as2&amp;amp;camp=1789&amp;amp;creative=390957&amp;amp;creativeASIN=0596517742" title="This section title comes from the subtitle of my favorite JavaScript book"&gt;Good&lt;/a&gt; Parts&lt;/h4&gt; &lt;ul&gt; &lt;li&gt;It&amp;#8217;s relatively easy to learn and get started with PHP. As a C# developer, I was able to pick up PHP in a few hours after a brief overview of the syntax from &lt;a href="http://www.amazon.com/gp/product/0470413964?ie=UTF8&amp;amp;tag=moserware-20&amp;amp;linkCode=as2&amp;amp;camp=1789&amp;amp;creative=390957&amp;amp;creativeASIN=0470413964"&gt;a book&lt;/a&gt;. Also, PHP has some decent &lt;a href="http://php.net/manual/en/index.php"&gt;online help&lt;/a&gt;.&lt;/li&gt; &lt;li&gt;PHP is available on almost all web hosts these days at no extra charge (in contrast with ASP.NET hosting). I can&amp;#8217;t emphasize this enough because it&amp;#8217;s a reason why I would still consider writing a small website in it.&lt;/li&gt; &lt;li&gt;I was pleasantly surprised to have unit test support with &lt;a href="http://www.phpunit.de/"&gt;PHPUnit&lt;/a&gt;. This made me feel at home and made it easier to develop and debug code.&lt;/li&gt; &lt;li&gt;It&amp;#8217;s very easy and reasonable to create a website in PHP using techniques like Model-View-Controller (MVC) designs that separate the view from the actual database model. The language doesn&amp;#8217;t seem to pose any hindrance to this.&lt;/li&gt; &lt;li&gt;PHP has a &amp;#8220;&lt;a href="http://php.net/manual/en/language.oop5.static.php"&gt;static&lt;/a&gt;&amp;#8221; keyword that is sort of like a static version of a &amp;#8220;this&amp;#8221; reference. 
This was useful in creating a quasi-static &amp;#8220;subclass&amp;#8221; of my &amp;#8220;&lt;a href="http://github.com/moserware/PHPSkills/blob/master/Skills/Numerics/Range.php"&gt;Range&lt;/a&gt;&amp;#8221; class for validating &lt;a href="http://github.com/moserware/PHPSkills/blob/master/Skills/PlayersRange.php"&gt;player&lt;/a&gt; and &lt;a href="http://github.com/moserware/PHPSkills/blob/master/Skills/TeamsRange.php"&gt;team&lt;/a&gt; sizes. This feature is formally known as &lt;a href="http://en.wikipedia.org/wiki/Late_static_binding#Late_static"&gt;late static binding&lt;/a&gt;.&lt;/li&gt; &lt;/ul&gt; &lt;h4&gt;The &amp;#8220;&lt;a href="http://en.wiktionary.org/wiki/when_in_Rome,_do_as_the_Romans_do" title="Si fueris Romae, Romano vivito more; Si fueris alibi, vivito sicut ibi."&gt;When in Rome&lt;/a&gt;...&amp;#8221; Parts&lt;/h4&gt; &lt;ul&gt; &lt;li&gt;Class names use PascalCase while functions tend to use lowerCamelCase, as in Java, whereas C# tends to use PascalCase for both. 
In addition, .NET in general seems to have &lt;a href="http://www.moserware.com/2008/12/private-life-of-public-api.html"&gt;more universally accepted naming conventions&lt;/a&gt; than PHP has.&lt;/li&gt; &lt;li&gt;PHP variables have a &amp;#8216;$&amp;#8217; prefix which makes variables stick out: &lt;div&gt; &lt;pre&gt;&lt;br /&gt;&lt;span style="color: green;"&gt;function&lt;/span&gt;&lt;span style="color: gray;"&gt; &lt;/span&gt;&lt;span style="color: blue;"&gt;increment&lt;/span&gt;&lt;span style="color: olive;"&gt;(&lt;/span&gt;&lt;span style="color: darkblue;"&gt;$someNumber&lt;/span&gt;&lt;span style="color: olive;"&gt;)&lt;/span&gt;&lt;br /&gt;&lt;span style="color: olive;"&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span style="color: darkblue;"&gt;$result&lt;/span&gt;&lt;span style="color: gray;"&gt; = &lt;/span&gt;&lt;span style="color: darkblue;"&gt;$someNumber&lt;/span&gt;&lt;span style="color: gray;"&gt; + &lt;/span&gt;&lt;span style="color: maroon;"&gt;1&lt;/span&gt;&lt;span style="color: gray;"&gt;;&lt;/span&gt;&lt;br /&gt;    &lt;span style="color: green;"&gt;return&lt;/span&gt;&lt;span style="color: gray;"&gt; &lt;/span&gt;&lt;span style="color: darkblue;"&gt;$result&lt;/span&gt;&lt;span style="color: gray;"&gt;;&lt;/span&gt;&lt;br /&gt;&lt;span style="color: olive;"&gt;}&lt;/span&gt;&lt;br /&gt;&lt;/pre&gt;&lt;/div&gt; This convention was probably copied from &lt;a href="http://en.wikipedia.org/wiki/Perl#Data_types"&gt;Perl&amp;#8217;s scalar variable sigil&lt;/a&gt;. This makes sense because PHP &lt;a href="http://en.wikipedia.org/wiki/PHP#History"&gt;was originally&lt;/a&gt; a set of Perl scripts intended to be a simpler Perl.&lt;/li&gt; &lt;li&gt;You access class members and functions using an arrow operator (&amp;#8220;-&amp;gt;&amp;#8221;) like C++ instead of the C#/Java dot notation (&amp;#8220;.&amp;#8221;). 
That is, in PHP you say &amp;#8220;&lt;span style="color: darkblue;"&gt;$someClass&lt;/span&gt;&lt;span style="color: gray;"&gt;-&amp;gt;&lt;/span&gt;&lt;span style="color: blue;"&gt;someMethod&lt;/span&gt;&lt;span style="color: olive;"&gt;(&lt;/span&gt;&lt;span style="color: olive;"&gt;)&lt;/span&gt;&amp;#8221; instead of &amp;#8220;&lt;span style="color: darkblue;"&gt;someClass&lt;/span&gt;&lt;span style="color: gray;"&gt;.&lt;/span&gt;&lt;span style="color: blue;"&gt;someMethod&lt;/span&gt;&lt;span style="color: olive;"&gt;(&lt;/span&gt;&lt;span style="color: olive;"&gt;)&lt;/span&gt;&amp;#8221;&lt;/li&gt; &lt;li&gt;The arguments in a &amp;#8220;&lt;a href="http://php.net/manual/en/control-structures.foreach.php"&gt;foreach&lt;/a&gt;&amp;#8221; statement are reversed from what C# uses. In PHP, you write: &lt;div&gt; &lt;pre&gt;&lt;br /&gt;&lt;span style="color: green;"&gt;foreach&lt;/span&gt;&lt;span style="color: olive;"&gt;(&lt;/span&gt;&lt;span style="color: darkblue;"&gt;$allItems&lt;/span&gt;&lt;span style="color: gray;"&gt; &lt;/span&gt;&lt;span style="color: green;"&gt;as&lt;/span&gt;&lt;span style="color: gray;"&gt; &lt;/span&gt;&lt;span style="color: darkblue;"&gt;$currentItem&lt;/span&gt;&lt;span style="color: olive;"&gt;)&lt;/span&gt;&lt;span style="color: gray;"&gt; &lt;/span&gt;&lt;span style="color: olive;"&gt;{&lt;/span&gt;&lt;span style="color: gray;"&gt; ... 
&lt;/span&gt;&lt;span style="color: olive;"&gt;}&lt;/span&gt;&lt;br /&gt;&lt;/pre&gt;&lt;/div&gt; instead of the C# way: &lt;div&gt; &lt;pre&gt;&lt;br /&gt;&lt;span style="color: gray;"&gt;&lt;span style="color: green;"&gt;foreach&lt;/span&gt;&lt;span style="color: olive;"&gt;(&lt;/span&gt;&lt;span style="color: green;"&gt;var&lt;/span&gt;&lt;span style="color: gray;"&gt; &lt;/span&gt;&lt;span style="color: darkblue;"&gt;currentItem&lt;/span&gt;&lt;span style="color: gray;"&gt; &lt;/span&gt;&lt;span style="color: green;"&gt;in&lt;/span&gt;&lt;span style="color: gray;"&gt; &lt;/span&gt;&lt;span style="color: darkblue;"&gt;allItems&lt;/span&gt;&lt;span style="color: olive;"&gt;)&lt;/span&gt;&lt;span style="color: gray;"&gt; &lt;/span&gt;&lt;span style="color: olive;"&gt;{&lt;/span&gt;&lt;span style="color: gray;"&gt; ... &lt;/span&gt;&lt;span style="color: olive;"&gt;}&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;/pre&gt;&lt;/div&gt; One advantage to the PHP way is its special syntax that makes iterating through key/value pairs in a map easier: &lt;div&gt; &lt;pre&gt;&lt;br /&gt;&lt;span style="color: gray;"&gt;&lt;span style="color: green;"&gt;foreach&lt;/span&gt;&lt;span style="color: olive;"&gt;(&lt;/span&gt;&lt;span style="color: darkblue;"&gt;$someArray&lt;/span&gt;&lt;span style="color: gray;"&gt; &lt;/span&gt;&lt;span style="color: green;"&gt;as&lt;/span&gt;&lt;span style="color: gray;"&gt; &lt;/span&gt;&lt;span style="color: darkblue;"&gt;$key&lt;/span&gt;&lt;span style="color: gray;"&gt; =&amp;gt; &lt;/span&gt;&lt;span style="color: darkblue;"&gt;$value&lt;/span&gt;&lt;span style="color: olive;"&gt;)&lt;/span&gt;&lt;span style="color: gray;"&gt; &lt;/span&gt;&lt;span style="color: olive;"&gt;{&lt;/span&gt;&lt;span style="color: gray;"&gt; ... &lt;/span&gt;&lt;span style="color: olive;"&gt;}&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;/pre&gt;&lt;/div&gt; vs. 
the C# way of something like this: &lt;div&gt; &lt;pre&gt;&lt;br /&gt;&lt;span style="color: green;"&gt;foreach&lt;/span&gt;&lt;span style="color: olive;"&gt;(&lt;/span&gt;&lt;span style="color: green;"&gt;var&lt;/span&gt; &lt;span style="color: darkblue;"&gt;pair&lt;/span&gt;&lt;span style="color: gray;"&gt; &lt;/span&gt;&lt;span style="color: green;"&gt;in&lt;/span&gt;&lt;span style="color: gray;"&gt; &lt;/span&gt;&lt;span style="color: darkblue;"&gt;someDictionary&lt;/span&gt;)&lt;span style="color: gray;"&gt; &lt;/span&gt;&lt;br /&gt;&lt;span style="color: olive;"&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span style="color: orange;"&gt;// use pair.Key and pair.Value&lt;/span&gt;&lt;br /&gt;&lt;span style="color: olive;"&gt;}&lt;/span&gt;&lt;br /&gt;&lt;/pre&gt;&lt;/div&gt; &lt;/li&gt; &lt;li&gt;The &amp;#8220;=&amp;gt;&amp;#8221; operator in PHP denotes a map entry as in &lt;div&gt; &lt;pre&gt;&lt;br /&gt;&lt;span style="color: darkblue;"&gt;$numbers&lt;/span&gt;&lt;span style="color: gray;"&gt; = &lt;/span&gt;&lt;span style="color: green;"&gt;array&lt;/span&gt;&lt;span style="color: olive;"&gt;(&lt;/span&gt;&lt;span style="color: maroon;"&gt;1&lt;/span&gt;&lt;span style="color: gray;"&gt; =&amp;gt; &amp;#8216;&lt;/span&gt;&lt;span style="color: blue;"&gt;one&lt;/span&gt;&lt;span style="color: gray;"&gt;&amp;#8217;, &lt;/span&gt;&lt;span style="color: maroon;"&gt;2&lt;/span&gt;&lt;span style="color: gray;"&gt; =&amp;gt; &amp;#8216;&lt;/span&gt;&lt;span style="color: blue;"&gt;two&lt;/span&gt;&lt;span style="color: gray;"&gt;&amp;#8217;, ...&lt;/span&gt;&lt;span style="color: olive;"&gt;)&lt;/span&gt;&lt;br /&gt;&lt;/pre&gt;&lt;/div&gt; In C#, the arrow &amp;#8220;=&amp;gt;&amp;#8221; is instead used for a lightweight &lt;a href="http://msdn.microsoft.com/en-us/library/bb308966.aspx#csharp3.0overview_topic7"&gt;lambda expression syntax&lt;/a&gt;: &lt;div&gt; &lt;pre&gt;&lt;br /&gt;&lt;span style="color: darkblue;"&gt;x&lt;/span&gt;&lt;span style="color: gray;"&gt; 
=&amp;gt; &lt;/span&gt;&lt;span style="color: darkblue;"&gt;x&lt;/span&gt;&lt;span style="color: gray;"&gt; * &lt;/span&gt;&lt;span style="color: darkblue;"&gt;x&lt;/span&gt;&lt;br /&gt;&lt;/pre&gt;&lt;/div&gt; To define the rough equivalent of the PHP array, you&amp;#8217;d have to write this in C#: &lt;div&gt; &lt;pre&gt;&lt;br /&gt;&lt;span style="color: green;"&gt;var&lt;/span&gt; &lt;span style="color: darkblue;"&gt;numbers&lt;/span&gt;&lt;span style="color: gray;"&gt; = &lt;/span&gt;&lt;span style="color: green;"&gt;new&lt;/span&gt; &lt;span style="color: blue;"&gt;Dictionary&lt;/span&gt;&amp;lt;&lt;span style="color: green;"&gt;int&lt;/span&gt;, &lt;span style="color: green;"&gt;string&lt;/span&gt;&amp;gt;&lt;span style="color: olive;"&gt;{ {&lt;/span&gt;&lt;span style="color: maroon;"&gt;1&lt;/span&gt;, &lt;span style="color: darkred;"&gt;"&lt;/span&gt;&lt;span style="color: red;"&gt;one&lt;/span&gt;&lt;span style="color: darkred;"&gt;"&lt;/span&gt; &lt;span style="color: olive;"&gt;}&lt;/span&gt;, &lt;span style="color: olive;"&gt;{&lt;/span&gt;&lt;span style="color: maroon;"&gt;2&lt;/span&gt;, &lt;span style="color: darkred;"&gt;"&lt;/span&gt;&lt;span style="color: red;"&gt;two&lt;/span&gt;&lt;span style="color: darkred;"&gt;"&lt;/span&gt;&lt;span style="color: olive;"&gt;} }&lt;/span&gt;;&lt;br /&gt;&lt;/pre&gt;&lt;/div&gt; PHP&amp;#8217;s notation for maps is cleaner, but it comes at the cost of having no lightweight lambda syntax (more on that later).&lt;/li&gt; &lt;li&gt;PHP has some &amp;#8220;&lt;a href="http://php.net/manual/en/language.oop5.magic.php"&gt;magic methods&lt;/a&gt;&amp;#8221; such as &amp;#8220;&lt;a href="http://www.php.net/manual/en/language.oop5.decon.php#language.oop5.decon.constructor"&gt;__construct&lt;/a&gt;&amp;#8221; and &amp;#8220;&lt;a href="http://www.php.net/manual/en/language.oop5.magic.php#language.oop5.magic.tostring"&gt;__toString&lt;/a&gt;&amp;#8221; for the equivalent of C#&amp;#8217;s &lt;a 
href="http://msdn.microsoft.com/en-us/library/ms173115.aspx"&gt;constructor&lt;/a&gt; and &lt;a href="http://msdn.microsoft.com/en-us/library/system.object.tostring.aspx"&gt;ToString&lt;/a&gt; functionality. I like C#&amp;#8217;s approach here, but I&amp;#8217;m biased.&lt;/li&gt; &lt;/ul&gt; &lt;h4&gt;The &amp;#8220;Ok, &lt;em&gt;I guess&lt;/em&gt;&amp;#8221; Parts&lt;/h4&gt; &lt;ul&gt; &lt;li&gt;The free &lt;a href="http://netbeans.org/features/php/index.html"&gt;NetBeans IDE for PHP&lt;/a&gt; is pretty &lt;a href="http://stackoverflow.com/questions/6166/any-good-php-ide-preferably-free-or-cheap/6169#6169" title="I first learned about NetBeans through the StackOverflow question 'Any good PHP IDE, preferably free or cheap?'"&gt;decent&lt;/a&gt; for writing PHP code. Using it in conjunction with PHP&amp;#8217;s &lt;a href="http://www.xdebug.org/"&gt;XDebug&lt;/a&gt; debugger functionality is a must. After my initial attempts at writing code with a &lt;a href="http://www.flos-freeware.ch/notepad2.html"&gt;basic notepad&lt;/a&gt;, I found NetBeans to be a very capable editor. My only real complaint is that the editor would occasionally lock up and the debugger wouldn&amp;#8217;t support things like watching variables. That said, it&amp;#8217;s still good for being a free editor.&lt;/li&gt; &lt;li&gt;By default, PHP passes function arguments (arrays included) by value, whereas in C# arrays and class instances are reference types, so the callee works on the caller&amp;#8217;s object. This probably caused the &lt;a href="http://github.com/moserware/PHPSkills/commit/4c7cfef8d6c602e733f47965a59676080a81f860" title="As you can tell by my many git commits, it took awhile to figure this out... and I still probably missed something."&gt;most&lt;/a&gt; &lt;a href="http://github.com/moserware/PHPSkills/commit/803a0816a84879ebfa651ec975664c6ba2f7b93f"&gt;difficulty&lt;/a&gt; with the port. 
Complicating things further, &lt;a href="http://www.php.net/manual/en/language.references.return.php" title="They're more like symlinks on a filesystem than pointers"&gt;PHP references are not like references in other languages&lt;/a&gt;. For example, using references usually incurs a performance penalty since extra work is required.&lt;/li&gt; &lt;li&gt;You &lt;a href="http://bugs.php.net/bug.php?id=47872"&gt;can&amp;#8217;t&lt;/a&gt; import types via namespaces alone like you can in C# (and Java for that matter). In PHP, you have to import each type manually: &lt;div&gt; &lt;pre&gt;&lt;br /&gt;&lt;span style="color: green;"&gt;use&lt;/span&gt; Moserware\Skills\FactorGraphs\ScheduleLoop;&lt;br /&gt;&lt;span style="color: green;"&gt;use&lt;/span&gt; Moserware\Skills\FactorGraphs\ScheduleSequence;&lt;br /&gt;&lt;span style="color: green;"&gt;use&lt;/span&gt; Moserware\Skills\FactorGraphs\ScheduleStep;&lt;br /&gt;&lt;span style="color: green;"&gt;use&lt;/span&gt; Moserware\Skills\FactorGraphs\Variable;&lt;br /&gt;&lt;/pre&gt;&lt;/div&gt; whereas in C# you can just say: &lt;div&gt; &lt;pre&gt;&lt;br /&gt;&lt;span style="color: green;"&gt;using&lt;/span&gt; Moserware.Skills.FactorGraphs;&lt;br /&gt;&lt;/pre&gt;&lt;/div&gt; PHP&amp;#8217;s way makes things explicit and I can see that viewpoint, but it was a bit of a surprising requirement given how PHP usually requires less syntax.&lt;/li&gt; &lt;li&gt;PHP lacks support for C#-like &lt;a href="http://msdn.microsoft.com/en-us/library/512aeb7t(v=VS.100).aspx"&gt;generics&lt;/a&gt;. On the one hand, I missed the generic type safety and performance benefits, but on the other hand it forced me to redesign some classes to not have an army of angle brackets (e.g. 
compare &lt;a href="http://github.com/moserware/Skills/blob/master/Skills/FactorGraphs/Factor.cs"&gt;this class in C#&lt;/a&gt; to &lt;a href="http://github.com/moserware/PHPSkills/blob/master/Skills/FactorGraphs/Factor.php"&gt;its PHP equivalent&lt;/a&gt;).&lt;/li&gt; &lt;li&gt;You have to manually call your parent class&amp;#8217;s constructor in PHP if you want that feature: &lt;div&gt; &lt;pre&gt;&lt;br /&gt;&lt;span style="color: green;"&gt;class&lt;/span&gt;&lt;span style="color: gray;"&gt; &lt;/span&gt;&lt;span style="color: blue;"&gt;BaseClass&lt;/span&gt;&lt;span style="color: gray;"&gt; &lt;br /&gt;&lt;/span&gt;&lt;span style="color: olive;"&gt;{&lt;/span&gt;&lt;span style="color: gray;"&gt; &lt;br /&gt;  &lt;/span&gt;&lt;span style="color: green;"&gt;function&lt;/span&gt;&lt;span style="color: gray;"&gt; &lt;/span&gt;&lt;span style="color: blue;"&gt;__construct&lt;/span&gt;&lt;span style="color: olive;"&gt;(&lt;/span&gt;&lt;span style="color: olive;"&gt;)&lt;/span&gt;&lt;span style="color: gray;"&gt; &lt;/span&gt;&lt;span style="color: olive;"&gt;{&lt;/span&gt;&lt;span style="color: gray;"&gt; ... 
&lt;/span&gt;&lt;span style="color: olive;"&gt;}&lt;/span&gt;&lt;br /&gt;&lt;span style="color: olive;"&gt;}&lt;/span&gt;&lt;span style="color: gray;"&gt; &lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="color: green;"&gt;class&lt;/span&gt;&lt;span style="color: gray;"&gt; &lt;/span&gt;&lt;span style="color: blue;"&gt;DerivedClass&lt;/span&gt;&lt;span style="color: gray;"&gt; &lt;/span&gt;&lt;span style="color: green;"&gt;extends&lt;/span&gt;&lt;span style="color: gray;"&gt; &lt;/span&gt;&lt;span style="color: blue;"&gt;BaseClass&lt;/span&gt;&lt;span style="color: gray;"&gt; &lt;br /&gt;&lt;/span&gt;&lt;span style="color: olive;"&gt;{&lt;/span&gt;&lt;span style="color: gray;"&gt; &lt;br /&gt;  &lt;/span&gt;&lt;span style="color: green;"&gt;function&lt;/span&gt;&lt;span style="color: gray;"&gt; &lt;/span&gt;&lt;span style="color: blue;"&gt;__construct&lt;/span&gt;&lt;span style="color: olive;"&gt;(&lt;/span&gt;&lt;span style="color: olive;"&gt;)&lt;/span&gt;&lt;span style="color: gray;"&gt; &lt;br /&gt;  &lt;/span&gt;&lt;span style="color: olive;"&gt;{&lt;/span&gt;&lt;span style="color: gray;"&gt; &lt;br /&gt;      &lt;/span&gt;&lt;span style="color: orange;"&gt;//&lt;/span&gt;&lt;span style="color: orange;"&gt; this line is optional, but if you omit it, the BaseClass constructor will *not* be called &lt;/span&gt;&lt;span style="color: gray;"&gt;&lt;br /&gt;      &lt;/span&gt;&lt;span style="color: blue;"&gt;parent&lt;/span&gt;&lt;span style="color: gray;"&gt;::&lt;/span&gt;&lt;span style="color: blue;"&gt;__construct&lt;/span&gt;&lt;span style="color: olive;"&gt;(&lt;/span&gt;&lt;span style="color: olive;"&gt;)&lt;/span&gt;&lt;span style="color: gray;"&gt;; &lt;br /&gt;  &lt;/span&gt;&lt;span style="color: olive;"&gt;}&lt;/span&gt;&lt;span style="color: gray;"&gt; &lt;br /&gt;&lt;/span&gt;&lt;span style="color: olive;"&gt;}&lt;/span&gt;&lt;br /&gt;&lt;/pre&gt;&lt;/div&gt; This gives you more flexibility, but it doesn&amp;#8217;t enforce C#-like assumptions that 
your parent class&amp;#8217;s constructor was called.&lt;/li&gt; &lt;li&gt;PHP doesn&amp;#8217;t seem to have the concept of an implicit &amp;#8220;$this&amp;#8221; inside of a class. This forces you to always qualify class member variables with $this: &lt;div&gt; &lt;pre&gt;&lt;br /&gt;&lt;span style="color: green;"&gt;class&lt;/span&gt;&lt;span style="color: gray;"&gt; &lt;/span&gt;&lt;span style="color: blue;"&gt;SomeClass&lt;/span&gt;&lt;span style="color: gray;"&gt; &lt;br /&gt;&lt;/span&gt;&lt;span style="color: olive;"&gt;{&lt;/span&gt;&lt;span style="color: gray;"&gt; &lt;br /&gt;    &lt;/span&gt;&lt;span style="color: green;"&gt;private&lt;/span&gt;&lt;span style="color: gray;"&gt; &lt;/span&gt;&lt;span style="color: darkblue;"&gt;$_someLocalVariable&lt;/span&gt;&lt;span style="color: gray;"&gt;; &lt;br /&gt;    &lt;/span&gt;&lt;span style="color: green;"&gt;function&lt;/span&gt;&lt;span style="color: gray;"&gt; &lt;/span&gt;&lt;span style="color: blue;"&gt;someMethod&lt;/span&gt;&lt;span style="color: olive;"&gt;(&lt;/span&gt;&lt;span style="color: olive;"&gt;)&lt;/span&gt;&lt;span style="color: gray;"&gt; &lt;br /&gt;    &lt;/span&gt;&lt;span style="color: olive;"&gt;{&lt;/span&gt;&lt;span style="color: gray;"&gt; &lt;br /&gt;        &lt;/span&gt;&lt;span style="color: darkblue;"&gt;$someMethodVariable&lt;/span&gt;&lt;span style="color: gray;"&gt; = &lt;/span&gt;&lt;span style="color: darkblue;"&gt;$this&lt;/span&gt;&lt;span style="color: gray;"&gt;-&amp;gt;&lt;/span&gt;&lt;span style="color: blue;"&gt;_someLocalVariable&lt;/span&gt;&lt;span style="color: gray;"&gt; + &lt;/span&gt;&lt;span style="color: maroon;"&gt;1&lt;/span&gt;&lt;span style="color: gray;"&gt;; &lt;br /&gt;        ... 
&lt;br /&gt;    &lt;/span&gt;&lt;span style="color: olive;"&gt;}&lt;/span&gt;&lt;span style="color: gray;"&gt; &lt;br /&gt;&lt;/span&gt;&lt;span style="color: olive;"&gt;}&lt;/span&gt;&lt;br /&gt;&lt;/pre&gt;&lt;/div&gt; I put this in the &amp;#8220;OK&amp;#8221; category because some C# developers &lt;a href="http://blogs.msdn.com/b/omars/archive/2004/02/05/67687.aspx"&gt;prefer&lt;/a&gt; to always be explicit about specifying &amp;#8220;this&amp;#8221; as well.&lt;/li&gt; &lt;li&gt;PHP allows you to specify the type of some (but not all) kinds of function arguments: &lt;div&gt; &lt;pre&gt;&lt;br /&gt;&lt;span style="color: green;"&gt;function&lt;/span&gt;&lt;span style="color: gray;"&gt; &lt;/span&gt;&lt;span style="color: blue;"&gt;myFunction&lt;/span&gt;&lt;span style="color: olive;"&gt;(&lt;/span&gt;&lt;span style="color: blue;"&gt;SomeClass&lt;/span&gt;&lt;span style="color: gray;"&gt; &lt;/span&gt;&lt;span style="color: darkblue;"&gt;$someClass&lt;/span&gt;&lt;span style="color: gray;"&gt;, &lt;/span&gt;&lt;span style="color: green;"&gt;array&lt;/span&gt;&lt;span style="color: gray;"&gt; &lt;/span&gt;&lt;span style="color: darkblue;"&gt;$someArray&lt;/span&gt;&lt;span style="color: gray;"&gt;, &lt;/span&gt;&lt;span style="color: darkblue;"&gt;$someString&lt;/span&gt;&lt;span style="color: olive;"&gt;)&lt;/span&gt;&lt;span style="color: gray;"&gt; &lt;/span&gt;&lt;span style="color: olive;"&gt;{&lt;/span&gt;&lt;span style="color: gray;"&gt; ... 
&lt;/span&gt;&lt;span style="color: olive;"&gt;}&lt;/span&gt;&lt;br /&gt;&lt;/pre&gt;&lt;/div&gt; This is called &amp;#8220;&lt;a href="http://php.net/manual/en/language.oop5.typehinting.php"&gt;type hinting&lt;/a&gt;.&amp;#8221; It seems that it is designed for enforcing API contracts instead of general IDE help as it actually causes a &lt;a href="http://stackoverflow.com/questions/3580628/is-type-hinting-helping-the-performance-of-php-scripts/3580660#3580660"&gt;decrease in performance&lt;/a&gt;.&lt;/li&gt; &lt;li&gt;PHP doesn&amp;#8217;t have the concept of &lt;a href="http://msdn.microsoft.com/en-us/netframework/aa904594.aspx"&gt;LINQ&lt;/a&gt;, but it does support some similar functional-like concepts like &lt;a href="http://php.net/manual/en/function.array-map.php"&gt;array_map&lt;/a&gt; and &lt;a href="http://www.php.net/manual/en/function.array-reduce.php"&gt;array_reduce&lt;/a&gt;.&lt;/li&gt; &lt;li&gt;PHP has support for &lt;a href="http://php.net/manual/en/functions.anonymous.php"&gt;anonymous functions&lt;/a&gt; by using the &amp;#8220;&lt;span style="color: green;"&gt;function&lt;/span&gt;&lt;span style="color: olive;"&gt;(&lt;/span&gt;&lt;span style="color: darkblue;"&gt;$arg1&lt;/span&gt;&lt;span style="color: gray;"&gt;, ...&lt;/span&gt;&lt;span style="color: olive;"&gt;)&lt;/span&gt; &lt;span style="color: olive;"&gt;{&lt;/span&gt;&lt;span style="color: olive;"&gt;}&lt;/span&gt;&amp;#8221; syntax. This is sort of reminiscent of how C# did the same thing in version 2.0 where you had to type out &amp;#8220;&lt;a href="http://msdn.microsoft.com/en-us/library/0yw3tz5k(v=VS.100).aspx"&gt;delegate&lt;/a&gt;.&amp;#8221; C# 3.0 simplified this with a lighter weight version (e.g. 
&amp;#8220;&lt;span style="color: darkblue;"&gt;x&lt;/span&gt; &lt;span style="color: gray;"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color: darkblue;"&gt;x&lt;/span&gt; &lt;span style="color: gray;"&gt;*&lt;/span&gt; &lt;span style="color: darkblue;"&gt;x&lt;/span&gt;&amp;#8221;). I&amp;#8217;ve found that this seemingly tiny change &amp;#8220;isn&amp;#8217;t about doing the same thing faster, it allows me to work in a &lt;a title="The quote comes from Linus talking about how git's speed changes how you work. The full quote is: 'that is the kind of performance that changes how you work. It&amp;#8217;s no longer doing the same thing faster, it&amp;#8217;s allowing you to work in a completely different manner.'" href="http://www.youtube.com/watch?v=4XpnKHJAok8#t=54m47s"&gt;completely different manner&lt;/a&gt;&amp;#8221; by employing functional concepts without thinking. It&amp;#8217;s sort of a shame PHP didn&amp;#8217;t elevate this concept with concise syntax. When C#&amp;#8217;s lambda syntax was introduced in 3.0, it made me want to use them much more often. PHP&amp;#8217;s lack of something similar is a strong discourager to the functional style and is a lesson that &lt;a href="http://herbsutter.com/2010/10/07/c-and-beyond-session-lambdas-lambdas-everywhere/"&gt;C++ guys have recently learned&lt;/a&gt;.&lt;/li&gt; &lt;li&gt;Item 4 of &lt;a href="http://www.php.net/license/index.php#faq-lic"&gt;the PHP license&lt;/a&gt; states:&lt;blockquote&gt;Products derived from this software may not be called &amp;#8220;PHP&amp;#8221;, nor may &amp;#8220;PHP&amp;#8221; appear in their name, without prior written permission from group@php.net. 
You may indicate that your software works in conjunction with PHP by saying &amp;#8220;Foo for PHP&amp;#8221; instead of calling it &amp;#8220;PHP Foo&amp;#8221; or &amp;#8220;phpfoo&amp;#8221;&lt;/blockquote&gt; This explains why you see carefully worded names like &amp;#8220;&lt;a href="http://developers.facebook.com/blog/post/358"&gt;HipHop for PHP&lt;/a&gt;&amp;#8221; rather than something like &amp;#8220;php2cpp.&amp;#8221; This technically doesn&amp;#8217;t stop you from having a project with the PHP name in it (e.g. &lt;a href="http://www.phpunit.de/"&gt;PHPUnit&lt;/a&gt;) so long as the official PHP code is not included in it. However, it&amp;#8217;s clear that the PHP group is trying to clean up its name from tarnished projects like &lt;a href="http://en.wikipedia.org/wiki/PHP-Nuke"&gt;PHP-Nuke&lt;/a&gt;. I understand their frustration, but this leads to an official preference for names like &lt;a href="http://www.zope.org/"&gt;Zope&lt;/a&gt; and &lt;a href="http://www.smarty.net/"&gt;Smarty&lt;/a&gt; that seem to be less clear on what the project actually does. 
This position would be like Microsoft declaring that you couldn&amp;#8217;t use the &amp;#8220;#&amp;#8221; suffix or the &amp;#8220;Implementation Running On .Net (&lt;a href="http://stackoverflow.com/questions/1194309/why-are-many-ports-of-languages-to-net-prefixed-with-iron"&gt;Iron&lt;/a&gt;)&amp;#8221; prefix in your project name (but maybe that would lead to more creativity?).&lt;/li&gt; &lt;/ul&gt; &lt;h4&gt;The &lt;a href="http://www.joelonsoftware.com/uibook/chapters/fog0000000057.html" title="Like Joel mentions in this post from 2000, tiny frustrations add up to a really bad experience"&gt;Frustrating&lt;/a&gt; Parts:&lt;/h4&gt; &lt;ul&gt; &lt;li&gt;As someone who&amp;#8217;s primarily worked with a statically typed language for the past 15 years, I prefer upfront compiler errors and warnings that C# offers and agree with &lt;a href="http://en.wikipedia.org/wiki/Anders_Hejlsberg"&gt;Anders Hejlsberg&lt;/a&gt;&amp;#8217;s &lt;a href="http://www.se-radio.net/2008/05/episode-97-interview-anders-hejlsberg/" title="The quote begins around 35:45"&gt;philosophy&lt;/a&gt;: &lt;blockquote&gt;&amp;#8220;I think one of the reasons that languages like Ruby for example (or Python) are becoming popular is really in many ways in spite of the fact that they are not typed... but because of the fact that they [have] very good metaprogramming support. I don&amp;#8217;t see a lot of downsides to static typing other than the fact that it may not be practical to put in place, and it &lt;em&gt;is&lt;/em&gt; harder to put in place and therefore takes longer for us to get there with static typing, but once you do have static typing. I mean, gosh, you know, like hey -- the compiler is going to report the errors before the space shuttle flies instead of whilst it&amp;#8217;s flying, that&amp;#8217;s a good thing!&amp;#8221;&lt;/blockquote&gt; But more dynamic languages like PHP have their supporters. 
For example, &lt;a href="http://en.wikipedia.org/wiki/Douglas_Crockford"&gt;Douglas Crockford&lt;/a&gt; &lt;a title="See video starting at the -18:14 mark" href="http://video.yahoo.com/watch/111596/1710658"&gt;raves&lt;/a&gt; about JavaScript&amp;#8217;s dynamic aspects: &lt;blockquote&gt;&amp;#8220;I found over the years of working with JavaScript... I used to be of the religion that said &amp;#8216;Yeah, absolutely brutally strong type systems. Figure it all out at compile time.&amp;#8217; I've now been converted to the other camp. I've found that the expressive power of JavaScript is so great. I've not found that I've lost anything in giving up the early protection [of statically compiled code]&amp;#8221;&lt;/blockquote&gt; I still haven&amp;#8217;t seen where Crockford is coming from given my recent work with PHP. Personally, I think that given C# 4.0&amp;#8217;s optional support of &lt;a href="http://msdn.microsoft.com/en-us/library/dd264736.aspx"&gt;dynamic&lt;/a&gt; objects, the lines between the two worlds are grayer and that with C# you get the best of both worlds, but I&amp;#8217;m probably biased here.&lt;/li&gt; &lt;li&gt;You don&amp;#8217;t have to define &lt;a href="http://www.php.net/manual/en/language.variables.basics.php"&gt;variables&lt;/a&gt; in PHP. This reduces some coding &amp;#8220;&lt;a href="http://msdn.microsoft.com/en-us/magazine/dd419655.aspx" title="There's a lot of talk out there about ceremony vs essence."&gt;ceremony&lt;/a&gt;&amp;#8221; to get to the essence of your code, but I think it removes a &lt;a href="http://podcasts.pragprog.com/2007-10/michael-nygard-interview.mp3" title="Quote is at 3:46 - 'We should have shock absorbers and circuit breakers so that [our systems] can be resilient to failure.'"&gt;shock absorber/circuit-breaker&lt;/a&gt; that can be built into the language. 
This &amp;#8220;feature&amp;#8221; &lt;a href="http://github.com/moserware/PHPSkills/commit/fa10d276d6121f390b930b655a66edd9376e114e#L0L24"&gt;turned my typo into a bug&lt;/a&gt; and led to a runtime error. Fortunately, options like &lt;a href="http://php.net/manual/en/errorfunc.configuration.php"&gt;E_NOTICE&lt;/a&gt; can catch these, but the behavior caught me off guard. Thankfully, NetBeans&amp;#8217; auto-completion saved me from most of these types of errors.&lt;/li&gt;&lt;li&gt;PHP has built-in support for associative arrays, but you &lt;a href="http://php.net/manual/en/language.types.array.php"&gt;can&amp;#8217;t use objects as keys&lt;/a&gt; or else you&amp;#8217;ll get an &amp;#8220;Illegal Offset Type&amp;#8221; error. Because my C# API heavily relied on this ability and I didn&amp;#8217;t want to redesign the structure, I &lt;a href="http://github.com/moserware/PHPSkills/blob/master/Skills/HashMap.php"&gt;created my own hashmap&lt;/a&gt; that supports object keys. This omission tended to reinforce the belief that &lt;a href="http://michaelkimsal.com/blog/php-is-not-object-oriented/"&gt;PHP is not really object oriented&lt;/a&gt;. That said, I&amp;#8217;m probably missing something and did it wrong.&lt;/li&gt; &lt;li&gt;PHP &lt;a href="http://bugs.php.net/bug.php?id=9331&amp;amp;edit=1"&gt;doesn&amp;#8217;t support operator overloading&lt;/a&gt;. This made my &lt;a href="http://github.com/moserware/PHPSkills/blob/master/Skills/Numerics/GaussianDistribution.php"&gt;GaussianDistribution&lt;/a&gt; and &lt;a href="http://github.com/moserware/PHPSkills/blob/master/Skills/Numerics/Matrix.php"&gt;Matrix&lt;/a&gt; classes a little harder to work with because I had to invent explicit method names for the operators.&lt;/li&gt; &lt;li&gt;PHP lacks support for a C#-like &lt;a href="http://msdn.microsoft.com/en-us/library/x9fsa0sw(v=VS.100).aspx"&gt;property syntax&lt;/a&gt;. 
Having to write getters and setters made me feel like I was programming in Java again.&lt;/li&gt; &lt;li&gt;My code ran &lt;a href="http://twitter.com/GregB/status/27244912213"&gt;slower in PHP&lt;/a&gt;. To be fair, most of the performance problem was in &lt;a href="http://github.com/moserware/PHPSkills/blob/master/Skills/Numerics/Matrix.php"&gt;my horribly naive matrix implementation&lt;/a&gt;, which a better algorithm could improve. Regardless, it seems that larger sites deal with PHP&amp;#8217;s performance problem by writing critical parts in compiled languages &lt;a href="http://news.ycombinator.com/item?id=1820451"&gt;like C/C++&lt;/a&gt; or by using caching layers such as &lt;a href="http://en.wikipedia.org/wiki/Memcached"&gt;memcached&lt;/a&gt;. One interesting observation is that the performance issue isn't really with the &lt;a href="http://en.wikipedia.org/wiki/Zend_Engine"&gt;Zend Engine&lt;/a&gt; per se but rather with the semantics of the PHP language itself. Haiping Zhao on the HipHop for PHP team &lt;a href="http://www.youtube.com/watch?v=p5S1K60mhQU#t=51m44s" title="From the Stanford lecture on HipHop for PHP"&gt;gave a good overview of the issue&lt;/a&gt;: &lt;blockquote&gt;&amp;#8220;Around the time that we started the [HipHop for PHP] project, we absolutely looked into the Zend Engine. The first question you ask is 'The Zend Engine must be terribly implemented. That&amp;#8217;s why it&amp;#8217;s slow, right?' So we looked into the Zend Engine and tried different places, we looked at the hash functions to see if it&amp;#8217;s sufficient and look some of the profiles the Zend Engine has and different parts of the Zend Engine. You finally realize that the Zend Engine is pretty compact. It just does what it promises. If you have that kind of semantics you just cannot avoid the dynamic function table, you cannot avoid the variable table, you just cannot avoid a lot of the things that they built... 
that&amp;#8217;s the point that [you realize] PHP can also be called C++Script because the syntax is so similar then you ask yourself, 'What is the difference between the speed of these two different languages and those are the items that are... different like the dynamic symbol lookup (it&amp;#8217;s not present in C++), the weak typing is not present in C++, everything else is pretty much the same. The Zend Engine is very close to C implementation. The layer is very very thin. I don&amp;#8217;t think we can blame the Zend Engine for the slowness PHP has.&amp;#8221;&lt;/blockquote&gt; That said, I don&amp;#8217;t think that performance alone would stop me from using PHP. It&amp;#8217;s good enough for most things. Furthermore, I'm sure optimizers could use tricks like what the &lt;a href="http://en.wikipedia.org/wiki/Dynamic_Language_Runtime"&gt;DLR&lt;/a&gt; and &lt;a href="http://code.google.com/p/v8/"&gt;V8&lt;/a&gt; use to squeeze out more performance. However, I think that in practice, there is a case of &lt;a href="http://en.wikipedia.org/wiki/Amdahl's_law"&gt;diminishing returns&lt;/a&gt; where I/O (and not CPU time) typically becomes the limiting factor.&lt;/li&gt; &lt;/ul&gt; &lt;h4&gt;Parting Thoughts&lt;/h4&gt; &lt;p&gt;Despite my brief encounter, I feel that I learned quite a bit and feel comfortable around PHP code now. I think my quick ramp-up highlights a core value of PHP: its simplicity. I did miss C#-like compiler warnings and type safety, but maybe that&amp;#8217;s my own personal acquired taste. Although PHP &lt;em&gt;does&lt;/em&gt; have some &lt;a href="http://www.reddit.com/r/programming/comments/dst56/today_i_learned_about_php_variable_variables/c12n0w9"&gt;dubious features&lt;/a&gt;, it&amp;#8217;s not nearly as bad as some people make it out to be. I think that its simplicity makes it a very respectable choice for the type of things it was originally designed to do like &lt;a href="http://wordpress.org/extend/themes/" title="e.g. 
Wordpress ones"&gt;web templates&lt;/a&gt;. Although I still wouldn&amp;#8217;t pick PHP as my &lt;a href="http://weblogs.asp.net/scottgu/archive/tags/MVC/default.aspx"&gt;first choice&lt;/a&gt; as a general purpose web programming language, I can now look at its features in a much more balanced way.&lt;/p&gt; &lt;p&gt;&lt;strong&gt;P.S.&lt;/strong&gt; I&amp;#8217;d love to hear suggestions on how to improve my implementation and learn where I did something wrong. Please feel free to use &lt;a href="http://github.com/moserware/PHPSkills"&gt;my PHP TrueSkill code&lt;/a&gt; and submit &lt;a href="http://help.github.com/pull-requests/"&gt;pull requests&lt;/a&gt;. As always, feel free to fork the code and port it to another language like &lt;a href="http://github.com/nsp"&gt;Nate Parsons&lt;/a&gt; did with his &lt;a href="http://github.com/nsp/JSkills"&gt;JSkills Java port&lt;/a&gt;.&lt;/p&gt;&lt;div class="feedflare"&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/Moserware/~4/l3hyKQam1ZU" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://www.moserware.com/feeds/5145078661665722279/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=6800934446457898793&amp;postID=5145078661665722279" title="36 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6800934446457898793/posts/default/5145078661665722279?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6800934446457898793/posts/default/5145078661665722279?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/Moserware/~3/l3hyKQam1ZU/notes-from-porting-c-code-to-php.html" title="Notes from porting C# code to PHP" /><author><name>Jeff Moser</name><uri>http://www.blogger.com/profile/16074905903060665396</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="24" height="32" src="http://1.bp.blogspot.com/_Zfbv3mHcYrc/SLDM--5fn8I/AAAAAAAAA1w/EZtLwWvYhdI/S220/facebook+beard2.jpg" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://4.bp.blogspot.com/_Zfbv3mHcYrc/TMMv2y_zyKI/AAAAAAAAXkA/JKpA8oOpCiE/s72-c/1000px-PHP-logo.svg.png" height="72" width="72" /><thr:total>36</thr:total><feedburner:origLink>http://www.moserware.com/2010/10/notes-from-porting-c-code-to-php.html</feedburner:origLink></entry><entry gd:etag="W/&quot;A0INQXY8eyp7ImA9WxBbGU8.&quot;"><id>tag:blogger.com,1999:blog-6800934446457898793.post-793968235021421709</id><published>2010-03-18T08:33:00.006-04:00</published><updated>2010-03-18T12:26:30.873-04:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2010-03-18T12:26:30.873-04:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="trueskill" /><title>Computing Your Skill</title><content 
type="html">&lt;p&gt;&lt;strong&gt;Summary&lt;/strong&gt;: I describe how the &lt;a href="http://research.microsoft.com/en-us/projects/trueskill/"&gt;TrueSkill algorithm&lt;/a&gt; works using concepts you're already familiar with. TrueSkill is used on &lt;a href="http://www.xbox.com/en-US/LIVE/" title="I'm actually not a gamer myself, I just like the math of their ranking algorithm :-)"&gt;Xbox Live&lt;/a&gt; to rank and match players and it serves as a great way to understand how statistical machine learning is actually applied today. I&amp;#8217;ve also created an &lt;a href="http://github.com/moserware/Skills"&gt;open source project&lt;/a&gt; where I implemented TrueSkill three different times in increasing complexity and capability. In addition, I've created a &lt;a href="http://dl.dropbox.com/u/1083108/Moserware/Skill/The%20Math%20Behind%20TrueSkill.pdf" title="It's over 40 pages because I had a fun time with the equation editor."&gt;detailed supplemental math paper&lt;/a&gt; that works out equations that I gloss over here. Feel free to jump to sections that look interesting and ignore ones that seem boring. Don't worry if this post seems a bit long, there are &lt;em&gt;lots&lt;/em&gt; of pictures.&lt;/p&gt;&lt;h4&gt;Introduction&lt;/h4&gt;&lt;p&gt;It seemed easy enough: I wanted to create a database to track the skill levels of my coworkers in &lt;a href="http://en.wikipedia.org/wiki/Chess"&gt;chess&lt;/a&gt; and &lt;a href="http://en.wikipedia.org/wiki/Table_football"&gt;foosball&lt;/a&gt;. I already knew that I wasn&amp;#8217;t very good at foosball and would bring down better players. I was curious if an algorithm could do a better job at creating well-balanced matches. I also wanted to see if I was improving at chess. I knew I needed to have an easy way to collect results from everyone and then use an algorithm that would keep getting better with &lt;a title="Peter Norvig's 'Theorizing From Data' talk is fantastic, I highly recommend it." 
href="http://www.facebook.com/techtalks#/video/video.php?v=644326502463"&gt;more&lt;/a&gt; &lt;a title="Microsoft Research put out this interesting book on how massive amounts of data will dominate scientific discoveries." href="http://research.microsoft.com/en-us/collaboration/fourthparadigm/"&gt;data&lt;/a&gt;. I was looking for a way to compress all that data and distill it down to some simple knowledge of how skilled people are. Based on some &lt;a title="I think the lasting legacy of the Netflix prize is that if you make something interesting and put it online, it shouldn't be a surprise that you can get PhDs to work on it for a dollar an hour or less. There's probably a deep lesson there for most tech companies." href="http://bits.blogs.nytimes.com/2009/09/21/netflix-awards-1-million-prize-and-starts-a-new-contest/?ref=technology"&gt;previous&lt;/a&gt; &lt;a title="If you haven't seen it yet, you should check out the PBS NOVA episode that covered this." href="http://www.pbs.org/wgbh/nova/darpa/"&gt;things&lt;/a&gt; that I had heard about, this seemed like a good fit for &amp;#8220;&lt;a href="http://tv.theiet.org/technology/infopro/turing-2010.cfm" title="If you want a friendly introduction to machine learning, especially how it's applied at Microsoft, then Christopher Bishop's 2010 Turing lecture is a fantastic high level overview."&gt;machine learning&lt;/a&gt;.&amp;#8221;&lt;/p&gt;&lt;p&gt;But, there&amp;#8217;s a problem.&lt;/p&gt;&lt;p&gt;Machine learning is a &lt;em&gt;hot&lt;/em&gt; area in Computer Science&amp;#8212; but it&amp;#8217;s intimidating. Like most subjects, there&amp;#8217;s &lt;a href="http://measuringmeasures.com/blog/2010/3/12/learning-about-machine-learning-2nd-ed.html"&gt;a lot&lt;/a&gt; &lt;a title="There are lots of machine learning resources out there, unfortunately most of them scare off beginners." href="http://news.ycombinator.com/item?id=1055042"&gt;to learn&lt;/a&gt; to be an expert in the field. 
I didn&amp;#8217;t need to go very deep; I just needed to understand enough to solve my problem. I found a link to &lt;a href="http://research.microsoft.com/apps/pubs/default.aspx?id=67956"&gt;the paper&lt;/a&gt; describing the TrueSkill algorithm and I read it several times, but it didn&amp;#8217;t make sense. It was only 8 pages long, but it seemed beyond my capability to understand. I felt dumb. Even so, I was too stubborn to give up. Jamie Zawinski &lt;a title="The quote comes from Coders @ Work" href="http://books.google.com/books?id=nneBa6-mWfgC&amp;amp;printsec=frontcover&amp;amp;dq=coders+at+work&amp;amp;ei=hVFeS6CSI5G2NJadyPQC&amp;amp;cd=1#v=onepage&amp;amp;q=%22Not%20knowing%20something%20doesn%27t%20mean%20you%27re%20dumb%22&amp;amp;f=false"&gt;said it well&lt;/a&gt;:&lt;/p&gt;&lt;blockquote&gt;&lt;br /&gt;    &amp;#8220;Not knowing something doesn&amp;#8217;t mean you&amp;#8217;re dumb&amp;#8212; it just means you don&amp;#8217;t know it.&amp;#8221;&lt;br /&gt;  &lt;/blockquote&gt;&lt;p&gt;I learned that the problem isn&amp;#8217;t the difficulty of the ideas themselves, but rather that the ideas make too big of a jump from &lt;a title="If you're like most people, then the top of your math career was calculus. Although it has interesting concepts, you probably don't use it anymore. You would have been far better off learning more about statistics to handle all the data you're faced with. Arthur Benjamin's 2009 TED talk goes into this." href="http://www.ted.com/talks/arthur_benjamin_s_formula_for_changing_math_education.html"&gt;the math&lt;/a&gt; that &lt;a title="We spend way too much time learning how to calculate, long-divide, integrate by parts, yadda yadda, instead of learning why you'd want to do that or what it's actually useful for. In the era of Moore's law, you can bank on computers getting better at doing computational grunt work, but it's sad that you can't depend on the education system teaching kids how to take advantage of all that power. 
Although *slightly* biased towards using tools like Mathematica, this talk by Conrad Wolfram shares a similar viewpoint." href="http://www.youtube.com/watch?v=TsvPE1EqwQ8"&gt;we typically learn&lt;/a&gt; &lt;a href="http://news.ycombinator.com/item?id=1058584" title="To prove this, start talking about even the concepts in this blog post at your next party and look at the reaction."&gt;in school&lt;/a&gt;. This is sad because underneath the apparent complexity lie some beautiful concepts. In hindsight, the algorithm seems relatively simple, but it took me several months to arrive at that conclusion. My hope is that I can short-circuit the haphazard and slow process I went through and take you directly to the beauty of &lt;em&gt;understanding&lt;/em&gt; what&amp;#8217;s inside the gem that is the TrueSkill algorithm.&lt;/p&gt;&lt;h4&gt;Skill &amp;#8776; Probability of Winning&lt;/h4&gt;&lt;p&gt;&lt;a href="http://en.wikipedia.org/wiki/File:Osaka07_D2A_Torri_Edwards.jpg"&gt;&lt;img alt="Women runners in the 100 meter dash." style="border:0; border:0; float:right; margin: 10px 0px; width: 320px; display: inline; height: 260px; border:0;" id="BLOGGER_PHOTO_ID_5432537170014363586" name="BLOGGER_PHOTO_ID_5432537170014363586" src="http://3.bp.blogspot.com/_Zfbv3mHcYrc/S2Q_xffP38I/AAAAAAAAKYo/YcgBWcpjYtI/s320/100M_dash_Osaka07_D2A_Torri_Edwards.jpg" title="World Athletics Championships 2007 in Osaka. Photo from Wikipedia by Eckhard Pecher. Used under the Creative Commons Attribution 2.5 Generic License" /&gt;&lt;/a&gt;Skill is tricky to measure. Being good at something takes &lt;a href="http://www.moserware.com/2008/03/what-does-it-take-to-become-grandmaster.html"&gt;deliberate practice&lt;/a&gt; and sometimes a bit of luck. How do you measure that in a person? 
You could just ask someone if they&amp;#8217;re skilled, but this would only give a rough approximation since people tend to be &lt;a title="It's worth reading about the overconfidence effect if you haven't done it before" href="http://en.wikipedia.org/wiki/Overconfidence_effect"&gt;overconfident&lt;/a&gt; in their ability. Perhaps a better question is &amp;#8220;what would the &lt;a title="for example, meters, seconds, etc." href="http://en.wikipedia.org/wiki/Units_of_measurement"&gt;units&lt;/a&gt; of skill be?&amp;#8221; For something like the 100 meter dash, you could just average the number of seconds of several recent sprints. However, for a game like chess, it&amp;#8217;s harder because all that&amp;#8217;s really important is if you win, lose, or draw.&lt;/p&gt;&lt;p&gt;It might make sense to just tally the total number of wins and losses, but this wouldn&amp;#8217;t be fair to people that played a lot (or a little). Slightly better is to record the percent of games that you win. However, this wouldn&amp;#8217;t be fair to people that &lt;a title="Jeff Atwood discussed the concept further." href="http://www.codinghorror.com/blog/archives/000961.html"&gt;beat up on far worse players&lt;/a&gt; or players who got decimated but maybe learned a thing or two. The goal of most games is to win, but if you win &lt;em&gt;too&lt;/em&gt; much, then you&amp;#8217;re probably not challenging yourself. Ideally, if all players won about half of their games, we&amp;#8217;d say things are balanced. In this ideal scenario, everyone would have a near 50% win ratio, making it impossible to compare using that metric.&lt;/p&gt;&lt;p&gt;Finding universal units of skill is too hard, so we&amp;#8217;ll just give up and not use &lt;em&gt;any&lt;/em&gt; units. The only thing we really care about is roughly who&amp;#8217;s better than whom and by how much. 
One way of doing this is coming up with a &lt;a title="There's a lot of cool stuff you can do with scales, specifically things like the Thurstone Case V and Bradley-Terry models, but there just wasn't enough space to cover these in detail, so I'm only going to passively mention them here, but encourage you to check them out." href="http://en.wikipedia.org/wiki/Scale_%28social_sciences%29"&gt;scale&lt;/a&gt; where each person has a unit-less number expressing their rating that you could use for comparison. If a player has a skill rating much higher than someone else, we&amp;#8217;d expect them to win if they played each other.&lt;/p&gt;&lt;p&gt;The key idea is that a single skill number is meaningless. What&amp;#8217;s important is how that number compares with others. This is an important point worth repeating: &lt;strong&gt;skill only makes sense if it&amp;#8217;s relative to something else&lt;/strong&gt;. We&amp;#8217;d like to come up with a system that gives us numbers that are useful for comparing a person&amp;#8217;s skill. In particular, we&amp;#8217;d like to have a skill rating system that we could use to predict the probability of winning, losing, or drawing in matches based on a numerical rating.&lt;/p&gt;&lt;p&gt;We&amp;#8217;ll spend the rest of our time coming up with a system to calculate and update these skill numbers with the assumption that they can be used to determine the probability of an outcome.&lt;/p&gt;&lt;h4&gt;What Exactly is Probability Anyway?&lt;/h4&gt;&lt;p&gt;You can learn about probability if you&amp;#8217;re willing to flip a coin&amp;#8212; &lt;em&gt;a lot&lt;/em&gt;. 
You flip a few times:&lt;/p&gt;&lt;a href="http://www.flickr.com/photos/matthiasxc/3600131465/"&gt;&lt;img alt="Heads" style="border:0; border:0; width: 240px; height: 240px" id="BLOGGER_PHOTO_ID_5431258503464036242" name="BLOGGER_PHOTO_ID_5431258503464036242" src="http://2.bp.blogspot.com/_Zfbv3mHcYrc/S1-01TZLJ5I/AAAAAAAAKXg/kXFh6ZAiC-U/s400/pennyheads.jpg" title="Photo by matthiasxc on Flickr. Used under the Creative Commons Attribution License" /&gt;&lt;/a&gt;&lt;a href="http://www.flickr.com/photos/matthiasxc/3600131465/"&gt;&lt;img alt="Heads" style="border:0; width: 240px; height: 240px" id="BLOGGER_PHOTO_ID_5431258503464036242" name="BLOGGER_PHOTO_ID_5431258503464036242" src="http://2.bp.blogspot.com/_Zfbv3mHcYrc/S1-01TZLJ5I/AAAAAAAAKXg/kXFh6ZAiC-U/s400/pennyheads.jpg" title="Photo by matthiasxc on Flickr. Used under the Creative Commons Attribution License" /&gt;&lt;/a&gt;&lt;a href="http://www.flickr.com/photos/matthiasxc/3600942160/in/photostream/"&gt;&lt;img alt="Tails" style="border:0; width: 239px; height: 240px" id="BLOGGER_PHOTO_ID_5431259976545222946" name="BLOGGER_PHOTO_ID_5431259976545222946" src="http://4.bp.blogspot.com/_Zfbv3mHcYrc/S1-2LDDGQSI/AAAAAAAAKXo/MuDWpGPPW3A/s400/pennytails_matthiasxc.jpg" title="Photo by matthiasxc on Flickr. Used under the Creative Commons Attribution License" /&gt;&lt;/a&gt;&lt;p&gt;Heads, heads, tails!&lt;/p&gt;&lt;p&gt;Each flip has a &lt;a title="It turns out that flipping a coin is actually biased towards the side that is face up when you flip it." href="http://www.codingthewheel.com/archives/the-coin-flip-a-fundamentally-unfair-proposition"&gt;seemingly&lt;/a&gt; random outcome. However, &amp;#8220;random&amp;#8221; usually means that you haven&amp;#8217;t looked long enough to see a pattern emerge. 
If we take the total number of heads and divide it by the total number of flips, we see a very definite pattern emerge:&lt;/p&gt;&lt;a href="http://4.bp.blogspot.com/_Zfbv3mHcYrc/S1-2xT-xuBI/AAAAAAAAKXw/xxEPk9xd4Bo/s1600-h/headspercentage.png"&gt;&lt;img style="border:0;" id="BLOGGER_PHOTO_ID_5431260633925531666" name="BLOGGER_PHOTO_ID_5431260633925531666" src="http://4.bp.blogspot.com/_Zfbv3mHcYrc/S1-2xT-xuBI/AAAAAAAAKXw/xxEPk9xd4Bo/s576/headspercentage.png" /&gt;&lt;/a&gt;&lt;p&gt;But you knew that it was going to be a 50-50 chance &lt;em&gt;in the long run&lt;/em&gt;. When saying something is random, we often mean it&amp;#8217;s bounded within some range. &lt;a title="Photo is 'Wee!' by 'M i x y' on Flickr. Used under the Creative Commons Attribution License." href="http://www.flickr.com/photos/ladymixy-uk/4063190403/"&gt;&lt;img style="border:0; margin: 10px 0px 0px 15px; width: 320px; display: inline; height: 213px" id="BLOGGER_PHOTO_ID_5432550112612764338" name="BLOGGER_PHOTO_ID_5432550112612764338" align="right" src="http://4.bp.blogspot.com/_Zfbv3mHcYrc/S2RLi2bKyrI/AAAAAAAAKYw/brunlUsQIQA/s320/target_ladymixy_uk.jpg" /&gt;&lt;/a&gt;&lt;/p&gt;&lt;p&gt;It turns out that a better metaphor is to think of a bullseye that archers shoot at. Each arrow will land somewhere near that center. It would be extraordinary to see an arrow hit the bullseye exactly. Most of the arrows will seem to be randomly scattered around it. Although &amp;#8220;random,&amp;#8221; it&amp;#8217;s far more likely that arrows will be near the target than, for example, way out in the woods (well, except if &lt;em&gt;I&lt;/em&gt; was the archer).&lt;/p&gt;&lt;p&gt;This isn&amp;#8217;t a new metaphor; the Greek word &amp;#963;&amp;#964;&amp;#972;&amp;#967;&amp;#959;&amp;#962; (stochos) refers to a stick set up to aim at. 
It&amp;#8217;s where statisticians get the word &lt;a title="besides, stow chass tick is just fun to pronounce" href="http://blogs.wnyc.org/radiolab/2009/06/15/stochasticity/"&gt;stochastic&lt;/a&gt;: a fancy, but slightly more correct word than random. The distribution of arrows brings up another key point:&lt;/p&gt;&lt;p&gt;&lt;strong&gt;All things are possible, but not all things are &lt;em&gt;probable&lt;/em&gt;.&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;Probability has &lt;a title="This is a great talk about the history of probability by Keith Devlin. This specific point comes up around the 5 minute mark." href="http://www.youtube.com/watch?v=3pRM4v0O29o#t=5m00s"&gt;changed how ordinary people think&lt;/a&gt;, a feat that rarely happens in mathematics. The very idea that you could understand &lt;em&gt;anything&lt;/em&gt; about future outcomes is such a big leap in thought that it &lt;a title="This is the book that is described in the video of the previous link. It's a quick read and interesting to see how mathematics is really developed." href="http://www.amazon.com/gp/product/0465009107?ie=UTF8&amp;amp;tag=moserware-20&amp;amp;linkCode=as2&amp;amp;camp=1789&amp;amp;creative=390957&amp;amp;creativeASIN=0465009107"&gt;baffled Blaise Pascal&lt;/a&gt;, one of the best mathematicians in history.&lt;/p&gt;&lt;p&gt;In the summer of 1654, Pascal exchanged a &lt;a title="You can read the letters here" href="http://www.york.ac.uk/depts/maths/histstat/pascal.pdf"&gt;series of letters&lt;/a&gt; with &lt;a href="http://en.wikipedia.org/wiki/Pierre_de_Fermat"&gt;Pierre de Fermat&lt;/a&gt;, another brilliant mathematician, concerning an &amp;#8220;unfinished game.&amp;#8221; Pascal wanted to know how to divide money among gamblers if they have to leave before the game is finished. Splitting the money fairly required some notion of the probability of outcomes if the game would have been played until the end. 
This problem gave birth to the field of probability and laid the foundation for lots of fun things like life insurance, casino games, and scary &lt;a title="Warren Buffett calls them financial weapons of mass destruction" href="http://en.wikipedia.org/wiki/Derivative_%28finance%29"&gt;financial derivatives&lt;/a&gt;.&lt;/p&gt;&lt;p&gt;But probability is more general than predicting the future&amp;#8212; it&amp;#8217;s a measure of your ignorance of something. It doesn&amp;#8217;t matter if the event is set to happen in the future or if it happened months ago. All that matters is that &lt;em&gt;you lack knowledge about something&lt;/em&gt;. Just because we lack knowledge doesn&amp;#8217;t mean we can&amp;#8217;t do anything useful, but we&amp;#8217;ll have to do a lot more coin flips to see it.&lt;/p&gt;&lt;h4&gt;Aggregating Observations&lt;/h4&gt;&lt;p&gt;The real magic happens when we aggregate a lot of observations. What would happen if you flipped a coin 1000 times and counted the number of heads? Lots of things are possible, but in my case I got 505 heads. That&amp;#8217;s about half, so it&amp;#8217;s not surprising. I can graph this by creating a bar chart that puts all the possible outcomes (getting 0 to 1000 heads) on the horizontal axis and the number of times that I got each particular count of heads on the vertical axis. For a single outcome of 505 total heads, it would look like this:&lt;/p&gt;&lt;a href="http://2.bp.blogspot.com/_Zfbv3mHcYrc/S2EMQZLJ1SI/AAAAAAAAKX4/bZXU0-gOScw/s1600-h/totalheads1.png"&gt;&lt;img style="border:0;" id="BLOGGER_PHOTO_ID_5431636101360637218" name="BLOGGER_PHOTO_ID_5431636101360637218" src="http://2.bp.blogspot.com/_Zfbv3mHcYrc/S2EMQZLJ1SI/AAAAAAAAKX4/bZXU0-gOScw/s576/totalheads1.png" /&gt;&lt;/a&gt;&lt;p&gt;Not too exciting. But what if we did it again? This time I got 518 heads. 
I can add that to the chart:&lt;/p&gt;&lt;a href="http://2.bp.blogspot.com/_Zfbv3mHcYrc/S2EMqwPrFwI/AAAAAAAAKYA/6_zLD6matE4/s1600-h/totalheads2.png"&gt;&lt;img style="border:0;" id="BLOGGER_PHOTO_ID_5431636554230208258" name="BLOGGER_PHOTO_ID_5431636554230208258" src="http://2.bp.blogspot.com/_Zfbv3mHcYrc/S2EMqwPrFwI/AAAAAAAAKYA/6_zLD6matE4/s576/totalheads2.png" /&gt;&lt;/a&gt;&lt;p&gt;Doing it 8 more times gave me 489, 515, 468, 508, 492, 475, 511, and once again, I got 505. The chart now looks like this:&lt;/p&gt;&lt;a href="http://4.bp.blogspot.com/_Zfbv3mHcYrc/S2EM_2oh3mI/AAAAAAAAKYI/x7ytLC1LFeo/s1600-h/totalheads10.png"&gt;&lt;img style="border:0;" id="BLOGGER_PHOTO_ID_5431636916722327138" name="BLOGGER_PHOTO_ID_5431636916722327138" src="http://4.bp.blogspot.com/_Zfbv3mHcYrc/S2EM_2oh3mI/AAAAAAAAKYI/x7ytLC1LFeo/s576/totalheads10.png" /&gt;&lt;/a&gt;&lt;p&gt;And after doing this a billion times, for a total of one &lt;em&gt;trillion&lt;/em&gt; flips, I got this:&lt;/p&gt;&lt;a href="http://3.bp.blogspot.com/_Zfbv3mHcYrc/S2RUBHvAW5I/AAAAAAAAKY4/2erLR25Cpyc/s1600-h/totalheads1e9.png"&gt;&lt;img style="border:0;" id="BLOGGER_PHOTO_ID_5432559428748467090" title="In case you're wondering, I used a cryptographically strong random number generator and kept both of my CPU cores busy for a few hours running it as an idle job." name="BLOGGER_PHOTO_ID_5432559428748467090" src="http://3.bp.blogspot.com/_Zfbv3mHcYrc/S2RUBHvAW5I/AAAAAAAAKY4/2erLR25Cpyc/s576/totalheads1e9.png" /&gt;&lt;/a&gt;&lt;p&gt;In all the flips, I never got fewer than 407 total heads and I never got more than 600. 
Just for fun, we can zoom in on this region:&lt;/p&gt;&lt;a href="http://3.bp.blogspot.com/_Zfbv3mHcYrc/S2RUR_oRMCI/AAAAAAAAKZA/xytGIDIXH5I/s1600-h/totalheads1e9_zoomed.png"&gt;&lt;img style="border:0;" id="BLOGGER_PHOTO_ID_5432559718630502434" name="BLOGGER_PHOTO_ID_5432559718630502434" src="http://3.bp.blogspot.com/_Zfbv3mHcYrc/S2RUR_oRMCI/AAAAAAAAKZA/xytGIDIXH5I/s576/totalheads1e9_zoomed.png" /&gt;&lt;/a&gt;&lt;p&gt;As we do more sets of flips, the &lt;a title="The jagged edges are actually part of a Binomial Distribution. This is discussed more in the accompanying math paper to this article." href="http://en.wikipedia.org/wiki/Binomial_distribution"&gt;jagged edges&lt;/a&gt; smooth out to give us the famous &amp;#8220;&lt;a href="http://en.wikipedia.org/wiki/Normal_distribution"&gt;bell curve&lt;/a&gt;&amp;#8221; that you&amp;#8217;ve probably seen before. Math guys love to refer to it as a &amp;#8220;&lt;a href="http://en.wikipedia.org/wiki/Gaussian_function"&gt;Gaussian&lt;/a&gt;&amp;#8221; curve because it was used by the German mathematician Carl Gauss in 1809 to investigate errors in astronomical data. He came up with an exact formula for what to expect if we flipped a coin an infinite number of times (so that we don&amp;#8217;t have to). This is such a famous result that you can see the curve and its equation if you look closely at the middle of an old 10 Deutsche Mark banknote bearing Gauss&amp;#8217;s face:&lt;/p&gt;&lt;a href="http://1.bp.blogspot.com/_Zfbv3mHcYrc/S2L0-j7yViI/AAAAAAAAKYg/HhzizCzK2wI/s1600-h/10_DM_Gauss_Cropped.jpg"&gt;&lt;img style="border:0;" id="BLOGGER_PHOTO_ID_5432173456197309986" title="I wonder: what is the probability of having a mathematician on (legal) USA currency?" name="BLOGGER_PHOTO_ID_5432173456197309986" src="http://1.bp.blogspot.com/_Zfbv3mHcYrc/S2L0-j7yViI/AAAAAAAAKYg/HhzizCzK2wI/s576/10_DM_Gauss_Cropped.jpg" /&gt;&lt;/a&gt;&lt;p&gt;Don&amp;#8217;t miss the forest for all the flippin&amp;#8217; trees. 
The curve is showing you the density of all possible outcomes. By density, I mean how tall the curve gets at a certain point. For example, in counting the total number of heads out of 1000 flips, I expected that 500 total heads would be the most popular outcome and indeed it was. I saw 25,224,637 out of a billion sets that had exactly 500 heads. This works out to about 2.52% of all outcomes. In contrast, if we look at the bucket for 450 total heads, I only saw this happen 168,941 times, or roughly 0.017% of the time. This matches what you can see in the chart: the curve is denser, that is, &lt;em&gt;taller&lt;/em&gt;, at the mean of 500 than farther away at 450.&lt;/p&gt;&lt;p&gt;This confirms the key point: &lt;strong&gt;all things are possible, but outcomes are not all equally probable&lt;/strong&gt;. There are &lt;a title="Here's a This American Life episode dedicated to longshots" href="http://www.thisamericanlife.org/Radio_Episode.aspx?episode=398"&gt;longshots&lt;/a&gt;. Professional athletes &lt;a title="It's interesting to read Gladwell's description of the difference between these two." href="http://www.gladwell.com/2000/2000_08_21_a_choking.htm"&gt;panic or &amp;#8216;choke&amp;#8217;&lt;/a&gt;. The &lt;a title="Ok, so Kasparov might have had a simple mistake in the last game, but given enough time with Moore's law, it was going to happen eventually, it just so happened that it was him." href="http://en.wikipedia.org/wiki/Deep_Blue_%E2%80%93_Kasparov,_1997,_Game_6"&gt;world&amp;#8217;s best chess players have bad days&lt;/a&gt;. Additionally, tales about underdogs &lt;a title="I think the best part about Mine That Bird winning the Kentucky Derby in 2009 is that it took the TV announcer about 10 seconds to get the horse's name once it took the lead at the end." href="http://www.youtube.com/watch?v=Hv8x9x5A49s"&gt;make us smile&lt;/a&gt;&amp;#8212; the longer the odds the better. 
Unexpected outcomes happen, but there&amp;#8217;s still a lot of predictability out there.&lt;/p&gt;&lt;p&gt;It&amp;#8217;s not just coin flips. The bell curve shows up in lots of places, from casino games, to the thickness of tree bark, to measurements of a person&amp;#8217;s IQ. Lots of people have looked at the world and have come up with Gaussian models. It&amp;#8217;s easy to think of the world as one big, bell-shaped playground.&lt;/p&gt;&lt;p&gt;But the real world isn&amp;#8217;t always Gaussian. History books are full of &amp;#8220;&lt;a title="The story goes that people used to use the phrase 'black swan' to have the same meaning as 'when pigs fly' until black swans were actually discovered to exist." href="http://www.amazon.com/gp/product/1400063515?ie=UTF8&amp;amp;tag=moserware-20&amp;amp;linkCode=as2&amp;amp;camp=1789&amp;amp;creative=390957&amp;amp;creativeASIN=1400063515"&gt;Black Swan&lt;/a&gt;&amp;#8221; events. Stock market crashes and the invention of the computer are statistical outliers that Gaussian models tend not to predict well, but these events shock the world and forever change it. This type of reality isn&amp;#8217;t covered by the bell curve, which Black Swan author &lt;a href="http://www.fooledbyrandomness.com/"&gt;Nassim Taleb&lt;/a&gt; calls the &amp;#8220;&lt;a href="http://books.google.com/books?id=YdOYmYA2TJYC&amp;amp;lpg=PA229&amp;amp;dq=%22the%20bell%20curve%20that%20great%20intellectual%20fraud%22&amp;amp;pg=PA229#v=onepage&amp;amp;q=%22the%20bell%20curve%20that%20great%20intellectual%20fraud%22&amp;amp;f=false"&gt;Great Intellectual Fraud&lt;/a&gt;.&amp;#8221; Under a Gaussian model, these events would have such low probability that no one would predict them actually happening. 
There&amp;#8217;s a different view of randomness that is a fascinating playground of &lt;a href="http://en.wikipedia.org/wiki/Beno%C3%AEt_Mandelbrot"&gt;Beno&amp;#238;t Mandelbrot&lt;/a&gt; &lt;a href="http://www.amazon.com/gp/product/0465043577/ref=pd_lpo_k2_dp_sr_1?pf_rd_p=486539851&amp;amp;pf_rd_s=lpo-top-stripe-1&amp;amp;pf_rd_t=201&amp;amp;pf_rd_i=0465043550&amp;amp;pf_rd_m=ATVPDKIKX0DER&amp;amp;pf_rd_r=1J3P8AMPM2MT0QD3S5K3#noop"&gt;and his fractals&lt;/a&gt; that better explain some of these events, but we will ignore all of this to keep things simple. We&amp;#8217;ll acknowledge that the Gaussian view of the world isn&amp;#8217;t &lt;em&gt;always&lt;/em&gt; right, no more than a map of the world is the actual terrain.&lt;/p&gt;&lt;p&gt;The Gaussian worldview assumes everything will typically be some average value and then treats everything else as increasingly less likely &amp;#8220;errors&amp;#8221; as you exponentially drift away from the center (Gauss used the curve to measure &lt;em&gt;errors&lt;/em&gt; in astronomical data after all). However, it&amp;#8217;s not fair to treat real observations from the world as &amp;#8220;errors&amp;#8221; any more than it is to say that a person is an &amp;#8220;error&amp;#8221; from the &amp;#8220;average human&amp;#8221; that is half male and half female. Some of these same problems can come up treating a person as having skill that is Gaussian. Disclaimers aside, we&amp;#8217;ll go along with George Box&amp;#8217;s &lt;a title="See the bottom of page 61 here, although he said it much earlier, at least in 1987. I first heard of this quote in a talk by Peter Norvig on the usefulness of even poor models given lots of data." 
href="http://books.google.com/books?id=63v--IZrNtsC&amp;amp;lpg=PA61&amp;amp;dq=%22all%20models%20are%20wrong%22%20george%20box&amp;amp;pg=PA61#v=onepage&amp;amp;q=&amp;amp;f=false"&gt;view&lt;/a&gt; that &amp;#8220;all models are wrong, but some models are useful.&amp;#8221;&lt;/p&gt;&lt;h4&gt;Gaussian Basics&lt;/h4&gt;&lt;p&gt;Gaussian curves are completely described by two values:&lt;/p&gt;&lt;ol&gt;&lt;li&gt;The mean (average) value, which is often represented by the Greek letter &amp;#956; (mu).&lt;/li&gt;&lt;li&gt;The standard deviation, represented by the Greek letter &amp;#963; (sigma). This indicates how far apart the data is spread out.&lt;/li&gt;&lt;/ol&gt;&lt;p&gt;In counting the total number of heads in 1000 flips, the mean was 500 and the standard deviation was &lt;a title="I go into this in more detail in the accompanying math paper." href="http://www.wolframalpha.com/input/?i=sqrt%281000*.5*%281-.5%29%29"&gt;about 16&lt;/a&gt;. In general, 68% of the outcomes will be within &amp;#177; 1 standard deviation (e.g. 484-516 in the experiment), 95% within 2 standard deviations (e.g. 468-532) and 99.7% within 3 standard deviations (452-548):&lt;/p&gt;&lt;a title="I got the idea for this diagram from the Wikipedia article on the normal distribution. However, the color and look didn't match the rest of the post, so I recreated it in Excel." href="http://4.bp.blogspot.com/_Zfbv3mHcYrc/S5LAgkPdWsI/AAAAAAAAKgI/o3kJX5ccxWU/s1600-h/NormalDistributionWithPercentages.png"&gt;&lt;img style="border:0;" id="BLOGGER_PHOTO_ID_5445626565161212610" name="BLOGGER_PHOTO_ID_5445626565161212610" src="http://4.bp.blogspot.com/_Zfbv3mHcYrc/S5LAgkPdWsI/AAAAAAAAKgI/o3kJX5ccxWU/s576/NormalDistributionWithPercentages.png" /&gt;&lt;/a&gt;&lt;p&gt;An important takeaway is that the bell curve allows for &lt;em&gt;all&lt;/em&gt; possibilities, but each possibility is most definitely not equally likely. 
The bell curve gives us a model to calculate how likely something should be given an average value and a spread. Notice how outcomes sharply become less probable as we drift further away from the mean value.&lt;/p&gt;&lt;p&gt;While we&amp;#8217;re looking at the Gaussian curve, it&amp;#8217;s important to look at -3&amp;#963; away from the mean on the left side. As you can see, &lt;em&gt;most&lt;/em&gt; of the area under the curve is to the right of this point. I mention this because &lt;strong&gt;the TrueSkill algorithm uses the -3&amp;#963; mark as a (very) conservative estimate for your skill&lt;/strong&gt;. You&amp;#8217;re probably better than this conservative estimate, but you&amp;#8217;re most likely not worse than this value. Therefore, it&amp;#8217;s a stable number for comparing yourself to others and is useful for sorting a leaderboard.&lt;/p&gt;&lt;h4&gt;3D Bell Curves: Multivariate Gaussians&lt;/h4&gt;&lt;p&gt;A non-intuitive observation is that Gaussian distributions can occur in more than the two dimensions that we&amp;#8217;ve seen so far. You can sort of think of a Gaussian in three dimensions as a mountain. Here&amp;#8217;s an example:&lt;/p&gt;&lt;a href="http://3.bp.blogspot.com/_Zfbv3mHcYrc/S4sa2VEcR-I/AAAAAAAAKe8/tLV8OSenBS8/s1600-h/Gaussian_3D_Circular.png"&gt;&lt;img style="border:0;" id="BLOGGER_PHOTO_ID_5443474095278409698" title="In case you're wondering, I used gnuplot to make this. See the accompanying math paper for more details." name="BLOGGER_PHOTO_ID_5443474095278409698" src="http://3.bp.blogspot.com/_Zfbv3mHcYrc/S4sa2VEcR-I/AAAAAAAAKe8/tLV8OSenBS8/s720/Gaussian_3D_Circular.png" /&gt;&lt;/a&gt;&lt;p&gt;In this plot, taller regions represent higher probabilities. As you can see, not all things are equally probable. 
The most probable value is the mean value that is right in the middle and then things sharply decline away from it.&lt;/p&gt;&lt;p&gt;In maps of &lt;em&gt;real&lt;/em&gt; mountains, you often see a 2D contour plot where each line represents a different elevation (e.g. every 100 feet):&lt;/p&gt;&lt;a title="I took this snapshot from the 7.5-Minute Series Topographic Map of Pikes Peak Quadrangle from the U.S. Geological Survey (USGS). My wife and I went to Pikes Peak the day we landed in Colorado from Indianapolis. One thing is certain: I felt those elevation lines :). For the best experience, acclimate yourself for a few days and then go." href="http://4.bp.blogspot.com/_Zfbv3mHcYrc/S5XJiVAPfoI/AAAAAAAAKgg/hPMzakmsZEo/s1600-h/PikesPeakTopoMap.png"&gt;&lt;img style="border:0;" id="BLOGGER_PHOTO_ID_5446480915965378178" name="BLOGGER_PHOTO_ID_5446480915965378178" src="http://4.bp.blogspot.com/_Zfbv3mHcYrc/S5XJiVAPfoI/AAAAAAAAKgg/hPMzakmsZEo/s640/PikesPeakTopoMap.png" /&gt;&lt;/a&gt;&lt;p&gt;The closer the lines on the map, the sharper the inclines. You can do something similar for 2D representations of 3D Gaussians. In textbooks, you often just see a 2D representation that looks like this:&lt;/p&gt;&lt;a href="http://2.bp.blogspot.com/_Zfbv3mHcYrc/S4segBlQOSI/AAAAAAAAKfE/OQNx_Zcg9FA/s1600-h/Gaussian_2D_Contour.png"&gt;&lt;img style="border:0;" id="BLOGGER_PHOTO_ID_5443478110136711458" title="It's unfortunate that most books never show the 3D perspective; it's much easier to see where it comes from." name="BLOGGER_PHOTO_ID_5443478110136711458" src="http://2.bp.blogspot.com/_Zfbv3mHcYrc/S4segBlQOSI/AAAAAAAAKfE/OQNx_Zcg9FA/s640/Gaussian_2D_Contour.png" /&gt;&lt;/a&gt;&lt;p&gt;This is called an &amp;#8220;isoprobability contour&amp;#8221; plot. It&amp;#8217;s just a fancy way of saying &amp;#8220;things that have the same probability will be the same color.&amp;#8221; Note that it&amp;#8217;s still in three dimensions. 
In this case, the third dimension is color intensity instead of the height you saw on a surface plot earlier. I like to think of contour plots as treasure maps for playing the &amp;#8220;you&amp;#8217;re getting warmer...&amp;#8221; game. In this case, black means &amp;#8220;you&amp;#8217;re cold,&amp;#8221; red means &amp;#8220;you&amp;#8217;re getting warmer...,&amp;#8221; and yellow means &amp;#8220;you&amp;#8217;re on fire!&amp;#8221; which corresponds to the highest probability.&lt;/p&gt;&lt;p&gt;See? Now you understand Gaussians and know that &amp;#8220;&lt;a href="http://en.wikipedia.org/wiki/Multivariate_normal_distribution"&gt;multivariate Gaussians&lt;/a&gt;&amp;#8221; aren&amp;#8217;t as scary as they sound.&lt;/p&gt;&lt;h4&gt;Let&amp;#8217;s Talk About Chess&lt;/h4&gt;&lt;a title="'Chess Set' by Alan Light on Wikipedia, retouched by Andre Riemann. Licensed under the Creative Commons Attribution ShareAlike 3.0 License." href="http://en.wikipedia.org/wiki/File:ChessSet.jpg"&gt;&lt;img style="border:0; margin: 0px 0px 5px 10px; display: inline" id="BLOGGER_PHOTO_ID_5446484220132842210" name="BLOGGER_PHOTO_ID_5446484220132842210" align="right" src="http://3.bp.blogspot.com/_Zfbv3mHcYrc/S5XMip_J2uI/AAAAAAAAKgo/n0AB8I5DVLg/s160/ChessSet.jpg" /&gt;&lt;/a&gt;&lt;p&gt;There&amp;#8217;s still more to learn, but we&amp;#8217;ll pick up what we need along the way. We already have enough tools to do something useful. To warm up, let&amp;#8217;s talk about chess because ratings are well-defined there.&lt;/p&gt;&lt;p&gt;In chess, a bright beginner is expected to have a rating around 1000. Keep in mind that ratings have no units; it&amp;#8217;s just a number that is only meaningful when compared to someone else&amp;#8217;s number. By &lt;a href="http://www.chessbase.com/newsdetail.asp?newsid=4326" title="This 200 point class tradition was established by the Harkness system developed in the early 1950's. 
It was a popular precursor to the Elo system that we'll cover shortly."&gt;tradition&lt;/a&gt;, a difference of 200 indicates the higher-rated player is expected to win 75% of the time. Again, nothing is special about the number 200; it was just chosen to be the difference needed to get a 75% win ratio and effectively defines a &amp;#8220;class&amp;#8221; of player.&lt;/p&gt;&lt;p&gt;I&amp;#8217;ve slowly been practicing and have a rating around 1200. This means that if I play a bright beginner with a rating of 1000, I&amp;#8217;m expected to win three out of four games.&lt;/p&gt;&lt;p&gt;We can start to visualize a match between me and a bright beginner by drawing two bell curves that have a mean of 1000 and 1200 respectively with both having a standard deviation of 200:&lt;/p&gt;&lt;a href="http://4.bp.blogspot.com/_Zfbv3mHcYrc/S2Tx3mUnG9I/AAAAAAAAKZo/N3CZWIlhjoI/s1600-h/bell_curves_of_bright_beginner_vs_jeff_before.png"&gt;&lt;img style="border:0;" id="BLOGGER_PHOTO_ID_5432732987997756370" name="BLOGGER_PHOTO_ID_5432732987997756370" src="http://4.bp.blogspot.com/_Zfbv3mHcYrc/S2Tx3mUnG9I/AAAAAAAAKZo/N3CZWIlhjoI/s576/bell_curves_of_bright_beginner_vs_jeff_before.png" /&gt;&lt;/a&gt;&lt;p&gt;The above graph shows what the ratings represent: they&amp;#8217;re an indicator of how we&amp;#8217;re &lt;em&gt;expected&lt;/em&gt; to perform if we play a game. The most likely performance is exactly what the rating is (the mean value). One non-obvious point is that you can &lt;a title="This subtraction idea is also covered more in the accompanying math paper." href="http://mathworld.wolfram.com/NormalDifferenceDistribution.html"&gt;subtract two bell curves and get another bell curve&lt;/a&gt;. The new center is the difference of the means and the resulting curve is a bit wider than the previous curves. 
By taking my skill curve (red) and subtracting the beginner&amp;#8217;s curve (blue), you&amp;#8217;ll get this resulting curve (purple):&lt;/p&gt;&lt;a href="http://4.bp.blogspot.com/_Zfbv3mHcYrc/S2TvsakEvuI/AAAAAAAAKZg/1caIpEXPH0Q/s1600-h/bell_curves_difference.png"&gt;&lt;img style="border:0;" id="BLOGGER_PHOTO_ID_5432730596839571170" name="BLOGGER_PHOTO_ID_5432730596839571170" src="http://4.bp.blogspot.com/_Zfbv3mHcYrc/S2TvsakEvuI/AAAAAAAAKZg/1caIpEXPH0Q/s576/bell_curves_difference.png" /&gt;&lt;/a&gt;&lt;p&gt;Note that it&amp;#8217;s centered at 1200 - 1000 = 200. Besides being interesting to look at on its own, it gives some useful information. This curve represents all possible game outcomes between me and the beginner. The middle shows that I&amp;#8217;m expected to be 200 points better. The far left side shows that there is a tiny chance that the beginner has a game where he plays as if he&amp;#8217;s 700 points better than I am. The far right shows that there is a tiny chance that I&amp;#8217;ll play as if I&amp;#8217;m 1100 points better. The curve actually goes on forever in both directions, but the expected probability for those outcomes is so small that it&amp;#8217;s effectively zero.&lt;/p&gt;&lt;p&gt;As a player, you really only care about one very specific point on this curve: zero. Since I have a higher rating, I&amp;#8217;m interested in all possible outcomes where the difference is positive. These are the outcomes where I&amp;#8217;m expected to outperform the beginner. On the other hand, the beginner is keeping his eye on everything to the left of zero. 
These are the outcomes where the performance difference is negative, implying that he outperforms me.&lt;/p&gt;&lt;a href="http://1.bp.blogspot.com/_Zfbv3mHcYrc/S2XY9WNERII/AAAAAAAAKZw/nNzg4ZvrpQM/s1600-h/performance_difference_shaded_to_zero.png"&gt;&lt;img style="border:0;" id="BLOGGER_PHOTO_ID_5432987073936376962" name="BLOGGER_PHOTO_ID_5432987073936376962" src="http://1.bp.blogspot.com/_Zfbv3mHcYrc/S2XY9WNERII/AAAAAAAAKZw/nNzg4ZvrpQM/s576/performance_difference_shaded_to_zero.png" /&gt;&lt;/a&gt;&lt;p&gt;We can plug a few numbers into &lt;a title="For example: Wolfram Alpha or Excel" href="http://www.wolframalpha.com/input/?i=CDF%5BNormalDistribution%5B200%2C+200+*+Sqrt%5B2%5D%5D%2C+0%5D"&gt;a calculator&lt;/a&gt; and see that there is about a 24% probability that the performance difference will be negative, implying the beginner wins, and a 76% chance that the difference will be positive, meaning that I win. This is roughly the 75% that we were expecting for a 200 point difference.&lt;/p&gt;&lt;p&gt;This has been a bit too concrete for my particular match with a beginner. We can generalize it by creating another curve where the horizontal axis represents the difference in player ratings and the vertical axis represents the total probability of winning given that rating difference:&lt;/p&gt;&lt;a href="http://2.bp.blogspot.com/_Zfbv3mHcYrc/S2X1k7iXRTI/AAAAAAAAKaA/UEwV2FQA5Hk/s1600-h/cdf_chess_given_rating_difference.png"&gt;&lt;img style="border:0;" id="BLOGGER_PHOTO_ID_5433018540298290482" name="BLOGGER_PHOTO_ID_5433018540298290482" src="http://2.bp.blogspot.com/_Zfbv3mHcYrc/S2X1k7iXRTI/AAAAAAAAKaA/UEwV2FQA5Hk/s640/cdf_chess_given_rating_difference.png" /&gt;&lt;/a&gt;&lt;p&gt;As expected, having two players with equal ratings, and thus a rating difference of 0, implies the odds of winning are 50%. Likewise, if you look at the -200 mark, you see the curve is at the 24% that we calculated earlier. Similarly, +200 is at the 76% mark. 
This also shows that outcomes on the far left side are quite unlikely. For example, the odds of me winning a game against &lt;a href="http://en.wikipedia.org/wiki/Magnus_Carlsen" title="Since Kasparov stopped playing professionally, Magnus is the top guy. Not surprisingly, Kasparov is now Magnus's teacher."&gt;Magnus Carlsen&lt;/a&gt;, who is at the top of the &lt;a title="The 19 year old Magnus was at the top of the FIDE leaderboard at the time of this writing (March 2010)" href="http://ratings.fide.com/top.phtml?list=men"&gt;chess leaderboard&lt;/a&gt; with a rating of 2813, would be at the -1613 mark (1200 - 2813) on this chart and have a probability near one in a &lt;em&gt;billion&lt;/em&gt;. I won&amp;#8217;t hold my breath. (Actually, most chess groups use a slightly different curve, but the ideas are the same. See the &lt;a href="http://dl.dropbox.com/u/1083108/Moserware/Skill/The%20Math%20Behind%20TrueSkill.pdf"&gt;accompanying math paper&lt;/a&gt; for details.)&lt;/p&gt;&lt;p&gt;All of these curves were probabilities of what &lt;em&gt;might&lt;/em&gt; happen, not what &lt;em&gt;actually&lt;/em&gt; happened. In actuality, let&amp;#8217;s say I lost the game by some silly blunder (oops!). The question that the beginner wants to know is how much his rating will go up. It also makes sense that my rating will go down as a punishment for the loss. The harder question is just &lt;em&gt;how much&lt;/em&gt; should the ratings change?&lt;/p&gt;&lt;p&gt;By winning, the beginner demonstrated that he was probably better than the 25% winning probability we thought he would have. One way of updating ratings is to imagine that each player bets a certain amount of his rating on each game. The amount of the bet is determined by the probability of the outcome. In addition, we decide how dramatic the ratings change should be for an individual game. 
If you believe the most recent game should count 100%, then you&amp;#8217;d expect my rating to go down a lot and his to go up a lot. The decision of how much the most recent game should count leads to what chess guys call the multiplicative &amp;#8220;K-factor.&amp;#8221;&lt;/p&gt;&lt;p&gt;The K-factor is what we multiply a probability by to get the total amount of a rating change. It reflects the maximum possible change in a person&amp;#8217;s rating. A reasonable choice of weight is that the most recent game counts about 7%, which leads to a K-factor of 24. New players tend to have more fluctuations than well-established players, so new players might get a K-factor of 32 while grandmasters have a K-factor around 10. Here&amp;#8217;s how the K-factor changes with respect to how much the latest game should count:&lt;/p&gt;&lt;a href="http://4.bp.blogspot.com/_Zfbv3mHcYrc/S5ZZF5xos8I/AAAAAAAAKg0/K4Whs0yA79Y/s1600-h/KFactorAlphaImpact.png"&gt;&lt;img style="border:0;" id="BLOGGER_PHOTO_ID_5446638757294420930" name="BLOGGER_PHOTO_ID_5446638757294420930" src="http://4.bp.blogspot.com/_Zfbv3mHcYrc/S5ZZF5xos8I/AAAAAAAAKg0/K4Whs0yA79Y/s576/KFactorAlphaImpact.png" /&gt;&lt;/a&gt;&lt;p&gt;Using a K-factor of 24 means that my rating will now be lowered to 1182 and the beginner&amp;#8217;s will rise to 1018. Our curves are now closer together:&lt;/p&gt;&lt;a href="http://1.bp.blogspot.com/_Zfbv3mHcYrc/S5ZfF6pBUwI/AAAAAAAAKg8/mW6ndagfFDI/s1600-h/BeginnerVsJeffAfterUpdate.png"&gt;&lt;img style="border:0;" id="BLOGGER_PHOTO_ID_5446645354596487938" name="BLOGGER_PHOTO_ID_5446645354596487938" src="http://1.bp.blogspot.com/_Zfbv3mHcYrc/S5ZfF6pBUwI/AAAAAAAAKg8/mW6ndagfFDI/s576/BeginnerVsJeffAfterUpdate.png" /&gt;&lt;/a&gt;&lt;p&gt;Note that our standard deviations never change. 
Here are the probabilities if we were to play again:&lt;/p&gt;&lt;p&gt;&lt;a href="http://1.bp.blogspot.com/_Zfbv3mHcYrc/S5aDq4_xYXI/AAAAAAAAKhE/RSM0V6uki_Y/s1600-h/performance_difference_shaded_to_zero_after.png"&gt;&lt;img style="border:0;" id="BLOGGER_PHOTO_ID_5446685572228800882" name="BLOGGER_PHOTO_ID_5446685572228800882" src="http://1.bp.blogspot.com/_Zfbv3mHcYrc/S5aDq4_xYXI/AAAAAAAAKhE/RSM0V6uki_Y/s576/performance_difference_shaded_to_zero_after.png" /&gt;&lt;/a&gt;&lt;/p&gt;&lt;p&gt;This method is known as the &lt;a href="http://en.wikipedia.org/wiki/Elo_rating_system"&gt;Elo rating system&lt;/a&gt;, named after &lt;a href="http://en.wikipedia.org/wiki/Arpad_Elo"&gt;Arpad Elo&lt;/a&gt;, the chess enthusiast who created it. It&amp;#8217;s relatively simple to implement and most games that calculate skill end here.&lt;/p&gt;&lt;h4&gt;I Thought You Said You&amp;#8217;d Talk About TrueSkill?&lt;/h4&gt;&lt;p&gt;Everything so far has just been prerequisites to the main event; the TrueSkill paper assumes you&amp;#8217;re already familiar with it. It was all sort of new to me, so it took a while to get comfortable with the Elo ideas. Although the Elo model will get you far, there are a few notable things it doesn&amp;#8217;t handle well:&lt;/p&gt;&lt;ol&gt;&lt;li&gt;&lt;strong&gt;Newbies&lt;/strong&gt; - In the Elo system, you&amp;#8217;re typically assigned a &amp;#8220;provisional&amp;#8221; rating for the first 20 games. These games tend to have a higher K-factor associated with them in order to let the algorithm determine your skill faster before it&amp;#8217;s slowed down by a non-provisional (and smaller) K-factor. We would like an algorithm that converges quickly onto a player&amp;#8217;s true skill (get it?) so that their time isn&amp;#8217;t wasted on unbalanced matches. This means the algorithm should start giving reasonable approximations of skill within 5-10 games.&lt;/li&gt;&lt;li&gt;&lt;strong&gt;Teams&lt;/strong&gt; - Elo was explicitly designed for two players. 
Efforts to adapt it to work for multiple people on multiple teams have primarily been unsophisticated hacks. One such approach is to treat teams as individual players that duel against the other players on the opposing teams and then apply the average of the duels. This is the &amp;#8220;duelling heuristic&amp;#8221; mentioned in the TrueSkill paper. I implemented it in the &lt;a href="http://github.com/moserware/Skills"&gt;accompanying project&lt;/a&gt;. It&amp;#8217;s ok, but seems a bit too hackish and doesn&amp;#8217;t converge well.&lt;/li&gt;&lt;li&gt;&lt;strong&gt;Draws&lt;/strong&gt; - Elo treats draws as a half win and half loss. This doesn&amp;#8217;t seem fair because draws can tell you a lot. A draw implies you were evenly paired, whereas a win indicates you&amp;#8217;re better, but not how much better. Likewise, a loss indicates you did worse, but you don&amp;#8217;t really know how much worse. So it seems that a draw is important to explicitly model.&lt;/li&gt;&lt;/ol&gt;&lt;p&gt;The TrueSkill algorithm generalizes Elo by keeping track of two variables: your average (mean) skill &lt;em&gt;and&lt;/em&gt; the system&amp;#8217;s uncertainty about that estimate (your standard deviation). It does this instead of relying on something like a fixed K-factor. Essentially, this gives the algorithm a dynamic K-factor. This addresses the newbie problem because it removes the need to have &amp;#8220;provisional&amp;#8221; games. In addition, it addresses the other problems in a nice statistical manner. Tracking these two values is so fundamental to the algorithm that Microsoft researchers informally referred to it as the &amp;#956;&amp;#963; (mu-sigma) system until the marketing guys gave it the name TrueSkill.&lt;/p&gt;&lt;p&gt;We&amp;#8217;ll go into the details shortly, but it&amp;#8217;s helpful to get a quick visual overview of what TrueSkill does. 
Let&amp;#8217;s say we have Eric, an experienced player who has played a lot and established his rating over time. In addition, we have a newbie: Natalia.&lt;/p&gt;&lt;p&gt;Here&amp;#8217;s what their skill curves might look like before a game:&lt;/p&gt;&lt;a href="http://2.bp.blogspot.com/_Zfbv3mHcYrc/S5cJKZ6voeI/AAAAAAAAKhM/BW_oakG7pRU/s1600-h/TrueSkillCurvesBeforeExample.png"&gt;&lt;img style="border:0;" id="BLOGGER_PHOTO_ID_5446832348688523746" name="BLOGGER_PHOTO_ID_5446832348688523746" src="http://2.bp.blogspot.com/_Zfbv3mHcYrc/S5cJKZ6voeI/AAAAAAAAKhM/BW_oakG7pRU/s576/TrueSkillCurvesBeforeExample.png" /&gt;&lt;/a&gt;&lt;p&gt;And after Natalia wins:&lt;/p&gt;&lt;a href="http://4.bp.blogspot.com/_Zfbv3mHcYrc/S5cJlUXV0LI/AAAAAAAAKhU/F-1jRnH7mHk/s1600-h/TrueSkillCurvesAfterExample.png"&gt;&lt;img style="border:0;" id="BLOGGER_PHOTO_ID_5446832811054321842" name="BLOGGER_PHOTO_ID_5446832811054321842" src="http://4.bp.blogspot.com/_Zfbv3mHcYrc/S5cJlUXV0LI/AAAAAAAAKhU/F-1jRnH7mHk/s576/TrueSkillCurvesAfterExample.png" /&gt;&lt;/a&gt;&lt;p&gt;Notice how Natalia&amp;#8217;s skill curve becomes narrower and taller (i.e. makes a big update) while Eric&amp;#8217;s curve barely moves. This shows that the TrueSkill algorithm thinks that she&amp;#8217;s probably better than Eric, but doesn&amp;#8217;t know how much better. Although TrueSkill is a little more confident about Natalia&amp;#8217;s mean after the game (i.e. it&amp;#8217;s now taller in the middle), it&amp;#8217;s still very uncertain. Looking at her updated bell curve shows that her skill could be between 15 and 50.&lt;/p&gt;&lt;p&gt;The rest of this post will explain how calculations like this work and how to handle much more complicated scenarios. 
But to understand it well enough to implement it, we&amp;#8217;ll need to learn a couple of new things.&lt;/p&gt;&lt;h4&gt;Bayesian Probability&lt;/h4&gt;&lt;a href="http://www.amazon.com/gp/product/B00000IWDR?ie=UTF8&amp;amp;tag=moserware-20&amp;amp;linkCode=as2&amp;amp;camp=1789&amp;amp;creative=390957&amp;amp;creativeASIN=B00000IWDR"&gt;&lt;/a&gt;&lt;p&gt;Most basic statistics classes focus on frequencies of events occurring. For example, the probability of getting a red marble when randomly drawing from a jar that has 3 red marbles and 7 blue marbles is 30%. Another example is that the probability of rolling two dice and getting a total of 7 is &lt;a href="http://www.wolframalpha.com/input/?i=probability+getting+7+two+dice"&gt;about 17%&lt;/a&gt;. The key idea in both of these examples is that you can count each type of outcome and then compute the &lt;em&gt;frequency&lt;/em&gt; directly. Although helpful in calculating your odds at casino games, &amp;#8220;frequentist&amp;#8221; thinking is not that helpful with many practical applications, like finding your skill in a team.&lt;/p&gt;&lt;p&gt;A different approach is to think of probability as degree of belief in something. The basic idea is that you have some &lt;strong&gt;prior belief&lt;/strong&gt; and then you observe some &lt;strong&gt;evidence&lt;/strong&gt; that updates your belief leaving you with an updated &lt;strong&gt;posterior&lt;/strong&gt; belief. As you might expect, learning about new evidence will typically make you more certain about your belief.&lt;/p&gt;&lt;p&gt;Let&amp;#8217;s assume that you&amp;#8217;re trying to find a treasure on a map. The treasure could be anywhere on the map, but you have a hunch that it&amp;#8217;s probably around the center of the map and increasingly less likely as you move away from the center. 
We could track the probability of finding the treasure using the 3D multivariate Gaussian we saw earlier:&lt;/p&gt;&lt;a href="http://3.bp.blogspot.com/_Zfbv3mHcYrc/S4sa2VEcR-I/AAAAAAAAKe8/tLV8OSenBS8/s1600-h/Gaussian_3D_Circular.png"&gt;&lt;img style="border:0;" id="BLOGGER_PHOTO_ID_5443474095278409698" name="BLOGGER_PHOTO_ID_5443474095278409698" src="http://3.bp.blogspot.com/_Zfbv3mHcYrc/S4sa2VEcR-I/AAAAAAAAKe8/tLV8OSenBS8/s576/Gaussian_3D_Circular.png" /&gt;&lt;/a&gt;&lt;p&gt;Now, let&amp;#8217;s say that after studying a book about the treasure, you&amp;#8217;ve learned that there&amp;#8217;s a strong likelihood that the treasure is somewhere along the diagonal line on the map. Perhaps this was based on some secret clue. Your clue information doesn&amp;#8217;t necessarily mean the treasure will be &lt;em&gt;exactly&lt;/em&gt; on that line, but rather that the treasure will most likely be near it. The &lt;strong&gt;likelihood function&lt;/strong&gt; might look like this in 3D:&lt;/p&gt;&lt;a href="http://2.bp.blogspot.com/_Zfbv3mHcYrc/S4smc_4B1DI/AAAAAAAAKfM/cw28slVmK4E/s1600-h/Gaussian_3D_Likelihood.png"&gt;&lt;img style="border:0;" id="BLOGGER_PHOTO_ID_5443486854232003634" name="BLOGGER_PHOTO_ID_5443486854232003634" src="http://2.bp.blogspot.com/_Zfbv3mHcYrc/S4smc_4B1DI/AAAAAAAAKfM/cw28slVmK4E/s576/Gaussian_3D_Likelihood.png" /&gt;&lt;/a&gt;&lt;p&gt;We&amp;#8217;d like to use our &lt;em&gt;prior&lt;/em&gt; information and this new &lt;em&gt;likelihood&lt;/em&gt; information to come up with a better &lt;em&gt;posterior&lt;/em&gt; guess of where the treasure is. 
It turns out that we can just multiply the prior and likelihood to obtain a posterior distribution that looks like this:&lt;/p&gt;&lt;a href="http://4.bp.blogspot.com/_Zfbv3mHcYrc/S4snAilcqWI/AAAAAAAAKfU/KQgGACxlMkk/s1600-h/Gaussian_3D_Posterior.png"&gt;&lt;img style="border:0;" id="BLOGGER_PHOTO_ID_5443487464844732770" name="BLOGGER_PHOTO_ID_5443487464844732770" src="http://4.bp.blogspot.com/_Zfbv3mHcYrc/S4snAilcqWI/AAAAAAAAKfU/KQgGACxlMkk/s576/Gaussian_3D_Posterior.png" /&gt;&lt;/a&gt;&lt;p&gt;This is giving us a smaller and more concentrated area to look at.&lt;/p&gt;&lt;p&gt;If you look at most textbooks, you typically just see this information using 2D isoprobability contour plots that we learned about earlier. Here&amp;#8217;s the same information in 2D:&lt;/p&gt;&lt;p&gt;Prior:&lt;/p&gt;&lt;a href="http://1.bp.blogspot.com/_Zfbv3mHcYrc/S4snWvA5YrI/AAAAAAAAKfc/J31SDnp-09s/s1600-h/Gaussian_2D_Prior.png"&gt;&lt;img style="border:0;" id="BLOGGER_PHOTO_ID_5443487846138208946" name="BLOGGER_PHOTO_ID_5443487846138208946" src="http://1.bp.blogspot.com/_Zfbv3mHcYrc/S4snWvA5YrI/AAAAAAAAKfc/J31SDnp-09s/s576/Gaussian_2D_Prior.png" /&gt;&lt;/a&gt;&lt;p&gt;Likelihood:&lt;/p&gt;&lt;a href="http://2.bp.blogspot.com/_Zfbv3mHcYrc/S4sneX0nRNI/AAAAAAAAKfk/_N5AAMmuIO4/s1600-h/Gaussian_2D_Likelihood.png"&gt;&lt;img style="border:0;" id="BLOGGER_PHOTO_ID_5443487977351627986" name="BLOGGER_PHOTO_ID_5443487977351627986" src="http://2.bp.blogspot.com/_Zfbv3mHcYrc/S4sneX0nRNI/AAAAAAAAKfk/_N5AAMmuIO4/s576/Gaussian_2D_Likelihood.png" /&gt;&lt;/a&gt;&lt;p&gt;Posterior:&lt;/p&gt;&lt;a href="http://1.bp.blogspot.com/_Zfbv3mHcYrc/S4snj-YISVI/AAAAAAAAKfs/eAZD78TsGKs/s1600-h/Gaussian_2D_Posterior.png"&gt;&lt;img style="border:0;" id="BLOGGER_PHOTO_ID_5443488073600485714" name="BLOGGER_PHOTO_ID_5443488073600485714" src="http://1.bp.blogspot.com/_Zfbv3mHcYrc/S4snj-YISVI/AAAAAAAAKfs/eAZD78TsGKs/s576/Gaussian_2D_Posterior.png" /&gt;&lt;/a&gt;&lt;p&gt;For fun, let&amp;#8217;s say we found 
additional information saying the treasure is along the other diagonal with the following likelihood:&lt;/p&gt;&lt;a href="http://1.bp.blogspot.com/_Zfbv3mHcYrc/S5hDR3bT4xI/AAAAAAAAKhc/BOoitWBwmPU/s1600-h/Gaussian_2D_Likelihood_Opposite_Direction.png"&gt;&lt;img style="border:0;" id="BLOGGER_PHOTO_ID_5447177723519951634" name="BLOGGER_PHOTO_ID_5447177723519951634" src="http://1.bp.blogspot.com/_Zfbv3mHcYrc/S5hDR3bT4xI/AAAAAAAAKhc/BOoitWBwmPU/s576/Gaussian_2D_Likelihood_Opposite_Direction.png" /&gt;&lt;/a&gt;&lt;p&gt;To incorporate this information, we&amp;#8217;re able to &lt;a title="The fancy term for the being able to do this is called the 'conjugate prior' since the prior and posterior are 'conjoined' like twins. That is, they're of the same class of function." href="http://en.wikipedia.org/wiki/Conjugate_prior"&gt;take our last posterior and make that the prior for the next iteration&lt;/a&gt; using the new likelihood information to get this updated posterior:&lt;/p&gt;&lt;a href="http://4.bp.blogspot.com/_Zfbv3mHcYrc/S5hDivDzF3I/AAAAAAAAKhs/8i9SWy-SNhs/s1600-h/Gaussian_2D_Posterior_Updated.png"&gt;&lt;img style="border:0;" id="BLOGGER_PHOTO_ID_5447178013331625842" name="BLOGGER_PHOTO_ID_5447178013331625842" src="http://4.bp.blogspot.com/_Zfbv3mHcYrc/S5hDivDzF3I/AAAAAAAAKhs/8i9SWy-SNhs/s576/Gaussian_2D_Posterior_Updated.png" /&gt;&lt;/a&gt;&lt;p&gt;This is a much more focused estimate than our original belief! We could iterate the procedure and potentially get an even smaller search area.&lt;/p&gt;&lt;p&gt;&lt;a title="Thomas Bayes (c. 1702 - 17 April 1761)" href="http://en.wikipedia.org/wiki/Thomas_Bayes"&gt;&lt;img style="border:0;" id="BLOGGER_PHOTO_ID_5435310499900212850" name="BLOGGER_PHOTO_ID_5435310499900212850" align="right" src="http://3.bp.blogspot.com/_Zfbv3mHcYrc/S24aGiaAxnI/AAAAAAAAKas/PY0bAN-PDpM/s220/Thomas_Bayes.gif" /&gt;&lt;/a&gt;And that&amp;#8217;s basically all there is to it. 
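&lt;/p&gt;&lt;p&gt;The prior-times-likelihood update described above can be sketched in one dimension. This is only an illustration and not code from the accompanying project: the function name multiply_gaussians and all of the numbers are assumptions. It relies on the standard result that the product of two Gaussian densities is proportional to another Gaussian whose precision (1 / variance) is the sum of the two precisions.&lt;/p&gt;&lt;pre&gt;
```python
def multiply_gaussians(mean_a, std_a, mean_b, std_b):
    """Return (mean, std) of the Gaussian proportional to N(a) * N(b)."""
    # Work with precision (1 / variance): precisions simply add.
    precision_a = 1.0 / std_a ** 2
    precision_b = 1.0 / std_b ** 2
    precision = precision_a + precision_b
    # The new mean is the precision-weighted average of the two means.
    mean = (mean_a * precision_a + mean_b * precision_b) / precision
    return mean, (1.0 / precision) ** 0.5


# Hypothetical numbers: a vague prior and a sharper clue (the likelihood).
prior = (0.0, 4.0)
likelihood = (3.0, 2.0)
posterior = multiply_gaussians(*prior, *likelihood)

# Conjugacy: the posterior becomes the prior for the next clue.
second_clue = (2.0, 2.0)
posterior_2 = multiply_gaussians(*posterior, *second_clue)
print(posterior, posterior_2)
```
&lt;/pre&gt;&lt;p&gt;Each multiplication shrinks the standard deviation, which is the one-dimensional version of the search area getting smaller.&lt;/p&gt;&lt;p&gt;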
In TrueSkill, the buried treasure that we look for is a person&amp;#8217;s skill. This approach to probability is called &amp;#8220;Bayesian&amp;#8221; because it was discovered by a Presbyterian minister in the 1700s named &lt;a title="More precisely, it was Bayes' friend Richard Price who found this unpublished paper after Bayes' death and saw that it was useful and then decided to publish it." href="http://en.wikipedia.org/wiki/Thomas_Bayes"&gt;Thomas Bayes&lt;/a&gt; who liked to dabble in math.&lt;/p&gt;&lt;p&gt;The central ideas of Bayesian statistics are the prior, the likelihood, and the posterior. There&amp;#8217;s detailed math that goes along with this and is in the &lt;a href="http://dl.dropbox.com/u/1083108/Moserware/Skill/The%20Math%20Behind%20TrueSkill.pdf"&gt;accompanying paper&lt;/a&gt;, but understanding these basic ideas is more important:&lt;/p&gt;&lt;blockquote&gt;&lt;p&gt;&amp;#8220;When you understand something, then you can find the math to express that understanding. The math doesn&amp;#8217;t provide the understanding.&amp;#8221;&amp;#8212; &lt;a href="http://www.reddit.com/r/programming/comments/bblt4/lamport_when_you_understand_something_then_you/"&gt;Lamport&lt;/a&gt;&lt;/p&gt;&lt;/blockquote&gt;&lt;p&gt;Bayesian methods have only become popular in the computer age because computers can quickly iterate through several tedious rounds of priors and posteriors. Bayesian methods have historically been popular inside of Microsoft Research (where TrueSkill was invented). Way &lt;a href="http://people.cs.ubc.ca/~murphyk/Bayes/la.times.html"&gt;back in 1996&lt;/a&gt;, Bill Gates considered Bayesian statistics to be Microsoft Research&amp;#8217;s secret sauce.&lt;/p&gt;&lt;p&gt;As we&amp;#8217;ll see later on, we can use the Bayesian approach to calculate a person&amp;#8217;s skill. In general, it&amp;#8217;s highly useful to update your belief based on previous evidence (e.g. your performance in previous games). 
This &lt;em&gt;usually&lt;/em&gt; works out well. However, sometimes &amp;#8220;&lt;a href="http://www.amazon.com/gp/product/1400063515?ie=UTF8&amp;amp;tag=moserware-20&amp;amp;linkCode=as2&amp;amp;camp=1789&amp;amp;creative=390957&amp;amp;creativeASIN=1400063515"&gt;Black Swans&lt;/a&gt;&amp;#8221; are present. For example, &lt;a title="In general, this is called the Problem of Inductive Knowledge and is discussed in the book." href="http://books.google.com/books?id=gWW4SkJjM08C&amp;amp;lpg=PR2&amp;amp;dq=black%20swan&amp;amp;pg=PA40#v=onepage&amp;amp;q=&amp;amp;f=false"&gt;a turkey&lt;/a&gt; using Bayesian inference would have a very specific posterior distribution of the kindness of a farmer who feeds it every day for 1000 days only to be surprised by a Thanksgiving event that was so many standard deviations away from the turkey&amp;#8217;s mean belief that he never would have seen it coming. Skill has similar potential for a &amp;#8220;Thanksgiving&amp;#8221; event where an average player beats the best player in the world. We&amp;#8217;ll acknowledge that small possibility, but ignore it to simplify things (and give the unlikely winner a great story for the rest of his life).&lt;/p&gt;&lt;p&gt;TrueSkill claims that it is Bayesian, so you can be sure that there is going to be a concept of a prior and a likelihood in it&amp;#8212; and there is. We&amp;#8217;re getting closer, but we still need to learn a few more details.&lt;/p&gt;&lt;h4&gt;The Marginalized, but Not Forgotten Distribution&lt;/h4&gt;&lt;p&gt;&lt;a title="'Running against the light' by clairity on Flickr. 
Used under the Creative Commons Attribution License" href="http://www.flickr.com/photos/clairity/145758101/"&gt;&lt;img style="border:0; margin: 0px 0px 0px 10px; width: 400px; display: inline; height: 300px" id="BLOGGER_PHOTO_ID_5435222701540710050" name="BLOGGER_PHOTO_ID_5435222701540710050" align="right" src="http://1.bp.blogspot.com/_Zfbv3mHcYrc/S23KQAB5FqI/AAAAAAAAKak/Ugc3ijWuvYI/s400/clarity_man_running_at_crosswalk.jpg" /&gt;&lt;/a&gt;Next we need to learn about &amp;#8220;&lt;a href="http://en.wikipedia.org/wiki/Marginal_distribution"&gt;marginal distributions&lt;/a&gt;&amp;#8221;, often just called &amp;#8220;marginals.&amp;#8221; Marginals are a way of distilling information to focus on what you care about. Imagine you have a table of sales for each month for the past year. Let&amp;#8217;s say that you only care about total sales for the year. You could take out your calculator and add up all the sales in each month to get the total aggregate sales for the year. Since you care about this number and it wasn&amp;#8217;t in the original report, you could add it in the &lt;em&gt;margin&lt;/em&gt; of the table. That&amp;#8217;s roughly where &amp;#8220;margin-al&amp;#8221; got its name.&lt;/p&gt;&lt;p&gt;Wikipedia has a great &lt;a title="This illustration came from the article on Marginal distribution that helped me finally get marginals" href="http://en.wikipedia.org/wiki/Marginal_distribution"&gt;illustration&lt;/a&gt; on the topic: consider a guy who ignores his mom&amp;#8217;s advice and &lt;em&gt;never&lt;/em&gt; looks both ways when crossing the street. Even worse, he&amp;#8217;s so engrossed in listening to his iPod that he doesn&amp;#8217;t look &lt;em&gt;any&lt;/em&gt; way; he just always crosses.&lt;/p&gt;&lt;p&gt;What&amp;#8217;s the probability of him getting hit by a car at a specific intersection? 
Let&amp;#8217;s simplify things by saying that it just depends on whether the light is red, yellow, or green.&lt;/p&gt;&lt;table&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;Light State&lt;/td&gt;&lt;td&gt;Red&lt;/td&gt;&lt;td&gt;Yellow&lt;/td&gt;&lt;td&gt;Green&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Probability of getting hit given light state&lt;/td&gt;&lt;td&gt;1%&lt;/td&gt;&lt;td&gt;9%&lt;/td&gt;&lt;td&gt;90%&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;p&gt;This is helpful, but it doesn&amp;#8217;t tell us what we want. We also need to know how long the light stays a given color:&lt;/p&gt;&lt;table&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;Light color&lt;/td&gt;&lt;td&gt;Red&lt;/td&gt;&lt;td&gt;Yellow&lt;/td&gt;&lt;td&gt;Green&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;% Time in Color&lt;/td&gt;&lt;td&gt;60%&lt;/td&gt;&lt;td&gt;10%&lt;/td&gt;&lt;td&gt;30%&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;p&gt;There&amp;#8217;s a bunch of probability data here that&amp;#8217;s a bit overwhelming. If we join the probabilities together, we&amp;#8217;ll have a &amp;#8220;joint distribution&amp;#8221; that&amp;#8217;s just a big complicated system that tells us &lt;em&gt;too much&lt;/em&gt; information.&lt;/p&gt;&lt;p&gt;We can start to distill this information down by multiplying the probability of getting hit given each light state by the fraction of time the light spends in that state:&lt;/p&gt;&lt;table&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt; &lt;/td&gt;&lt;td&gt;Red&lt;/td&gt;&lt;td&gt;Yellow&lt;/td&gt;&lt;td&gt;Green&lt;/td&gt;&lt;td&gt;Total&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Probability of Getting Hit&lt;/td&gt;&lt;td&gt;1%*60% = 0.6%&lt;/td&gt;&lt;td&gt;9%*10% = 0.9%&lt;/td&gt;&lt;td&gt;90%*30% = 27%&lt;/td&gt;&lt;td&gt;28.5%&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;p&gt;In the right &lt;em&gt;margin&lt;/em&gt; of the table we get the value that really matters to this guy. 
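&lt;/p&gt;&lt;p&gt;The arithmetic in the table above can be reproduced with a few lines of Python. This is a sketch for illustration only: the dictionary and variable names are assumptions, and the percentages are the ones from the two tables.&lt;/p&gt;&lt;pre&gt;
```python
# P(hit | light color), from the first table.
p_hit_given_light = {"red": 0.01, "yellow": 0.09, "green": 0.90}
# P(light color): fraction of time the light spends in each color.
p_light = {"red": 0.60, "yellow": 0.10, "green": 0.30}

# Joint probability P(hit and light color) for each color.
p_hit_and_light = {
    color: p_hit_given_light[color] * p_light[color] for color in p_light
}

# Marginal probability of getting hit: sum out the light color.
p_hit = sum(p_hit_and_light.values())
print(round(p_hit, 3))
```
&lt;/pre&gt;&lt;p&gt;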
There&amp;#8217;s a 28.5% &lt;em&gt;marginal probability&lt;/em&gt; of getting hit if the guy never looks for cars and just always crosses the street. We obtained it by &amp;#8220;summing out&amp;#8221; the individual components. That is, we simplified the problem by eliminating variables and we eliminated variables by just focusing on the total rather than the parts.&lt;/p&gt;&lt;p&gt;This idea of marginalization is very general. The central question in this article is &amp;#8220;computing your skill,&amp;#8221; but your skill is complicated. When using Bayesian statistics, we often can&amp;#8217;t observe something directly, so we have to come up with a probability distribution that&amp;#8217;s more complicated and then &amp;#8220;marginalize&amp;#8221; it to get the distribution that we really want. We&amp;#8217;ll need to marginalize your skill by doing a similar &amp;#8220;summing-out&amp;#8221; procedure as we did for the reckless guy above.&lt;/p&gt;&lt;p&gt;But before we do that, we need to learn another technique to make calculations simpler.&lt;/p&gt;&lt;h4&gt;What&amp;#8217;s a Factor Graph, and Why Do I Care?&lt;/h4&gt;&lt;p&gt;Remember your algebra class when you worked with expressions like this?&lt;/p&gt;&lt;a href="http://3.bp.blogspot.com/_Zfbv3mHcYrc/S43UlXqqNeI/AAAAAAAAKf4/NCcMQ--IOcU/s1600-h/equation_not_factored.png"&gt;&lt;img style="border:0;" id="BLOGGER_PHOTO_ID_5444241263033988578" name="BLOGGER_PHOTO_ID_5444241263033988578" src="http://3.bp.blogspot.com/_Zfbv3mHcYrc/S43UlXqqNeI/AAAAAAAAKf4/NCcMQ--IOcU/s640/equation_not_factored.png" /&gt;&lt;/a&gt;&lt;p&gt;Your teacher showed you that you could simplify this by &amp;#8220;factor-ing&amp;#8221; out w, like this:&lt;/p&gt;&lt;a href="http://3.bp.blogspot.com/_Zfbv3mHcYrc/S4rSsD2a2kI/AAAAAAAAKds/i6ehQ1bdg_c/s1600-h/expression_factored.png"&gt;&lt;img style="border:0;" id="BLOGGER_PHOTO_ID_5443394754020301378" name="BLOGGER_PHOTO_ID_5443394754020301378" 
src="http://3.bp.blogspot.com/_Zfbv3mHcYrc/S4rSsD2a2kI/AAAAAAAAKds/i6ehQ1bdg_c/s576/expression_factored.png" /&gt;&lt;/a&gt;&lt;p&gt;We often factor expressions to make them easier to understand and to simplify calculations. Let&amp;#8217;s replace the variables above with w=4, x=1, y=2, and z=3.&lt;/p&gt;&lt;p&gt;Let&amp;#8217;s say the numbers on our calculator are circles and the operators are squares. We could come up with an &amp;#8220;&lt;a href="http://msdn.microsoft.com/en-us/library/bb397951.aspx"&gt;expression tree&lt;/a&gt;&amp;#8221; to describe the calculation like this:&lt;/p&gt;&lt;a href="http://2.bp.blogspot.com/_Zfbv3mHcYrc/S4rMBS94shI/AAAAAAAAKdU/qdxmo3QhFK0/s1600-h/factor_graph_complicated_factorization.png"&gt;&lt;img style="border:0;" id="BLOGGER_PHOTO_ID_5443387422274007570" name="BLOGGER_PHOTO_ID_5443387422274007570" src="http://2.bp.blogspot.com/_Zfbv3mHcYrc/S4rMBS94shI/AAAAAAAAKdU/qdxmo3QhFK0/s576/factor_graph_complicated_factorization.png" /&gt;&lt;/a&gt;&lt;p&gt;You can tell how tedious this computation is by counting 11 &amp;#8220;buttons&amp;#8221; we&amp;#8217;d have to push. We could also factor it like this&lt;/p&gt;&lt;a href="http://1.bp.blogspot.com/_Zfbv3mHcYrc/S4rQpZ8baSI/AAAAAAAAKdc/kacyqjdOm28/s1600-h/factor_graph_complicated_simplified.png"&gt;&lt;img style="border:0;" id="BLOGGER_PHOTO_ID_5443392509388220706" name="BLOGGER_PHOTO_ID_5443392509388220706" src="http://1.bp.blogspot.com/_Zfbv3mHcYrc/S4rQpZ8baSI/AAAAAAAAKdc/kacyqjdOm28/s576/factor_graph_complicated_simplified.png" /&gt;&lt;/a&gt;&lt;p&gt;This &amp;#8220;factorization&amp;#8221; has a total of 7 buttons, a savings of 4 buttons. It might not seem like much here, but factorizing is a big idea.&lt;/p&gt;&lt;p&gt;We face a similar problem of how to factor things when we&amp;#8217;re looking to simplify a complicated probability distribution. We&amp;#8217;ll soon see how your skill is composed of several &amp;#8220;factors&amp;#8221; in a joint distribution. 
We can simplify computations based on how variables are related to these factors. We&amp;#8217;ll break up the joint distribution into a bunch of factors on a graph. &lt;strong&gt;This graph that links factors and variables is called a &amp;#8220;factor graph.&amp;#8221;&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;The key idea about a factor graph is that we represent the marginal conditional probabilities as variables and then represent each major function of those variables as a &amp;#8220;factor.&amp;#8221; We&amp;#8217;ll take advantage of how the graph &amp;#8220;factorizes&amp;#8221; and imagine that each factor is a node on a network that&amp;#8217;s optimized for efficiency. A key efficiency trick is that factor nodes send &amp;#8220;messages&amp;#8221; to other nodes. These messages help simplify further marginal computations. The &amp;#8220;message passing&amp;#8221; is very important and thus will be highlighted with arrows in the upcoming graphs; gray arrows represent messages going &amp;#8220;down&amp;#8221; the graph and black show messages coming &amp;#8220;up&amp;#8221; the graph.&lt;/p&gt;&lt;p&gt;The accompanying &lt;a href="http://github.com/moserware/Skills"&gt;code&lt;/a&gt; and &lt;a href="http://dl.dropbox.com/u/1083108/Moserware/Skill/The%20Math%20Behind%20TrueSkill.pdf"&gt;math paper&lt;/a&gt; go into details about exactly how this happens, but it&amp;#8217;s important to realize the high level idea first. That is, we want to look at all the factors that go into creating the likelihood function for updating a person&amp;#8217;s skill based on a game outcome. 
Representing this information in a factor graph helps us see how things are related.&lt;/p&gt;&lt;p&gt;Now that we have all the foundational concepts, we&amp;#8217;re ready for the main event: the TrueSkill factor graph!&lt;/p&gt;&lt;h4&gt;Enough Chess, Let&amp;#8217;s Rank Something Harder!&lt;/h4&gt;&lt;p&gt;The TrueSkill algorithm is Bayesian because it&amp;#8217;s composed of a prior multiplied by a likelihood. I&amp;#8217;ve highlighted these two components in the sample factor graph from the TrueSkill paper, which looks scary at first glance:&lt;/p&gt;&lt;a href="http://1.bp.blogspot.com/_Zfbv3mHcYrc/S5vdXDRz9UI/AAAAAAAAKiU/Y5oPcSl4eck/s1600-h/TrueSkillFullFactorgraph.png"&gt;&lt;img style="border:0;" id="BLOGGER_PHOTO_ID_5448191562321491266" name="BLOGGER_PHOTO_ID_5448191562321491266" src="http://1.bp.blogspot.com/_Zfbv3mHcYrc/S5vdXDRz9UI/AAAAAAAAKiU/Y5oPcSl4eck/s720/TrueSkillFullFactorgraph.png" /&gt;&lt;/a&gt;&lt;p&gt;This factor graph shows the outcome of a match that had 3 teams all playing against each other. The first team (on the left) only has one player, but this player was able to defeat both of the other teams. The second team (in the middle) had two players and this team tied the third team (on the right) that had just one player.&lt;/p&gt;&lt;p&gt;In TrueSkill, we just care about a player&amp;#8217;s marginal skill. However, as is often the case with Bayesian models, we have to explicitly model other things that impact the variable we care about. 
We&amp;#8217;ll briefly cover each factor (more details are in the &lt;a href="http://github.com/moserware/Skills"&gt;code&lt;/a&gt; and &lt;a href="http://dl.dropbox.com/u/1083108/Moserware/Skill/The%20Math%20Behind%20TrueSkill.pdf"&gt;math paper&lt;/a&gt;).&lt;/p&gt;&lt;h4&gt;Factor #1: What Do We Already Know About Your Skill?&lt;/h4&gt;&lt;a href="http://4.bp.blogspot.com/_Zfbv3mHcYrc/S4rh00sIxBI/AAAAAAAAKe0/dVIZsc_f6wk/s1600-h/Layer1_priors.png"&gt;&lt;img style="border:0;" id="BLOGGER_PHOTO_ID_5443411397243880466" name="BLOGGER_PHOTO_ID_5443411397243880466" src="http://4.bp.blogspot.com/_Zfbv3mHcYrc/S4rh00sIxBI/AAAAAAAAKe0/dVIZsc_f6wk/s576/Layer1_priors.png" /&gt;&lt;/a&gt;&lt;p&gt;The first factor starts the whole process. It&amp;#8217;s where we get a player&amp;#8217;s previous skill level from somewhere (e.g. a player database). At this point, we add some uncertainty to your skill&amp;#8217;s standard deviation to keep game dynamics interesting and to prevent the standard deviation from hitting zero, because the rest of the algorithm will make it smaller (the whole point is to learn about you and become more certain).&lt;/p&gt;&lt;p&gt;There is a factor and a variable for each player. Each factor is a function that remembers a player&amp;#8217;s previous skill. Each variable node holds the current value of a player&amp;#8217;s skill. I say &amp;#8220;current&amp;#8221; because this is the value that we&amp;#8217;ll want to know about after the whole algorithm is completed. Note that the message arrow on the factor only goes one way; we never go back to the prior factor. It just gets things going. 
However, we will come back to the variable.&lt;/p&gt;&lt;p&gt;But we&amp;#8217;re getting ahead of ourselves.&lt;/p&gt;&lt;h4&gt;Factor #2: How Are You Going To Perform?&lt;/h4&gt;&lt;a href="http://2.bp.blogspot.com/_Zfbv3mHcYrc/S4raUAX5XAI/AAAAAAAAKeM/VVg3AgoHFK8/s1600-h/Layer2_likelihood.png"&gt;&lt;img style="border:0;" id="BLOGGER_PHOTO_ID_5443403136863132674" name="BLOGGER_PHOTO_ID_5443403136863132674" src="http://2.bp.blogspot.com/_Zfbv3mHcYrc/S4raUAX5XAI/AAAAAAAAKeM/VVg3AgoHFK8/s576/Layer2_likelihood.png" /&gt;&lt;/a&gt;&lt;p&gt;Next, we add in beta (&amp;#946;). You can think of beta as the number of skill points to guarantee about an 80% chance of winning. The TrueSkill inventors &lt;a title="This occurred during a GameFest 2007 presentation. Although this talk gets cut short due to an audio problem, it's pretty good at giving an overview." href="http://www.microsoft.com/downloads/details.aspx?FamilyID=1acc9bf7-920d-477b-a7b1-4945b3cb04dd&amp;amp;DisplayLang=en"&gt;refer&lt;/a&gt; to beta as defining the length of a &amp;#8220;skill chain.&amp;#8221;&lt;/p&gt;&lt;p&gt;&lt;a title="The faceless people and chain in this picture came from the Open Clip Art project and are in the public domain. The idea for this image came from Ralf Herbrich's 2007 GameFest presentation that I linked to in the previous link." href="http://1.bp.blogspot.com/_Zfbv3mHcYrc/S5locWxUbTI/AAAAAAAAKiE/q0A0x8O_KSQ/s1600-h/BetaSkillChainIllustration.png"&gt;&lt;img style="border:0;" id="BLOGGER_PHOTO_ID_5447500060639391026" name="BLOGGER_PHOTO_ID_5447500060639391026" src="http://1.bp.blogspot.com/_Zfbv3mHcYrc/S5locWxUbTI/AAAAAAAAKiE/q0A0x8O_KSQ/s640/BetaSkillChainIllustration.png" /&gt;&lt;/a&gt;&lt;/p&gt;&lt;p&gt;The skill chain is composed of the worst player on the far left and the best player on the far right. Each subsequent person on the skill chain is &amp;#8220;beta&amp;#8221; points better and has an 80% win probability against the weaker player. 
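&lt;/p&gt;&lt;p&gt;To get a feel for the skill chain, here is a rough sketch of the win probability between two neighbors on it. It assumes, as in the TrueSkill paper, that each player performs according to a Gaussian centered on their skill with standard deviation beta, and it ignores any uncertainty about the skills themselves. The function name and the beta value are illustrative. Under those assumptions the result comes out at roughly 76%, in the neighborhood of the 80% figure quoted above.&lt;/p&gt;&lt;pre&gt;
```python
from statistics import NormalDist


def win_probability(skill_gap, beta):
    """P(stronger player outperforms the weaker one) for a given skill gap."""
    # Each performance has variance beta**2, so the difference of two
    # independent performances has variance 2 * beta**2.
    diff_std = (2.0 * beta ** 2) ** 0.5
    # The stronger player wins when the performance difference is positive.
    return NormalDist().cdf(skill_gap / diff_std)


beta = 25.0 / 6.0  # illustrative value, not taken from the article
p_neighbor = win_probability(beta, beta)
print(round(p_neighbor, 4))
```
&lt;/pre&gt;&lt;p&gt;Because the gap is measured in units of beta, the answer is the same for any beta; only the number of skill points that one link of the chain represents changes.&lt;/p&gt;&lt;p&gt;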
This means that a small beta value indicates a high-skill game (e.g. &lt;a href="http://en.wikipedia.org/wiki/Go_%2528game%2529"&gt;Go&lt;/a&gt;) since smaller differences in points lead to the 80%:20% ratio. Likewise, a game based on chance (e.g. &lt;a href="http://en.wikipedia.org/wiki/Uno_%2528card_game%2529"&gt;Uno&lt;/a&gt;) is a low-skill game that would have a higher beta and smaller skill chain.&lt;/p&gt;&lt;h4&gt;Factor #3: How is Your Team Going to Perform?&lt;/h4&gt;&lt;a href="http://1.bp.blogspot.com/_Zfbv3mHcYrc/S4rcFykRwGI/AAAAAAAAKeU/leCuvBNE-lI/s1600-h/Layer3_team_sum.png"&gt;&lt;img style="border:0;" id="BLOGGER_PHOTO_ID_5443405091662053474" name="BLOGGER_PHOTO_ID_5443405091662053474" src="http://1.bp.blogspot.com/_Zfbv3mHcYrc/S4rcFykRwGI/AAAAAAAAKeU/leCuvBNE-lI/s576/Layer3_team_sum.png" /&gt;&lt;/a&gt;&lt;p&gt;Now we&amp;#8217;re ready for one of the most controversial aspects of TrueSkill: computing the performance of a team as a whole. In TrueSkill, we assume the team&amp;#8217;s performance is the sum of each team member&amp;#8217;s performance. I say that it&amp;#8217;s &amp;#8220;controversial&amp;#8221; because some members of the team probably work harder than others. Additionally, sometimes special dynamics occur that make the sum greater than the parts. However, we&amp;#8217;ll fight the urge to make it much more complicated and heed &lt;a title="See page 452, second column, item a" href="http://www.forecastingprinciples.com/files/pdf/Makridakia-The%20M3%20Competition.pdf"&gt;Makridakis&amp;#8217;s advice&lt;/a&gt;:&lt;/p&gt;&lt;blockquote&gt;&lt;p&gt;&amp;#8220;Statistically sophisticated or complex methods do not necessarily provide more accurate forecasts than simpler ones&amp;#8221;&lt;/p&gt;&lt;/blockquote&gt;&lt;p&gt;One cool thing about this factor is that you can weight each team member&amp;#8217;s contribution by the amount of time that they played. 
For example, if two players are on a team but each player only played half of the time (e.g. &lt;a href="http://en.wiktionary.org/wiki/tag_team"&gt;a tag team&lt;/a&gt;), then we would treat them differently than if these two players played the entire time. This is officially known as &amp;#8220;partial play.&amp;#8221; Xbox game titles report the percentage of time a player was active in a game under the &amp;#8220;X_PROPERTY_PLAYER_PARTIAL_PLAY_PERCENTAGE&amp;#8221; property that is recorded for each player (it defaults to 100%). This information is used by TrueSkill to perform a fairer update. I implemented this feature in the &lt;a href="http://github.com/moserware/Skills"&gt;accompanying source code&lt;/a&gt;.&lt;/p&gt;&lt;h4&gt;Factor #4: How&amp;#8217;d Your Team Compare?&lt;/h4&gt;&lt;p&gt;Next, we compare team performances in pairs. We do this by subtracting team performances to come up with pairwise differences:&lt;/p&gt;&lt;a href="http://3.bp.blogspot.com/_Zfbv3mHcYrc/S5venQ1MtsI/AAAAAAAAKic/LWYIxzYoKUo/s1600-h/Layer4_Team_Diff.png"&gt;&lt;img style="border:0;" id="BLOGGER_PHOTO_ID_5448192940349109954" name="BLOGGER_PHOTO_ID_5448192940349109954" src="http://3.bp.blogspot.com/_Zfbv3mHcYrc/S5venQ1MtsI/AAAAAAAAKic/LWYIxzYoKUo/s576/Layer4_Team_Diff.png" /&gt;&lt;/a&gt;&lt;p&gt;This is similar to what we did earlier with Elo and subtracting curves to get a new curve.&lt;/p&gt;&lt;h4&gt;Factor #5: How Should We Interpret the Team Differences?&lt;/h4&gt;&lt;p&gt;The bottom of the factor graph contains a comparison factor based on the team performance differences we just calculated:&lt;/p&gt;&lt;p&gt;&lt;a href="http://2.bp.blogspot.com/_Zfbv3mHcYrc/S5vVGmEepLI/AAAAAAAAKiM/Yt8B7bB7sBA/s1600-h/Layer5_Diff_Comparison.png"&gt;&lt;img style="border:0;" id="BLOGGER_PHOTO_ID_5448182483510011058" name="BLOGGER_PHOTO_ID_5448182483510011058" src="http://2.bp.blogspot.com/_Zfbv3mHcYrc/S5vVGmEepLI/AAAAAAAAKiM/Yt8B7bB7sBA/s400/Layer5_Diff_Comparison.png" 
/&gt;&lt;/a&gt;&lt;/p&gt;&lt;p&gt;The comparison depends on whether the pairwise difference was considered a &amp;#8220;win&amp;#8221; or a &amp;#8220;draw.&amp;#8221; Obviously, this depends on the rules of the game. It&amp;#8217;s important to realize that TrueSkill only cares about these two types of results. TrueSkill doesn&amp;#8217;t care if you won by a little or a lot, the only thing that matters is if you won. Additionally, in TrueSkill we imagine that there is a buffer of space called a &amp;#8220;draw margin&amp;#8221; where performances are equivalent. For example, in Olympic swimming, two swimmers can &amp;#8220;draw&amp;#8221; because their times are equivalent to 0.01 seconds even though the times differ by several thousandths of a second. In this case, the &amp;#8220;draw margin&amp;#8221; is relatively small around 0.005 seconds. Draws are very common in chess at the grandmaster level, so the draw margin would be much greater there.&lt;/p&gt;&lt;p&gt;&lt;strong&gt;The output of the comparison factor directly relates to how much your skill&amp;#8217;s mean and standard deviation will change&lt;/strong&gt;.&lt;/p&gt;&lt;p&gt;The exact math involved in this factor &lt;a title="Ok, so it's quite complicated" href="http://research.microsoft.com/apps/pubs/default.aspx?id=74554"&gt;is complicated&lt;/a&gt;, but the core idea is simple:&lt;/p&gt;&lt;ul&gt;&lt;li&gt;Expected outcomes cause small updates because the algorithm already had a good guess of your skill.&lt;/li&gt;&lt;li&gt;Unexpected outcomes (upsets) cause larger updates to make the algorithm more likely to predict the outcome in the future.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;The &lt;a href="http://dl.dropbox.com/u/1083108/Moserware/Skill/The%20Math%20Behind%20TrueSkill.pdf"&gt;accompanying math paper&lt;/a&gt; goes into detail, but conceptually you can think of the performance difference as a number on the bottom (x-axis) of a graph. 
It represents the difference between the expected winner and the expected loser. A large negative number indicates a big upset (e.g. an underdog won) and a large positive number means the expected person won. The exact update of your skill&amp;#8217;s mean will depend on the probability of a draw, but you can get a feel for it by looking at this graph:&lt;/p&gt;&lt;a href="http://1.bp.blogspot.com/_Zfbv3mHcYrc/S5v-xI1fggI/AAAAAAAAKik/DBq_48qmtzE/s1600-h/VWinFunctionWithDrawProbabilities.png"&gt;&lt;img style="border:0;" id="BLOGGER_PHOTO_ID_5448228294373638658" name="BLOGGER_PHOTO_ID_5448228294373638658" src="http://1.bp.blogspot.com/_Zfbv3mHcYrc/S5v-xI1fggI/AAAAAAAAKik/DBq_48qmtzE/s576/VWinFunctionWithDrawProbabilities.png" /&gt;&lt;/a&gt;&lt;p&gt;Similarly, the update to a skill&amp;#8217;s standard deviation (i.e. uncertainty) depends on how expected the outcome was. An expected outcome shrinks the uncertainty by a small amount (e.g. we already knew it was going to happen). Likewise, an unexpected outcome shrinks the standard deviation more because it was new information that we didn&amp;#8217;t already have:&lt;/p&gt;&lt;p&gt;&lt;a href="http://2.bp.blogspot.com/_Zfbv3mHcYrc/S5v_HtoWDdI/AAAAAAAAKis/kz1N8gQmRAQ/s1600-h/WWinFunctionWithDrawProbabilities.png"&gt;&lt;img style="border:0;" id="BLOGGER_PHOTO_ID_5448228682207727058" name="BLOGGER_PHOTO_ID_5448228682207727058" src="http://2.bp.blogspot.com/_Zfbv3mHcYrc/S5v_HtoWDdI/AAAAAAAAKis/kz1N8gQmRAQ/s576/WWinFunctionWithDrawProbabilities.png" /&gt;&lt;/a&gt;&lt;/p&gt;&lt;p&gt;One problem with this comparison factor is that we use some fancy math that just makes an approximation (a good approximation, but still an approximation). 
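&lt;/p&gt;&lt;p&gt;The two curves plotted above can be sketched numerically for the &amp;#8220;win&amp;#8221; case. This is a simplified illustration that assumes the truncated-Gaussian correction functions the TrueSkill paper calls v and w, with t as the normalized performance difference; the function and parameter names here are my own and are not taken from the accompanying project.&lt;/p&gt;&lt;pre&gt;
```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def v_win(t, draw_margin):
    """Mean correction for a win: pdf / cdf at (t - draw_margin)."""
    x = t - draw_margin
    return STD_NORMAL.pdf(x) / STD_NORMAL.cdf(x)


def w_win(t, draw_margin):
    """Variance correction for a win; stays between 0 and 1."""
    v = v_win(t, draw_margin)
    return v * (v + t - draw_margin)


# t is the normalized performance difference (expected winner minus loser).
for t in (-3.0, 0.0, 3.0):
    print(t, round(v_win(t, 0.0), 3), round(w_win(t, 0.0), 3))
```
&lt;/pre&gt;&lt;p&gt;Both values are large for an upset (negative t) and close to zero for an expected result (large positive t), which matches the shape of the two graphs.&lt;/p&gt;&lt;p&gt;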
We&amp;#8217;ll refine the approximation in the next step.&lt;/p&gt;&lt;h4&gt;The Inner Schedule: Iterate, Iterate, Iterate!&lt;/h4&gt;&lt;p&gt;We can make a better approximation of the team difference factors by passing around the messages that keep getting updated in the following loop:&lt;/p&gt;&lt;a href="http://4.bp.blogspot.com/_Zfbv3mHcYrc/S4rfGRP2SvI/AAAAAAAAKes/M6g1XAbhOzw/s1600-h/Layer_Iterate_Inner.png"&gt;&lt;img style="border:0;" id="BLOGGER_PHOTO_ID_5443408398432750322" name="BLOGGER_PHOTO_ID_5443408398432750322" src="http://4.bp.blogspot.com/_Zfbv3mHcYrc/S4rfGRP2SvI/AAAAAAAAKes/M6g1XAbhOzw/s576/Layer_Iterate_Inner.png" /&gt;&lt;/a&gt;&lt;p&gt;After a few iterations of this loop, the changes will be less dramatic and we&amp;#8217;ll arrive at stable values for each marginal.&lt;/p&gt;&lt;h4&gt;Enough Already! Give Me My New Rating!&lt;/h4&gt;&lt;p&gt;Once the inner schedule has stabilized the values at the bottom of the factor graph, we can reverse the direction of each factor and propagate messages back up the graph. These reverse messages are represented by black arrows in the graph of each factor. &lt;strong&gt;Each player&amp;#8217;s new skill rating will be the value of that player&amp;#8217;s skill marginal variable once messages have reached the top of the factor graph.&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;By default, we give everyone a &amp;#8220;full&amp;#8221; skill update which is the result of the above procedure. However, there are times when a game title might not want the match outcome to count as much because of less-than-optimal playing conditions (e.g. there was a lot of network lag during the game). Games can do this with a &amp;#8220;partial update,&amp;#8221; which applies only a fraction of the full update. Game titles specify this via the X_PROPERTY_PLAYER_SKILL_UPDATE_WEIGHTING_FACTOR variable. 
I implemented this feature in the &lt;a href="http://github.com/moserware/Skills/blob/master/Skills/PartialPlay.cs"&gt;accompanying source code&lt;/a&gt; and describe it in the &lt;a href="http://dl.dropbox.com/u/1083108/Moserware/Skill/The%20Math%20Behind%20TrueSkill.pdf"&gt;math paper&lt;/a&gt;.&lt;/p&gt;&lt;h4&gt;Results&lt;/h4&gt;&lt;p&gt;There are some more details left, but we&amp;#8217;ll stop for now. The &lt;a href="http://dl.dropbox.com/u/1083108/Moserware/Skill/The%20Math%20Behind%20TrueSkill.pdf"&gt;accompanying math paper&lt;/a&gt; and &lt;a href="http://github.com/moserware/Skills"&gt;source code&lt;/a&gt; fill in most of the missing pieces. One of the best ways to learn the details is to implement TrueSkill yourself. Feel free to create a port of the &lt;a href="http://github.com/moserware/Skills"&gt;accompanying project&lt;/a&gt; in your favorite language and share it with the world. Writing your own implementation will help solidify all the concepts presented here.&lt;/p&gt;&lt;p&gt;The most rewarding part of implementing the TrueSkill algorithm is to see it work well in practice. My coworkers have commented on how it&amp;#8217;s almost &amp;#8220;eerily&amp;#8221; accurate at computing the right skill for everyone relatively quickly. After several months of playing foosball, the top of the leaderboard (sorted by TrueSkill: the mean minus 3 standard deviations) was very stable. Recently, a very good player started playing and is now the #2 player. 
Here&amp;#8217;s a graph of the most recent changes in TrueSkill for the top 5 (of around 40) foosball players:&lt;/p&gt;&lt;a href="http://3.bp.blogspot.com/_Zfbv3mHcYrc/S51C7AczIRI/AAAAAAAAKi0/8NcEXy98hrw/s1600-h/MostRecentFoosballTrueSkill.png"&gt;&lt;img style="border:0;" id="BLOGGER_PHOTO_ID_5448584705688674578" name="BLOGGER_PHOTO_ID_5448584705688674578" src="http://3.bp.blogspot.com/_Zfbv3mHcYrc/S51C7AczIRI/AAAAAAAAKi0/8NcEXy98hrw/s720/MostRecentFoosballTrueSkill.png" /&gt;&lt;/a&gt;&lt;p&gt;(Note: Look how quickly the system detected how good this new #2 player is even though his win ratio is right at 50%.)&lt;/p&gt;&lt;p&gt;Another interesting aspect of implementing TrueSkill is that it has raised an awareness of ratings among players. People who otherwise wouldn&amp;#8217;t have played together now occasionally play each other because they know they&amp;#8217;re similarly matched and will have a good game. One advantage of TrueSkill is that it&amp;#8217;s not that big of a deal to lose to a much better player, so it&amp;#8217;s still ok to have unbalanced games. In addition, having ratings has been a good way to judge if you&amp;#8217;re improving in ability with a new shot technique in foosball or learning more chess theory.&lt;/p&gt;&lt;h4&gt;Fun Things from Here&lt;/h4&gt;&lt;p&gt;The obvious direction to go from here is to add more games to the system and see if TrueSkill handles them equally well. Given that TrueSkill is the default ranking system on Xbox Live, this will probably work out well. Another direction is to see if there&amp;#8217;s a big difference in TrueSkill based on position in a team (e.g. midfield vs. goalie in foosball). Given TrueSkill&amp;#8217;s statistically sound approach to ranking and matchmaking, you might even have some success using it to decide between several options. You could have each option be a &amp;#8220;player&amp;#8221; and decide each &amp;#8220;match&amp;#8221; based on your personal whims of the day. 
If nothing else, this would be an interesting way to pick your next vacation spot or even your child&amp;#8217;s name.&lt;/p&gt;&lt;p&gt;If you broaden your search to include the ideas that we&amp;#8217;ve learned along the way, there are many more applications. Microsoft&amp;#8217;s &lt;a href="http://videolectures.net/nipsworkshops09_graepel_pmlca/"&gt;AdPredictor&lt;/a&gt; (i.e. the part that delivers relevant ads on &lt;a href="http://www.bing.com/"&gt;Bing&lt;/a&gt;) was created by the TrueSkill team and uses similar math, but is a different application.&lt;/p&gt;&lt;p&gt;As for me, it was rewarding to work with an algorithm that has fun social applications and to pick up machine learning tidbits along the way. It&amp;#8217;s too bad all of that didn&amp;#8217;t help me hit the top of any of the leaderboards.&lt;/p&gt;&lt;p&gt;Oh well, it&amp;#8217;s been a fun journey. I&amp;#8217;d love to hear if you dived into the algorithm after reading this and would especially appreciate any updates to &lt;a href="http://github.com/moserware/Skills"&gt;my code&lt;/a&gt; or other language forks.&lt;/p&gt;&lt;p&gt;Links:&lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;a href="http://dl.dropbox.com/u/1083108/Moserware/Skill/The%20Math%20Behind%20TrueSkill.pdf"&gt;The Math Behind TrueSkill&lt;/a&gt; - A math-filled paper that fills in some of the details left out of this post.&lt;/li&gt;&lt;li&gt;&lt;a href="http://github.com/moserware/Skills"&gt;Moserware.Skills&lt;/a&gt; Project on GitHub - My full implementation of Elo and TrueSkill in C#. Please feel free to create your own language forks.&lt;/li&gt;&lt;li&gt;Microsoft's online &lt;a href="http://research.microsoft.com/en-us/projects/trueskill/calculators.aspx"&gt;TrueSkill Calculators&lt;/a&gt; - Allows you to play with the algorithm without having to download anything. 
My implementation matches the results of these calculators.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;&lt;em&gt;Special thanks to &lt;a href="http://research.microsoft.com/en-us/people/rherb/"&gt;Ralf Herbrich&lt;/a&gt;, &lt;a href="http://research.microsoft.com/en-us/um/people/minka/"&gt;Tom Minka&lt;/a&gt;, and &lt;a href="http://research.microsoft.com/en-us/people/thoreg/"&gt;Thore Graepel&lt;/a&gt; on the &lt;a href="http://research.microsoft.com/en-us/projects/trueskill/"&gt;TrueSkill&lt;/a&gt; team at &lt;a href="http://research.microsoft.com/en-us/labs/cambridge/default.aspx"&gt;Microsoft Research Cambridge&lt;/a&gt; for their help in answering many of my detailed questions about their fascinating algorithm.&lt;/em&gt;&lt;/p&gt;&lt;div class="feedflare"&gt;
&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://www.moserware.com/feeds/793968235021421709/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=6800934446457898793&amp;postID=793968235021421709" title="125 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6800934446457898793/posts/default/793968235021421709?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6800934446457898793/posts/default/793968235021421709?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/Moserware/~3/zcZYX8VKjXI/computing-your-skill.html" title="Computing Your Skill" /><author><name>Jeff Moser</name><uri>http://www.blogger.com/profile/16074905903060665396</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="24" height="32" src="http://1.bp.blogspot.com/_Zfbv3mHcYrc/SLDM--5fn8I/AAAAAAAAA1w/EZtLwWvYhdI/S220/facebook+beard2.jpg" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://3.bp.blogspot.com/_Zfbv3mHcYrc/S2Q_xffP38I/AAAAAAAAKYo/YcgBWcpjYtI/s72-c/100M_dash_Osaka07_D2A_Torri_Edwards.jpg" height="72" width="72" /><thr:total>125</thr:total><feedburner:origLink>http://www.moserware.com/2010/03/computing-your-skill.html</feedburner:origLink></entry></feed>
