<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/atom10full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><feed xmlns="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:thr="http://purl.org/syndication/thread/1.0" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0">
    <title>Whimsley</title>
    
    
    <link rel="alternate" type="text/html" href="http://whimsley.typepad.com/whimsley/" />
    <id>tag:typepad.com,2003:weblog-254810</id>
    <updated>2012-01-20T15:06:41-05:00</updated>
    <subtitle>Technology and Politics.

I would have written a shorter post, but I did not have the time</subtitle>
    <generator uri="http://www.typepad.com/">TypePad</generator>
    <atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/atom+xml" href="http://feeds.feedburner.com/Whimsley" /><feedburner:info uri="whimsley" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://hubbub.api.typepad.com/" /><entry>
        <title>Alone Together 1: Stories</title>
        <link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/Whimsley/~3/OeReG4--pRg/alone-together-1-stories.html" />
        <link rel="replies" type="text/html" href="http://whimsley.typepad.com/whimsley/2012/01/alone-together-1-stories.html" thr:count="3" thr:updated="2012-01-23T09:43:01-05:00" />
        <id>tag:typepad.com,2003:post-6a00d83451d3b369e20168e5dea835970c</id>
        <published>2012-01-20T15:06:41-05:00</published>
        <updated>2012-01-20T15:17:24-05:00</updated>
        <summary>[The first of a few reflections prompted by Sherry Turkle's Alone Together.] The sublime Leonard Cohen in today's Guardian: I don't really like songs with ideas. They tend to become slogans. They tend to be on the right side of...</summary>
        <author>
            <name>tomslee</name>
        </author>
        
        
<content type="xhtml" xml:lang="en-US" xml:base="http://whimsley.typepad.com/whimsley/">
<div xmlns="http://www.w3.org/1999/xhtml"><div id="outline-container-1">
<div id="outline-container-1-1">
<div id="text-1-1">
<p>[The first of a few reflections prompted by Sherry Turkle's <em><a href="http://alonetogetherbook.com/" target="_self">Alone Together</a></em>.]</p>
<p>The sublime Leonard Cohen <a href="http://www.guardian.co.uk/music/2012/jan/19/leonard-cohen">in today's Guardian</a>:</p>
<blockquote>
<p>I don't really like songs with ideas. They tend to become slogans. They tend to be on the right side of things: ecology or vegetarianism or antiwar. All these are wonderful ideas but I like to work on a song until those slogans, as wonderful as they are and as wholesome as the ideas they promote are, dissolve into deeper convictions of the heart. I never set out to write a didactic song. It's just my experience. All I've got to put in a song is my own experience.</p>
</blockquote>
<p>The same is true of fiction. Songs and stories are powerful ways of communicating, but literature with an agenda is almost always bad literature, stories with a message are almost always shallow morality tales, and the fables that now pepper popular non-fiction books are often particularly egregious examples. Thomas Friedman's taxi drivers and Malcolm Gladwell's hush puppies are the 21st-century template for books on management, business, economics, politics, and technology only because even badly-told stories seduce us.</p>
<p>Whenever I encounter a story in a non-fiction book my guard goes up, whether it's fictional or a retelling of an actual event. I know that I am being presented with a Trojan horse; there's a message hidden inside and the only reason for telling me the story is to sneak that message past my defenses of scepticism and logic. It's a trick, and critical readers must reject it. At some point the tables will turn and the telling of non-fiction tales will be recognized as the dishonest, slippery tactic that it is, but for now we must simply resist them and the invading armies that they are smuggling. Which makes it all the more surprising that I loved Sherry Turkle's <em>Alone Together</em>, because it is full of stories.</p>
<p>More precisely, I loved the first half of the book, devoted to observing how humans react to robots that react to humans. In Part Two she observes humans reacting to other humans through the medium of digital networks and is less successful. Her use of stories is one reason why Part One works and Part Two fails, so let's stay with the robots and Part One for now.</p>
<p>First, if you are going to use stories you might as well tell them well and Turkle does so. She has an eye for a telling phrase that sets her apart from most non-fiction writers. From a robot that "develops its own origami of lovemaking positions" to the titles and subtitles of the book, she coins phrases that evoke the contradictions and tensions that are her subjects. "Alone together", "The robotic moment: in solitude, new intimacies", "Networked: in intimacy, new solitudes".</p>
<p>Second, the stories are not archetypes conjured up simply to illustrate a point, but emerge from the extensive observational work Turkle has done: more than 700 interviews over 30 years, decades of bringing robots to schools and nursing homes, sending them home with children for weeks at a time, watching participants interact with robots and watching herself also. "I think of the product as an intimate ethnography" she writes {xiii} and the credibility of the book comes from this longstanding and far-reaching work.</p>
<p>More importantly, in Part One she uses stories to provoke questions, rather than to provide answers. Consider two examples.</p>
<p>Early on, Turkle writes of visiting the American Museum of Natural History with her then-teenaged daughter Rebecca. They approach an exhibit of two giant tortoises from the Galapagos Islands. Rebecca looks at the one visible, motionless, creature and declares "They could have used a robot".</p>
<p>Turkle was taken aback, and started a discussion with other parents and children in the line-up. Several of the children shared Rebecca's concern for the animal and her unimpressed reaction to its authenticity: it would be better for the tortoise itself not to have been brought all this way; a robot would not make the water dirty; "for what they do, you didn't have to have the live ones." The parents disagreed: "The point is that they are real. That's the whole point."</p>
<p>What I like is that the story does not deliver a message, but instead prompts questions in the mind of the reader. In this way, despite its factual origin, the story is more fictional/literary than many. Does authenticity matter? If so, under what circumstances, and why? It's the open-ended nature of the event that makes reading <em>Alone Together</em> an active, questioning experience, and one I found very rich.</p>
<p>That's not to say Turkle doesn't have a message to deliver. She does, and she is clear enough about what it is: "I am a psychoanalytically trained psychologist. By both temperament and profession, I place high value on relationships of intimacy and authenticity." {6} The book is about Turkle's increasing concern, after years of enthusiasm, that social technologies are serving to erode these qualities.</p>
<p>The second story is one that has stayed with me because it gets right to the heart of the issues the book raises around authenticity {74}. Visiting Japan in the early 1990s, Turkle heard tales of adult children who, too distant and too busy to visit their aging and infirm parents, hired actors to visit in their stead, playing the part of the adult child. What's more, the parents appreciated and enjoyed the gesture. It's slightly shocking to western sensibilities, but once we hear a little more context it becomes more understandable.</p>
<p>First, the actors are not (in all cases, at least) a deception: the parents recognize them for what they are. Yet the parents "enjoyed the company and played the game". In Japan, being elderly is a role, being a child is a role, and parental visits have a strong dose of ritual to them: the recital of scripts by each party. While the child may not be able to act out their role, at least this way the parent gets to enact theirs, and so to reinforce their identity as an elderly, respected person.</p>
<p>The story, again, provokes questions in the mind of the reader rather than leading us to a staged conclusion: questions about authenticity, when it matters, and why. Turkle's reaction was "if you are willing to send an actor, why not send a robot?" If it does not matter that the visitor is really a child, does it matter if the visitor is really a visitor? Does it matter if the visitor is not really visiting (a phone call)? Or why, as my wife asked, should we see this as less than a visit when we could see it instead as more than a bunch of flowers?</p>
<p>To me, the story and the questions it prompts undermine my confidence in my own judgements: they make me realise that those judgements are more tied up with cultural conventions, more arbitrary and more shallow, than I thought. And if prompting reflection is the point, then that's OK because the story is, again, open ended: it is not hiding a pre-planned answer.</p>
<p>Part Two of <em>Alone Together</em> fails because Turkle's interviews and observations focus on bringing out our discontents (new solitudes) with networking technologies. She has a message, and it's one we can either agree with or argue with, but by approaching this part in terms of discontents she fails to escape being didactic. The slogans have not dissolved, in Cohen's words.</p>
<p>Part One succeeds because it explores the "new intimacies": what is surprisingly seductive about interacting with even the most crude and obviously artificial robots. It's this seduction that is so unsettling: the notion that things are "alive enough" for a given kind of relationship; that the most powerful thing a robot can do is to have needs which we can meet.</p>
<p>There is something of a taboo against robots in Western societies. In the last few years we have grown to accept human-sounding artificial voices (the iPhone's Siri being the latest) but we shy away from artificial human appearance. We permit robots as toys and vacuum cleaners, but their use as companions for the aged or as visible service employees is still outside the realm of the everyday (at least for now). Interacting with a visible robot is still a novel experience for those of us outside childhood, and so Part One of <em>Alone Together</em> has a sense of a report from a slightly alien future. What she shows us is how vulnerable we are to the seductions of even the most crude simulations, and for unexpected reasons, and the disquiet this provokes is something worth reflecting on. Taboos are vulnerable to suddenly being washed away, and the technological imperative may yet carry us forward into a world where robots are more commonly present. I share Turkle's concern about what impact that will have. More on that next time.</p>
</div>
</div>
</div></div>
</content>



    <feedburner:origLink>http://whimsley.typepad.com/whimsley/2012/01/alone-together-1-stories.html</feedburner:origLink></entry>
    <entry>
        <title>Short Notes: Cute Cats (or not) in Central Asia</title>
        <link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/Whimsley/~3/B1158oJJX7Q/short-notes-cute-cats-or-not-in-central-asia.html" />
        <link rel="replies" type="text/html" href="http://whimsley.typepad.com/whimsley/2012/01/short-notes-cute-cats-or-not-in-central-asia.html" thr:count="0" />
        <id>tag:typepad.com,2003:post-6a00d83451d3b369e20162ff965279970d</id>
        <published>2012-01-14T22:07:50-05:00</published>
        <updated>2012-01-14T22:07:50-05:00</updated>
        <summary>Doing some reading to follow up my recent post on Ethan Zuckerman's "Cute Cats" talk I came across this post by Sarah Kendzior at registan.net. I know roughly nothing about the places and events she discusses, but it is a fascinating...</summary>
        <author>
            <name>tomslee</name>
        </author>
        
        
<content type="xhtml" xml:lang="en-US" xml:base="http://whimsley.typepad.com/whimsley/">
<div xmlns="http://www.w3.org/1999/xhtml"><p>Doing some reading to follow up my recent post on Ethan Zuckerman's "Cute Cats" talk I came across <a href="http://www.registan.net/index.php/2012/01/08/central-asia-an-exception-to-the-cute-cats-theory-of-internet-revolution/" target="_self">this</a> post by Sarah Kendzior at registan.net. I know roughly nothing about the places and events she discusses, but it is a fascinating post by an obviously knowledgeable person, and the comments thread following it is one of the most absorbing I've ever read. Lots of people have great things to add, and they do so in a constructive and generous way.</p></div>
</content>



    <feedburner:origLink>http://whimsley.typepad.com/whimsley/2012/01/short-notes-cute-cats-or-not-in-central-asia.html</feedburner:origLink></entry>
    <entry>
        <title>Upcoming: Alone Together</title>
        <link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/Whimsley/~3/g7fhxEK0eEE/upcoming-alone-together.html" />
        <link rel="replies" type="text/html" href="http://whimsley.typepad.com/whimsley/2012/01/upcoming-alone-together.html" thr:count="9" thr:updated="2012-01-20T17:59:55-05:00" />
        <id>tag:typepad.com,2003:post-6a00d83451d3b369e20167607d817b970b</id>
        <published>2012-01-13T20:59:45-05:00</published>
        <updated>2012-01-13T20:59:45-05:00</updated>
        <summary>I just read Sherry Turkle's excellent and provocative Alone Together and I plan to put up four wordy posts about it here, more "inspired by" than "review of", which will probably take me a month or so. Does anyone want...</summary>
        <author>
            <name>tomslee</name>
        </author>
        
        
<content type="xhtml" xml:lang="en-US" xml:base="http://whimsley.typepad.com/whimsley/">
<div xmlns="http://www.w3.org/1999/xhtml"><p>I just read Sherry Turkle's excellent and provocative <a href="http://alonetogetherbook.com" target="_self">Alone Together</a> and I plan to put up four wordy posts about it here, more "inspired by" than "review of", which will probably take me a month or so. Does anyone want to join in, either at your own blog or here, to make it a conversation instead of a monologue? If so, either leave a comment or get in touch by email (<a href="http://whimsley.typepad.com/about.html" target="_self">here</a>).</p></div>
</content>



    <feedburner:origLink>http://whimsley.typepad.com/whimsley/2012/01/upcoming-alone-together.html</feedburner:origLink></entry>
    <entry>
        <title>Ethan Zuckerman's "Cute Cats and the Arab Spring"</title>
        <link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/Whimsley/~3/bb7uDhz2PIw/ethan-zuckermans-cute-cats-and-the-arab-spring.html" />
        <link rel="replies" type="text/html" href="http://whimsley.typepad.com/whimsley/2012/01/ethan-zuckermans-cute-cats-and-the-arab-spring.html" thr:count="13" thr:updated="2012-01-20T20:05:53-05:00" />
        <id>tag:typepad.com,2003:post-6a00d83451d3b369e20167600c2ecf970b</id>
        <published>2012-01-05T22:56:12-05:00</published>
        <updated>2012-01-07T14:29:39-05:00</updated>
        <summary>Table of Contents 1 Dry Tunisian Tinder 2 Cute Cats and Malaysian Opposition 3 Polish lunch rooms 4 Tunisia's Second Act 5 Media Ecology or Network Ecology? Cory Doctorow (*) and Jillian York (*) were both full of praise for...</summary>
        <author>
            <name>tomslee</name>
        </author>
        
        
<content type="xhtml" xml:lang="en-US" xml:base="http://whimsley.typepad.com/whimsley/">
<div xmlns="http://www.w3.org/1999/xhtml"><h2>Table of Contents</h2>
<div id="text-table-of-contents">
<ul>
<li><a href="#sec-1-1">1 Dry Tunisian Tinder</a></li>
<li><a href="#sec-1-2">2 Cute Cats and Malaysian Opposition</a></li>
<li><a href="#sec-1-3-1">3 Polish lunch rooms</a></li>
<li><a href="#sec-1-3-2">4 Tunisia's Second Act</a></li>
<li><a href="#sec-1-3-3">5 Media Ecology or Network Ecology?</a> 
</li>
</ul>
</div>
<div class="outline-2" id="outline-container-1">
<h2 id="sec-1"><span style="font-weight: normal; font-size: small;">Cory Doctorow (</span><a href="http://www.guardian.co.uk/technology/blog/2012/jan/03/the-internet-best-dissent-start" style="font-weight: normal; font-size: small;">*</a><span style="font-weight: normal; font-size: small;">) and Jillian York (</span><a href="http://jilliancyork.com/2012/01/03/on-social-media-as-2011-gamechanger/" style="font-weight: normal; font-size: small;">*</a><span style="font-weight: normal; font-size: small;">) were both full of praise for Ethan Zuckerman's Vancouver Human Rights Lecture on </span><em style="font-weight: normal; font-size: small;">Cute Cats and the Arab Spring</em><span style="font-weight: normal; font-size: small;"> (</span><a href="http://www.thelaurier.ca/human-rights/human-rights-lecture-2011" style="font-weight: normal; font-size: small;">*</a><span style="font-weight: normal; font-size: small;">), so I listened to the podcast from CBC's Ideas (</span><a href="http://www.cbc.ca/ideas/episodes/2011/12/09/the-vancouver-human-rights-lecture---cute-cats-and-the-arab-spring/" style="font-weight: normal; font-size: small;">*</a><span style="font-weight: normal; font-size: small;">). You can also watch the lecture on YouTube (</span><a href="http://youtu.be/tkDFVz_VL_I" style="font-weight: normal; font-size: small;">*</a><span style="font-weight: normal; font-size: small;">).</span></h2>
<div class="outline-text-2" id="text-1">
<p>Ethan Zuckerman (EZ) has a long and admirable history of involvement in digital activism and a wide knowledge of both technology and social change; the lecture is worth an hour of your time. But (you knew there was a but) in the end I have to disagree with his main thesis.</p>
</div>
<div class="outline-3" id="outline-container-1-1">
<h3 id="sec-1-1"><span class="section-number-3">1</span> Dry Tunisian Tinder</h3>
<div class="outline-text-3" id="text-1-1">
<p>EZ tells us how, after years of sporadic and failed protests in Tunisia, one particular spark in the city of Sidi Bouzid blossomed into the forest fire of revolution. When Mohamed Bouazizi set himself on fire in protest at official interference with his vegetable stall it was a dramatic and desperate act, but not unique: he wasn't the first person to do so even that year. What was different this time?</p>
<p>EZ's argument is that digital social media made the difference. The early protest was captured on video using a cheap phone and posted to a social networking site where… it did NOT "go viral". Instead the video was picked up by Tunisians <em>outside</em> the country (including EZ's friend Sami ben Gharbia<sup><a class="footref" href="#fn.1" name="fnr.1">1</a></sup>), who were scanning Tunisian web content for political news and curating it on a site called nawaat.org (<a href="http://nawaat.org">*</a>).</p>
<p>Al Jazeera got the video from nawaat.org and broadcast it back into Tunisia; Tunisians found out in turn what was going on from Al Jazeera. What's important here, says EZ, is that the new low-cost participatory media is an essential part of a larger media ecosystem that helped to stir up feelings within Tunisia.</p>
</div>
</div>
<div class="outline-3" id="outline-container-1-2">
<h3 id="sec-1-2"><span class="section-number-3">2</span> Cute Cats and Malaysian Opposition</h3>
<div class="outline-text-3" id="text-1-2">
<p>In the 1990s EZ ran a web site called Tripod for college/university students. Surprisingly, many people used it not for the Worthy Purposes he and his colleagues had planned, but to share simple and casual things, like pictures of cute cats. Also surprisingly, some of the heaviest use came from Malaysia. Wondering what was going on, Zuckerman got the Malay content translated, only to find that his site was hosting the Malaysian opposition Reformasi movement (<a href="http://en.wikipedia.org/wiki/Reformasi_(Malaysia)">*</a>). Tripod was a space that was difficult for the Malaysian government to censor while being easy to hold discussions in.<sup><a class="footref" href="#fn.2" name="fnr.2">2</a></sup></p>
<p>And so we reach the "cute cat theory": the ideal places for those who suddenly have important, politically sensitive material they want to share are sites designed for sharing videos and pictures of "cute cats" (Facebook, Twitter, YouTube, Flickr). These sites are easy to use, have a wide reach, and are difficult to censor – if the government shuts them down it annoys a lot of people and alerts them that something interesting is going on. "Cute cats" sites are natural tinder boxes for revolutionary sparks.</p>
</div>
</div>
<div class="outline-3" id="outline-container-1-3">
<h3 id="sec-1-3"><span style="font-weight: normal; font-size: small;">The events EZ recounts are compelling, but a lot of compelling things happen in this strange world, so my first thought whenever I hear a story of the Internet producing some unique chain of events is: can I think of a non-Internet example that matches? So here is the lunch-room theory of political dissent (details from </span><a href="http://sunday.niedziela.pl/artykul.php?lg=gb&amp;nr=200409&amp;dz=z_historii&amp;id_art=00004" style="font-weight: normal; font-size: small;">here</a><span style="font-weight: normal; font-size: small;">).</span></h3>
<div class="outline-4" id="outline-container-1-3-1">
<h3 id="sec-1-3-1">3 Polish lunch rooms</h3>
<div class="outline-text-4" id="text-1-3-1">
<p>On July 8, 1980, in the lunch room at a transport equipment plant in the eastern Polish town of Swidnik, the price of a pork cutlet jumped from 10.20 zloty to 18.10. For Miroslaw Kaczan, this jump was the final straw, and after lunch he switched off the machines he was working on. Others in Department 320 joined him, and other departments in the factory were quick to join. Soon there was a factory-wide stoppage, and it wasn't just about pork cutlets: the demands of the protesters revealed a wealth of pent-up frustration.</p>
<p>News about the strike in Swidnik spread so quickly that within two weeks 50,000 people in the region were on strike. This wave of strikes was resolved on July 25, but the disruption was far from over: three weeks later the strikes at the Gdansk ship yards in northern Poland started, and within a year Solidarnosc had over 9 million members.</p>
<p>In the early days of the strikes, Poles had a hunger for news of the protests, of course, and despite the heavy censorship of official media they found it, through short-wave radio broadcasts from other countries.</p>
<p>So the lunch-room theory is not that different from the cute-cat theory, except that there's no Internet. People gather wherever they gather for their everyday conversations and interactions, and it is in these everyday places that a spark of frustration can catch fire. And once it does catch fire, a combination of broadcast media and a networked public spreads the news quickly.</p>
<p>Perhaps, the Polish example shows, the Internet is not essential for the spark to turn into a fire. Perhaps a digitally networked public is not the only networked public.</p>
</div>
</div>
<div class="outline-4" id="outline-container-1-3-2">
<h3 id="sec-1-3-2"><span class="section-number-4">4</span> Tunisia's Second Act</h3>
<div class="outline-text-4" id="text-1-3-2">
<p>Even in Tunisia, politically sensitive material for which there is a high demand has found its way through dangerous pathways to reach a public desperate for news.</p>
<p>In a long piece called Streetbook (<a href="http://www.technologyreview.com/web/38379/">*</a>) John Pollock interviews two members of an underground Tunisian group called Takriz [<em>update:</em> see Ethan Zuckerman and Jillian York's comments below for reservations about Streetbook]. One of these "Taks" describes how the video that "made the second half of the [Tunisian] revolution" was taken when the regime had shut down the Internet, so "Takriz smuggled a CD of the video over the Algerian border" before forwarding it to Al Jazeera. YouTube may make it easier and safer to make videos available (at least so long as Google lets it be done anonymously), but when an important video was available, the Internet was not essential to the process of distribution.</p>
</div>
</div>
<div class="outline-4" id="outline-container-1-3-3">
<h3 id="sec-1-3-3">5 Media Ecology or Network Ecology?</h3>
<div class="outline-text-4" id="text-1-3-3">
<p>If we are really going to talk about a "media ecology" in the sense EZ means, we need to include all those gathering places–online and offline–which are difficult to shut down precisely because of their everyday, general purpose role. In addition to Facebook and YouTube we need to include factory lunchrooms, mosques and churches, football stadia (<a href="http://whimsley.typepad.com/whimsley/2011/09/so-three-cheers-for-evgeny-now-back-to-the-mit-review-articles-some-of-which-display-the-very-internet-centrism-that-moroz.html">*</a>), universities, popular music (<a href="http://whimsley.typepad.com/whimsley/2011/02/more-egypt-more-facebook.html">*</a>), balconies (<a href="http://www.juancole.com/2011/08/tv-twitter-facebook-and-the-libyan-revolution.html">*</a>), and more.</p>
<p>All these share a number of properties with Cute Cats sites. They are difficult to shut down without annoying large numbers of previously quiescent people, they are difficult to monitor in detail because of the dispersed and varied nature of the interactions that go on, and they are already familiar places for the gathering and sharing of information. EZ says that "we don't take these 'cute cat tools' seriously enough. These tools that anyone can use, that are used 99% of the time for completely banal purposes" but he doesn't take offline everyday institutions for banal sharing seriously enough.</p>
<p>EZ's mistake is the Achilles heel of social media advocates. Talk of a "networked society" is justified by comparing today's digitally connected populations to a population of couch potatoes watching prime time TV, but such a comparison overlooks all those other institutions of public networking. Instead of talking of a "media ecology" we should be talking of a "network ecology": the intricate tapestry of multiple networking institutions and practices that makes up a society.</p>
<p>Do digital social media supplement other networking institutions or displace them? There has been a lot of work on this at the individual level, but it's much more difficult to evaluate on a societal level. It is possible that digital social media increase the richness of social networks in a society, but it's also possible (likely?) that digital social media are the kudzu of networks, thriving while they strangle the other components of a rich and diverse network ecology; the best network left standing in an impoverished environment.</p>
<div id="footnotes">
<h3 class="footnotes">Footnotes</h3>
<div id="text-footnotes">
<p class="footnote"><sup><a class="footnum" href="#fnr.1" name="fn.1">1</a></sup> Among other things, Sami ben Gharbia is author of a fantastic essay on <em>The Internet Freedom Fallacy and Arab Digital Activism</em> (<a href="http://owni.eu/2011/01/15/the-internet-freedom-fallacy-and-the-arab-digital-activism-2/">*</a>)</p>
<p class="footnote"><sup><a class="footnum" href="#fnr.2" name="fn.2">2</a></sup> In fact it may not have been so much that the site was difficult to censor, as that the Malaysian government had decided to exclude the Internet as a whole from its otherwise-strict censorship rules (<a href="http://techblog.thepcharbor.com/?p=2174">*</a>).</p>
</div>
</div>
</div>
</div>
</div>
</div>
<div id="postamble">
<p class="date">Date: 2012-01-05 22:50:21 EST</p>
</div></div>
</content>



    <feedburner:origLink>http://whimsley.typepad.com/whimsley/2012/01/ethan-zuckermans-cute-cats-and-the-arab-spring.html</feedburner:origLink></entry>
    <entry>
        <title>2012 Predictions: Turning Points for the Web</title>
        <link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/Whimsley/~3/w8t7fV1IH3Q/2012-predictions-turning-points-for-the-web.html" />
        <link rel="replies" type="text/html" href="http://whimsley.typepad.com/whimsley/2012/01/2012-predictions-turning-points-for-the-web.html" thr:count="2" thr:updated="2012-01-05T20:38:52-05:00" />
        <id>tag:typepad.com,2003:post-6a00d83451d3b369e20162fee79728970d</id>
        <published>2012-01-02T15:03:35-05:00</published>
        <updated>2012-01-02T15:03:35-05:00</updated>
        <summary>Table of Contents Avoiding Cynicism (As If) Facebook: Privacy Hits the Mainstream Amazon: Abusing Community Apple: Stepping in Front of Google Avoiding Cynicism (As If) Peering into the New Year, my better half Lynne reflected yesterday that it is a...</summary>
        <author>
            <name>tomslee</name>
        </author>
        
        
<content type="xhtml" xml:lang="en-US" xml:base="http://whimsley.typepad.com/whimsley/">
<div xmlns="http://www.w3.org/1999/xhtml"><div id="table-of-contents">
<h2>Table of Contents</h2>
<div id="text-table-of-contents">
<ul>
<li><a href="#sec-2">Avoiding Cynicism (As If)</a></li>
<li><a href="#sec-3">Facebook: Privacy Hits the Mainstream </a></li>
<li><a href="#sec-4">Amazon: Abusing Community </a></li>
<li><a href="#sec-5">Apple: Stepping in Front of Google </a></li>
</ul>
</div>
</div>
<div class="outline-2" id="outline-container-1">
<h2 id="sec-1">Avoiding Cynicism (As If)</h2>
</div>
<div class="outline-2" id="outline-container-2">
<div class="outline-text-2" id="text-2">
<p>Peering into the New Year, my better half Lynne reflected yesterday that it is a duty of each of us, as we grow older, to be vigilant against encroaching cynicism. She's right (of course!), and I do feel that strong and steady current tugging me sluggishly downstream towards the lazy, easy waters of geezerhood, to a place where everything new shows itself only by its flaws and in which every new glass is basically empty.</p>
<p>Luckily, 2012 looks like being a banner year for those of us who take a critical view of the hype and commercialization around digital technology, so I'm actually feeling quite cheery. The number of digital hecklers is growing<sup><a class="footref" href="#fn.1" name="fnr.1">1</a></sup>, and will continue to do so as the relations between the mainstream Internet and its audience/members sour. A growing wave of disenchantment is gathering enough steam [sic] to become a creative force in its own right, and I think that's going to be fascinating to watch, as well as potentially a period of renewal for alternative culture.</p>
<p>So Happy New Year, and here are a few predictions for 2012. I don't think the full impact of any will be over and done during the calendar year, but I do think we'll look back at 2012 as a turning point in attitudes to digital technologies.</p>
</div>
</div>
<div class="outline-2" id="outline-container-3">
<h2 id="sec-3">Facebook: Privacy Hits the Mainstream</h2>
<div class="outline-text-2" id="text-3"><dl> <dt>Prediction</dt><dd>High-profile privacy cases in 2012 will dramatically accelerate the growth of public distrust in Facebook, which will spill over to other Internet aggregators.</dd> </dl>
<p>Privacy has always been the other side of the openness coin. Everyone loves openness, of course, but the last year or two has made it clear that behind Mark Zuckerberg's Facebook profile (<a href="https://www.facebook.com/zuck?sk=info">*</a>) claim that "I'm trying to make the world a more open place" there is a hard, cold, commercial reality. Are we sharing with each other, or are we feeding Facebook? And where is the boundary between the two?</p>
<p>Here's a dilemma my son faces, which also confronts many other young people. After university one potential employer is the Canadian government. If he clicks his support on Facebook for political protests, will government background checks have access to this information and will it count against him? There's no point asking Facebook even if you did trust it, because today's terms and conditions may change, and the laws governing it may change too. From being an open space where it is easy to express our political views, Facebook is becoming a panopticon where we censor ourselves, not knowing who is watching.</p>
<p>It's not clear that the advertising-driven model of web technology is sustainable given its dependence on data that we are increasingly reluctant to give up. As ex-Facebook engineer Jeff Hammerbacher says, "The best minds of my generation are thinking about how to make people click ads. That sucks." We've lived with this downside until now but as the choices become more stark this may change, and when things change on the Internet, they can change very quickly. danah boyd's view that "Facebook is a utility; utilities get regulated" (<a href="http://www.zephoria.org/thoughts/archives/2010/05/15/facebook-is-a-utility-utilities-get-regulated.html">*</a>) will become mainstream. We'll see demands<sup><a class="footref" href="#fn.2" name="fnr.2">2</a></sup> for changes to Facebook's practices (see the Europe vs Facebook group (<a href="http://europe-v-facebook.org/EN/en.html">*</a>), and the Irish data protection commissioner's report <a href="http://www.guardian.co.uk/technology/2011/dec/21/facebook-advertising-data">here</a>) gaining momentum.</p>
</div>
</div>
<div class="outline-2" id="outline-container-4">
<h2 id="sec-4">Amazon: Abusing Community</h2>
<div class="outline-text-2" id="text-4"><dl> <dt>Prediction</dt><dd>Change in the open source world as                 Google takes on Amazon. </dd> </dl>
<p>Amazon is rapidly making a name for itself as the company to give the Internet a bad name. From brutal working conditions (<a href="http://www.mcall.com/news/local/mc-allentown-amazon-complaints-20110917,0,7937001,full.story">*</a>) to treating physical bookstores as showrooms (<a href="http://www.nytimes.com/2011/12/13/opinion/amazons-jungle-logic.html">*</a>) to union-bashing (<a href="http://www.theatlantic.com/technology/archive/2011/12/in-the-wake-of-protest-one-womans-attempt-to-unionize-amazon/249853/">*</a>) to McCarthyist policies around Wikileaks (<a href="http://en.wikipedia.org/wiki/Amazon.com_controversies#WikiLeaks_hosting">*</a>) to tax opposition (<a href="http://www.huffingtonpost.com/2011/08/15/amazon-boycott-taxes-_n_927667.html">*</a>) to screwing libraries (<a href="http://librarianinblack.net/librarianinblack/2011/10/wegotscrewed.html">*</a>), this company has done everything it can to demolish the image of the Internet as a source of cooperation, collaboration, and open friendship. It has perfected the act of free-riding on open source efforts, building its (remarkable, it must be said) profitable EC2 infrastructure on Xen Hypervisor, using Linux extensively, and not contributing back (<a href="http://www.h-online.com/open/features/Time-for-Amazon-to-pay-its-dues-to-open-source-1249811.html">*</a>), in the same way it happily takes all those volunteer hours put into Wikipedia and uses them to sell its own devices, messing with authors' rights as it does so (<a href="http://www.roughtype.com/archives/2011/09/beyond_words_th.php">*</a>).</p>
<p>The Kindle Fire is the icing on the cake: Amazon has taken the Android operating system and its Linux kernel and used them to power the Amazon tablet. In doing so, it has taken Google's language of openness around Android (always suspect) and thrown it right back in Google's face, removing the Google applications and most evidence that the device is running Android, and making it an Amazon device from end to end.</p>
<p>With the Kindle Fire looking likely to become the top-selling Android tablet, you have to wonder how long Google will welcome this state of affairs. There's a lot of talk about the rivalry between Google and Apple, but the tension between Google and Amazon is the conflict that may change the open source world. The licensing terms for open source software have been increasingly friendly to commercial exploitation of community projects, moving steadily away from the more restrictive GPL (<a href="http://blogs.the451group.com/opensource/2011/12/15/on-the-continuing-decline-of-the-gpl/">*</a>), and Amazon's nose-thumbing may be the step that forces a re-evaluation of this enterprise-friendly stance.</p>
</div>
</div>
<div class="outline-2" id="outline-container-5">
<h2 id="sec-5">Apple: Stepping in Front of Google</h2>
<div class="outline-text-2" id="text-5"><dl> <dt>Prediction</dt><dd>As the open web fragments, Google will look to its bottom line. </dd> </dl>
<p>Speaking of Google, Apple's voice control system Siri may be the biggest threat the friendly ad-broker has yet faced, and you could argue that Siri is the major threat to the openness of the web.</p>
<p>It's increasingly obvious that the web has several natural bottlenecks, and that these bottlenecks are simultaneously the places where money can be made and chokepoints where political pressure can be applied. Ever since broadband and mobile access replaced ye olde dialup and Internet access became dominated by telcos and cable companies, ISPs have been one set of bottlenecks. Mobile device makers are another. DNS itself is yet another, one that SOPA is looking to squeeze. Finally, there is aggregation, Silicon Valley's preferred source of influence.</p>
<p>Aggregation creates a single point of entry into a part of the web, whether it's aggregating consumer items (Amazon), digital products (Apple), people (Facebook), or the web itself (Google), and aggregation is driven by increasing returns to scale. The point of aggregators is to stand between us and what we want to reach, guiding us to those parts of it that seem best.</p>
<p>The thing about Siri is that it stands in front of Google, potentially displacing the search box as iPhone users' point of entry to the web. Just as removal from Google's search engine makes you vanish from the web, so Siri has the potential to make Google vanish. Well, not vanish in the short term, but fade at least. Apple negotiates deals with providers like Yelp and Wolfram Alpha, doing an end run around the PageRank algorithm.</p>
<p>If Siri and other voice-recognition "assistants" move towards the mainstream, we can expect to experience an increasingly curated/censored version of the web (<a href="http://crookedtimber.org/2011/11/30/10-things-the-iphone-siri-will-help-you-get-instead-of-an-abortion/">*</a>). The relationship between Apple and the anti-establishment has always been love-hate, and Siri may drive it into hate-hate.</p>
<p>Google's friendly image can last only so long as its growth rate and profit margins stay healthy. It's already lost the aura of being the place to be for programmers, and soon we'll see enough competition to force Google into a more orthodox stance, and that will shock a lot of observers.</p>
<div id="footnotes">
<h2 class="footnotes">Footnotes:</h2>
<div id="text-footnotes">
<p class="footnote"><sup><a class="footnum" href="#fnr.1" name="fn.1">1</a></sup> A few years ago Andrew Keen's silly "Cult of the Amateur" was   the most prominent digital criticism book. Now we have Zittrain   ("The Future of the Internet and How to Stop It"), Carr ("The   Shallows"), Turkle ("Alone Together"), Wu ("The Master Switch"),   Lanier ("You Are Not a Gadget") and many more.</p>
<p class="footnote"><sup><a class="footnum" href="#fnr.2" name="fn.2">2</a></sup> "Demands" is not the best word, as Chris Dillow points out (<a href="http://stumblingandmumbling.typepad.com/stumbling_and_mumbling/2012/01/demands.html">*</a>).</p>
</div>
</div>
</div>
</div></div>
</content>



    <feedburner:origLink>http://whimsley.typepad.com/whimsley/2012/01/2012-predictions-turning-points-for-the-web.html</feedburner:origLink></entry>
    <entry>
        <title>Comment problems and one other update</title>
        <link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/Whimsley/~3/xqj2WC57dx8/comment-problems-and-one-other-update.html" />
        <link rel="replies" type="text/html" href="http://whimsley.typepad.com/whimsley/2011/10/comment-problems-and-one-other-update.html" thr:count="4" thr:updated="2012-01-09T04:20:36-05:00" />
        <id>tag:typepad.com,2003:post-6a00d83451d3b369e2015392680549970b</id>
        <published>2011-10-18T19:06:54-04:00</published>
        <updated>2011-10-18T19:06:54-04:00</updated>
        <summary>I've had a couple of people tell me they were unable to post comments. After taking this up with Typepad, I have shifted from their "Typepad connect" system back to the vanilla commenting system. If you have problems posting a...</summary>
        <author>
            <name>tomslee</name>
        </author>
        
        
<content type="xhtml" xml:lang="en-US" xml:base="http://whimsley.typepad.com/whimsley/">
<div xmlns="http://www.w3.org/1999/xhtml"><p>I've had a couple of people tell me they were unable to post comments. After taking this up with Typepad, I have shifted from their "Typepad connect" system back to the vanilla commenting system. If you have problems posting a comment, then I would appreciate an email at last name dot firstname (gmail). And you can rest assured it isn't personal unless you are a spambot, in which case it is, or would be if you were a person.</p>
<p>Also, in <a href="http://whimsley.typepad.com/whimsley/2011/09/earlier-today-i-thought-i-was-doomed-to-fail-that-part-3-of-this-prematurely-announced-trilogy-was-just-not-going-to-get-wr.html" target="_self">a recent post</a> I suggested that there was a conflict of interest in a <a href="http://ijoc.org/ojs/index.php/ijoc/article/viewFile/1246/613" target="_self">paper</a> I read. The authors have now published an explanation at the end of the paper and I retract the suggestion (I still don't understand why lead author Gilad Lotan lists his affiliation as he does, but that's his personal decision). I've put a note in the post to clarify.</p></div>
</content>



    <feedburner:origLink>http://whimsley.typepad.com/whimsley/2011/10/comment-problems-and-one-other-update.html</feedburner:origLink></entry>
    <entry>
        <title>Morozov on Jarvis: Is There a Point?</title>
        <link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/Whimsley/~3/s_EakS8KBp4/morozov-vs-jarvis.html" />
        <link rel="replies" type="text/html" href="http://whimsley.typepad.com/whimsley/2011/10/morozov-vs-jarvis.html" thr:count="3" thr:updated="2011-10-17T21:14:50-04:00" />
        <id>tag:typepad.com,2003:post-6a00d83451d3b369e20153924d58e8970b</id>
        <published>2011-10-16T15:07:42-04:00</published>
        <updated>2011-10-16T22:24:03-04:00</updated>
        <summary>Jeff Jarvis's 2009 book What Would Google Do? is a breathless paean to the benefits of sharing, linking, and being open, but it has not a single reference or footnote, and no bibliography. Jarvis extols the virtues of listening and...</summary>
        <author>
            <name>tomslee</name>
        </author>
        
        
<content type="xhtml" xml:lang="en-US" xml:base="http://whimsley.typepad.com/whimsley/">
<div xmlns="http://www.w3.org/1999/xhtml"><p>Jeff Jarvis's 2009 book <em>What Would Google Do? </em>is a breathless paean to the benefits of sharing, linking, and being open, but it has not a single reference or footnote, and no bibliography. Jarvis extols the virtues of listening and speaks of mutuality but in the end, of course, the benefits flow one way. Jeff Jarvis has become wealthy from this new ethic of sharing -- he is fond of "starting conversations" which he can then take ownership of -- but when it comes to giving credit to those who come before, for example by referencing previous writers on the topics he addresses, well it just seems like it's too much work for him. The book is one long argument by assertion, unsupported by facts and liberally sprinkled with utterances like "small is the new big" or "We have shifted from an economy based on scarcity to one based on abundance" or "Google has built its empire on trusting us".</p>
<p>It looks like his new book, <em>Public Parts</em>, is more of the same. <em>The New Republic</em> just published a long review of the book by Evgeny Morozov, available <a href="http://www.tnr.com/print/article/books/magazine/96116/the-internet-intellectual" target="_self">here</a> or <a href="http://www.tnr.com/article/books/magazine/96116/the-internet-intellectual" target="_self">here</a>. It's forthright, opinionated, angry, and entertaining, and it makes some damning arguments against the book.</p>
<p>Jeff Jarvis responds to the review <a href="http://www.buzzmachine.com/2011/10/13/a-bad-review-of-me/" target="_self">here</a> in bizarre fashion. He first raises the prospect of personal prejudice ("Morozov reliably dislikes me, just as he dislikes people I quote") and then dismisses the review as "he writes only a personal attack". Morozov spends 800 words critiquing Jarvis's misunderstanding of ideas about the public sphere and his oversimplification of Habermas, which Jarvis distorts and reduces to a complaint "about the names Habermas and Oprah appearing in the same book". Morozov spends 600 words on <em>Public Parts'</em> culturally narrow ideas about Germany, Finland, and the strange attitudes of non-Americans to privacy, which Jarvis encapsulates as "[Morozov] finds Streetview to be a case of Germans 'tyrannized by an American company'". In short, Jarvis exaggerates and distorts the arguments before dismissing them.</p>
<p style="text-align: center;">*                                *                                 *</p>
<p>To anyone who reads carefully, the argument is over and Morozov wins, but unfortunately that's not the end of the story. Much of Morozov's frustration comes from Jarvis's refusal to engage with the world of facts. He stays safely in the world of pronouncement ("Publicness is a sign of our empowerment", "the crowd owns the wisdom of the crowd" and so on). Jarvis is skilled at the marketing of ideas: if you Google [publicness], four of the first page listings are about or by Jarvis, and this canny use of branding will keep his profile high, well beyond the reach of factual criticism. Jarvis knows his audience and what they want to hear, and what they want is a self-help message for businesses: the world is changing, everything you thought you knew is irrelevant, and I have the key to the future.</p>
<p>So what, then, is the point of the hours Morozov spent writing a 7,000-word review if he won't reach Jarvis's core constituency? There are two other audiences that such pieces can reach. One is those who broadly agree with Morozov's perspective (yes, like me) that there is an ulterior motive, a very familiar and old-fashioned one, behind this talk of sharing and publicness, and who need shoring up. We cannot read every new book, watch every new TED talk, attend every conference and yet we do need to stay current and stay informed. I am not going to read <em>Public Parts</em> because there are so many other things to read, but I cannot afford to be completely ignorant of it. Morozov's review does the job for me.</p>
<p>The second is more important. Many people are attracted by the romantic rhetoric of openness, sharing, and the end of existing institutions, but not all have yet sorted out the political consequences of a commitment to these virtues. There are still people on the fence, and it's important for these people to know that, no matter what progressive-sounding language is used, some of the most idealistic arguments for sharing are made by those who will mine the data you provide in order to build fortunes from advertising. To shape that debate and to keep a political space open for an Internet that does not simply follow the venture-capitalist idea of progress, we need fact-based arguments, so kudos to Morozov for doing the necessary work in this case.</p></div>
</content>



    <feedburner:origLink>http://whimsley.typepad.com/whimsley/2011/10/morozov-vs-jarvis.html</feedburner:origLink></entry>
    <entry>
        <title>Broken Promises: Following Your Dreams, and the 99 percent</title>
        <link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/Whimsley/~3/UMHGH5bb-3w/broken-promises-following-your-dreams-and-the-99-percent.html" />
        <link rel="replies" type="text/html" href="http://whimsley.typepad.com/whimsley/2011/10/broken-promises-following-your-dreams-and-the-99-percent.html" thr:count="7" thr:updated="2012-01-05T03:50:57-05:00" />
        <id>tag:typepad.com,2003:post-6a00d83451d3b369e2015435f39ebe970c</id>
        <published>2011-10-06T22:30:47-04:00</published>
        <updated>2011-10-06T22:41:59-04:00</updated>
        <summary>This speech by Steve Jobs has been posted in many places over the last 24 hours: It is a strange speech: quite moving, personal, modest, and thoughtful. But in the end it’s a “follow your dreams” speech, and as such...</summary>
        <author>
            <name>tomslee</name>
        </author>
        
        
<content type="xhtml" xml:lang="en-US" xml:base="http://whimsley.typepad.com/whimsley/">
<div xmlns="http://www.w3.org/1999/xhtml"><p>This speech by Steve Jobs has been posted in many places over the last 24 hours:</p>
<p><iframe frameborder="0" height="315" src="http://www.youtube.com/embed/UF8uR6Z6KLc?rel=0" width="420" /> </p>
<p>It is a strange speech: quite moving, personal, modest, and thoughtful. But in the end it’s a “follow your dreams” speech, and as such is quite a contrast to another Internet event of the moment, the very moving stories being posted at <a href="http://wearethe99percent.tumblr.com/" rel="nofollow">We are the 99 percent</a>.</p>
<p>“Follow your dreams” invokes a cosmic bargain (fortune favours the brave) and it also invokes a social bargain: that if you work hard, and have a little luck, society will ensure that your efforts are rewarded. Meanwhile, the "we are the 99%" posters “sense that the fundamental bargain of our economy – work hard, play by the rules, get ahead – has been broken, and they want to see it restored” (Felix Salmon, quoted <a href="http://www.thestar.com/business/article/1065057--olive-99-percenters-are-literally-sick-of-being-left-out" rel="nofollow">here</a>).</p>
<p>So nothing against the guy, but over the next few days I’ll think more about what the 99%-ers say than what Steve Jobs said at Stanford. One of the stories he tells is of dropping out of college and, instead, auditing courses independently. It's an inspiring story, but the contrast to <a href="http://wearethe99percent.tumblr.com/post/11123269648/20-years-old-dropped-out-of-college-because-i" target="_self">this post</a>, made today, is glaring.</p></div>
</content>



    <feedburner:origLink>http://whimsley.typepad.com/whimsley/2011/10/broken-promises-following-your-dreams-and-the-99-percent.html</feedburner:origLink></entry>
    <entry>
        <title>My favourite post...</title>
        <link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/Whimsley/~3/qiEMUbdtF68/my-favourite-post.html" />
        <link rel="replies" type="text/html" href="http://whimsley.typepad.com/whimsley/2011/09/my-favourite-post.html" thr:count="2" thr:updated="2011-10-10T11:43:38-04:00" />
        <id>tag:typepad.com,2003:post-6a00d83451d3b369e2015435c37d40970c</id>
        <published>2011-09-28T20:13:52-04:00</published>
        <updated>2011-09-28T20:13:52-04:00</updated>
        <summary>... is this one. Every now and then I look back at previous posts on this blog. Some I still like, some not so much. Some got a lot of views at one time or another, but my favourite post...</summary>
        <author>
            <name>tomslee</name>
        </author>
        
        
<content type="xhtml" xml:lang="en-US" xml:base="http://whimsley.typepad.com/whimsley/">
<div xmlns="http://www.w3.org/1999/xhtml"><p>... is <a href="http://whimsley.typepad.com/whimsley/2009/11/pirates-dilemma-review-remixed.html" target="_self">this one</a>.</p>
<p>Every now and then I look back at previous posts on this blog. Some I still like, some not so much.  Some got a lot of views at one time or another, but my favourite post of all got little attention. </p>
<p>I think that this time right now, with Amazon's and Facebook's recent announcements and Apple's to come next month, marks a turning point in attitudes to the web and the companies that dominate it.</p>
<p>So please, I don't often trumpet my own writing and it's not that easy to read, but <a href="http://whimsley.typepad.com/whimsley/2009/11/pirates-dilemma-review-remixed.html" target="_self">this post</a> is exactly what this blog is all about.</p></div>
</content>



    <feedburner:origLink>http://whimsley.typepad.com/whimsley/2011/09/my-favourite-post.html</feedburner:origLink></entry>
    <entry>
        <title>Data Anonymization and Re-identification: Some Basics Of Data Privacy</title>
        <link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/Whimsley/~3/kU3jrt6oGvc/data-anonymization-and-re-identification-some-basics-of-data-privacy.html" />
        <link rel="replies" type="text/html" href="http://whimsley.typepad.com/whimsley/2011/09/data-anonymization-and-re-identification-some-basics-of-data-privacy.html" thr:count="6" thr:updated="2011-10-21T08:59:44-04:00" />
        <id>tag:typepad.com,2003:post-6a00d83451d3b369e2014e8bd8be15970d</id>
        <published>2011-09-26T23:49:24-04:00</published>
        <updated>2011-09-27T12:58:15-04:00</updated>
        <summary>Why Personally Identifiable Information is irrelevant. An introduction to information entropy, open data, and the possible end of crowdsourcing. Tim O'Reilly and ZIP Codes From his Strata Conference on Data Science, Tim O'Reilly tweeted with dismay the recent California court...</summary>
        <author>
            <name>tomslee</name>
        </author>
        
        
<content type="xhtml" xml:lang="en-US" xml:base="http://whimsley.typepad.com/whimsley/">
<div xmlns="http://www.w3.org/1999/xhtml"><div><em>Why Personally Identifiable Information is irrelevant. </em><em><span style="font-weight: normal; font-size: small;">An introduction to information entropy, open data, and the possible  end of crowdsourcing. </span></em></div>
<div class="outline-3" id="outline-container-1_1">
<h3 id="sec-1_1">Tim O'Reilly and ZIP Codes</h3>
<div class="outline-text-3" id="text-1_1">
<p>From his <a href="http://strataconf.com/strata2012">Strata Conference on Data Science</a>, Tim O'Reilly <a href="https://twitter.com/#!/timoreilly/status/115772828120383489">tweeted</a> his dismay at the recent California court decision that the ZIP code is now to be classified as "personally identifiable information". "No more demographics" he lamented. A little later he <a href="https://twitter.com/#!/timoreilly/status/115776752017620992">retweeted</a> a response that "apparently 87% of US residents can be uniquely identified by zip+DOB+gender: bit.ly/qysMqs" and later <a href="https://twitter.com/#!/timoreilly/status/117592754439192576">followed up</a> with "Here's a reference for the claim that zip code, gender and DOB uniquely identify 87% of individuals: <a href="http://www.citeulike.org/user/burd/article/5822736">http://www.citeulike.org/user/burd/article/5822736</a> via @crdant".</p>
<p>These tweets are odd and disturbing. The zip/DOB/gender finding is a  basic one in studies of privacy, published years ago by Latanya  Sweeney of Carnegie Mellon University. I gave a talk at work on  privacy a year ago, and this was one of the first references I came  across. Tim O'Reilly has been pushing an agenda of Open Data,  particularly Open Government Data, for the last couple of years, and  yet it looks as if he isn't aware of the basic privacy issues around  such data. Can that really be the case?</p>
<p>If it is, then here, to help Tim along, are some notes from my talk as a kind of introduction to data privacy, or at least to data-anonymization and re-identification. A great resource on some of these issues from a legal perspective is Paul Ohm's 2009 paper "Broken Promises of Privacy: Responding to the Surprising Failures of Anonymization" (<a href="http://epic.org/privacy/reidentification/ohm_article.pdf">PDF</a>), University of Colorado Law Legal Studies Research Paper No. 09-12. It's long, but it's so well written it's an easy read. Many of these notes originated with this paper, in one form or another.</p>
</div>
</div>
<div class="outline-3" id="outline-container-1_2">
<h3 id="sec-1_2">How Privacy Broke Crowdsourcing</h3>
<div class="outline-text-3" id="text-1_2">
<p>A few years ago Netflix ran its highly successful and widely publicised crowdsourced prize competition, in which it released a data set of users and their movie ratings and let competitors download it and search for patterns. Each record consisted of a customer ID (faked), a movie, the customer's rating of the movie, and the date of the rating.</p>
<p>In the FAQ for the competition, Netflix said this:</p>
<p>Q. Is there any customer information in the dataset that should be  kept private?</p>
<p>A. No, all customer identifying information has been removed; all that  remains are ratings and dates. This follows our privacy policy… Even  if, for example, you knew all your own ratings and their dates you  probably couldn't identify them reliably in the data because only a  small sample was included (less than one tenth of our complete  dataset) and that data was subject to perturbation.</p>
<p>This certainly looked reasonable enough, but Arvind Narayanan and Vitaly Shmatikov of the University of Texas had other ideas.<sup><a class="footref" href="#fn.1" name="fnr.1">1</a></sup> First, they tested the claim that the data was perturbed by asking acquaintances for their ratings and comparing them with the released records. They found that only a small number of the ratings were perturbed at all, which makes sense because perturbing the data gets in the way of its usefulness.</p>
<p>In the Netflix data set, different users have distinct sets of movies  that they have watched. The data set is sparse (most people have not  seen most movies), and there are many different movies available, so  individual tastes and viewing histories leave a clear  fingerprint. That is, if you knew what movies someone watched, you  could pick them out of the data set because no one else would have  seen the same combination.</p>
<p>A closer look showed that with 8 ratings (of which 2 may be completely wrong) and dates that may have a 14-day error, 99% of the records in the Netflix data set uniquely identify an individual. For 68% of records, two ratings and dates are sufficient. Various other combinations of information are also sufficient to identify users: for example, 84% can be identified from 6 of 8 movies outside the top 500.</p>
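<p>To make the idea of matching on a handful of imperfect ratings concrete, here is a minimal sketch. It is not the researchers' actual algorithm: the toy records, the function names, and the simple "allow a few wrong entries, allow dates to be off by up to 14 days" rule are all assumptions made for illustration.</p>

```python
from datetime import date

# Hypothetical toy "released" records: user id -> {movie: (rating, date rated)}.
RELEASED = {
    "u1": {"A": (5, date(2005, 1, 3)), "B": (2, date(2005, 2, 1)), "C": (4, date(2005, 3, 9))},
    "u2": {"A": (5, date(2005, 1, 4)), "D": (1, date(2005, 2, 2)), "E": (3, date(2005, 3, 1))},
    "u3": {"B": (2, date(2005, 6, 1)), "C": (4, date(2005, 6, 2)), "E": (5, date(2005, 6, 3))},
}

def matches(record, known, max_wrong=1, max_days=14):
    """True if all but max_wrong of the known ratings fit this record."""
    wrong = 0
    for movie, (rating, when) in known.items():
        hit = record.get(movie)
        ok = (
            hit is not None
            and hit[0] == rating
            and max_days >= abs((hit[1] - when).days)
        )
        if not ok:
            wrong += 1
    return max_wrong >= wrong

def candidates(known, **kw):
    """Ids of all released records consistent with what the attacker knows."""
    return sorted(uid for uid, rec in RELEASED.items() if matches(rec, known, **kw))

# What an acquaintance might know: approximate dates, and one rating misremembered.
known = {"A": (5, date(2005, 1, 10)), "B": (2, date(2005, 1, 25)), "C": (1, date(2005, 3, 9))}
print(candidates(known))
```

<p>In this toy data a single known rating of movie A is consistent with two users, but three approximate ratings, one of them wrong, already leave only one candidate. Sparseness does the work: the more obscure the movies, the fewer records can match.</p>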
<p>But of course there is no personally identifiable information in the  data set. So is this a privacy issue? It is when you have another data  set to look at. The researchers took a sample of 50 IMDB users. The  IMDB data is noisy - there is no ranking, for example. Still, they  identified two users whose Netflix records were 28 and 15 standard  deviations away from the next best. One from ratings, another from  dates.</p>
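<p>The "standard deviations away from the next best" figure can be read as a measure of how far the best-matching record stands out from the runner-up. A rough sketch of such a measure follows; the function name, the use of the population standard deviation, and the example scores are my assumptions, not details taken from the study.</p>

```python
from statistics import pstdev

def eccentricity(scores):
    """Gap between the two best match scores, in units of the
    standard deviation of all candidate scores (an assumed definition)."""
    ranked = sorted(scores.values(), reverse=True)
    if 2 > len(ranked):
        raise ValueError("need at least two candidate scores")
    spread = pstdev(ranked)
    if spread == 0:
        return 0.0  # every candidate scores the same, so nobody stands out
    return (ranked[0] - ranked[1]) / spread

# Hypothetical match scores of five released records against one outside profile.
scores = {"r1": 9.0, "r2": 1.0, "r3": 1.5, "r4": 0.5, "r5": 1.0}
print(round(eccentricity(scores), 2))
```

<p>A large value means the best match is unlikely to be a coincidence; a value near zero means several records fit about equally well and no identification should be claimed.</p>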
<p>So despite Netflix's best efforts, the data set included enough  information to identify some individuals. Partly because of this, a  planned follow-up competition was scrapped, and the whole enterprise  of crowdsourcing recommender algorithms was given a possibly terminal  blow.</p>
</div>
</div>
<div class="outline-3" id="outline-container-1_3">
<h3 id="sec-1_3">What's this all about?</h3>
<div class="outline-text-3" id="text-1_3">
<p>Just to be clear, this set of notes is not about the following things:</p>
<ul>
<li> Encryption </li>
<li> Restricting access to data </li>
<li> Lost USB keys and CDs </li>
</ul>
<p>It is about these:</p>
<ul>
<li> Deliberately released data that turns out to infringe on privacy </li>
<li> HIPAA, EU Data Directive, corporate rules for handling customer data </li>
<li> Advertising and ISPs </li>
<li> Gov 2.0, data.gov, and "openness" </li>
</ul>
<p>It's about claims such as: "Attorneys on Monday accused Google of intentionally divulging millions of users' search queries to third parties in violation of federal law and its own terms of service" (October 26, 2010)</p>
<p>"MySpace and some popular applications on the social-networking site  have been transmitting data to outside advertising companies that  could be used to identify users, a Wall Street Journal investigation  has found" (October 23, 2010)</p>
<p>"Facebook users may inadvertently reveal their sexual preference to  advertisers in an apparent wrinkle in the social-networking site's  advertising system, researchers have found" (October 22, 2010)</p>
<p>(These claims are a year old, found in the week before I gave the  talk. I'm sure there are many more.) The Facebook case was one in  which advertisers (for a nursing program I believe) asked to target  their ads specifically at females and at men interested in other  men. But unlike, for example, an ad about a gay bar where the target  demographic is blatantly obvious, a male user reading the ad text would  have no idea that it had been targeted solely at a very specific  demographic, and that by clicking it he would reveal to the advertiser  both his sexual preference and a unique identifier (cookie, IP  address, or e-mail address if he signs up on the advertiser's site).  "Furthermore (the researchers <a href="http://www.pcworld.com/article/208583/facebook_ads_could_out_gay_users_researchers_say.html">wrote</a>) such deceptive ads are not  uncommon; indeed exactly half of the 66 ads shown exclusively to gay  men (more than 50 times) during our experiment did not mention 'gay'  anywhere in the ad text."</p>
</div>
</div>
<div class="outline-3" id="outline-container-1_4">
<h3 id="sec-1_4">Don't we have laws to deal with this?</h3>
<div class="outline-text-3" id="text-1_4">
<p>Indeed we do. Europe and the USA adopt different approaches to balancing privacy and utility, with the US adopting industry-specific rules (HIPAA for health, FERPA for education, Driver's Privacy Protection Act, FDA regulations, Video Privacy Protection Act etc), while the EU has taken a single comprehensive approach with the Data Protection Directive. But both approaches are based on a common set of concepts and assumptions.</p>
<p>The big thing is that there is an assumption that data can be  anonymized, and once it is then you can share it, because where's the  harm? Both sets of rules are built on the idea that there is such a  thing as personally identifiable information (PII) and that you can  hide it, while still releasing data that is useful. The release  process is "release and forget" because if data is properly anonymized  why do you have to track what's done with it? There is a faith in the  anonymization process, and that faith was broken by the Netflix study  and a couple of other related studies.</p>
</div>
</div>
<div class="outline-3" id="outline-container-1_5">
<h3 id="sec-1_5">Latanya Sweeney and the Massachusetts Governor</h3>
<div class="outline-text-3" id="text-1_5">
<p>Let's go back to a time before HIPAA, when the debate was framed in terms of <em>how much</em> anonymization you needed to do. Here are some quotations from Latanya Sweeney's paper (<a href="http://epic.org/privacy/reidentification/Sweeney_Article.pdf">PDF</a>), which Tim O'Reilly appeared to be unaware of.</p>
<p>"Figure 1" below is a simple Venn diagram with two intersecting  circles. The left circle holds medical data: Ethnicity, Visit Date,  Diagnosis, Procedure, Medication, Total Charge. The right circle holds  a voter list: Name, Address, Date Registered, Party Affiliation, Date  Last Voted. And in the intersection is ZIP, Date of Birth, Sex.</p>
<blockquote>
<p>The National Association of Health Data Organizations (NAHDO) reported  that 37 states in the USA have legislative mandates to collect  hospital level data and that 17 states have started collecting  ambulatory care data from hospitals, physicians offices, clinics, and  so forth. The leftmost circle in Figure 1 contains a subset of the  fields of information, or attributes, that NAHDO recommends these  states collect; these attributes include the patient's ZIP code, birth  date, gender, and ethnicity.</p>
<p>In Massachusetts, the Group Insurance Commission (GIC) is responsible  for purchasing health insurance for state employees. GIC collected  patient-specific data with nearly one hundred attributes per encounter  along the lines of the those shown in the leftmost circle of Figure 1  for approximately 135,000 state employees and their families. Because  the data were believed to be anonymous, GIC gave a copy of the data to  researchers and sold a copy to industry.</p>
<p>For twenty dollars I purchased the voter registration list for  Cambridge Massachusetts and received the information on two  diskettes. The rightmost circle in Figure 1 shows that these data  included the name, address, ZIP code, birth date, and gender of each  voter. This information can be linked using ZIP code, birth date and  gender to the medical information, thereby linking diagnosis,  procedures and medications to particularly named individuals.</p>
<p>For example, William Weld was governor of Massachusetts at that time  and his medical records were in the GIC data. Governor Weld lived in  Cambridge Massachusetts. According to the Cambridge Voter list, six  people had his particular birth date; only three of them were men;  and, he was the only one in his 5-digit ZIP code. [Editor's note: a  5-digit zip code may have several thousand people in it.]</p>
<p>The example above provides a demonstration of re-identification by  directly linking (or "matching") on shared attributes. The work  presented in this paper shows that altering the released information  to map to many possible people, thereby making the linking ambiguous,  can thwart this kind of attack. The greater the number of candidates  provided, the more ambiguous the linking, and therefore, the more  anonymous the data.</p>
</blockquote>
<p>In a theatrical flourish, Dr. Sweeney sent the Governor's health  records (which included diagnoses and prescriptions) to his office.</p>
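<p>The linking step Sweeney describes is essentially a join on the shared attributes. Here is a minimal Python sketch of that idea, assuming both data sets carry ZIP, birth date and sex in the same format. Every record and name below is made up for illustration; none of it comes from the GIC or Cambridge data.</p>

```python
# Hypothetical records, invented for this sketch.
medical = [
    {"zip": "02138", "birth": "1950-01-02", "sex": "M", "diagnosis": "hypertension"},
    {"zip": "02139", "birth": "1961-07-30", "sex": "F", "diagnosis": "asthma"},
]
voters = [
    {"name": "A. Example", "zip": "02138", "birth": "1950-01-02", "sex": "M"},
    {"name": "B. Example", "zip": "02139", "birth": "1961-07-30", "sex": "F"},
    {"name": "C. Example", "zip": "02139", "birth": "1961-07-30", "sex": "F"},
]

# The attributes that appear in both circles of the Venn diagram.
QUASI_IDENTIFIER = ("zip", "birth", "sex")


def key(record):
    """Project a record onto the shared attributes."""
    return tuple(record[field] for field in QUASI_IDENTIFIER)


def link(medical_rows, voter_rows):
    """Pair each diagnosis with the voters who share its quasi-identifier."""
    names_by_key = {}
    for voter in voter_rows:
        names_by_key.setdefault(key(voter), []).append(voter["name"])
    return [(row["diagnosis"], names_by_key.get(key(row), [])) for row in medical_rows]


matches = link(medical, voters)
for diagnosis, names in matches:
    status = "re-identified" if len(names) == 1 else "ambiguous"
    print(diagnosis, names, status)
```

<p>A diagnosis with exactly one candidate name is re-identified; one with several candidates stays ambiguous, which is the effect Sweeney's proposed remedy relies on.</p>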
<p>Now, of course, health information in the US is governed by HIPAA, but  according to HIPAA, "de-identified" health information is unregulated.  <em>De-identified</em> means either: a statistician says it is de-identified,  or the 18 Personally Identifying Information (PII) identifiers are  suppressed or generalized. These PIIs are things like Name, e-mail  address, social security numbers, computer IP addresses, and so on.</p>
<p>The EU doesn't list specifics. Instead it says that PII is "anything  that can be used to identify you". But what does that cover? IP  addresses perhaps? Here is Google in their argument to the EU:</p>
<ul>
<li> we "are strong supporters of the idea that data protection laws  should apply to any data that could identify you. The reality is  though that in most cases, an IP address without additional  information cannot." </li>
<li> "We believe anonymizing IP addresses after 9 months and cookies in  our search engine logs after 18 months strikes the right balance." </li>
<li> "we delete the last octet after nine months (170.123.456.XXX)" </li>
</ul>
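<p>To see what deleting the last octet amounts to, here is a small Python sketch using the standard <code>ipaddress</code> module. It is my reading of the quoted statement, not Google's implementation, and the address is taken from a range reserved for documentation.</p>

```python
import ipaddress


def drop_last_octet(address):
    """Zero the final octet of an IPv4 address (keep the first 24 bits)."""
    network = ipaddress.ip_network(address + "/24", strict=False)
    return str(network.network_address)


masked = drop_last_octet("192.0.2.77")
print(masked)  # 192.0.2.0

# How many original addresses could have produced this masked value?
candidates = ipaddress.ip_network(masked + "/24").num_addresses
print(candidates)  # 256
```

<p>So a masked address still narrows the user down to one of at most 256 addresses, before any other data set is brought in.</p>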
<p>The Latanya Sweeney result was the first to show that once you can mix and match data sets, removing PII is just not enough to provide privacy. And nowadays, of course, data mining multiple data sets is big business.</p>
</div>
</div>
<div class="outline-3" id="outline-container-1_6">
<h3 id="sec-1_6">How Do You Anonymize Data? k-anonymity</h3>
<div class="outline-text-3" id="text-1_6">
<p>Let's step back a little and look at the technical side of  anonymization. There are four basic methods for anonymizing data:</p>
<dl>
<dt>Replacement</dt><dd>substitute identifying numbers</dd>
<dt>Suppression</dt><dd>omit from the released data</dd>
<dt>Generalization</dt><dd>for example, replace birth date with something less specific, like year of birth</dd>
<dt>Perturbation</dt><dd>make random changes to the data</dd>
</dl>
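<p>As a rough illustration, here are the four methods applied to a single record in Python. The record, the <code>charge</code> field, and the function names are all invented for this sketch; real anonymization tools are more careful than this.</p>

```python
import random

# A made-up record; "charge" is a hypothetical numeric field.
record = {"name": "Kate", "birth": "1965-10-23", "zip": "02138", "charge": 120.0}


def replace(rec, pseudonyms):
    """Replacement: swap the name for an arbitrary but stable number."""
    out = dict(rec)
    out["name"] = pseudonyms.setdefault(rec["name"], len(pseudonyms) + 1)
    return out


def suppress(rec, field):
    """Suppression: omit a field from the released record."""
    out = dict(rec)
    del out[field]
    return out


def generalize(rec):
    """Generalization: keep only the birth year and a zip code prefix."""
    out = dict(rec)
    out["birth"] = rec["birth"][:4]
    out["zip"] = rec["zip"][:4] + "*"
    return out


def perturb(rec, rng, spread=10.0):
    """Perturbation: add random noise to a numeric field."""
    out = dict(rec)
    out["charge"] = rec["charge"] + rng.uniform(-spread, spread)
    return out


print(replace(record, {}))
print(suppress(record, "name"))
print(generalize(record))
print(perturb(record, random.Random(0)))
```

<p>Each function returns a copy, so the original record is left untouched and the methods can be combined in any order.</p>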
<p>Then you have to measure how private a data set is. Latanya Sweeney came up with the notion of k-anonymity to define this. Here's how it works.</p>
<p>Think about a table, with rows and attributes. Each attribute is either part of a quasi-identifier (like a name or address), or is sensitive information (like the fact you had an operation on a particular afternoon). A quasi-identifier is a set of attributes that, perhaps in combination, can uniquely identify individuals. Sensitive information includes the attributes that we want to keep private. Your driving license number is an identifier; your driving record is sensitive information. The table satisfies <em>k-anonymity</em> iff each combination of quasi-identifier values that appears in the table appears in at least k rows. Bigger k is better.</p>
<p>So here is a table with 11 rows.</p>
<table border="2" cellpadding="6" cellspacing="0" frame="hsides" rules="groups">
<caption /> <colgroup><col class="left" /><col class="left" /><col class="right" /><col class="left" /><col class="right" /><col class="left" /> </colgroup> <thead> 
<tr>
<th class="left" scope="col">Name</th><th class="left" scope="col">Race</th><th class="right" scope="col">Birth</th><th class="left" scope="col">Gender</th><th class="right" scope="col">Zip</th><th class="left" scope="col">Problem</th>
</tr>
</thead> 
<tbody>
<tr>
<td class="left">Sean</td>
<td class="left">Black</td>
<td class="right">1965-09-20</td>
<td class="left">M</td>
<td class="right">02141</td>
<td class="left">Short breath</td>
</tr>
<tr>
<td class="left">Daniel</td>
<td class="left">Black</td>
<td class="right">1965-02-14</td>
<td class="left">M</td>
<td class="right">02141</td>
<td class="left">Chest pain</td>
</tr>
<tr>
<td class="left">Kate</td>
<td class="left">Black</td>
<td class="right">1965-10-23</td>
<td class="left">F</td>
<td class="right">02138</td>
<td class="left">Hypertension</td>
</tr>
<tr>
<td class="left">Marion</td>
<td class="left">Black</td>
<td class="right">1965-08-24</td>
<td class="left">F</td>
<td class="right">02138</td>
<td class="left">Hypertension</td>
</tr>
<tr>
<td class="left">Helen</td>
<td class="left">Black</td>
<td class="right">1964-07-11</td>
<td class="left">F</td>
<td class="right">02138</td>
<td class="left">Obesity</td>
</tr>
<tr>
<td class="left">Reese</td>
<td class="left">Black</td>
<td class="right">1964-01-12</td>
<td class="left">F</td>
<td class="right">02138</td>
<td class="left">Chest pain</td>
</tr>
<tr>
<td class="left">Forest</td>
<td class="left">White</td>
<td class="right">1964-10-23</td>
<td class="left">M</td>
<td class="right">02138</td>
<td class="left">Chest pain</td>
</tr>
<tr>
<td class="left">Hilary</td>
<td class="left">White</td>
<td class="right">1964-03-15</td>
<td class="left">F</td>
<td class="right">02139</td>
<td class="left">Obesity</td>
</tr>
<tr>
<td class="left">Philip</td>
<td class="left">White</td>
<td class="right">1964-08-13</td>
<td class="left">M</td>
<td class="right">02139</td>
<td class="left">Short breath</td>
</tr>
<tr>
<td class="left">Jamie</td>
<td class="left">White</td>
<td class="right">1967-05-05</td>
<td class="left">M</td>
<td class="right">02139</td>
<td class="left">Chest pain</td>
</tr>
<tr>
<td class="left">Sean</td>
<td class="left">White</td>
<td class="right">1967-03-21</td>
<td class="left">M</td>
<td class="right">02138</td>
<td class="left">Chest pain</td>
</tr>
</tbody>
</table>
<p>If we remove all the attributes except for the problem we have a very  anonymized data set (k = 11):</p>
<table border="2" cellpadding="6" cellspacing="0" frame="hsides" rules="groups">
<caption /> <colgroup><col class="left" /><col class="left" /><col class="left" /><col class="left" /><col class="left" /><col class="left" /> </colgroup> <thead> 
<tr>
<th class="left" scope="col">Name</th><th class="left" scope="col">Race</th><th class="left" scope="col">Birth</th><th class="left" scope="col">Gender</th><th class="left" scope="col">Zip</th><th class="left" scope="col">Problem</th>
</tr>
</thead> 
<tbody>
<tr>
<td class="left"> </td>
<td class="left"> </td>
<td class="left"> </td>
<td class="left"> </td>
<td class="left"> </td>
<td class="left">Short breath</td>
</tr>
<tr>
<td class="left"> </td>
<td class="left"> </td>
<td class="left"> </td>
<td class="left"> </td>
<td class="left"> </td>
<td class="left">Chest pain</td>
</tr>
<tr>
<td class="left"> </td>
<td class="left"> </td>
<td class="left"> </td>
<td class="left"> </td>
<td class="left"> </td>
<td class="left">Hypertension</td>
</tr>
<tr>
<td class="left"> </td>
<td class="left"> </td>
<td class="left"> </td>
<td class="left"> </td>
<td class="left"> </td>
<td class="left">Hypertension</td>
</tr>
<tr>
<td class="left"> </td>
<td class="left"> </td>
<td class="left"> </td>
<td class="left"> </td>
<td class="left"> </td>
<td class="left">Obesity</td>
</tr>
<tr>
<td class="left"> </td>
<td class="left"> </td>
<td class="left"> </td>
<td class="left"> </td>
<td class="left"> </td>
<td class="left">Chest pain</td>
</tr>
<tr>
<td class="left"> </td>
<td class="left"> </td>
<td class="left"> </td>
<td class="left"> </td>
<td class="left"> </td>
<td class="left">Chest pain</td>
</tr>
<tr>
<td class="left"> </td>
<td class="left"> </td>
<td class="left"> </td>
<td class="left"> </td>
<td class="left"> </td>
<td class="left">Obesity</td>
</tr>
<tr>
<td class="left"> </td>
<td class="left"> </td>
<td class="left"> </td>
<td class="left"> </td>
<td class="left"> </td>
<td class="left">Short breath</td>
</tr>
<tr>
<td class="left"> </td>
<td class="left"> </td>
<td class="left"> </td>
<td class="left"> </td>
<td class="left"> </td>
<td class="left">Chest pain</td>
</tr>
<tr>
<td class="left"> </td>
<td class="left"> </td>
<td class="left"> </td>
<td class="left"> </td>
<td class="left"> </td>
<td class="left">Chest pain</td>
</tr>
</tbody>
</table>
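<p>On the other hand, if we just remove the name and generalize the zip code and date of birth we have a less anonymized set. Exercise: check that every combination of race, birth year, gender and zip prefix below appears at least twice (k=2), with one exception: the single white woman born in 1964. Her row would have to be suppressed or generalized further before the set is truly 2-anonymous.</p>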
<table border="2" cellpadding="6" cellspacing="0" frame="hsides" rules="groups">
<caption /> <colgroup><col class="left" /><col class="left" /><col class="right" /><col class="left" /><col class="left" /><col class="left" /> </colgroup> <thead> 
<tr>
<th class="left" scope="col">Name</th><th class="left" scope="col">Race</th><th class="right" scope="col">Birth</th><th class="left" scope="col">Gender</th><th class="left" scope="col">Zip</th><th class="left" scope="col">Problem</th>
</tr>
</thead> 
<tbody>
<tr>
<td class="left"> </td>
<td class="left">Black</td>
<td class="right">1965</td>
<td class="left">M</td>
<td class="left">0214*</td>
<td class="left">Short breath</td>
</tr>
<tr>
<td class="left"> </td>
<td class="left">Black</td>
<td class="right">1965</td>
<td class="left">M</td>
<td class="left">0214*</td>
<td class="left">Chest pain</td>
</tr>
<tr>
<td class="left"> </td>
<td class="left">Black</td>
<td class="right">1965</td>
<td class="left">F</td>
<td class="left">0213*</td>
<td class="left">Hypertension</td>
</tr>
<tr>
<td class="left"> </td>
<td class="left">Black</td>
<td class="right">1965</td>
<td class="left">F</td>
<td class="left">0213*</td>
<td class="left">Hypertension</td>
</tr>
<tr>
<td class="left"> </td>
<td class="left">Black</td>
<td class="right">1964</td>
<td class="left">F</td>
<td class="left">0213*</td>
<td class="left">Obesity</td>
</tr>
<tr>
<td class="left"> </td>
<td class="left">Black</td>
<td class="right">1964</td>
<td class="left">F</td>
<td class="left">0213*</td>
<td class="left">Chest pain</td>
</tr>
<tr>
<td class="left"> </td>
<td class="left">White</td>
<td class="right">1964</td>
<td class="left">M</td>
<td class="left">0213*</td>
<td class="left">Chest pain</td>
</tr>
<tr>
<td class="left"> </td>
<td class="left">White</td>
<td class="right">1964</td>
<td class="left">F</td>
<td class="left">0213*</td>
<td class="left">Obesity</td>
</tr>
<tr>
<td class="left"> </td>
<td class="left">White</td>
<td class="right">1964</td>
<td class="left">M</td>
<td class="left">0213*</td>
<td class="left">Short breath</td>
</tr>
<tr>
<td class="left"> </td>
<td class="left">White</td>
<td class="right">1967</td>
<td class="left">M</td>
<td class="left">0213*</td>
<td class="left">Chest pain</td>
</tr>
<tr>
<td class="left"> </td>
<td class="left">White</td>
<td class="right">1967</td>
<td class="left">M</td>
<td class="left">0213*</td>
<td class="left">Chest pain</td>
</tr>
</tbody>
</table>
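<p>If you would rather let a computer do the exercise, here is a minimal Python sketch (mine, not Sweeney's). It assumes the quasi-identifier is the combination of race, birth, gender and zip, generalizes the rows of the first table in the same way (year of birth, first four digits of the zip code), and takes k to be the size of the smallest group. With the rows exactly as printed above, one combination (White, 1964, F, 0213*) appears only once, so the computed k is 1; suppressing that row gives k = 2.</p>

```python
from collections import Counter

# (race, birth, gender, zip) for the 11 rows of the first table, names removed.
rows = [
    ("Black", "1965-09-20", "M", "02141"),
    ("Black", "1965-02-14", "M", "02141"),
    ("Black", "1965-10-23", "F", "02138"),
    ("Black", "1965-08-24", "F", "02138"),
    ("Black", "1964-07-11", "F", "02138"),
    ("Black", "1964-01-12", "F", "02138"),
    ("White", "1964-10-23", "M", "02138"),
    ("White", "1964-03-15", "F", "02139"),
    ("White", "1964-08-13", "M", "02139"),
    ("White", "1967-05-05", "M", "02139"),
    ("White", "1967-03-21", "M", "02138"),
]


def generalize(row):
    """Keep the birth year and the first four digits of the zip code."""
    race, birth, gender, zip_code = row
    return (race, birth[:4], gender, zip_code[:4] + "*")


def k_of(table):
    """k is the size of the smallest group of identical quasi-identifiers."""
    return min(Counter(table).values())


released = [generalize(row) for row in rows]
group_sizes = Counter(released)
k = k_of(released)

# Combinations that occur exactly once are the ones an attacker can pin down.
singletons = [combo for combo, n in group_sizes.items() if n == 1]
k_after_suppression = k_of([r for r in released if r not in singletons])

print(k)                    # 1
print(singletons)           # [('White', '1964', 'F', '0213*')]
print(k_after_suppression)  # 2
```

<p>Suppressing the unique row is only one option; generalizing gender for that group would also work, at a different cost in utility.</p>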
<p>Of course, the issue is utility. There is a tradeoff between keeping  the data useful for research and maintaining privacy. Researchers and  attackers are doing the same thing after all: looking for useful  patterns in the data.  With the k=2 data set you can ask questions  about correlation of problems with gender, or with geography to some  extent (although not very specific geographical factors, like toxic  leaks).</p>
<p>It would be nice if you could make the data set anonymous for the purposes of attackers, but still useful for researchers. But it turns out you can't. In a paper called <em>The Cost of Privacy: Destruction of Data-Mining Utility in Anonymized Data Publishing</em>, Justin Brickell and Vitaly Shmatikov investigated the problem. They took a set of different sanitization methods and compared them to a data set with trivial sanitization (removal of identifiers). Here are the results.</p>
<p><a href="http://whimsley.typepad.com/.a/6a00d83451d3b369e2015391e4feb4970b-popup" onclick="window.open( this.href, '_blank', 'width=640,height=480,scrollbars=no,resizable=no,toolbar=no,directories=no,location=no,menubar=no,status=no,left=0,top=0' ); return false" style="display: inline;"><img alt="Privacy_utility" class="asset  asset-image at-xid-6a00d83451d3b369e2015391e4feb4970b" src="http://whimsley.typepad.com/.a/6a00d83451d3b369e2015391e4feb4970b-320wi" title="Privacy_utility" /></a> <br /><br /></p>
<p>The left bar of each pair measures privacy (smaller = more private); the right bar represents the utility to the researcher (bigger = more useful). Anonymization seeks to shorten the left bar without shortening the right, but the results show, depressingly, that small increases in privacy cause large decreases in utility.</p>
</div>
</div>
<div class="outline-3" id="outline-container-1_7">
<h3 id="sec-1_7">Please could you tell me about the Database of Ruin?</h3>
<div class="outline-text-3" id="text-1_7">
<p>OK, if you insist.</p>
<p>If we are going to take a new look, we need to recognize that privacy  is not a binary issue, and <em>it is not a property of a single data set</em>. We need to worry about reidentification attacks that do not reveal  sensitive information. As Paul Ohm writes: "For every person on earth,  there is at least one fact about them stored in a computer database  that an adversary could use to blackmail, discriminate against,  harass, or steal the identity of him or her… the 'database of ruin'  containing this fact but now splintered across dozens of databases on  computers around the world."</p>
<p>Privacy is erased incrementally as successive queries reduce  uncertainty and narrow in on an individual. The way to quantify this  reduction in uncertainty is to use the idea of <em>information entropy</em>,  adopted from the thermodynamic concept, and usually identified by  H. The information gained in a query is</p>
<p>H(before) - H(after)</p>
<p>because as you increase your knowledge of a system, its entropy (loosely, disorder) decreases.</p>
<p>So what is the formula for H?</p>
<p>For a set of outcomes {i,…}, each with probability p<sub>i</sub>, the  information entropy is:</p>
<p>H = − Σ<sub>i</sub> p<sub>i</sub> log<sub>2</sub>(p<sub>i</sub>)</p>
<p>where the sum runs over all the outcomes i, and H is measured in bits. The logarithm appears because the probability of two independent events occurring is the product of the probabilities of each event, but the information we gain from observing two independent events is the sum of the information we gain from each event.</p>
<p>Take a simple example: a coin toss. Before the toss, there are two  outcomes with equal probabilities, so</p>
<p>H =  -(1/2) log<sub>2</sub>(1/2) - (1/2)log<sub>2</sub>(1/2)</p>
<p>= - log<sub>2</sub>(1/2)</p>
<p>= log<sub>2</sub>(2)</p>
<p>= 1 bit</p>
<p>which makes sense if you think about it, because the coin could be heads  (1) or tails (0).</p>
<p>After the toss, H = log<sub>2</sub>(1) = 0 : there is no uncertainty left and we  have complete information about the system.</p>
<p>In the same way, the roll of a die has (before rolling it) an entropy of</p>
<p>H = log<sub>2</sub>(6) ≈ 2.6 bits</p>
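<p>The coin and die calculations are easy to check numerically. A minimal Python sketch of the entropy formula, assuming a fair coin and a fair six-sided die:</p>

```python
import math


def entropy(probabilities):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return sum(-p * math.log2(p) for p in probabilities if p > 0)


coin_before = entropy([0.5, 0.5])  # two equally likely outcomes
coin_after = entropy([1.0])        # the outcome is now known
die = entropy([1 / 6] * 6)         # six equally likely outcomes

# Information gained by looking at the coin: H(before) - H(after).
gain = coin_before - coin_after

print(coin_before)    # 1.0
print(round(die, 3))  # 2.585
print(gain)           # 1.0
```

<p>The same function works for unequal probabilities: a biased coin has less than one bit of entropy, because its outcome is easier to guess.</p>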
<p>So if the challenge is to identify one person in the population of the world, how much information entropy is there? The identity of a random, unknown person is worth just under 33 bits (2<sup>33</sup> ≈ 8.6 billion). Hence the web site 33bits.org.</p>
<p>Learning a fact about the individual reduces the uncertainty (reduces  information entropy). So if you learn that the star sign is Capricorn  then that's -log<sub>2</sub>(1/12) = log<sub>2</sub>(12) = 3.58 bits of information.</p>
<p>If you find out other independent pieces of information you add up the contributions to find out how much the entropy has been reduced. So a ZIP code may provide 23.81 bits of information, a birthday 8.51, and gender 1 bit for a total of 33.32 bits, slightly more than the 33 bits needed, so the combination probably identifies one individual.</p>
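<p>The arithmetic can be reproduced in a few lines of Python. This is a sketch under stated assumptions: a world population of 7 billion (my round number), 12 equally likely star signs, 365 equally likely birthdays, and two equally likely genders. The ZIP code figure is simply the one quoted above, not something the code derives.</p>

```python
import math


def bits(fraction):
    """Information, in bits, from learning a fact true of this fraction of people."""
    return -math.log2(fraction)


WORLD_POPULATION = 7e9  # assumption: a round figure, not an exact count
needed = math.log2(WORLD_POPULATION)

star_sign = bits(1 / 12)
birthday = bits(1 / 365)
gender = bits(1 / 2)
zip_code = 23.81  # the figure quoted in the text, taken as given

total = zip_code + birthday + gender

print(round(needed, 2))     # 32.7
print(round(star_sign, 2))  # 3.58
print(round(birthday, 2))   # 8.51
print(round(total, 2))      # 33.32
print(total > needed)       # True
```

<p>Adding bits like this is only valid when the facts are independent. Correlated facts (a ZIP code and a telephone area code, say) give less information together than the sum suggests.</p>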
<p>The Netflix de-anonymization paper applies these ideas. The a priori entropy of the data set (the additional information required for complete de-anonymization) is 19 bits (2<sup>19</sup> = 524,288, which is about the number of individuals in the data set). Individual movies give from 1 to 18 bits of information, depending on what you know about them (dates to within 14 days, ratings to within ±1). Very popular movies give little information, but niche movies viewed by few individuals yield many bits. So only a little auxiliary information is needed to re-identify records in the database. In a theoretical excursion, the researchers showed that de-anonymization is robust against noise, and does not need much additional information, so long as the data set is large and sparse enough.</p>
</div>
</div>
<div class="outline-3" id="outline-container-1_8">
<h3 id="sec-1_8">So what now?</h3>
<div class="outline-text-3" id="text-1_8">
<p>I can do nothing better than quote from Paul Ohm to summarize the  privacy dilemma we find ourselves in.</p>
<p>"Abandoning PII is a disruptive and necessary first step, but it is  not enough alone to restore the balance between privacy and utility  that we once enjoyed. How do we fix the dozens, perhaps hundreds, of  laws and regulations that we once believed reflected a finely  calibrated balance but in reality rested on a fundamental  misunderstanding of science?</p>
<p>"Techniques that eschew release-and-forget may improve over time, but because of inherent limitations like those described above, they will never supply a silver bullet alternative. Technology cannot save the day, and regulation must play a role."</p>
<p>Ohm notes that the US sectoral approach to regulation sets the privacy bar too low, by focusing on explicitly listed PII. Meanwhile the EU definition, "anything that can be used to identify you", would set the bar too high if interpreted in the light of modern de-anonymization techniques.</p>
<p>The direction to take, says Ohm, is to focus on the people, not the  anonymization, and to distinguish private from public release. We need  to codify notions of trust and practices and apply strong sanctions  against re-identification. This will put more administrative and  procedural burdens in our future, but is needed to preserve privacy.</p>
<p>And, to return to Tim O'Reilly's tweet, open data advocates and big data enthusiasts need to pay more attention to these issues rather than relying, as some do, on the removal of personally identifying information as a sufficient solution.</p>
</div>
</div>
<div id="footnotes">
<h2 class="footnotes">Footnotes:</h2>
<div id="text-footnotes">
<p class="footnote"><sup><a class="footnum" href="#fnr.1" name="fn.1">1</a></sup> Arvind Narayanan and Vitaly Shmatikov, <em>Robust De-anonymization of Large Sparse Datasets</em>, IEEE Symposium on Security and Privacy, 2008. (<a href="http://dl.acm.org/citation.cfm?id=1398064">gated link</a>)</p>
</div>
</div></div>
</content>



    <feedburner:origLink>http://whimsley.typepad.com/whimsley/2011/09/data-anonymization-and-re-identification-some-basics-of-data-privacy.html</feedburner:origLink></entry>
 
</feed><!-- ph=1 -->

