<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">

<channel>
	<title>My Place in the Crowd</title>
	
	<link>http://blog.myplaceinthecrowd.org</link>
	<description>The Common Data Project Blog</description>
	<lastBuildDate>Fri, 12 Mar 2010 21:00:02 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.feedburner.com/MyPlaceInTheCrowd" /><feedburner:info uri="myplaceinthecrowd" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><item>
		<title>In the mix</title>
		<link>http://feedproxy.google.com/~r/MyPlaceInTheCrowd/~3/RnMeVIGbkUA/</link>
		<comments>http://blog.myplaceinthecrowd.org/2010/03/12/in-the-mix-25/#comments</comments>
		<pubDate>Fri, 12 Mar 2010 21:00:02 +0000</pubDate>
		<dc:creator>Grace Meng</dc:creator>
				<category><![CDATA[Interesting Uses of Data]]></category>
		<category><![CDATA[Protecting Privacy in Meaningful Ways]]></category>

		<guid isPermaLink="false">http://blog.myplaceinthecrowd.org/?p=1123</guid>
		<description><![CDATA[1) The CDC recently used shopper-card data to track a salmonella outbreak that sickened 245 in 44 states.  It turned out the pepper in salami made in Rhode Island was the culprit.  Although the CDC began to suspect through interviews and questionnaires that some sort of Italian meat product was the problem, the people they [...]]]></description>
			<content:encoded><![CDATA[<p>1) <a href="http://news.yahoo.com/s/ap/20100311/ap_on_sc/us_shopper_card_sleuths#mwpphu-container"><strong>The CDC recently used shopper-card data to track a salmonella outbreak that sickened 245 in 44 states</strong></a>.  It turned out the pepper in salami made in Rhode Island was the culprit.  Although the CDC began to suspect through interviews and questionnaires that some sort of Italian meat product was the problem, the people they talked to couldn&#8217;t remember precisely what they had bought and the shopper-card records helped them identify the actual product.</p>
<p>Great story, right?  Unless you&#8217;re the director of Consumers Against Supermarket Privacy Invasion and Numbering, in which case, the story smacks of privacy invasion by the government.  The CDC got the records with the permission of the account holders, but to Katherine Albrecht and several of the commenters to the Yahoo News Story, that didn&#8217;t assuage their fears.</p>
<p>Here&#8217;s a choice quote: &#8220;I&#8217;d rather have a few die from poisoning and then they fix the problem  then have the entire country enslaved, thank you very much.&#8221;</p>
<p>There was at least one person who pointed out commenting on a Yahoo news story wasn&#8217;t going to do much to preserve their privacy either.</p>
<p>2) <strong><a href="http://www.readwriteweb.com/archives/myspace_bulk_data.php?utm_source=feedburner&amp;utm_medium=feed&amp;utm_campaign=Feed%3A+readwriteweb+%28ReadWriteWeb%29">MySpace is selling bulk user data!</a> </strong>I&#8217;m with ReadWriteWeb:</p>
<blockquote><p>I think the world is an awfully unfair mess and I&#8217;m hoping that data  analysis will help illuminate some of the hows and the whys. Like the  way that <a href="http://en.wikipedia.org/wiki/Redlining">real-estate  redlining</a> was exposed back in the day by cross referencing census  data around racial demographics and housing loan data. That illuminated  systematic discrimination against black families in applying for home  loans in certain parts of town. So too I think we&#8217;ll find a lot of  undeniable proof of injustices and clues for how we might deal with them  in big data today.</p></blockquote>
<p><a href="http://commondataproject.org/">We don&#8217;t want another AOL debacle on our hands</a>, but we also don&#8217;t want to give up on the possibilities of &#8220;big data&#8221; because we prematurely assume better privacy-creating techniques and standards aren&#8217;t available.</p>
<p>3) My, it&#8217;s a privacy-obsessed week!  <strong>Here&#8217;s one person&#8217;s argument for &#8220;<a href="http://news.cnet.com/8301-13578_3-20000336-38.html?part=rss&amp;subj=news&amp;tag=2547-1_3-0-20">why no one cares about privacy</a>.&#8221;</strong> It&#8217;s a good round-up of pithy quotes from people like Judge Posner, new &#8220;talk about me&#8221; sites like Blippy.com, and surveys demonstrating the change in the public&#8217;s attitude over time.  Wow, in 1998, 80% of people in a Harris poll said they were hesitant to shop online because of privacy worries.</p>
<p>Still, articles like this and the comments to the <a href="http://news.yahoo.com/s/ap/20100311/ap_on_sc/us_shopper_card_sleuths">Yahoo CDC-shopper data article</a> show how much our discussion of privacy involves people yelling at each other across a very big divide.  Is the choice really a binary one?  Privacy + a few deaths versus Big Brother + public health data?  I don&#8217;t care if the CDC has access to my grocery records; at the same time, I don&#8217;t plan to sign up for Blippy.com and broadcast my purchase of kale and four kinds of cheese this morning.  (Oops, I just did.)  Maybe we should stop talking about &#8220;privacy&#8221; and start talking about specific situations.</p>
<img src="http://feeds.feedburner.com/~r/MyPlaceInTheCrowd/~4/RnMeVIGbkUA" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://blog.myplaceinthecrowd.org/2010/03/12/in-the-mix-25/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://blog.myplaceinthecrowd.org/2010/03/12/in-the-mix-25/</feedburner:origLink></item>
		<item>
		<title>Prostate Cancer and the Inexorable Pull To Act On Unlikely Events</title>
		<link>http://feedproxy.google.com/~r/MyPlaceInTheCrowd/~3/2XGUiSS2hLA/</link>
		<comments>http://blog.myplaceinthecrowd.org/2010/03/10/prostate-cancer-and-the-inexorable-pull-to-act-on-unlikely-events/#comments</comments>
		<pubDate>Wed, 10 Mar 2010 19:35:39 +0000</pubDate>
		<dc:creator>Mimi Yin</dc:creator>
				<category><![CDATA[Best Practices]]></category>
		<category><![CDATA[Protecting Privacy in Meaningful Ways]]></category>
		<category><![CDATA[Privacy Guarantee]]></category>

		<guid isPermaLink="false">http://blog.myplaceinthecrowd.org/?p=1111</guid>
		<description><![CDATA[
Here&#8217;s another example of how we seize on numbers we can see, no matter how uncertain and meaningless they might be, because there&#8217;s not yet a viable alternative source of information.
As a society, we will probably opt for prostate testing no matter how flawed it is until there&#8217;s a better, more accurate alternative. In other [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://www.onewhale.com/sharks/shark155.jpg" alt="" /></p>
<p><a href="http://www.nytimes.com/2010/03/10/opinion/10Ablin.html?em">Here&#8217;s another example of how we seize on numbers we can see, no matter how uncertain and meaningless they might be</a>, because there&#8217;s not yet a viable alternative source of information.</p>
<p>As a society, we will probably opt for prostate testing no matter how flawed it is until there&#8217;s a better, more accurate alternative. In other words, bad, misleading information is better than no information, especially in a culture that prizes initiative and can-do-ness over a more fatalistic view of life: Yes We Can!</p>
<p>This is a design challenge for anybody trying to help people make sense of data. It is also especially important for us right now as we try to figure out a meaningful privacy guarantee for the datatrust. It&#8217;s easy for us to guarantee that you&#8217;ll never know with 100% certainty the answer to any question. But in many situations, people won&#8217;t need anything close to 100% certainty to feel compelled to act. </p>
<p>Certainly in the case of screening for diseases, it&#8217;s incredibly hard to do nothing if there is even a hint of a chance that we might be fatally ill.</p>
<p>What are other examples of numbers we make too much of and can&#8217;t get enough of?</p>
<ul>
<li>Poll numbers</li>
<li>Housing data</li>
<li>Almost any study that comes out about health and nutrition</li>
</ul>
<img src="http://feeds.feedburner.com/~r/MyPlaceInTheCrowd/~4/2XGUiSS2hLA" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://blog.myplaceinthecrowd.org/2010/03/10/prostate-cancer-and-the-inexorable-pull-to-act-on-unlikely-events/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://blog.myplaceinthecrowd.org/2010/03/10/prostate-cancer-and-the-inexorable-pull-to-act-on-unlikely-events/</feedburner:origLink></item>
		<item>
		<title>In the mix</title>
		<link>http://feedproxy.google.com/~r/MyPlaceInTheCrowd/~3/B5aX9AAzDpk/</link>
		<comments>http://blog.myplaceinthecrowd.org/2010/03/10/in-the-mix-24/#comments</comments>
		<pubDate>Wed, 10 Mar 2010 15:47:59 +0000</pubDate>
		<dc:creator>Grace Meng</dc:creator>
				<category><![CDATA[Interesting Uses of Data]]></category>
		<category><![CDATA[Protecting Privacy in Meaningful Ways]]></category>
		<category><![CDATA[Public Policy]]></category>
		<category><![CDATA[The Future of Advertising and Media]]></category>

		<guid isPermaLink="false">http://blog.myplaceinthecrowd.org/?p=1101</guid>
		<description><![CDATA[1) We&#8217;ve wondered in the past, why don&#8217;t targeted advertising companies just ask you to opt-in to be tracked?  When I first heard about it, I thought this newish website, Blippy.com, described on NPR, was doing something like that.  You actively register a credit card with the site and it shares ALL your transactions with [...]]]></description>
			<content:encoded><![CDATA[<p>1) We&#8217;ve wondered in the past, <strong>why don&#8217;t targeted advertising companies just ask you to opt in to be tracked</strong>?  When I first heard about it, I thought this newish website, <a href="http://blippy.com/">Blippy.com</a>, <a href="http://www.npr.org/templates/story/story.php?storyId=124445825">described on NPR</a>, was doing something like that.  You actively register a credit card with the site and it shares ALL your transactions with your friends.  Except NPR reports the company was rather vague about how the information gets to marketing companies.  And what exactly are they offering anyway, other than the opportunity to broadcast, &#8220;I am what I buy&#8221;?  The only news being broadcast seems to be about people&#8217;s Netflix and iTunes buying tendencies.  Services like <a href="http://www.mint.com">Mint.com</a> and <a href="http://www.patientslikeme.com/">Patients Like Me</a> are also using customers&#8217; data to make money, but they&#8217;re offering a real, identifiable service in return.</p>
<p>2) <strong>Google <a href="http://arstechnica.com/tech-policy/news/2010/03/google-keeps-your-data-to-learn-from-good-guys-fight-off-bad-guys.ars">explains</a> why it needs your data to provide a better service.</strong></p>
<blockquote><p>Search data is mined to &#8220;learn from the good guys,&#8221; in Google&#8217;s parlance, by watching how users correct their own spelling mistakes, how they write in their native language, and what sites they visit after searches. That information has been crucial to Google&#8217;s famously algorithm-driven approach to problems like spell check, machine language translation, and improving its main search engine. Without the algorithms, Google Translate wouldn&#8217;t be able to support less-used languages like Catalan and Welsh.</p>
<p>Data is also mined to watch how the &#8220;bad guys&#8221; run link farms and other Web irritants so that Google can take countermeasures.</p></blockquote>
<p>This is an argument I&#8217;m really glad to hear.  It doesn&#8217;t make the issue of privacy go away, but I&#8217;d love to see privacy advocates and Google talk honestly and thoughtfully about what Google does with the data, how important that is to making Google&#8217;s services useful, and what trade-offs people are willing to make when they ask Google to destroy the data.</p>
<p>3) <strong>Nat Torkington <a href="http://radar.oreilly.com/2010/03/truly-open-data.html">describes</a> how open source principles could be applied for open data.</strong> We heartily agree that these principles could be useful for making data public and useful, though <a href="http://www.commondataproject.org/about#mimi">Mimi</a>, who&#8217;s worked on open source projects, points out that open source production, with its standard processes, is something  that&#8217;s been worked out over decades.  Data management is still relatively in its infancy, so open-sourcing data management will definitely take some work.  Onward ho!</p>
<p>4) <strong>The <a href="http://www.cdt.org/blogs/cdt/who-owns-smart-grid-data-not-right-question">Center for Democracy and Technology</a> and <a href="http://www.eff.org/deeplinks/2010/03/new-smart-meters-energy-use-put-privacy-risk">EFF</a> are thinking about privacy and Smart Grids, </strong>which monitor energy consumption so that consumers can better control their energy use.  I&#8217;m more enthusiastic than EFF about the &#8220;potentially beneficial&#8221; aspects of smart meters, but in any case, it&#8217;s interesting to see these two blog posts within two days of each other.  Energy consumption data, as well as health data, are going to be two huge areas of debate, because the benefits of large-scale data collection and analysis are obvious, even though detailed personal information is involved.</p>
<p>5) <strong><a href="http://www.theonion.com/content/news/google_responds_to_privacy">The Onion reports</a> that Google is apologizing for its privacy problems, with the apologies directed at very specific people.</strong> Ha ha.</p>
<blockquote><p>&#8220;Americans have every right to be angry at us,&#8221; Google spokesperson Janet Kemper told reporters. &#8220;Though perhaps Dale Gilbert should just take a few deep breaths and go sit in his car and relax, like they tell him to do at the anger management classes he attends over at St. Francis Church every Tuesday night.&#8221;</p></blockquote>
<img src="http://feeds.feedburner.com/~r/MyPlaceInTheCrowd/~4/B5aX9AAzDpk" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://blog.myplaceinthecrowd.org/2010/03/10/in-the-mix-24/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://blog.myplaceinthecrowd.org/2010/03/10/in-the-mix-24/</feedburner:origLink></item>
		<item>
		<title>In the mix</title>
		<link>http://feedproxy.google.com/~r/MyPlaceInTheCrowd/~3/Uv2s_V0mKXI/</link>
		<comments>http://blog.myplaceinthecrowd.org/2010/03/02/in-the-mix-23/#comments</comments>
		<pubDate>Tue, 02 Mar 2010 15:14:05 +0000</pubDate>
		<dc:creator>Grace Meng</dc:creator>
				<category><![CDATA[Protecting Privacy in Meaningful Ways]]></category>
		<category><![CDATA[Public Policy]]></category>
		<category><![CDATA[The Future of Advertising and Media]]></category>
		<category><![CDATA[browsers]]></category>
		<category><![CDATA[CLOUD]]></category>
		<category><![CDATA[data retention laws]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[government data]]></category>

		<guid isPermaLink="false">http://blog.myplaceinthecrowd.org/?p=1084</guid>
		<description><![CDATA[1) I&#8217;m looking forward to reading this series of blog posts from the Freedom to Tinker blog at Princeton&#8217;s Center for Information Technology Policy on what government datasets should look like to facilitate innovation, as the first one is incredibly clear and smart.
2) The NYTimes Bits blog recently interviewed Esther Dyson, &#8220;Health Tech Investor and [...]]]></description>
			<content:encoded><![CDATA[<p>1) I&#8217;m looking forward to reading <a href="http://www.freedom-to-tinker.com/blog/jcalandr/government-datasets-facilitate-innovation">this series of blog posts</a> from the <a href="http://www.freedom-to-tinker.com/">Freedom to Tinker</a> blog at <a href="http://citp.princeton.edu/">Princeton&#8217;s Center for Information Technology Policy</a> on what government datasets should look like to facilitate innovation, as the first one is incredibly clear and smart.</p>
<p>2) The <a href="http://bits.blogs.nytimes.com">NYTimes Bits blog</a> recently <a href="http://bits.blogs.nytimes.com/2010/02/26/one-on-one-esther-dyson-health-tech-investor-and-space-tourist/">interviewed</a> Esther Dyson, &#8220;Health Tech Investor and Space Tourist&#8221; as the Times calls her, where she shares her thoughts on why ordinary people might want to track their own data and why we shouldn&#8217;t worry so much about privacy.</p>
<p>3) A <a href="http://community.nytimes.com/comments/bits.blogs.nytimes.com/2010/02/26/one-on-one-esther-dyson-health-tech-investor-and-space-tourist/?permid=2#comment2">commenter</a> on the Bits interview with Esther Dyson referenced this new 501(c)(6) nonprofit, <a href="http://www.cloudinc.org/">CLOUD: Consortium for Local Ownership and Use of Data</a>.  Their site says, &#8220;CLOUD has been formed to create standards to give people property rights in their personal information on the Web and in the cloud, including the right to decide how and when others might use personal information and whether others might be allowed to connect personal information with identifying information.&#8221;</p>
<p>We&#8217;ve been <a href="http://blog.myplaceinthecrowd.org/2009/11/30/creative-commons-style-licenses-for-personal-information-part-iii-what-are-the-challenges/">thinking</a> for a while now about whether personal information could or should be viewed as personal property, as understood by the American legal system.  I&#8217;m not quite sure it&#8217;s the best or most practical solution, but I&#8217;m curious to see where CLOUD goes.</p>
<p>4) The German Federal Constitutional Court has <a href="http://yro.slashdot.org/story/10/03/02/1254212/German-Data-Retention-Law-Ruled-Unconstitutional?from=rss&amp;utm_source=feedburner&amp;utm_medium=feed&amp;utm_campaign=Feed%3A+Slashdot%2Fslashdot+%28Slashdot%29">ruled that the law requiring data retention for 6 months is unconstitutional</a>.  Previously, all phone and email records had to be kept for 6 months for law enforcement purposes.  The court criticized the lack of data security and the insufficient restrictions on access to the data.</p>
<p>Although Europe has more comprehensive and arguably &#8220;stricter&#8221; privacy laws, many countries also require data retention for law enforcement purposes.  We in the U.S. might think the Fourth Amendment is going to protect our phone and email records from being poked into unnecessarily by law enforcement, but existing law is even less clear than in Europe.  So much privacy law around telephone and email records is built around antiquated ideas of our &#8220;expectations,&#8221; with analogies to what&#8217;s &#8220;inside the envelope&#8221; and what&#8217;s &#8220;outside the envelope,&#8221; as if all our communications can be easily analogized to snail mail.  All these issues are clearly simmering to a boil.</p>
<p>5) Google&#8217;s <a href="http://tech.slashdot.org/story/10/03/02/1344259/New-Chrome-Beta-Adds-Privacy-Controls-Translation-Option?from=rss&amp;utm_source=feedburner&amp;utm_medium=feed&amp;utm_campaign=Feed%3A+Slashdot%2Fslashdot+%28Slashdot%29">introduced a new version of Chrome with more privacy controls</a> that allow you to determine how browser cookies, plug-ins, pop-ups and more are handled on a site-by-site basis.  Of course, those controls won&#8217;t necessarily stop a <a href="http://blog.myplaceinthecrowd.org/2010/03/01/ip-addresses-for-targeted-advertising/">publisher from selling your IP address to a third-party behavioral targeting company</a>!</p>
<img src="http://feeds.feedburner.com/~r/MyPlaceInTheCrowd/~4/Uv2s_V0mKXI" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://blog.myplaceinthecrowd.org/2010/03/02/in-the-mix-23/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://blog.myplaceinthecrowd.org/2010/03/02/in-the-mix-23/</feedburner:origLink></item>
		<item>
		<title>IP addresses + zip codes = ?</title>
		<link>http://feedproxy.google.com/~r/MyPlaceInTheCrowd/~3/uNDquk10mpc/</link>
		<comments>http://blog.myplaceinthecrowd.org/2010/03/01/ip-addresses-for-targeted-advertising/#comments</comments>
		<pubDate>Mon, 01 Mar 2010 23:44:11 +0000</pubDate>
		<dc:creator>Grace Meng</dc:creator>
				<category><![CDATA[Protecting Privacy in Meaningful Ways]]></category>
		<category><![CDATA[IP addresses]]></category>
		<category><![CDATA[personally identifiable information]]></category>

		<guid isPermaLink="false">http://blog.myplaceinthecrowd.org/?p=1073</guid>
		<description><![CDATA[
ClearSight Interactive, a new behavioral targeting company, has spent the past 18 months collecting more than 100 million IP addresses.  CEO Tom Alison says, in a comment to the article, &#8220;Our goal is to become the bridge between online and offline data.&#8221;
Whoa, baby.
Alison claims in his comment that Wendy Davis, the writer of the article, [...]]]></description>
			<content:encoded><![CDATA[<p style="text-align: center;"><a href="http://blog.myplaceinthecrowd.org/wp-content/uploads/2010/03/zipcodemap.jpg"><img class="aligncenter size-full wp-image-1078" title="zipcodemap" src="http://blog.myplaceinthecrowd.org/wp-content/uploads/2010/03/zipcodemap.jpg" alt="" width="500" /></a></p>
<p><a href="http://www.clearsightinteractive.com/pr-022609.pdf">ClearSight Interactive</a>, a new behavioral targeting company, has <a href="http://www.mediapost.com/publications/?fa=Articles.showArticle&amp;art_aid=123280">spent</a> the past 18 months collecting more than 100 million IP addresses.  CEO Tom Alison says, in a comment to the article, &#8220;Our goal is to become the bridge between online and offline data.&#8221;</p>
<p>Whoa, baby.</p>
<p>Alison claims in his comment that Wendy Davis, the writer of the article, didn&#8217;t accurately describe what ClearSight Interactive is doing.  So let&#8217;s look at the claims he puts out in his comment.</p>
<blockquote><p>We have a file of IP addresses with 9-digit zip code appended. Our data providers supply the zip code linked to IP without any personally identifiable information. We are able to predict a more likely neighborhood or work location than the zip code or longitude and latitude of the ISPs server readily available from many software or online providers&#8230;</p></blockquote>
<p>In other words, <strong>they know where you live. </strong>Their <a href="http://www.clearsightinteractive.com/pr-022609.pdf">press release</a> says more: &#8220;ClearSight Interactive bridges IP addresses to verified postal addresses and email addresses.&#8221;</p>
<p>Alison claims they do not collect data on online behavior:</p>
<blockquote><p>We offer geo-demographic marketplace data, not behavioral data. We collect no online behavior. Unlike those companies and websites that utilize individual household data and set cookies, we append census and de-identified marketing data at the neighborhood level.  We all know that people in the same household or neighborhood are not the same. But for many useful marketing attributes, bird of a feather do flock or even live together.</p></blockquote>
<p>I guess that&#8217;s supposed to make me feel better, that the company knows where I live but it only <em>guesses </em>what I might be looking for in a car.  Actually, the company isn&#8217;t guessing.  It promises in its press release, &#8220;After a consumer views or clicks an ad, the company can then monitor the users future behavior using contact information databases to determine if they later made a purchase &#8211; e.g. did someone who viewed a car ad actually visit the dealership and purchase a vehicle?&#8221;</p>
<p>Almost more shocking is Alison&#8217;s attitude about the privacy implications.  He repeats over and over that they do not have &#8220;PII&#8221; or &#8220;personally identifying information.&#8221;  If nothing else, we&#8217;ve learned from the AOL debacle and numerous other supposedly anonymized databases, that PII like name and address are not necessary to successfully <a href="http://www.freedom-to-tinker.com/blog/paul/netflixs-impending-still-avoidable-multi-million-dollar-privacy-blunder">reidentify</a> large numbers of people in a dataset.</p>
<p>So how did ClearSight Interactive even get this information?  It <em>bought </em>it from publishers, who normally ask their customers if they are okay with their information being shared with third-party marketers.  As the article points out, most people who click &#8220;yes&#8221; assume that means they&#8217;ll get emails from third-party marketers.  They don&#8217;t assume that the publishers will sell IP logs to a third-party targeting company.  ClearSight Interactive promises that if you choose to opt-out later, the company will update its records and remove you from its databases.  To which, all I can say is, if you&#8217;re so sure that people have actively chosen to allow you to have this information, why not build your business around asking them to opt-in?</p>
<p>On some level, Alison is clearly aware privacy could impact his company.  He <a href="http://www.mediapost.com/publications/?fa=Articles.showArticle&amp;art_aid=123280">writes</a>, &#8220;At ClearSight we take privacy matters very seriously,&#8221; and the article quotes him as saying they are waiting to see if Congress passes privacy legislation.  But if it&#8217;s true that &#8220;[a]ll our IP and zip data fall within the appropriate privacy provisions of our partners&#8221; and everything they&#8217;ve done is legal, well, that&#8217;s some of the strongest evidence I&#8217;ve heard in support of better privacy legislation.</p>
<img src="http://feeds.feedburner.com/~r/MyPlaceInTheCrowd/~4/uNDquk10mpc" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://blog.myplaceinthecrowd.org/2010/03/01/ip-addresses-for-targeted-advertising/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		<feedburner:origLink>http://blog.myplaceinthecrowd.org/2010/03/01/ip-addresses-for-targeted-advertising/</feedburner:origLink></item>
		<item>
		<title>In the mix</title>
		<link>http://feedproxy.google.com/~r/MyPlaceInTheCrowd/~3/AC_jJ_Y4Cxo/</link>
		<comments>http://blog.myplaceinthecrowd.org/2010/02/25/in-the-mix-22/#comments</comments>
		<pubDate>Thu, 25 Feb 2010 18:34:32 +0000</pubDate>
		<dc:creator>Grace Meng</dc:creator>
				<category><![CDATA[Interesting Uses of Data]]></category>
		<category><![CDATA[Public Policy]]></category>

		<guid isPermaLink="false">http://blog.myplaceinthecrowd.org/?p=1068</guid>
		<description><![CDATA[1) Interesting story on NPR last week about a new study using cellphone data to track people&#8217;s movements.  It turns out they were able to predict the nearest cellphone tower 93% of the time and their actual locations 80% of the time.  The potential value to public policy is significant.  It could affect how we [...]]]></description>
			<content:encoded><![CDATA[<p>1) Interesting <a href="http://www.npr.org/templates/story/story.php?storyId=123879603&amp;ft=1&amp;f=3">story on NPR</a> last week about a new study using cellphone data to track people&#8217;s movements.  It turns out they were able to predict the nearest cellphone tower 93% of the time and their actual locations 80% of the time.  The potential value to public policy is significant.  It could affect how we put money into public transportation, for example.</p>
<p>Interestingly, though, no one mentioned any concerns about privacy, just a short statement that researchers don&#8217;t have names or numbers.  Seems like a perfect, obvious example of how that&#8217;s not sufficiently deidentifying, especially as the conclusion is that you can predict where people are.  Another researcher claims that he has data for half a million people and that &#8220;major carriers around the world are now starting to share data with scientists.&#8221;  What if we end up with another AOL scandal on our hands, and worse, the scandal keeps this kind of research from continuing?</p>
<p>2) The <a href="http://blog.okfn.org/">Open Knowledge Foundation</a> has launched <a href="http://blog.okfn.org/2010/02/19/launch-of-the-panton-principles-for-open-data-in-science/">a set of principles</a> for open data in science, in support of the idea that scientific data should be &#8220;freely available on the public internet permitting any user to download, copy, analyse, re-process, pass them to software or use them for any other purpose without financial, legal, or technical barriers other than those inseparable from gaining access to the internet itself. To this end data related to published science should be explicitly placed in the public domain.&#8221;</p>
<p>We certainly support more data being openly and freely available, but we&#8217;re curious.  How will we deal with the rights of people who are in scientific studies?  I&#8217;m not a scientist &#8212; do most agreements to participate in studies anticipate this level of public availability?  And how can we standardize data to be more easily comparable?</p>
<p>3) It&#8217;s not enough to have data. We also need tools to visualize, analyze, and understand data, and more and more tools are available for just that purpose.  Here&#8217;s <a href="http://blog.sunlightfoundation.com/2010/02/19/services-resources-and-tools-for-mapping-data/">a long list of mapping tools</a> from the <a href="http://www.sunlightfoundation.com/">Sunlight Foundation</a>, <a href="http://github.com/sunlightlabs/clearmaps/">ClearMaps</a> from <a href="http://sunlightlabs.com/blog/2010/clearmaps-mapping-framework/">Sunlight Labs</a>, and <a href="http://www.getpivot.com/">Pivot</a>, a new way to combine large groups of similar items on the internet, from <a href="http://www.livelabs.com/AboutUs.html">Microsoft Live Labs</a>.</p>
<img src="http://feeds.feedburner.com/~r/MyPlaceInTheCrowd/~4/AC_jJ_Y4Cxo" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://blog.myplaceinthecrowd.org/2010/02/25/in-the-mix-22/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://blog.myplaceinthecrowd.org/2010/02/25/in-the-mix-22/</feedburner:origLink></item>
		<item>
		<title>In the mix</title>
		<link>http://feedproxy.google.com/~r/MyPlaceInTheCrowd/~3/07g-FnJZMmw/</link>
		<comments>http://blog.myplaceinthecrowd.org/2010/02/17/in-the-mix-21/#comments</comments>
		<pubDate>Wed, 17 Feb 2010 15:27:52 +0000</pubDate>
		<dc:creator>Grace Meng</dc:creator>
				<category><![CDATA[Interesting Uses of Data]]></category>
		<category><![CDATA[Protecting Privacy in Meaningful Ways]]></category>
		<category><![CDATA[Public Policy]]></category>

		<guid isPermaLink="false">http://blog.myplaceinthecrowd.org/?p=1063</guid>
		<description><![CDATA[1) A major study of children is having trouble finding volunteers.  A good exposition of how hard it is to set up a longitudinal study, which is why so many of our ideas about health are based on a very small number of studies.
2) The Sunlight Foundation has launched The Data Mine with the Center [...]]]></description>
			<content:encoded><![CDATA[<p>1) <a href="http://www.nytimes.com/2010/02/16/health/16child.html?hpw">A major study of children</a> is having trouble finding volunteers.  A good exposition of how hard it is to set up a longitudinal study, which is why so many of our ideas about health are based on a very small number of studies.</p>
<p>2) The <a href="http://sunlightfoundation.com/">Sunlight Foundation</a> has <a href="http://reporting.sunlightfoundation.com/2010/data-mine/">launched</a> <a href="http://www.publicintegrity.org/data_mine/">The Data Mine</a> with the <a href="http://www.publicintegrity.org/">Center for Public Integrity</a>, &#8220;to highlight inaccessible or poorly presented information from the federal government.&#8221;  On a related note, the Sunlight Foundation analyzed <a href="http://reporting.sunlightfoundation.com/2010/Onefifth/">why</a> the number of jobs reported by stimulus fund recipients differed from the number cited by President Obama in his State of the Union address.  A great reminder that the promise of data is not the same thing as access to good data.</p>
<p>3) <a href="http://www.kk.org/quantifiedself/2010/02/dan-meyers-quantified-year.php?utm_source=feedburner&amp;utm_medium=feed&amp;utm_campaign=Feed%3A+QuantifiedSelf+%28The+Quantified+Self%29">Another person presenting his self-collected personal data</a>.  <a href="http://quantifiedself.com">Some people</a> love collecting and sharing information about themselves; <a href="http://ask.slashdot.org/story/10/02/16/1525257/Did-We-Lose-the-Privacy-War?from=rss&amp;utm_source=feedburner&amp;utm_medium=feed&amp;utm_campaign=Feed%3A+Slashdot%2Fslashdot+%28Slashdot%29">others</a> are terrified of anything leaking out about themselves.  How do we make personal data useful and relevant to the people in between?</p>
<img src="http://feeds.feedburner.com/~r/MyPlaceInTheCrowd/~4/07g-FnJZMmw" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://blog.myplaceinthecrowd.org/2010/02/17/in-the-mix-21/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://blog.myplaceinthecrowd.org/2010/02/17/in-the-mix-21/</feedburner:origLink></item>
		<item>
		<title>Would PINQ solve the problems with the Census data?</title>
		<link>http://feedproxy.google.com/~r/MyPlaceInTheCrowd/~3/rnSNAfkcOck/</link>
		<comments>http://blog.myplaceinthecrowd.org/2010/02/05/would-pinq-solve-the-problems-with-the-census-data/#comments</comments>
		<pubDate>Fri, 05 Feb 2010 19:01:59 +0000</pubDate>
		<dc:creator>Grace Meng</dc:creator>
				<category><![CDATA[Interesting Uses of Data]]></category>
		<category><![CDATA[Protecting Privacy in Meaningful Ways]]></category>
		<category><![CDATA[census]]></category>
		<category><![CDATA[PINQ]]></category>
		<category><![CDATA[Privacy]]></category>

		<guid isPermaLink="false">http://blog.myplaceinthecrowd.org/?p=1054</guid>
		<description><![CDATA[Frank McSherry, the researcher behind PINQ, has responded to our earlier blog post about the problems found in certain Census datasets and how PINQ might deal with those problems.
Would PINQ solve the problems with the Census data?
No.  But it might help in the future.
The immediate problem facing the Census Bureau is that they want to [...]]]></description>
			<content:encoded><![CDATA[<p><em><a href="http://research.microsoft.com/en-us/people/mcsherry/">Frank McSherry</a>, the researcher behind <a href="http://blog.myplaceinthecrowd.org/2009/08/28/pinq-programmatic-privacy/">PINQ</a>, has responded to our <a href="http://blog.myplaceinthecrowd.org/2010/02/03/can-we-trust-census-data/">earlier blog post</a> about the <a href="http://freakonomics.blogs.nytimes.com/2010/02/02/can-you-trust-census-data/">problems</a> found in certain Census datasets and how PINQ might deal with those problems.</em></p>
<p><strong>Would PINQ solve the problems with the Census data?</strong></p>
<p><strong>No.  But it might help in the future.</strong></p>
<p>The immediate problem facing the Census Bureau is that they want to release a small sample of raw data, a Public Use Microdata Sample or PUMS, about 1/20 of the larger dataset they use for their own aggregates, that is supposed to be a statistical sample of the general population.  To release that data, the Bureau has to protect the confidentiality of people in the PUMS, and they do so, in part, by manipulating the data.  Some of their efforts, though, seem to have altered the data so seriously that it no longer accurately reflects the general population.</p>
<p>PINQ would not solve the immediate problem of allowing the Census Bureau to release a 1/20 sample of their data.  PINQ only allows researchers to query for aggregates.</p>
<p>However, if Census data were released behind PINQ, the Bureau would not have to swap or synthesize data to protect privacy; PINQ would provide that protection.  Presumably, if the danger of violating confidentiality were removed, the Census Bureau could release more than a 1/20 sample of the data. Furthermore, unlike the Bureau’s disclosure avoidance procedures, PINQ is transparent in describing the range of noise that is being added.  Currently, the Bureau can’t even tell you what it did to protect privacy without potentially violating it.</p>
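<p>To make the idea of a noisy aggregate concrete, here is a minimal sketch in Python.  It is not PINQ&#8217;s actual interface (PINQ is a .NET library built on LINQ), and the names <code>noisy_count</code>, <code>noise_scale</code> and <code>epsilon</code> are hypothetical, chosen only for illustration.  It assumes Laplace noise, a standard choice in differential privacy, and shows how the amount of noise can be stated openly without revealing anything about an individual record.</p>

```python
import random


def noise_scale(epsilon):
    """Scale of the Laplace noise for a counting query (assumed: 1/epsilon).

    This number can be published alongside every answer; it depends only on
    the privacy parameter, never on the underlying records.
    """
    if not epsilon > 0:
        raise ValueError("epsilon must be positive")
    return 1.0 / epsilon


def noisy_count(records, predicate, epsilon, rng=None):
    """Count the records matching `predicate`, then add Laplace noise.

    Hypothetical helper, not part of PINQ. The caller never sees the raw
    records, only the perturbed aggregate.
    """
    rng = rng or random.Random()
    scale = noise_scale(epsilon)
    true_count = sum(1 for record in records if predicate(record))
    # The difference of two independent exponential draws with rate
    # 1/scale follows a Laplace distribution with that scale.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise


if __name__ == "__main__":
    # Made-up ages, purely for illustration.
    ages = [70, 30, 66, 41, 80]
    answer = noisy_count(ages, lambda age: age >= 65, epsilon=0.5)
    print("noisy count of people 65 and older:", answer)
    print("published noise scale:", noise_scale(0.5))
```

<p>A smaller <code>epsilon</code> means more noise and stronger protection.  Either way, a researcher knows the scale of the noise up front and can account for it in the analysis.</p>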
<p>The mechanism for accessing data through PINQ, of course, would be very different from what researchers are used to today.  Now, with raw data, researchers like to “look at the data” and “fit a line to the data.”  A lot of these things can be approximated with PINQ, but most researchers reflexively pull back when asked to rethink how they approach data.  There are almost certainly research objectives that cannot be met with PINQ alone.  But the objectives that can be met should not be held back by the unavailability of high-quality statistical information. Researchers able to express how and why their analyses respect privacy should be rewarded with good data, incentivizing creative rethinking of research processes.</p>
<p>With this research <a href="http://bpp.wharton.upenn.edu/betseys/papers/Inaccurate%20Age%20and%20Sex%20Data%20in%20Census%20PUMS%20Files.pdf">published</a>, it may be easier to argue that <strong>the choice between PUMS (and other microdata) and PINQ is not between raw data and noisy aggregates, but rather between bad data and noisy aggregates.</strong> If and when it becomes a choice between these two, any serious scientist would reject bad data and accept noisy aggregates.</p>
<img src="http://feeds.feedburner.com/~r/MyPlaceInTheCrowd/~4/rnSNAfkcOck" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://blog.myplaceinthecrowd.org/2010/02/05/would-pinq-solve-the-problems-with-the-census-data/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		<feedburner:origLink>http://blog.myplaceinthecrowd.org/2010/02/05/would-pinq-solve-the-problems-with-the-census-data/</feedburner:origLink></item>
		<item>
		<title>Is Public the new Private?</title>
		<link>http://feedproxy.google.com/~r/MyPlaceInTheCrowd/~3/Qsl2qkbUXsU/</link>
		<comments>http://blog.myplaceinthecrowd.org/2010/02/03/is-public-the-new-private/#comments</comments>
		<pubDate>Thu, 04 Feb 2010 03:48:54 +0000</pubDate>
		<dc:creator>Mimi Yin</dc:creator>
				<category><![CDATA[Public Policy]]></category>
		<category><![CDATA[Publicy]]></category>

		<guid isPermaLink="false">http://blog.myplaceinthecrowd.org/?p=1039</guid>
		<description><![CDATA[Publicy (Publi[c] + [Priva]cy)
When the public, not the private, is the default. 
In a world where so much more is out in the public, will people just stop worrying about privacy completely? Maybe in another five years, people simply won&#8217;t care if their names and addresses come up when someone searches for &#8220;people who have [...]]]></description>
			<content:encoded><![CDATA[<blockquote><p><a href="http://schott.blogs.nytimes.com/2010/02/03/publicy/">Publicy</a> (Publi[c] + [Priva]cy)<br />
When the public, not the private, is the default. </p></blockquote>
<p>In a world where so much more is out in the public, will people just stop worrying about privacy completely? Maybe in another five years, people simply won&#8217;t care if their names and addresses come up when someone searches for &#8220;people who have STDs&#8221; or &#8220;people who are 40-year-old virgins.&#8221;</p>
<p><img alt="40 Year-Old Virgin" src="http://upload.wikimedia.org/wikipedia/en/4/43/40-Year-OldVirginMoviePoster.jpg" title="40 Year-Old Virgin" class="alignnone" width="150"/> </p>
<p>For some of us, that&#8217;s hard to believe.</p>
<p>But I wonder if even for the people who are &#8220;most public&#8221; about their lives, the end of all privacy is equally scary. After all, aren&#8217;t Twitter, MySpace and Facebook simply opportunities for all of us to craft public personas we want others to see? Which implicitly includes controlling what people don&#8217;t see.</p>
<p>At the end of the day, the line between public and private has to do with control. Just because we&#8217;re all sharing volumes more than we used to doesn&#8217;t mean that we&#8217;re any more willing to share the skeletons in our closets. </p>
<img src="http://feeds.feedburner.com/~r/MyPlaceInTheCrowd/~4/Qsl2qkbUXsU" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://blog.myplaceinthecrowd.org/2010/02/03/is-public-the-new-private/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://blog.myplaceinthecrowd.org/2010/02/03/is-public-the-new-private/</feedburner:origLink></item>
		<item>
		<title>Can we trust Census data?</title>
		<link>http://feedproxy.google.com/~r/MyPlaceInTheCrowd/~3/xfHMTNwqo-A/</link>
		<comments>http://blog.myplaceinthecrowd.org/2010/02/03/can-we-trust-census-data/#comments</comments>
		<pubDate>Wed, 03 Feb 2010 19:53:09 +0000</pubDate>
		<dc:creator>Grace Meng</dc:creator>
				<category><![CDATA[Interesting Uses of Data]]></category>
		<category><![CDATA[Protecting Privacy in Meaningful Ways]]></category>
		<category><![CDATA[Public Policy]]></category>
		<category><![CDATA[census]]></category>
		<category><![CDATA[government data]]></category>
		<category><![CDATA[Privacy]]></category>

		<guid isPermaLink="false">http://blog.myplaceinthecrowd.org/?p=1031</guid>
		<description><![CDATA[
Yesterday, the Freakonomics blog at the New York Times reported that a group of researchers had discovered serious errors in PUMS (public-use microdata samples) files released by the U.S. Census Bureau.  When compared to aggregate data released by the Census, the PUMS files showed discrepancies of up to 15% for the 65-and-older population.  As Justin Wolfers [...]]]></description>
			<content:encoded><![CDATA[<p style="text-align: center;"><a href="http://blog.myplaceinthecrowd.org/wp-content/uploads/2010/02/PUMS.png"><img class="aligncenter size-full wp-image-1032" title="PUMS" src="http://blog.myplaceinthecrowd.org/wp-content/uploads/2010/02/PUMS.png" alt="" width="600" /></a></p>
<p>Yesterday, the <a href="http://freakonomics.blogs.nytimes.com/">Freakonomics blog</a> at the New York Times <a href="http://freakonomics.blogs.nytimes.com/2010/02/02/can-you-trust-census-data/">reported</a> that a group of researchers had <a href="http://bpp.wharton.upenn.edu/betseys/papers/Inaccurate%20Age%20and%20Sex%20Data%20in%20Census%20PUMS%20Files.pdf">discovered</a> serious errors in PUMS (public-use microdata samples) files released by the <a href="http://www.census.gov/">U.S. Census Bureau</a>.  When compared to aggregate data released by the Census, the PUMS files showed discrepancies of up to 15% for the 65-and-older population.  As Justin Wolfers <a href="http://freakonomics.blogs.nytimes.com/2010/02/02/can-you-trust-census-data/">explains</a>, PUMS files are small samples of the much larger, confidential data used by the Census for the general statistics it releases. These samples are crucial to researchers and policymakers looking to measure trends that the Census itself has not calculated.</p>
<p>When I read this, the first thought I had was, “Hallelujah!”  Not because I felt gleeful about the Census Bureau’s mistakes, but because this little post in the New York Times articulated something we’ve been trying to communicate for a while: <strong>current methods of data collection (and especially data release) are <em>not</em> perfect.</strong></p>
<p>People love throwing around statistics, and increasingly people love debunking statistics, but that kind of scrutiny is normally directed at surveys conducted by people who are not statisticians.  Most people generally hear words like &#8220;statistical sampling&#8221; and &#8220;disclosure avoidance procedure&#8221; and assume that <em>those </em>people surely know what they&#8217;re doing.</p>
<p>But you don’t have to have training in statistics to read this <a href="http://bpp.wharton.upenn.edu/betseys/papers/Inaccurate%20Age%20and%20Sex%20Data%20in%20Census%20PUMS%20Files.pdf">paper</a> and understand what happened. The Census Bureau, unlike <a href="http://blog.myplaceinthecrowd.org/2008/09/09/google-announces-data-will-be-anonymized-after-nine-months-but-then-what/">many organizations and businesses that claim to &#8220;anonymize&#8221; datasets</a>, knows that individual identities cannot be kept confidential simply by removing &#8220;identifiers&#8221; like name and address, which is why they use techniques like <a href="http://www.census.gov/acs/www/Products/PUMS/pumsaccuracy_archived.html">&#8220;data swapping&#8221; and &#8220;synthetic data.&#8221;</a> It doesn’t take a mathematician to understand that when you’re making up data, you might have trouble maintaining the accuracy of the overall microdata sample.</p>
<p>To the Bureau’s credit, it does <a href="http://www.census.gov/acs/www/Products/PUMS/pumsaccuracy_archived.html">acknowledge</a> where inaccuracies exist.  But as the researchers found, the Bureau is unwilling to correct its mistakes because doing so could reveal how it altered the data in the first place and thus compromise someone’s identity.  Which gets to the heart of the problem:</p>
<blockquote><p>Newer techniques, such as swapping or blanking, retain detail <em>and </em>provide better protection of respondents&#8217; confidentiality. However, the effects of the new techniques are less transparent to data users and mistakes can easily be overlooked.</p></blockquote>
<p>The problems with current methods of data collection aren’t limited to the Census PUMS files either.  The weaknesses outlined by <a href="http://freakonomics.blogs.nytimes.com/2010/02/02/can-you-trust-census-data/#comment-536831">this former employee</a> could apply to so many organizations.</p>
<p>This is <a href="http://www.commondataproject.org">why we have to work</a> on new ways to collect, analyze, and release sensitive data.</p>
<img src="http://feeds.feedburner.com/~r/MyPlaceInTheCrowd/~4/xfHMTNwqo-A" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://blog.myplaceinthecrowd.org/2010/02/03/can-we-trust-census-data/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://blog.myplaceinthecrowd.org/2010/02/03/can-we-trust-census-data/</feedburner:origLink></item>
	</channel>
</rss>
