<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">

<channel>
	<title>The Space-Times Blog</title>
	
	<link>http://www.spacetimeresearch.com/blog.html</link>
	<description>This blog is about what we are up to and thinking about at Space-Time Research. We are passionate about data transparency and public intelligence, quality, a fantastic customer experience, and better software.</description>
	<pubDate>Wed, 01 Feb 2012 00:58:28 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.7.1.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.feedburner.com/spacetimeresearch/jkgK" /><feedburner:info uri="spacetimeresearch/jkgk" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><feedburner:emailServiceId>spacetimeresearch/jkgK</feedburner:emailServiceId><feedburner:feedburnerHostname>http://feedburner.google.com</feedburner:feedburnerHostname><item>
		<title>5 Links about Privacy Protection in Official Statistics</title>
		<link>http://feedproxy.google.com/~r/spacetimeresearch/jkgK/~3/AWao9Zbc80A/280.html</link>
		<comments>http://www.spacetimeresearch.com/archives/280.html#comments</comments>
		<pubDate>Fri, 27 Jan 2012 22:32:20 +0000</pubDate>
		<dc:creator>Don McIntosh</dc:creator>
		
		<category><![CDATA[Official Statistics]]></category>

		<category><![CDATA[Privacy protection]]></category>

		<category><![CDATA[anonymization]]></category>

		<category><![CDATA[confidentiality]]></category>

		<category><![CDATA[privacy]]></category>

		<guid isPermaLink="false">http://www.spacetimeresearch.com/archives/280.html</guid>
		<description><![CDATA[According to the official site for Data Privacy Day in the US, it is intended to promote “awareness about the many ways personal information is collected, stored, used, and shared, and education about privacy practices that will enable individuals to protect their personal information.” In the spirit of this, here are a few useful links [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.spacetimeresearch.com/images/wordpress/uploads/2012/01/dpdeng.png"><img class="alignright size-full wp-image-287" title="dpdeng" src="http://www.spacetimeresearch.com/images/wordpress/uploads/2012/01/dpdeng.png" alt="dpdeng" width="288" height="288" /></a>According to the official site for <a href="http://www.staysafeonline.org/dpd">Data Privacy Day</a> in the US, it is intended to promote “awareness about the many ways personal information is collected, stored, used, and shared, and education about privacy practices that will enable individuals to protect their personal information.” In the spirit of this, here are a few useful links to help people learn more about protecting privacy in official statistics.</p>
<p>Privacy protection as it relates to official statistics is known as “Statistical Disclosure Control”, or simply “Confidentiality”. It’s about protecting confidential information on specific individuals while still maximizing the usefulness and accessibility of government data.</p>
<ol>
<li><a href="http://www.spacetimeresearch.com/archives/187.html">This post</a> challenges the widespread view that anonymizing data by removing names, addresses and similar identifiers makes it safe to publish with no threat to people’s privacy.</li>
<li>Our <a href="http://www.spacetimeresearch.com/protecting-privacy.html">page on protecting privacy</a>, with a link to the white paper co-authored with <a href="http://www.symbolix.com.au/dataproviders/">our partner Symbolix</a> on safe dissemination through statistical disclosure control.</li>
<li><a href="http://www.nss.gov.au/nss/home.NSF/pages/Confidentiality+Information+Sheets">Confidentiality Information Sheets</a> from the Australian Bureau of Statistics.</li>
<li>A comprehensive set of <a href="http://www.staysafeonline.org/dpd/government-created-resources">Government-Created Resources</a> related to privacy put together by the organizers of Data Privacy Day.</li>
<li>The US National Institute of Standards and Technology’s <a href="http://csrc.nist.gov/publications/nistpubs/800-122/sp800-122.pdf">Guide to Protecting the Confidentiality of Personally Identifiable Information (PII)</a>: a fairly chunky PDF, but worth a look if you are in the business of making government data available.</li>
</ol>
<p>Happy Data Privacy Day!</p>
<img src="http://feeds.feedburner.com/~r/spacetimeresearch/jkgK/~4/AWao9Zbc80A" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.spacetimeresearch.com/archives/280/feed.html</wfw:commentRss>
		<feedburner:origLink>http://www.spacetimeresearch.com/archives/280.html</feedburner:origLink></item>
		<item>
		<title>SuperTABLE Survey Results</title>
		<link>http://feedproxy.google.com/~r/spacetimeresearch/jkgK/~3/HIt0dM8qnok/274.html</link>
		<comments>http://www.spacetimeresearch.com/archives/274.html#comments</comments>
		<pubDate>Sun, 30 Jan 2011 23:09:44 +0000</pubDate>
		<dc:creator>Don McIntosh</dc:creator>
		
		<category><![CDATA[Gov2.0]]></category>

		<category><![CDATA[SuperSTAR]]></category>

		<category><![CDATA[SuperWEB]]></category>

		<category><![CDATA[SuperTABLE]]></category>

		<guid isPermaLink="false">http://www.spacetimeresearch.com/archives/274.html</guid>
		<description><![CDATA[Results from the SuperTABLE survey are in. We had a fantastic response and I thank all of you who took the time to give us feedback. Here&#8217;s a set of quick links if you want to jump down to what interests you:  

SuperTABLE Ratings - what people think of SuperTABLE and what they said
Use of [...]]]></description>
			<content:encoded><![CDATA[<p>Results from the SuperTABLE survey are in. We had a fantastic response and I thank all of you who took the time to give us feedback. Here&#8217;s a set of quick links if you want to jump down to what interests you:  </p>
<ul>
<li><a href="#ratings">SuperTABLE Ratings</a> - what people think of SuperTABLE and what they said</li>
<li><a href="#features">Use of SuperTABLE Features</a> - most popular features and segments of users</li>
<li><a href="#usability">Usability and training</a> - what we&#8217;ll do to improve this</li>
<li><a href="#online">SuperTABLE Online</a> - improvements to the online version of SuperTABLE</li>
<li><a href="#next">What&#8217;s next</a> - accessing our survey stats online, voting and sharing ideas</li>
</ul>
<p>  <strong><a name="ratings"></a>SuperTABLE Ratings</strong></p>
<p>The chart below shows that over 70% of users feel that SuperTABLE is &#8220;fine&#8221; or even &#8220;super&#8221;.</p>
<p> <a href="http://www.spacetimeresearch.com/images/wordpress/uploads/2011/01/supertableratings1.jpg"><img class="aligncenter size-full wp-image-265" title="supertableratings1" src="http://www.spacetimeresearch.com/images/wordpress/uploads/2011/01/supertableratings1.jpg" alt="supertableratings1" width="600" height="495" /></a></p>
<p>We&#8217;re very interested in learning how to make everyone more satisfied with their experience, so thanks in particular to the quarter of respondents who took the time to add comments to their rating. Here is a summary of what those commenters had to say:</p>
<ul>
<li>53% referred to usability issues: either they required some training, or felt the product was not intuitive.</li>
<li>13% said they were happy: they liked the software and found it easy to use.</li>
<li>8% had installer issues or setup issues.</li>
<li>4% had issues with the content, such as inconsistencies, lack of metadata and problems finding what they needed.</li>
</ul>
<p><strong><a name="features"></a>Use of SuperTABLE Features </strong></p>
<p>The following chart shows the level of use for each feature.    </p>
<p><a href="http://www.spacetimeresearch.com/images/wordpress/uploads/2011/01/supertablefeatures.jpg"><img class="aligncenter size-full wp-image-266" title="supertablefeatures" src="http://www.spacetimeresearch.com/images/wordpress/uploads/2011/01/supertablefeatures.jpg" alt="supertablefeatures" width="800" height="600" /></a></p>
<p>From some follow-up analysis, we identified three groups of users:</p>
<ul>
<li>A third simply build a table (or use the default table) and export it to Excel.</li>
<li>A third choose a slice of data, mostly using recoding and sorting to identify the stats of interest.</li>
<li>The remaining third use most of the features that SuperTABLE offers.</li>
</ul>
<p>Some users commented that they were unaware of some of the listed features and might have a use for them if they understood the product better. We&#8217;ll look into how we can make the features more accessible.</p>
<p><strong><a name="usability"></a>Usability and training</strong></p>
<p>The comments make it clear that usability and training are areas where changes would really help. We plan to put up some videos and see how they help, update the <a title="SuperTABLE FAQ" href="http://www.spacetimeresearch.com/component/option,com_moofaq/Itemid,226/id,5/view,categories/">FAQ</a> (if you haven&#8217;t already visited it, you may find that what you&#8217;re after is already there), and create an open space for discussion and sharing.</p>
<p><strong><a name="online"></a>SuperTABLE Online</strong></p>
<p>As many of you are aware, we already have an online equivalent to SuperTABLE called <a title="superweb" href="http://www.spacetimeresearch.com/superweb.html">SuperWEB</a>. The most popular public deployment of this product is on the ABS website for <a title="cdata" href="http://www.abs.gov.au/cdataonline">CDATA Online</a> or <a title="tablebuilder" href="http://www.abs.gov.au/tablebuilder">Table Builder</a>. We&#8217;ll continue to work with ABS and other government departments this year to help make new datasets available through these interactive online tools. The results from the survey, particularly the rating of online features and the constructive comments, are very helpful in shaping our product roadmap.  </p>
<p><strong><a name="next"></a>What&#8217;s next?</strong></p>
<p>The survey results confirmed that we have a number of passionate and interested users. So, we would like to continue the conversation beyond the survey and invite you to participate in other ways. We&#8217;ll be creating a forum for you to submit and share your own ideas with other users and vote on what you think the best ones are.  </p>
<p>What else would you like to see us do? We deliberately targeted this survey at SuperTABLE users, but if you work for a data provider, you may want to comment on what matters to your organization in providing easy access to official statistics. Either way, please feel free to share your thoughts.</p>
<img src="http://feeds.feedburner.com/~r/spacetimeresearch/jkgK/~4/HIt0dM8qnok" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.spacetimeresearch.com/archives/274/feed.html</wfw:commentRss>
		<feedburner:origLink>http://www.spacetimeresearch.com/archives/274.html</feedburner:origLink></item>
		<item>
		<title>Crowd Sourcing, Twitter, and Trust - Natstats 2010</title>
		<link>http://feedproxy.google.com/~r/spacetimeresearch/jkgK/~3/cT9egKTsiUI/253.html</link>
		<comments>http://www.spacetimeresearch.com/archives/253.html#comments</comments>
		<pubDate>Thu, 16 Sep 2010 15:05:23 +0000</pubDate>
		<dc:creator>Don McIntosh</dc:creator>
		
		<category><![CDATA[General]]></category>

		<category><![CDATA[ABS]]></category>

		<category><![CDATA[events]]></category>

		<category><![CDATA[natstats]]></category>

		<guid isPermaLink="false">http://www.spacetimeresearch.com/archives/253.html</guid>
		<description><![CDATA[Natstats opened on Wednesday evening and had its first full day today. For those who aren’t aware, it’s a conference all about statistics organized by the Australian Bureau of Statistics, with the theme of this event being around “Measuring what counts: economic development, wellbeing and progress in 21st century Australia”. It’s only the second time [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://blog.abs.gov.au/blog/natstats.nsf">Natstats</a> opened on Wednesday evening and had its first full day today. For those who aren’t aware, it’s a conference all about statistics organized by the Australian Bureau of Statistics, and the theme of this event is “Measuring what counts: economic development, wellbeing and progress in 21st century Australia”. It’s only the second time it has been staged, but judging by the success so far and the quality of the speakers, it certainly won’t be the last. I’ll leave official reports of the talks to others, but I would like to share some stories from people I had the pleasure of talking with at the conference today. I invite attendees to add their own stories in the comments section.</p>
<p><strong>Landlines</strong></p>
<p>Here’s an odd one to start with. Associate Professor Warren Laffen works in the Institute for Social Science Research, which is part of the University of Queensland. He is involved in a huge range of research projects, and one in particular caught my interest: a recently commenced study comparing the profiles of people who have only mobile phones with those who use only landlines. Why would you want to do that? Apparently some phone surveys only use landline numbers, and the idea behind this research is to find out how the results of such surveys might be biased as a result. It struck me as rather obscure, but at the same time I could understand the purpose and value of running such a study. It’s the same with many stats collected: they may not be of value to all of us, but for some, they are very important and meaningful. I formed a picture in my mind of the “landliners”: homely elderly folk sitting around knitting who had never churned from Telstra to another provider, let alone considered moving to a mobile. So I was a little surprised when I sat down for the Natstats dinner this evening and discovered that my twenty-something dinner neighbor and her husband, both white-collar workers living in Hobart, shared a single landline at home and had no mobiles.</p>
<p><strong>Wikiprogress Crowd Sourcing</strong></p>
<p>Philippa Lysaght from the OECD introduced me to <a href="http://www.wikiprogress.org/">Wikiprogress</a>, an intriguing idea for a statistical website. Funded partly by the OECD and partly by many independent supporters, this innovative site aims to measure the progress of societies around the world. Read more about what it’s all about <a href="http://www.wikiprogress.org/index.php/Wikiprogress.org:About">here</a>. Being a wiki, it gathers statistics from any members of “the community” who choose to contribute. The site hosts various indicators based on official stats, as well as data created and shared by individual researchers, academics and the like. An obvious question statisticians would ask is how the quality of the data is assured if everyone is free to contribute. Well, that’s part of the challenge for Wikiprogress, and Philippa said they are working hard to keep on top of it. They certainly have plenty of contributed statistics from official sources, and so far they are managing to work out reasonable ways of accepting and presenting statistics from other communities, related or otherwise. Perhaps they can help us find a middle path that reconciles official and community statistics (see the <a href="http://www.spacetimeresearch.com/archives/140.html">earlier post</a> about community koala data vs official sources), especially given that there are so many things we want to count and only so many statistical organizations to do the work.</p>
<p><strong>Twitter</strong></p>
<p>Another interesting thing was the level of Twitter use at the conference. Jeanette Cotterill, who led the ABS team responsible for organizing the event (well done, Jeanette and team!!), explained that there had been some concerns about making Twitter an official part of Natstats communications, but it was decided that it was a good opportunity to engage with people. Indeed, so far there has been some <a href="http://twitter.com/#search?q=%23natstats2010">happy Tweeting</a> from a number of participants, as well as from the official <a href="http://twitter.com/NatStats2010">@Natstats2010</a> Twitterer. Since it’s a stats conference, I’d be remiss not to note a statistic about Twitter use: on this first day, 10 unique people tweeted about Natstats, out of an overall attendance of around 500. Maybe someone can work out how that compares with the proportion of Twitter users in the population at large: are Natstats attendees more or less likely to use Twitter than other folk?</p>
<p><strong>Trust</strong></p>
<p>I’ll close with another statistic, this one from the ABS chief statistician, <a href="http://www.abs.gov.au/websitedbs/d3310114.nsf/4a256353001af3ed4b2562bb00121564/82e3e8035097724aca25729e007c18de!OpenDocument">Brian Pink</a>. He noted that 92% of the Australian public trust official statistics. Oddly enough, that stat doesn’t come from the ABS but from an independent survey that the ABS has commissioned to find out more about public opinion around official statistics. You’ll have to wait for <a href="http://unstats.un.org/unsd/wsd/">World Statistics Day</a> on October 20 to find out more about the results.</p>
<p>Well, thanks so much to the many people who attended Natstats and made it a thoroughly enjoyable experience for my colleague Mark Humphreys and me, and no doubt for many others. Please do feel free to add your own sentiments or stories about Natstats in the comments. I wish you well on day two and look forward to speaking with many of you at our booth (and if you mention this post, you’ll get an extra lolly of your choice).</p>
<img src="http://feeds.feedburner.com/~r/spacetimeresearch/jkgK/~4/cT9egKTsiUI" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.spacetimeresearch.com/archives/253/feed.html</wfw:commentRss>
		<feedburner:origLink>http://www.spacetimeresearch.com/archives/253.html</feedburner:origLink></item>
		<item>
		<title>Exploring Not-So-Open Data</title>
		<link>http://feedproxy.google.com/~r/spacetimeresearch/jkgK/~3/PR0igPs4knY/245.html</link>
		<comments>http://www.spacetimeresearch.com/archives/245.html#comments</comments>
		<pubDate>Wed, 25 Aug 2010 05:12:39 +0000</pubDate>
		<dc:creator>Don McIntosh</dc:creator>
		
		<category><![CDATA[Gov2.0]]></category>

		<category><![CDATA[Privacy protection]]></category>

		<category><![CDATA[opendata]]></category>

		<guid isPermaLink="false">http://www.spacetimeresearch.com/archives/245.html</guid>
		<description><![CDATA[The challenges researchers face in accessing sensitive government data.]]></description>
			<content:encoded><![CDATA[<p class="Style-1" style="line-height: 115%; margin: 0cm 0cm 0pt;"><span style="line-height: 115%; font-family: &quot;Arial&quot;,&quot;sans-serif&quot;; color: black; font-size: 11pt; mso-fareast-font-family: Arial;" lang="EN-US">Sensitive government data is any data that could be used to infer personally identifiable information. The term applies to a very large chunk of government data. As part of research into ways of making sensitive government data easier to analyze, I have recently talked to several researchers. How do they find and analyze such data, and what are the major challenges in doing so? My sample so far is biased toward researchers employed by national science organizations and consultants specializing in high-end analytics. Even so, a consistent pattern has emerged from this informal survey in the challenges researchers face in the world of not-so-open data. </span></p>
<p class="Style-1" style="line-height: 115%; margin: 0cm 0cm 0pt;"> </p>
<p class="Style-1" style="line-height: 115%; margin: 0cm 0cm 0pt;"><span style="line-height: 115%; font-family: &quot;Arial&quot;,&quot;sans-serif&quot;; color: black; font-size: 11pt; mso-fareast-font-family: Arial;" lang="EN-US">First, there are problems that are common to all government data research, not just sensitive stuff. Very often, researchers come across data that <em>may</em> be useful to them. That leads to two challenges. First, the data is often poorly documented, which leads to guess work, assumptions, and blind alleys. Second, researchers need to perform exploratory data analysis (EDA) to find out whether the data is useful, and what kind of hypotheses might be interesting to investigate. As a minimum, this typically requires download of a dataset, some transformations, loading it up in a suitable tool and finally, checking if it there is anything of potential interest there. That can be a time-consuming process. In some cases, the data isn’t freely available, which in the best case scenario leads to emails and spreadsheets going backwards and forth, and in the worst case, leads to a dead end. And when the EDA is complete it may not be of any use after all. On to the next best guess!</span></p>
<p class="Style-1" style="line-height: 115%; margin: 0cm 0cm 0pt;"><span style="line-height: 115%; font-family: &quot;Arial&quot;,&quot;sans-serif&quot;; color: black; font-size: 11pt; mso-fareast-font-family: Arial;" lang="EN-US"><span style="mso-spacerun: yes;"> </span></span></p>
<p class="Style-1" style="line-height: 115%; margin: 0cm 0cm 0pt;"><span style="line-height: 115%; font-family: &quot;Arial&quot;,&quot;sans-serif&quot;; color: black; font-size: 11pt; mso-fareast-font-family: Arial;" lang="EN-US">So far, that’s no different from analyzing open government data. However, an additional roadblock faced by those looking at health data for example, where personal privacy is a critical issue, is getting approval to access such data. That’s an administrative headache that can dwarf the technical challenges of performing any initial exploration. For example, an Australian researcher told about how he needed to complete an ethics application which has hundreds of questions on it. Given that this is before they learn whether the data is of any use, how many times does a researcher choose to simply walk away empty handed? Many research projects these days have very fast turn-around times, so this kind of approval process is really just another way of saying that the data is not available. We know there is a cost in terms of privacy violations if sensitive information is disclosed, but what is the cost of researchers not having access to valuable data sources that may contain vital statistical information that can lead to better policy and insights on a whole range of issues such as healthcare, social security, taxation etc?</span></p>
<p class="Style-1" style="line-height: 115%; margin: 0cm 0cm 0pt;"><span style="line-height: 115%; font-family: &quot;Arial&quot;,&quot;sans-serif&quot;; color: black; font-size: 11pt; mso-fareast-font-family: Arial;" lang="EN-US"> </span></p>
<p class="Style-1" style="line-height: 115%; margin: 0cm 0cm 0pt;"><span style="line-height: 115%; font-family: &quot;Arial&quot;,&quot;sans-serif&quot;; color: black; font-size: 11pt; mso-fareast-font-family: Arial;" lang="EN-US">I got this feedback from a small and rather biased sample. I’d really like to learn more about what kinds of EDA people need to do and the challenges in getting it done, especially where privacy issues block people from getting the information they need. There is a lot of research containing sensitive personal information, and much of it has a very high reuse value, so this is an important issue for the successful sharing of data for research. What&#8217;s your experience? Have you had to jump through hoops, or travel to a research data laboratory to analyze sensitive government data? Could it have been simpler?</span></p>
<img src="http://feeds.feedburner.com/~r/spacetimeresearch/jkgK/~4/PR0igPs4knY" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.spacetimeresearch.com/archives/245/feed.html</wfw:commentRss>
		<feedburner:origLink>http://www.spacetimeresearch.com/archives/245.html</feedburner:origLink></item>
		<item>
		<title>Finding Harmony in Open Data</title>
		<link>http://feedproxy.google.com/~r/spacetimeresearch/jkgK/~3/eQ6bIKANxZw/232.html</link>
		<comments>http://www.spacetimeresearch.com/archives/232.html#comments</comments>
		<pubDate>Thu, 05 Aug 2010 22:22:41 +0000</pubDate>
		<dc:creator>Don McIntosh</dc:creator>
		
		<category><![CDATA[Gov2.0]]></category>

		<category><![CDATA[Quality]]></category>

		<category><![CDATA[SDMX/DDI]]></category>

		<category><![CDATA[metadata]]></category>

		<category><![CDATA[SDMX metadata opendata harmonization]]></category>

		<guid isPermaLink="false">http://www.spacetimeresearch.com/archives/232.html</guid>
		<description><![CDATA[To some people, “harmony” means a lot more than something that sounds nice. According to Wikipedia, the term derives from the Greek ἁρμονία (harmonía), meaning "joint, agreement, concord", from the verb ἁρμόζω (harmozo), "to fit together, to join". That’s an excellent definition for its use in the context of statistics, where statistical harmonization is very much about working out how to fit data together. Having just visited several European statistical offices and similar organizations, I can see very clearly that it’s always a challenge to harmonize anything but the most trivial statistics. But before organizations reach this kind of data sharing nirvana, even a few incremental improvements, such as adopting rich open data standards, can help to improve the quality and reliability of shared data. 
]]></description>
			<content:encoded><![CDATA[<p class="MsoNormal" style="margin: 0cm 0cm 10pt;"><span style="font-family: &quot;Arial&quot;,&quot;sans-serif&quot;; color: black; font-size: 11.5pt;">“Harmony” can mean a lot more than something that sounds nice. </span><span style="font-family: &quot;Arial&quot;,&quot;sans-serif&quot;; font-size: 11.5pt;"><a href="http://en.wikipedia.org/wiki/Harmony">Wikipedia</a><span style="color: black;"> suggests the term derives from the Greek <em>h</em></span></span><span style="font-family: &quot;Arial&quot;,&quot;sans-serif&quot;; color: black; font-size: 11.5pt;"><em>armonía</em>, meaning &#8220;joint, agreement, concord&#8221;, from the verb </span><span style="font-family: &quot;Arial&quot;,&quot;sans-serif&quot;; color: black; font-size: 11.5pt;"><em>harmozo</em>, &#8220;to fit together, to join&#8221;. That’s an excellent definition for the context of statistics, where statistical </span><span style="font-family: &quot;Arial&quot;,&quot;sans-serif&quot;; font-size: 11.5pt;"><a href="http://ccsg.isr.umich.edu/harmonization.cfm">harmonization</a><span style="color: black;"> is about how to fit data together. Having just visited several European statistical offices and similar organizations, it’s clear to me that it’s a challenge to harmonize anything but the most trivial statistics. </span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 10pt;"><span style="font-family: &quot;Arial&quot;,&quot;sans-serif&quot;; color: black; font-size: 11.5pt;">The quality of data can influence the effectiveness of government and the quality of our lives. What schools have classrooms that are substandard? What areas have the highest vaccination rates? What hospitals have the longest waiting lists? The answers can be fairly straight forward, and accurate responses can greatly improve government services. On the other hand, unfortunately for those responsible for providing the answers, the statistics can be difficult to put together, and it is only after harmonization that many statistics can become a useful foundation for good decision making. </span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 10pt;"><span style="font-family: &quot;Arial&quot;,&quot;sans-serif&quot;; color: black; font-size: 11.5pt;">Data harmony is what is needed in order to develop, use, and share interesting and meaningful statistics.<span style="mso-spacerun: yes;">  </span>Before organizations reach this kind of data sharing nirvana, a few incremental improvements, such as adopting </span><span style="font-family: &quot;Arial&quot;,&quot;sans-serif&quot;; font-size: 11.5pt;"><a href="http://www.opendatafoundation.org/">rich open data standards</a><span style="color: black;">, can improve the quality and reliability of shared data.</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 10pt;"><span style="font-family: &quot;Arial&quot;,&quot;sans-serif&quot;; color: black; font-size: 11.5pt;">Statistics are derived by collating data from multiple sources that first need to be made comparable. For example, in Germany the first population census in over 20 years will be based on administrative registers owned by regional and municipal statistical offices throughout the country. The registers are maintained independently and don’t all necessarily define citizens in the same way. The data needs to be </span><span style="font-family: &quot;Arial&quot;,&quot;sans-serif&quot;; font-size: 11.5pt;">harmonized<span style="color: black;"> into a coherent whole. That means that the census project needs to put together datasets from hundreds of disparate sources. Those familiar with fitting just two or three datasets together will have a feel for how painful this can be.</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 10pt;"><span style="font-family: &quot;Arial&quot;,&quot;sans-serif&quot;; color: black; font-size: 11.5pt;">If integrators and providers agree on data standards up front, it avoids the huge challenge that the German census team will have to confront. Part of the </span><span style="font-family: &quot;Arial&quot;,&quot;sans-serif&quot;; font-size: 11.5pt;"><a href="http://www.sdmx.org/">SDMX</a><span style="color: black;"> (an open data standard) model are specifically dedicated to rigorously defining such agreements. Success stories such as the </span><a href="http://www.jedh.org/">Joint External Debt Hub</a><span style="color: black;">, and the pilot for the </span><a href="http://www.slideshare.net/patruno/census-hub-project">European census hub</a><span style="color: black;">, back up the fact that it can work. These projects are built on top of agreements to adhere to certain data and metadata structure definitions (SDs), where each provider is responsible for harmonizing the data they contribute. The advantage of this scenario is that the resulting data has broader applicability, and multiple different consumers of the data are not required to perform the same harmonization work (often with varying degrees of quality). Unfortunately, getting agreement on SDs can take a very long time. It might be as little as a few days for a small, </span><a href="http://stats.oecd.org/glossary/detail.asp?ID=7029">bilateral exchange</a><span style="color: black;">, but it could also be years for a complex agreement between major organizations. </span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 10pt;"><span style="font-family: &quot;Arial&quot;,&quot;sans-serif&quot;; color: black; font-size: 11.5pt; mso-fareast-font-family: 'Times New Roman'; mso-fareast-theme-font: minor-fareast; mso-ansi-language: EN-AU; mso-fareast-language: EN-AU; mso-bidi-language: AR-SA;">The diagram below is an attempt to sketch out a path toward more mature practices for data exchange and show some of the benefits gained along the way. Does it make sense to you? And where does your organization fit: are your data exchange practices in harmony, or more of a clanging dissonance? Either way, it would be great to hear about experiences in the world of data exchange from different perspectives, and if/how you think SDMX or other open data standards might help. </span></p>
<p><span style="font-family: &quot;Trebuchet MS&quot;,&quot;sans-serif&quot;; color: black; font-size: 8pt; mso-bidi-font-family: 'Times New Roman'; mso-bidi-font-size: 10.5pt; mso-fareast-font-family: 'Times New Roman'; mso-fareast-theme-font: minor-fareast; mso-bidi-theme-font: minor-bidi; mso-ansi-language: EN-AU; mso-fareast-language: EN-AU; mso-bidi-language: AR-SA;"><a href="http://www.spacetimeresearch.com/images/wordpress/uploads/2010/08/data-exchange-maturity1.jpg"><img class="aligncenter size-large wp-image-238" title="data-exchange-maturity1" src="http://www.spacetimeresearch.com/images/wordpress/uploads/2010/08/data-exchange-maturity1-1024x631.jpg" alt="data-exchange-maturity1" width="893" height="538" /></a></span></p>
<p class="MsoNormal" style="MARGIN: 0cm 0cm 10pt"><span style="FONT-FAMILY: 'Arial','sans-serif'; COLOR: black; FONT-SIZE: 11.5pt">While comprehensive </span><span style="FONT-FAMILY: 'Arial','sans-serif'; FONT-SIZE: 11.5pt"><a href="http://stats.oecd.org/glossary/detail.asp?ID=7064">multilateral exchange</a><span style="COLOR: black"> agreements facilitated by SDMX may represent the ultimate in efficiency and data quality standards, a relatively simple </span><a href="http://stats.oecd.org/glossary/detail.asp?ID=7044">data sharing exchange</a><span style="COLOR: black"> can still deliver benefits without requiring agreement between all parties. All that is required is for participants to adhere to the SDMX technical standard - something that is relatively easy to do with the right tools. Such simple beginnings can also help organizations prepare for exchange agreements to be worked out over time. Small steps or large, it is a journey worth taking.</span></span></p>
]]></content:encoded>
			<wfw:commentRss>http://www.spacetimeresearch.com/archives/232/feed.html</wfw:commentRss>
		<feedburner:origLink>http://www.spacetimeresearch.com/archives/232.html</feedburner:origLink></item>
		<item>
		<title>SDMX Web Services</title>
		<link>http://feedproxy.google.com/~r/spacetimeresearch/jkgK/~3/9fxOxc-9x-g/213.html</link>
		<comments>http://www.spacetimeresearch.com/archives/213.html#comments</comments>
		<pubDate>Wed, 09 Jun 2010 04:38:56 +0000</pubDate>
		<dc:creator>Don McIntosh</dc:creator>
		
		<category><![CDATA[General]]></category>

		<category><![CDATA[Gov2.0]]></category>

		<category><![CDATA[Quality]]></category>

		<category><![CDATA[SDMX/DDI]]></category>

		<category><![CDATA[SuperSTAR]]></category>

		<category><![CDATA[SDMX]]></category>

		<category><![CDATA[Sharing]]></category>

		<guid isPermaLink="false">http://www.spacetimeresearch.com/archives/213.html</guid>
		<description><![CDATA[Recently, many of us at STR have been working on implementing open data formats, specifically SDMX 2.1 and DDI 3.1. Both are extremely relevant for statistical processing - DDI assumes the key position for planning, data collection, processing and microdata dissemination.  SDMX is most suited for processing and dissemination of aggregated data. Previous blog posts [...]]]></description>
			<content:encoded><![CDATA[<p>Recently, many of us at STR have been working on implementing open data formats, specifically SDMX 2.1 and <a title="DDI 3.1" href="http://bit.ly/9WlrYr" target="_blank">DDI 3.1</a>. Both are highly relevant to statistical processing: DDI plays the key role in planning, data collection, processing and microdata dissemination, while <a href="http://www.sdmx.org" target="_blank">SDMX</a> is best suited to processing and disseminating aggregated data. Previous blog posts and <a title="SDMX Release Announcement" href="http://bit.ly/ctuqDf" target="_blank">news items</a> have provided an overview of SDMX to show our customers how it might help with their own business processes. This post is about what we are actually delivering in our mid-year SuperSTAR 7.0 release, which will include the following SDMX functionality:</p>
<ol>
<li>SDMX output from SuperWEB</li>
<li>Building SDMX-driven SuperVIEW interactive presentations (no SXV4 database required)</li>
<li>RESTful SDMX Web Services</li>
</ol>
<p>This post focuses on the Web Services, which are arguably the most important capability. Another reason I’m excited by them is that this is the first time SDMX has been introduced directly to microdata. I’ll explain what I mean by this a bit later.</p>
<p>From the point of view of many data providers, the advantage of the Web Services is that they can give customers <strong>just the data they need, no more and no less</strong>. This can free up staff who are currently devoted to answering ad hoc queries.</p>
<p>From the customer’s point of view, the Web Services open up new possibilities for consuming the data and building unique, useful services on top of it. For example, a third-party application can convert user responses from a Web app into dynamic SDMX queries, and then use the results to determine how the Web app should behave. Without Web Services, such an app would have had to rely on potentially stale data that had been downloaded and loaded into a local database. And thanks to the detailed SDMX data model, apps can also work out which other data sources might sensibly be combined to produce richer, more useful results.</p>
<p>The other thing I’ll mention before getting into the specifics of what we’ve done is that our implementation is a <a title="Restful API Definition" href="http://en.wikipedia.org/wiki/Representational_State_Transfer" target="_blank">RESTful API</a>, not a “traditional” Web Service. We’re glad to see this style becoming much more popular. SDMX originally defined only SOAP-based Web Services, but we’ve based our implementation on the proposed RESTful API for SDMX version 2.1. As developers, we find a RESTful API much easier to start using, explore and scale, and we think our customers will find the same.</p>
<p><strong>What we’ve done</strong></p>
<p>The SDMX API that we are focused on can be broken into three logical chunks:</p>
<ol>
<li>Metadata Discovery - what data collections are available, and what concepts/classifications are used where</li>
<li>Database Metadata Discovery - what metadata (e.g. concepts and code lists) is used within a particular SDMX dataset?</li>
<li>Queries - Defining and pulling back a slice of an SDMX data cube</li>
</ol>
<p>We’ve implemented parts 2 and 3. (We will consider part 1 for a future version, but we are also looking at filling this gap in other ways, such as leveraging existing <a href="http://stats.oecd.org/glossary/detail.asp?ID=7101" target="_blank">SDMX registries</a>, which are used to collate and manage content stored in SDMX repositories. The important point is that we don’t want SuperSTAR to be an island: many of the organisations we work with want to reuse the same search and discovery mechanism across many different types of data and applications, so we’d like to learn more about how SDMX solutions can fit into such an environment before we proceed.)</p>
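<p>To give a feel for parts 2 and 3, the Python sketch below builds request URLs in the resource-oriented style of the SDMX 2.1 RESTful interface as eventually published: a <code>datastructure</code> resource for DSDs and a <code>data</code> resource whose series key is dot-separated, with <code>+</code> joining alternative codes and an empty position acting as a wildcard. It is an illustration under assumptions rather than a description of SuperSTAR: the endpoint, agency ID and dataset IDs are hypothetical, and the proposal our implementation follows may differ in detail.</p>

```python
from urllib.parse import urlencode

# Hypothetical service endpoint; substitute a real SDMX RESTful service.
BASE_URL = "http://example.org/sdmx/rest"


def structure_url(agency_id, dsd_id, version="latest"):
    """Part 2: URL for fetching a data structure definition (DSD)."""
    return f"{BASE_URL}/datastructure/{agency_id}/{dsd_id}/{version}"


def data_url(flow_ref, key_parts, start_period=None, end_period=None):
    """Part 3: URL for fetching a slice of a dataset.

    key_parts has one entry per dimension, in DSD order: a list of codes
    to select (joined with '+'), or None to wildcard that dimension.
    """
    key = ".".join("+".join(codes) if codes else "" for codes in key_parts)
    params = {
        name: value
        for name, value in (("startPeriod", start_period), ("endPeriod", end_period))
        if value is not None
    }
    url = f"{BASE_URL}/data/{flow_ref}/{key}"
    return f"{url}?{urlencode(params)}" if params else url


# Hypothetical dataset "CENSUS" with dimensions FREQ, REGION and SEX:
# annual data, all regions, males and females, from 2006 onwards.
print(structure_url("STR", "CENSUS_DSD"))
# -> http://example.org/sdmx/rest/datastructure/STR/CENSUS_DSD/latest
print(data_url("CENSUS", [["A"], None, ["M", "F"]], start_period="2006"))
# -> http://example.org/sdmx/rest/data/CENSUS/A..M+F?startPeriod=2006
```

<p>Fetching the DSD first lets a client learn the dimension order and valid codes before it composes a data key, which is what allows an application to build such queries dynamically.</p>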
<p>Our SDMX RESTful API supports access to aggregated data managed by SuperSTAR, which can come from several different sources:</p>
<ol>
<li>SuperSTAR data cubes</li>
<li>SuperSTAR tables defined by SuperWEB users</li>
<li>SuperSTAR microdata databases</li>
</ol>
<p>The last case is worth elaborating on, and links back to the point I made earlier about introducing SDMX to microdata. Until now, SDMX has been limited to working with pre-aggregated data. This makes sense given the origins of SDMX: it was developed by a group of organizations that deal almost exclusively with aggregated statistical data, and only rarely with the underlying microdata from which the statistics are derived.</p>
<p>From our point of view, however, and I believe from the point of view of many of our customers, dealing with microdata is very much part of their production process. The benefit here is that <strong>users are not constrained to taking slices of predefined data cubes</strong>; instead, they can <strong>explore the microdata and dynamically define queries to run against it</strong>. This approach can generate <strong>orders of magnitude more possible outputs</strong>, and therefore relieve the provider of the burden of manually handling the many ad hoc queries that can’t be satisfied by slicing an existing cube. It does introduce other problems, namely confidentiality and performance, but handling these is part of our core capabilities, so our solution addresses these potential drawbacks.</p>
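<p>As a generic illustration of the kind of confidentiality check that has to run before a dynamically defined table is released (not a description of how SuperSTAR protects confidentiality), here is a minimal Python sketch of threshold-based cell suppression. The threshold of 3 and the table are hypothetical. Suppressing small cells on its own is not sufficient in practice, because suppressed values can sometimes be recovered from totals or by differencing overlapping tables.</p>

```python
# Generic threshold-based cell suppression; the threshold is a hypothetical value.
MIN_CELL_COUNT = 3


def suppress_small_cells(table, threshold=MIN_CELL_COUNT):
    """Return a copy of the table with counts below the threshold replaced by None."""
    return {
        cell: (count if count >= threshold else None)
        for cell, count in table.items()
    }


# Hypothetical table of counts keyed by (region, sex)
table = {("North", "F"): 12, ("North", "M"): 2, ("South", "F"): 7}
print(suppress_small_cells(table))
# {('North', 'F'): 12, ('North', 'M'): None, ('South', 'F'): 7}
```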
<p>To make it possible to use an SDMX-based API to run tabulation queries against microdata, we’ve made some <strong>necessary innovations</strong> to the SDMX standard. Firstly, while you can query for the data structure definition (DSD) of a very large virtual cube (which is actually a SuperSTAR database), we prevent clients from requesting the full dataset for this cube, because it would simply be too big. Instead, we allow any subset of the dimensions in the DSD to be combined in an SDMX query.</p>
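<p>One way to picture this is as an on-demand cross-tabulation: the client names some of the dimensions from the DSD, and the server counts unit records for every combination of their values. The minimal Python sketch below assumes in-memory records and a simple rule that rejects a request for every dimension at once; both the data and that rule are hypothetical stand-ins for real size and performance limits.</p>

```python
from collections import Counter

# Hypothetical unit records (microdata); each dict describes one person.
MICRODATA = [
    {"region": "North", "sex": "F", "age_group": "0-14"},
    {"region": "North", "sex": "M", "age_group": "15-64"},
    {"region": "South", "sex": "F", "age_group": "15-64"},
    {"region": "South", "sex": "F", "age_group": "15-64"},
]

# Dimensions declared in the DSD of the (virtual) cube
DSD_DIMENSIONS = ("region", "sex", "age_group")


def tabulate(records, dimensions):
    """Count records for each combination of values of the requested dimensions."""
    unknown = set(dimensions) - set(DSD_DIMENSIONS)
    if unknown:
        raise ValueError(f"unknown dimensions: {sorted(unknown)}")
    if set(dimensions) == set(DSD_DIMENSIONS):
        # Hypothetical stand-in for refusing the full (too large) cube
        raise ValueError("requesting every dimension at once is not allowed")
    return Counter(tuple(record[d] for d in dimensions) for record in records)


# A two-dimensional slice: counts by region and sex
print(dict(tabulate(MICRODATA, ["region", "sex"])))
# {('North', 'F'): 1, ('North', 'M'): 1, ('South', 'F'): 2}
```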
<p>In addition, any tables that a user defines in SuperWEB can be accessed as SDMX datasets; both the DSD and the data from such a table can be obtained through queries against the SDMX RESTful API.</p>
<p>If you’ve read this whole post, you must be interested in what we are doing here. We think the API can be very useful for many of our customers, so please leave a comment if you have a question or something to say. Or, if you want to go one step further, let us know and we’ll discuss providing you with a test package so you can try the API against your own data.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.spacetimeresearch.com/archives/213/feed.html</wfw:commentRss>
		<feedburner:origLink>http://www.spacetimeresearch.com/archives/213.html</feedburner:origLink></item>
		<item>
		<title>Embracing Advanced Visualization - apps4NSW Comp entries</title>
		<link>http://feedproxy.google.com/~r/spacetimeresearch/jkgK/~3/_K7t6VjgQW4/209.html</link>
		<comments>http://www.spacetimeresearch.com/archives/209.html#comments</comments>
		<pubDate>Fri, 26 Mar 2010 05:46:15 +0000</pubDate>
		<dc:creator>Jo Deeker</dc:creator>
		
		<category><![CDATA[Gov2.0]]></category>

		<category><![CDATA[Mapping]]></category>

		<category><![CDATA[Mashup]]></category>

		<category><![CDATA[Privacy protection]]></category>

		<category><![CDATA[Quality]]></category>

		<category><![CDATA[SuperVIEW]]></category>

		<category><![CDATA[apps4NSW]]></category>

		<category><![CDATA[competition]]></category>

		<category><![CDATA[Open data]]></category>

		<category><![CDATA[transparency]]></category>

		<guid isPermaLink="false">http://www.spacetimeresearch.com/archives/209.html</guid>
		<description><![CDATA[Space-Time Research have developed two entries for the apps4NSW competition (for New South Wales, Australia) using SuperVIEW.  The apps4NSW competition, like the Mashup Australia and Apps For Democracy competitions, invited the public to submit ideas and applications that would benefit the citizens of New South Wales.
I&#8217;m excited about our two applications because they are genuinely [...]]]></description>
			<content:encoded><![CDATA[<p>Space-Time Research have developed two entries for the apps4NSW competition (for New South Wales, Australia) using SuperVIEW.  The apps4NSW competition, like the Mashup Australia and Apps For Democracy competitions, invited the public to submit ideas and applications that would benefit the citizens of New South Wales.</p>
<p>I&#8217;m excited about our two applications because they are genuinely useful online interactive publications of complex data that everyone will benefit from.  Our <a title="Why Australians Travel" href="http://bit.ly/aVH3xF" target="_blank">Why Australians Travel</a> application presents a dataset from Tourism Research Australia that has not been made available to the public in an interactive way before.  It also includes advanced visualization in the form of a Motion Chart (Gapminder-style) which we&#8217;re very excited by! The motion chart can tell a story with data over time that you simply don&#8217;t see in static tables or reports.</p>
<p>The <a title="How Safe Is Your Suburb 2-0" href="http://bit.ly/aVH3xF" target="_blank">How Safe Is Your Suburb 2.0</a> application presents NSW crime data interactively, allowing users to analyse relative or absolute crime rates by suburb. This application is supported by one of our newest features, metadata, which provides explanations that help users understand the meaning of the data.</p>
<p>Go check our applications out and vote for us if you like them!  And if you have any feedback on our entries please don&#8217;t hesitate to make a comment on our blog here.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.spacetimeresearch.com/archives/209/feed.html</wfw:commentRss>
		<feedburner:origLink>http://www.spacetimeresearch.com/archives/209.html</feedburner:origLink></item>
		<item>
		<title>Gov 2.0 Radio Interview: The Future of Privacy</title>
		<link>http://feedproxy.google.com/~r/spacetimeresearch/jkgK/~3/mkRgGoJsPpA/203.html</link>
		<comments>http://www.spacetimeresearch.com/archives/203.html#comments</comments>
		<pubDate>Thu, 18 Mar 2010 05:05:56 +0000</pubDate>
		<dc:creator>Jo Deeker</dc:creator>
		
		<category><![CDATA[Gov2.0]]></category>

		<category><![CDATA[Privacy protection]]></category>

		<category><![CDATA[Gov 2.0]]></category>

		<category><![CDATA[Open data]]></category>

		<category><![CDATA[privacy]]></category>

		<guid isPermaLink="false">http://www.spacetimeresearch.com/archives/203.html</guid>
		<description><![CDATA[Don McIntosh was recently a guest on Gov 2.0 Radio discussing the future of Privacy and how it relates to data.
Said Don:
“Many people, especially Gen Y, have the view that privacy is not an issue for them and to quote Eric Schmidt, ‘If you have something that you don’t want anyone to know, maybe you [...]]]></description>
			<content:encoded><![CDATA[<p>Don McIntosh was recently a guest on Gov 2.0 Radio discussing the future of privacy and how it relates to data.</p>
<p>Said Don:<br />
“Many people, especially Gen Y, have the view that privacy is not an issue for them and to quote Eric Schmidt, ‘If you have something that you don’t want anyone to know, maybe you shouldn’t be doing it in the first place.’ I much prefer the view of Bruce Schneier, who is pretty much the world’s leading expert in information security, who points out in an excellent essay very clearly that people espousing that view ‘… accept the premise that privacy is about hiding a wrong. It’s not. Privacy is an inherent human right, and a requirement for maintaining the human condition with dignity and respect.’”</p>
<p>Click <a href="http://bit.ly/daGPRc">here </a>to listen to the podcast.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.spacetimeresearch.com/archives/203/feed.html</wfw:commentRss>
		<feedburner:origLink>http://www.spacetimeresearch.com/archives/203.html</feedburner:origLink></item>
		<item>
		<title>Introducing SuperVIEW Collaboration</title>
		<link>http://feedproxy.google.com/~r/spacetimeresearch/jkgK/~3/jPCJLW7kboY/193.html</link>
		<comments>http://www.spacetimeresearch.com/archives/193.html#comments</comments>
		<pubDate>Wed, 03 Feb 2010 05:30:12 +0000</pubDate>
		<dc:creator>Jo Deeker</dc:creator>
		
		<category><![CDATA[General]]></category>

		<category><![CDATA[Gov2.0]]></category>

		<category><![CDATA[Privacy protection]]></category>

		<category><![CDATA[SuperSTAR]]></category>

		<category><![CDATA[SuperVIEW]]></category>

		<category><![CDATA[Collaboration]]></category>

		<category><![CDATA[Gov 2.0]]></category>

		<category><![CDATA[Interactive]]></category>

		<category><![CDATA[Publication]]></category>

		<category><![CDATA[Sharing]]></category>

		<category><![CDATA[Social Media]]></category>

		<category><![CDATA[Social Networking]]></category>

		<category><![CDATA[Twitter]]></category>

		<category><![CDATA[visualization]]></category>

		<guid isPermaLink="false">http://www.spacetimeresearch.com/archives/193.html</guid>
		<description><![CDATA[SuperVIEW is our solution for Interactive Publication, Exploration &#38; Visualization of Public Data. Our latest version has a new collaboration feature that we want to share with you.
Using our new SuperVIEW Collaboration features, you can make comments or invite others to make comments on your visualizations using Google Friend Connect.  You can also share your [...]]]></description>
			<content:encoded><![CDATA[<p>SuperVIEW is our solution for Interactive Publication, Exploration &amp; Visualization of Public Data. Our latest version has a new collaboration feature that we want to share with you.</p>
<p>Using our new SuperVIEW Collaboration features, you can comment, or invite others to comment, on your visualizations using Google Friend Connect. You can also share your customized visualization with others using our new Share feature, which lets you embed a link to your view in a website, blog, Facebook, Twitter or other favorite social networking application.</p>
<p>Recently Craig Thomler, a well-known active participant and leader in the Australian Gov2.0 movement, wrote a <a href="http://bit.ly/aRwXgn">blog post</a> on the new <a href="http://data.gov.uk/">data.gov.uk</a> site, which he considers the world leader among open data websites. He then sets out a wishlist of changes that could make the Australian <a href="http://data.australia.gov.au/">data.australia.gov.au</a> site the best in the world. Some of what he asks for is delivered by SuperVIEW right now, including the ability for people to embed visualizations in their own sites, and for every dataset to support a discussion where people can ask questions to clarify what the dataset contains and discuss how it could be presented in a more usable way.</p>
<p>View this video to see SuperVIEW Collaboration in action.</p>
<p style="text-align: center;"><object width="425" height="344" data="http://www.youtube.com/v/udAsGd-p2Pk&amp;hl=en_US&amp;fs=1&amp;" type="application/x-shockwave-flash"><param name="allowFullScreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="src" value="http://www.youtube.com/v/udAsGd-p2Pk&amp;hl=en_US&amp;fs=1&amp;" /><param name="allowfullscreen" value="true" /></object></p>
<p>If you have any questions about SuperVIEW please contact  <a href="mailto:jo.deeker@spacetimeresearch.com">jo.deeker@spacetimeresearch.com</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.spacetimeresearch.com/archives/193/feed.html</wfw:commentRss>
		<feedburner:origLink>http://www.spacetimeresearch.com/archives/193.html</feedburner:origLink></item>
		<item>
		<title>Do government agencies know enough about the limits of anonymization?</title>
		<link>http://feedproxy.google.com/~r/spacetimeresearch/jkgK/~3/6NNi9LHCUqE/187.html</link>
		<comments>http://www.spacetimeresearch.com/archives/187.html#comments</comments>
		<pubDate>Mon, 18 Jan 2010 06:49:57 +0000</pubDate>
		<dc:creator>Don McIntosh</dc:creator>
		
		<category><![CDATA[Gov2.0]]></category>

		<category><![CDATA[Privacy protection]]></category>

		<category><![CDATA[Uncategorized]]></category>

		<category><![CDATA[anonymization]]></category>

		<category><![CDATA[Open data]]></category>

		<guid isPermaLink="false">http://www.spacetimeresearch.com/archives/187.html</guid>
		<description><![CDATA[Are US government agencies aware that anonymization is often not a sufficiently robust method of protecting privacy? The tight deadline for releasing more datasets in line with the Open Government Directive may cause some agencies to cut corners or overlook potential privacy issues.]]></description>
			<content:encoded><![CDATA[<p><span style="font-family: Arial; color: black;"><span style="font-size: small;">A new wave of open government data is scheduled to crash over the US on January 22 as a result of the government&#8217;s </span><a href="http://www.whitehouse.gov/open/documents/open-government-directive" target="_blank"><span style="font-size: small;">Open Government Directive</span></a><span style="font-size: small;">. Is the government paying enough attention to the data privacy issues this deluge could trigger? And how aware are agencies of the well-established fact that anonymizing data is often an inadequate means of protecting privacy in public sector information, and that in many cases the data needs more “scrubbing” before any part of it can be safely released for public use?</span></span></p>
<p><span style="font-size: small;"><span style="font-family: Arial; color: black;">Until recently, many government agencies have not been motivated to provide data transparency. Compared with work that directly aligns with their mission and funding, being a visionary supporter of the principles of transparent government is not high on the agenda. In fact, in many cases the message from on high hasn&#8217;t reached them at all (one senior US government official&#8217;s take on Gov 2.0 was &#8220;oh, that&#8217;s a subset of Web 2.0 isn&#8217;t it?&#8221;). Add to this reluctance some significant disincentives, such as the risks of being too transparent, inadvertent privacy breaches, and plain and simple costs, and it&#8217;s not surprising that the average department hasn&#8217;t been as enthusiastic as the Gov 2.0 activist community might like. And if the return on investment often accrues outside the agency, why bother?</span></span></p>
<p><span style="font-family: Arial; color: black;"><span style="font-size: small;">Well, there&#8217;s nothing like a </span><a title="http://www.whitehouse.gov/omb/assets/memoranda_2010/m10-06.pdf" href="http://www.whitehouse.gov/omb/assets/memoranda_2010/m10-06.pdf" target="_blank"><span style="color: #664d9f;"><span style="font-size: small;">directive</span></span></a><span style="font-size: small;"> straight from the top to get things moving. As of December 8, U.S. federal agencies had 45 days to get three &#8220;high-value datasets&#8221; published online and available through </span><a title="http://data.gov/" href="http://data.gov/" target="_blank"><span style="color: #664d9f;"><span style="font-size: small;">data.gov</span></span></a><span style="font-size: small;">. Wow! Having worked with national statistics agencies for many years, I have some idea of how long they typically take to publish data, and it is often longer than this, especially for data that has not been published before. Of course, in some cases the data might be basic lists of non-sensitive material, which may not take much extra work to make suitable for public access. What I want to examine is what it will take for agencies that don&#8217;t have it so easy: those that will need to derive statistics from their data, or reduce it in some way, to make it &#8220;safe&#8221; for public consumption.</span></span></p>
<p><span style="font-size: small;"><span style="font-family: Arial; color: black;">Firstly, why bother publishing statistics if the raw data is available? Isn&#8217;t the open data community interested in getting &#8220;raw data now&#8221;, because that is quick for the agency and promises maximum flexibility for users? The reality in many cases, and one that some people who work in Information Management still seem to ignore, is that &#8220;de-identifying&#8221; data by stripping obviously identifying attributes such as names, addresses and SSNs does not necessarily protect privacy. It can still be a fairly trivial exercise for an ill-intentioned data analyst, or in many cases even a non-technical person, to re-identify many of the people in the list. That is why we will often see statistics about the data being released, rather than the raw data itself.</span></span></p>
<p><span style="font-family: Arial; color: black;"><span style="font-size: small;">Associate Professor of Law Paul Ohm from the University of Colorado released a </span><a title="http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1450006#621786" href="http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1450006#621786" target="_blank"><span style="color: #664d9f;"><span style="font-size: small;">paper</span></span></a><span style="font-size: small;"> about the &#8220;Surprising Failure of Anonymization&#8221; last year, citing some prominent cases where anonymized data was re-identified and pointing out that many laws and regulations rest on the false assumption that anonymization is a panacea for protecting data privacy. In one example he describes, a researcher demonstrated that 87.1% of people in the U.S. were uniquely identified by the combination of their ZIP code, birth date, and sex. He also covers the </span><a title="http://en.wikipedia.org/wiki/AOL_search_data_scandal" href="http://en.wikipedia.org/wiki/AOL_search_data_scandal" target="_blank"><span style="color: #664d9f;"><span style="font-size: small;">AOL search data scandal</span></span></a><span style="font-size: small;">, where individuals were identified from vast volumes of data by their distinctive search habits, uncovering some embarrassing personal information along the way. </span></span></p>
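<p>The ZIP code, birth date and sex example works because attributes that look harmless on their own (often called quasi-identifiers) can single a person out in combination, and anyone holding another dataset with the same attributes, such as a voter roll, can link a unique record back to a name. Before release, an agency can at least measure this risk. The minimal Python sketch below, using made-up records, reports the share of records that are unique on a chosen set of quasi-identifiers, along with the smallest group size (the &#8220;k&#8221; in k-anonymity). It is a screening check only, not a complete disclosure-risk assessment.</p>

```python
from collections import Counter


def uniqueness_report(records, quasi_identifiers):
    """Return (fraction of records unique on the quasi-identifiers, smallest group size)."""
    keys = [tuple(record[q] for q in quasi_identifiers) for record in records]
    group_sizes = Counter(keys)
    unique = sum(1 for key in keys if group_sizes[key] == 1)
    fraction_unique = unique / len(keys) if keys else 0.0
    k = min(group_sizes.values()) if group_sizes else 0
    return fraction_unique, k


# Made-up "de-identified" records: names and SSNs removed,
# but ZIP code, birth date and sex retained alongside sensitive data.
records = [
    {"zip": "80302", "birth_date": "1970-01-15", "sex": "F", "diagnosis": "A"},
    {"zip": "80302", "birth_date": "1970-01-15", "sex": "F", "diagnosis": "B"},
    {"zip": "80302", "birth_date": "1982-07-04", "sex": "M", "diagnosis": "C"},
    {"zip": "10001", "birth_date": "1990-03-30", "sex": "F", "diagnosis": "D"},
]

print(uniqueness_report(records, ["zip", "birth_date", "sex"]))  # (0.5, 1)
```

<p>Here half of the records are unique on those three attributes, so k is 1: removing names alone has left those people exposed to linkage.</p>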
<p><span style="font-family: Arial; color: black;"><span style="font-size: small;">While individual agencies may not all have a clear understanding of the potential privacy issues related to open data, at least the federal administration is paying attention to them. The directive itself states that data can only be made available &#8220;subject to valid privacy, confidentiality &#8230; restrictions&#8221;. In addition, the &#8220;</span><a title="http://datagov.ideascale.com/a/panel.do?id=6440" href="http://datagov.ideascale.com/a/panel.do?id=6440" target="_blank"><span style="color: #664d9f;"><span style="font-size: small;">Concept of Operations</span></span></a><span style="font-size: small;">&#8221; paper for </span><a title="http://data.gov/" href="http://data.gov/" target="_blank"><span style="color: #664d9f;"><span style="font-size: small;">data.gov</span></span></a><span style="font-size: small;"> does have privacy in its sights, stating that working groups will look into privacy issues arising from how data is mashed up and/or used in applications. I would point out that these groups could get a head start simply by reading Paul Ohm&#8217;s paper, rather than waiting until this round of data has been released. For the moment at least, it seems that what constitutes adequate privacy protection for open data is up to each agency to decide.</span></span></p>
<p><span style="font-size: small;"><span style="font-family: Arial; color: black;">While the working groups deliberate over how to address privacy issues arising from data mashups and the like, many datasets will be posted to data.gov. Despite the proven limits of anonymization, the experience my colleagues and I have gained from talking with people who work in Information Management in government is that key staff in at least some agencies are not sufficiently aware of these limits, and believe that anonymization is essentially all that is needed to make data safe for release. I&#8217;d be interested to know whether this agrees with others&#8217; observations.</span></span></p>
<p><span style="font-family: Arial; color: black;"><span style="font-size: small;">My observation regarding government&#8217;s understanding of data privacy issues is based largely on anecdotal evidence collected by my colleagues and me. Perhaps I am overstating things, and agencies do have the skills and knowledge required to release data safely. It would be good to hear how different agencies are dealing with the Open Government Directive, and what you think about the challenges of releasing useful data without unduly compromising privacy.</span></span></p>
<p><span style="font-family: Arial; color: black;"><span style="font-size: small;">Note: Ohm&#8217;s paper is fairly lengthy. For a very interesting summary of it, check out </span><a title="http://arstechnica.com/tech-policy/news/2009/09/your-secrets-live-online-in-databases-of-ruin.ars" href="http://arstechnica.com/tech-policy/news/2009/09/your-secrets-live-online-in-databases-of-ruin.ars" target="_blank"><span style="color: #664d9f;"><span style="font-size: small;">this post</span></span></a><span style="font-size: small;"> on Ars Technica, which sparked a lot of debate about the importance of privacy.</span></span></p>
]]></content:encoded>
			<wfw:commentRss>http://www.spacetimeresearch.com/archives/187/feed.html</wfw:commentRss>
		<feedburner:origLink>http://www.spacetimeresearch.com/archives/187.html</feedburner:origLink></item>
	</channel>
</rss>

