<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">

<channel>
	<title>Strata</title>
	
	<link>http://strata.oreilly.com</link>
	<description>Making Data Work</description>
	<lastBuildDate>Wed, 19 Jun 2013 21:11:46 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.5.1</generator>
		<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.feedburner.com/oreilly/strata" /><feedburner:info uri="oreilly/strata" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><item>
		<title>Visualization of the Week: Booze, tipple, imbibe, or whatever you’d like to call it</title>
		<link>http://feedproxy.google.com/~r/oreilly/strata/~3/blJHeQ9226I/visualization-of-the-week-booze-tipple-imbibe-or-whatever-youd-like-to-call-it.html</link>
		<comments>http://strata.oreilly.com/2013/06/visualization-of-the-week-booze-tipple-imbibe-or-whatever-youd-like-to-call-it.html#comments</comments>
		<pubDate>Wed, 19 Jun 2013 21:00:18 +0000</pubDate>
		<dc:creator>Ann Spencer</dc:creator>
				<category><![CDATA[Data]]></category>
		<category><![CDATA[data visualization]]></category>

		<guid isPermaLink="false">http://strata.oreilly.com/?p=58186</guid>
		<description><![CDATA[Sooooooo. This is what happens when Jenn Webb attends Velocity this week and I briefly step in to cover the Strata Visualization of the Week element. I could have focused on the Governments Search for Google Data visualization from Chris &#8230; ]]></description>
				<content:encoded><![CDATA[<p>Sooooooo. This is what happens when <a href="http://strata.oreilly.com/jennw">Jenn Webb</a> attends <a href="http://velocityconf.com/velocity2013/public/content/home">Velocity</a> this week and I briefly step in to cover the <em>Strata Visualization of the Week</em> element.</p>
<p>I could have focused on the <a href="http://online.wsj.com/article/SB10001424127887323949904578539680253107690.html">Governments Search for Google Data</a> visualization from Chris Canipe and Madeline Farbman of the <a href="http://online.wsj.com/public/page/news-interactive-features-trends.html">Wall Street Journal</a>. Or, I could have focused on <a href="http://www.fastcompany.com/3013208/these-amazing-twitter-metadata-visualizations-will-blow-your-mind">Neal Ungerleider&#8217;s piece</a> that covers Eric Fisher and MapBox for Gnip&#8217;s Twitter metadata visualizations. Yet, my curiosity took over once I came across <a href="http://www.economist.com/">The Economist&#8217;s</a> <a href="http://www.economist.com/blogs/graphicdetail/2013/06/daily-chart-9">High Spirits</a> graphic. Not only do I make my own bitters, which qualifies me for preliminary booze nerd status, but I also needed a brief break from the transparency issues currently dominating data-oriented conversations. Following my booze nerd curiosity led me to <a href="http://virostatiq.com/data/Sigma.js/sigmacocktaibrands/network/">this interactive data visualization of common cocktail ingredients</a>:</p>
<div id="attachment_58193" class="wp-caption aligncenter" style="width: 650px"><a href="http://virostatiq.com/data/Sigma.js/sigmacocktaibrands/network/"><img class=" wp-image-58193 " alt="Analysis of 25,000 recipes from drinksnation.com and drinksmixer.com" src="http://s.radar.oreilly.com/wp-files/5/2013/06/Screen-Shot-2013-06-19-at-12.51.54-PM-1024x647.png" width="640" height="404" /></a><p class="wp-caption-text">Analysis of 25,000 recipes from drinksnation.com and drinksmixer.com</p></div>
<p><span id="more-58186"></span></p>
<p>Be careful though&#8230;it is easy to get sucked into it. I must have spent about 10 minutes before I realized that I should revisit it later.</p>
<p>Yet, it was a good 10-minute break.</p>
<p><a href="http://virostatiq.com/data-driven-drinking-cocktail-ingredients-and-beverage-brands/"><em>Hat tip to Marko Plahuta of Virostatiq for highlighting this.</em></a></p>
<p><strong>More visualizations:</strong></p>
<ul>
<li><a href="http://strata.oreilly.com/2013/06/visualization-of-the-week-nyc-citi-bike-use-in-real-time.html">NYC Citi Bike</a></li>
<li><a href="http://strata.oreilly.com/2013/06/visualization-of-the-week-stravinskys-the-rite-of-spring.html">Stravinsky&#8217;s &#8220;The Rite of Spring&#8221;</a></li>
<li><a href="http://strata.oreilly.com/2013/05/visualization-of-the-week-hospital-procedure-charges-across-the-u-s-compared.html">Hospital procedure charges across the U.S., compared</a></li>
<li><a href="http://strata.oreilly.com/2013/05/visualization-of-the-week-cia-rendition-flights-of-terror-suspects.html">CIA rendition flights of terror suspects</a></li>
<li><a href="http://strata.oreilly.com/2013/05/visualization-of-the-week-real-time-wikipedia-edits.html">Real-time Wikipedia edits</a></li>
</ul>
<div style="float: left;border-top: thin gray solid;border-bottom: thin gray solid;padding: 20px;margin: 20px 2px;clear: both">
<p><a href="http://strataconf.com/?intcmp=il-strata-stny13-blog-promo"><img style="float: left;border: none;padding-right: 10px" alt="" src="http://cdn.oreilly.com/radar/images/promos/2013-strata-rx-london-ny.gif" /></a><a href="http://strataconf.com/?intcmp=il-strata-stny13-blog-promo"><strong>O&#8217;Reilly Strata Conference</strong></a> — Strata brings together the leading minds in data science and big data — decision makers and practitioners driving the future of their businesses and technologies. Get the skills, tools, and strategies you need to make data work.</p>
<p><a href="http://strataconf.com/rx2013?intcmp=il-strata-strx13-strata-blog-banner-148x178">Strata Rx Health Data Conference</a>: September 25-27 | Boston, MA<br />
<a href="http://strataconf.com/stratany2013?intcmp=il-strata-stny13-blog-promo">Strata + Hadoop World</a>: October 28-30 | New York, NY<br />
<a href="http://strataconf.com/strataeu2013/?intcmp=il-strata-steu13-blog-promo">Strata in London</a>: November 15-17 | London, England</p>
</div>
<img src="http://feeds.feedburner.com/~r/oreilly/strata/~4/blJHeQ9226I" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://strata.oreilly.com/2013/06/visualization-of-the-week-booze-tipple-imbibe-or-whatever-youd-like-to-call-it.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://strata.oreilly.com/2013/06/visualization-of-the-week-booze-tipple-imbibe-or-whatever-youd-like-to-call-it.html</feedburner:origLink></item>
		<item>
		<title>HBase looks more appealing to data scientists</title>
		<link>http://feedproxy.google.com/~r/oreilly/strata/~3/nTXR8hBZD-I/hbase-looks-more-appealing-to-data-scientists.html</link>
		<comments>http://strata.oreilly.com/2013/06/hbase-looks-more-appealing-to-data-scientists.html#comments</comments>
		<pubDate>Sun, 16 Jun 2013 16:00:07 +0000</pubDate>
		<dc:creator>Ben Lorica</dc:creator>
				<category><![CDATA[Data]]></category>
		<category><![CDATA[data science]]></category>
		<category><![CDATA[data scientist]]></category>
		<category><![CDATA[Hadoop ecosystem]]></category>
		<category><![CDATA[Hadoop query]]></category>
		<category><![CDATA[HBase]]></category>

		<guid isPermaLink="false">http://strata.oreilly.com/?p=58099</guid>
		<description><![CDATA[When Hadoop users need to develop apps that are &#8220;latency sensitive&#8221;, many of them turn to HBase. Its tight integration with Hadoop makes it a popular data store for real-time applications. When I attended the first HBase conference last year, &#8230; ]]></description>
				<content:encoded><![CDATA[<p>When Hadoop users need to develop apps that are &#8220;latency sensitive&#8221;, many of them turn to HBase<sup>1</sup>. Its tight integration with Hadoop makes it a popular data store for real-time applications. When I attended the first HBase conference last year, <a href="http://practicalquant.blogspot.com/2012/05/much-to-like-about-hbasecon.html">I was pleasantly surprised</a> by the diversity of companies and applications that rely on HBase. <a href="http://www.hbasecon.com/schedule/">This year&#8217;s conference was even bigger</a> and I ran into attendees from a wide range of companies. Another set of interesting real-world case studies was showcased, along with sessions highlighting work by the <a href="http://hbase.apache.org/team-list.html">HBase team</a> aimed at improving usability, reliability, and <i>availability</i> (bringing down <i>mean time to recovery</i> has been a recent area of focus).</p>
<p><img src="http://4.bp.blogspot.com/-asDzOl6HmzI/UbtRt2uHRjI/AAAAAAAADOw/frCcossla3k/s1600/hbase-loc.jpg" alt="HBase: lines of code" height="214" width="450"></p>
<p>HBase has had a reputation of being a bit difficult to use &#8211; its core users have been <i>data engineers</i>, not <i>data scientists</i>. The good news is that as HBase gets adopted by more companies, tools are being developed to open it up to more users. Let me highlight some tools that will appeal to data scientists. </p>
<p><span id="more-58099"></span></p>
<p><b>SQL Interfaces</b><br />
Any data store that wants to appeal to data scientists and business analysts needs to be accessible via SQL. Judging from the number of sessions<sup>2</sup> devoted to SQL, HBase users already have several options to choose from (with more to follow in the near future). Widely used inside Salesforce, <a href="https://github.com/forcedotcom/phoenix">Phoenix</a> is a relatively new open source project that targets interactive analysis. Phoenix is an embedded JDBC driver, and it compiles SQL to <a href="https://github.com/forcedotcom/phoenix/wiki/Performance">optimized</a> native HBase calls. Generally available since the start of May, the open source, distributed query execution engine <a href="https://github.com/cloudera/impala">Impala</a> supports both HDFS and HBase.</p>
<p><b>Model development, deployment, and maintenance</b><br />
I <a href="http://strata.oreilly.com/2013/06/its-getting-easier-to-build-big-data-applications.html">recently highlighted</a> the growing number of open source <i>analytic engines</i> that make it easier to develop big data applications. Beyond these engines, <a href="https://github.com/cloudera/cdk">tools for developing apps on top of Hadoop</a> are also starting to emerge. The <a href="http://www.kiji.org/">kiji project</a> is a <em>framework</em><sup>3</sup> that makes big data applications easier to develop, maintain, and deploy. With kiji, developers don&#8217;t need to concern themselves with serialization, schema evolution, and other low-level details. As an example, <a href="https://github.com/kijiproject/kiji-express">kiji-express</a> lets data scientists encode <a href="http://docs.kiji.org/tutorials/express-recommendation/0.4.0/express-overview/">algorithms in Scalding</a>. Data scientists can also develop models in other frameworks and import <a href="http://en.wikipedia.org/wiki/Predictive_Model_Markup_Language">PMML</a> files into kiji.</p>
<p>While there are many frameworks for developing models and algorithms, there are far fewer tools<sup>4</sup> for deploying and maintaining algorithms in &#8220;production&#8221;. Data scientists usually have to explain their models to engineers who rewrite them for production environments (a process that takes weeks/months in some companies). Just as important, once algorithms are deployed &#8220;in the wild&#8221;, they need to be <a href="http://www.quora.com/What-are-the-keys-to-operationalizing-a-machine-learning-ranking-system-from-an-organization-engineering-management-point-of-view">carefully maintained and monitored</a> &#8211; e.g., models that are severely underperforming need to be revisited. At this stage, the kiji project has tools for integrating model development (<a href="https://github.com/kijiproject/kiji-express">kiji-express</a>) and deployment (<a href="https://github.com/kijiproject/kiji-scoring">kiji-scoring</a>). Over time, kiji will include tools for monitoring, maintaining, and combining models.</p>
<p>Model deployment and maintenance are areas that many more data scientists are paying attention to. Some choose to <a href="http://strata.oreilly.com/2013/03/data-science-tools-all-in-or-mix-and-match.html">work in a single framework</a> (such as kiji), while others piece together different tools. New workflow tools such as Chronos are allowing business analysts to <a href="http://strata.oreilly.com/2013/04/workflow-tools-enable-the-rapid-deployment-of-models.html">develop and maintain long, complex data processing pipelines</a>. I&#8217;m looking forward to seeing more tools address these critical pain points.</p>
<p><b>Related posts</b>:</p>
<ul>
<li><a href="http://strata.oreilly.com/2013/06/tracking-the-progress-of-large-scale-query-engines.html">Tracking the progress of large-scale Query Engines</a></li>
<li><a href="http://strata.oreilly.com/2013/06/its-getting-easier-to-build-big-data-applications.html">It’s getting easier to build Big Data applications</a></li>
<li><a href="http://strata.oreilly.com/2013/05/improving-options-for-unlocking-your-graph-data.html">Improving options for unlocking your graph data</a></li>
</ul>
<hr /><p><small>1. Other popular alternatives include Cassandra, Riak, MongoDB, and Accumulo.<br />
2. Besides Phoenix and Impala, there were sessions on Hive and Drill as well. In addition, I imagine <a href="https://github.com/amplab/shark/wiki">Shark</a> will support HBase sometime in the near future.<br />
3. Kiji is reminiscent of <a href="http://en.wikipedia.org/wiki/Spring_Framework">Spring</a>; its founders want to make it even easier for developers to use.<br />
4. Some companies use in-database analytics or <a href="http://strata.oreilly.com/2013/04/workflow-tools-enable-the-rapid-deployment-of-models.html">workflow tools</a> to handle this. I <a href="http://strata.oreilly.com/2013/03/python-data-tools-just-keep-getting-better.html">recently highlighted</a> Augustus, a PyData tool for developing <b>and</b> deploying models in production. SAS has a product (<a href="http://www.sas.com/technologies/analytics/modelmanager/manager/index.html">Model Manager</a>) for managing the entire analytics lifecycle.<br />
</small></p>
<img src="http://feeds.feedburner.com/~r/oreilly/strata/~4/nTXR8hBZD-I" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://strata.oreilly.com/2013/06/hbase-looks-more-appealing-to-data-scientists.html/feed</wfw:commentRss>
		<slash:comments>4</slash:comments>
		<feedburner:origLink>http://strata.oreilly.com/2013/06/hbase-looks-more-appealing-to-data-scientists.html</feedburner:origLink></item>
		<item>
		<title>Strata Week: Why we should care about what the NSA may or may not be doing</title>
		<link>http://feedproxy.google.com/~r/oreilly/strata/~3/dyiphHUMvvk/strata-week-why-we-should-care-about-what-the-nsa-may-or-may-not-be-doing.html</link>
		<comments>http://strata.oreilly.com/2013/06/strata-week-why-we-should-care-about-what-the-nsa-may-or-may-not-be-doing.html#comments</comments>
		<pubDate>Fri, 14 Jun 2013 16:30:51 +0000</pubDate>
		<dc:creator>Jenn Webb</dc:creator>
				<category><![CDATA[Data]]></category>
		<category><![CDATA[data mining]]></category>
		<category><![CDATA[data privacy]]></category>
		<category><![CDATA[DNA data]]></category>
		<category><![CDATA[Facebook]]></category>
		<category><![CDATA[NSA]]></category>
		<category><![CDATA[Open Compute]]></category>
		<category><![CDATA[surveillance]]></category>

		<guid isPermaLink="false">http://strata.oreilly.com/?p=58072</guid>
		<description><![CDATA[It&#8217;s a question of power, not privacy &#8212; and what is the NSA really doing? In the wake of the leaked NSA data-collection programs, the Pew Research Center conducted a national survey to measure American&#8217;s response. The survey found that &#8230; ]]></description>
				<content:encoded><![CDATA[<h2 id="NSA-response">It&#8217;s a question of power, not privacy &mdash; and what is the NSA really doing?</h2>
<p><div id="attachment_58074" class="wp-caption alignright" style="width: 260px"><a href="http://www.people-press.org/2013/06/10/majority-views-nsa-phone-tracking-as-acceptable-anti-terror-tactic/"><img src="http://s.radar.oreilly.com/wp-files/5/2013/06/PEWgraph.png" alt="PEW graph" width="250" height="319" class="size-full wp-image-58074" /></a><p class="wp-caption-text">Pew Research Center national survey</p></div>In the wake of the <a href="http://strata.oreilly.com/2013/06/wireless-body-area-network-iot-authenticators-nsa-surveillance.html#NSA-spies">leaked NSA data-collection programs</a>, the Pew Research Center conducted <a href="http://www.people-press.org/2013/06/10/majority-views-nsa-phone-tracking-as-acceptable-anti-terror-tactic/">a national survey</a> to measure Americans&#8217; response. The survey found that 56% of respondents think the NSA&#8217;s telephone record tracking program is an acceptable method to investigate terrorism, and 62% said the government&#8217;s investigations into possible terrorist threats are more important than personal privacy.</p>
<p><a href="http://www.theatlantic.com/technology/archive/2013/06/why-should-we-even-care-if-the-government-is-collecting-our-data/276732/">Rebecca J. Rosen at The Atlantic</a> took a look at legal scholar Daniel J. Solove&#8217;s argument that we should care about the government&#8217;s collection of our data, but not for the reasons one might think &mdash; the collection itself, he argues, isn&#8217;t as troubling as the fact that they&#8217;re holding the data in perpetuity and that we don&#8217;t have access to it. Rosen quotes Solove:</p>
<blockquote><p>&#8220;The NSA program involves a massive database of information that individuals cannot access. &#8230; This kind of information processing, which forbids people&#8217;s knowledge or involvement, resembles in some ways a kind of due process problem. It is a structural problem involving the way people are treated by government institutions. Moreover, it creates a power imbalance between individuals and the government. &#8230; This issue is not about whether the information gathered is something people want to hide, but rather about the power and the structure of government.&#8221;</p></blockquote>
<p><span id="more-58072"></span></p>
<p>In a similar vein, <a href="http://www.wired.com/opinion/2013/06/why-i-have-nothing-to-hide-is-the-wrong-way-to-think-about-surveillance/">Moxie Marlinspike at Wired</a> tackled the response from some that the NSA&#8217;s data gathering efforts aren&#8217;t worrisome if individuals have nothing to hide. He quotes from <a href="http://www.youtube.com/watch?v=6wXkI4t7nuc">James Duane</a>, a professor at Regent Law School and former defense attorney, and from <a href="http://www.law.cornell.edu/supct/html/98-93.ZD.html">Supreme Court Justice Breyer</a> to make the point that federal criminal laws span 50 titles of the United States Code in 27,000 pages and that the actual total number of laws and regulations isn&#8217;t exactly known, creating the problem of knowing what you may or may not need to hide. Marlinspike writes:</p>
<blockquote><p>&#8220;If the federal government had access to every email you&#8217;ve ever written and every phone call you&#8217;ve ever made, it&#8217;s almost certain that they could find something you&#8217;ve done which violates a provision in the 27,000 pages of federal statues or 10,000 administrative regulations. You probably do have something to hide, you just don&#8217;t know it yet.&#8221;</p></blockquote>
<p>Marlinspike also argues that having something to hide is an important part of our growth as a society&mdash;the recent legal victories in same-sex marriage and legalization of marijuana &#8220;would probably not have been possible <em>without the ability to break the law</em>,&#8221; he writes. Furthermore, he argues, a dystopian world where law enforcement is 100% effective opens wide avenues for abuse of power: &#8220;&#8230;if everyone&#8217;s every action were being monitored, and everyone technically violates some obscure law at some time, then punishment becomes purely selective,&#8221; Marlinspike writes. &#8220;Those in power will essentially have what they need to punish anyone they&#8217;d like, whenever they choose, as if there were no rules at all.&#8221; You can read his full piece <a href="http://www.wired.com/opinion/2013/06/why-i-have-nothing-to-hide-is-the-wrong-way-to-think-about-surveillance/">at Wired</a>.</p>
<p><a href="https://medium.com/prism-truth/82a1791c94d3">Mark Jaquith writes on Medium</a> that based on news reports of the PRISM program, we don&#8217;t actually know what the NSA is doing or how they&#8217;re gathering data, making it impossible for citizens to judge whether or not to be outraged. Jaquith notes the troubling lack of technical details in Glenn Greenwald and Ewan MacAskill&#8217;s account of the PRISM program <a href="http://www.guardian.co.uk/world/2013/jun/06/us-tech-giants-nsa-data">at The Guardian</a> and their <a href="http://www.guardian.co.uk/world/2013/jun/09/edward-snowden-nsa-whistleblower-surveillance">follow-up article on Edward Snowden</a>, who leaked the story. Jaquith points out that the <em>authors</em> &mdash; not Snowden &mdash; described the program as &#8220;[allowing] the agency to directly and unilaterally seize the communications off the companies&#8217; servers.&#8221; Likewise, he notes, the <a href="http://articles.washingtonpost.com/2013-06-06/news/39784046_1_prism-nsa-u-s-servers">Washington Post reported</a> that &#8220;[f]rom inside a company&#8217;s data stream the NSA is capable of pulling out anything it likes.&#8221;</p>
<p>All of which, if true, calls for outrage. But on the other side, a report <a href="http://www.nytimes.com/2013/06/08/technology/tech-companies-bristling-concede-to-government-surveillance-efforts.html?_r=0">at The New York Times</a> tells a slightly different story. Jaquith quotes from the piece:</p>
<blockquote><p>&#8220;But instead of adding a back door to their servers, the companies were essentially asked to erect a locked mailbox and give the government the key, people briefed on the negotiations said. &#8230; The data shared in these ways, the people said, is shared after company lawyers have reviewed the FISA request according to company practice. It is not sent automatically or in bulk, and the government does not have full access to company servers.&#8221;</p></blockquote>
<p>Jaquith points out this is &#8220;indirect and moderated&#8221; access &mdash; in direct opposition to the access Greenwald and MacAskill described as &#8220;direct and unilateral.&#8221; &#8220;The difference between these two explanations isn&#8217;t some nuanced distinction that only tech geeks should care about,&#8221; Jaquith says. &#8220;This is the difference between companies voluntarily giving the government direct and unilateral access to arbitrary customer data and companies merely complying with the law in a technically efficient way that doesn&#8217;t change the nature of the data received by the government.&#8221; He notes that the only way Greenwald and MacAskill can be correct at this point, without offering further corroborating evidence from Snowden, is if everyone &mdash; all the companies involved, the sources for The New York Times, the NSA, and the U.S. President &mdash; is lying. He allows that this isn&#8217;t impossible, but stresses that the technical details do matter. &#8220;There is no aspect of this story more important,&#8221; he says, &#8220;than finding out which account is accurate.&#8221; You can read his full report <a href="https://medium.com/prism-truth/82a1791c94d3">at Medium</a>.</p>
<h2 id="open-compute">Facebook opens an Open Compute data center</h2>
<p>Facebook opened its first data center in Europe this week in Lule&aring;, Sweden, equipped exclusively with its <a href="http://www.opencompute.org">Open Compute</a> servers. In <a href="https://www.facebook.com/notes/lule%25C3%25A5-data-center/lule%25C3%25A5-goes-live/474321655969861">a post announcing the launch</a>, the company described the data center as &#8220;likely to be one of the most efficient and sustainable&#8221; centers in the world and explained that the equipment is powered by 100% renewable, locally generated hydro-electric power &mdash; power so reliable, they&#8217;ve been able to &#8220;reduce the number of backup generators required at the site by more than 70 percent.&#8221;</p>
<p><a href="http://arstechnica.com/information-technology/2013/06/facebook-opens-data-center-filled-entirely-with-servers-it-designed/">Jon Brodkin notes at Ars Technica</a> that the data center&#8217;s power usage effectiveness (PUE) rating is an impressive 1.07 and that Facebook plans to post near real-time PUE data for this center, <a href="http://slashdot.org/topic/datacenter/facebook-provides-near-real-time-pue-data-for-its-data-centers/">as it does for its US data centers</a>. Brodkin reports that Facebook&#8217;s next goal is to provide companies with an alternative to Cisco and other network vendors by releasing an Open Compute design for <a href="http://arstechnica.com/information-technology/2013/05/facebook-aims-to-knock-cisco-down-a-peg-with-open-network-hardware/">a top-of-rack switch</a> that will work with any networking software.</p>
<h2 id="dna-databanking">Local police now &#8220;databanking&#8221; DNA</h2>
<p>In <a href="http://www.nytimes.com/2013/06/13/us/police-agencies-are-assembling-records-of-dna.html">a post at The New York Times</a>, Joseph Goldstein took a look at the growing DNA-gathering practices of local law enforcement agencies. Instead of waiting for state and federal agencies, local agencies are developing their own DNA databases, and their methods are causing some concern. Goldstein writes:</p>
<blockquote><p>&#8220;These local databases operate under their own rules, providing the police much more leeway than state and federal regulations. And the police sometimes collect samples from far more than those convicted of or arrested for serious offenses &mdash; in some cases, innocent victims of crimes who do not necessarily realize their DNA will be saved for future searches.&#8221;</p></blockquote>
<p>Barry Scheck, a co-director of the Innocence Project, told Goldstein that they&#8217;ve warned local law enforcement that the public would be &#8220;disturbed&#8221; when these &#8220;rogue, unregulated&#8221; databases came to light. Goldstein reports that DNA samples are being taken from people &#8220;on the mere suspicion of a crime&#8221; and entered into a database regardless of whether or not the subject was charged or found guilty. Samples also are gathered from people to rule them out of a crime &mdash; say, a homeowner burglary victim &mdash; but then kept on file. Goldstein notes that the Supreme Court&#8217;s recent decision in <em><a href="http://www.scotusblog.com/case-files/cases/maryland-v-king/">Maryland v. King</a></em> was the first to address this sort of DNA &#8220;databanking&#8221; and that it could serve to accelerate the practice. &#8220;While that decision said nothing explicit about the authority of local law enforcement to keep DNA databases,&#8221; Goldstein reports, &#8220;it could well encourage local jurisdictions to push ahead, several experts said.&#8221;</p>
<h2>Tip us off</h2>
<p>News tips and suggestions are always welcome, so please send them <a href="mailto:pitchstrata@oreilly.com">along</a>.</p>
<p><strong>Related:</strong></p>
<ul>
<li> <a href="http://strata.oreilly.com/2013/06/wireless-body-area-network-iot-authenticators-nsa-surveillance.html">The NSA is spying on us</a></li>
<li> <a href="http://strata.oreilly.com/2013/04/fbi-stingray-irs-email-liquid-robotics-data-driven-campaigns.html">Court case sheds light on FBI stingray surveillance</a></li>
<li> <a href="http://strata.oreilly.com/2013/05/us-open-data-nicole-wong-fbi-data-privacy-anonymity.html#email-privacy">FBI documents &#8220;strongly suggest&#8221; they read citizens&#8217; emails sans warrants</a></li>
<li> <a href="http://radar.oreilly.com/2011/04/facebook-open-compute-ops.html">What Facebook&#8217;s Open Compute Project means</a></li>
<li> <a href="http://strata.oreilly.com/tag/strata-week">More Strata Week coverage</a></li>
</ul>
<img src="http://feeds.feedburner.com/~r/oreilly/strata/~4/dyiphHUMvvk" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://strata.oreilly.com/2013/06/strata-week-why-we-should-care-about-what-the-nsa-may-or-may-not-be-doing.html/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		<feedburner:origLink>http://strata.oreilly.com/2013/06/strata-week-why-we-should-care-about-what-the-nsa-may-or-may-not-be-doing.html</feedburner:origLink></item>
		<item>
		<title>Data Journalists Gather, Transparency, and Data Viz</title>
		<link>http://feedproxy.google.com/~r/oreilly/strata/~3/d_h33wnRjw0/data-journalists-gather-transparency-and-data-viz.html</link>
		<comments>http://strata.oreilly.com/2013/06/data-journalists-gather-transparency-and-data-viz.html#comments</comments>
		<pubDate>Thu, 13 Jun 2013 17:00:33 +0000</pubDate>
		<dc:creator>Janaya Williams</dc:creator>
				<category><![CDATA[Data]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[data journalism]]></category>
		<category><![CDATA[data visualization]]></category>
		<category><![CDATA[databootcamp]]></category>

		<guid isPermaLink="false">http://strata.oreilly.com/?p=58044</guid>
		<description><![CDATA[Data journalism is becoming a truly global practice.  Data journalists from the UK, China, and the US are sharing data-oriented best practices, insights, and tools. Journalists in Latin America are meeting this week to push for more transparency and access &#8230; ]]></description>
				<content:encoded><![CDATA[<p>Data journalism is becoming a truly global practice.  Data journalists from the UK, China, and the US are sharing data-oriented best practices, insights, and tools. Journalists in Latin America are meeting this week to push for more transparency and access to data in the region. At the same time, recent revelations about NSA domestic surveillance programs have pushed big data stories to the front pages of US papers.  Here are a few links from the past week:</p>
<h3><strong>Transparency&#8230;or Lack Thereof</strong></h3>
<ul>
<li><strong><a href="http://www.niemanlab.org/2013/06/opendata-latinoamerica-driving-the-demand-side-of-data-and-scraping-towards-transparency/">OpenData Latinoamérica: Driving the demand side of data and scraping towards transparency</a></strong> (Nieman Journalism Lab)<br />
“There’s a saying here, and I’ll translate, because it’s very much how we work,” Miguel Paz said to me over a Skype call from Chile. “But that doesn’t mean that it’s illegal. Here, it’s ‘It’s better to ask forgiveness than to ask permission.’” Paz is a veteran of the digital news business. The saying has to do with his approach to scraping public data from governments that may be slow to share it.</li>
<li><strong><a href="http://www.zdnet.com/the-real-story-in-the-nsa-scandal-is-the-collapse-of-journalism-7000016570/">The real story in the NSA scandal is the collapse of journalism</a> </strong>(zdnet.com)<br />
On Thursday, June 6, the Washington Post <a href="http://www.washingtonpost.com/investigations/us-intelligence-mining-data-from-nine-us-internet-companies-in-broad-secret-program/2013/06/06/3a0c0da8-cebf-11e2-8845-d970ccb04497_story.html">published a bombshell of a story</a>, alleging that nine giants of the tech industry had “knowingly participated” in a widespread program by the United States National Security Agency (NSA). One day later, with no acknowledgment except for a change in the timestamp, the Post revised the story, backing down from sensational claims it made originally. But the damage was already done.</li>
<li><strong><a href="http://davidsimon.com/we-are-shocked-shocked/">We are shocked, shocked…</a></strong> (davidsimon.com)<br />
Having labored as a police reporter in the days before the Patriot Act, I can assure all there has always been a stage before the wiretap, a preliminary process involving the capture, retention and analysis of raw data. It has been so for decades now in this country. The only thing new here, from a legal standpoint, is the scale on which the FBI and NSA are apparently attempting to cull anti-terrorism leads from that data. But the legal and moral principles? Same old stuff.</li>
<li><strong><a href="http://www.pbs.org/mediashift/2013/06/big-data-has-big-stage-at-personal-democracy-forum">Big Data Has Big Stage at Personal Democracy Forum</a> </strong>(pbs.org)<br />
<a href="http://engagingnewsproject.org/home/">Engaging News Project</a>’s Talia Stroud tackled the issue of public engagement in news organizations. Polls on websites don’t yield scientifically accurate results, nor do they get people to address difficult issues, she said. “These data are junk. We know they’re junk,” Stroud said. “City council representatives know they’re junk. Even news organizations know that the results of these data are junk. The only reason that this poll is being included on the news organization’s site is to increase interactivity and increase your time on page.”</li>
</ul>
<p><span id="more-58044"></span><br />
<strong><span style="font-size: 16px">Tools and Resources</span></strong></p>
<ul>
<li><strong><a href="http://ijnet.org/blog/latin-americas-first-databootcamp-set-spark-media-innovation">Latin America&#8217;s first DataBootCamp set to spark media innovation</a></strong> (IJNet.org)<br />
Not only will the upcoming DataBootCamp of Bolivia be the first event of its kind in Latin America, it will be the highest-altitude data journalism event the world has seen so far. Journalists, programmers and designers will gather in the capital, La Paz, whose altitude reaches more than 11,800 feet (3,600 meters) above sea level. But the event itself will represent just the first step of a steep yet unstoppable climb toward using open data in the region.</li>
<li><strong><a href="http://www.youtube.com/watch?v=hwbPabP-re4">AAJA-Asia and the Journalism and Media Studies Centre at The University of Hong Kong present at the New.Now.Next Media Conference | Panel 2a, Take a Byte: Data Journalism in the Era of Big Information</a></strong> (YouTube)<br />
Reuters data expert Irene Jay Liu will talk about how she was involved in building Connected China, an app that visualises the complex networks of China&#8217;s leadership. Eric Ulken, assistant managing editor for digital at the Seattle Times, will show what smaller organizations can do to &#8220;dip a toe or two&#8221; into data journalism. AP&#8217;s Global Interactive Editor and AAJA National President, Paul Cheung, will tell us what we can really do with data, big or small. Moderator: Henry Williams, data journalist</li>
</ul>
<h3><strong><span style="font-size: medium"><span style="line-height: 24px">Data Visualization </span></span></strong></h3>
<ul>
<li><strong><a href="http://www.journalism.co.uk/news/why-the-global-mail-has-sharpened-focus-on-data-and-investigations/s2/a553205/">Why The Global Mail has focused in on data and investigations</a> </strong>(journalism.co.uk)<br />
The not-for-profit investigative outlet launched data visualisation &#8220;Behind the Wire&#8221; today, as part of a new focus on data and &#8216;more editorially ambitious&#8217; stories. Part of the evolution of Behind The Wire will rely on users actively engaging with the data, both in terms of flagging certain reports and pursuing FOI requests. Bungey said while there have been similar attempts by other news outlets to get the audience involved in looking through data, he has not previously seen a visualisation which &#8220;actually changes&#8221; based on users&#8217; inputs, and gives the ability for them to actually contribute &#8220;graphical evidence&#8221;.</li>
<li><strong><a href="http://reporthers.tumblr.com/post/51643894087/q-a-data-viz-expert-amanda-hickman">Q&amp;A: Data Viz Expert, Amanda Hickman</a> </strong>(reporthers.tumblr.com)<br />
There have always been charts in the newspaper. What’s new is the really incredible mapping and collaborating and exploring that we’re doing online. As data visualization gets more popular in newsrooms, newsrooms are more and more likely to show their work, to publish their raw data, which I think is incredibly valuable. We expect to be able to explore the news, to question reporters. That seems like a good thing to me. The open data movement has gone hand in hand with this transformation–as more journalists do more with data it gets more appalling when a government agency tries to pass off a photograph of a data table as “information.”</li>
</ul>
<img src="http://feeds.feedburner.com/~r/oreilly/strata/~4/d_h33wnRjw0" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://strata.oreilly.com/2013/06/data-journalists-gather-transparency-and-data-viz.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://strata.oreilly.com/2013/06/data-journalists-gather-transparency-and-data-viz.html</feedburner:origLink></item>
		<item>
		<title>Visualization of the Week: NYC Citi Bike use, in real time</title>
		<link>http://feedproxy.google.com/~r/oreilly/strata/~3/k1PmPK6ztI4/visualization-of-the-week-nyc-citi-bike-use-in-real-time.html</link>
		<comments>http://strata.oreilly.com/2013/06/visualization-of-the-week-nyc-citi-bike-use-in-real-time.html#comments</comments>
		<pubDate>Wed, 12 Jun 2013 16:00:46 +0000</pubDate>
		<dc:creator>Jenn Webb</dc:creator>
				<category><![CDATA[Data]]></category>
		<category><![CDATA[bike share]]></category>
		<category><![CDATA[Citi Bike]]></category>
		<category><![CDATA[city data]]></category>
		<category><![CDATA[transportation data]]></category>

		<guid isPermaLink="false">http://strata.oreilly.com/?p=58028</guid>
		<description><![CDATA[New York City&#8217;s new bike-share program, Citi Bike, has been underway for a couple of weeks now. Its level of success is still up for debate, but the stats are impressive: as of June 10, riders had taken 173,516 trips &#8230; ]]></description>
				<content:encoded><![CDATA[<p>New York City&#8217;s new bike-share program, Citi Bike, has been underway for a couple of weeks now. Its level of success <a href="http://www.nytimes.com/2013/06/12/nyregion/two-weeks-in-riders-and-errors-for-bike-share-effort.html?partner=rss&amp;emc=rss&amp;smid=tw-nytimes&amp;_r=0">is still up for debate</a>, but <a href="http://citibikenyc.com/blog/2013/06">the stats are impressive</a>: as of June 10, riders had taken 173,516 trips covering 510,782 miles since the launch. Oliver O&#8217;Brien, a researcher and software developer at the Centre for Advanced Spatial Analysis (CASA), and a contributor to OpenStreetMap, has developed <a href="http://bikes.oobrien.com/newyork/">a visualization of bike share use in real time</a>.</p>
<div id="attachment_58029" class="wp-caption aligncenter" style="width: 610px"><a href="http://bikes.oobrien.com/newyork/"><img src="http://s.radar.oreilly.com/wp-files/5/2013/06/BikeShareMap1.png" alt="BikeShareMap1" width="600" height="319" class="size-full wp-image-58029" /></a><p class="wp-caption-text"><em><a href="http://bikes.oobrien.com/newyork/">Click here for the full visualization.</a></em></p></div>
<p><span id="more-58028"></span></p>
<p>Users can zoom in and select a particular station to pull up detailed information on how many bikes and spaces are available. The screenshot below shows 32 bikes and 3 spaces available at Lafayette St. and Jersey St. at the time of writing.</p>
<div id="attachment_58030" class="wp-caption aligncenter" style="width: 610px"><a href="http://bikes.oobrien.com/newyork/"><img src="http://s.radar.oreilly.com/wp-files/5/2013/06/BikeShareMap2.png" alt="BikeShareMap2" width="600" height="319" class="size-full wp-image-58030" /></a><p class="wp-caption-text"><em><a href="http://bikes.oobrien.com/newyork/">Click here for the full visualization.</a></em></p></div>
<p>The bar across the top of the map also links to real-time bike share information in other cities around the world, including London, Toronto, Washington D.C., and Milan. A full list of all available cities can be viewed in the drop-down list in the upper right corner.</p>
<p>In <a href="http://oliverobrien.co.uk/bikesharemap/">a blog post</a>, O&#8217;Brien explains that the data is updated automatically every two to 10 minutes. It generally comes from the bike share provider&#8217;s website or official API, but he also gathers data from third-party collectors such as <a href="http://citybik.es/">citybik.es</a>.</p>
<p><em><a href="http://velojoy.com/2013/06/06/real-time-visualization-of-citi-bike/">Hat tip to Susi at Velojoy for highlighting O&#8217;Brien&#8217;s work.</a></em></p>
<p><strong>More visualizations:</strong></p>
<ul>
<li><a href="http://strata.oreilly.com/2013/06/visualization-of-the-week-stravinskys-the-rite-of-spring.html">Stravinsky&#8217;s &#8220;The Rite of Spring&#8221;</a></li>
<li><a href="http://strata.oreilly.com/2013/05/visualization-of-the-week-hospital-procedure-charges-across-the-u-s-compared.html">Hospital procedure charges across the U.S., compared</a></li>
<li><a href="http://strata.oreilly.com/2013/05/visualization-of-the-week-cia-rendition-flights-of-terror-suspects.html">CIA rendition flights of terror suspects</a></li>
<li><a href="http://strata.oreilly.com/2013/05/visualization-of-the-week-real-time-wikipedia-edits.html">Real-time Wikipedia edits</a></li>
<li><a href="http://strata.oreilly.com/2013/05/visualization-of-the-week-building-collapse-rescue-efforts.html">Building collapse rescue efforts</a></li>
</ul>
<img src="http://feeds.feedburner.com/~r/oreilly/strata/~4/k1PmPK6ztI4" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://strata.oreilly.com/2013/06/visualization-of-the-week-nyc-citi-bike-use-in-real-time.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://strata.oreilly.com/2013/06/visualization-of-the-week-nyc-citi-bike-use-in-real-time.html</feedburner:origLink></item>
		<item>
		<title>The Future Is Graph Databases</title>
		<link>http://feedproxy.google.com/~r/oreilly/strata/~3/435UtWi_W-g/the-future-is-graph-databases.html</link>
		<comments>http://strata.oreilly.com/2013/06/the-future-is-graph-databases.html#comments</comments>
		<pubDate>Tue, 11 Jun 2013 13:00:56 +0000</pubDate>
		<dc:creator>Rachel Roumeliotis</dc:creator>
				<category><![CDATA[Data]]></category>
		<category><![CDATA[graph databases]]></category>
		<category><![CDATA[Neo4j]]></category>
		<category><![CDATA[nosql]]></category>

		<guid isPermaLink="false">http://strata.oreilly.com/?p=57839</guid>
		<description><![CDATA[Emil Eifrem @emileifrem is the Founder of Neo4j and CEO of Neo Technology. He is also one of the authors of Graph Databases. Recently, I had the opportunity to sit down with Emil and we talked about the current and &#8230; ]]></description>
				<content:encoded><![CDATA[<p>Emil Eifrem <a href="https://twitter.com/emileifrem">@emileifrem</a> is the Founder of Neo4j and CEO of Neo Technology. He is also one of the authors of <a href="http://shop.oreilly.com/product/0636920028246.do">Graph Databases</a>. Recently, I had the opportunity to sit down with Emil and we talked about the current and future opportunities for graph databases.</p>
<p>Key highlights include:</p>
<ul>
<li>Emil explains graph databases [Discussed at <a href="http://www.youtube.com/watch?v=Dr0KfJXqMbs&amp;#t=0m29s">0:29</a>]</li>
<li><a href="https://www.facebook.com/about/graphsearch">Facebook Graph Search</a> is a well-known example of a graph database [Discussed at <a href="http://www.youtube.com/watch?v=Dr0KfJXqMbs&amp;#t=3m28s">3:28</a>]</li>
<li>But really, graph databases can be used for much more than social search [Discussed at <a href="http://www.youtube.com/watch?v=Dr0KfJXqMbs&amp;#t=4m50s">4:50</a>]</li>
<li>Neo4j, the original graph database [Discussed at <a href="http://www.youtube.com/watch?v=Dr0KfJXqMbs&amp;#t=5m25s">5:25</a>]</li>
<li>Graph databases &#8216;shape&#8217; data [Discussed at <a href="http://www.youtube.com/watch?v=Dr0KfJXqMbs&amp;#t=6m20s">6:20</a>]</li>
</ul>
<p>You can view the full interview here:</p>
<p><iframe width="640" height="360" src="http://www.youtube.com/embed/Dr0KfJXqMbs?feature=oembed" frameborder="0" allowfullscreen></iframe></p>
<p>This <a href="http://programming.oreilly.com/?p=60330">post</a> was originally published on O&#8217;Reilly&#8217;s Programming <a href="http://programming.oreilly.com">blog</a>.</p>
<p><span id="more-57839"></span></p>
<p><strong>Related:</strong></p>
<ul>
<li><a href="https://itunes.apple.com/us/podcast/oreilly-medias-programming-podcast/id520292841">Subscribe to the free programming podcast through iTunes</a></li>
</ul>
<img src="http://feeds.feedburner.com/~r/oreilly/strata/~4/435UtWi_W-g" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://strata.oreilly.com/2013/06/the-future-is-graph-databases.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://strata.oreilly.com/2013/06/the-future-is-graph-databases.html</feedburner:origLink></item>
		<item>
		<title>Ways to put the patient first when collecting health data</title>
		<link>http://feedproxy.google.com/~r/oreilly/strata/~3/tB8RTynpmeQ/ways-to-put-the-patient-first-when-collecting-health-data.html</link>
		<comments>http://strata.oreilly.com/2013/06/ways-to-put-the-patient-first-when-collecting-health-data.html#comments</comments>
		<pubDate>Mon, 10 Jun 2013 21:05:09 +0000</pubDate>
		<dc:creator>Andy Oram</dc:creator>
				<category><![CDATA[Data]]></category>
		<category><![CDATA[data segmentation]]></category>
		<category><![CDATA[health care]]></category>
		<category><![CDATA[health IT]]></category>
		<category><![CDATA[Health Privacy Summit]]></category>
		<category><![CDATA[medical]]></category>
		<category><![CDATA[open data]]></category>
		<category><![CDATA[patient engagement]]></category>
		<category><![CDATA[patient-centered medicine]]></category>
		<category><![CDATA[theDataMap]]></category>

		<guid isPermaLink="false">http://strata.oreilly.com/?p=58020</guid>
		<description><![CDATA[The timing was superb for last week&#8217;s Health Privacy Summit, held on June 5 and 6 in Washington, DC. First, it immediately followed the 2000-strong Health Data Forum (Health Datapalooza), where concern for patients&#8217; rights came up repeatedly. Secondly, scandals &#8230; ]]></description>
				<content:encoded><![CDATA[<p>The timing was superb for last week&#8217;s <a>Health Privacy Summit</a>, held on June 5 and 6 in Washington, DC. First, it immediately followed the 2000-strong <a href="http://healthdatapalooza.org/">Health Data Forum (Health Datapalooza)</a>, where concern for patients&#8217; rights came up repeatedly. Secondly, <a href="http://www.guardian.co.uk/world/2013/jun/06/us-tech-giants-nsa-data">scandals about US government spying</a> were breaking out and providing a good backdrop for talking about protecting our most sensitive personal information&#8211;our health data.</p>
<p>The health privacy summit, now in its third year, provides a crucial spotlight on the worries patients and their doctors have about their data. Did you know that two out of three doctors (and probably more&#8211;this statistic cites just the ones who admit to it on a survey) have left data out of a patient&#8217;s record at the patient&#8217;s request? I have found that the summit offers the most sophisticated and realistic assessment of data protection in health care available, which is why I look forward to it each year. (I&#8217;m also on the planning committee for the summit.) For instance, it took a harder look than most observers at how health care would be affected by patient access to data, and at the practice of sharing selected subsets of data, called segmentation.</p>
<h3>What effect would patient access have?</h3>
<p>An odd perceptual discontinuity exists around patient access to health records. If you go to your doctor and ask to see your records, chances are you will be turned down outright or forced to go through expensive and frustrating magical passes. One wouldn&#8217;t know that HIPAA explicitly required doctors long ago to give patients their data, or that the most recent meaningful use rules from the Department of Health and Human Services require doctors to let patients view, download, and transmit their information within four business days of its addition to the record.</p>
<p><span id="more-58020"></span></p>
<p>At both the Datapalooza and the privacy summit, patient access was not universally acclaimed but considered a minimum requirement for improving health care. It allows the patient to find critical errors, simplifies the sharing with other doctors who are treating the patient, and helps the patient comply with medical advice. Old complaints that patients don&#8217;t care or might not understand the contents of the record no longer hold water. Law professor Barbara Evans pointed out that, &#8220;To police privacy, you have to know what&#8217;s in your records.&#8221; In fact, four days is seen as unreasonably long to withhold information, given how quickly medical conditions can become dangerous.</p>
<p>Read access is one thing, but write access raises its own concerns&#8211;actually, the same old concerns with a new urgency. Doctors worry that they cannot trust data from patients, and might be held liable for decisions they make based on wrong data. The question of provenance thus requires electronic records to contain fields that indicate the source of data.</p>
<p>Write access by patients cannot be a matter for compromise. Fitness devices and other medical advances are opening up huge new sources of data that can save lives. For instance, one can wear a device that monitors one&#8217;s heart rate or blood pressure and sends information to the medical record at regular intervals. There are probably encryption and signing techniques that can assure doctors the patient has not falsified results. The bottleneck to deploying these technologies is the medical record itself, which is rarely set up to accept such data. The doctor must also learn how to use it.</p>
<p>A still higher stage of patient control would take the records away from the doctors altogether and store them with the patient. This is eminently logical&#8211;why should you have to go to a dozen different providers to gather a few years&#8217; worth of records? Separate health information exchanges&#8211;organizations that charge doctors for exchanging data and, even so, have trouble surviving financially&#8211;would evolve into more useful services. Patient control over records would also give patients some control over the <a href="http://thedatamap.org/">dizzying array of places their data goes, often without their consent</a>.</p>
<p>Skeptical doctors point out how little interest their patients (especially the ones who need to show the most interest) take in their health. Moving data to patients will start among a small, educated cohort, but could be widespread if the medical field seriously promoted it, and if a cloud solution like <a href="https://www.healthvault.com/">Microsoft HealthVault</a> made it easy.</p>
<p>Cloud computing was discussed at the summit. As with most aspects of health care, panelists insisted on more transparency there, such as knowing what company runs the underlying storage network, but their general view seemed positive.</p>
<h3>Segmentation&#8211;still a lot of mire</h3>
<p>Ideally, a health provider dealing with a broken bone or a cut would not be prejudiced to learn of your sexual orientation, mental health, or drug habits. But in many places, patients have to protect themselves. They also may choose to participate in research studies without releasing certain sensitive information.</p>
<p>Therefore, one of the phantoms being chased by privacy advocates is <em>data segmentation</em>, the ability of a patient to share data with certain people while withholding it from others. Segmentation even made it into a major <a href="http://www.whitehouse.gov/sites/default/files/microsites/ostp/pcast-health-it-report.pdf">report by the President&#8217;s Council of Advisors on Science and Technology</a> on health data.</p>
<p>Scads of problems are raised by segmentation, not so much technical as procedural, with some policy thrown in.</p>
<p>For instance, patients have trouble telling what might reveal a condition they want hidden. If an HIV patient reveals the list of medications he&#8217;s taking, a knowledgeable outsider can easily tell he&#8217;s HIV-positive. Even a note in the record that a patient is sensitive to some medication can reveal the condition that makes him sensitive. Free text, which is a large part of medical records, is difficult to handle in segmentation.</p>
<p>Although we don&#8217;t seem to have corralled this beast yet, many advances were described at the summit. Pilot systems can tag data with the patient&#8217;s preferences. When requests for data come in, these systems hide any data that the patient has restricted and that the requester is not supposed to see. One such system analyzes the record to remove all information that could inadvertently reveal the hidden diagnosis. When there&#8217;s a sensitive condition, the system usually omits all free text.</p>
<p>As many people pointed out at the summit, segmentation is an imperfect solution to the stigma attached to many medical conditions. Different diseases have been frowned on at different stages in history, and public education has alleviated some of the shame. Such education is needed so that people don&#8217;t have to be afraid of sharing information about their mental health or other such conditions.</p>
<img src="http://feeds.feedburner.com/~r/oreilly/strata/~4/tB8RTynpmeQ" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://strata.oreilly.com/2013/06/ways-to-put-the-patient-first-when-collecting-health-data.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://strata.oreilly.com/2013/06/ways-to-put-the-patient-first-when-collecting-health-data.html</feedburner:origLink></item>
		<item>
		<title>Big data vs. big reality</title>
		<link>http://feedproxy.google.com/~r/oreilly/strata/~3/yVGMfEHXUHQ/big-data-vs-big-reality.html</link>
		<comments>http://strata.oreilly.com/2013/06/big-data-vs-big-reality.html#comments</comments>
		<pubDate>Mon, 10 Jun 2013 16:00:18 +0000</pubDate>
		<dc:creator>Mike Barlow</dc:creator>
				<category><![CDATA[Data]]></category>
		<category><![CDATA[Big Data]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[data impact]]></category>

		<guid isPermaLink="false">http://strata.oreilly.com/?p=57826</guid>
		<description><![CDATA[This post originally appeared on Cumulus Partners. It&#8217;s republished with permission. Quentin Hardy&#8217;s recent post in the Bits blog of The New York Times touched on the gap between representation and reality that is a core element of practically every &#8230; ]]></description>
				<content:encoded><![CDATA[<p><em>This post originally appeared <a href="http://www.cumuluspartners.com/cumulus/post/2013/06/02/Big-data-vs-big-reality.aspx">on Cumulus Partners</a>. It&#8217;s republished with permission.</em></p>
<p>Quentin Hardy&#8217;s recent post in the Bits blog of The New York Times touched on the gap between representation and reality that is a core element of practically every human enterprise. His post is titled &#8220;<a href="http://bits.blogs.nytimes.com/2013/06/01/why-big-data-is-not-truth/?emc=eta1">Why Big Data is Not Truth</a>,&#8221; and I recommend it for anyone who feels like joining the phony argument over whether &#8220;big data&#8221; represents reality better than traditional data.</p>
<p>In a nutshell, this &#8220;us&#8221; versus &#8220;them&#8221; approach is like trying to pick a fight between oil painters and watercolorists. Neither oil painting nor watercolor is &#8220;truth&#8221;; both are forms of representation. And here&#8217;s the important part: representation is exactly that, a representation or interpretation of someone&#8217;s perceived reality. Pitting &#8220;big data&#8221; against traditional data is like asking whether Rembrandt is more &#8220;real&#8221; than Gainsborough. Both were artists, and both painted representations of the world they perceived around them.</p>
<p><span id="more-57826"></span></p>
<p>The problem with false arguments like the one posed by Hardy is that they obscure the value of data — traditional data and big data — and the impact of data on our culture. I&#8217;m now working my way through <em><a href="http://www.amazon.com/Raw-Data-Oxymoron-Infrastructures/dp/0262518287/">&#8220;Raw Data&#8221; is an Oxymoron</a></em>, an anthology of short essays about data. I recommend it for anyone who is seriously interested in thinking about the many ways in which data has influenced (and continues influencing) our lives. I especially recommend &#8220;<span style="text-decoration: underline">facts</span> and FACTS: Abolitionists&#8217; Database Innovations,&#8221; by Ellen Gruber Garvey. As its title suggests, the essay focuses on what proves to be an absolutely fascinating period of U.S. history in which the anti-slavery movement harvested data from real advertisements in Southern newspapers to paint a vivid and believable picture of the routine horrors inflicted by the slave system on real human beings.</p>
<p>That 19th century use of data mining built support for the anti-slavery movement, both in the U.S. and in England. The data played a key role in making the case for abolishing slavery — even though it required the bloodiest war in U.S. history to make abolition a fact.</p>
<p>Data itself has no quality. It&#8217;s what you do with it that counts.</p>
<img src="http://feeds.feedburner.com/~r/oreilly/strata/~4/yVGMfEHXUHQ" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://strata.oreilly.com/2013/06/big-data-vs-big-reality.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://strata.oreilly.com/2013/06/big-data-vs-big-reality.html</feedburner:origLink></item>
		<item>
		<title>It’s getting easier to build Big Data applications</title>
		<link>http://feedproxy.google.com/~r/oreilly/strata/~3/PH5r_kSc_XE/its-getting-easier-to-build-big-data-applications.html</link>
		<comments>http://strata.oreilly.com/2013/06/its-getting-easier-to-build-big-data-applications.html#comments</comments>
		<pubDate>Sun, 09 Jun 2013 16:00:04 +0000</pubDate>
		<dc:creator>Ben Lorica</dc:creator>
				<category><![CDATA[Data]]></category>
		<category><![CDATA[bdas]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Hadoop ecosystem]]></category>
		<category><![CDATA[Hadoop query]]></category>
		<category><![CDATA[Hadoop World]]></category>
		<category><![CDATA[Strata Hadoop World]]></category>

		<guid isPermaLink="false">http://strata.oreilly.com/?p=57922</guid>
		<description><![CDATA[Hadoop&#8217;s low-cost, scale-out architecture has made it a new platform for data storage. With a storage system in place, the Hadoop community is slowly building a collection of open source, analytic engines. Beginning with batch processing (MapReduce, Pig, Hive), Cloudera &#8230; ]]></description>
				<content:encoded><![CDATA[<p>Hadoop&#8217;s low-cost, scale-out architecture has made it a new platform for data storage. With a storage system in place, the Hadoop community is slowly building a collection of open source analytic engines. Beginning with batch processing (MapReduce, Pig, Hive), Cloudera has added interactive SQL (<a href="https://github.com/cloudera/impala">Impala</a>), analytics (<a href="http://blog.cloudera.com/blog/2013/03/cloudera_ml_data_science_tools/">Cloudera ML</a> + a <a href="http://www.cloudera.com/content/cloudera/en/solutions/partner/SAS.html">partnership with SAS</a>), and, as of early this week, <a href="http://www.cloudera.com/content/cloudera/en/campaign/introducing-search.html">real-time search</a>. The economics that led Hadoop to dominate batch processing are now permeating other types of analytics.</p>
<p>Another collection of open source, Hadoop-compatible analytic engines, the <a href="https://amplab.cs.berkeley.edu/bdas/">Berkeley Data Analytics Stack</a> (<a href="http://strata.oreilly.com/tag/bdas">BDAS</a>), is being built just across the San Francisco Bay. Starting with a batch-processing framework that&#8217;s faster than MapReduce (<a href="http://spark-project.org/">Spark</a>), it now includes interactive SQL (<a href="http://shark.cs.berkeley.edu/">Shark</a>) and real-time analytics (<a href="http://spark-project.org/docs/latest/streaming-programming-guide.html">Spark Streaming</a>). Sometime this summer, frameworks for machine learning (<a href="http://www.mlbase.org/">MLbase</a>) and graph analytics (<a href="http://strata.oreilly.com/2013/05/improving-options-for-unlocking-your-graph-data.html">GraphX</a>) will be released. A cluster manager (<a href="http://incubator.apache.org/mesos/">Mesos</a>) and an in-memory file system (<a href="http://strata.oreilly.com/2013/04/tachyon-open-source-distributed-fault-tolerant-in-memory-file-system.html">Tachyon</a>) allow users of other analytic frameworks to leverage the BDAS platform. (The <a href="http://strata.oreilly.com/2013/03/python-data-tools-just-keep-getting-better.html">Python data community</a> is looking at Tachyon closely.)</p>
<p><span id="more-57922"></span></p>
<p><b>Next up: Applications</b><br />
Many developers aren&#8217;t familiar with the intricacies of deploying, managing<sup>1</sup>, and tuning distributed systems. The good news is that as the infrastructure gets simpler, companies can start focusing on building interesting <i>applications</i>. I&#8217;m starting to hear of many more researchers and startups building interesting solutions on top of one of these integrated <em>platforms</em> (BDAS, Cloudera, and other Hadoop distributions).</p>
<p>One can create Big Data applications by cobbling together different (&#8220;best-of-breed&#8221;) systems, but it&#8217;s usually<sup>2</sup> much easier to use engines built on top of the same platform. There&#8217;s a tradeoff: in many cases it&#8217;s hard<sup>3</sup> to beat highly optimized, targeted solutions. It&#8217;s easier to use an integrated stack, but you likely have to sacrifice <i>a little bit</i> of performance in exchange. I suspect that in many use cases, the performance of integrated platforms will be &#8220;good enough,&#8221; and convenience will trump performance. Over time, analytic engines built on top of BDAS and Hadoop will improve and the performance gap will narrow further.</p>
<p>A good place to learn more about interesting Big Data applications (and how they&#8217;re built) is the combined <a href="http://strataconf.com/stratany2013">Hadoop World + Strata conference</a> in NYC this October.</p>
<p><strong>Related posts</strong>:</p>
<ul>
<li><a href="http://strata.oreilly.com/2013/03/data-science-tools-all-in-or-mix-and-match.html">Data Science tools: Are you “all in” or do you “mix and match”?</a></li>
<li><a href="http://strata.oreilly.com/2013/06/tracking-the-progress-of-large-scale-query-engines.html">Tracking the progress of large-scale Query Engines</a></li>
<li><a href="http://strata.oreilly.com/2013/04/workflow-tools-enable-the-rapid-deployment-of-models.html">Simpler workflow tools enable the rapid deployment of models</a></li>
<li><a href="http://strata.oreilly.com/2013/04/tachyon-open-source-distributed-fault-tolerant-in-memory-file-system.html">Tachyon: An open source, distributed, fault-tolerant, in-memory file system</a></li>
</ul>
<hr /><p><small><br />
(1) The Hadoop community is doing a good job on this front with <a href="http://incubator.apache.org/ambari/">Ambari</a>, <a href="http://www.cloudera.com/content/cloudera/en/products/cloudera-manager.html">Cloudera Manager</a> and other tools.<br />
(2) If you&#8217;re willing to use their services, cloud platforms like <a href="http://www.infochimps.com/infochimps-cloud/how-it-works/">Infochimps</a>, Amazon, Google, and Microsoft are starting to make it easier to assemble different systems. In addition, there are companies like <a href="http://www.datastax.com/what-we-offer/products-services/datastax-enterprise">Datastax that integrate different systems</a> in their offerings.<br />
(3) Recent examples of this performance vs. convenience tradeoff: <a href="http://strata.oreilly.com/2013/06/tracking-the-progress-of-large-scale-query-engines.html">interactive (Hadoop) query engines vs. MPP databases</a>, and <a href="http://strata.oreilly.com/2013/05/improving-options-for-unlocking-your-graph-data.html">GraphX vs. GraphLab</a>.<br />
</small></p>
]]></content:encoded>
			<wfw:commentRss>http://strata.oreilly.com/2013/06/its-getting-easier-to-build-big-data-applications.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://strata.oreilly.com/2013/06/its-getting-easier-to-build-big-data-applications.html</feedburner:origLink></item>
		<item>
		<title>Strata Week: Wireless body area networks bring humans into the Internet of Things</title>
		<link>http://feedproxy.google.com/~r/oreilly/strata/~3/C9q7R-ZpOWk/wireless-body-area-network-iot-authenticators-nsa-surveillance.html</link>
		<comments>http://strata.oreilly.com/2013/06/wireless-body-area-network-iot-authenticators-nsa-surveillance.html#comments</comments>
		<pubDate>Fri, 07 Jun 2013 19:00:18 +0000</pubDate>
		<dc:creator>Jenn Webb</dc:creator>
				<category><![CDATA[Data]]></category>
		<category><![CDATA[data privacy]]></category>
		<category><![CDATA[government data]]></category>
		<category><![CDATA[government surveillance]]></category>
		<category><![CDATA[Internet of Things]]></category>
		<category><![CDATA[NSA]]></category>
		<category><![CDATA[temporal cloak]]></category>
		<category><![CDATA[wireless body area network]]></category>

		<guid isPermaLink="false">http://strata.oreilly.com/?p=57993</guid>
		<description><![CDATA[Collaborative sensor networks of humans, and your body may be the next two-factor authenticator There has been much coverage recently of the Internet of Things, connecting everything from washers and dryers to thermostats to cars to the Internet. Wearable sensors &#8230; ]]></description>
				<content:encoded><![CDATA[<h2 id="human-sensors">Collaborative sensor networks of humans, and your body may be the next two-factor authenticator</h2>
<p>There has been much coverage recently of the <a href="http://gigaom.com/2011/10/13/internet-of-things-will-have-24-billion-devices-by-2020/">Internet of Things</a>, connecting everything from washers and dryers to thermostats to cars to the Internet. Wearable sensors — things like FitBit and health-care-related sensors that can be <a href="http://www.fiercemobilehealthcare.com/story/wearable-sensors-low-cost-solution-remote-monitoring-rehab/2012-12-15">printed onto fabric</a> or even <a href="http://www.technologyreview.com/news/512061/electronic-sensors-printed-directly-on-the-skin/">onto human skin</a> — are also in the spotlight.</p>
<p><a href="http://gigaom.com/2013/06/03/how-you-and-i-could-become-nodes-in-the-internet-of-things/">Kevin Fitchard reports at GigaOm</a> that researchers at CEA-Leti and three French universities believe these areas are not mutually exclusive and have launched a project around wireless body area networks called <a href="http://www.azosensors.com/news.aspx?newsID=6031">CORMORAN</a>. The group believes that one day soon our bodies will be constantly connected to the Internet via sensors and transmitters that &#8220;can be used to form cooperative ad hoc networks that could be used for group indoor navigation, crowd-motion capture, health monitoring on a massive scale and especially collaborative communications,&#8221; Fitchard writes. He takes a look at some of the benefits and potential applications of such a collaborative network — location-based services would be able to direct users to proper gates or trains in busy airports and train stations, for instance — and some of the pitfalls, such as potential security and privacy issues. You can read his full report <a href="http://gigaom.com/2013/06/03/how-you-and-i-could-become-nodes-in-the-internet-of-things/">at GigaOm</a>.</p>
<p>In related news, wearable sensors (and even our bodies) may be used not only to connect us to a network, but also to identify us.<span id="more-57993"></span> <a href="http://arstechnica.com/security/2013/05/someday-you-may-ditch-your-two-factor-authenticator-for-an-electronic-tattoo/">Casey Johnston at Ars Technica reports</a> this week on two password protection systems that Motorola presented at All Things Digital&#8217;s D11 conference: electronic tattoos and consumable pills that would replace the security tokens used in two-factor authentication.</p>
<p>The tattoos, Johnston reports, aren&#8217;t branded onto skin, but are more like flexible stickers, &#8220;islands of high-performance silicon connected by accordion-like structures,&#8221; that move with a person&#8217;s body and remain adhered to the skin. Johnston does point out, though, that the long-term plan likely is to embed the silicon and wires into the skin &#8220;to make the user a proper bionic human.&#8221;</p>
<p>The pills are just that: a pill that the user swallows that &#8220;turns one&#8217;s entire body into an authenticator,&#8221; Johnston writes. Regina Dugan, senior vice president of the Advanced Technology and Projects group at Motorola Mobility, explained that the pills use stomach acid to power a switch to &#8220;[create] an 18-bit EKG-like symbol in your body, and your body becomes the authenticator.&#8221; You can read Johnston&#8217;s full report <a href="http://arstechnica.com/author/caseyjohnston/">at Ars Technica</a>.</p>
<h2 id="NSA-spies">The NSA is spying on us</h2>
<p>Headline news this week surrounded the U.S. National Security Agency. First, <a href="http://www.guardian.co.uk/world/2013/jun/06/nsa-phone-records-verizon-court-order">Glenn Greenwald posted a report at The Guardian</a> publishing a leaked copy of <a href="http://www.guardian.co.uk/world/interactive/2013/jun/06/verizon-telephone-data-court-order">a court order</a> that &#8220;requires Verizon on an &#8216;ongoing, daily basis&#8217; to give the NSA information on all telephone calls in its systems, both within the US and between the US and other countries.&#8221;</p>
<p>The order covers &#8220;all call detail records or &#8216;telephony metadata,&#8217;&#8221; Greenwald reports, including telephone numbers, location data, call duration and time of call, and unique identifiers — but it doesn&#8217;t cover conversation content. Because the information is classified as &#8220;metadata,&#8221; or transactional information, as opposed to &#8220;communications,&#8221; it does not require a warrant, Greenwald explains.</p>
<p><a href="http://www.newyorker.com/online/blogs/newsdesk/2013/06/verizon-nsa-metadata-surveillance-problem.html">Jane Mayer at the New Yorker</a> took a close look at this &#8220;metadata&#8221; to see just how bad it could be. Susan Landau, a mathematician and former Sun Microsystems engineer, told her that people don&#8217;t understand — &#8220;[metadata] is more intrusive than content.&#8221; With the amount of information that can be gathered, she explained, &#8220;you know exactly what is happening — you don&#8217;t need the content.&#8221; You can read Mayer&#8217;s in-depth report <a href="http://www.newyorker.com/online/blogs/newsdesk/2013/06/verizon-nsa-metadata-surveillance-problem.html">at the New Yorker</a>.</p>
<p><a href="http://www.theatlantic.com/technology/archive/2013/06/government-phone-surveillance-for-dummies/276629/">Megan Garber at The Atlantic</a> put together a FAQ on the situation, covering exactly what data is being gathered, who has access to it, what they&#8217;re doing with it (&#8220;as far as we know&#8221;), why it doesn&#8217;t violate the Fourth Amendment, whether the surveillance is limited to Verizon customers, and much more. You can read the FAQ <a href="http://www.theatlantic.com/technology/archive/2013/06/government-phone-surveillance-for-dummies/276629/">at The Atlantic</a>.</p>
<p>An additional NSA surveillance program came to light this week as well. <a href="http://www.guardian.co.uk/world/2013/jun/06/us-tech-giants-nsa-data">Glenn Greenwald and Ewen MacAskill at The Guardian</a> report they received a 41-slide PowerPoint presentation describing a program called PRISM that allows the NSA direct access to the servers at nine U.S. Internet companies: Microsoft, Yahoo, Google, Facebook, PalTalk, YouTube, Skype, AOL and Apple. According to Greenwald and MacAskill&#8217;s report, &#8220;[t]he program is continuing to expand, with other providers due to come online,&#8221; and Dropbox is &#8220;described as &#8216;coming soon.&#8217;&#8221;</p>
<p><a href="http://newsfeed.time.com/2013/06/06/prism-by-the-numbers-a-guide-to-the-governments-secret-internet-data-mining-program/">Victor Luckerson at Time</a> broke down the program particulars, noting that the program was established in 2007 with Microsoft and that the program allows NSA officials direct access to 10 types of data: email messages, instant messages, videos, photos, stored data, voice chats, file transfers, video conferences, log-in times and profile details on social networks.</p>
<p>In an <a href="http://www.washingtonpost.com/investigations/us-intelligence-mining-data-from-nine-us-internet-companies-in-broad-secret-program/2013/06/06/3a0c0da8-cebf-11e2-8845-d970ccb04497_story.html">in-depth report at the Washington Post</a>, Barton Gellman and Laura Poitras quote from a statement issued by Director of National Intelligence James R. Clapper: &#8220;information collected under this program is among the most important and valuable foreign intelligence information we collect, and is used to protect our nation from a wide variety of threats. The unauthorized disclosure of information about this important and entirely legal program is reprehensible and risks important protections for the security of Americans.&#8221; The Washington Post also <a href="http://www.washingtonpost.com/wp-srv/special/politics/prism-collection-documents/">published a selection of the slides</a> with annotations.</p>
<p><a href="http://www.washingtonpost.com/politics/obama-defends-sweeping-surveillance-efforts/2013/06/07/2002290a-cf88-11e2-9f1a-1a7cdee20287_story.html">Philip Rucker, Sean Sullivan and Aaron Blake report at the Washington Post</a> that President Obama defended the program, saying that such programs &#8220;make a difference in our capacity to anticipate and prevent possible terrorist activity&#8221; and that &#8220;[y]ou can&#8217;t have 100 percent security and also then have 100 percent privacy and zero inconvenience.&#8221;</p>
<p><a href="http://thenextweb.com/us/2013/06/07/facebook-apple-google-microsoft-dropbox-and-yahoo-deny-participation-in-us-government-spying-program-prism/">Harrison Weber at The Next Web</a> notes (as do the reports at <a href="http://www.washingtonpost.com/investigations/us-intelligence-mining-data-from-nine-us-internet-companies-in-broad-secret-program/2013/06/06/3a0c0da8-cebf-11e2-8845-d970ccb04497_story_2.html">the Washington Post</a> and <a href="http://www.guardian.co.uk/world/2013/jun/06/us-tech-giants-nsa-data">The Guardian</a>) that the major Internet companies named as participants have denied involvement in and knowledge of the PRISM program. Weber points out that the NSA wouldn&#8217;t have given the Internet companies the program&#8217;s name, and that &#8220;those who receive national security letters are prevented from discussing their existence by law.&#8221;</p>
<h2 id="temporal-cloak">Temporal cloaks hide telecommunication data</h2>
<p><a href="http://www.scientificamerican.com/article.cfm?id=temporal-cloak-erases-data-from-history">Zeeya Merali reports at Scientific American</a> that ultrasecure communications may soon be at hand via a temporal cloak. &#8220;Electrical engineers have used lasers to create a cloak that can hide communications in a &#8216;time hole&#8217;,&#8221; Merali writes, &#8220;so that it seems as if they were never sent.&#8221; The first working temporal cloak was built last year by a team at Cornell, but the cloaking windows opened too rarely to hide data at telecommunication speeds. Merali describes how Joseph Lukens, an electrical engineer at Purdue University and author of this latest study, solved the problem:</p>
<blockquote><p>&#8220;To speed up the cloaking rate, Lukens and his colleagues exploited a wave phenomenon that was first discovered by British inventor Henry Fox Talbot in 1836. When a light wave passes through a series of parallel slits called a diffraction grating, it splits apart. The rays emanating from the slits combine on the other side to create an intricate interference pattern of peaks and troughs. Talbot discovered that this pattern repeats at regular intervals, creating what is now known as a Talbot carpet. There is also a temporal version of this effect in which you manipulate light over time to generate regular periods with zero light intensity, says Lukens. Data can then be hidden in these holes in time.&#8221;</p></blockquote>
<p>When testing the cloak, the research team determined the cloak was able to hide data added at a rate of 12.7 gigabits per second. You can read more about the research and watch a video showing how it works <a href="http://www.scientificamerican.com/article.cfm?id=temporal-cloak-erases-data-from-history">at Scientific American</a>. You can also access the team&#8217;s published paper <a href="http://www.nature.com/nature/journal/vaop/ncurrent/full/nature12224.html">at Nature</a>.</p>
<h2>Tip us off</h2>
<p>News tips and suggestions are always welcome, so please send them <a href="mailto:pitchstrata@oreilly.com">along</a>.</p>
<p><strong>Related:</strong></p>
<ul>
<li><a href="http://strata.oreilly.com/2013/05/industrial-internet-internet-of-things-cars-as-platforms-learn-to-code.html">The power of the Internet, wielded by machines and things</a></li>
<li><a href="http://oreilly.com/radarreports/industrial-internet.csp">Industrial Internet, the machines are talking</a> (Free Radar Report)</li>
<li><a href="http://strata.oreilly.com/2013/04/fbi-stingray-irs-email-liquid-robotics-data-driven-campaigns.html">Court case sheds light on FBI stingray surveillance</a></li>
<li><a href="http://strata.oreilly.com/2013/03/data-brokers-sensors-search-data-big-brother.html">Data brokers know more about us than we know</a></li>
<li><a href="http://strata.oreilly.com/tag/strata-week">More Strata Week coverage</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://strata.oreilly.com/2013/06/wireless-body-area-network-iot-authenticators-nsa-surveillance.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://strata.oreilly.com/2013/06/wireless-body-area-network-iot-authenticators-nsa-surveillance.html</feedburner:origLink></item>
	</channel>
</rss>
