<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0"><channel><title>Sunlight Labs blog</title><link>http://sunlightlabs.com/blog/</link><description>Latest blog updates from the nerds at Sunlight Labs</description><language>en-us</language><lastBuildDate>Wed, 10 Mar 2010 09:46:28 -0500</lastBuildDate><ttl>120</ttl><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.feedburner.com/sunlightlabs/blog" /><feedburner:info uri="sunlightlabs/blog" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><item><title>Quantifying Data Quality</title><link>http://feedproxy.google.com/~r/sunlightlabs/blog/~3/V9K32el_hEI/</link><description>You've already heard me &lt;a href="http://www.sunlightlabs.com/blog/2010/data-quality-deserves-to-be-tackled-on-its-own/"&gt;complain about data quality&lt;/a&gt; -- how it's a bigger problem than most people realize, and a harder problem than many people hope.  But let's not leave it there!  Perfect datasets mostly exist in textbooks and computer simulations.  We need to figure out what we can do with what we have.  In this and other posts, I hope to give the developers in our community some idea of how they can deal with less-than-perfect data.&lt;br/&gt;&lt;br/&gt;

The first step is to figure out how bad things actually are.  To do that, we'll use some simple statistics -- those of you with a strong stat background can skip to the next entry in your RSS reader (or better yet, correct my mistakes in comments).&lt;br/&gt;&lt;br/&gt;

The GAO provides a good example of how to tackle this kind of problem.  They were asked to examine government spending on the nonprofit sector -- a process that ultimately led to &lt;a href="http://www.gao.gov/new.items/d09193.pdf"&gt;this report (PDF)&lt;/a&gt;.  As you might imagine, there are a number of ways that federal dollars make their way to nonprofits, from loan guarantees to tax breaks to Medicare payments to nonprofit hospitals. For the most part, each of these is tracked through a distinct system.&lt;br/&gt;&lt;br/&gt;

Let's confine our work to one of the systems that GAO examined: the Federal Award and Assistance Data System, or FAADS.  Along with FPDS, FAADS is one of the two systems whose data powers &lt;a href="http://www.usaspending.gov"&gt;USASpending.gov&lt;/a&gt;. FAADS tracks grant payments (and some other things that we'll ignore for now).  So let's focus on a single question: how much federal grant money went to nonprofits?&lt;br/&gt;&lt;br/&gt;

First we should define the subset of FAADS that deals with the nonprofit sector.  If we can do that, and there aren't any other problems, then we should be able to just sum up the records and figure out the totals.  There's a fairly obvious way to do this: FAADS records have a "recipient type" multiple-choice field, and one of the possible values is "nonprofit".  But of course it's not actually that simple: the full name for the value is "other nonprofit". The field's possible values also include "private higher education".  If you take a quick look at some records with that value, you'll see that many of those educational institutions are nonprofits.  Worse, a look at the "nonprofit" category shows some suspicious entries.  It's worth taking a closer look at the reliability of the values entered into this field.&lt;br/&gt;&lt;br/&gt;

But we need to get beyond saying the data is "pretty good" or "kind of dodgy" or "really bad".  It would be useful to come up with a quantified estimate of how reliable the field is.  GAO did this by picking a random sample of records, then checking each one to see whether it was correctly classified.  That gave them a precise answer about how good the classification was within the sample -- but what does the quality of the sample tell us about the quality of the larger dataset?  To figure this out, GAO calculated the &lt;em&gt;confidence interval&lt;/em&gt; of their result.&lt;br/&gt;&lt;br/&gt;

The idea is pretty simple.  Let's say there's a big population out there, and you want to quantify some attribute of it.  Unfortunately, it's not practical to examine that value for every member of the population.  Instead, you take a smaller sample of the population and measure the attribute just for its members.  How close will the sample's value be to the real value you'd get if you &lt;em&gt;did&lt;/em&gt; examine the entire population?  It's impossible to say for sure, but thanks to the magic of the Central Limit Theorem, we can look at how erratic the sample's values are and estimate how well the sample represents the larger population.&lt;br/&gt;&lt;br/&gt;
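
Here's a quick way to see that idea in action: a small Python simulation, offered as a sketch rather than anything GAO did. It assumes a made-up population in which 70% of records belong to a category, treats each sampled record as an independent draw, takes 2,000 samples of 96 records each, and compares how much the sample shares bounce around with the spread that theory predicts, the square root of p(1-p)/n.&lt;br/&gt;&lt;br/&gt;

```python
import random
import statistics

def sample_shares(true_share, sample_size, trials, seed=0):
    """Return the observed in-category share for each of `trials` random samples.

    Each sampled record is modeled as an independent draw that falls in the
    category with probability `true_share` (a simplifying assumption).
    """
    rng = random.Random(seed)
    shares = []
    for _ in range(trials):
        hits = sum(1 for _ in range(sample_size) if true_share > rng.random())
        shares.append(hits / sample_size)
    return shares

# Hypothetical population: 70% of records are truly in the category.
shares = sample_shares(true_share=0.70, sample_size=96, trials=2000)
observed_spread = statistics.stdev(shares)
predicted_spread = (0.70 * 0.30 / 96) ** 0.5  # sqrt(p(1 - p) / n)

print(f"mean of sample shares: {statistics.mean(shares):.3f}")
print(f"observed spread: {observed_spread:.3f}, predicted: {predicted_spread:.3f}")
```

With these settings the observed spread should come out close to the predicted value of about 0.047; shrink sample_size and both numbers grow.&lt;br/&gt;&lt;br/&gt;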

You've probably run into confidence intervals before when reading about political polls. "Politician A has a 50% favorability rating, +/- 3 points!" This means that the pollsters are 95% confident that the &lt;em&gt;real&lt;/em&gt; favorability rating -- the one they'd get if they asked every relevant person -- is between 47% and 53%.  Why are they 95% sure of this, instead of 90% or 99%?  Well, 95% corresponds to a convenient span of about two standard deviations on either side of the mean of a normal distribution, but in truth it's &lt;a href="http://www.jerrydallal.com/LHSP/p05.htm"&gt;a bit arbitrary&lt;/a&gt; -- it's just sort of the stat-industry standard (interestingly, the matching 5% significance threshold means that about one out of every twenty tests of an effect that isn't really there will still come out "statistically significant," even if the experimenters made no mistakes -- something to keep in mind the next time you read a breathless account of some amazing just-published scientific discovery).&lt;br/&gt;&lt;br/&gt;
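
The link between the confidence level and the multiplier used in interval calculations is easy to check with Python's standard library. This sketch asks statistics.NormalDist for the point on the standard normal curve that leaves the right amount of probability in each tail; at 95% that point is the familiar 1.96.&lt;br/&gt;&lt;br/&gt;

```python
from statistics import NormalDist

def critical_value(confidence):
    """Two-sided z multiplier for a given confidence level (e.g. 0.95)."""
    # Leave (1 - confidence) / 2 of the probability in each tail.
    return NormalDist().inv_cdf(0.5 + confidence / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence -> z = {critical_value(level):.3f}")
# 90% -> 1.645, 95% -> 1.960, 99% -> 2.576
```

Asking for more confidence means a bigger multiplier, and so a wider interval for the same data.&lt;br/&gt;&lt;br/&gt;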

As you might imagine, for a given population there are a few factors that can be tweaked when evaluating the confidence interval: sample size, confidence level, and the width of the confidence interval.  As already mentioned, the confidence level is generally held constant at 95%, but tradeoffs are often made between the other two.  You can get a narrower confidence interval by increasing your sample size, for example -- but that usually costs time or money.&lt;br/&gt;&lt;br/&gt;
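
To make that tradeoff concrete, here is a Python sketch of the standard normal-approximation margin of error for a proportion, along with its inverse: the sample size needed to hit a target margin. It assumes a 95% level (z = 1.96) and, by default, the worst case p = 0.5; the function names are illustrative.&lt;br/&gt;&lt;br/&gt;

```python
import math

def margin_of_error(p_hat, n, z=1.96):
    """Half-width of the normal-approximation confidence interval for a share."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

def sample_size_for(margin, p_hat=0.5, z=1.96):
    """Smallest n whose margin of error is no larger than `margin`.

    p_hat = 0.5 is the conservative default because it maximizes p(1 - p).
    """
    return math.ceil((z / margin) ** 2 * p_hat * (1 - p_hat))

# Quadrupling the sample only halves the margin: about 0.098, 0.049, 0.0245.
for n in (100, 400, 1600):
    print(n, round(margin_of_error(0.5, n), 4))

# A poll reporting +/- 3 points at 95% needs roughly this many respondents:
print(sample_size_for(0.03))  # 1068
```

Because the margin shrinks with the square root of n, each halving of the interval's width costs four times as many records.&lt;br/&gt;&lt;br/&gt;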

But let's get back to the nonprofit problem.  The GAO took a random sample of records from the "nonprofit" category and a random sample of all the other records in FAADS.  They then turned to whatever supplementary sources they could lay their hands on to figure out if the listed recipient for each record was actually a nonprofit -- they used the IRS Business Master File, the Census of Governments, the Higher Education Directory, and various other subject-specific guides to the nonprofits present within specific economic sectors.&lt;br/&gt;&lt;br/&gt;

There were three possible outcomes for each check of a recipient's true nonprofit status: it could be a nonprofit, it could be something other than a nonprofit, or the investigation could be inconclusive.  After examining the samples, GAO had a percentage attached to each of these outcomes, representing that outcome's share of the sample.&lt;br/&gt;&lt;br/&gt;

Each of those values can then be plugged into this formula:&lt;br/&gt;&lt;br/&gt;

&lt;img src="http://assets.sunlightlabs.com/blog/confidence_interval_95.png" style="margin:0 auto; display: block; border:1px solid #aaa; margin-bottom: 1em" alt="p = \hat{p} \pm1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}" /&gt;

Where p is the true share of the population that falls within the category (e.g. "is a nonprofit" or "uncertain"), n is the size of the sample, and the p with a hat (called "p-hat", pleasantly enough) is the observed proportion of the sample falling in the category.&lt;br/&gt;&lt;br/&gt;
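
In code, the formula is only a few lines. This Python sketch takes the count of sampled records that fall in the category rather than a pre-computed p-hat; the example counts (62 genuine nonprofits out of 89 sampled records) are hypothetical, chosen only to illustrate.&lt;br/&gt;&lt;br/&gt;

```python
import math

def proportion_ci(hits, n, z=1.96):
    """Normal-approximation (Wald) confidence interval for a population share.

    hits: sampled records found to be in the category
    n:    total records sampled
    z:    1.96 for a 95% confidence level
    """
    p_hat = hits / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    # Clamp to [0, 1]: the approximation can spill past the ends when p_hat is extreme.
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)

low, high = proportion_ci(62, 89)  # hypothetical counts
print(f"({low:.2f}-{high:.2f})")  # (0.60-0.79)
```

For small samples or shares close to 0% or 100%, statisticians often prefer alternatives such as the Wilson score interval; the plain version here matches the formula in this post.&lt;br/&gt;&lt;br/&gt;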

Simple!  The formula hides some statistical machinery behind a few assumptions: that the sample's standard deviation is a good stand-in for the population's; that the sample is large enough for the normal approximation to hold; and that we're after a 95% confidence level.  But this is a pretty standard way of going about it, and it seems to be how GAO approached the problem.  For the nonprofit category, they reported a confidence interval of (.60-.79), meaning that about 70% of the sample turned out to represent genuine nonprofits, and based upon this they could be 95% sure that the true proportion of nonprofits in the larger population fell between 60% and 79%.  They did the same thing with educational institutions, testing how many were nonprofits, and found an interval of (.88-.98).&lt;br/&gt;&lt;br/&gt;

(Some of you might have noticed that this operation can work backwards: some simple algebra reveals that GAO investigated 89 records in order to get these figures.  In fact, the report says that they examined 96.  This discrepancy is no doubt partly the result of rounding errors, but it's also probably somewhat attributable to their use of a &lt;em&gt;t distribution&lt;/em&gt;, a technique invented at the Guinness Brewery in 1908 to help make reliable predictions even when the sample size is pretty small. It's not worth getting into the specifics here, but the upshot is that we need to adjust that 1.96 value for low values of n.)&lt;br/&gt;&lt;br/&gt;
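
Here's that backwards calculation as a Python sketch. It assumes the reported interval is symmetric, so its midpoint is p-hat and half its width is the margin of error, and it uses the plain 1.96 multiplier rather than a t-distribution value. Because the reported endpoints are rounded to two digits, the result shifts by a record or so: taking (.60-.79) at face value gives about 90, while treating p-hat as exactly 70% gives about 89.&lt;br/&gt;&lt;br/&gt;

```python
def implied_sample_size(p_hat, margin, z=1.96):
    """Solve margin = z * sqrt(p_hat * (1 - p_hat) / n) for n."""
    return z ** 2 * p_hat * (1 - p_hat) / margin ** 2

def implied_from_interval(low, high, z=1.96):
    """Assume a symmetric interval: midpoint is p_hat, half-width is the margin."""
    return implied_sample_size((low + high) / 2, (high - low) / 2, z)

print(round(implied_from_interval(0.60, 0.79)))  # 90
print(round(implied_sample_size(0.70, 0.095)))   # 89
```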

So what can we do with this knowledge? It's tempting to say, "Okay, we have a statistically justified range for how many of these records are correctly classified as nonprofits. Let's take the total dollars for that category, multiply it by the high and low end of the confidence interval, and get an estimated range for total grant spending going to nonprofits."&lt;br/&gt;&lt;br/&gt;

I wouldn't say this is a bad idea, exactly. Certainly this gets you closer to the truth than simply taking FAADS at its word.  You ought to add in the weighted sum for the educational institution category too, of course.  And you should probably do the same thing for all the other recipient categories, to see how many nonprofits have been miscoded into those buckets.  And it would be a good idea to account for that "unknown" category, too -- the records that might or might not be nonprofits.&lt;br/&gt;&lt;br/&gt;
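
A minimal Python sketch of that weighted-sum idea follows. The dollar totals are hypothetical placeholders, not real FAADS figures; the intervals are the ones GAO reported. One caution: adding the low ends (and the high ends) of two separate 95% intervals gives a rough range, not a new 95% interval.&lt;br/&gt;&lt;br/&gt;

```python
def weighted_range(category_totals):
    """Sum each category's dollars scaled by the ends of its confidence interval.

    category_totals maps a name to (dollars, low_share, high_share).
    Adding endpoints gives a rough range, not a new 95% interval.
    """
    low = sum(dollars * lo for dollars, lo, _ in category_totals.values())
    high = sum(dollars * hi for dollars, _, hi in category_totals.values())
    return low, high

# Hypothetical dollar totals (NOT real FAADS data); intervals as reported by GAO.
categories = {
    "other nonprofit": (10_000_000_000, 0.60, 0.79),
    "private higher education": (4_000_000_000, 0.88, 0.98),
}
low, high = weighted_range(categories)
print(f"${low:,.0f} to ${high:,.0f}")
```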

But even then, we're in dangerous territory if we assume we have an answer.  Much of what you read about statistics will be prefaced with "for a random variable" -- when we can't satisfy that assumption, things kind of fall apart. For instance: what if the government is in the habit of giving differently-sized grants to nonprofit recipients versus other recipients? That's something we should test for before we blindly weight the sum of all grants.  Worse: what if we're missing some records from the population?  None of the calculations above tell us anything about records outside the data we have in hand.&lt;br/&gt;&lt;br/&gt;

In the end, GAO didn't pursue these lines of inquiry.  Testing the nonprofit classification left things close enough for government work, as they say.  Still, by examining the quality of that classification, GAO peeled back and refined one assumption, and in the process, arrived at a better estimate.  Not a perfect one -- there are still a number of assumptions being made here, and they're worth examining, too.  But they did get an estimate of how useful one aspect of their data was, and that's a very good thing to know.&lt;img src="http://feeds.feedburner.com/~r/sunlightlabs/blog/~4/V9K32el_hEI" height="1" width="1"/&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Tom Lee</dc:creator><pubDate>Wed, 10 Mar 2010 09:46:28 -0500</pubDate><guid isPermaLink="false">http://sunlightlabs.com/blog/2010/quantifying-data-quality/</guid><feedburner:origLink>http://sunlightlabs.com/blog/2010/quantifying-data-quality/</feedburner:origLink></item><item><title>Lobbyists and White House Visitors</title><link>http://feedproxy.google.com/~r/sunlightlabs/blog/~3/QX7_swR-isE/</link><description>&lt;p&gt;The White House has been steadily releasing the &lt;a href="http://blog.sunlightfoundation.com/2010/01/05/so-you-want-to-know-who-is-visiting-the-white-house/"&gt;"White House Visitor Logs,"&lt;/a&gt; showing America who is coming in to meet with the President and his staff. At the same time, the &lt;a href="http://www.opensecrets.org"&gt;Center for Responsive Politics&lt;/a&gt; releases cleaned-up data on &lt;a href="http://www.opensecrets.org/lobby/index.php"&gt;lobbyist filings&lt;/a&gt;. We thought it'd be interesting to find the names that appear in both sets of data.&lt;/p&gt;
&lt;p&gt;Below, you'll find our results along with relevant information from both sets of data. Now-- this is important: just because the names match doesn't mean they're the same person. Because the White House doesn't release any identifying information besides the name, we're unable to tell whether a name in one dataset refers to the same person in the other. John Adams in one dataset may be a different John Adams in another. &lt;/p&gt;
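&lt;p&gt;For the curious, a name-only match of this kind can be sketched in a few lines of Python. This is an illustration, not necessarily how our list was produced: the normalization rules (lowercasing, turning punctuation into spaces, collapsing whitespace) and the sample names are assumptions, and the sketch has exactly the weakness described above, since two different people with the same name will still match.&lt;/p&gt;

```python
import re

def name_key(name):
    """Crude matching key: lowercase, turn non-letters into spaces, collapse whitespace."""
    letters_only = re.sub(r"[^a-z ]", " ", name.lower())
    return " ".join(letters_only.split())

def shared_names(visitors, lobbyists):
    """Return the sorted name keys found in both lists."""
    visitor_keys = {name_key(n) for n in visitors}
    lobbyist_keys = {name_key(n) for n in lobbyists}
    return sorted(visitor_keys.intersection(lobbyist_keys))

# Hypothetical names for illustration only.
visitors = ["John Adams", "Jane  Doe", "Mary-Ann Smith"]
lobbyists = ["JOHN ADAMS", "Richard Roe", "Mary Ann Smith"]
print(shared_names(visitors, lobbyists))  # ['john adams', 'mary ann smith']
```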
&lt;p&gt;This is intended to be a starting point for journalists and citizen investigators, but isn't a reliable list of lobbyists who've been to the White House. Instead, it is a list of names of lobbyists who share names with people who have been to the White House and could be the same person. Whether they are or not is up to you to figure out.&lt;/p&gt;
&lt;p&gt;Here's the data for your perusal:&lt;/p&gt;
&lt;div&gt;&lt;p style="margin-bottom:3px"&gt;&lt;a href="http://data.sunlightlabs.com/government/White-House-Visitor-Logs-Lobbyists/b2ye-zdex" target="_blank" style="font-size:12px;font-weight:bold;text-decoration:none;color:#333333;font-family:arial;"&gt;White House Visitor Logs + Lobbyists&lt;/a&gt;&lt;/p&gt;&lt;iframe width="650px" height="425px" src="http://data.sunlightlabs.com/widgets/b2ye-zdex/normal?cur=F3gZC-t0j6y" frameborder="0" scrolling="no"&gt;&lt;a href="http://data.sunlightlabs.com/government/White-House-Visitor-Logs-Lobbyists/b2ye-zdex" title="White House Visitor Logs + Lobbyists" target="_blank"&gt;White House Visitor Logs + Lobbyists&lt;/a&gt;&lt;/iframe&gt;&lt;p&gt;&lt;a href="http://www.socrata.com/" target="_blank"&gt;Powered by Socrata&lt;/a&gt;&lt;/p&gt;&lt;/div&gt;

&lt;p&gt;Thanks to the &lt;a href="http://opensecrets.org"&gt;Center for Responsive Politics&lt;/a&gt; for making their data available for us to do this match.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/sunlightlabs/blog/~4/QX7_swR-isE" height="1" width="1"/&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Clay Johnson</dc:creator><pubDate>Tue, 09 Mar 2010 16:02:03 -0500</pubDate><guid isPermaLink="false">http://sunlightlabs.com/blog/2010/lobbyists-and-white-house-visitors/</guid><feedburner:origLink>http://sunlightlabs.com/blog/2010/lobbyists-and-white-house-visitors/</feedburner:origLink></item><item><title>Every Non-Profit is an Open Government Non-Profit</title><link>http://feedproxy.google.com/~r/sunlightlabs/blog/~3/MdxNDc2QtqQ/</link><description>&lt;p&gt;Often times at Sunlight the non-profit community looks at us strangely. Here in Washington, DC we've probably made more investments in technology than any other non-profit or advocacy organization I've run across. Certainly our mission is focused around the use of technology, so that makes a lot of sense-- we're focused on getting data out of government, doing interesting things with it, and letting you see what happens in Washington better. That means technology investment.&lt;/p&gt;
&lt;p&gt;But one question I struggle with is: why doesn't every non-profit do this? It may be a little hyperbolic, but I have a hard time figuring this out. Every single non-profit stands to benefit from OpenGovernmentDataRightNow(tm).&lt;/p&gt;
&lt;p&gt;Doesn't &lt;a href="http://www.charitywater.org/"&gt;Charity Water&lt;/a&gt; stand to benefit from some form of data coming out of the United States government, whether it be data from the EPA on water quality and safety or aid spending on water for Africa from the State Department? That data is there. It's waiting for you to ask for it.&lt;/p&gt;
&lt;p&gt;Wouldn't the &lt;a href="http://www.leukemia-lymphoma.org/hm_lls"&gt;Leukemia and Lymphoma&lt;/a&gt; Society do well to advocate for the opening of raw data from the Department of &lt;a href="http://minorityhealth.hhs.gov/templates/news.aspx?ID=635498"&gt;Health and Human Services&lt;/a&gt;? I wonder whether the Centers for Disease Control tracks &lt;a href="http://www.cdc.gov/Rabies/"&gt;rabies&lt;/a&gt; data that would be useful to the &lt;a href="http://www.humanesociety.org/"&gt;Humane Society&lt;/a&gt;. Wouldn't open data help Katrina or Haiti relief organizations not only fight waste and corruption on the ground, but also make better decisions on where resources should be provided? That data's there. It's waiting for you to ask for it.&lt;/p&gt;
&lt;p&gt;With the exception of increased &lt;a href="http://en.wikipedia.org/wiki/403(b)"&gt;403(b)&lt;/a&gt; caps or other financial tax incentives, I can't think of an issue that more of the non-profit community stands to benefit from than open government data. Open data can help a non-profit make more informed decisions on how to allocate its very scarce resources, more effectively help those in need, and drive down a whole bunch of costs.&lt;/p&gt;
&lt;p&gt;We spend a lot of time talking about how open data can create billion-dollar industries like &lt;a href="http://www.marketresearch.com/map/prod/1149304.html"&gt;GPS&lt;/a&gt;, but in the case of non-profits the gains are far more immediate, and they're likely to improve both the organization's bottom line and its impact.&lt;/p&gt;
&lt;p&gt;So if you're working at a non-profit, I invite you to think about how you stand to benefit from open government data. If you think of some ideas, you should be &lt;a href="http://sunlightlabs.com/blog/2010/how-request-datasets-government-right-now/"&gt;submitting them to the government&lt;/a&gt;. You should also &lt;a href="http://sunlightfoundation.com/campaign/"&gt;join and support our campaign&lt;/a&gt;. You'll be doing your organization a great service and making the world a better place.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/sunlightlabs/blog/~4/MdxNDc2QtqQ" height="1" width="1"/&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Clay Johnson</dc:creator><pubDate>Fri, 19 Feb 2010 13:39:08 -0500</pubDate><guid isPermaLink="false">http://sunlightlabs.com/blog/2010/every-non-profit-open-government-non-profit/</guid><feedburner:origLink>http://sunlightlabs.com/blog/2010/every-non-profit-open-government-non-profit/</feedburner:origLink></item><item><title>What if we Google Buzzed Government?</title><link>http://feedproxy.google.com/~r/sunlightlabs/blog/~3/rF71DSOWVbE/</link><description>&lt;p&gt;Following up on my &lt;a href="http://sunlightlabs.com/blog/2010/what-if-government-had-google-buzz-moment/"&gt;hypothetical post&lt;/a&gt; on what would happen if Government had done the same thing that Google did with &lt;a href="http://buzz.google.com"&gt;Google Buzz&lt;/a&gt;, I'd like to imagine something different: what if something like Google Buzz happened to government? What if, out of nowhere, the Executive Branch of government started exposing the most frequent contacts of each &lt;a href="http://en.wikipedia.org/wiki/List_of_positions_filled_by_presidential_appointment_with_Senate_confirmation"&gt;Senate-confirmed&lt;/a&gt; appointee based on their email inboxes? 
What would happen if we could, for instance, pull up Rahm Emanuel's "Buzz" profile and see who he followed and who was following him, based not on his preferences but on the frequency of his email contacts alone?&lt;/p&gt;
&lt;p&gt;The answer is: Rahm would stop using e-mail. He'd use the phone instead. And when we bugged his phone, he'd do face to face meetings. And when we said that all face to face meetings must be on video except when you're in the bathroom, Rahm would put a toilet in the Oval Office and next to every other desk in the White House. &lt;/p&gt;
&lt;p&gt;When you turn the lights on in your apartment, the cockroaches don't evaporate; they run under the couch. That's why, ultimately, the way to keep cockroaches out of your apartment isn't to keep all your lights on, or even to call an exterminator. It's to clean up after yourself.&lt;/p&gt;
&lt;p&gt;Largely what we do in the transparency movement is turn lights on and watch to see where the roaches scatter. It's important work, because it gives the bad guys fewer places to hide. Paul's piece on &lt;a href="http://blog.sunlightfoundation.com/2010/02/12/the-legacy-of-billy-tauzin-the-white-house-phrma-deal/"&gt;Billy Tauzin&lt;/a&gt;, for instance, does a great job of using the &lt;a href="http://www.sunlightfoundation.com/WhiteHouseVisitors/"&gt;White House Visitor Logs&lt;/a&gt; to shine light on the pharmaceutical industry's lobbying effort, demonstrating not only the power dynamics of the lobbying industry but also the importance of the Visitor Logs that the White House released.&lt;/p&gt;
&lt;p&gt;Let's not take our eye off the long game, though. Transparency is a value, not an issue. There are a variety of issues related to the value of transparency, but there's not a single piece of legislation that will cause government to be fully open, accountable and transparent. In the same way an exterminator is useful for solving your apartment's roach problem in the short term, transparency legislation is useful for moving the ball forward, but it's only part of the solution. &lt;/p&gt;
&lt;p&gt;The heart of what we're trying to change isn't technology or legislation. It's people's minds. You can't legislate how people think. To truly have a more open, honest, accountable government we need to bring about a &lt;em&gt;cultural shift&lt;/em&gt;, both among the people working for us inside the Government and within ourselves, in how we deal with our Government. &lt;/p&gt;
&lt;p&gt;To change the values inside of government, the right incentives have to be put in place. Bureaucrats who take risks and make data available should be validated by the public rather than scorned by their bosses. A bureaucrat who errs on the side of being "open" should enjoy more long-term career success than one who doesn't. But that's not the case inside of government: too often the opposite is true. &lt;/p&gt;
&lt;p&gt;To further that change, we have to connect the people inside who do the right thing and share the values of openness, so that they can share the tactics and ideas that have brought them success and help recruit new people on the inside to effect that change.&lt;/p&gt;
&lt;p&gt;Together, we have to play both the short game and the long game to make things happen. Should someone build a Google Buzz-like service out of the &lt;a href="http://www.sunlightfoundation.com/WhiteHouseVisitors/"&gt;White House Visitor Logs&lt;/a&gt;? 
Absolutely. It's light shone on our government to make it more accountable. But we also realize that as that happens, and as the logs become more and more of a source for storytellers writing pieces like Paul's, people inside the White House will begin to hold more meetings outside the White House, and the data will become a less and less accurate representation of what's really happening.&lt;/p&gt;
&lt;p&gt;If I'm to metaphorically call the corrupt forces inside of our government "roaches," then I ought to have a great name for the forces of good inside the government. Let's call the transparency advocates inside of government "kittens".  Because everyone loves kittens and because to you and me a kitten might be cute, harmless and playful, but to a roach: kittens eat roaches. And what we need is not only to keep shining lights in the shadows of this metaphorical apartment, but also an army of kittens waiting under the proverbial couch for the roaches to hide.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/sunlightlabs/blog/~4/rF71DSOWVbE" height="1" width="1"/&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Clay Johnson</dc:creator><pubDate>Thu, 18 Feb 2010 12:52:06 -0500</pubDate><guid isPermaLink="false">http://sunlightlabs.com/blog/2010/what-if-we-google-buzzed-government/</guid><feedburner:origLink>http://sunlightlabs.com/blog/2010/what-if-we-google-buzzed-government/</feedburner:origLink></item><item><title>More About the Door</title><link>http://feedproxy.google.com/~r/sunlightlabs/blog/~3/t6blyoVP4pk/</link><description>&lt;div style="clear:left; margin: 1em auto; text-align:center"&gt;&lt;object width="425" height="344"&gt;&lt;param name="movie" value="http://www.youtube.com/v/6BUTK1c3L8c&amp;hl=en_US&amp;fs=1&amp;rel=0"&gt;&lt;/param&gt;&lt;param name="allowFullScreen" value="true"&gt;&lt;/param&gt;&lt;param name="allowscriptaccess" value="always"&gt;&lt;/param&gt;&lt;embed src="http://www.youtube.com/v/6BUTK1c3L8c&amp;hl=en_US&amp;fs=1&amp;rel=0" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="425" height="344"&gt;&lt;/embed&gt;&lt;/object&gt;&lt;/div&gt;

The above video -- put together by Noah, Ali &amp;amp; Greg, and featuring star turns by Daniel and Luigi's phone -- shows the current state of &lt;a href="http://sunlightlabs.com/blog/2010/our-door-opener-science-project/"&gt;the door project I wrote about on Tuesday&lt;/a&gt;.  It's working pretty well!  I think I still need to add a bypass capacitor to improve the circuit's stability, but it's certainly good enough for our uses.&lt;br/&gt;&lt;br/&gt;

But the electronics are just one part of the system.  As I mentioned at the end of that last post, my colleagues did an impressive job of springing into action and building out the systems necessary to turn an SSH-accessible script into a useful interface.  Here's how they did it. &lt;br/&gt;&lt;br/&gt;

&lt;hr width="100%" style="background-repeat: repeat-x; margin-right: 9px; height:5px"/&gt;

&lt;strong&gt;Kevin:&lt;/strong&gt; Once Tom attached the door to our local network I took on the task of making it operable from outside of Sunlight's office via a public-facing API. The API enables any mobile device to become a door "key" by using a native client app or, for the non-smartphone users among us, a Twilio-based voice response interface.&lt;br/&gt;&lt;br/&gt;

We chose to authenticate users based on the device's hardware ID (or, for the voice interface, the phone number provided by Caller ID), combined with a PIN. Once the identity is verified, the API triggers the latch code running on the Linksys router, opening the door. I used Django to build the API along with a small administrative interface for our HR staff to create and revoke "keys".  For the sake of simplicity (and security), our API and admin interface run on a separate server and connect with the router over our local network. However, for the minimalists out there, it's possible to make this entire system run within the Linksys router.&lt;br/&gt;&lt;br/&gt;
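
As a rough illustration of that authentication step (and not the actual door-django code), here is a Python sketch of a key store in which each identity, a device's hardware ID or a caller-ID phone number, maps to a salted hash of its PIN. The class and method names are hypothetical, and actually triggering the latch is left out. Caller ID can be spoofed, which is one reason to require a PIN as well; and with only 10,000 possible four-digit PINs, hashing limits casual exposure rather than stopping someone who obtains the stored hashes.&lt;br/&gt;&lt;br/&gt;

```python
import hashlib
import hmac
import os

def hash_pin(pin, salt):
    """Derive a salted, deliberately slow hash so raw PINs are never stored."""
    return hashlib.pbkdf2_hmac("sha256", pin.encode(), salt, 100_000)

class KeyRing:
    """Hypothetical store of door "keys": identity -> (salt, PIN hash).

    An identity is a phone's hardware ID or a caller-ID phone number.
    """

    def __init__(self):
        self._keys = {}

    def issue(self, identity, pin):
        salt = os.urandom(16)
        self._keys[identity] = (salt, hash_pin(pin, salt))

    def revoke(self, identity):
        self._keys.pop(identity, None)

    def check(self, identity, pin):
        entry = self._keys.get(identity)
        if entry is None:
            return False  # unknown device or phone number
        salt, expected = entry
        # compare_digest takes time independent of where the bytes differ.
        return hmac.compare_digest(expected, hash_pin(pin, salt))

ring = KeyRing()
ring.issue("+12025550123", "4821")          # hypothetical number and PIN
print(ring.check("+12025550123", "4821"))  # True
print(ring.check("+12025550123", "0000"))  # False
ring.revoke("+12025550123")
print(ring.check("+12025550123", "4821"))  # False
```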

In addition to offering an endpoint for the native client apps, I also provided an interface for &lt;a href="http://www.twilio.com"&gt;Twilio&lt;/a&gt;, an impressively easy-to-use service for integrating telephone access into web applications. Twilio's servers receive the phone call and then request a simple XML "script" from our API describing the response. Our server verifies the Caller ID information provided by Twilio and sends back a greeting message followed by a request for the user's PIN. Twilio then captures the key presses from the phone and submits another request with the PIN data. Assuming the credentials match, our server unlocks the door and sends Twilio a response welcoming the user. The amazing thing about Twilio is that you can do all that with 30 lines of code. I had a prototype up and running in 15 minutes!&lt;br/&gt;&lt;br/&gt;
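
To give a feel for the XML "script" involved, here is a Python sketch that builds the two TwiML replies described above using only the standard library's xml.etree.ElementTree (no Twilio helper library). On the first request Twilio sends the caller's number in a From parameter; after the caller types their PIN, it requests the Gather action URL with the key presses in a Digits parameter. The URL, the four-digit PIN length, and the spoken messages are assumptions for illustration, and the Django view wiring is omitted.&lt;br/&gt;&lt;br/&gt;

```python
import xml.etree.ElementTree as ET

def ask_for_pin(action_url="/door/twilio/pin/"):
    """First reply: greet the caller and gather a PIN from the keypad."""
    response = ET.Element("Response")
    gather = ET.SubElement(
        response, "Gather", action=action_url, method="POST", numDigits="4"
    )
    ET.SubElement(gather, "Say").text = "Hello. Please enter your PIN."
    return ET.tostring(response, encoding="unicode")

def pin_result(accepted):
    """Second reply, returned after checking the submitted Digits."""
    response = ET.Element("Response")
    message = "Welcome in." if accepted else "Sorry, that PIN was not accepted."
    ET.SubElement(response, "Say").text = message
    ET.SubElement(response, "Hangup")
    return ET.tostring(response, encoding="unicode")

print(ask_for_pin())
print(pin_result(accepted=True))
```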

We're open sourcing all our code, including the APIs and administrative interface. The Django application is available &lt;a href="http://github.com/sunlightlabs/door-django/"&gt;here&lt;/a&gt;. Keep in mind this code is for demonstration purposes only -- please carefully consider the security implications of offering a web service to your door lock and understand that Sunlight Labs can't assume any responsibility for the use of this code.&lt;br/&gt;&lt;br/&gt;

&lt;hr width="100%" style="background-repeat: repeat-x; margin-right: 9px; height:5px"/&gt;

&lt;strong&gt;Eric:&lt;/strong&gt; I wanted to make opening the door the simplest experience possible for Sunlight's Android users (and we have many), so I created a native Android app that our G1, myTouch, and Nexus One owners could use.  The Android app talks to the same web interface that the Twilio endpoint does, so we can keep our authentication and analytics logic centralized.  The permissions are locked to a "Device ID" for the phone, which I get from &lt;a href="http://developer.android.com/reference/android/telephony/TelephonyManager.html#getDeviceId()"&gt;the CDMA or GSM device&lt;/a&gt; inside the phone. In this way we tie access to individual devices, like actual physical keys.&lt;br/&gt;&lt;br/&gt;

It's a native, Java-based application that consists of one main screen and one widget.  The first time you want to open the door, you need to open the main screen from the application tray, select a PIN, and try (and fail) to open the door with it.  An administrator will then see the failed attempt and can enable that device and PIN for access from then on.  The PIN gets stored on the phone so the user doesn't have to type it again, though the user can clear it from memory if they choose.&lt;br/&gt;&lt;br/&gt;
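
The enroll-by-failing flow can be modeled in a few lines. This Python sketch is hypothetical (it is not taken from the door-android or door-django code): unknown device and PIN pairs are refused but remembered as pending, and an administrator's approval moves them to the approved set. PINs are kept in plain form here only for brevity; a real system should store hashes.&lt;br/&gt;&lt;br/&gt;

```python
class DoorAccess:
    """Hypothetical model of the enroll-by-failing flow.

    Unknown (device, PIN) pairs are refused but remembered as pending,
    so an administrator can approve the ones they recognize.
    """

    def __init__(self):
        self.approved = set()
        self.pending = []

    def try_open(self, device_id, pin):
        """Return True if the door should open for this device and PIN."""
        key = (device_id, pin)
        if key in self.approved:
            return True
        if key not in self.pending:
            self.pending.append(key)  # surfaces in the admin's review list
        return False

    def approve(self, device_id, pin):
        """Admin action: grant this device and PIN access from now on."""
        key = (device_id, pin)
        if key in self.pending:
            self.pending.remove(key)
        self.approved.add(key)

door = DoorAccess()
print(door.try_open("device-123", "4821"))  # False: first attempt is recorded
print(door.pending)                         # [('device-123', '4821')]
door.approve("device-123", "4821")
print(door.try_open("device-123", "4821"))  # True
```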

For maximum door-opening convenience, a demure widget is included that will open the door in one tap, if the PIN is in memory.  This takes advantage of Android's unique desktop and widget interface, one of the best parts of the Android platform.&lt;br/&gt;&lt;br/&gt;

I've published &lt;a href="http://github.com/sunlightlabs/door-android"&gt;the source code&lt;/a&gt; for the app on GitHub, if you want to see how it all works.  There honestly isn't a whole lot of crazy stuff going on, but finding good Android code examples is surprisingly hard, so if you want to see a basic widget or service in action, there you go.&lt;br/&gt;&lt;br/&gt;

&lt;img class="detailimage" src="http://assets.sunlightlabs.com/blog/door/android_screenshot2.jpg" style="float:none; margin: 0 auto 20px auto; display:block" alt="screenshot of the android app" /&gt;

&lt;hr width="100%" style="background-repeat: repeat-x; margin-right: 9px; height:5px"/&gt;

&lt;strong&gt;Josh:&lt;/strong&gt; It was very simple to create the iPhone application.  Using the Titanium Mobile SDK, I was able to quickly throw together an application that would authenticate via the web service.  It took about two hours in total to write, with the majority of that spent playing with the user interface to see what worked best and creating an icon and graphics.  The application was passed out to Sunlight employees with iPhones using the ad-hoc distribution model.&lt;br/&gt;&lt;br/&gt;

Titanium Mobile: &lt;a href="http://appcelerator.com"&gt;http://appcelerator.com&lt;/a&gt;&lt;br/&gt;
Code for Dorky (Door Key) iPhone Application: &lt;a href="http://github.com/jroo/dorky-iphone"&gt;http://github.com/jroo/dorky-iphone&lt;/a&gt;&lt;br/&gt;&lt;br/&gt;

&lt;img class="detailimage" src="http://assets.sunlightlabs.com/blog/door/iphone_screenshot.jpg" style="float:none; margin: 0 auto 20px auto; display:block" alt="screenshot of the iphone app" /&gt;&lt;img src="http://feeds.feedburner.com/~r/sunlightlabs/blog/~4/t6blyoVP4pk" height="1" width="1"/&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Tom Lee</dc:creator><pubDate>Thu, 18 Feb 2010 10:54:20 -0500</pubDate><guid isPermaLink="false">http://sunlightlabs.com/blog/2010/more-about-door/</guid><feedburner:origLink>http://sunlightlabs.com/blog/2010/more-about-door/</feedburner:origLink></item><item><title>ClearMaps: A Mapping Framework for Data Visualization</title><link>http://feedproxy.google.com/~r/sunlightlabs/blog/~3/4P09GHHEBGc/</link><description>&lt;p&gt;Despite the recent explosion of web based cartography tools, making effective maps for data visualization remains a challenge. While tools like Google Maps are great for helping navigate the world they are often poorly suited for &lt;a href="http://en.wikipedia.org/wiki/Thematic_map"&gt;thematic mapping&lt;/a&gt;, as many features like roads and cities only get in the way of telling compelling stories with data. In fact, even the distance between places can be a distraction – who cares how far away Alaska is when the goal is to make a simple comparison between US states?&lt;/p&gt;
&lt;p&gt;To help overcome some of the limitations of existing mapping tools, Sunlight Labs is releasing &lt;a href="http://github.com/sunlightlabs/clearmaps/"&gt;ClearMaps&lt;/a&gt;, an ActionScript framework for interactive cartographic visualization. In addition to giving designers and developers more control over presentation, the project aims to address some of the common technical challenges faced when building interactive, data-driven maps for the web. ClearMaps is designed as a lightweight, flexible set of tools for building complex data visualizations. It is a framework, not a plug-and-play component (though it could be a starting point for those wishing to make reusable tools).&lt;/p&gt;
&lt;p&gt;&lt;object width="700" height="500"&gt;
    &lt;param name="movie" value="http://assets.sunlightlabs.com/maps/swf/DemoMap.swf"&gt;
    &lt;param name="scale" value="noscale"&gt;
    &lt;param name="wmode" value="transparent"&gt;
    &lt;embed src="http://assets.sunlightlabs.com/maps/swf/DemoMap.swf" type="application/x-shockwave-flash" scale="noscale" wmode="transparent" width="700" height="500"&gt;
&lt;/object&gt;&lt;/p&gt;
&lt;p&gt;In addition to offering flexibility, ClearMaps attempts to address two of the biggest technical problems encountered when building maps for the web: rendering vector data in the browser and reducing vector data size for timely loading. Unfortunately, there aren't many browser-native approaches to rendering that can scale to the full range of visualization tasks. For example, rendering the boundaries of all 3,100+ US counties with JavaScript and the canvas element performs poorly. Things only get worse when compromises are made to support browsers that lack canvas, such as Internet Explorer. Data transmission time is equally problematic for complex geometries. At full scale, the US Census Shapefiles for county boundaries total 14MB. Even with detail reduced for web display, the raw vector data are nearly 2MB in size.&lt;/p&gt;
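&lt;p&gt;The post doesn't say how the county boundaries were "reduced for web display." One common way to do it is Ramer-Douglas-Peucker line simplification, which recursively keeps the vertex farthest from the straight chord between two retained points and drops everything that lies within a chosen tolerance of that chord. The Python sketch below illustrates that technique under that assumption; it is not ClearMaps' own code, and the function names are hypothetical.&lt;/p&gt;

```python
import math


def _perp_distance(p, a, b):
    """Distance from point p to the infinite line through points a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0:
        # a and b coincide (e.g. a closed ring): fall back to point distance
        return math.hypot(px - ax, py - ay)
    return abs(dy * px - dx * py + bx * ay - by * ax) / length


def simplify(points, tolerance):
    """Ramer-Douglas-Peucker: drop vertices within `tolerance` of the chord."""
    points = list(points)
    tolerance = max(tolerance, 0.0)  # negative tolerances make no sense
    if len(points) > 2:
        first, last = points[0], points[-1]
        max_dist, index = 0.0, 0
        for i in range(1, len(points) - 1):
            d = _perp_distance(points[i], first, last)
            if d > max_dist:
                max_dist, index = d, i
        if max_dist > tolerance:
            # Keep the farthest vertex and simplify each half independently.
            left = simplify(points[: index + 1], tolerance)
            right = simplify(points[index:], tolerance)
            return left[:-1] + right  # drop the duplicated split vertex
        return [first, last]
    return points
```

&lt;p&gt;For example, simplify([(0, 0), (1, 0.05), (2, 0), (3, 2), (4, 0)], 0.1) keeps the sharp peak at (3, 2) but discards the small wiggle at (1, 0.05), returning [(0, 0), (2, 0), (3, 2), (4, 0)]. The tolerance is in the same units as the coordinates, so for longitude/latitude data it is measured in degrees.&lt;/p&gt;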
&lt;p&gt;Given these challenges, ActionScript becomes an attractive solution. The Flash plug-in's rendering engine is faster and more widely available than in-browser vector rendering. Plus, Flash offers tools for efficiently encoding and compressing binary data streams, significantly speeding up transmission time. Fortunately, with the release of Adobe's open source &lt;a href="http://opensource.adobe.com/wiki/display/flexsdk/Downloads"&gt;ActionScript 3 SDK&lt;/a&gt; and open source IDEs like &lt;a href="http://www.flashdevelop.org/"&gt;FlashDevelop&lt;/a&gt;, the proprietary nature of the Flash platform has become less of a concern.&lt;/p&gt;
&lt;p&gt;ClearMaps provides two things: an Adobe AIR-based encoding tool that translates Shapefile data into a compressed binary form, and a set of ActionScript classes for decoding and rendering that vector data. These tools provide the core functionality required to get from raw cartographic data to a web map with a minimum of glue code (the above demo map requires fewer than a hundred lines of ActionScript). &lt;/p&gt;
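&lt;p&gt;To give a rough idea of why a compact binary encoding helps, here is a Python sketch of one common approach: quantize coordinates to integers, store each point as the difference from the previous one, pack the results as 32-bit integers, and compress with zlib. This is an assumption about the general technique, not a description of ClearMaps' actual file format; the 1e-5 degree precision and the function names are hypothetical.&lt;/p&gt;

```python
import struct
import zlib

# Hypothetical precision: coordinates are stored in units of 1e-5 degrees.
SCALE = 1e5


def encode_ring(coords):
    """Quantize, delta-encode, pack and zlib-compress a list of (lon, lat) pairs."""
    ints = []
    prev_x = prev_y = 0
    for lon, lat in coords:
        x, y = round(lon * SCALE), round(lat * SCALE)
        ints.extend((x - prev_x, y - prev_y))  # store differences, not absolutes
        prev_x, prev_y = x, y
    # Header: point count as an unsigned 32-bit int, then signed 32-bit deltas,
    # all big-endian.
    packed = struct.pack(">I%di" % len(ints), len(coords), *ints)
    return zlib.compress(packed, 9)


def decode_ring(blob):
    """Reverse encode_ring: decompress, unpack, re-accumulate and rescale."""
    raw = zlib.decompress(blob)
    (count,) = struct.unpack_from(">I", raw, 0)
    deltas = struct.unpack_from(">%di" % (2 * count), raw, 4)
    coords = []
    x = y = 0
    for i in range(count):
        x += deltas[2 * i]
        y += deltas[2 * i + 1]
        coords.append((x / SCALE, y / SCALE))
    return coords
```

&lt;p&gt;Because neighboring vertices along a boundary are close together, the deltas are small, repetitive numbers that compress far better than raw floating-point coordinates. Decoding with decode_ring() reverses each step and returns coordinates accurate to the chosen precision.&lt;/p&gt;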
&lt;p&gt;However, the library is far from complete. Features like keys and legends are currently missing, and much work remains in building an extensible framework for integrating external data sources. If you find these tools useful, drop us a line and let us know how you're using the framework. If you have ideas or code for solving common visualization tasks, we would love to incorporate them into the library.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/sunlightlabs/blog/~4/4P09GHHEBGc" height="1" width="1"/&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Kevin Webb</dc:creator><pubDate>Wed, 17 Feb 2010 11:17:52 -0500</pubDate><guid isPermaLink="false">http://sunlightlabs.com/blog/2010/clearmaps-mapping-framework/</guid><feedburner:origLink>http://sunlightlabs.com/blog/2010/clearmaps-mapping-framework/</feedburner:origLink></item><item><title>Are the American People short on ideas?</title><link>http://feedproxy.google.com/~r/sunlightlabs/blog/~3/krajKHwQI6w/</link><description>&lt;p&gt;&lt;img src="http://img.skitch.com/20100216-pajdm785t62hmk4ge1jgpyp7em.jpg" alt="Federal Agency Ideascale Dashboard" class="detailimage"/&gt;A couple of developers from the Sunlight Labs community, including one of our &lt;a href="http://sunlightlabs.com/hackathon09"&gt;Great American Hackathon&lt;/a&gt; organizers, &lt;a href="http://sunlightlabs.com/people/jessykate/"&gt;Jessy Cowan-Sharp&lt;/a&gt;, managed to put together something remarkable: &lt;a href="http://www.opengovtracker.com"&gt;OpenGovTracker&lt;/a&gt; (&lt;a href="http://github.com/jessykate/IdeaScale-Dashboard"&gt;source here&lt;/a&gt;). The site lets you see where the ideas are coming in across the various agencies from a single dashboard.&lt;/p&gt;
&lt;p&gt;What's the synopsis? According to the dashboard, the American people don't have a lot of ideas. Or at least, a lot of agencies are pretty low on ideas: only 611 ideas have been proposed. &lt;a href="http://www.treasury.gov/open/"&gt;Treasury&lt;/a&gt; only has a dozen ideas? The best the American people can do is give Social Security &lt;a href="http://www.ssa.gov/open/"&gt;10 new ideas&lt;/a&gt;? &lt;/p&gt;
&lt;p&gt;As the Sunlight Foundation's Policy Director said late last week: &lt;a href="http://blog.sunlightfoundation.com/2010/02/14/now-is-the-time/"&gt;now is the time&lt;/a&gt;. Request a dataset or submit an idea to government. Here's &lt;a href="http://sunlightlabs.com/blog/2010/how-request-datasets-government-right-now/"&gt;how&lt;/a&gt;.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/sunlightlabs/blog/~4/krajKHwQI6w" height="1" width="1"/&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Clay Johnson</dc:creator><pubDate>Tue, 16 Feb 2010 17:07:42 -0500</pubDate><guid isPermaLink="false">http://sunlightlabs.com/blog/2010/are-american-people-short-ideas/</guid><feedburner:origLink>http://sunlightlabs.com/blog/2010/are-american-people-short-ideas/</feedburner:origLink></item><item><title>Our Door Opener (A Science Project)</title><link>http://feedproxy.google.com/~r/sunlightlabs/blog/~3/6XXZq4mwAoc/</link><description>Life in the labs has been pretty good since we moved into our current offices.  Before, we were spread out over two floors: my team was upstairs in a stuffy law office sublet, and the rest of our colleagues were stuck in a homey but increasingly cramped and run-down space four floors below.  Since moving everyone to the third floor, we've found ourselves with plenty of room, lots more light and a nicer kitchen.  It's just a more pleasant working environment in general.&lt;br/&gt;&lt;br/&gt;

&lt;img class="detailimage" src="http://assets.sunlightlabs.com/blog/door/door_switch_small.jpg" style="float:right;margin-right:0; margin-left: 20px" alt="wall-mounted button with label reading 'door release'"/&gt;But there's always room for improvement.  For one thing, the new space came with new locks -- ones with really expensive keys.  Issuing keys to the entire staff wasn't practical, and coordinating door-opening responsibilities in a way that accommodated team members' occasionally odd schedules was inconvenient.  Fortunately, the space also came with the button you see to the right.  With a few hacks, this button now lets us open our office with our Nexus Ones, iPhones or even SMS.&lt;br/&gt;&lt;br/&gt;

Located near the reception desk, this button opens an electronic latch on the front door.  Pulling the assembly out of the wall revealed the system to be about as simple as possible: the button simply connects two wires.  Bridging them with a screwdriver fired the latch (from their small gauge and uninsulated connections, it was obvious we weren't dealing with dangerous voltages, but please don't start pulling cable from your walls unless you know what you're doing).&lt;br/&gt;&lt;br/&gt;

Connecting two low-voltage wires electronically isn't a particularly hard trick, so I decided it'd be fun to spend some evenings building a system to expose that switch to our network.  I've built a few projects in the past that make use of custom router firmwares and &lt;a href="http://www.arduino.cc"&gt;Arduino&lt;/a&gt; microcontrollers, so I decided to use those tools for the job.  I'm a big fan of this approach -- it's a great, inexpensive way to add a scriptable network interface to your microcontroller projects (if you're interested, I talk more about this technique &lt;a href="http://www.manifestdensity.net/2008/03/26/dorkbot-dc-arduino-meet-fonera/"&gt;here&lt;/a&gt;). &lt;br/&gt;&lt;br/&gt;

Ultimately, I ran into difficulty getting this particular router's serial connection to work, so I abandoned the Arduino -- it was overkill anyway.  Inspired by the excellent new &lt;a href="http://oreilly.com/catalog/9780596153755"&gt;&lt;em&gt;MAKE: Electronics&lt;/em&gt; book&lt;/a&gt;, I decided to build a circuit to do the job.  Here's the finished franken-router, hooked up to the switch:&lt;br/&gt;&lt;br/&gt;

&lt;img class="detailimage" src="http://assets.sunlightlabs.com/blog/door/complete_installation.jpg" style="float:none; margin: 0 auto 20px auto; display:block" alt="router with electrical connector posts added to its case, which are connected to a wire that snakes back under the wall plate of the button previously pictured"/&gt;

So how does this all work? Well, let's start with the router.  It's a Linksys WRT54GL, a still-Linux-friendly descendant of the WRT54G, the router which jump-started the custom firmware scene.  There are a lot of custom router firmwares available -- &lt;a href="http://www.dd-wrt.com/"&gt;DD-WRT&lt;/a&gt;, &lt;a href="http://www.polarcloud.com/tomato"&gt;Tomato&lt;/a&gt;, &lt;a href="http://www.gargoyle-router.com/"&gt;Gargoyle&lt;/a&gt; -- and they can all make your consumer-grade router do some professional-grade things. For this project I used &lt;a href="http://openwrt.org/"&gt;OpenWRT&lt;/a&gt;, the system on which many of those other projects are based.  It's got a much steeper learning curve than those other distros -- you definitely need to be comfortable with the command line -- but it lets you build a custom firmware with exactly the components you want.  With some tweaking I was able to create a firmware that included a stripped-down version of Python, but which still fit into the WRT54GL's meager 2MB flash memory.  (Like the Arduino, Python turned out to be overkill, but at the time I expected to need it -- and it's always nice to have it handy.)&lt;br/&gt;&lt;br/&gt;

With the firmware installed, I was able to SSH into the router and perform some simple manipulations of the system's GPIOs -- General Purpose Input/Outputs.  These connect to things like the system's LEDs and switches, and can be controlled in software.  I selected a GPIO that didn't seem to be used by OpenWRT -- it illuminates the "DMZ" LED on the front panel -- and wrote a &lt;a href="http://pastie.org/822223"&gt;very simple script&lt;/a&gt; to control it. I could now flip a tiny light on and off from a network connection. The hard part was over.&lt;br/&gt;&lt;br/&gt;
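The linked script isn't reproduced here, but the idea is simple enough to sketch. The Python below assumes the LED is exposed as a writable file that accepts "1" and "0"; the /proc/diag/led/dmz path reflects how some OpenWRT builds of this era exposed router LEDs, but treat it as an assumption and check your own build. The function names are hypothetical.&lt;br/&gt;&lt;br/&gt;

```python
import time

# ASSUMPTION: the DMZ LED's GPIO is exposed as a writable file that accepts
# "1" (on) and "0" (off). Verify the path on your own OpenWRT build.
DMZ_LED = "/proc/diag/led/dmz"


def set_led(on, path=DMZ_LED):
    """Write "1" or "0" to the LED control file."""
    with open(path, "w") as control:
        control.write("1" if on else "0")


def pulse_led(seconds=0.5, path=DMZ_LED):
    """Switch the LED on, wait, then always switch it back off."""
    set_led(True, path)
    try:
        time.sleep(seconds)
    finally:
        set_led(False, path)
```

Calling pulse_led() turns the DMZ LED on briefly and then off again. Since a 555 timer on the breakout board, not the script, determines how long the relay stays energized, the pulse presumably only needs to be long enough for the circuit to register it.&lt;br/&gt;&lt;br/&gt;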

Next came the hardware. It was easy to add a &lt;a href="http://www.team-xecuter.com/x3/tutorials/16/x3_16pinheader.jpg"&gt;pin header&lt;/a&gt; to the router's PCB -- it's got holes drilled and ready to go (some of them are connected to router systems -- see &lt;a href="http://wiki.openwrt.org/oldwiki/openwrtdocs/hardware/linksys/wrt54gl"&gt;here&lt;/a&gt;).  To this I hooked up ground, power, and a connection to one of the DMZ LED's pins.&lt;br/&gt;&lt;br/&gt;

&lt;img class="detailimage" src="http://assets.sunlightlabs.com/blog/door/board_bottom_marked_up.jpg" style="float:none; margin: 0 auto 20px auto; display:block" alt="bottom of WRT54GL circuit board, showing additional connections" /&gt;

A cable then connects those three pins to this breakout board:&lt;br/&gt;&lt;br/&gt;

&lt;img class="detailimage" src="http://assets.sunlightlabs.com/blog/door/circuit_marked_up.jpg" style="float:none; margin: 0 auto 20px auto; display:block" alt="breakout board with labels for voltage comparator, 556 chip, relay, voltage regulator, etcetera. various capacitors, resistors, transistors and other components are also present but unlabeled" /&gt;

There are a few things going on here.  That blue cylinder is a relay, and it's the heart of the system.  A relay is basically just a switch that can be closed by the electromagnet that's attached to it.  They're useful for employing a small voltage to turn a larger voltage on and off.  In this case, the switched inputs of the relay are wired up to the connection posts that I added to the outside of the router: when the relay is energized, those posts are electrically connected. When it's not, they aren't.  The relay runs on 5 volts, which is why there's a voltage regulator on the board: to reduce the router's 12 volts to 5 (the other components on the board can run on a range of voltages; they're perfectly happy at 5).&lt;br/&gt;&lt;br/&gt;

The router's GPIO isn't up to the task of controlling the relay directly, so there's some additional circuitry that's needed to turn the GPIO signal into a firing relay. First up: the 556 chip, which contains two 555 timer chips in a single package. When triggered, one of these fires the relay for a set amount of time -- about half a second.   The other timer starts up as soon as the circuit gets power, and stays on for almost half a minute.  It's responsible for disabling the first timer during the router's boot sequence, when various scripts cycle the LEDs (and would therefore accidentally fire the relay and open the door).&lt;br/&gt;&lt;br/&gt;
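For anyone sizing the timers themselves: a 555 wired as a one-shot (monostable) holds its output high for t = ln(3) * R * C, usually quoted as about 1.1 * R * C, where R and C are its timing resistor and capacitor. The Python sketch below applies that standard formula and models the boot lockout as a simple time check. The resistor and capacitor values are hypothetical, picked only to land near the half-second and half-minute durations described above; they are not taken from the actual schematic.&lt;br/&gt;&lt;br/&gt;

```python
import math


def monostable_seconds(r_ohms, c_farads):
    """Output pulse length of a 555 one-shot: t = ln(3) * R * C (about 1.1 * R * C)."""
    return math.log(3) * r_ohms * c_farads


def relay_allowed(seconds_since_power_on, lockout_seconds):
    """Simplified model of the boot lockout: the relay timer is held off
    until the second timer's long pulse has ended."""
    return seconds_since_power_on >= lockout_seconds


# Hypothetical component values, not the ones on the real board.
relay_pulse = monostable_seconds(47_000, 10e-6)     # about 0.52 s
boot_lockout = monostable_seconds(270_000, 100e-6)  # about 29.7 s
```

With these values, LED activity during roughly the first half-minute after power-up is ignored (relay_allowed(10, boot_lockout) is False), while a trigger after that fires the relay for about half a second.&lt;br/&gt;&lt;br/&gt;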

You trigger a 555 by connecting one of its pins to ground, and at first I thought I'd be able to simply hook the GPIO up to the 555's trigger.  No such luck: this GPIO, at least, doesn't go to ground when you switch it. Instead it changes between 2 and 3.3 volts (somewhere in that range is the point at which the LED happens to light up). The solution to this was to use a voltage comparator chip.  This compares its input -- the GPIO -- to a reference value that's supplied on another pin.  In this case the reference voltage is 2.5 volts, made with an extremely simple voltage divider (basically just two equal-value resistors in series: the junction sits at half the total voltage across them, and in this case that total is 5 volts. Science!).  If the input value is above the reference, the comparator outputs a high value. If it's below the reference, it connects its output to ground. So the comparator's output gets connected to the 555, and voila! We're done.&lt;br/&gt;&lt;br/&gt;
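The divider-and-comparator logic is easy to check numerically. In the Python sketch below the comparator is idealized (it ignores details such as the open-collector outputs that many comparator chips use), and the 10k resistor values are a hypothetical choice; any two equal resistors across 5 volts give the same 2.5 volt reference.&lt;br/&gt;&lt;br/&gt;

```python
def divider_output(v_in, r_top, r_bottom):
    """Voltage at the junction of two series resistors (unloaded divider)."""
    return v_in * r_bottom / (r_top + r_bottom)


def comparator(v_signal, v_ref):
    """Idealized comparator: 'high' above the reference, pulled to ground otherwise.
    'ground' is the state that pulls the 555's trigger low and fires it."""
    return "high" if v_signal > v_ref else "ground"


# Two equal (hypothetical 10k) resistors across the 5 V rail: 2.5 V reference.
V_REF = divider_output(5.0, 10_000, 10_000)

for gpio_volts in (3.3, 2.0):
    print(gpio_volts, comparator(gpio_volts, V_REF))
# prints "3.3 high" then "2.0 ground"
```

Because 2.5 volts sits comfortably between the GPIO's two levels, the comparator turns a signal that never reaches ground into a clean ground-or-high output that the 555 can use.&lt;br/&gt;&lt;br/&gt;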

This is a pretty simple project, but I'm no electrical engineer, so it took me a while to get it working. But eventually I was able to use my SSH client to open the door latch.  That's not exactly an ideal interface, though: it's both hard to use and hard to administer.  I turned the problem over to my coworkers in the labs, and they built the rest of the system with amazing speed (and, it should be noted, in the middle of a blizzard).  They've come up with some pretty impressive solutions -- more on that in the next post!&lt;br/&gt;&lt;br/&gt;

&lt;strong&gt;UPDATE:&lt;/strong&gt; I've been asked for a schematic of the door-opening circuit; here it is as an &lt;a href="http://assets.sunlightlabs.com/blog/door/sunlight_door.sch"&gt;EAGLE .sch&lt;/a&gt; and as a &lt;a href="http://assets.sunlightlabs.com/blog/door/schematic.png"&gt;simple PNG&lt;/a&gt;. If you do build this, a word of warning: the initial timer, which is meant to disable the system during boot, behaves somewhat differently in-system than it does on a breadboard.  I'm still trying to address this issue -- and have added a bypass capacitor to the schematic in an attempt to do so -- but for us, it's not a particularly pressing issue, as exploiting the vulnerability it represents is still substantially harder than simply smashing down the door.  Still, if you plan to use this circuit, please be aware of its limitations and understand that you use it at your own risk.&lt;img src="http://feeds.feedburner.com/~r/sunlightlabs/blog/~4/6XXZq4mwAoc" height="1" width="1"/&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Tom Lee</dc:creator><pubDate>Tue, 16 Feb 2010 10:00:23 -0500</pubDate><guid isPermaLink="false">http://sunlightlabs.com/blog/2010/our-door-opener-science-project/</guid><feedburner:origLink>http://sunlightlabs.com/blog/2010/our-door-opener-science-project/</feedburner:origLink></item><item><title>What if Government had a Google Buzz Moment?</title><link>http://feedproxy.google.com/~r/sunlightlabs/blog/~3/PlikG6VTiLU/</link><description>&lt;p&gt;Three days ago Google released &lt;a href="http://buzz.gmail.com"&gt;Google Buzz&lt;/a&gt;, a product that got a lot of folks excited, especially here in the Labs. But people quickly realized something: Google had stepped across an invisible privacy fence. 
A &lt;a href="http://www.google.com/support/forum/p/gmail/thread?tid=504fb2c558c43fd9&amp;amp;hl=en"&gt;lot&lt;/a&gt; &lt;a href="http://lifehacker.com/5470513/how-buzz-exposes-private-email-addresses-in-replies"&gt;of&lt;/a&gt; &lt;a href="http://blog.rosania.org/on-privacy-or-what-buzz-failed-to-learn-from"&gt;people&lt;/a&gt; &lt;a href="http://anwag.posterous.com/badly-needed-buzz-improvement-1-individually"&gt;are&lt;/a&gt; &lt;a href="http://mixergy.com/the-problem-with-google-buzz-is-that-it-solves-googles-problem-at-your-expense/"&gt;critical&lt;/a&gt; or downright &lt;a href="http://fugitivus.wordpress.com/2010/02/11/fuck-you-google/"&gt;ticked off&lt;/a&gt;. Google had, in fact, exposed to the world the people we communicate with most. &lt;/p&gt;
&lt;p&gt;If the Federal Government released a product similar to Google Buzz, what would have happened? &lt;/p&gt;
&lt;p&gt;What would it look like if the Government decided to take the most basic social information it has about you and use it to create and expose an online social network?&lt;/p&gt;
&lt;p&gt;If I were the government looking to build such a service, the first place I'd go would be the IRS. From there, I could determine your spouse, dependents and employer. That alone ought to be enough to start my service up, linking you online to your spouse or former spouses and to the folks who work in the same area you do at your company. &lt;/p&gt;
&lt;p&gt;Of course, that's just one agency. If, for instance, you were a veteran, the Government could link you up with other people you've worked with. If you were a lobbyist, the Government could link you up with other lobbyists who have lobbied at the same firm on the same issues you have. If you've filed for student loans, perhaps it could link you up with all the people you went to college with. &lt;/p&gt;
&lt;p&gt;Now, Google didn't choose to go this far. It didn't, for instance, choose to link you up with the people you Google-stalk. And it didn't choose to link your Google stalkers up with you either (admittedly, that would be a fairly difficult technical task to pull off), but the point remains the same. Google took one product, Gmail, and chose to expose some data about your habits in Gmail to the public without asking your permission first. Though I think they've been doing this less explicitly through &lt;a href="http://reader.google.com"&gt;Google Reader&lt;/a&gt;, this time it has caused a big &lt;a href="http://www.google.com/search?sourceid=chrome&amp;amp;ie=UTF-8&amp;amp;q=google+buzz+privacy"&gt;stink&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;If the Government did this on Tuesday, I'd imagine there'd be a witch-hunt for who was responsible, there'd be congressional hearings, and probably impeachment papers being drawn up as we speak. &lt;/p&gt;
&lt;p&gt;In Google's case, nobody's shouting "off with his head" about Eric Schmidt. It's unlikely that any congressional hearings will take place about this. Instead, people are tossing out instruction manuals for &lt;a href="http://blogs.howstuffworks.com/2010/02/11/opting-out-of-google-buzz/"&gt;opting out of the service&lt;/a&gt; and talking about their complaints publicly. Google in turn is &lt;a href="http://www.google.com/support/forum/p/gmail/thread?tid=504fb2c558c43fd9&amp;amp;hl=en"&gt;listening to feedback&lt;/a&gt; and &lt;a href="http://gmailblog.blogspot.com/2010/02/millions-of-buzz-users-and-improvements.html"&gt;rolling out functionality rather quickly&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The point is: as angry as people might get over Google doing something like this, it doesn't come anywhere close to how angry they would get if Government did it. And rightfully so. We're compelled to pay taxes. We have a &lt;a href="http://en.wikipedia.org/wiki/Fourth_Amendment_to_the_United_States_Constitution"&gt;4th Amendment&lt;/a&gt;; it's arguably illegal for the Government to do what I'm suggesting anyhow. But the question you've got to ask yourself is why. &lt;/p&gt;
&lt;p&gt;The answer: Google, to an extent, has permission from its users to screw up and rectify the situation. Government does not. We have a much lower tolerance for failure in our government than we do for the corporations that serve us. Every time government doesn't do something exactly right, it's a political opportunity for someone to strike. And while we all enviously watch Google give its engineers &lt;a href="http://en.wikipedia.org/wiki/Google#Innovation_Time_Off"&gt;20% time&lt;/a&gt; to innovate, the public would likely be outraged if Government tried the same. The end result? Our government isn't innovative because it can't fail. If it fails, it becomes a story: another blemish to breed more cynicism. Another story for &lt;a href="http://en.wikipedia.org/wiki/Bill_O&amp;apos;Reilly_political_commentator"&gt;Bill O'Reilly&lt;/a&gt; or &lt;a href="http://en.wikipedia.org/wiki/Keith_Olbermann"&gt;Keith Olbermann&lt;/a&gt;. So why take any risk at all?&lt;/p&gt;
&lt;p&gt;If we want government to open up and innovate, then the right answer is to give it permission to screw up, have a dialogue with people and rectify it. Opening government is as much a citizen's responsibility as it is a government's. It takes a shift in thinking not only on their end, but also on ours.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/sunlightlabs/blog/~4/PlikG6VTiLU" height="1" width="1"/&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Clay Johnson</dc:creator><pubDate>Fri, 12 Feb 2010 15:57:34 -0500</pubDate><guid isPermaLink="false">http://sunlightlabs.com/blog/2010/what-if-government-had-google-buzz-moment/</guid><feedburner:origLink>http://sunlightlabs.com/blog/2010/what-if-government-had-google-buzz-moment/</feedburner:origLink></item><item><title>Free yourself from the Shackles of &amp;quot;High Value Data&amp;quot;</title><link>http://feedproxy.google.com/~r/sunlightlabs/blog/~3/JZrLAozpSM8/</link><description>&lt;p&gt;When the feds introduced the term High Value Data, my immediate response here was "what the heck is 'High Value Data'?!" We quickly extracted the definition from the &lt;a href="http://sunlightfoundation.com/opengovernmentdirective"&gt;Open Government Directive&lt;/a&gt;, and here it is:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;"High-value information is information that can be used to increase agency accountability and responsiveness; improve public knowledge of the agency and its operations; further the core mission of the agency; create economic opportunity; or respond to need and demand as identified through public consultation."&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Now we've had a chance to go through and take a &lt;a href="http://reporting.sunlightfoundation.com/2010/data-gov-opinion/"&gt;look&lt;/a&gt; at some of the datasets. Our &lt;a href="http://reporting.sunlightfoundation.com"&gt;Reporting Group&lt;/a&gt; is &lt;a href="http://reporting.sunlightfoundation.com/2010/ogd-commerce/"&gt;having&lt;/a&gt; a &lt;a href="http://reporting.sunlightfoundation.com/2010/ogd-defense-releases-what-it-already-releases/"&gt;field&lt;/a&gt; &lt;a href="http://reporting.sunlightfoundation.com/2010/ogd-irs-migration-data-doesnt-capture-everyone/"&gt;day&lt;/a&gt; analyzing the data, pointing out its flaws and generally doing a great job of figuring out what's &lt;em&gt;actually&lt;/em&gt; new in the datasets.&lt;/p&gt;
&lt;p&gt;Predictably, a new complaint has emerged: people keep trying to figure out why this data is "high value." And what we have here is the equivalent of legions of sci-fi fans complaining that Huckleberry Finn didn't have enough Yoda. The fact is-- it's impossible for anyone, government or otherwise, to claim that data is "high value" by any universal standard. It depends too much on the person, the timing, and other completely subjective factors.&lt;/p&gt;
&lt;p&gt;Investigative reporters point out that much of the data isn't "high value" when what they really mean is that it isn't high value to them. Much of it doesn't help to "increase the accountability of the agency" or provide them with the ability to conduct investigative reports. Researchers and specialists will sometimes say "But this data has been online for years. All you've done is release it in a .csv. This isn't high value." Developers will say "you've provided me a bunch of bulk data, that's great-- but I have no idea what it is or what to do with it. Where's the context?"&lt;/p&gt;
&lt;p&gt;"High value" is a subjective term. Data has no value without context, and its value varies with the expertise and interests of the individual. I don't find the &lt;a href="http://www.data.gov/details/1352"&gt;Feed Grains Database&lt;/a&gt; particularly interesting, but I suspect &lt;a href="http://bit.ly/9sTeEP"&gt;Michael Pollan&lt;/a&gt; would find that data quite useful.&lt;br /&gt;
&lt;/p&gt;
&lt;p&gt;"High value" also depends on timing. Today's "junk" dataset could be tomorrow's gold mine. Would a dataset of gas pedals have been considered high value before the &lt;a href="http://www.toyota.com/recall/?srchid=K610_p228906387"&gt;Toyota recall&lt;/a&gt;? Would &lt;a href="http://recovery.gov"&gt;Recovery.gov's&lt;/a&gt; data be considered high value without the political backdrop of Obama's economic stimulus? &lt;/p&gt;
&lt;p&gt;Let's do away with the term "high value." It'd be better to set specific goals for each type of dataset people want released. Take, for instance, accountability data. What would happen if we set benchmarks specifically for accountability data? Suppose the Open Government Directive said that each agency had to release 2-3 datasets allowing for the detection of the most common types of waste, fraud or corruption affecting the agency. Investigative journalists would then be able to measure more accurately whether the data is of value: they can tell you fairly quickly whether a dataset's going to help them write the stories that help citizens hold agencies accountable for their work. &lt;/p&gt;
&lt;p&gt;Because a dataset's value is entirely subjective, discoverability and usability become the bigger factors for most datasets, too. How do we get the right datasets in front of the right people? Because &lt;a href="http://sunlightlabs.com/blog/2010/coming-data-flood/"&gt;the flood of data is coming&lt;/a&gt;. And one person's junk is another's treasure.&lt;/p&gt;
&lt;p&gt;We're almost there. The White House should consider pushing a little harder, pressing the agencies to create new datasets against specific, measurable goals. And you-- yes, you-- the open government community could do the same. Join me and free yourself from the shackles of "high value" abstraction. Create your own yardsticks for measuring the impact of the data being released, and start tracking.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/sunlightlabs/blog/~4/JZrLAozpSM8" height="1" width="1"/&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Clay Johnson</dc:creator><pubDate>Thu, 11 Feb 2010 14:39:29 -0500</pubDate><guid isPermaLink="false">http://sunlightlabs.com/blog/2010/term-high-value-dataset-bunk/</guid><feedburner:origLink>http://sunlightlabs.com/blog/2010/term-high-value-dataset-bunk/</feedburner:origLink></item></channel></rss>
