<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:atom="http://www.w3.org/2005/Atom" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0"><channel><title>Sunlight Labs blog</title><link>http://sunlightlabs.com/blog/</link><description>Latest blog updates from the nerds at Sunlight Labs</description><language>en-us</language><lastBuildDate>Thu, 17 Jan 2013 13:37:19 -0500</lastBuildDate><ttl>120</ttl><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.feedburner.com/sunlightlabs/blog" /><feedburner:info uri="sunlightlabs/blog" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><item><title>FDLP Allergic to Curl</title><link>http://feedproxy.google.com/~r/sunlightlabs/blog/~3/BuwRNLbOtMU/</link><description>&lt;p&gt;
Waldo Jaquith &lt;a href="https://twitter.com/waldojaquith/status/286301765702410240"&gt;discovered&lt;/a&gt; that the FDLP (Federal Depository Library Program) appears to have an allergic reaction to people downloading their data with basic command line tools.
&lt;/p&gt;

&lt;blockquote class="twitter-tweet tw-align-center"&gt;&lt;p&gt;fdlp․gov blocks requests from cURL with a 403 and a "malware detected" error. &amp;gt;:-/&lt;/p&gt;&amp;mdash; Waldo Jaquith (@waldojaquith) &lt;a href="https://twitter.com/waldojaquith/status/286301765702410240" data-datetime="2013-01-02T02:43:54+00:00"&gt;January 2, 2013&lt;/a&gt;&lt;/blockquote&gt;
&lt;script async src="//platform.twitter.com/widgets.js" charset="utf-8"&gt;&lt;/script&gt;

&lt;p&gt;
The &lt;a href="http://www.fdlp.gov/home/repository/doc_download/1995-newly-added-congressional-document-item-numbers"&gt;CSV's URL&lt;/a&gt; (linked from &lt;a href="http://www.fdlp.gov/component/content/article/42-distribution/1005-csv-file-newly-added-cong-doc-item-nos"&gt;this post&lt;/a&gt;) is not blocked by their &lt;a href="http://fdlp.gov/robots.txt"&gt;robots.txt&lt;/a&gt;. Using an alternate tool, wget, worked fine. My colleague Thom Neale &lt;a href="https://twitter.com/twneale/status/286314145165496320"&gt;humorously noted&lt;/a&gt; that having curl tell FDLP.gov that it's "microsoft-malware-professional-2013" also worked, but Waldo &lt;a href="https://twitter.com/waldojaquith/status/286314422979411968"&gt;found&lt;/a&gt; that "Mozilla/5.0" did not. So FDLP has some weird, specific logic around who is approved to download their data and who isn't.
&lt;/p&gt;
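
&lt;p&gt;
For anyone who wants to probe this kind of user agent filtering from a script rather than the shell, here is a minimal Python sketch using only the standard library. The helper names are made up for illustration, and the server may well respond differently now than it did when we tested.
&lt;/p&gt;

```python
import urllib.error
import urllib.request

# The CSV URL linked above; how the server responds today is not guaranteed.
CSV_URL = ("http://www.fdlp.gov/home/repository/doc_download/"
           "1995-newly-added-congressional-document-item-numbers")


def build_request(url, user_agent):
    """Build a request that announces itself with the given User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def status_for(url, user_agent, timeout=10):
    """Send the request and return the HTTP status code, such as 200 or 403."""
    request = build_request(url, user_agent)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx and 5xx responses; the code is what we want.
        return err.code
```

&lt;p&gt;
Calling &lt;code&gt;status_for(CSV_URL, "Mozilla/5.0")&lt;/code&gt; and then again with another user agent string is enough to compare how each one is treated.
&lt;/p&gt;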

&lt;p&gt;
In the course of verifying all this, after trying to download the CSV only a handful of times, FDLP blocked the entire Sunlight Foundation office from any access to FDLP.gov. This was 2 weeks ago, and this is still what Sunlight staff see when they visit FDLP.gov:
&lt;/p&gt;

&lt;p&gt;&lt;img src="http://assets.sunlightlabs.com/blog/fdlp.gov.png" style="margin: 0px auto; display: block"/&gt;&lt;/p&gt;

&lt;p&gt;
The ridiculousness of permanently blocking us after so few requests aside, considering requests to download structured data by non-browsers to be "malware" is seriously backwards thinking, especially for a government agency. 
&lt;/p&gt;

&lt;p&gt;
Restricting abusive behavior is obviously fine, but that abuse should be measured by behavior, not by user agent profiling. If you host structured data at a public, permanent link, expect people to want to obtain that data through a great variety of reasonable means.
&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/sunlightlabs/blog/~4/BuwRNLbOtMU" height="1" width="1"/&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Eric Mill</dc:creator><pubDate>Thu, 17 Jan 2013 13:37:19 -0500</pubDate><guid isPermaLink="false">http://sunlightlabs.com/blog/2013/fdlp-allergic-to-curl/</guid><feedburner:origLink>http://sunlightlabs.com/blog/2013/fdlp-allergic-to-curl/</feedburner:origLink></item><item><title>The Data and Tech Behind Sitegeist</title><link>http://feedproxy.google.com/~r/sunlightlabs/blog/~3/yM1fSPlkX20/</link><description>&lt;p&gt;&lt;img src="http://assets.sunlightlabs.com.s3.amazonaws.com/blog/sitegeist-tech-data/sitegeist-logo.png" style="float: right; width: 100px; height: 100px; margin: 0 0 1em 1em;"&gt;&lt;/p&gt;

&lt;p&gt;It's been three weeks since we launched &lt;a href="http://sitegeist.sunlightfoundation.com"&gt;Sitegeist&lt;/a&gt; and the response has greatly exceeded (my) expectations! There have been over 27,000 downloads of the &lt;a href="http://sitegeist.sunlightfoundation.com/ios/"&gt;iOS&lt;/a&gt; and &lt;a href="http://sitegeist.sunlightfoundation.com/android/"&gt;Android&lt;/a&gt; apps and a flood of feedback emails. Thanks again to the &lt;a href="http://www.knightfoundation.org/"&gt;Knight Foundation&lt;/a&gt; and &lt;a href="http://www.ideo.com/"&gt;IDEO&lt;/a&gt; for their help in creating the app!&lt;/p&gt;

&lt;p&gt;There were a bunch of questions about the data and technology used to produce Sitegeist, so let's dive into how the project works.&lt;/p&gt;

&lt;h3&gt;Data&lt;/h3&gt;

&lt;p&gt;Our goal with Sitegeist was to show how government data could be made useful to the average person. We incorporated demographic information, campaign contributions, weather and other data that comes directly from the government or through a secondary source. We also use some privately owned data sets to supplement this information.&lt;/p&gt;

&lt;h4&gt;US Census Bureau&lt;/h4&gt;

&lt;p&gt;The majority of the data used in Sitegeist comes from the US Census Bureau and their &lt;a href="http://www.census.gov/developers/"&gt;wonderful API&lt;/a&gt;. To help with development, we've created &lt;a href="https://crate.io/packages/census/"&gt;census, a Python wrapper for the Census API&lt;/a&gt;. For example, here is a call to get the name and number of children under 5 years of age for every state:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;from census import Census

c = Census("MY_API_KEY")
c.acs.state(("NAME", "B01001_004E"), Census.ALL)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;I've never worked with the Census bulk downloads, but I hear from colleagues that it can be a daunting task. Their API makes it incredibly easy to slice and dice the numbers as needed, combining data across "tables" for any geography you are working with.&lt;/p&gt;

&lt;h4&gt;Influence Explorer&lt;/h4&gt;

&lt;p&gt;Campaign contribution totals by ZIP code are provided by our own &lt;a href="http://influenceexplorer.com/"&gt;Influence Explorer&lt;/a&gt; project. Contributions within a ZIP code are totaled based on party affiliation of the recipient. Pretty straightforward, not much to say about this.&lt;/p&gt;

&lt;h4&gt;EPA Easter Egg&lt;/h4&gt;

&lt;p&gt;&lt;img src="http://assets.sunlightlabs.com.s3.amazonaws.com/blog/sitegeist-tech-data/contaminated.png" style="width: 200px; float: right; margin: 1em 0 1em 1em;"&gt;&lt;/p&gt;

&lt;p&gt;Have you ever looked at the Environment pane and seen the oozey alert? No? Good for you, you haven't been around any contaminated sites! We wanted Sitegeist to include a number of "easter eggs" that show up or change only in certain contexts. It's an easy way to make apps like this fun and more interesting. Due to time constraints, we were only able to get a few things like this added, but there are some ideas for more that we might work on.&lt;/p&gt;

&lt;p&gt;We loaded a bunch of locations that the EPA considers to be contaminated; &lt;a href="http://www.epa.gov/superfund/"&gt;Superfund sites&lt;/a&gt; and more. If a site exists within five miles of your current location, we show the scary ooze creeping down the Environment pane. Clicking on this ooze will show you a page from the EPA site with more information on the contaminated location.&lt;/p&gt;
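
&lt;p&gt;The five-mile check can be sketched in a few lines of Python. This is not the Sitegeist code, just an illustration that assumes a great-circle (haversine) distance; the function names and the site coordinates are invented for the example.&lt;/p&gt;

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def miles_between(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def sites_within(lat, lon, sites, radius_miles=5.0):
    """Return the sites whose distance from (lat, lon) is within the radius."""
    return [site for site in sites
            if radius_miles >= miles_between(lat, lon, site["lat"], site["lon"])]


# Made-up coordinates, purely for illustration.
sites = [
    {"name": "Hypothetical site A", "lat": 38.95, "lon": -77.03},
    {"name": "Hypothetical site B", "lat": 39.50, "lon": -77.03},
]

# Show the ooze only when at least one site is close enough.
show_ooze = bool(sites_within(38.90, -77.03, sites))
```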

&lt;h4&gt;Third-party Services&lt;/h4&gt;

&lt;p&gt;All of the previously mentioned data sets have been loaded, cached and are sitting in a database on our server. The other third-party services, whether due to terms of service restrictions or practicality, are loaded on demand via their respective APIs. These services include &lt;a href="http://www.yelp.com/"&gt;Yelp&lt;/a&gt; and &lt;a href="https://foursquare.com/"&gt;Foursquare&lt;/a&gt; for local business information and &lt;a href="http://darkskyapp.com/"&gt;Dark Sky&lt;/a&gt; and &lt;a href="http://www.wunderground.com/"&gt;Weather Underground&lt;/a&gt; for weather data.&lt;/p&gt;

&lt;h3&gt;A Note On Locations&lt;/h3&gt;

&lt;p&gt;The data sets that power Sitegeist use a number of different geographies: census tracts, ZIP codes, top-n closest from a location and things within a certain radius of a location. As far as the app is concerned, you exist at a point in a 2D plane. It would be inaccurate to say you are in San Francisco if the data you are seeing is tied only to a few small blocks within the city. Likewise, the names of some geographies make no sense at all; do you really care that you are in Census Tract 107?&lt;/p&gt;

&lt;p&gt;The goal of the design was to present a clean and readable infographic-style display of the information. Geographic metadata was left out because it only contributed to clutter that most people don't care about. I emphasize, &lt;em&gt;most&lt;/em&gt; people; the people that do care have definitely let us know!&lt;/p&gt;

&lt;p&gt;It's definitely an interesting problem. If you have any insights, let me know in the comments.&lt;/p&gt;

&lt;h3&gt;The Web Servers&lt;/h3&gt;

&lt;p&gt;&lt;img src="http://assets.sunlightlabs.com.s3.amazonaws.com/blog/sitegeist-tech-data/boundary.png" style="float: right; width: 300px; margin: 1em 0 1em 1em;"&gt;&lt;/p&gt;

&lt;p&gt;All of this wonderful data resides on our servers. When you select a location, the latitude and longitude are passed along with the ID of the pane you want to view. Of the various geographies we keep track of (census tracts, ZIP codes, etc.), we find the boundaries of every shape that contains your location. This uses a customized version of Chicago Tribune's &lt;a href="https://github.com/newsapps/django-boundaryservice"&gt;boundaryservice&lt;/a&gt;. We then match those geographic boundaries with any data we have, making calls to third-party APIs as needed. The collected data is rendered into templates and returned to you as the beautiful infographics you see in the app.&lt;/p&gt;
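
&lt;p&gt;To make the "which shapes contain this point" step concrete, here is a small Python sketch. It is not how boundaryservice does the lookup; it is a plain ray-casting test over simple polygons, with invented rectangles standing in for real boundaries.&lt;/p&gt;

```python
def contains(polygon, x, y):
    """Ray-casting test: is the point (x, y) inside the polygon?

    polygon is a list of (x, y) vertices. Points exactly on an edge may
    land on either side.
    """
    inside = False
    count = len(polygon)
    for i in range(count):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % count]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > x:
                inside = not inside
    return inside


def shapes_containing(shapes, lon, lat):
    """Return the names of all shapes whose boundary contains the point."""
    return [name for name, polygon in shapes.items()
            if contains(polygon, lon, lat)]


# Two made-up rectangular "geographies" in (longitude, latitude) order.
shapes = {
    "hypothetical tract": [(-77.05, 38.89), (-77.02, 38.89),
                           (-77.02, 38.92), (-77.05, 38.92)],
    "hypothetical ZIP": [(-77.10, 38.85), (-77.00, 38.85),
                         (-77.00, 38.95), (-77.10, 38.95)],
}
matches = shapes_containing(shapes, -77.03, 38.90)
```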

&lt;p&gt;In order to reduce load on our servers, the rendered panes are cached for a short period of time. So if you make a request again for the same pane in the same location, we'll just return the pane instead of making the API and database calls again. The cache timeout is very short, 10 minutes, so you'll always get relatively fresh data.&lt;/p&gt;

&lt;p&gt;Mobile GPS devices are not incredibly accurate and your location can move around many meters even if you are standing still. I really don't want to waste the time doing database calls just because your phone corrected itself and moved you 5 meters to the left. The cache takes your location into account as well and will return cached data if you are within a certain "snap radius" of a previously rendered and cached request.&lt;/p&gt;
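
&lt;p&gt;One simple way to get this behavior, sketched below in Python, is to round coordinates onto a grid before building the cache key. This is grid snapping rather than a true radius check, and it is only an illustration: the cell size, the function names and the in-memory dictionary are all assumptions, while the 10 minute timeout comes from the description above.&lt;/p&gt;

```python
import time

CACHE_TTL_SECONDS = 10 * 60  # the 10 minute timeout described above
GRID_DEGREES = 0.001         # hypothetical cell size, roughly 100 m of latitude

_cache = {}  # cache key -> (expiry time, rendered pane)


def cache_key(pane_id, lat, lon):
    """Snap the location to a grid cell so nearby points share one key."""
    return (pane_id, round(lat / GRID_DEGREES), round(lon / GRID_DEGREES))


def get_pane(pane_id, lat, lon, render, now=None):
    """Return a cached pane if one is still fresh, otherwise render it."""
    now = time.time() if now is None else now
    key = cache_key(pane_id, lat, lon)
    hit = _cache.get(key)
    if hit is not None and hit[0] > now:
        return hit[1]
    html = render(pane_id, lat, lon)  # the expensive API and database work
    _cache[key] = (now + CACHE_TTL_SECONDS, html)
    return html
```

&lt;p&gt;The trade-off with a grid is that two points on either side of a cell edge get different keys even when they are very close together, so a real snap radius needs a bit more work than this.&lt;/p&gt;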

&lt;h3&gt;Android and iOS&lt;/h3&gt;

&lt;p&gt;Finally we have the mobile apps themselves. Though the infographic panes are HTML, the apps are native to their respective platforms. At the start of the project I looked at the possibility of using a pure HTML or cross-platform framework, but none of them allowed for the responsiveness and system integration we wanted. Going native allows us to take advantage of platform features like social sharing, maps, GPS and others while still using a single platform for generating the pane content. The other advantage of using web views for content is that we can make certain updates to the application, such as adding a new data view, without having to update both apps or waiting for Apple's App Store approval process.&lt;/p&gt;

&lt;h3&gt;Admin Dashboard&lt;/h3&gt;

&lt;p&gt;&lt;img src="http://assets.sunlightlabs.com.s3.amazonaws.com/blog/sitegeist-tech-data/dev.png" style="width: 90%; margin: 2em auto; display: block; "&gt;&lt;/p&gt;

&lt;p&gt;I also created an admin panel that allows us to load up panes for any combination of devices, locations and such. This has been incredibly useful when testing new features or troubleshooting data associated with specific locations.&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;The &lt;a href="https://github.com/sunlightlabs/sitegeist"&gt;source for the Android app is available now&lt;/a&gt; and the web and iOS projects will be published within the next week.&lt;/p&gt;

&lt;p&gt;If you haven't yet, check out &lt;a href="http://sitegeist.sunlightfoundation.com/"&gt;Sitegeist&lt;/a&gt;!&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/sunlightlabs/blog/~4/yM1fSPlkX20" height="1" width="1"/&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Jeremy</dc:creator><pubDate>Mon, 07 Jan 2013 10:32:41 -0500</pubDate><guid isPermaLink="false">http://sunlightlabs.com/blog/2013/sitegeist-data-tech/</guid><feedburner:origLink>http://sunlightlabs.com/blog/2013/sitegeist-data-tech/</feedburner:origLink></item><item><title>Sunlight from the Command Line</title><link>http://feedproxy.google.com/~r/sunlightlabs/blog/~3/cIcUsLQVDDg/</link><description>&lt;p&gt;Are you as big of a fan of &lt;a href="http://sunlightfoundation.com/people/ptagliamonte/"&gt;Paul Tagliamonte&lt;/a&gt; as I am? If so, then you are well aware of &lt;a href="https://github.com/sunlightlabs/python-sunlight"&gt;python-sunlight&lt;/a&gt;, his awesome, comprehensive Python API client for &lt;a href="http://services.sunlightlabs.com/"&gt;Sunlight's APIs&lt;/a&gt;. The latest release includes a command line interface, or CLI, so you can interact with the Sunlight APIs directly from the shell. Cool, right?&lt;/p&gt;

&lt;p&gt;To get started, install python-sunlight:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ pip install sunlight
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Next, you'll need a Sunlight API key. Visit &lt;a href="http://services.sunlightlabs.com/"&gt;http://services.sunlightlabs.com/&lt;/a&gt; and register for a key. Once you have your key, save it to a convenient location in your home directory.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ echo "THIS_IS_MY_KEY" &amp;gt; ~/.sunlight.key
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;python-sunlight will automatically look in this file for your key. No need to set it in your program or pass it as a parameter to the command!&lt;/p&gt;

&lt;p&gt;Okay, let's do this…&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ sunlight
Usage: sunlight &amp;lt;service&amp;gt; &amp;lt;method&amp;gt; [&amp;lt;args&amp;gt;, ...]
Available services:
    influenceexplorer
    openstates
    capitolwords
    congress
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;So just by running &lt;code&gt;sunlight&lt;/code&gt; we can see the endpoints that are available to us. Let's use the congress endpoint to find out information about representatives.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ sunlight congress
Usage: sunlight &amp;lt;service&amp;gt; &amp;lt;method&amp;gt; [&amp;lt;args&amp;gt;, ...]
Available methods:
    committee_detail
    committees
    committees_for_legislator
    districts_for_lat_lon
    districts_for_zip
    legislator_search
    legislators
    legislators_for_lat_lon
    legislators_for_zip
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;And there are all of the methods of the &lt;code&gt;sunlight.congress&lt;/code&gt; module, exposed to us via the command line. Arguments to the API are passed via parameters.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ sunlight congress legislators --state MD
[
  {
    "district": "6", 
    "title": "Rep", 
    "eventful_id": "", 
    "in_office": true, 
    "state": "MD", 
    "crp_id": "N00002017", 
    "official_rss": "", 
    …
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The response from &lt;code&gt;sunlight&lt;/code&gt; is the full JSON returned by the API call. It could potentially be a lot of data! Let's write that JSON to a file so that we can do some more interesting stuff without making repeated calls to the API.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ sunlight congress legislators --state MD &amp;gt; marylanders.json
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Now that we've got that saved, let's use the awesome JSON filtering utility, &lt;a href="http://stedolan.github.com/jq/"&gt;jq&lt;/a&gt;, to get at just the data we want. Install &lt;code&gt;jq&lt;/code&gt; via the binaries provided on the web site or by &lt;code&gt;brew install jq&lt;/code&gt; on Macs. The simplest thing to do with &lt;code&gt;jq&lt;/code&gt; is to just return the JSON as-is.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ cat marylanders.json | jq '.'
[
  {
    "district": "6", 
    "title": "Rep", 
    "eventful_id": "", 
    "in_office": true, 
    "state": "MD", 
    "crp_id": "N00002017", 
    "official_rss": "", 
    …
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;But what I really want to know are the names and Twitter screen names for each Marylander.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ cat marylanders.json | jq '.[] | {firstname,lastname,twitter_id}'
{
  "twitter_id": "",
  "lastname": "Bartlett",
  "firstname": "Roscoe"
},
{
  "twitter_id": "SenatorCardin",
  "lastname": "Cardin",
  "firstname": "Benjamin"
},
{
  "twitter_id": "",
  "lastname": "Cummings",
  "firstname": "Elijah"
}
…
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Well that was easy! And just to be clear, you can use &lt;code&gt;jq&lt;/code&gt; directly with the &lt;code&gt;sunlight&lt;/code&gt; command, no need to write the JSON response to disk first!&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ sunlight congress legislators --state MD | jq '.[] | {firstname,lastname,twitter_id}'
&lt;/code&gt;&lt;/pre&gt;
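
&lt;p&gt;If you'd rather stay in Python than pipe through &lt;code&gt;jq&lt;/code&gt;, the same field selection is only a few lines. This is a rough sketch: the helper name is made up, and the sample records are a trimmed stand-in for &lt;code&gt;marylanders.json&lt;/code&gt; using values from the output above.&lt;/p&gt;

```python
import json


def names_and_twitter(legislators):
    """Keep only the three fields selected by the jq filter above."""
    wanted = ("firstname", "lastname", "twitter_id")
    return [{key: person.get(key) for key in wanted} for person in legislators]


# A tiny stand-in for marylanders.json.
sample = json.loads("""
[
  {"firstname": "Roscoe", "lastname": "Bartlett", "twitter_id": "", "state": "MD"},
  {"firstname": "Benjamin", "lastname": "Cardin", "twitter_id": "SenatorCardin", "state": "MD"}
]
""")

for person in names_and_twitter(sample):
    print(json.dumps(person))
```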

&lt;p&gt;Anyway, that's just a quick intro. You should check out the &lt;a href="http://stedolan.github.com/jq/manual/"&gt;jq documentation&lt;/a&gt; and definitely &lt;a href="http://services.sunlightlabs.com/"&gt;register for a Sunlight API key&lt;/a&gt;!&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/sunlightlabs/blog/~4/cIcUsLQVDDg" height="1" width="1"/&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Jeremy</dc:creator><pubDate>Wed, 19 Dec 2012 11:09:52 -0500</pubDate><guid isPermaLink="false">http://sunlightlabs.com/blog/2012/sunlight-cli/</guid><feedburner:origLink>http://sunlightlabs.com/blog/2012/sunlight-cli/</feedburner:origLink></item><item><title>Influence Explorer API, Now with Entity Type Filtering</title><link>http://feedproxy.google.com/~r/sunlightlabs/blog/~3/hgmMsXmGdDw/</link><description>&lt;p&gt;Last year, we wrote about &lt;a href="http://sunlightlabs.com/blog/2011/influence-data-apis/"&gt;how to get access to our political influence data via the Influence Explorer API&lt;/a&gt;. That post is a great introduction, but here's an update on a small, but significant, improvement we've made to make accessing our data easier.&lt;/p&gt;

&lt;p&gt;Calling our entity name search endpoint is the first step to accessing any of our aggregate data or wealth of metadata about political or politically influential entities, provided you don't already have the advantage of an ID from &lt;a href="http://www.opensecrets.org"&gt;CRP&lt;/a&gt;, &lt;a href="http://www.followthemoney.org"&gt;NIMSP&lt;/a&gt;, or the &lt;a href="http://bioguide.congress.gov"&gt;Congressional Bioguide&lt;/a&gt; (there's another endpoint to do ID-based lookups). For politicians, for instance, we've got not only campaign finance data and aggregate totals, but bios, photos, seats, districts, years run, and more. Say you want to find a head shot of John Boehner and how much money he raised in 2012. Play along with our examples, and &lt;a href="http://services.sunlightlabs.com/accounts/register/"&gt;go get a Sunlight API key&lt;/a&gt;. In the past, you would have just searched by name:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;http://transparencydata.com/api/1.0/entities.json?apikey=&amp;lt;your-key&amp;gt;&amp;amp;search=John+Boehner
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;From that, you'd get the following:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;[
    {
        name: "Friends of John Boehner",
        count_given: 8,
        firm_income: 0,
        count_lobbied: 0,
        seat: null,
        total_received: 0,
        state: null,
        lobbying_firm: false,
        count_received: 0,
        party: null,
        total_given: 5701,
        type: "organization",
        id: "53ecb66245a64f0b989259b63f50bd9f",
        non_firm_spending: 0,
        is_superpac: false
    },
    {
        name: "John Boehner (R)",
        count_given: 0,
        firm_income: 0,
        count_lobbied: 0,
        seat: "federal:house",
        total_received: 31928263,
        state: "OH",
        lobbying_firm: null,
        count_received: 28267,
        party: "R",
        total_given: 0,
        type: "politician",
        id: "f990d08287c34c389cfabe3cbf3dde99",
        non_firm_spending: 0,
        is_superpac: null
    }
]
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;You'll notice that two entities were returned. Previously, you would have had to do an extra filtering step in your application code to get only the politician entity, which, nine times out of ten, is the one you'd want. You'd either realize this up front, or spend time frustrated by the multiple results, or assume the first result returned was the one you wanted when it wasn't. No more! Now, you can search directly for the type of entity (politician/organization/individual/industry) you want.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;http://transparencydata.com/api/1.0/entities.json?apikey=&amp;lt;your-key&amp;gt;&amp;amp;search=John+Boehner&amp;amp;type=politician
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;And the result:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;[
    {
        name: "John Boehner (R)",
        count_given: 0,
        firm_income: 0,
        count_lobbied: 0,
        seat: "federal:house",
        total_received: 31928263,
        state: "OH",
        lobbying_firm: null,
        count_received: 28267,
        party: "R",
        total_given: 0,
        type: "politician",
        id: "f990d08287c34c389cfabe3cbf3dde99",
        non_firm_spending: 0,
        is_superpac: null
    }
]
&lt;/code&gt;&lt;/pre&gt;
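
&lt;p&gt;In application code, the difference looks roughly like this. The sketch below is illustrative only: the helper names are made up, the key is a placeholder, and the result records are trimmed versions of the ones shown earlier. It builds the search URL with or without the new &lt;code&gt;type&lt;/code&gt; parameter and shows the client-side filter that used to be necessary.&lt;/p&gt;

```python
from urllib.parse import urlencode

ENTITY_SEARCH_URL = "http://transparencydata.com/api/1.0/entities.json"


def search_url(api_key, name, entity_type=None):
    """Build the entity search URL, adding the type filter when requested."""
    params = {"apikey": api_key, "search": name}
    if entity_type is not None:
        params["type"] = entity_type
    return ENTITY_SEARCH_URL + "?" + urlencode(params)


def only_type(entities, entity_type):
    """The client-side filtering step that the type parameter replaces."""
    return [entity for entity in entities if entity["type"] == entity_type]


# Trimmed versions of the two results shown earlier.
results = [
    {"name": "Friends of John Boehner", "type": "organization"},
    {"name": "John Boehner (R)", "type": "politician"},
]
politicians = only_type(results, "politician")

# "YOUR_KEY" is a placeholder, not a working key.
url = search_url("YOUR_KEY", "John Boehner", "politician")
```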

&lt;p&gt;Isn't that better? You'll still have to do your due diligence to make sure you're getting what you really expect, but now you can just grab that ID and go get the full set of metadata for Boehner for that campaign finance total and headshot:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;http://transparencydata.com/api/1.0/entities/f990d08287c34c389cfabe3cbf3dde99.json?apikey=&amp;lt;your-key&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;With the results (JSON abbreviated):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;    {
        name: "John Boehner (R)",
        totals: {
            2012: {
                recipient_amount: 12852363,
            },  
        },
        type: "politician",
        id: "f990d08287c34c389cfabe3cbf3dde99",
        metadata: {
            photo_url: "http://assets.sunlightfoundation.com/moc/100x125/B000589.jpg",
        }
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;We hope you'll let us know if there are other ways we can make our data easier to use. Now, if you haven't already done so, &lt;a href="http://services.sunlightlabs.com/accounts/register/"&gt;go forth and get an API key&lt;/a&gt; and start building cool stuff with our &lt;a href="http://data.influenceexplorer.com/api"&gt;Influence Explorer API&lt;/a&gt;!&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/sunlightlabs/blog/~4/hgmMsXmGdDw" height="1" width="1"/&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Alison Rowland</dc:creator><pubDate>Fri, 07 Dec 2012 13:38:08 -0500</pubDate><guid isPermaLink="false">http://sunlightlabs.com/blog/2012/ie-api-entity-type-filter/</guid><feedburner:origLink>http://sunlightlabs.com/blog/2012/ie-api-entity-type-filter/</feedburner:origLink></item><item><title>Election Night Snapshots</title><link>http://feedproxy.google.com/~r/sunlightlabs/blog/~3/fYPMeYuQ0OY/</link><description>&lt;p&gt;Late last week, we had an idea: election night was sure to be a confusing rush, and the closeness of the race in many states made result-reporting snafus seem possible. And the basic shape of election night data is deeply lousy (outside of media outlets that subscribe to the AP). Why not try to keep a record of what election authorities disclosed, so we could have a closer look in the morning?&lt;/p&gt;

&lt;p&gt;Drew and Kaitlin adapted some of &lt;a href="http://politwoops.sunlightfoundation.com"&gt;Politwoops&lt;/a&gt;' code, and we quickly researched as many URLs for election results as we could. The results are necessarily incomplete: some states don't aggregate their results in a central place, and some only went online during election night. But we managed a pretty good start.&lt;/p&gt;

&lt;p&gt;We also decided to throw in a few media outlets, just for fun (and then a few more once the results began to come in and it became clear which sites might have to back off their predictions in a maximally screenshottable way). The results include screenshots and HTML snapshots. Everything's timestamped -- the frequency of snapshots was determined both by when things changed (we only recorded new snapshots when something had been updated) and by the round-robining of the system (which was somewhat variable, based on the speed of the screenshot process).&lt;/p&gt;
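
&lt;p&gt;The "only record a snapshot when something changed" rule is easy to sketch. The Python below is not the code we ran on election night; it is a minimal illustration that assumes a content hash is used to detect changes, with made-up class and method names.&lt;/p&gt;

```python
import hashlib


class SnapshotLog:
    """Keep a timestamped snapshot only when a page's content has changed."""

    def __init__(self):
        self.last_digest = {}  # url -> digest of the most recent snapshot
        self.snapshots = []    # (timestamp, url, html) tuples

    def record(self, url, html, timestamp):
        """Store the page if it differs from the last copy; report whether it did."""
        digest = hashlib.sha256(html.encode("utf-8")).hexdigest()
        if self.last_digest.get(url) == digest:
            return False  # nothing changed since the last snapshot
        self.last_digest[url] = digest
        self.snapshots.append((timestamp, url, html))
        return True
```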

&lt;p&gt;We haven't had time to go through all of this data, but we'd love your help (or just the chance to satisfy your curiosity). So if you'd like, head over to &lt;a href="http://electionshots.sunlightlabs.com/"&gt;electionshots.sunlightlabs.com&lt;/a&gt;. The content is organized by state -- ZZ is the media. We're working to put together bulk download options now.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/sunlightlabs/blog/~4/fYPMeYuQ0OY" height="1" width="1"/&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Tom Lee</dc:creator><pubDate>Wed, 07 Nov 2012 14:36:27 -0500</pubDate><guid isPermaLink="false">http://sunlightlabs.com/blog/2012/election-night-snapshots/</guid><feedburner:origLink>http://sunlightlabs.com/blog/2012/election-night-snapshots/</feedburner:origLink></item><item><title>New Districts in Sunlight Congress API</title><link>http://feedproxy.google.com/~r/sunlightlabs/blog/~3/pyHHASHNSMU/</link><description>&lt;p&gt;Good news if you were one of the users waiting on our Congress API to support the newly drawn congressional districts!  As of today it is possible to pass the &lt;tt&gt;districts=2012&lt;/tt&gt; flag to the Congress API's &lt;a href="http://services.sunlightlabs.com/docs/congressapi/districts.getDistrictFromLatLong/"&gt;districts.getDistrictFromLatLong&lt;/a&gt; method to instruct the API to return the district in effect for the 2012 elections.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://sunlightlabs.com/blog/2012/redistricting-2012/"&gt;As you may recall&lt;/a&gt;, the data wasn't previously available in a uniform format but thanks to a recent &lt;a href="https://www.census.gov/rdo/data/113th_congressional_and_new_state_legislative_district_plans.html"&gt;data release from Census.gov&lt;/a&gt; we were able to get this data loaded, with days to spare until the election.&lt;/p&gt;

&lt;p&gt;By default, the API will continue to return the districts in effect for purposes of representation until the 113th Congress is sworn in in January 2013. At that point the temporary &lt;tt&gt;districts=2012&lt;/tt&gt; flag will be retired, though it will remain safe to pass the parameter indefinitely.&lt;/p&gt;

&lt;p&gt;This change does not yet impact other Sunlight API methods. The &lt;a href="http://openstates.org"&gt;Open States&lt;/a&gt; district methods and the ZIP code related methods will be updated as that data becomes available, &lt;a href="http://sunlightlabs.com/blog/2012/redistricting-2012/"&gt;as described in our last update&lt;/a&gt;.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/sunlightlabs/blog/~4/pyHHASHNSMU" height="1" width="1"/&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">James Turk</dc:creator><pubDate>Fri, 02 Nov 2012 11:54:35 -0400</pubDate><guid isPermaLink="false">http://sunlightlabs.com/blog/2012/new-districts-in-sunlight-congress-api/</guid><feedburner:origLink>http://sunlightlabs.com/blog/2012/new-districts-in-sunlight-congress-api/</feedburner:origLink></item><item><title>Keeping Authentication Simple</title><link>http://feedproxy.google.com/~r/sunlightlabs/blog/~3/z6fy-slCJmM/</link><description>&lt;p&gt;
The point of publishing bulk data is so it can be reused as widely as possible. This is particularly true for government data, which belongs to the public.
&lt;/p&gt;

&lt;p&gt;
Government agencies can sometimes also be concerned with ensuring the authenticity of their legal information - especially when the data might be seen as an official source. It breaks down into two major concerns: integrity (ensuring the text is accurate), and origin (proving it's official). A lot of people are used to the "wax seal" model of authenticity - the experience of opening a PDF and seeing that the document is signed and official. This model quickly breaks down for distributing bulk data.
&lt;/p&gt;

&lt;p&gt;
The goals of ease of use and authentication are frequently presented as being in tension, but that tension is overstated. There are straightforward approaches to guaranteeing authenticity of bulk data that don't get in the way of reuse.
&lt;/p&gt;

&lt;p&gt;
In fact, the Government Printing Office currently employs one of these approaches&amp;mdash;cryptographic hashes&amp;mdash;for every document it publishes on behalf of the United States. In their &lt;a href="http://www.gpo.gov/fdsys/"&gt;FDsys&lt;/a&gt; system, every document (take &lt;a href="http://www.gpo.gov/fdsys/search/pagedetails.action?browsePath=111%2Fhr%2F%5B6200%3B6299%5D&amp;amp;granuleId=&amp;amp;packageId=BILLS-111hr6289ih&amp;amp;fromBrowse=true"&gt;H.R. 6289&lt;/a&gt; as an example) has an accompanying "PREMIS" file.
&lt;/p&gt;

&lt;p&gt;
This PREMIS file contains a &lt;a href="http://en.wikipedia.org/wiki/SHA-2"&gt;SHA-256&lt;/a&gt; hash for every version of H.R. 6289 that GPO publishes - plain text, XML, and PDF. After you've downloaded any of those files, you can re-calculate the hash, using standard open source tools, to verify that the file is identical to what GPO published. PREMIS is an open standard, hosted &lt;a href="http://www.loc.gov/standards/premis/"&gt;at the Library of Congress&lt;/a&gt;.
&lt;/p&gt;
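
&lt;p&gt;
Re-calculating the hash takes only a few lines. The Python sketch below uses the standard library's &lt;code&gt;hashlib&lt;/code&gt;; the function names are made up, and it assumes you have already copied the published SHA-256 value out of the PREMIS file by hand rather than parsing the file.
&lt;/p&gt;

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA-256 digest of the file at path."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large PDFs don't have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, published_hex):
    """Compare a downloaded file against the hash copied from its PREMIS file."""
    return sha256_of(path) == published_hex.strip().lower()
```

&lt;p&gt;
If &lt;code&gt;matches_published_hash&lt;/code&gt; returns false, the file on disk is not byte-for-byte what was published, whether through tampering or a simple download error.
&lt;/p&gt;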

&lt;p&gt;
GPO &lt;a href="http://www.gpo.gov/pdfs/authentication/authenticationwhitepaper2011.pdf"&gt;described their approach&lt;/a&gt; in June of 2011, saying that data integrity should not get in the way of reuse:
&lt;/p&gt;

&lt;blockquote style="padding: 10px 50px 20px 50px; line-height: 22px;"&gt;
"The publication of the cryptographic hash values in the PREMIS metadata file, and the way FDsys structures its public URLs, makes it possible for machines to crawl and use this information to determine content integrity in bulk...
&lt;/blockquote&gt;
&lt;blockquote style="padding: 10px 50px 20px 50px; line-height: 22px;"&gt;
GPO recognizes the importance of ensuring that any content integrity verification method for XML content, such as digital signatures, should be structured so as not to interfere with data re-use or re-purposing. GPO is also committed to the principle of employing open, internationally recognized standards whenever possible."
&lt;/blockquote&gt;

&lt;p&gt;
In December 2011, California's Office of Legislative Counsel wrote a &lt;a href="http://hapax.qc.ca/CA_Authentication_WhitePaper_Dec2011.pdf"&gt;report on authentication&lt;/a&gt; documenting several approaches, from a signature-based approach to a range of proprietary solutions. Signatures are, not surprisingly, a vastly cheaper solution.
&lt;/p&gt;

&lt;p&gt;
As for guaranteeing that your signatures themselves are legitimate, the OLC presents another simple, cheap solution - using SSL:
&lt;/p&gt;

&lt;blockquote style="padding: 10px 50px 20px 50px; line-height: 22px;"&gt;
"The primary limitation of hashes is that, by themselves, they do not authenticate the origin of the document...However, hashes can be used in combination with a secure Web site to authenticate documents. For instance, the hash for a document can be posted on a secure Web site, and consumers of the document can verify that the hash from the Web site matches the hash computed directly from the document."
&lt;/blockquote&gt;

&lt;p&gt;
This is quite true, although verifying origin like this is only necessary if you're concerned about someone &lt;a href="http://en.wikipedia.org/wiki/Man-in-the-middle_attack"&gt;pretending to be&lt;/a&gt; the owner of the document. It's difficult to see this as a concern for anything but the most security-sensitive government materials. GPO doesn't use SSL for hosting anything on FDSys, apparently also not seeing it as an issue at present. (However, PREMIS supports verifying origin [&lt;a href="http://www.loc.gov/standards/premis/v2/premis-report-2-2.pdf"&gt;docs&lt;/a&gt;], by including a version of the hash signed with the publisher's private key, which anyone can check against the matching public key.)
&lt;/p&gt;

&lt;p&gt;
The OLC also acknowledges another possibility: that an agency could legitimately decide that the whole issue is moot.
&lt;/p&gt;

&lt;blockquote style="padding: 10px 50px 20px 50px; line-height: 22px;"&gt;
"The validation problem could be simplified if XML validation by the general public is determined to be unnecessary. Large document consumers that desire authenticated XML documents could be required to implement their own validation solutions."
&lt;/blockquote&gt;

&lt;p&gt;
What's clear in all of this is that authenticity can be simple, inexpensive, and optional -- for both sides. Government bodies publishing bulk data that feel it's important to guarantee authenticity can provide signatures. Consumers who don't care about authenticity don't have to do anything, and those who do can easily verify those signatures. Everyone can win.
&lt;/p&gt;

&lt;p style="color: #888"&gt;
Edit: Replaced "signature" with "hash" where appropriate, and clarified that PREMIS does support origin 
verification.
&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/sunlightlabs/blog/~4/z6fy-slCJmM" height="1" width="1"/&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Eric Mill</dc:creator><pubDate>Wed, 31 Oct 2012 13:24:05 -0400</pubDate><guid isPermaLink="false">http://sunlightlabs.com/blog/2012/keeping-authentication-simple/</guid><feedburner:origLink>http://sunlightlabs.com/blog/2012/keeping-authentication-simple/</feedburner:origLink></item><item><title>Art Hack Day Boston</title><link>http://feedproxy.google.com/~r/sunlightlabs/blog/~3/TUxb_KUov-o/</link><description>A couple of weekends ago, I attended &lt;a href = "http://arthackday.net/big_bad_lab/" &gt; Art Hack Day&lt;/a&gt;
at Harvard. The event was put on by the
&lt;a href = "http://thebigbadlab.com/" &gt;Big Bad Lab&lt;/a&gt;, where we were proud to provide
the data for the event through our &lt;a href = "http://services.sunlightlabs.com/" &gt;APIs&lt;/a&gt;.
It was a pleasure getting to show off
&lt;a href = "https://github.com/sunlightlabs/python-transparencydata" &gt;python-transparencydata&lt;/a&gt;
and (ok, I'm biased here) &lt;a href = "https://github.com/sunlightlabs/python-sunlight" &gt;python-sunlight&lt;/a&gt;.&lt;br /&gt;
&lt;br /&gt;
The creative juices were really flowing throughout the 3-day hackathon, and folks
created some amazing projects, such as a vending machine bill acceptor that
sucked in 100 dollar bills at the same rate money has been spent this election
cycle (it was really fast!), political speech karaoke, and a voting booth that just
can't accept "no". Some other creative projects included an app that would
process tweets from federal Senators and Representatives (Twitter IDs found through the
&lt;a href = "http://services.sunlightlabs.com/docs/Sunlight_Congress_API/" &gt;Sunlight Congress API&lt;/a&gt;), and a
bulletin board covered with flyers featuring (real!) numbers for lobbyists discovered through
&lt;a href = "http://influenceexplorer.com/" &gt;Influence Explorer&lt;/a&gt;.&lt;br /&gt;

&lt;br /&gt;

&lt;a href = "https://www.facebook.com/photo.php?fbid=287602871350359&amp;set=pb.276620312448615.-2207520000.1350075706&amp;type=1&amp;src=https%3A%2F%2Fsphotos-b.xx.fbcdn.net%2Fhphotos-ash3%2F561767_287602871350359_484376580_n.jpg&amp;size=960%2C940" &gt;&lt;img width = "200px" src = "https://s3.amazonaws.com/assets.sunlightlabs.com/blog/art-hack-day/8.jpg" /&gt;&lt;/a&gt;

&lt;a href = "https://www.facebook.com/photo.php?fbid=287594038017909&amp;set=pb.276620312448615.-2207520000.1350075535&amp;type=1&amp;src=https%3A%2F%2Fsphotos-a.xx.fbcdn.net%2Fhphotos-ash3%2F551972_287594038017909_559159617_n.jpg&amp;size=640%2C960" &gt;&lt;img width = "200px" src = "https://s3.amazonaws.com/assets.sunlightlabs.com/blog/art-hack-day/7.jpg" /&gt;&lt;/a&gt;

&lt;a href = "https://www.facebook.com/photo.php?fbid=287594231351223&amp;set=pb.276620312448615.-2207520000.1350075535&amp;type=1&amp;src=https%3A%2F%2Fsphotos-b.xx.fbcdn.net%2Fhphotos-snc7%2F486810_287594231351223_1959861037_n.jpg&amp;size=960%2C640" &gt;&lt;img width = "200px" src = "https://s3.amazonaws.com/assets.sunlightlabs.com/blog/art-hack-day/6.jpg" /&gt;&lt;/a&gt;

&lt;a href = "https://www.facebook.com/photo.php?fbid=287606744683305&amp;set=pb.276620312448615.-2207520000.1350075706&amp;type=1&amp;src=https%3A%2F%2Fsphotos-b.xx.fbcdn.net%2Fhphotos-ash4%2F391629_287606744683305_2068960424_n.jpg&amp;size=717%2C960" &gt;&lt;img width = "200px" src = "https://s3.amazonaws.com/assets.sunlightlabs.com/blog/art-hack-day/5.jpg" /&gt;&lt;/a&gt;

&lt;a href = "https://www.facebook.com/photo.php?fbid=287593984684581&amp;set=pb.276620312448615.-2207520000.1350075535&amp;type=1&amp;src=https%3A%2F%2Fsphotos-a.xx.fbcdn.net%2Fhphotos-snc6%2F264315_287593984684581_2139109342_n.jpg&amp;size=960%2C640" &gt;&lt;img width = "200px" src = "https://s3.amazonaws.com/assets.sunlightlabs.com/blog/art-hack-day/4.jpg" /&gt;&lt;/a&gt;

&lt;a href = "https://www.facebook.com/photo.php?fbid=287594251351221&amp;set=pb.276620312448615.-2207520000.1350075535&amp;type=1&amp;src=https%3A%2F%2Fsphotos-b.xx.fbcdn.net%2Fhphotos-snc6%2F253088_287594251351221_1371976855_n.jpg&amp;size=960%2C640" &gt;&lt;img width = "200px" src = "https://s3.amazonaws.com/assets.sunlightlabs.com/blog/art-hack-day/3.jpg" /&gt;&lt;/a&gt;

&lt;a href = "https://www.facebook.com/photo.php?fbid=287594588017854&amp;set=pb.276620312448615.-2207520000.1350075706&amp;type=1&amp;src=https%3A%2F%2Fsphotos-b.xx.fbcdn.net%2Fhphotos-ash4%2F251016_287594588017854_1601294059_n.jpg&amp;size=640%2C960" &gt;&lt;img width = "200px" src = "https://s3.amazonaws.com/assets.sunlightlabs.com/blog/art-hack-day/2.jpg" /&gt;&lt;/a&gt;

&lt;a href = "https://www.facebook.com/photo.php?fbid=287602584683721&amp;set=pb.276620312448615.-2207520000.1350075706&amp;type=1&amp;src=https%3A%2F%2Fsphotos-b.xx.fbcdn.net%2Fhphotos-snc6%2F248559_287602584683721_679643948_n.jpg&amp;size=960%2C640" &gt;&lt;img width = "200px" src = "https://s3.amazonaws.com/assets.sunlightlabs.com/blog/art-hack-day/1.jpg" /&gt;&lt;/a&gt;

&lt;br /&gt;
&lt;br /&gt;
All in all, it was a great event, and really showed what can get done with so
many great hackers, artists and researchers stuffed together for a weekend!
&lt;br /&gt;&lt;img src="http://feeds.feedburner.com/~r/sunlightlabs/blog/~4/TUxb_KUov-o" height="1" width="1"/&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Paul Tagliamonte</dc:creator><pubDate>Wed, 17 Oct 2012 13:10:20 -0400</pubDate><guid isPermaLink="false">http://sunlightlabs.com/blog/2012/art-hack-day-boston/</guid><feedburner:origLink>http://sunlightlabs.com/blog/2012/art-hack-day-boston/</feedburner:origLink></item><item><title>A Cite for Sore Eyes</title><link>http://feedproxy.google.com/~r/sunlightlabs/blog/~3/2JDyj2WALBU/</link><description>&lt;p&gt;&lt;style&gt;
.alignright {
float: right;
margin-left: 20px;
margin-right: -20px;
margin-bottom: 20px;
border: 2px solid #E3DFDF;
padding: 1px;
}
&lt;/style&gt;&lt;/p&gt;

&lt;p&gt;Earlier this week the annual &lt;a href="http://blog.law.cornell.edu/lvi2012/"&gt;Law Via the Internet conference&lt;/a&gt; was hosted by the Legal Information Institute at Cornell University. The conference schedule featured talks on a range of policy and technical subjects, including the topic of extracting legal citations from text and understanding them programmatically, which arises whenever people need to determine the relevance of legal documents based on the authorities they cite. Recognizing citations in text is also a vexing but fun programming challenge, so I was excited to see this issue figure prominently in at least four separate talks.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://blog.law.cornell.edu/lvi2012/"&gt;&lt;img class="alignright" title="Cornell at Night" src="http://blog.law.cornell.edu/lvi2012/files/2012/01/FallWindow.jpg" alt="" width="350" height="230" style="margin-left: 15px;"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So why is parsing citations from text a difficult problem? I'll attempt to briefly explain. Beyond serving as attribution for a quotation or claim, a citation's only purpose is to identify the cited authority without too much ambiguity. For that reason, early citation manuals tended to be simple and straightforward. The citation formats they prescribed were calibrated only for authorship and consumption by humans, never machines. The &lt;a href="https://www.legalbluebook.com/img/PastVersions/USC01.pdf"&gt;first edition of the Blue Book&lt;/a&gt; in 1926 was a mere 26 pages in length. But by 2010, the eighteenth edition of the Blue Book had swollen to over 500 pages, hundreds of which struggled to respond to the vagaries of citing electronic resources like websites and CD-ROM sets. On top of that growth, the idiosyncratic field of deciding how sources should be cited tends to be populated with (there's no nice way to put this) weird people who experience an aesthetic mania in arguing about, and ultimately revising, existing rules, so the rules stay in constant flux and vary widely depending on which manual you consult. A third factor is that complicated, fluid rules are difficult to comply with, so writers frequently bend or break the rules to save time. With these challenges in mind, here are some techniques that presenters at the conference are using to deal with them:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://bitbucket.org/mlissner/"&gt;Michael Lissner&lt;/a&gt;, the co-founder and lead developer of &lt;a href="http://courtlistener.com"&gt;CourtListener.com&lt;/a&gt;, explained the approach that he and collaborator &lt;a href="https://bitbucket.org/rowyn/"&gt;Rowyn McDonald&lt;/a&gt; are using to parse case citations from federal court opinions. They start by identifying a subset of abbreviations for federal bound volumes, like "U.S.", "F. 2d", and "F. 3d", and then search to the left and right to identify the volume number and page number, respectively (source code &lt;a href="https://bitbucket.org/mlissner/search-and-awareness-platform-courtlistener/src/970f8822a92e/alert/citations/find_citations.py?at=default"&gt;here&lt;/a&gt;). This is slightly different from the approach taken by Sunlight Labs' own &lt;a href="https://twitter.com/konklone"&gt;Eric Mill&lt;/a&gt; in &lt;a href="https://github.com/unitedstates/citation"&gt;citation.js&lt;/a&gt;, which defines a set of regular expressions and callback functions to process the matches each finds.&lt;/p&gt;
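
&lt;p&gt;To make the look-left, look-right idea concrete, here is a rough Python sketch. It is not CourtListener's code: the reporter list holds only the three abbreviations named in this post, and the function name and the whitespace tokenization are assumptions made for illustration.&lt;/p&gt;

&lt;pre&gt;
```python
# Reporter abbreviations named in the post; a real list would be much longer.
REPORTERS = ["U.S.", "F. 2d", "F. 3d"]


def find_case_citations(text):
    """Return (volume, reporter, page) tuples found in `text`."""
    tokens = text.split()
    found = []
    for reporter in REPORTERS:
        parts = reporter.split()
        width = len(parts)
        # Start at 1 and stop early so a token exists on both sides.
        for i in range(1, len(tokens) - width):
            if tokens[i:i + width] != parts:
                continue
            volume = tokens[i - 1]                    # look left
            page = tokens[i + width].rstrip(".,;:)")  # look right
            if volume.isdecimal() and page.isdecimal():
                found.append((int(volume), reporter, int(page)))
    return found
```
&lt;/pre&gt;

&lt;p&gt;Results come back grouped by reporter rather than in document order, which is fine for a sketch but something a real extractor would likely sort out.&lt;/p&gt;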

&lt;p&gt;Another presentation, by Marc-André Morissette, the director of technology at &lt;a href="http://www.lexum.org/en/"&gt;Lexum&lt;/a&gt;, focused on specialized techniques his firm is using to parse statute and session law citations from court decisions in Canada. This presentation touched on the challenges of programmatically identifying human-written citations that deviate from the styles prescribed in citation manuals. To correctly recognize bogus references, Morissette's team precomputes all valid citations to the resources they want to extract from the cases, then computes a variety of misspellings and mistakes that commonly appear in the targeted citation types. They then generate a state machine (a decision tree, basically) and scan over the input text one token at a time, testing each against the current node in the decision tree. When a terminal node is reached, a match is found. This technique has the interesting (but also potentially limiting) characteristic of substantively verifying the citation during the extraction operation: if a citation doesn't get matched against the decision tree, it must not refer to a known resource.&lt;/p&gt;
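
&lt;p&gt;One way to picture that technique is a token-level trie built from precomputed citation variants. The Python sketch below is only a guess at the general shape, not Lexum's implementation: the function names are invented, and the caller is assumed to supply the table that maps each variant (including misspellings) to its canonical citation.&lt;/p&gt;

&lt;pre&gt;
```python
_END = object()  # terminal-node marker; can never collide with a string token


def tokenize(text):
    """Lowercase, split on whitespace, and trim surrounding punctuation."""
    stripped = (token.strip(".,;:()").lower() for token in text.split())
    return [token for token in stripped if token]


def build_trie(variants):
    """Build a token trie from a {variant_text: canonical_citation} table."""
    root = {}
    for text, canonical in variants.items():
        node = root
        for token in tokenize(text):
            node = node.setdefault(token, {})
        node[_END] = canonical
    return root


def scan(text, trie):
    """Walk the trie one token at a time, keeping the longest match."""
    tokens = tokenize(text)
    matches = []
    i = 0
    while i != len(tokens):
        node, best, j = trie, None, i
        while j != len(tokens) and tokens[j] in node:
            node = node[tokens[j]]
            j += 1
            if _END in node:
                best = (node[_END], j)  # remember the latest terminal node
        if best is None:
            i += 1  # no citation starts at this token
        else:
            matches.append(best[0])
            i = best[1]  # resume scanning after the matched citation
    return matches
```
&lt;/pre&gt;

&lt;p&gt;Because only precomputed variants are in the trie, anything the scan returns already refers to a known resource, which is the verify-while-extracting property described above.&lt;/p&gt;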

&lt;p&gt;I was also very pleased to see that &lt;a href="http://scholar.google.com/citations?user=nGEWZbkAAAAJ&amp;amp;hl=en"&gt;Anurag Acharya&lt;/a&gt;, the lead engineer of Google Scholar, was not only in attendance at LVI 2012, but presenting. In his presentation, he detailed a number of challenges his team faced in publishing and cross-referencing court opinions (and scholarly journals) on Google Scholar: how to react to changes in citation formats across jurisdictions and different time periods, how to distinguish short citations ("in the New York decision we held...") from normal prose ("she lived in New York"), and how to resolve ambiguous citations, like "ibid", to the actual source they refer to. Anurag understandably stopped short of disclosing his recipe for the secret sauce that makes Google Scholar so, so delicious, but hinted that his go-to tools included a huge corpus of test data, unit tests, Bayes' theorem, and plenty of common sense. He spoke in very simple and down-to-earth terms about his work, but I still came away from his talk dumbfounded by what he has accomplished with a team of fewer than three full-time developers.&lt;/p&gt;

&lt;p&gt;A fourth and really impressive presentation was given by &lt;a href="http://gsl-nagoya-u.net/faculty/cache/gsliF_Bennett.html"&gt;Frank Bennett&lt;/a&gt;, an associate professor at the Nagoya University law school in Japan. Whereas the other presentations focused on identifying citations in text and using them to link related documents, Frank focused on the opposite but closely related task of transforming bibliographic data into properly formatted legal citations that can be inserted into documents. Confronted with the problem of law students so confounded by the complexity of the Blue Book that they simply resorted to plagiarizing citations, Frank set out to adapt the &lt;a href="http://www.zotero.org/"&gt;Zotero&lt;/a&gt; research platform for use with legal resources. Zotero is a browser extension that enables users to select resources from sites like Amazon and Google Scholar and store their bibliographic information in a local database. Frank's work goes one step further and provides style definitions that &lt;a href="https://bitbucket.org/fbennett/citeproc-js/wiki/History"&gt;citeproc-js&lt;/a&gt; can use to format the citations for publication. The session included an eye-popping demonstration of a related word-processing utility that enables users to insert formatted citations from sources in their local Zotero database.&lt;/p&gt;

&lt;p&gt;So what does any of this have to do with Sunlight Labs? Parsing citations effectively is an inevitable task for applications that need to identify related sources in text. For us, that issue is arising in a few places. Eric's latest project, &lt;a href="https://scout.sunlightfoundation.com/"&gt;Scout&lt;/a&gt;, aims to provide users with alerts for proposed regulations, and one of the more effective ways to determine whether a proposed regulation affects your interests is to search its text for references to statutes you care about. In an attempt to build out this feature, Eric has started working on parsing US Code citations from the text of bills and proposed regulations. Similarly, the &lt;a href="http://openstates.org"&gt;Open States project&lt;/a&gt; is also looking for ways to identify what statutes would be impacted by legislative bills so we can link users to appropriate sites (if they exist) and also enable querying of bills by the law sections they would affect. In our case, these tend to be prosaic references rather than formal citations, which lends an additional and terrifying dimension to the problem. But from a software developer's perspective, parsing citations involves the same family of techniques regardless of the underlying resources you're looking for.&lt;/p&gt;
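
&lt;p&gt;For a sense of what US Code parsing can look like, here is a small Python sketch in the regular-expression-plus-callback style. It is not the pattern citation.js uses (that project is JavaScript); the regex, the names, and the output fields are assumptions chosen for illustration, and the pattern only handles formal citations, not the prosaic references mentioned above.&lt;/p&gt;

&lt;pre&gt;
```python
import re

# Matches forms like "5 U.S.C. 552(a)(1)" or "42 USC (section sign) 1983".
# \u00a7 is the section sign; groups are title, section, subsection run.
USC_PATTERN = re.compile(
    r"(\d+)\s+U\.?\s?S\.?\s?C\.?\s+(?:\u00a7+\s*)?"
    r"(\d+[a-z]?)((?:\([A-Za-z0-9]+\))*)"
)


def usc_callback(match):
    """Turn one regex match into a plain dictionary."""
    title, section, subsections = match.groups()
    return {
        "type": "usc",
        "title": title,
        "section": section,
        "subsections": re.findall(r"\(([A-Za-z0-9]+)\)", subsections),
    }


# Each citation type pairs a pattern with the callback that processes it.
PATTERNS = [(USC_PATTERN, usc_callback)]


def extract(text):
    """Run every pattern over `text` and collect the callback results."""
    results = []
    for pattern, callback in PATTERNS:
        for match in pattern.finditer(text):
            results.append(callback(match))
    return results
```
&lt;/pre&gt;

&lt;p&gt;Supporting another citation type in this style means appending one more pattern and callback pair to PATTERNS, leaving extract untouched.&lt;/p&gt;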

&lt;p&gt;All in all, LVI was a great conference and I hope to see these projects enjoy continued success.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/sunlightlabs/blog/~4/2JDyj2WALBU" height="1" width="1"/&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Thom Neale</dc:creator><pubDate>Fri, 12 Oct 2012 12:28:29 -0400</pubDate><guid isPermaLink="false">http://sunlightlabs.com/blog/2012/cite-for-sore-eyes/</guid><feedburner:origLink>http://sunlightlabs.com/blog/2012/cite-for-sore-eyes/</feedburner:origLink></item><item><title>Politwoops - Now With More Open Source Flavor</title><link>http://feedproxy.google.com/~r/sunlightlabs/blog/~3/jtDUMbz6Y1s/</link><description>&lt;p&gt;Thursday we released a revamped design of &lt;a href="http://politwoops.sunlightfoundation.com"&gt;Politwoops&lt;/a&gt;! &lt;img src="http://politwoops.sunlightfoundation.com/images/twoopsters.png" alt="twoopsters" style="float:left; margin: 10px;" /&gt; You can read about some of the changes to the features and content in &lt;a href="http://sunlightfoundation.com/blog/2012/10/04/politwoops-overhaul-in-time-for-the-elections/"&gt;Nicko's post on the main blog&lt;/a&gt;, but the main news of the day is that we've now &lt;a href="https://github.com/sunlightlabs/politwoops-tweet-collector"&gt;open sourced&lt;/a&gt; &lt;a href="https://github.com/sunlightlabs/politwoops"&gt;the code&lt;/a&gt;. That means you can create your own Politwoops instance to track the deleted tweets of any subset of people you fancy.&lt;/p&gt;

&lt;p&gt;Of course, the original &lt;a href="http://politwoops.nl/"&gt;concept&lt;/a&gt; and codebase are from the &lt;a href="http://openstate.eu/"&gt;Open State Foundation&lt;/a&gt;. The code is split into two GitHub repositories: one for the &lt;a href="https://github.com/sunlightlabs/politwoops-tweet-collector"&gt;workers that track and import deleted tweets and their related screenshots&lt;/a&gt; and the other that &lt;a href="https://github.com/sunlightlabs/politwoops"&gt;includes the Rails project for the web frontend&lt;/a&gt;. The two parts simply share a MySQL database. The original OSF codebase for the web frontend included several non-Politwoops-related features that were superfluous for our requirements, so we've stripped them out of our fork to make a slimmed-down, twoops-only version. However, our fork of the repository for the backend diverges relatively little from the OSF one.&lt;/p&gt;

&lt;p&gt;Special shout out to Caitlin Weber for the fun, new design!&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/sunlightlabs/blog/~4/jtDUMbz6Y1s" height="1" width="1"/&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Kaitlin Devine</dc:creator><pubDate>Tue, 09 Oct 2012 11:34:20 -0400</pubDate><guid isPermaLink="false">http://sunlightlabs.com/blog/2012/politwoops-now-with-more-open-source-flavor/</guid><feedburner:origLink>http://sunlightlabs.com/blog/2012/politwoops-now-with-more-open-source-flavor/</feedburner:origLink></item></channel></rss>
