<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
  xmlns:content="https://purl.org/rss/1.0/modules/content/"
  xmlns:wfw="https://wellformedweb.org/CommentAPI/"
  xmlns:dc="https://purl.org/dc/elements/1.1/"
  xmlns:atom="https://www.w3.org/2005/Atom"
  xmlns:sy="https://purl.org/rss/1.0/modules/syndication/"
  xmlns:slash="https://purl.org/rss/1.0/modules/slash/"
  >
<channel>
    <title xml:lang="en"></title>
    <atom:link type="application/atom+xml" href="https://tonypujals.io/feed/" rel="self"/>
    <link>https://tonypujals.io</link>
    <pubDate>Fri, 09 Jul 2021 00:26:17 +0000</pubDate>
    <lastBuildDate>Fri, 09 Jul 2021 00:26:17 +0000</lastBuildDate>
    <language>en-US</language>
    <description>Node, MongoDB, REST API, Open Data</description>
    <item>
        <title>Harvesting SNAP Data with Cloud9 and MongoDB (Part 3) - Supporting Geospatial Queries</title>
        <link>https://tonypujals.io/2013/11/19/harvesting-snap-data-with-cloud9-and-mongodb-part-3-supporting-geospatial-queries/</link>
        <pubDate>Tue, 19 Nov 2013 00:00:00 +0000</pubDate>
        <dc:creator></dc:creator>
        <category>mongodb</category>
        <category>nosql</category>
        <category>api</category>
        <guid isPermaLink="false">/2013/11/19/harvesting-snap-data-with-cloud9-and-mongodb-part-3-supporting-geospatial-queries</guid>
        <description><![CDATA[ <h2 id="overview">Overview</h2>

<p>In my <a href="/2013/11/17/harvesting-snap-data-with-cloud9-and-mongodb-part-2/">last post</a>, we imported data into a mongo database stored in MongoLab’s cloud. We used Cloud9 for the convenience of a ready-to-run environment for using the mongoimport utility and the mongo shell for testing a few queries.</p>

<p>In this post, we’ll discuss performing a post-import update on our stores collection that allows us to exploit mongo’s support for geospatial queries. This supports the use case of finding the stores closest to a particular location. For example, given a user’s current location, we want to find participating SNAP retailers within a desired range, such as three miles.</p>

<p>Because the update logic is a bit unwieldy to type interactively in the mongo shell, we’ll see how to use Cloud9 to save our update logic as a script that we can load into the shell and execute whenever we want to run it.</p>

<p>We’ll use the same technique to create a query that we can load into the shell as a function that we can execute with a few arguments so that we can easily run parameterized queries based on a location and desired range to find nearby stores.</p>

<h2 id="geospatial-queries-with-mongo">Geospatial queries with mongo</h2>

<p>Mongo provides a number of ways of supporting various types of geospatial queries. You can learn more about it starting <a href="http://docs.mongodb.org/manual/applications/geospatial-indexes/">here</a>. For our use case, we can use mongo’s <a href="http://docs.mongodb.org/manual/core/2d/">2d indexes</a> to find distances between points based on geometric projections to a flat Euclidean plane, which is suitable for relatively small distances. If our requirements were more stringent and involved greater distances, it would be better to use mongo’s 2dsphere support.</p>

<p>The data set that we imported from the USDA included longitude and latitude fields. We need to convert that data into a location field of a type expected by mongo to support indexing on geo data.</p>

<p>For each store document in our stores collection, we’ll add a new field holding the store’s longitude and latitude as a coordinate pair, as follows:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{ loc: [ lng, lat ] }
</code></pre></div></div>
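Note the ordering: mongo expects longitude first, then latitude. A tiny sketch of the conversion in plain JavaScript (illustrative only; the `toLocPair` helper name is my own):

```javascript
// Build the { loc: [lng, lat] } pair from a store record.
// Mongo's 2d indexes expect longitude first, then latitude.
function toLocPair(store) {
  return [store.longitude, store.latitude];
}

const pair = toLocPair({ longitude: -122.0496017, latitude: 37.4028344 });
// pair is [-122.0496017, 37.4028344]
```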

<p>An index isn’t strictly required for this coordinate format, but we will want to create a 2d index on these points to improve query performance, as discussed in the next section.</p>

<h2 id="geo-support-update-script">Geo support update script</h2>

<p>Right-click on the snapdata folder in the Cloud9 Workspace Files pane and create a new file. Call it <code class="language-plaintext highlighter-rouge">postimport.js</code>. Double-click to open it in its own tab. Enter the following script:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>print('updating snapdb for geo query support...');
db.stores.find().forEach(function(store) {
  db.stores.update(
    { _id: store._id },
    { $set: { loc: [store.longitude, store.latitude] } }
  );
});

print('indexing...');
db.stores.ensureIndex({ "loc": "2d" });
</code></pre></div></div>

<p>You can ignore the little warning icons in Cloud9 that tell you you’re using an undeclared variable. We’ll load the script into the mongo shell, where <code class="language-plaintext highlighter-rouge">db</code> is a valid object. You should see something similar to the following:</p>

<p><img src="/assets/img/snap/mongo-postimport-01.png" alt="postimport" title="postimport" /></p>

<p>The script applies an update to every store document in the stores collection. After mongo finishes iterating through the entire collection, we use the ensureIndex function to build a 2d index on the coordinate pairs we added.</p>

<h4 id="running-the-update-script">Running the update script</h4>

<p>In the Cloud9 Terminal, start the mongo shell; then load our newly created update script:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&gt; load('postimport.js')
</code></pre></div></div>

<p>This will load and execute the script, as shown in the screenshot below. Be warned that this process will take around five minutes! There are different strategies we could pursue to speed it up. One would be to run the script on the server instead of from the shell in Cloud9, but that requires administrative permission not available with the shared MongoLab plan we’re using. Another would be to transform the data before importing it, but for this blog series I wanted to keep the focus on things we can do to manipulate data within mongo.</p>

<p><img src="/assets/img/snap/mongo-postimport-02.png" alt="load script" title="load script" /></p>

<h4 id="verify-the-update">Verify the update</h4>

<p>In the mongo shell, run the following query to examine a store document:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&gt; db.stores.find().pretty()
</code></pre></div></div>

<p>From the output, we can verify that our collection has indeed been updated.</p>

<h2 id="creating-a-geo-query">Creating a geo query</h2>

<p>Now that the update script has run and added a new field to each store, our stores collection supports geospatial queries.</p>

<p>We can create another script to make it easy to query the stores collection with a function that accepts coordinates and a desired range. Create a new file called queries.js and add the following code:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// find stores within a target range (in miles)
function findStoresInRange(lat, lng, range, limit) {
    return db.runCommand({
        geoNear: 'stores',
        near: [lng, lat],          // longitude first
        spherical: true,
        distanceMultiplier: 3959,  // return results in miles
        maxDistance: range / 3959, // range converted to radians
        limit: limit || 100        // 100 is the default anyway
    });
}
</code></pre></div></div>

<p>This function allows us to easily make parameterized queries. From the mongo shell, we can load this function using <code class="language-plaintext highlighter-rouge">load('queries.js')</code>.</p>

<p>At this point, you should see something like the following:</p>

<p><img src="/assets/img/snap/mongo-postimport-03.png" alt="load queries" title="load queries" /></p>

<p>We need to supply a latitude and longitude as a reference point for the query. We could use the stores collection itself to find a store, get its location, then run our query to find all the other SNAP stores close to it.</p>

<p>But we can easily use <a href="https://developers.google.com/maps/documentation/geocoding/">Google’s geocoding service</a> to give us the location coordinates for an address we submit.</p>

<p>Open another browser window and enter a query similar to this example, which uses the address of Hacker Dojo in Mountain View. Use <code class="language-plaintext highlighter-rouge">+</code> in place of spaces to separate the address components.</p>

<p><a href="http://maps.googleapis.com/maps/api/geocode/json?sensor=false&amp;address=599+fairchild+dr,mountain+view,ca">http://maps.googleapis.com/maps/api/geocode/json?sensor=false&amp;address=599+fairchild+dr,mountain+view,ca</a></p>

<p>Look for the geometry property in the results and note the latitude and longitude values.</p>

<p><img src="/assets/img/snap/mongo-postimport-04.png" alt="geocode result" title="geocode result" /></p>
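As a rough sketch of that lookup (the response shape is assumed from Google’s geocoding documentation, and the <code class="language-plaintext highlighter-rouge">extractLocation</code> helper is my own), pulling the coordinates out of the JSON might look like this:

```javascript
// Hypothetical helper: extract { lat, lng } from a geocoding response.
// Only the fields we actually use are shown here.
function extractLocation(geocodeResponse) {
  const result = geocodeResponse.results && geocodeResponse.results[0];
  if (!result) throw new Error('no geocoding results');
  return result.geometry.location; // { lat: ..., lng: ... }
}

// Example with a trimmed-down response like the one in the screenshot:
const sample = {
  status: 'OK',
  results: [{ geometry: { location: { lat: 37.4028344, lng: -122.0496017 } } }]
};
const loc = extractLocation(sample);
// loc.lat is 37.4028344, loc.lng is -122.0496017
```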

<h4 id="running-our-query">Running our query</h4>

<p>With our queries script loaded into the shell (described in the last section), we can now execute a query like the one shown below:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&gt; q = findStoresInRange(37.4028344, -122.0496017, 3, 100)
</code></pre></div></div>

<p><img src="/assets/img/snap/mongo-postimport-05.png" alt="query" title="query" /></p>

<p>Stores will be printed in the shell, but since we saved the result in a variable, we can examine it in pieces. To display how many results were returned, enter:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&gt; q.results.length
</code></pre></div></div>

<p>The results are sorted in distance order in units of miles (since our function provided a <code class="language-plaintext highlighter-rouge">distanceMultiplier</code>, which multiplies radians by the radius of the earth in miles to give us unit-friendly results).</p>
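The unit math behind this is simple enough to sketch on its own: geoNear’s spherical distances come back in radians, so multiplying by the earth’s radius (about 3959 miles) converts them to miles, and dividing a range in miles by the same radius converts it back to radians for maxDistance.

```javascript
// Unit conversions used by our geoNear query.
const EARTH_RADIUS_MILES = 3959;

// distanceMultiplier applies this conversion to each result:
function radiansToMiles(radians) {
  return radians * EARTH_RADIUS_MILES;
}

// maxDistance wants the range expressed this way:
function milesToRadians(miles) {
  return miles / EARTH_RADIUS_MILES;
}
```

Round-tripping a 3-mile range through both conversions returns 3 miles, which is a quick sanity check that the two constants agree.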

<p>We can take a look at individual results, such as the first one, like this:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&gt; q.results[0]
</code></pre></div></div>

<p>or the last:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&gt; q.results[64]
</code></pre></div></div>

<p><img src="/assets/img/snap/mongo-postimport-06.png" alt="examining results" title="examining results" /></p>

<h2 id="next-steps">Next steps</h2>

<p>In the past few posts, we covered the process of importing CSV data into a mongo database. To save the trouble of setting up our own environment, we leveraged two excellent cloud services: <a href="https://mongolab.com/">MongoLab</a> for hosting the database, and <a href="https://c9.io/">Cloud9</a> for providing a development environment that allowed us to interactively work with SNAP data.</p>

<p>In this post, we discussed how we tuned the data to support queries that allow us to locate stores within a desired range of a specific location, and we took a look at what those queries look like.</p>

<p>We now have a solid foundation upon which to build an API to help citizens find SNAP stores near them.</p>

<p>In the next few posts, we’ll discuss:</p>

<ul>
  <li>Building an API layer to respond to various REST requests for data using <a href="http://nodejs.org/" target="_blank">Node</a></li>
  <li>Building a mobile-first web app to provide a user interface that consumes the API using Node, <a href="http://expressjs.com/" target="_blank">Express</a>, <a href="https://github.com/visionmedia/ejs">EJS templates for Node</a>, and <a href="http://getbootstrap.com/">Bootstrap</a></li>
  <li>Moving the API and app into production and setting up an automated job to harvest updated SNAP data on a daily basis</li>
  <li>The open source projects on GitHub that provide a working SNAP API and app; a bit about the architecture and design philosophy; and details on how to fork them and keep your fork current, contribute changes, and submit and follow issues.</li>
</ul>
 ]]></description>
    </item><item>
        <title>Harvesting SNAP Data with Cloud9 and MongoDB (Part 2)</title>
        <link>https://tonypujals.io/2013/11/17/harvesting-snap-data-with-cloud9-and-mongodb-part-2/</link>
        <pubDate>Sun, 17 Nov 2013 00:00:00 +0000</pubDate>
        <dc:creator></dc:creator>
        <category>mongodb</category>
        <category>nosql</category>
        <category>api</category>
        <guid isPermaLink="false">/2013/11/17/harvesting-snap-data-with-cloud9-and-mongodb-part-2</guid>
        <description><![CDATA[ <h2 id="overview">Overview</h2>

<p>In my <a href="/2013/11/15/harvesting-snap-data-with-cloud9-and-mongodb-part-1/">last post</a>, we set up accounts with MongoLab and Cloud9 to prepare to import SNAP data. In this post, we’ll walk through the process of downloading and extracting the data, importing it into our mongo database, and performing a few simple queries.</p>

<h2 id="recap">Recap</h2>

<p>At this point, we have a newly created, empty database hosted in MongoLab. Our database user and connection information has been saved in our notes in Cloud9. Your actual connection will be different from the one shown in the following screenshots.</p>

<h4 id="mongolab-database-connection-and-user">MongoLab database connection and user</h4>

<p><img src="/assets/img/snap/mongolab-import-snap.png" alt="Database info" title="Database info" /></p>

<h4 id="cloud9-notes">Cloud9 notes</h4>

<p><img src="/assets/img/snap/mongolab-create-db-7.png" alt="Cloud9 notes" title="Cloud9 notes" /></p>

<h2 id="accessing-mongo">Accessing mongo</h2>

<p>Let’s confirm that we can access our database. In Cloud9, maximize the Terminal window and then start the mongo shell with the connection string for the database that we previously saved. Remember your connection string will be different.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>mongo ds053198.mongolab.com:53198/snapdb -u apiuser -p snapdb
</code></pre></div></div>

<p><img src="/assets/img/snap/mongolab-import-snap-02.png" alt="Access Mongo" title="Access Mongo" /></p>

<p>If everything is correct, you will get a welcome message and prompt.</p>

<p><img src="/assets/img/snap/mongolab-import-snap-03.png" alt="Mongo Shell" title="Mongo Shell" /></p>

<p>Enter <code class="language-plaintext highlighter-rouge">ctrl-c</code> to exit the shell for now.</p>

<h3 id="using-aliases-to-make-connecting-easier">Using aliases to make connecting easier</h3>

<p>To make it easy to access the mongo shell with the same connection string each time, you can enter the following at the terminal prompt to create aliases for the mongo and mongoimport commands. The aliases will be appended to your Cloud9 workspace .bashrc file. The final command will reload .bashrc into the current environment so you don’t have to log out and back in for the aliases to take effect.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ echo 'alias mongo="mongo ds053198.mongolab.com:53198/snapdb -u apiuser -p snapdb"' &gt;&gt; ~/.bashrc
$ echo 'alias mongoimport="mongoimport -h ds053198.mongolab.com:53198 -d snapdb -u apiuser -p snapdb"' &gt;&gt; ~/.bashrc
$ . ~/.bashrc
</code></pre></div></div>

<p>Note that mongoimport doesn’t accept the same URL-style connection string as mongo, which has the database name appended after the last slash. With mongoimport, the host you pass with -h is separate from the database name you specify with the -d option.</p>

<p>Confirm you can log into snapdb at MongoLab by entering <code class="language-plaintext highlighter-rouge">mongo</code> at the command line, as shown below:</p>

<p><img src="/assets/img/snap/mongolab-import-snap-08.png" alt="Aliases" title="Aliases" /></p>

<h2 id="download-snap-data">Download SNAP Data</h2>

<p>Now that we’re set up with mongo, let’s download the current SNAP data file using curl:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ curl -O www.snapretailerlocator.com/export/Nationwide.zip
</code></pre></div></div>

<p><img src="/assets/img/snap/mongolab-import-snap-04.png" alt="Download SNAP Data" title="Download SNAP Data" /></p>

<p>Unzip the downloaded file to extract the compressed CSV file:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ unzip -o Nationwide.zip
</code></pre></div></div>

<p><img src="/assets/img/snap/mongolab-import-snap-05.png" alt="Unzip SNAP Data" title="Unzip SNAP Data" /></p>

<p>Don’t try to open the extracted CSV file in the Cloud9 editor – it’s too big! You can examine the top of the file to get a feel for its contents using the head command:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ head store_locations_2013_10_29.csv
</code></pre></div></div>

<p>By default, you’ll get a dump of the first ten lines of the file. You can see the first line is a header with field names for the data.</p>

<p><img src="/assets/img/snap/mongolab-import-snap-06.png" alt="Inspect CSV File" title="Inspect CSV file" /></p>

<h2 id="import-snap-data">Import SNAP Data</h2>

<p>Now we can import the data into our mongo database using the <a href="https://docs.mongodb.org/manual/reference/program/mongoimport/#bin.mongoimport">mongoimport</a> utility, which can import data in <a href="https://json.org/">JSON</a>, <a href="https://en.wikipedia.org/wiki/Comma-separated_values">CSV</a>, or <a href="https://en.wikipedia.org/wiki/Tab-separated_values">TSV</a> format.</p>

<p>The header field names are not in an object-friendly format, so we’ll provide mongoimport with the actual field names we’d like to use when it stores the data. We’ll store the field names in a variable that we’ll use with the mongoimport command so that the command itself doesn’t look so long.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ FIELDS="storeName,longitude,latitude,address1,address2,city,state,zip5,zip4,county"
</code></pre></div></div>

<p>Now we’ll start the import process. This will take a couple of minutes.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ mongoimport --type csv --fields $FIELDS --collection stores --file store_locations_2013_10_29.csv
</code></pre></div></div>

<p>Note this relies on the mongoimport alias we set up earlier. If you didn’t do that, you will also need to supply the -h, -d, -u, and -p arguments for host, database, username, and password, respectively.</p>

<p>If all is going well, you will see output as shown below:</p>

<p><img src="/assets/img/snap/mongolab-import-snap-09.png" alt="mongoimport" title="mongoimport" /></p>

<p>When mongoimport is finished, it will report the number of objects it imported, as shown below. For the most recent data, that’s 249,961 stores.</p>

<p><img src="/assets/img/snap/mongolab-import-snap-10.png" alt="mongoimport results" title="mongoimport results" /></p>

<h2 id="querying-mongo">Querying mongo</h2>

<p>Now that we’ve imported data, we can go back into the mongo shell and take a look at it. Again, assuming you set up the mongo alias discussed earlier, simply enter mongo at the terminal prompt.</p>

<p>Enter <code class="language-plaintext highlighter-rouge">show collections</code> in the shell; you should see the newly created stores collection in the output. Enter <code class="language-plaintext highlighter-rouge">db.stores.stats()</code> to see the metadata associated with the stores collection. Your results should be similar to this:</p>

<p><img src="/assets/img/snap/mongolab-import-snap-11.png" alt="collection stats" title="collection stats" /></p>

<h4 id="listing-stores">Listing stores</h4>

<p>Enter <code class="language-plaintext highlighter-rouge">db.stores.find().limit(5).pretty()</code> to list the first 5 stores with nice formatting.</p>

<p><img src="/assets/img/snap/mongolab-import-snap-12.png" alt="list stores" title="list stores" /></p>

<h4 id="removing-the-header">Removing the header</h4>

<p>You probably noticed that the first document in the collection isn’t a valid store – it’s the first line of the CSV file that we imported, which is a header of field names.</p>

<p>There is an option for mongoimport that allows us to specify that the first line is a header, but unfortunately mongoimport will then use those field names instead of the ones we supplied.</p>

<p>We also could have processed the csv file to remove the first line before importing, but because it’s a big file, I chose not to perform any type of operation that might have involved creating a temporary file or copy. For example, the following would have worked, but would have resulted in a copy of the file:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>tail =n +2 store_locations_2013-10-29.csv &gt;&gt; output.csv
</code></pre></div></div>

<p>An easy way to remove this document is to delete it after it has been imported. We can remove it with a simple query like this:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&gt; db.stores.remove({ storeName: 'Store_Name' })
</code></pre></div></div>

<p>And confirm that it’s gone:</p>

<p><img src="/assets/img/snap/mongolab-import-snap-13.png" alt="delete header" title="delete header" /></p>

<h2 id="finding-stores">Finding stores</h2>

<p>Let’s find stores in Chicago:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&gt; db.stores.find({ city: 'Chicago' }).pretty()
</code></pre></div></div>

<p>This will result in a list with a lot of results. So many, in fact, that they will not be returned all at once. The result of find is a cursor that can be iterated, and the shell will prompt you to type “it” if you wish to iterate to see more results, as shown here:</p>

<p><img src="/assets/img/snap/mongolab-import-snap-14.png" alt="iterate results" title="iterate results" /></p>

<p>Just how many results are there? You can find out in the shell either before you submit the find query:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&gt; db.stores.count({ city: 'Chicago' })
</code></pre></div></div>

<p>or by getting the count for the cursor that find returned:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&gt; db.stores.find({ city: 'Chicago' }).count()
</code></pre></div></div>

<p>Either method returns the same answer, which for the data I used was 2,402 stores.</p>

<p>If I had wanted to save the cursor in a variable, then I could have done something like this (which will still output results):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&gt; chicago_stores = db.stores.find({ city: 'Chicago' })
...
&gt; chicago_stores.count()
2402
</code></pre></div></div>

<h2 id="next-steps">Next steps</h2>

<p>We now have data and we can perform queries, such as finding the stores within a city or zip code, and with a bit of effort and math (see the <a href="https://www.movable-type.co.uk/scripts/latlong.html">Haversine formula</a>), we can compute distances to stores to find those within a desired range of a specific location.</p>
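For reference, the Haversine formula mentioned above is short enough to sketch directly: it gives the great-circle distance between two (lat, lng) points specified in degrees.

```javascript
// Great-circle distance in miles between two points given in degrees,
// using the Haversine formula.
const EARTH_RADIUS_MILES = 3959;

function haversineMiles(lat1, lng1, lat2, lng2) {
  const toRad = (deg) => (deg * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLng = toRad(lng2 - lng1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLng / 2) ** 2;
  return 2 * EARTH_RADIUS_MILES * Math.asin(Math.sqrt(a));
}
```

This is exactly the computation mongo performs for us (far faster, thanks to its geo indexes) once we set up the location data in the next post.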

<p>However, mongo has great support for fast geo queries that we can exploit as long as we provide location data in a specific format. In our next post, we cover the post-import updates we perform to take advantage of this facility.</p>
 ]]></description>
    </item><item>
        <title>Harvesting SNAP Data with Cloud9 and MongoDB (Part 1)</title>
        <link>https://tonypujals.io/2013/11/15/harvesting-snap-data-with-cloud9-and-mongodb-part-1/</link>
        <pubDate>Fri, 15 Nov 2013 00:00:00 +0000</pubDate>
        <dc:creator></dc:creator>
        <category>mongodb</category>
        <category>nosql</category>
        <category>api</category>
        <guid isPermaLink="false">/2013/11/15/harvesting-snap-data-with-cloud9-and-mongodb-part-1</guid>
        <description><![CDATA[ <h2 id="overview">Overview</h2>

<p>In my <a href="/2013/11/13/about-snap-data/">last post</a>, I discussed SNAP data. In this post, we will cover the process of harvesting USDA SNAP CSV data. We will store data in MongoDB and augment it to support fast geo queries. These geo queries will be used to locate nearby SNAP stores from a particular location.</p>

<p>We will exploit two Cloud-hosted services to reduce the amount of effort it takes to create an environment for working with the data. For our purposes, the free plans of both services will be adequate.</p>

<ul>
  <li>
    <p><a href="https://c9.io/">Cloud9</a> is a Cloud-hosted IDE. We will use the shell environment it provides to perform the tasks necessary to download SNAP data and import it into a mongo database.</p>
  </li>
  <li>
    <p><a href="https://mongolab.com">MongoLab</a> provides a Cloud-hosted MongoDB service. We will use it to import and serve SNAP data to support our queries.</p>
  </li>
</ul>

<h2 id="cloud9">Cloud9</h2>

<p>Go to <a href="https://c9.io/">Cloud9</a> and sign up using your <a href="https://github.com/">GitHub</a> account.</p>

<h4 id="create-a-new-workspace">Create a new workspace</h4>

<p><img src="/assets/img/snap/c9-create-new-workspace.png" alt="Create a Cloud9 workspace" title="Create a Cloud9 workspace" /></p>

<h4 id="name-your-new-workspace">Name your new workspace</h4>

<p>Call it “snapdb” and select the “Custom” option (which gives you a barebones workspace with just a README.md file). Cloud9 will take a moment while it provisions your new cloud-hosted development workspace.</p>

<p><img src="/assets/img/snap/c9-create-new-workspace-2.png" alt="Name your Cloud9 workspace" title="Name your Cloud9 workspace" /></p>

<h4 id="open-your-new-workspace">Open your new workspace</h4>

<p>Click the “Start Editing” button. Your workspace will open in a new window (or tab) and initialize.</p>

<p><img src="/assets/img/snap/c9-create-new-workspace-3.png" alt="Open workspace" title="Open workspace" /></p>

<h4 id="prepare-your-workspace">Prepare your workspace</h4>

<p>Right-click on README.md and choose “Delete” from the context menu to get rid of it.</p>

<p><img src="/assets/img/snap/c9-create-new-workspace-4.png" alt="Delete README" title="Delete README" /></p>

<p>Right-click on the snapdata workspace folder and create a new file. Call it “notes.md”. Double-click to open the file. We’ll use the file to save notes about our mongo database that we’ll create in the next step.</p>

<h2 id="mongolab">MongoLab</h2>

<h4 id="create-a-new-database">Create a new database</h4>

<p>Go to <a href="https://mongolab.com">MongoLab</a> and sign up for an account. Once you’re signed in, create a new database.</p>

<p><img src="/assets/img/snap/mongolab-create-db.png" alt="Create database" title="Create database" /></p>

<h4 id="create-a-new-database---configuration-options">Create a new database - configuration options</h4>

<p>Choose a free sandbox database on Amazon; name it “snapdb”.</p>

<p><img src="/assets/img/snap/mongolab-create-db-2.png" alt="Configure database" title="Configure database" /></p>

<h4 id="open-the-database-page">Open the database page</h4>

<p><img src="/assets/img/snap/mongolab-create-db-3.png" alt="Open database page" title="Open database page" /></p>

<h4 id="add-a-database-user">Add a database user</h4>

<p>To access the database from a client, we need to create a database user. The database user credentials will be used when connecting to snapdb. Click the link that says “Click here”.</p>

<p><img src="/assets/img/snap/mongolab-create-db-4.png" alt="Create a database user" title="Create a database user" /></p>

<h4 id="set-database-user-credentials">Set database user credentials</h4>

<p>Choose “apiuser” for the username and “snapdb” for the password.</p>

<p><img src="/assets/img/snap/mongolab-create-db-5.png" alt="Set database user credentials" title="Set database user credentials" /></p>

<h4 id="save-the-database-information">Save the database information</h4>

<p>You’re done with database creation. Save the connection information and database credentials in your Cloud9 notes.</p>

<p><img src="/assets/img/snap/mongolab-create-db-6.png" alt="Database info" title="Database info" /></p>

<p><img src="/assets/img/snap/mongolab-create-db-7.png" alt="Database notes" title="Database notes" /></p>

<h2 id="next-steps">Next steps</h2>

<p>We now have everything in place to get SNAP data loaded into a mongo database. In our next post, we’ll walk through the steps of downloading, importing, transforming, and querying SNAP data interactively from the Cloud9 shell.</p>
 ]]></description>
    </item><item>
        <title>About SNAP Data</title>
        <link>https://tonypujals.io/2013/11/13/about-snap-data/</link>
        <pubDate>Wed, 13 Nov 2013 00:00:00 +0000</pubDate>
        <dc:creator></dc:creator>
        <category>mongodb</category>
        <category>nosql</category>
        <category>api</category>
        <guid isPermaLink="false">/2013/11/13/about-snap-data</guid>
        <description><![CDATA[ <h2 id="overview">Overview</h2>

<p>In my <a href="/2013/11/12/announcing-snapfinder/">last post</a>, I introduced <a href="https://snapfinder.org/">SNAPfinder</a>, a mobile-first web app, along with a separate framework and REST API, for helping low-income citizens find nearby participating SNAP stores (SNAP was formerly known as the “food stamps” program).</p>

<p>In this series, we will do a deep dive on the technology stack and tools involved, which, depending on the specific portion of the stack, included Node, MongoDB, Express and EJS templates, Twitter Bootstrap, and Digital Ocean.</p>

<h2 id="what-is-snap">What is SNAP?</h2>

<p><a href="https://www.fns.usda.gov/snap/">SNAP</a> stands for the Supplemental Nutrition Assistance Program, mandated by the Federal Government and supervised by states to help millions of individuals and families who need financial assistance to buy food. Formerly known as the Food Stamp Program, SNAP provides an economic benefit as well as serving to help eliminate hunger.</p>

<p>Today, instead of food stamps, eligible recipients are issued Electronic Benefit Transfer (EBT) cards. These cards can be used to purchase food and beverage items authorized by the USDA’s SNAP program at retailers that accept EBT transactions.</p>

<h2 id="about-snap-data">About SNAP data</h2>

<p>The <a href="https://www.fns.usda.gov/">USDA Food and Nutrition Service</a> maintains and publishes a list of retailers (vendors) across the country that welcome SNAP EBT customers. Data is stored in files in Comma Separated Value (CSV) format, which typically can be read by spreadsheet programs, such as Microsoft Excel. The file is updated regularly and can be downloaded from the following URL:</p>

<p><a href="https://www.snapretailerlocator.com/export/Nationwide.zip">https://www.snapretailerlocator.com/export/Nationwide.zip</a></p>

<h3 id="resource-format">Resource Format</h3>
<p>The URL retrieves a zip file that stores a single CSV file with a .csv extension. The name of the file represents the publish date; for example:</p>

<p><code class="language-plaintext highlighter-rouge">store_locations_2013_11_13.csv</code></p>

<p>When we import the data into mongo, we will map the field names to more JSON-friendly names, as shown in the following table.</p>

<table>
  <thead>
    <tr>
      <th style="text-align: left">CSV Field</th>
      <th style="text-align: left">Map to</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: left">Store_Name</td>
      <td style="text-align: left">storeName</td>
    </tr>
    <tr>
      <td style="text-align: left">Longitude</td>
      <td style="text-align: left">longitude</td>
    </tr>
    <tr>
      <td style="text-align: left">Latitude</td>
      <td style="text-align: left">latitude</td>
    </tr>
    <tr>
      <td style="text-align: left">Address</td>
      <td style="text-align: left">address1</td>
    </tr>
    <tr>
      <td style="text-align: left">Address Line #2</td>
      <td style="text-align: left">address2</td>
    </tr>
    <tr>
      <td style="text-align: left">City</td>
      <td style="text-align: left">city</td>
    </tr>
    <tr>
      <td style="text-align: left">State</td>
      <td style="text-align: left">state</td>
    </tr>
    <tr>
      <td style="text-align: left">Zip5</td>
      <td style="text-align: left">zip5</td>
    </tr>
    <tr>
      <td style="text-align: left">Zip4</td>
      <td style="text-align: left">zip4</td>
    </tr>
  </tbody>
</table>
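
<p>As a sketch of this mapping (the helper below is illustrative only, not the actual snapfinder-lib code; it assumes each CSV row has already been parsed into an object keyed by the USDA header names), a row can be renamed before insertion into mongo:</p>

```javascript
// Map a parsed CSV row (keyed by the USDA header names from the table
// above) to the JSON-friendly field names used in the stores collection.
// Longitude/latitude are converted to numbers so mongo can use them.
function mapStoreRecord(row) {
  return {
    storeName: row['Store_Name'],
    longitude: parseFloat(row['Longitude']),
    latitude: parseFloat(row['Latitude']),
    address1: row['Address'],
    address2: row['Address Line #2'],
    city: row['City'],
    state: row['State'],
    zip5: row['Zip5'],
    zip4: row['Zip4']
  };
}

// Example with a made-up store record:
const doc = mapStoreRecord({
  'Store_Name': 'Corner Market',
  'Longitude': '-122.4194',
  'Latitude': '37.7749',
  'Address': '123 Main St',
  'Address Line #2': '',
  'City': 'San Francisco',
  'State': 'CA',
  'Zip5': '94103',
  'Zip4': ''
});
console.log(doc.storeName, doc.longitude); // → Corner Market -122.4194
```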

<h2 id="next-steps">Next steps</h2>

<p>In the next post, we’ll leverage two cloud-hosted services for working with SNAP data. We’ll use <a href="https://c9.io/">Cloud9</a> for an editor and a shell we can use for downloading, importing, and querying mongo data, and we’ll use <a href="https://mongolab.com/">MongoLab</a> for storing the data that we harvest from the USDA.</p>

<p>We’ll also discuss how we update the data as part of the import process to add the fields necessary to support mongo geo queries. This makes it easy to answer questions such as “find the stores within range of a particular location.”</p>
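
<p>To preview where this is going, here is a rough sketch of what such a geo query could look like, built as a plain query document. The <code>location</code> field name is an assumption for illustration, not the actual snapfinder-lib schema, and in a real deployment the collection would need a <code>2dsphere</code> index on that field:</p>

```javascript
// Sketch of a mongo $nearSphere query for "stores within N miles of a
// point". $maxDistance is expressed in meters when using GeoJSON points.
const METERS_PER_MILE = 1609.34;

function storesNearQuery(longitude, latitude, miles) {
  return {
    location: {
      $nearSphere: {
        $geometry: { type: 'Point', coordinates: [longitude, latitude] },
        $maxDistance: miles * METERS_PER_MILE
      }
    }
  };
}

// With the mongo shell or the Node driver, this document would be
// passed to find(), e.g.:
//   db.stores.find(storesNearQuery(-122.4194, 37.7749, 3))
const q = storesNearQuery(-122.4194, 37.7749, 3);
console.log(q.location.$nearSphere.$maxDistance); // roughly 4828 meters
```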
 ]]></description>
    </item><item>
        <title>Announcing SNAPfinder</title>
        <link>https://tonypujals.io/2013/11/12/announcing-snapfinder/</link>
        <pubDate>Tue, 12 Nov 2013 00:00:00 +0000</pubDate>
        <dc:creator></dc:creator>
        <guid isPermaLink="false">/2013/11/12/announcing-snapfinder</guid>
        <description><![CDATA[ <p><img src="/assets/img/snaplogo.png" alt="SNAP Logo" title=" SNAP Logo" /></p>

<p>Back in May I <a href="/2013/05/29/kicking-off-an-open-source-project-with-the-gsa/">blogged</a> about brainstorming with Gwynne Kostin, Director of Digital Services Innovation Center at the GSA about kicking off an open source project to support the <a href="https://www.whitehouse.gov/sites/default/files/omb/egov/digital-government/digital-government-strategy.pdf">Digital Government Strategy</a>.</p>

<p>Gwynne introduced me to Gray Brooks, Sr. API Strategist at the GSA Digital Services Innovation Center, and the three of us met over a number of weekly Google Hangouts to bounce ideas off each other until we had a strategy in place for what we wanted to accomplish around Gwynne’s idea to help low-income citizens. These were our goals:</p>

<ul>
  <li>To make a positive contribution to society by leveraging the Government’s commitment to Open Data</li>
  <li>To demonstrate a successful model of Open Source collaboration between a Federal Agency and the private sector development community</li>
  <li>To provide a well-documented example of leveraging a technology stack based on <a href="https://nodejs.org/">Node</a> and <a href="https://www.mongodb.org/">MongoDB</a> as a particularly effective way of building high-performance, scalable, cloud-hosted REST APIs for broad consumption.</li>
</ul>

<p>Currently, the <a href="https://www.fns.usda.gov/">USDA Food and Nutrition Service</a> administers <a href="https://www.fns.usda.gov/snap/">SNAP</a>. For people who want to find retail stores participating in SNAP, the USDA hosts a Flash-based webpage at <a href="https://www.snapretailerlocator.com/">www.snapretailerlocator.com</a>. We wanted to leverage the same USDA data, and architecturally, we wanted to provide a reference implementation that demonstrated a clear separation of responsibilities between:</p>

<ul>
  <li>an open source framework for dealing with SNAP data</li>
  <li>a REST API that exposes SNAP functionality to be consumed by app developers</li>
  <li>a <a href="https://zurb.com/word/mobile-first">mobile-first</a> web app that uses the API to provide a snappy (pun intended) UI</li>
</ul>

<p>Today I’m pleased to announce <a href="https://snapfinder.org/">SNAPfinder</a>, available on the web and on your mobile phone at the following link:</p>

<p><a href="https://snapfinder.org/">https://snapfinder.org/</a></p>

<p>In terms of actual development time, it took just under two weeks for two contributors (myself and <a href="https://tennysusantobi.blogspot.com/">Tenny Susanto</a>, Sr. Software Engineer at <a href="http://www.coupons.com/">Coupons.com</a>) to get this out the door with three separate open source projects hosted on GitHub:</p>

<ul>
  <li><a href="https://github.com/tonypujals/snapfinder">snapfinder</a> - a mobile-first web app that uses the API (Node, Express, EJS, Bootstrap 3)</li>
  <li><a href="https://github.com/tonypujals/snapfinder-api">snapfinder-api</a> - REST API for querying SNAP data (Node, Express)</li>
  <li><a href="https://github.com/tonypujals/snapfinder-lib">snapfinder-lib</a> - provides underlying framework for importing and querying SNAP data (Node, MongoDB)</li>
</ul>

<p>The project includes an import job that harvests USDA data daily and imports retail stores into the Mongo database. Mongo provides special support for geo queries, which, along with its support for JavaScript, makes it very well-suited for pairing with Node for an extremely responsive and scalable API. The web app itself also leverages Node and Express along with EJS templates, as well as Bootstrap 3 for the mobile-first UI support.</p>

<p>Over the next week, I plan to take a deeper dive into the technology and how we built the solution. In the meantime, I welcome you to check out the site and the various GitHub projects. I’m certain the USDA would welcome more contributors as we flesh out the analytics, enhance the UI, and add a few more features. And, of course, everyone involved is delighted to see participation in an Open Data project.</p>
 ]]></description>
    </item><item>
        <title>Kicking Off an Open Source Project with the GSA</title>
        <link>https://tonypujals.io/2013/05/29/kicking-off-an-open-source-project-with-the-gsa/</link>
        <pubDate>Wed, 29 May 2013 00:00:00 +0000</pubDate>
        <dc:creator></dc:creator>
        <guid isPermaLink="false">/2013/05/29/kicking-off-an-open-source-project-with-the-gsa</guid>
        <description><![CDATA[ <p><img src="/assets/img/GSA.png" alt="GSA" title=" GSA" /></p>

<p>Last month I <a href="/2013/04/26/the-digital-gov-strategy-in-the-age-of-github/">blogged</a> about the Digital Government Strategy in the age of GitHub. Inspired by our meetings with Lisa Schlosser (Deputy Federal CIO) and Haley Van Dyck (Senior Advisor to the US CIO), I wrote an <a href="https://github.com/tonypujals/challenge-api">API</a> and co-developed an <a href="https://github.com/tonypujals/challenge-ios">app</a> for bringing Federal challenges to the iPhone.</p>

<p>My intention was to turn over the projects to the GSA, and last week I was able to meet up with Gwynne Kostin (Director of Digital Services Innovation Center in the Office of Citizen Services &amp; Innovative Technologies) and Sheila Campbell (Director, Center for Excellence in Digital Government), along with other members of the team (Karen Trebon, Jacob Parcell, and Tammi Marcoullier), who took time out of a hectic launch day for some great discussion on fostering a development community around Federal open data.</p>

<p>We ended up brainstorming on a project that Gwynne wants to kick off as a model of open source collaboration between the GSA and open source contributors in the private sector from start to finish, concept through live deployment.</p>

<p>The project concept is to make it easy for citizens to find:</p>

<ul>
  <li>Free or low cost health clinics</li>
  <li>Section 8 housing</li>
  <li>Stores that accept SNAP (“food stamp”) cards</li>
</ul>

<p>The project will exemplify a GitHub-oriented workflow around source control, wiki, issue tracking, continuous integration, and automated deployment. It will also serve as a recipe for others in leveraging various PaaS providers in the Amazon cloud for hosting the API and data and for performing analytics.</p>

<p>To get the project off the ground, I created a repo on GitHub (now <a href="https://github.com/tonypujals/snapfinder-lib">here</a>). We’ll begin fleshing out the concept and adding issues for work that needs to get done. If you’re interested in contributing, go ahead and start watching the repo now for issue notifications.</p>
 ]]></description>
    </item><item>
        <title>Big Data Is...</title>
        <link>https://tonypujals.io/2013/05/29/big-data-is/</link>
        <pubDate>Wed, 29 May 2013 00:00:00 +0000</pubDate>
        <dc:creator></dc:creator>
        <guid isPermaLink="false">/2013/05/29/big-data-is</guid>
        <description><![CDATA[ <p><img src="/assets/img/matrix.png" alt="Big Data" title=" Big Data" /></p>

<p>“Big data is (bovine excrement)” — or <a href="http://www.cio.com.au/article/462961/big_data_bs_obama_campaign_cto/">so says Harper Reed</a>, former CTO for the Obama for America campaign, speaking today at the opening session of the CeBIT trade show in Sydney.</p>

<p>He makes a valid point — the term itself no longer means exactly what it once did. As Reed explains, storing vast amounts of data back in 2007 “was literally hard.” As far as he’s concerned, big data should just be called data; storage is not the issue — it’s the analysis that provides answers to questions that matters.</p>

<p>And he’s right. In the first place, most organizations are not actually storing the amount of data that the term “big data” brings to mind. And for those who are — and for those who do anticipate the need to scale — there are rich solutions today that were difficult or prohibitively expensive to implement five years ago.</p>

<p>Nevertheless, “Big Data” will continue to be relevant as a way to emphasize the difference between traditional data processing techniques and applications and current perspectives on the scale of data that can be captured, analyzed and visualized; the explosion of mobile devices and machine-generated sensor data; and analytics tools and techniques for providing actionable insights in real time.</p>
 ]]></description>
    </item><item>
        <title>The Digital Government Strategy in the Age of GitHub</title>
        <link>https://tonypujals.io/2013/04/26/the-digital-gov-strategy-in-the-age-of-github/</link>
        <pubDate>Fri, 26 Apr 2013 00:00:00 +0000</pubDate>
        <dc:creator></dc:creator>
        <guid isPermaLink="false">/2013/04/26/the-digital-gov-strategy-in-the-age-of-github</guid>
        <description><![CDATA[ <p><img src="/assets/img/digitalgov2-150x150.png" alt="Digital Government" title="Digital Government" /></p>

<p>If you haven’t yet heard about the <a href="https://www.whitehouse.gov/sites/default/files/omb/egov/digital-government/digital-government-strategy.pdf">Digital Government Strategy</a>, you might be a bit surprised by what you learn, and you will certainly be impressed. Of course, the federal government’s use of information technology and data has a long history, as does its investment in research into technological advances that ultimately become a part of the fabric of our daily lives (thank you, <a href="https://www.darpa.mil/WorkArea/DownloadAsset.aspx?id=2554">DARPA</a>).</p>

<p>The reason why you will be impressed with the Digital Government Strategy, however, is the extensive and comprehensive commitment the federal government has made to embrace the elements of successful Silicon Valley startup culture and open, social, collaborative development to deliver information and services to the citizens of this country — anywhere, anytime, on any platform or device.</p>

<p>Our team paid a visit to <a href="https://dorobekinsider.com/2011/06/01/dorobekinsider-kundra-names-schlosser-as-deputy-federal-cio/">Lisa Schlosser</a>, Deputy Federal CIO, back in January, when I learned about how the federal mobility roadmap evolved into the broader digital government strategy. As I learned about her office’s efforts to promote improving citizen access to government data, I naively suggested that her office should consider promoting the use of GitHub as a means of fostering an open source development community to create the APIs and apps that could get data out to the public. Lisa graciously smiled at me and said I really needed to meet <a href="https://fcw.com/articles/2012/09/30/rising-star-van-dyck-haley.aspx">Haley Van Dyck</a>, a policy analyst on her team.</p>

<p>A few weeks later, I spent an hour on the phone with Haley. Frankly, I was blown away by how clear the federal vision is and how articulately Haley was advocating and championing ideas that sound normal for startups, open source developers, and members of the GitHub revolution — yet not at all what you expect to hear coming from the government. And, most impressive of all, this wasn’t just talk — the Digital Government Strategy was published not quite one year ago, and there are already success stories.</p>

<p>For example, take a look at <a href="https://rfpez.sba.gov/">RFP-EZ</a>. Aside from the clean, <a href="https://getbootstrap.com/">Twitter Bootstrap</a>-styled interface, the most amazing part is the “Fork this site on Github!” link at the bottom of the page. Think and digest on that for just a moment.</p>

<p>Then take a look at the beta site for <a href="https://my.usa.gov/">MyUSA</a> and note the Developers link at the bottom of the page. Yet another aspect of the Digital Government Strategy is to ensure that agency sites provide a developer link.</p>

<p>Well, you can’t spend an hour on the phone with someone like Haley and not be inspired by her contagious enthusiasm to be a part of history and make a contribution to your country. She pointed me to <a href="https://www.data.gov/">Data.gov</a>, where I spent some time looking at raw datasets that have been published by various agencies. At her suggestion and with her encouragement, I focused on consuming a federal data source, providing a REST API that could be easily consumed by a mobile device, and open sourcing the project on GitHub.</p>

<p>At Data.gov, I found a link to federal challenges data and the current web portal at <a href="https://challenge.gov/">Challenge.gov</a>. This is a fascinating site to browse because various agencies offer actual cash rewards for winning entries to challenges posted through Challenge.gov. I liked the idea, so as a proof of concept I decided to harvest the <a href="https://challenge.gov/api/challenges.xml">existing RSS feed</a> and make these challenges available in an iPhone app.</p>

<p>Two weekends and a few evenings of effort resulted in a complete app, shown in the following screenshot.</p>

<p><img src="/assets/img/challengegov-screenshot-sm.png" alt="Challenge.gov app screenshot" title="Challenge.gov app screenshot" /></p>

<p>Challenge.gov iPhone app</p>

<p>The app is an <a href="https://github.com/subfuzion/challenge-ios">open source project on GitHub</a>. It provides similar functionality to the existing website. The app uses a REST API written in Node.js, hosted by Modulus.io in the Amazon cloud.</p>

<p>The <a href="https://github.com/subfuzion/challenge-api">API is a separate GitHub project</a>, which harvests the RSS feed hourly and stores the data in a MongoDB database hosted by MongoLab, also in the Amazon cloud. The hourly harvest job is kicked off using a cloud-based service at <a href="https://cron.io/">CRON.io</a> (a project I recently joined). The rest of the API serves to respond to queries for data (for example, the <a href="https://challengeapi-7312.onmodulus.net/challenges">JSON equivalent</a> of the source RSS feed).</p>
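
<p>As a minimal sketch of the harvesting idea (the feed snippet below is a generic RSS sample, and the parsing is deliberately naive; the actual harvester uses a proper parsing pipeline and stores each item in MongoDB):</p>

```javascript
// Naive sketch of pulling item titles out of an RSS feed such as
// challenges.xml. This only shows the shape of the transformation;
// a production harvester would use a real XML parser and then upsert
// each item into the database.
function extractTitles(rssXml) {
  const titles = [];
  const re = /<item>[\s\S]*?<title>([\s\S]*?)<\/title>/g;
  let m;
  while ((m = re.exec(rssXml)) !== null) {
    titles.push(m[1].trim());
  }
  return titles;
}

// Sample feed fragment (made-up items for illustration):
const sample = `
<rss><channel>
  <title>Challenge.gov</title>
  <item><title>Apps for Energy</title><link>...</link></item>
  <item><title>Disability Employment App Challenge</title></item>
</channel></rss>`;

console.log(extractTitles(sample));
// → [ 'Apps for Energy', 'Disability Employment App Challenge' ]
```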

<p>Because Challenge.gov already exists and this weekend project was more a proof of concept that just happened to turn out to be a complete working app, the project and license will be transferred to the GSA, the agency responsible for the site. However, it is really interesting to note that the government is actually encouraging the private sector to commercialize products that consume open data.</p>

<p>Haley pointed me toward a major success story, <a href="https://www.itriagehealth.com/">iTriage</a>. In fact, the company did so well with its mobile app and site that it has since been acquired by Aetna. The mobile healthtech market is <a href="https://techcrunch.com/2011/12/16/aetna-itriage-healthagen/">projected to quadruple to $400 million by 2016</a>, so it makes sense that the government’s strategy is not only to increase efficiency within federal agencies and improve citizen access to information, but also to stimulate commerce based on digital innovation.</p>

<p>I would encourage civic-minded developers and entrepreneurs to take a look at the Digital Government Strategy, become a part of a new and growing developer community, and become aware of opportunities to contribute to your country — and pursue financial rewards stemming from this historic opportunity. And if you’re looking for strategic guidance or a partner to work with to pursue cloud and mobile development, feel free to get in touch.</p>
 ]]></description>
    </item>
</channel>
</rss>
