<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0"><channel><description>Pulling stuff in and pushing stuff out…</description><title>Ephemera</title><generator>Tumblr (3.0; @karmi-blog)</generator><link>http://karmi-blog.tumblr.com/</link><item><title>Creating the “Decide the Election” Interactive Visualization</title><description>&lt;p&gt;The second round of the &lt;a href="http://en.wikipedia.org/wiki/Czech_presidential_election,_2013"&gt;Czech 2013 presidential election&lt;/a&gt;
looks like a close tie, with highly polarized candidates and their support bases alike.&lt;/p&gt;

&lt;p&gt;In the last few days, we’ve created &lt;a href="http://prezidenti.databoutique.cz/index.en.html"&gt;an application&lt;/a&gt;
which lets anybody play armchair political scientist in their browser,
modelling possible scenarios of the election outcome. You can see the final version of the
application pictured below (click to load it).&lt;/p&gt;

&lt;p&gt;&lt;a href="http://prezidenti.databoutique.cz/index.en.html"&gt;&lt;img src="https://66.media.tumblr.com/70eadda51d34db6dc8c505aa70fa27b7/tumblr_inline_mh54mytdhh1qz6iv9.png" alt="Decide the Election" title=""/&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this post, I’d like to walk you through the design and implementation process,
as well as elaborate on the technical solution. For the political context, visit the
&lt;a href="http://en.wikipedia.org/wiki/Czech_presidential_election,_2013"&gt;Wikipedia page&lt;/a&gt; about the elections or the
&lt;a href="http://zpravy.idnes.cz/prezidentske-volby-milose-zeman-karel-schwarzeneberg-ppn-/domaci.aspx?c=A130123_163354_domaci_jav"&gt;accompanying article on iDnes.cz&lt;/a&gt; (in Czech).&lt;/p&gt;

&lt;h2 id="the_process"&gt;The Process&lt;/h2&gt;

&lt;p&gt;Given the tight election outcome predictions, my friends &lt;a href="https://twitter.com/eliskah"&gt;Eliška&lt;/a&gt;,
&lt;a href="https://twitter.com/josefslerka"&gt;Josef&lt;/a&gt; and &lt;a href="https://twitter.com/vhyza"&gt;Vojta&lt;/a&gt; debated how to create a
web application to play with all the possibilities in an entertaining and informative way,
reusing the knowledge gained by handling heaps of data in our day jobs.&lt;/p&gt;

&lt;p&gt;After some crazy ideas featuring 80’s disco themes, turntables and lots of custom JavaScript, we
quickly settled on an “interactive dashboard” style, with a clear grid and sharp aesthetics.&lt;/p&gt;

&lt;p&gt;From the start, we knew we would use the magnificent &lt;a href="http://d3js.org"&gt;&lt;strong&gt;D3.js&lt;/strong&gt;&lt;/a&gt; library, which makes
creating highly sophisticated, interactive and good-looking visualizations &lt;em&gt;relatively&lt;/em&gt; easy. Our model
and template was quite clearly the famous, unmatchable &lt;a href="http://www.nytimes.com/interactive/2012/11/02/us/politics/paths-to-the-white-house.html"&gt;&lt;em&gt;512 Paths to the White House&lt;/em&gt;&lt;/a&gt;
interactive graphic created by Mike Bostock and Shan Carter in the New York Times.&lt;/p&gt;

&lt;p&gt;The first step was to create a minimal visual representation of results from the election’s first
round. Luckily, it was my day off, so I spent Friday afternoon fiddling with D3.js
and created the initial mockup:&lt;/p&gt;

&lt;p&gt;&lt;img src="https://66.media.tumblr.com/b9ed763867418d26e2be5673ef44eb70/tumblr_inline_mh54m9avqO1qz6iv9.png" alt="Step 1" title=""/&gt;&lt;/p&gt;

&lt;p&gt;Looks funny, right? Absolutely. But it also demonstrates the &lt;em&gt;single most important&lt;/em&gt; feature of D3.js:
&lt;strong&gt;it’s not a charting library&lt;/strong&gt;. It does not come with a set of predefined visualization types.
It opts for a different approach: it makes it &lt;em&gt;relatively&lt;/em&gt; easy to create the visualization
from a set of &lt;em&gt;flexible primitives&lt;/em&gt; and finely-tuned &lt;em&gt;utility functions&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;D3.js goes to great lengths to promote this principle: even the most primitive of all possible chart types,
the bar chart, isn’t offered as a pre-packaged solution, but as a simple-to-follow
&lt;a href="http://mbostock.github.com/d3/tutorial/bar-1.html"&gt;tutorial&lt;/a&gt;. The underlying concept is called a
&lt;a href="http://mbostock.github.com/d3/tutorial/circle.html"&gt;&lt;em&gt;data join&lt;/em&gt;&lt;/a&gt; — in essence, D3.js is just a very
pleasant way of setting up a binding between your &lt;em&gt;data&lt;/em&gt; and the graphical &lt;em&gt;elements&lt;/em&gt; on the screen.&lt;/p&gt;
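
&lt;p&gt;To make the idea more tangible, here is a rough sketch of what a data join works out, written in plain Ruby. It is an illustration only, not D3.js code: the &lt;code&gt;data_join&lt;/code&gt; helper and the sample keys are made up, and D3.js itself does this on DOM selections in JavaScript.&lt;/p&gt;

```ruby
# Illustration only: a data join compares the keys of the elements
# already on screen with the keys of the incoming data.
# `data_join` is a hypothetical helper, not part of D3.js.
def data_join(element_keys, data_keys)
  {
    # data without an element yet: create one
    enter:  data_keys - element_keys,
    # data with a matching element: refresh it in place
    update: data_keys.select { |key| element_keys.include?(key) },
    # elements without data: remove them
    exit:   element_keys - data_keys
  }
end

on_screen = %w[bar-1 bar-2 bar-3] # made-up element keys
incoming  = %w[bar-2 bar-3 bar-4] # made-up data keys

join = data_join(on_screen, incoming)
puts "enter:  #{join[:enter].inspect}"
puts "update: #{join[:update].inspect}"
puts "exit:   #{join[:exit].inspect}"
```

&lt;p&gt;Creating, resizing and removing the graphical elements all hangs off those three groups.&lt;/p&gt;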

&lt;p&gt;I ended the first day with a rough vision for the application, and set out to prepare a conference
&lt;a href="http://www.slideshare.net/karmi/realtime-analytic-with-elasticsearch-new-media-inspiration-2013"&gt;talk&lt;/a&gt; for Saturday :)&lt;/p&gt;

&lt;p&gt;&lt;img src="https://66.media.tumblr.com/0ee2d3069b3c89b539b61fbf50ee73d5/tumblr_inline_mh54mfPghK1qz6iv9.png" alt="Step 2" title=""/&gt;&lt;/p&gt;

&lt;p&gt;The next day, Vojta and I spent some time playing with different layouts and interactive features.
You can see how the “turntable faders” idea resurfaced in the form of range sliders for setting the
voter participation and the candidate split.&lt;/p&gt;

&lt;p&gt;&lt;img src="https://66.media.tumblr.com/ef11cd43cc9ccf02a688a98cfe78d6fa/tumblr_inline_mh54miCMK11qz6iv9.png" alt="Step 3" title=""/&gt;&lt;/p&gt;

&lt;p&gt;At that time, the code was a sprawling mess, a big ball of d3 declarations, magic numbers in positioning
offsets, and duplicated code. Time for a rewrite, clearly. So all of us spent a chilly Sunday afternoon
by the office whiteboard, deconstructing the data set (7 first round candidates, 2 finalists,
undecided voters), experimenting with the layout options and visual encoding of the data.&lt;/p&gt;

&lt;p&gt;We agreed upon a different grid, which would make enough room for all the control elements, photos,
captions, etc.&lt;/p&gt;

&lt;p&gt;&lt;img src="https://66.media.tumblr.com/1d554bcbe138ba35cfc8849d493b6602/tumblr_inline_mh54mlOyp31qz6iv9.png" alt="Step 4" title=""/&gt;&lt;/p&gt;

&lt;p&gt;We were cleaning up the JavaScript code and slowly building up the positioning and visual encoding
as the night was falling upon Prague.&lt;/p&gt;

&lt;p&gt;&lt;img src="https://66.media.tumblr.com/158a4548159cfd6ffd0020e4206a1bd1/tumblr_inline_mh54mpNQo31qz6iv9.png" alt="Step 5" title=""/&gt;&lt;/p&gt;

&lt;p&gt;On Monday, we split the duties: I started polishing the visual aspects of the application and
Vojta started working on a set of &lt;a href="http://www.opscode.com/chef/"&gt;Chef&lt;/a&gt; recipes for building the
supporting infrastructure.&lt;/p&gt;

&lt;p&gt;I ended the day with a rough version of the minimal feature set, with dubious behaviour and buggy semantics. Big things have small beginnings… If you’re particularly nosy, you can clone the &lt;a href="https://gist.github.com/4626240"&gt;Git repository&lt;/a&gt; and devour all the silly mistakes and dirty implementation details just by checking out different commits in the history.&lt;/p&gt;

&lt;p&gt;&lt;img src="https://66.media.tumblr.com/50fe245cca53ba6e355354d011388b3d/tumblr_inline_mh54mtDi8q1qz6iv9.png" alt="Step 6" title=""/&gt;&lt;/p&gt;

&lt;p&gt;On the infrastructure front, we needed a reliable webserver and a flexible storage solution.&lt;/p&gt;

&lt;p&gt;It sounds quite silly to create a Chef-based, fully automated provisioning toolchain for
an application which will be online for a couple of days, right? Not at all. It follows the &lt;em&gt;single
most important&lt;/em&gt; principle of the &lt;em&gt;#devops&lt;/em&gt; movement: the ability to &lt;strong&gt;rebuild your infrastructure
from scratch&lt;/strong&gt; from a set of provisioning scripts, application code, data backup and bare computing
power.&lt;/p&gt;

&lt;p&gt;We’re big fans of the &lt;em&gt;Chef&lt;/em&gt; toolchain, even considering its huge flaws and deficiencies.
For this project, we reused the &lt;a href="https://github.com/karmi/chef-hello-cloud"&gt;knowledge&lt;/a&gt; and
many &lt;a href="https://gist.github.com/2050769#bootstrap-install-and-configure-elasticsearch-with-chef-solo"&gt;tools&lt;/a&gt;
built and published during our regular jobs.&lt;/p&gt;

&lt;p&gt;For the storage layer, we chose &lt;a href="http://www.elasticsearch.org"&gt;&lt;em&gt;Elasticsearch&lt;/em&gt;&lt;/a&gt;, a very powerful and
flexible search engine, which allows us to store the JSON data generated by our application’s
users directly, without any serialization or translation. &lt;em&gt;Elasticsearch&lt;/em&gt; is blazingly fast,
resilient, and we have plenty of experience with it. This choice was not particularly hard for us.&lt;/p&gt;

&lt;p&gt;Additionally, with all the anonymous numerical data stored in &lt;em&gt;Elasticsearch&lt;/em&gt;,
it will be quite easy to analyze it later by using &lt;em&gt;Elasticsearch’s&lt;/em&gt; &lt;a href="http://www.elasticsearch.org/blog/2011/05/13/data-visualization-with-elasticsearch-and-protovis.html"&gt;faceting&lt;/a&gt; features (computing the statistical values for the outcome estimations, creating date histograms, etc.).&lt;/p&gt;
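
&lt;p&gt;As a rough idea of what such an analysis could look like, the sketch below builds a search request body with a &lt;code&gt;statistical&lt;/code&gt; and a &lt;code&gt;date_histogram&lt;/code&gt; facet. The field names (&lt;code&gt;estimate&lt;/code&gt;, &lt;code&gt;created_at&lt;/code&gt;) and the facet names are assumptions made for the example, not the real mapping of the application, and the code only prints the JSON instead of sending it anywhere.&lt;/p&gt;

```ruby
require 'json'

# Hypothetical field names: `estimate` (a saved outcome estimate)
# and `created_at` (when the scenario was saved).
def scenario_facets(value_field: 'estimate', time_field: 'created_at')
  {
    size: 0, # only the facets are of interest, not the documents
    facets: {
      estimate_stats:  { statistical:    { field: value_field } },
      saves_over_time: { date_histogram: { field: time_field, interval: 'hour' } }
    }
  }
end

puts JSON.pretty_generate(scenario_facets)
```

&lt;p&gt;The printed document would be posted to the &lt;code&gt;_search&lt;/code&gt; endpoint of the index holding the saved scenarios.&lt;/p&gt;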

&lt;p&gt;Since &lt;em&gt;Elasticsearch&lt;/em&gt; does not offer restricted access, Vojta created a simple
&lt;a href="http://www.sinatrarb.com"&gt;Sinatra&lt;/a&gt;-based Ruby web application, which serves as a proxy
between the JavaScript code and &lt;em&gt;Elasticsearch&lt;/em&gt;, using the &lt;a href="https://github.com/karmi/tire"&gt;&lt;em&gt;Tire&lt;/em&gt;&lt;/a&gt;
library, and also serves the application in development mode. The application itself is served
by the insanely fast and efficient &lt;a href="http://nginx.org"&gt;Nginx&lt;/a&gt; webserver, which provides the
access and error logs for observing application behaviour and analyzing usage patterns.&lt;/p&gt;

&lt;p&gt;Thanks to the &lt;em&gt;Chef&lt;/em&gt; ecosystem, the whole stack is installed and configured in an automated
manner in &lt;a href="http://aws.amazon.com/ec2/"&gt;Amazon EC2&lt;/a&gt;; all the services are guarded by
&lt;a href="http://mmonit.com/monit/"&gt;Monit&lt;/a&gt;, and data are backed up as EBS snapshots. If needed, we’re able
to recreate the whole stack in five minutes. (If you’re interested, we’re using a process
based on the &lt;a href="http://www.elasticsearch.org/tutorials/2012/03/21/deploying-elasticsearch-with-chef-solo.html"&gt;Deploying Elasticsearch with Chef Solo&lt;/a&gt; tutorial.)&lt;/p&gt;

&lt;p&gt;On Tuesday, we divided our time between two major tasks: first, fixing all the
bugs in the application’s business and drawing logic, and second, creating a proper visual design
for the application. Since we had been talking with an online newspaper about the possibility of
publishing the application to a wide audience, I settled on giving the application a distinct
newspaper look, which could be described as “a magazine spread come alive”.
(This task was quite enjoyable compared to all the calculator-driven coding of the application,
which brought dreadful memories of my career as a Flash designer and ActionScript
developer for me, and cheerful memories of teenage Atari programming for Josef.)&lt;/p&gt;

&lt;p&gt;After a short nap, here comes Wednesday, our go-live day. We had excellent support from our
publisher, and began fighting security restrictions of &lt;code&gt;&amp;lt;iframe&amp;gt;&lt;/code&gt;s,
design quirks, fine-tuning the design of &lt;code&gt;&amp;lt;input type="range"&amp;gt;&lt;/code&gt; sliders in Internet Explorer 10,
and stroking our chins about possible support for Firefox, which, amusingly, does not support the
range slider yet. You know, the final phase of any web-based software project.&lt;/p&gt;

&lt;p&gt;&lt;small&gt;
(We had a brief affair with the &lt;a href="https://github.com/fryn/html5slider"&gt;&lt;em&gt;html5slider&lt;/em&gt;&lt;/a&gt;
library, but after extensive checking decided to pull it from the already published application.
The inconsistencies, the lack of a clear way to re-trigger the initialization, and many subtle
problems just weren’t worth it. So far, we have received only limited complaints; the browser feature
matrix is clearly a very Darwinian field…)
&lt;/small&gt;&lt;/p&gt;

&lt;p&gt;All the frenzy caused us to miss an optimal prime time for publishing the application and accompanying
article — together with the publisher, we scheduled the publication for early morning the next day,
giving us enough time to miss another night’s sleep and add many nice features in the process.&lt;/p&gt;

&lt;p&gt;On Thursday, we awoke to find the accompanying article featured on the homepage of the
biggest online newspaper in the Czech Republic, &lt;a href="http://idnes.cz"&gt;iDnes.cz&lt;/a&gt;, being the second most
visited story for the better part of the day. Traffic to the application peaked at around
3,000 visits per hour for the whole morning, and more than 1,000 scenarios have been saved.
The audience response was very positive, comparing us (embarrassingly enough :) to the famed New York Times
&lt;a href="http://www.nytimes.com/interactive/2012/11/02/us/politics/paths-to-the-white-house.html"&gt;example&lt;/a&gt;
mentioned earlier.&lt;/p&gt;

&lt;p&gt;If anything, the whole story is a reminder of how powerful the &lt;a href="http://en.wikipedia.org/wiki/Data_driven_journalism"&gt;&lt;em&gt;data driven journalism&lt;/em&gt;&lt;/a&gt; approach can be, and that it’s quite within reach of most
newspapers: they just have to care enough.
While an application like this is &lt;em&gt;not&lt;/em&gt; something you can whip up in an afternoon and go home
(after all, it took two highly skilled developers more than three 12-hour days to create it), with tools
such as D3.js, &lt;em&gt;Elasticsearch&lt;/em&gt;, &lt;em&gt;Chef&lt;/em&gt;, &lt;em&gt;Ruby&lt;/em&gt; and &lt;em&gt;Amazon EC2&lt;/em&gt;, it’s an enjoyable and rewarding process.&lt;/p&gt;

&lt;p&gt;Spending most of the day in a purple haze, we have been spelunking around the access and error logs with
&lt;em&gt;Splunk&lt;/em&gt;, watching office colleagues creating and sharing their own scenarios, and debating the election
campaign details and silliness.&lt;/p&gt;

&lt;p&gt;Nevertheless, above all, we hope that the application will persuade people to go out and vote tomorrow.
Because the future is predictable, but uncertain.&lt;/p&gt;

&lt;p&gt;Go have a look at the &lt;a href="http://prezidenti.databoutique.cz/index.en.html"&gt;application&lt;/a&gt; and tell us what
you think in the comments or at &lt;a href="http://news.ycombinator.com/item?id=5114522"&gt;&lt;em&gt;Hacker News&lt;/em&gt;&lt;/a&gt;.&lt;/p&gt;</description><link>http://karmi-blog.tumblr.com/post/41370777074</link><guid>http://karmi-blog.tumblr.com/post/41370777074</guid><pubDate>Thu, 24 Jan 2013 19:11:00 +0100</pubDate><category>data-visualization</category><category>visualization</category><category>d3.js</category><category>elections</category></item><item><title>Search Your Gmail Messages with ElasticSearch and Ruby</title><description>&lt;p&gt;If you&amp;rsquo;d like to check out &lt;a href="http://www.elasticsearch.org/"&gt;&lt;em&gt;ElasticSearch&lt;/em&gt;&lt;/a&gt;, there are already plenty of options for getting data to feed it with. You can use a &lt;a href="http://www.elasticsearch.org/guide/reference/river/twitter.html"&gt;Twitter&lt;/a&gt; or &lt;a href="http://www.elasticsearch.org/guide/reference/river/wikipedia.html"&gt;Wikipedia&lt;/a&gt; &lt;em&gt;river&lt;/em&gt; to fill it with gigabytes of public data, or you can feed it very quickly with some &lt;a href="https://gist.github.com/953072"&gt;RSS feeds&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;But, let&amp;rsquo;s get a bit personal, shall we? Let&amp;rsquo;s feed it with your own e-mail, imported from your own &lt;em&gt;Gmail&lt;/em&gt; account.&lt;/p&gt;

&lt;p&gt;We&amp;rsquo;ll use a couple of Ruby gems: &lt;a href="http://github.com/dcparker/ruby-gmail"&gt;&lt;em&gt;Gmail&lt;/em&gt;&lt;/a&gt; to fetch the e-mail data, &lt;a href="http://github.com/karmi/tire"&gt;&lt;em&gt;Tire&lt;/em&gt;&lt;/a&gt; to put it into &lt;em&gt;ElasticSearch&lt;/em&gt; and search it, and &lt;a href="http://www.sinatrarb.com/"&gt;&lt;em&gt;Sinatra&lt;/em&gt;&lt;/a&gt; to create a simple web application, which will allow us to search the messages.
You can see it displayed below.&lt;/p&gt;

&lt;p&gt;&lt;img src="http://cl.ly/6kOS/search-your-gmail-with-elasticsearch-and-ruby.png" alt="Your Gmail in ElasticSearch"/&gt;&lt;/p&gt;

&lt;p&gt;First of all, &lt;strong&gt;download or clone the source code from this &lt;a href="http://gist.github.com/973059"&gt;gist&lt;/a&gt;&lt;/strong&gt;.
If you have &lt;em&gt;ElasticSearch&lt;/em&gt;, &lt;em&gt;Ruby&lt;/em&gt; and &lt;em&gt;Rubygems&lt;/em&gt;, install all the required gems with the &lt;a href="http://gembundler.com/"&gt;&lt;em&gt;Bundler&lt;/em&gt;&lt;/a&gt; gem:&lt;/p&gt;

&lt;div class="gist"&gt;
&lt;div class="gist-file"&gt;&lt;div class="gist-data"&gt;
&lt;pre&gt;
$ bundle install
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;We&amp;rsquo;ll import the data with the &lt;a href="https://gist.github.com/973059#file_gmail_import.rb"&gt;&lt;code&gt;gmail-import.rb&lt;/code&gt;&lt;/a&gt; script. You must provide it with your &lt;em&gt;Gmail&lt;/em&gt; credentials, like this:&lt;/p&gt;

&lt;div class="gist"&gt;
&lt;div class="gist-file"&gt;&lt;div class="gist-data"&gt;
&lt;pre&gt;
$ ruby gmail-import.rb user@gmail.com yourpassword
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;Leave the script running in a terminal session, and launch the provided web application in another one, passing it your &lt;em&gt;Gmail&lt;/em&gt; account name:&lt;/p&gt;

&lt;div class="gist"&gt;
&lt;div class="gist-file"&gt;&lt;div class="gist-data"&gt;
&lt;pre&gt;
$ INDEX=user@gmail.com ruby gmail-server.rb
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;You should see your own e-mail displayed at &lt;a href="http://localhost:4567/"&gt;http://localhost:4567/&lt;/a&gt;. Make sure to check out all of the rich &lt;a href="http://lucene.apache.org/java/3_1_0/queryparsersyntax.html"&gt;&lt;em&gt;Lucene&lt;/em&gt; query syntax&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Of course, you&amp;rsquo;re not limited to search. With &lt;em&gt;ElasticSearch&lt;/em&gt; &lt;a href="http://www.elasticsearch.org/guide/reference/api/search/facets/index.html"&gt;facets&lt;/a&gt;,
you can pull interesting stuff out of your data, such as getting statistics on who&amp;rsquo;s sending you the most e-mail:&lt;/p&gt;

&lt;div class="gist"&gt;
&lt;div class="gist-file"&gt;&lt;div class="gist-data"&gt;
&lt;pre&gt;
$ curl -X POST "http://localhost:9200/user@gmail.com/message/_search?pretty=true" -d '
    {
      "facets" : {
        "senders" : { "terms" : { "field" : "from.exact" } }
      },
      "size" : 0
    }
  '
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;It&amp;rsquo;s definitely &lt;code&gt;noreply@github.com&lt;/code&gt; in my case :) Your data are available in the &lt;a href="http://localhost:9200/user@gmail.com/_search?pretty=true&amp;amp;q=*"&gt;http://localhost:9200/user@gmail.com/_search?pretty=true&amp;amp;q=*&lt;/a&gt; index.&lt;/p&gt;
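
&lt;p&gt;If you would rather stay in Ruby, the same facet query can be put together with nothing but the standard library. This is a minimal sketch, not code from the gist: the &lt;code&gt;senders_facet_request&lt;/code&gt; helper is a made-up name, and the example only builds the request and prints it, so it runs without an &lt;em&gt;ElasticSearch&lt;/em&gt; node.&lt;/p&gt;

```ruby
require 'json'
require 'net/http'

# Builds (but does not send) the same request as the curl command above.
# `senders_facet_request` is a hypothetical helper name.
def senders_facet_request(index)
  request = Net::HTTP::Post.new(
    "/#{index}/message/_search?pretty=true",
    'Content-Type' => 'application/json'
  )
  request.body = JSON.generate(
    facets: { senders: { terms: { field: 'from.exact' } } },
    size: 0 # return only the facet counts, no messages
  )
  request
end

request = senders_facet_request('user@gmail.com')
puts request.path
puts request.body

# To run it against a local node, assuming it listens on port 9200:
#   response = Net::HTTP.start('localhost', 9200) { |http| http.request(request) }
#   puts response.body
```

&lt;p&gt;Keeping the request construction separate from the network call makes it easy to inspect the query before anything is sent.&lt;/p&gt;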

&lt;p&gt;The full source code is available below.&lt;/p&gt;

&lt;p&gt;&lt;script src="https://gist.github.com/973059.js"&gt;&lt;/script&gt;&lt;/p&gt;&lt;noscript&gt;Check out the source code at &lt;a href="https://gist.github.com/973059"&gt;https://gist.github.com/973059&lt;/a&gt;&lt;/noscript&gt;</description><link>http://karmi-blog.tumblr.com/post/5510326335</link><guid>http://karmi-blog.tumblr.com/post/5510326335</guid><pubDate>Sun, 15 May 2011 16:28:00 +0200</pubDate><category>search</category><category>elasticsearch</category><category>ruby</category><category>tire</category></item><item><title>Redis — The AK-47 of Post-relational Databases</title><description>&lt;div style="width:595px" id="__ss_7921645"&gt;&lt;iframe src="http://www.slideshare.net/slideshow/embed_code/7921645?rel=0" width="595" height="497" frameborder="0" marginwidth="0" marginheight="0" scrolling="no"&gt;&lt;/iframe&gt;&lt;/div&gt;</description><link>http://karmi-blog.tumblr.com/post/5419485863</link><guid>http://karmi-blog.tumblr.com/post/5419485863</guid><pubDate>Thu, 12 May 2011 14:31:08 +0200</pubDate><category>redis</category><category>nosql</category></item><item><title>"Yes, you’re going to write some sketches that you love and are proud of forever—your golden nuggets...."</title><description>“Yes, you’re going to write some sketches that you love and are proud of forever—your golden nuggets. But you’re also going to write some real shit nuggets. You can’t worry about it. 
As long as you know the difference, you can go back to panning for gold on Monday.”&lt;br/&gt;&lt;br/&gt; - &lt;em&gt; &lt;a href="http://www.newyorker.com/online/blogs/newsdesk/2011/03/tina-fey-videos.html"&gt;Tina Fey&lt;/a&gt;&lt;/em&gt;</description><link>http://karmi-blog.tumblr.com/post/3979755824</link><guid>http://karmi-blog.tumblr.com/post/3979755824</guid><pubDate>Sun, 20 Mar 2011 13:21:49 +0100</pubDate></item><item><title>"In the state with the highest cigarette taxes in the country, in a city that has become one of the..."</title><description>“In the state with the highest cigarette taxes in the country, in a city that has become one of the hardest places in America to find a place to smoke, Ms. Silk has gone off the grid, growing, processing and smoking her own tax-free cigarettes from packets of seeds she buys online for about $2.”&lt;br/&gt;&lt;br/&gt; - &lt;em&gt;&lt;a href="http://www.nytimes.com/2011/02/25/nyregion/25tobacco.html"&gt;Now in Brooklyn, Homegrown Tobacco: Local, Rebellious and Tax Free&lt;/a&gt;&lt;/em&gt;</description><link>http://karmi-blog.tumblr.com/post/3500133518</link><guid>http://karmi-blog.tumblr.com/post/3500133518</guid><pubDate>Fri, 25 Feb 2011 10:04:33 +0100</pubDate></item><item><title>"One piece of advice I give to S.E.O. masters is, don’t chase after Google’s algorithm, chase after..."</title><description>“One piece of advice I give to S.E.O. 
masters is, don’t chase after Google’s algorithm, chase after your best interpretation of what users want, because that’s what Google’s chasing after.”&lt;br/&gt;&lt;br/&gt; - &lt;em&gt;&lt;a href="http://www.nytimes.com/2011/02/11/business/media/11search.html"&gt;Matt Cutts, Google&lt;/a&gt;&lt;/em&gt;</description><link>http://karmi-blog.tumblr.com/post/3349255349</link><guid>http://karmi-blog.tumblr.com/post/3349255349</guid><pubDate>Thu, 17 Feb 2011 21:54:47 +0100</pubDate></item><item><title>"Sure, Shazam, the popular music-spotting cellphone application, can identify that Rihanna track. But..."</title><description>“Sure, Shazam, the popular music-spotting cellphone application, can identify that Rihanna track. But what about the new song from the Sandwitches, a Bay Area folk-rock band? That is where Charles Slomovitz comes in.”&lt;br/&gt;&lt;br/&gt; - &lt;em&gt;&lt;a href="http://www.nytimes.com/2011/02/14/technology/14shazam.html"&gt;In Digital Era, Music Spotters Feed a Machine&lt;/a&gt;&lt;/em&gt;</description><link>http://karmi-blog.tumblr.com/post/3289482145</link><guid>http://karmi-blog.tumblr.com/post/3289482145</guid><pubDate>Mon, 14 Feb 2011 09:39:54 +0100</pubDate></item><item><title>"For example, claiming publicly that something is unhackable is usually a good way to find out that..."</title><description>“For example, claiming publicly that something is unhackable is usually a good way to find out that it is.”&lt;br/&gt;&lt;br/&gt; - &lt;em&gt;&lt;a href="http://blogs.forbes.com/firewall/2010/12/13/the-lessons-of-gawkers-security-mess/"&gt;The Real Lessons Of Gawker’s Security Mess&lt;/a&gt;&lt;/em&gt;</description><link>http://karmi-blog.tumblr.com/post/2311973485</link><guid>http://karmi-blog.tumblr.com/post/2311973485</guid><pubDate>Tue, 14 Dec 2010 13:36:50 +0100</pubDate></item><item><title>"We conclude that the current RDBMS code lines, while attempting to be a “one size fits all”..."</title><description>“We conclude that the current RDBMS 
code lines, while attempting to be a “one size fits all” solution, in fact, excel at nothing. Hence, they are 25 year old legacy code lines that should be retired in favor of a collection of “from scratch” specialized engines.  The DBMS vendors (and the research community) should start with a clean sheet of paper and design systems for tomorrow’s requirements, not continue to push code lines and architectures designed for yesterday’s needs.”&lt;br/&gt;&lt;br/&gt; - &lt;em&gt;&lt;a href="http://cs-www.cs.yale.edu/homes/dna/papers/vldb07hstore.pdf"&gt;The End of an Architectural Era&lt;/a&gt;&lt;/em&gt;</description><link>http://karmi-blog.tumblr.com/post/2174199265</link><guid>http://karmi-blog.tumblr.com/post/2174199265</guid><pubDate>Sat, 11 Dec 2010 14:36:02 +0100</pubDate></item><item><title>"Exactly two things have made airplane travel safer since 9/11: reinforcing the cockpit door, and..."</title><description>“Exactly two things have made airplane travel safer since 9/11: reinforcing the cockpit door, and convincing passengers they need to fight back. Everything else has been a waste of money. Add screening of checked bags and airport workers and we’re done. Take all the rest of the money and spend it on investigation and intelligence.”&lt;br/&gt;&lt;br/&gt; - &lt;em&gt;&lt;a href="http://www.nytimes.com/roomfordebate/2010/11/22/do-body-scanners-make-us-safer/a-waste-of-money-and-time"&gt;Bruce Schneier&lt;/a&gt;&lt;/em&gt;</description><link>http://karmi-blog.tumblr.com/post/1984632953</link><guid>http://karmi-blog.tumblr.com/post/1984632953</guid><pubDate>Tue, 30 Nov 2010 09:42:37 +0100</pubDate></item><item><title>Monittr: A Ruby and Web Interface for Multiple Monit Instances</title><description>&lt;p&gt;&lt;img src="http://img.skitch.com/20101128-n393mjfn9ec8eppse6stypmr3w.png" alt="Screenshot: Monittr, a Ruby and Web Interface for Monit"/&gt;&lt;/p&gt;

&lt;p&gt;On my current contract, we have &lt;a href="http://mmonit.com/monit/"&gt;&lt;em&gt;Monit&lt;/em&gt;&lt;/a&gt; set up to monitor a number of servers. Obviously, when you monitor something, you&amp;rsquo;d like to check its status from time to time. And while the default &lt;em&gt;Monit&lt;/em&gt; web interface is good enough for a quick check of one system, it has several major shortcomings.&lt;/p&gt;

&lt;p&gt;You have to remember or bookmark the URL and credentials for the interface. The interface is “good enough”, but the constant refreshing is kinda crazy.&lt;/p&gt;

&lt;p&gt;Worse, you can check only &lt;strong&gt;one&lt;/strong&gt; &lt;em&gt;Monit&lt;/em&gt; instance at a time. It&amp;rsquo;s impossible to do on a regular basis for more than one machine. Of course, there&amp;rsquo;s &lt;a href="http://mmonit.com/"&gt;&lt;em&gt;MMonit&lt;/em&gt;&lt;/a&gt;, but it brings another piece to the infrastructure, advanced features which are useless to us, and doesn&amp;rsquo;t display the system&amp;rsquo;s status in a concise way. We need something else, altogether.&lt;/p&gt;

&lt;p&gt;We need to display &lt;strong&gt;some&lt;/strong&gt; &lt;em&gt;Monit&lt;/em&gt; statistics from &lt;strong&gt;multiple&lt;/strong&gt; servers &lt;strong&gt;inside&lt;/strong&gt; our admin application. So, when I discovered that &lt;em&gt;Monit&lt;/em&gt; has an XML output (while reading the sources of the &lt;a href="https://rubygems.org/gems/monit"&gt;&lt;code&gt;monit&lt;/code&gt;&lt;/a&gt; gem), I knew it would be just a matter of parsing and displaying it.&lt;/p&gt;

&lt;p&gt;The result is the &lt;a href="https://github.com/karmi/monittr"&gt;&lt;code&gt;monittr&lt;/code&gt;&lt;/a&gt; Rubygem.&lt;/p&gt;

&lt;p&gt;Its main goal is to be able to plug &lt;em&gt;Monit&lt;/em&gt; status information from multiple systems into any Ruby application, specifically into a &lt;a href="http://www.sinatrarb.com/"&gt;&lt;em&gt;Sinatra&lt;/em&gt;&lt;/a&gt; web application.&lt;/p&gt;

&lt;p&gt;You can either use the Ruby interface to retrieve the &lt;em&gt;Monit&lt;/em&gt; statistics and display them as you wish, or use the provided &lt;em&gt;Sinatra&lt;/em&gt; extension to embed them directly into your admin application, like this:&lt;/p&gt;

&lt;div class="gist"&gt;
&lt;div class="gist-file"&gt;&lt;div class="gist-data"&gt;
&lt;pre&gt;
&amp;lt;p&amp;gt;This is your regular admin application...&amp;lt;/p&amp;gt;
&amp;lt;div id="monittr"&amp;gt;
&amp;lt;%= monittr.html %&amp;gt;
&amp;lt;/div&amp;gt;
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;The project README provides extensive information about how to try out, use and customize it: &lt;a href="https://github.com/karmi/monittr"&gt;https://github.com/karmi/monittr&lt;/a&gt;. You can customize the template and supply it as your own, changing the content, stylesheets, etc. It&amp;rsquo;s just an ERB file.&lt;/p&gt;

&lt;p&gt;Any feedback, suggestions or patches are welcome via &lt;a href="mailto:karmi@karmi.cz"&gt;e-mail&lt;/a&gt; or Github Issues/Pull Requests.&lt;/p&gt;

&lt;div class="github"&gt;
&lt;p&gt;&lt;a href="http://github.com/karmi/monittr"&gt;Monittr source at Github&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;</description><link>http://karmi-blog.tumblr.com/post/1714578431</link><guid>http://karmi-blog.tumblr.com/post/1714578431</guid><pubDate>Sun, 28 Nov 2010 15:39:00 +0100</pubDate><category>ruby</category><category>monitoring</category><category>monit</category></item><item><title>Explaining Ruby on Rails Infrastructure Alternatives</title><description>&lt;p&gt;Last week, I ran a workshop on Rails infrastructure, deployment and hosting for one of the major hosting companies in the Czech Republic, &lt;a href="http://www.active24.cz/"&gt;Active24&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I was confronted with the task of explaining and showing the various pieces (web server, application server, configurations, RubyGems management, etc.) without browsing some previously set up server, opening configuration files in Vim, and relying on blind luck that all this would somehow click together for my audience.&lt;/p&gt;

&lt;p&gt;So, I decided early on that I needed some “recipe” definitions to show what the packages and components are, and then to use those recipes to install a full Rails stack on a clean machine. You&amp;rsquo;ll find them all in a &lt;em&gt;Github&lt;/em&gt; repository: &lt;a href="http://github.com/karmi/rails-deployment-setups-sprinkle"&gt;&lt;code&gt;karmi/rails-deployment-setups-sprinkle&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I quickly rejected &lt;a href="http://www.opscode.com/chef"&gt;&lt;em&gt;Chef&lt;/em&gt;&lt;/a&gt; or &lt;a href="http://www.puppetlabs.com/"&gt;&lt;em&gt;Puppet&lt;/em&gt;&lt;/a&gt; for this task: I wanted the attendees to be able to provision a clean VPS with a Rails stack themselves, without any overhead and lengthy introductions. I found that the &lt;a href="http://github.com/crafterm/sprinkle"&gt;&lt;em&gt;Sprinkle&lt;/em&gt;&lt;/a&gt; provisioning tool was the perfect fit for the task.&lt;/p&gt;

&lt;p&gt;In fact, just about everything about &lt;em&gt;Sprinkle&lt;/em&gt; is awesome. Its language is very clear and elegant. It can install software from source or from packages on every major platform. Its recipes are &lt;em&gt;idempotent&lt;/em&gt;, i.e. when you set the “verification” conditions properly, you can run the installer over and over again and it will install only the missing pieces. It can install software remotely as well as locally. Its source code is a great example of a well-designed and well-written Ruby library.&lt;/p&gt;
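
&lt;p&gt;The idempotency idea is easy to show in a few lines of plain Ruby. Mind that the sketch below is &lt;em&gt;not&lt;/em&gt; the &lt;em&gt;Sprinkle&lt;/em&gt; API: &lt;code&gt;Step&lt;/code&gt; and &lt;code&gt;provision&lt;/code&gt; are made-up names, and an in-memory array stands in for the state of the server.&lt;/p&gt;

```ruby
# Plain-Ruby illustration of "verify first, install only what is missing".
# This is NOT Sprinkle's API; `Step` and `provision` are hypothetical.
Step = Struct.new(:name, :verify, :install)

# Runs the install action of every step whose verification fails
# and returns the names of the steps that were actually installed.
def provision(steps)
  steps.reject { |step| step.verify.call }   # skip what is already in place
       .each   { |step| step.install.call }
       .map    { |step| step.name }
end

installed = [] # stands in for the state of the server

steps = %w[ruby apache passenger].map do |name|
  Step.new(name,
           -> { installed.include?(name) },  # verification condition
           -> { installed.push(name) })      # installation
end

puts provision(steps).inspect # first run installs all three pieces
puts provision(steps).inspect # second run finds nothing left to do
```

&lt;p&gt;The second call returns an empty list, which is exactly what makes re-running an installer safe.&lt;/p&gt;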

&lt;p&gt;So, I went with &lt;em&gt;Sprinkle&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Attendees needed only Ruby and RubyGems on their local machines. After cloning the repository with recipes, they installed &lt;em&gt;various&lt;/em&gt; Rails stacks on their own, fresh Ubuntu VPS in a matter of minutes.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Sprinkle&lt;/em&gt; recipes are concise, understandable and clarify the relations between various &lt;em&gt;pieces&lt;/em&gt; of a Rails stack. Take, for instance, the definition of a classic Apache/Passenger stack:&lt;/p&gt;

&lt;script src="http://gist.github.com/653058.js?file=rails-stack-passenger.rb"&gt;&lt;/script&gt;&lt;p&gt;Notice how the &lt;em&gt;Sprinkle&lt;/em&gt; recipe definition clarifies what components are needed &lt;em&gt;at all&lt;/em&gt;, so you can base your explanations and whiteboard schemes on them.&lt;/p&gt;

&lt;p&gt;The &lt;em&gt;configuration&lt;/em&gt; for the stack is comparably simple to follow:&lt;/p&gt;

&lt;script src="http://gist.github.com/653059.js?file=appserver-passenger.rb"&gt;&lt;/script&gt;&lt;p&gt;That is extremely convenient, because you can use the same instructions to explain what you need to run a Rails application and how these pieces work together, and to actually &lt;em&gt;install&lt;/em&gt; them on a server via a provisioning tool, all while talking about it. (And you have another argument for Ruby&amp;rsquo;s syntax superiority, of course.)&lt;/p&gt;

&lt;p&gt;To put it another way, &lt;em&gt;Sprinkle&lt;/em&gt; recipes can be understood as &lt;strong&gt;&lt;em&gt;executable instructions&lt;/em&gt;&lt;/strong&gt; in the same sense as &lt;em&gt;RSpec&lt;/em&gt; or &lt;em&gt;Shoulda&lt;/em&gt; tests can be understood as &lt;em&gt;executable specifications&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;I am absolutely convinced that I couldn&amp;rsquo;t have explained everything any other way; after a half-day workshop, the operations guys walked away with a 100% understanding of how hosting Ruby on Rails works, and of the alternatives and their advantages, and they can continue their explorations with the provided recipes. (The first half of the workshop was dedicated to showing them how &lt;em&gt;developing&lt;/em&gt; Rails applications works.)&lt;/p&gt;

&lt;p&gt;If you need to explain or teach Rails infrastructure, whether in commercial workshops or in education, feel free to use and adapt these recipes; please refer to the &lt;a href="http://github.com/karmi/rails-deployment-setups-sprinkle"&gt;Readme&lt;/a&gt; for how to use them.&lt;/p&gt;

&lt;p&gt;Nevertheless, I tried hard to encourage &lt;em&gt;best practices&lt;/em&gt; of hosting web applications, so you&amp;rsquo;ll find, e.g., &lt;em&gt;Apache&lt;/em&gt; and &lt;em&gt;Nginx&lt;/em&gt; correctly configured with expires headers for static assets and gzip compression for text responses. Thus, you can use the recipes for provisioning a real box as well.&lt;/p&gt;

&lt;p&gt;After you install one of the stacks, a place for your Rails applications is created in &lt;code&gt;/var/applications&lt;/code&gt;. To check that the stack actually works, create a simple demo application. Connect to the box via SSH and run:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ cd /var/applications
$ rails new demo

$ cd demo
$ rm public/index.html
$ rails generate controller welcome index
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;&lt;br/&gt;
Now, put something like &lt;code&gt;render :text =&amp;gt; 'Welcome!'&lt;/code&gt; inside the &lt;code&gt;index&lt;/code&gt; action in &lt;code&gt;app/controllers/welcome_controller.rb&lt;/code&gt;, and uncomment the &lt;code&gt;root :to =&amp;gt; "welcome#index"&lt;/code&gt; route inside &lt;code&gt;config/routes.rb&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;You should see your new demo application&amp;rsquo;s greeting when you load the server IP or hostname in a browser. And it probably took 15 minutes or less.&lt;/p&gt;

&lt;p&gt;Rails has historically had a bad reputation in terms of deployment simplicity. As you can see, that is really a thing of the past. Whether you&amp;rsquo;re deploying to Heroku, Engine Yard, or your own server, the Ruby community has done a wonderful job of smoothing out the rough edges.&lt;/p&gt;</description><link>http://karmi-blog.tumblr.com/post/1430082805</link><guid>http://karmi-blog.tumblr.com/post/1430082805</guid><pubDate>Fri, 29 Oct 2010 09:28:00 +0200</pubDate><category>rails</category><category>hosting</category><category>deployment</category></item><item><title>"(…) The logic is akin to sticking with telegrams and avoiding the voice telephone. Telegrams..."</title><description>“(…) The logic is akin to sticking with telegrams and avoiding the voice telephone. Telegrams are written records and therefore can be stored, they can be confirmed and therefore audited, and they are the standard… sound familiar?”&lt;br/&gt;&lt;br/&gt; - &lt;em&gt;John Fitzpatrick, &lt;a href="http://fabfuture.blogspot.com/2010/10/nosql-means-no-sql.html"&gt;“NoSQL means no SQL”&lt;/a&gt;&lt;/em&gt;</description><link>http://karmi-blog.tumblr.com/post/1281815991</link><guid>http://karmi-blog.tumblr.com/post/1281815991</guid><pubDate>Sun, 10 Oct 2010 08:43:20 +0200</pubDate></item><item><title>Addendum to My CouchDB Talk at Webexpo 2010</title><description>&lt;p&gt;Thank you all for the online and offline reactions to my &lt;a href="http://webexpo.cz/prednaska/couchdb-databaze-pro-web/"&gt;talk about CouchDB&lt;/a&gt;. I tried to devote a lot of space both to the more general questions around using “non-traditional” databases, which CouchDB represents well, and to the purely practical experience I have gained from using CouchDB intensively over the past year. In this article, I would like to add to or clarify some of the topics and answer a few questions.&lt;/p&gt;

&lt;h2&gt;Use Cases for Schema-less Databases&lt;/h2&gt;

&lt;p&gt;First, I would like to add another example of using schema-less, or rather document, databases. The &lt;a href="http://karmi.couchone.com/addressbook/_design/person/_list/all/all"&gt;&lt;strong&gt;address book&lt;/strong&gt;&lt;/a&gt; example was, I hope, illustrative enough: it is the most common example of heterogeneous data. One person has two phone numbers, another has three, one has Skype while another has Jabber, and so on. The contact picture, moreover, nicely illustrates an advantage of CouchDB: binary data is stored naturally, right alongside the document.&lt;/p&gt;

&lt;p&gt;An equally good example, though, is something every web developer has built or run into a dozen times: an &lt;strong&gt;“editorial system”&lt;/strong&gt;. The quotation marks are deliberate: there is no point in arguing about what is an editorial system, what is a publishing system and what is a content management system (&lt;em&gt;CMS&lt;/em&gt;), so I will simply use the abbreviation CMS from here on. Anyone who has ever built a CMS knows that a “page” (document) in it is not “a &lt;code&gt;title&lt;/code&gt; and a big textarea with a WYSIWYG editor”. On the contrary, a page is usually composed of many &lt;em&gt;elements&lt;/em&gt;, which can be text or images, or can refer to other entities (a shortened news listing) or to external resources (a video on &lt;em&gt;YouTube&lt;/em&gt;).&lt;/p&gt;

&lt;p&gt;A favourite textbook exercise in every CMS is storing hierarchical data (“trees”) and retrieving it efficiently. In a document database such as CouchDB, I can represent the “tree” directly as a JSON document and do without the tedious translation between tabular and hierarchical representations. The big advantage of any schema-less database is that I can model pages essentially &lt;em&gt;ad hoc&lt;/em&gt;, without having to devise a sufficiently flexible schema up front, or having to adjust and tune it later. In the case of CouchDB, the native support for binary &lt;em&gt;attachments&lt;/em&gt; to a document comes in handy once again.&lt;/p&gt;

&lt;h2&gt;A Continuous “Stream” of HTTP Notifications About Database Changes&lt;/h2&gt;

&lt;p&gt;One of the most interesting features of CouchDB, which we touched on only briefly, is the &lt;a href="http://guide.couchdb.org/draft/notifications.html"&gt;continuous “stream” of changes in the database&lt;/a&gt;, the so-called &lt;code&gt;_changes&lt;/code&gt; feed. This is not some &lt;em&gt;gimmick&lt;/em&gt; added for show: the whole &lt;strong&gt;replication&lt;/strong&gt; infrastructure in CouchDB is built on the &lt;code&gt;_changes&lt;/code&gt; feed, as is indexing by CouchDB-Lucene. It is therefore about as “robust” as one could imagine.&lt;/p&gt;

&lt;p&gt;A clear advantage of the &lt;code&gt;_changes&lt;/code&gt; feed is the option of a persistent connection between the client and the database server (using the &lt;code&gt;feed=continuous&lt;/code&gt; parameter). The database opens one Erlang “thread” for each client, so the client does not burden it with regular &lt;em&gt;polling&lt;/em&gt;, as is customary. What is more, this is a &lt;em&gt;push&lt;/em&gt; notification: the client receives changes from the database instead of constantly asking “anything new? anything new?”.&lt;/p&gt;
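
&lt;p&gt;The consuming side of such a feed can be sketched in a few lines of Ruby. This is an illustration only, not the client from the talk: it assumes the continuous feed delivers one JSON object per line, with empty lines as heartbeats, and it reads from a &lt;code&gt;StringIO&lt;/code&gt; holding made-up sample rows instead of a real HTTP connection. The &lt;code&gt;each_change&lt;/code&gt; helper is a hypothetical name.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;require "json"
require "stringio"

# Reads a continuous _changes stream line by line and yields each change.
# Empty lines are heartbeats that keep the connection open; they are skipped.
def each_change(io)
  io.each_line do |line|
    line = line.strip
    next if line.empty?
    yield JSON.parse(line)
  end
end

# Stand-in for the response body of
#   GET /addressbook/_changes?feed=continuous
# (the sample rows below are invented for illustration).
stream = StringIO.new(
  %({"seq":1,"id":"john","changes":[{"rev":"1-aaa"}]}\n) +
  %(\n) +
  %({"seq":2,"id":"mary","changes":[{"rev":"2-bbb"}],"deleted":true}\n)
)

seen = []
each_change(stream) do |change|
  seen.push([change["seq"], change["id"], change["deleted"] == true])
end
puts seen.inspect
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;With a real server, the same loop would run over the open HTTP response and block until the database pushes the next line.&lt;/p&gt;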

&lt;p&gt;In the &lt;a href="http://github.com/karmi/couchdb-showcase"&gt;&lt;strong&gt;repository with the demo application&lt;/strong&gt;&lt;/a&gt; for the talk, you will also find a sample implementation of a client for the continuous &lt;code&gt;_changes&lt;/code&gt; feed in Ruby. Run it with &lt;code&gt;rake changes DATABASE=addressbook&lt;/code&gt; (see the README), then just open Futon in a window above the terminal, or change the data some other way, and watch in real time as the database pushes information about the changes to you.&lt;/p&gt;

&lt;div class="thumbnail" style="margin-bottom: 1em"&gt;&lt;a href="http://img.skitch.com/20101003-j9ye6p7bdcwri7mx6g4g21cmic.png"&gt;&lt;img src="http://img.skitch.com/20100928-j9ye6p7bdcwri7mx6g4g21cmic.preview.jpg" alt="Apache CouchDB - Futon: View Document"/&gt;&lt;/a&gt;&lt;/div&gt;

&lt;p&gt;There are countless ways to use the continuous feed. Besides the one that occurs to all of us first, namely &lt;strong&gt;chat&lt;/strong&gt;, there is, for example, the ability to move and aggregate data between databases, especially since the &lt;code&gt;_changes&lt;/code&gt; feed can be &lt;strong&gt;filtered&lt;/strong&gt;. But it is ideally suited to all kinds of &lt;em&gt;data enrichment&lt;/em&gt; operations, where we want to run some asynchronous task after a record is saved (converting a video, enriching the document with information from a web service, etc.).&lt;/p&gt;

&lt;p&gt;And finally, it makes it possible to hook all sorts of external services up to the database. One example that stands for all is the &lt;a href="http://www.elasticsearch.com/blog/2010/09/28/the_river_searchable_couchdb.html"&gt;ElasticSearch full-text engine&lt;/a&gt;, whose CouchDB support was added a few days after WebExpo, precisely by hooking into the &lt;code&gt;_changes&lt;/code&gt; feed, which gives it an ideal infrastructure for continuously indexing records.&lt;/p&gt;

&lt;h2&gt;CouchDB Lucene&lt;/h2&gt;

&lt;p&gt;I also noted several questions about &lt;a href="http://github.com/rnewson/couchdb-lucene"&gt;CouchDB-Lucene&lt;/a&gt; (CL), a technology for full-text indexing and searching of documents in CouchDB, and also the odd opinion that it is surely a “disadvantage” of CouchDB that I have to use something like a &lt;em&gt;full-text search engine&lt;/em&gt; for “more complex” queries.&lt;/p&gt;

&lt;p&gt;Using CL is very simple. First, we &lt;strong&gt;define indexes for the document attributes&lt;/strong&gt; we are interested in, as a
&lt;a href="http://github.com/karmi/couchdb-showcase/blob/master/couchdb/_design/person/fulltext/search/index.js"&gt;JavaScript function&lt;/a&gt;:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;function(doc) {

  var result = new Document();
  if (doc.occupation) { result.add(doc.occupation, {"field":"occupation"}) }
  return result;

}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;&lt;br/&gt;
Once CL is hooked up to CouchDB notifications, we get the data with an HTTP query against the full-text index:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ curl "http://localhost:5984/addressbook/_fti/_design/person/search?q=occupation:supermodel&amp;amp;debug=true"
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;&lt;br/&gt;
A treat for observant HTTP fans: in the resulting JSON we again get an ETag for the particular list of results. In short, HTTP from the cellar to the attic.&lt;/p&gt;

&lt;p&gt;It is worth pointing out that CouchDB-Lucene is not the only solution for full-text search and &lt;em&gt;ad-hoc&lt;/em&gt; queries. Apart from a custom integration with, say, &lt;a href="http://lucene.apache.org/solr/"&gt;&lt;strong&gt;Solr&lt;/strong&gt;&lt;/a&gt;, which is feasible with any data store, there is for example an experimental project for using &lt;a href="http://github.com/ulbrich/couchsphinx"&gt;Sphinx with CouchDB&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;ElasticSearch&lt;/strong&gt; project mentioned above deserves special attention, though. ElasticSearch does not merely &lt;em&gt;use&lt;/em&gt; JSON, it &lt;em&gt;understands&lt;/em&gt; it. We therefore do not have to declare specific attributes to index; we simply let ElasticSearch pull whole documents from the &lt;code&gt;_changes&lt;/code&gt; feed and can query straight into the “depth” of the JSON document:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ curl "http://localhost:9200/addressbook/_search?q=occupation:supermodel%20AND%20addresses.work.city:Eichmannburgh"
&lt;/code&gt;&lt;/pre&gt;

&lt;h2&gt;Conflicts&lt;/h2&gt;

&lt;p&gt;Several people were also surprised by the implementation of &lt;em&gt;multi-version concurrency&lt;/em&gt;, which prevents me from modifying a document unless I have its latest &lt;em&gt;revision&lt;/em&gt;. As soon as I try to save a document in CouchDB that has changed in the database in the meantime, I get the HTTP response &lt;code&gt;409 Conflict&lt;/code&gt;. Many of the attendees I spoke with were not just surprised but downright &lt;em&gt;horrified&lt;/em&gt; by the existence of “conflicts”. As if it were some &lt;em&gt;taboo&lt;/em&gt; word, as if I should mention such a horror &lt;em&gt;only in a whisper&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Let us set aside the fact that keeping revisions, and the very concept of a &lt;em&gt;conflict&lt;/em&gt;, are a necessity in a heavily distributed, decentralized world such as that of CouchDB. Another consideration is more important: this actually makes much more sense than working with a “traditional” database, which does not have these scary “conflicts” (and instead has something far scarier, such as read and write locks).&lt;/p&gt;

&lt;p&gt;Consider this scenario: a user starts editing a document and makes some larger change that takes a lot of time. Before they save the record in the application, however, another user comes along and makes a smaller change, say, just corrects a customer's phone number in a CRM system. But the first user has reformatted, added to, etc. the &lt;em&gt;original&lt;/em&gt; phone number (along with the other information about the customer). As soon as they save these changes, the &lt;em&gt;corrected&lt;/em&gt; phone number is overwritten with the &lt;em&gt;wrong&lt;/em&gt; one. From the user's point of view, that is not intuitive &lt;em&gt;at all&lt;/em&gt;. Quite the opposite.&lt;/p&gt;

&lt;p&gt;The database does not know, and cannot know, how to resolve such a situation. But people do. The application programmer can then easily show a screen with the differences between “my” version and the version in the database, and let the user choose the right data. This is possible precisely because in CouchDB we think of a document as an actual document, not as a set of relations (tables) connected by foreign keys.&lt;/p&gt;
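
&lt;p&gt;The scenario can be replayed with a toy in-memory store in Ruby. This is a sketch of the revision check only, not CouchDB's implementation: &lt;code&gt;TinyStore&lt;/code&gt;, &lt;code&gt;Conflict&lt;/code&gt; and the integer revisions are assumptions made for illustration (real CouchDB revisions are strings and the check happens over HTTP).&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Toy in-memory model of revision checking; NOT CouchDB's implementation.
Conflict = Class.new(StandardError)

class TinyStore
  def initialize
    @docs = {}
  end

  # Returns a copy of the stored document, including its :_rev.
  def get(id)
    @docs.fetch(id).dup
  end

  # Saves doc only when the caller holds the current revision.
  def put(id, doc, rev = nil)
    current_rev = @docs.key?(id) ? @docs[id][:_rev] : nil
    raise Conflict, "409 Conflict: stale revision" unless rev == current_rev
    @docs[id] = doc.merge(_rev: current_rev.to_i + 1)
    @docs[id][:_rev]
  end
end

store = TinyStore.new
store.put("customer", { phone: "555-0100" })   # creates revision 1

mine   = store.get("customer")   # first user starts a long edit
theirs = store.get("customer")   # second user corrects the phone number
store.put("customer", theirs.merge(phone: "555-0199"), theirs[:_rev])

begin
  store.put("customer", mine.merge(note: "VIP"), mine[:_rev])
  outcome = "saved"
rescue Conflict
  # The stale write is rejected; the application can now show both versions.
  latest = store.get("customer")
  outcome = "conflict: mine=#{mine[:phone]} db=#{latest[:phone]}"
end
puts outcome
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The corrected phone number survives, and the first user gets both versions to compare instead of silently overwriting it.&lt;/p&gt;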

&lt;p&gt;By the way, at this point we can also (exceptionally) make a direct comparison with the approach of the &lt;a href="http://www.mongodb.org/"&gt;MongoDB&lt;/a&gt; database. In MongoDB, it is possible to modify and save only a &lt;em&gt;part&lt;/em&gt; of a document (a so-called &lt;a href="http://blog.mongodb.org/post/307919034/partial-object-updates-will-be-an-important-nosql"&gt;&lt;em&gt;partial update&lt;/em&gt;&lt;/a&gt;), or to embed one “document” in another (so-called &lt;a href="http://www.mongodb.org/display/DOCS/Updating+Data+in+Mongo#UpdatingDatainMongo-EmbeddingDocumentsDirectlyinDocuments"&gt;&lt;em&gt;embedded documents&lt;/em&gt;&lt;/a&gt;). Depending on your convictions, knowledge and needs, such a feature may look like an advantage or a disadvantage. Why? Because you cannot judge such a feature in isolation, in the sense of “ooh, that's nice!”, but only in the context of the overall architecture and design, also taking into account, for example, the use of a non-standard format (BSON), the consequences for replication, and so on. That is why there is probably no better way to demonstrate that you do not understand NoSQL than to talk about choosing between CouchDB and MongoDB as a choice between “a blonde and a brunette”. &lt;small&gt;(It also betrays a distinct leaning towards male chauvinism, which in the IT world is not merely tolerated but often downright coddled.)&lt;/small&gt;&lt;/p&gt;

&lt;h2&gt;Miscellaneous&lt;/h2&gt;

&lt;p&gt;Some questions also came up in the corridors, which I would like to answer again, very briefly, here.&lt;/p&gt;

&lt;p&gt;First, &lt;em&gt;view&lt;/em&gt; definitions are stored in the database just like other documents, in a special variant called a &lt;em&gt;design document&lt;/em&gt; (&lt;code&gt;_design/&amp;lt;NAME&amp;gt;&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;Second, you do not have to write &lt;em&gt;view&lt;/em&gt; definitions in JavaScript. You can also write them in Erlang. No, that is not a joke :) You can write them in Ruby or Python as well, for which experimental so-called &lt;em&gt;view servers&lt;/em&gt; exist; for serious deployments, though, you will probably follow what everyone else does for now and write them in JavaScript (or Erlang).&lt;/p&gt;

&lt;p&gt;Third, for your further thinking it pays not to understand &lt;em&gt;views&lt;/em&gt; as queries in the world of *SQL databases: those correspond far better to the full-text queries shown above. &lt;em&gt;Views&lt;/em&gt; correspond very precisely to &lt;em&gt;indexes&lt;/em&gt; in the world of *SQL databases, which is obvious once you think about it for a moment. The terms &lt;em&gt;view&lt;/em&gt; and &lt;em&gt;index&lt;/em&gt; are therefore interchangeable in the CouchDB world.&lt;/p&gt;
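
&lt;p&gt;A toy model in Ruby may help to see why a view behaves like an index rather than a query. It is a sketch under stated assumptions, not how CouchDB works internally: the sample documents are invented, the &lt;code&gt;map&lt;/code&gt; lambda stands in for a JavaScript map function, and a sorted array stands in for the index that CouchDB keeps on disk.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Toy model of a view as a sorted index; NOT CouchDB's implementation.
docs = [
  { _id: "john", occupation: "supermodel" },
  { _id: "mary", occupation: "developer" },
  { _id: "anna", occupation: "supermodel" },
  { _id: "bob" }                              # no occupation: not indexed
]

# The "map" step emits [key, id] rows for documents that have the attribute.
map = lambda do |doc|
  doc[:occupation] ? [[doc[:occupation], doc[:_id]]] : []
end

# The view is the set of emitted rows, kept sorted by key (then by id).
view = docs.flat_map { |doc| map.call(doc) }.sort

# "Querying" the view is a lookup by key in the prebuilt rows.
supermodels = view.select { |key, _id| key == "supermodel" }.map { |row| row.last }
puts view.inspect
puts supermodels.inspect
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The rows are computed once from the documents and then only looked up by key, which is what an index does in a *SQL database.&lt;/p&gt;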

&lt;p&gt;If you have any further questions, do not hesitate to ask them in the discussion below the article, or in the &lt;a href="http://webexpo.cz/prednaska/couchdb-databaze-pro-web/#tab-comments"&gt;discussion on the WebExpo page&lt;/a&gt;.&lt;/p&gt;</description><link>http://karmi-blog.tumblr.com/post/1234474779</link><guid>http://karmi-blog.tumblr.com/post/1234474779</guid><pubDate>Sun, 03 Oct 2010 12:22:43 +0200</pubDate><category>couchdb</category><category>nosql</category><category>http</category></item><item><title>"For four years we have offered the synchronization service for no charge, predicated on the..."</title><description>“For four years we have offered the synchronization service for no charge, &lt;strong&gt;predicated on the hypothesis that a business model would emerge&lt;/strong&gt; to support the free service. With that investment thesis thwarted, there is no way to pay expenses, primarily salary and hosting costs. Without the resources to keep the service going, we must shut it down.”&lt;br/&gt;&lt;br/&gt; - &lt;em&gt;&lt;a href="http://blog.xmarks.com/?p=1886"&gt;End of the Road for Xmarks&lt;/a&gt;&lt;/em&gt;</description><link>http://karmi-blog.tumblr.com/post/1214657679</link><guid>http://karmi-blog.tumblr.com/post/1214657679</guid><pubDate>Thu, 30 Sep 2010 09:55:00 +0200</pubDate></item><item><title>"Shirky’s argument is that this is the kind of thing that could never have happened in the..."</title><description>“Shirky’s argument is that this is the kind of thing that could never have happened in the pre-Internet age—and he’s right. (…) The story, to Shirky, illustrates “the ease and speed with which a group can be mobilized for the right kind of cause” in the Internet age. Shirky ends the story of the lost Sidekick by asking, portentously, “What happens next?”—no doubt imagining future waves of digital protesters. But he has already answered the question. What happens next is more of the same. 
A networked, weak-tie world is good at things like helping Wall Streeters get phones back from teen-age girls.”&lt;br/&gt;&lt;br/&gt; - &lt;em&gt;&lt;a href="http://www.newyorker.com/reporting/2010/10/04/101004fa_fact_gladwell?currentPage=all#ixzz10pLq57yX"&gt;Malcolm Gladwell: &lt;em&gt;Small Change. Why the revolution will not be tweeted&lt;/em&gt;&lt;/a&gt;&lt;/em&gt;</description><link>http://karmi-blog.tumblr.com/post/1204307028</link><guid>http://karmi-blog.tumblr.com/post/1204307028</guid><pubDate>Tue, 28 Sep 2010 14:40:41 +0200</pubDate></item><item><title>"ZUCK: yea so if you ever need info about anyone at harvard

ZUCK: just ask

ZUCK: i have over 4000..."</title><description>“&lt;p&gt;ZUCK: yea so if you ever need info about anyone at harvard&lt;/p&gt;

&lt;p&gt;ZUCK: just ask&lt;/p&gt;

&lt;p&gt;ZUCK: i have over 4000 emails, pictures, addresses, sns&lt;/p&gt;

&lt;p&gt;FRIEND: what!? how’d you manage that one?&lt;/p&gt;

&lt;p&gt;ZUCK: people just submitted it&lt;/p&gt;

&lt;p&gt;ZUCK: i don’t know why&lt;/p&gt;

&lt;p&gt;ZUCK: they “trust me”&lt;/p&gt;

&lt;p&gt;ZUCK: dumb fucks&lt;/p&gt;”&lt;br/&gt;&lt;br/&gt; - &lt;em&gt;&lt;a href="http://www.newyorker.com/reporting/2010/09/20/100920fa_fact_vargas?currentPage=all"&gt;Jose Antonio Vargas: “The Face of Facebook”, &lt;em&gt;The New Yorker&lt;/em&gt;&lt;/a&gt;&lt;/em&gt;</description><link>http://karmi-blog.tumblr.com/post/1120523978</link><guid>http://karmi-blog.tumblr.com/post/1120523978</guid><pubDate>Tue, 14 Sep 2010 13:29:11 +0200</pubDate></item><item><title>"I firmly believe that a significant (but certainly not the only) part of Rails and Django’s..."</title><description>“I firmly believe that a significant (but certainly not the only) part of Rails and Django’s success comes from the fact that they are frameworks meant to help a team write specific applications. In contrast, ASP.NET is a framework designed with vague (and initially very flawed) ideas of what application development would look like. Rails and Django are built by application developers. ASP.NET is built by framework developers. 
The difference in the finished products because of that is staggering.”&lt;br/&gt;&lt;br/&gt; - &lt;em&gt;&lt;a href="http://openmymind.net/2010/8/30/How-I-would-fix-ASP-NET"&gt;How I would fix ASP.NET&lt;/a&gt;&lt;/em&gt;</description><link>http://karmi-blog.tumblr.com/post/1064160485</link><guid>http://karmi-blog.tumblr.com/post/1064160485</guid><pubDate>Sat, 04 Sep 2010 16:29:06 +0200</pubDate></item><item><title>"Normally you would say that something like this is a joke, an impossible dream of turning Rumcajs into Chow Yun-Fat,..."</title><description>“Normally you would say that something like this is a joke, an impossible dream of turning Rumcajs into Chow Yun-Fat, but here it is taken completely seriously.”&lt;br/&gt;&lt;br/&gt; - &lt;em&gt;&lt;a href="http://aktualne.centrum.cz/kultura/film/recenze/clanek.phtml?id=674307"&gt;Kamil Fila RULEZ :)&lt;/a&gt;&lt;/em&gt;</description><link>http://karmi-blog.tumblr.com/post/913092695</link><guid>http://karmi-blog.tumblr.com/post/913092695</guid><pubDate>Fri, 06 Aug 2010 17:48:10 +0200</pubDate></item><item><title>Rake task to launch multiple Resque workers in development/production with simple management included</title><description>&lt;p&gt;&lt;script src="http://gist.github.com/486161.js?file=workers.rake"&gt;&lt;/script&gt;&lt;/p&gt;</description><link>http://karmi-blog.tumblr.com/post/845791249</link><guid>http://karmi-blog.tumblr.com/post/845791249</guid><pubDate>Thu, 22 Jul 2010 19:15:53 +0200</pubDate></item></channel></rss>
