Ephemera

Creating the “Decide the Election” Interactive Visualization

Thu, 24 Jan 2013 19:11:00 +0100

The second round of the 2013 Czech presidential election looks like a close race, with both the candidates and their support bases highly polarized.

Over the last few days, we’ve built an application that lets anyone play armchair political analyst in their browser, modelling possible scenarios of the election outcome.

In this post, I’d like to walk you through the design and implementation process, as well as elaborate on the technical solution. For the political context, visit the Wikipedia page about the elections or the accompanying article on iDnes.cz (in Czech).

The Process

Given the tight predictions for the election outcome, my friends Eliška, Josef and Vojta debated how to create a web application that would let people play with all the possibilities in an entertaining and informative way, reusing the knowledge we had gained by handling heaps of data in our day jobs.

After some crazy ideas featuring 80s disco themes, turntables and lots of custom JavaScript, we quickly settled on an “interactive dashboard” style, with a clear grid and sharp aesthetics.

From the start, we knew we would use the magnificent D3.js library, which makes creating sophisticated, interactive and good-looking visualizations relatively easy. Our model and template was clearly the famous, unmatched 512 Paths to the White House interactive graphic created by Mike Bostock and Shan Carter for the New York Times.

The first step was to create a minimal visual representation of the results from the election’s first round. Luckily, it was my day off, so I spent Friday afternoon fiddling with D3.js and created the initial mockup.

Looks funny, right? Absolutely. But it also demonstrates the single most important feature of D3.js: it’s not a charting library. It does not come with a set of predefined visualization types. It opts for a different approach: it makes it relatively easy to create the visualization from a set of flexible primitives and finely-tuned utility functions.

D3.js goes to great lengths to promote this principle: even the most primitive of all possible chart types, the bar chart, isn’t offered as a pre-packaged solution, but as a simple-to-follow tutorial. The underlying concept is called a data join — in essence, D3.js is just a very pleasant way of setting up a binding between your data and the graphical elements on the screen.
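D3.js itself is JavaScript, but the bookkeeping behind a keyed data join (what `selection.data(data, key)` followed by `enter()` and `exit()` sorts out for you) is language-neutral. The Ruby sketch below is our own illustration of that idea, not D3's implementation: given the keys of elements already on screen and a new data array, it works out which data need new elements (enter), which elements just get updated (update), and which should be removed (exit). The `data_join` helper and the candidate labels are hypothetical.

```ruby
# Illustrative only: mimics the enter/update/exit split that a keyed
# D3.js data join computes. This is not D3's code, just the same idea.
def data_join(existing_keys, data, &key_fn)
  keyed = data.to_h { |datum| [key_fn.call(datum), datum] }

  {
    # Data with no matching element yet: D3 would append new elements.
    enter:  keyed.reject { |key, _| existing_keys.include?(key) }.values,
    # Data whose element already exists: D3 would update its attributes.
    update: keyed.select { |key, _| existing_keys.include?(key) }.values,
    # Elements whose key no longer appears in the data: D3 would remove them.
    exit:   existing_keys.reject { |key| keyed.key?(key) }
  }
end

# Hypothetical example: bars on screen for "A", "B" and "C",
# new data for "B", "C" and "D" (the vote values are made up).
on_screen = %w[A B C]
new_data  = [
  { name: "B", votes: 10 },
  { name: "C", votes: 20 },
  { name: "D", votes: 30 }
]

result = data_join(on_screen, new_data) { |d| d[:name] }
p result[:enter].map { |d| d[:name] }   # => ["D"]
p result[:update].map { |d| d[:name] }  # => ["B", "C"]
p result[:exit]                         # => ["A"]
```

The key function is what makes the join stable: a bar stays bound to the same candidate when the data are re-sorted or filtered, which is what makes smooth transitions possible.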

I ended the first day with a rough vision for the application, and set out to prepare a conference talk for Saturday :)

The next day, Vojta and I spent some time playing with different layouts and interactive features. You can see how the “turntable faders” idea resurfaced in the form of range sliders for setting voter participation and the candidate split.
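The post doesn't describe the calculation behind the sliders, so the following Ruby sketch is a hypothetical model, not the application's actual logic: each first-round candidate's voters turn out at some rate (the participation slider) and split between the two finalists (the split slider). The names `Scenario` and `second_round`, the candidate labels and all vote counts are invented placeholders, not real election results.

```ruby
# Hypothetical scenario model: each first-round candidate's voters
# turn out at some rate and split between the two finalists.
# All numbers below are invented placeholders, not real results.
Scenario = Struct.new(:first_round_votes, :turnout, :share_for_a, keyword_init: true)

def second_round(scenario)
  totals = { a: 0.0, b: 0.0 }

  scenario.first_round_votes.each do |candidate, votes|
    voting = votes * scenario.turnout.fetch(candidate)          # participation slider
    for_a  = voting * scenario.share_for_a.fetch(candidate)     # split slider
    totals[:a] += for_a
    totals[:b] += voting - for_a
  end

  sum = totals.values.sum
  totals.transform_values { |v| (100.0 * v / sum).round(1) }    # percentages
end

scenario = Scenario.new(
  first_round_votes: { "Finalist A" => 1000, "Finalist B" => 900, "Candidate C" => 600 },
  turnout:           { "Finalist A" => 0.9,  "Finalist B" => 0.9, "Candidate C" => 0.5 },
  # Finalists keep their own voters; Candidate C's voters lean towards B.
  share_for_a:       { "Finalist A" => 1.0,  "Finalist B" => 0.0, "Candidate C" => 0.4 }
)

p second_round(scenario)
```

In a model like this, moving a slider just means changing one number in `turnout` or `share_for_a` and recomputing.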

At that time, the code was a sprawling mess: a big ball of d3 declarations, magic numbers in positioning offsets, and duplicated code. Clearly, time for a rewrite. So all of us spent a chilly Sunday afternoon at the office whiteboard, deconstructing the data set (7 first-round candidates, 2 finalists, undecided voters) and experimenting with the layout options and visual encoding of the data.

We agreed upon a different grid, which would make enough room for all the control elements, photos, captions, etc.

We cleaned up the JavaScript code and slowly built up the positioning and visual encoding as night fell over Prague.

On Monday, we split the duties: I started polishing the visual aspects of the application and Vojta started working on a set of Chef recipes for building the supporting infrastructure.

I ended the day with a rough version of the minimal feature set, with dubious behaviour and buggy semantics. Big things have small beginnings… If you’re particularly nosy, you can clone the Git repository and devour all the silly mistakes and dirty implementation details just by checking out different commits in the history.

On the infrastructure front, we needed a reliable webserver and a flexible storage solution.

It sounds quite silly to create a Chef-based, fully automated provisioning toolchain for an application which will be online for a couple of days, right? Wrong. It follows the single most important principle of the #devops movement: the ability to rebuild your infrastructure from scratch from a set of provisioning scripts, application code, a data backup and bare computing power.

We’re big fans of the Chef toolchain, despite its huge flaws and deficiencies. For this project, we reused the knowledge and many of the tools built and published during our regular jobs.

For the storage layer, we chose Elasticsearch, a very powerful and flexible search engine, which allows us to store the JSON data generated by our application’s users directly, without any serialization or translation. Elasticsearch is blazingly fast, resilient, and we have plenty of experience with it. This choice was not particularly hard for us.

Additionally, with all the anonymous numerical data stored in Elasticsearch, it will be quite easy to analyze it later by using Elasticsearch’s faceting features (computing the statistical values for the outcome estimations, creating date histograms, etc).
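For illustration, here's a Ruby sketch of the kind of request body such an analysis could send, using the `statistical` and `date_histogram` facets that Elasticsearch offered at the time (facets were later replaced by aggregations). The field names `percent_a` and `created_at` are hypothetical, since the post doesn't describe the stored documents.

```ruby
require "json"

# Hypothetical analysis query: field names are assumptions, not the
# application's actual schema. "size": 0 skips returning the documents
# themselves; only the facet results are wanted.
def outcome_facets_query(interval: "hour")
  {
    size: 0,
    query: { match_all: {} },
    facets: {
      # count, min, max, mean, std_deviation etc. of the estimated outcome
      outcome_stats: { statistical: { field: "percent_a" } },
      # how many scenarios were saved per time bucket
      saved_over_time: { date_histogram: { field: "created_at", interval: interval } }
    }
  }
end

puts JSON.pretty_generate(outcome_facets_query)
```

The printed JSON would be sent as the body of a `_search` request against the index holding the scenarios.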

Since Elasticsearch does not offer any access control, Vojta created a simple Sinatra-based Ruby web application which serves as a proxy between the JavaScript code and Elasticsearch (using the Tire library), and also serves the application in development mode. The application itself is served by the insanely fast and efficient Nginx webserver, which provides access and error logs for observing the application’s behaviour and analyzing usage patterns.
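The point of such a proxy is that the browser never talks to Elasticsearch directly; only a narrow set of operations gets forwarded. The real proxy used Sinatra and Tire; this Ruby sketch leaves both out and shows only the decision logic, with a hypothetical index name (`scenarios`), type name (`scenario`) and routes.

```ruby
# Hypothetical access policy for a proxy in front of Elasticsearch.
# Only two operations are exposed to the browser: saving a scenario
# and reading one back. Everything else (deleting indices, changing
# mappings, arbitrary searches) is rejected before reaching Elasticsearch.
ES_URL = "http://localhost:9200"
INDEX  = "scenarios"   # assumed index name

# Returns [http_method, elasticsearch_url] to forward to, or nil to reject.
def proxy_target(method, path)
  if method == "POST" && path == "/scenarios"
    # New scenario: let Elasticsearch generate the document ID.
    ["POST", "#{ES_URL}/#{INDEX}/scenario"]
  elsif method == "GET" && (match = path.match(%r{\A/scenarios/([A-Za-z0-9_-]+)\z}))
    # Only plain IDs are accepted, so no path tricks reach Elasticsearch.
    ["GET", "#{ES_URL}/#{INDEX}/scenario/#{match[1]}"]
  end
  # Any other request falls through to nil; the web layer would answer 403.
end

p proxy_target("POST", "/scenarios")
p proxy_target("GET", "/scenarios/abc123")
p proxy_target("DELETE", "/scenarios")
p proxy_target("GET", "/_all/_search")
```

A whitelist like this is easier to reason about than trying to block dangerous requests one by one.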

Thanks to the Chef ecosystem, the whole stack is installed and configured in an automated manner on Amazon EC2, all the services are guarded by Monit, and data are backed up as EBS snapshots. If needed, we’re able to recreate the whole stack in five minutes. (If you’re interested, we’re using a process based on the Deploying Elasticsearch with Chef Solo tutorial.)

On Tuesday, we divided our time between two major tasks: first, fixing all the bugs in the application’s business and drawing logic, and second, creating a proper visual design for the application. Since we had been talking with an online newspaper about publishing the application to a wide audience, I settled on giving the application a decisive newspaper look, which could be described as “a magazine spread come alive”. (This task was quite enjoyable compared to all the calculator-driven coding of the application, which brought back dreadful memories of my career as a Flash designer and ActionScript developer for me, and cheerful memories of teenage Atari programming for Josef.)

After a short nap came Wednesday, our go-live day. We had excellent support from our publisher, and began fighting the security restrictions of <iframe>s and design quirks, fine-tuning the design of <input type="range"> sliders in Internet Explorer 10, and stroking our chins about possible support for Firefox, which, amusingly, does not support the range slider yet. You know, the final phase of any web-based software project.

(We had a brief affair with the html5slider library, but after extensive testing decided to pull it from the already published application. The inconsistencies, no clear way to re-trigger the initialization, and many subtle problems just weren’t worth it. So far, we have received only a few complaints; the browser feature matrix is clearly a very Darwinian field…)

All the frenzy caused us to miss the optimal prime time for publishing the application and the accompanying article. Together with the publisher, we scheduled the publication for early the next morning, giving us enough time to miss another night’s sleep and add many nice features in the process.

On Thursday, we woke up to find the accompanying article featured on the homepage of iDnes.cz, the biggest online newspaper in the Czech Republic, where it was the second most visited story for the better part of the day. Visits to the application hovered around 3,000 per hour for the whole morning, and more than 1,000 scenarios have been saved. The audience response was very positive, comparing us (embarrassingly enough :) to the famed New York Times example mentioned earlier.

If anything, the whole story is a reminder of how powerful the data-driven journalism approach can be, and that it’s well within reach of most newspapers; they just have to care enough. While an application like this is not something you can whip up in an afternoon before going home (after all, it took two highly skilled developers more than three 12-hour days to create it), with tools such as D3.js, Elasticsearch, Chef, Ruby and Amazon EC2, it’s an enjoyable and rewarding process.

Spending most of the day in a purple haze, we have been spelunking around the access and error logs with Splunk, watching office colleagues creating and sharing their own scenarios, and debating the election campaign details and silliness.

Nevertheless, above all, we hope that the application will persuade people to go out and vote tomorrow. Because the future is predictable, but uncertain.

Go have a look at the application and tell us what you think in the comments or at Hacker News.

Search Your Gmail Messages with ElasticSearch and Ruby

Sun, 15 May 2011 16:28:00 +0200

If you’d like to check out ElasticSearch, there are already lots of places to get data to feed it with. You can use a Twitter or Wikipedia river to fill it with gigabytes of public data, or you can feed it very quickly with some RSS feeds.

But, let’s get a bit personal, shall we? Let’s feed it with your own e-mail, imported from your own Gmail account.

We’ll use a couple of Ruby gems: Gmail to fetch the e-mail data, Tire to put them into ElasticSearch and search them, and Sinatra to create a simple web application which will allow us to search the messages.

First of all, download or clone the source code from this gist. If you have ElasticSearch, Ruby and RubyGems installed, install all the required gems with Bundler:

$ bundle install

We’ll import the data with the gmail-import.rb script. You must provide your Gmail credentials, like this:

$ ruby gmail-import.rb user@gmail.com yourpassword

Leave the script running in a terminal session, and launch the provided web application in another one, passing it your Gmail account name:

$ INDEX=user@gmail.com ruby gmail-server.rb

You should see your own e-mail displayed at http://localhost:4567/. Make sure to check out all of the rich Lucene query syntax.

Of course, you’re not limited to search. With ElasticSearch facets, you can pull interesting stuff out of your data, such as getting statistics on who’s sending you the most e-mail:

$ curl -X POST "http://localhost:9200/user@gmail.com/message/_search?pretty=true" -d '
    {
      "facets" : {
        "senders" : { "terms" : { "field" : "from.exact" } }
      },
      "size" : 0
    }
  '

It’s definitely noreply@github.com in my case :) Your data are available in the user@gmail.com index; you can browse them at http://localhost:9200/user@gmail.com/_search?pretty=true&q=*.
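If you'd rather read those numbers from Ruby than from curl output, here's a minimal sketch that extracts the sender counts from a terms facet response. It assumes the response layout Elasticsearch used for terms facets at the time (`facets.<name>.terms` as a list of `term`/`count` pairs); the sample response below is made up and trimmed, not real mailbox data.

```ruby
require "json"

# Extracts [term, count] pairs from the terms facet named facet_name,
# assuming the pre-1.0 Elasticsearch response layout:
#   { "facets": { "<name>": { "_type": "terms", "terms": [ { "term": ..., "count": ... } ] } } }
def facet_terms(response_body, facet_name)
  JSON.parse(response_body)
      .fetch("facets")
      .fetch(facet_name)
      .fetch("terms")
      .map { |entry| [entry["term"], entry["count"]] }
end

# Made-up, trimmed response for the "senders" facet.
sample = <<~JSON
  {
    "hits": { "total": 3, "hits": [] },
    "facets": {
      "senders": {
        "_type": "terms",
        "terms": [
          { "term": "noreply@github.com", "count": 2 },
          { "term": "friend@example.com", "count": 1 }
        ]
      }
    }
  }
JSON

facet_terms(sample, "senders").each do |sender, count|
  puts "#{count}\t#{sender}"
end
```

Using `fetch` instead of `[]` makes a misnamed facet fail loudly with a `KeyError` rather than silently returning `nil`.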

The full source code is available below.

Redis — The AK-47 of Post-relational Databases

Thu, 12 May 2011 14:31:08 +0200

"Yes, you’re going to write some sketches that you love and are proud of forever—your golden nuggets...."

Sun, 20 Mar 2011 13:21:49 +0100

“Yes, you’re going to write some sketches that you love and are proud of forever—your golden nuggets. But you’re also going to write some real shit nuggets. You can’t worry about it. As long as you know the difference, you can go back to panning for gold on Monday.”

- Tina Fey

"In the state with the highest cigarette taxes in the country, in a city that has become one of the..."

Fri, 25 Feb 2011 10:04:33 +0100

“In the state with the highest cigarette taxes in the country, in a city that has become one of the hardest places in America to find a place to smoke, Ms. Silk has gone off the grid, growing, processing and smoking her own tax-free cigarettes from packets of seeds she buys online for about $2.”

- Now in Brooklyn, Homegrown Tobacco: Local, Rebellious and Tax Free

"One piece of advice I give to S.E.O. masters is, don’t chase after Google’s algorithm, chase after..."

Thu, 17 Feb 2011 21:54:47 +0100

“One piece of advice I give to S.E.O. masters is, don’t chase after Google’s algorithm, chase after your best interpretation of what users want, because that’s what Google’s chasing after.”

- Matt Cutts, Google

"Sure, Shazam, the popular music-spotting cellphone application, can identify that Rihanna track. But..."

Mon, 14 Feb 2011 09:39:54 +0100

“Sure, Shazam, the popular music-spotting cellphone application, can identify that Rihanna track. But what about the new song from the Sandwitches, a Bay Area folk-rock band? That is where Charles Slomovitz comes in.”

- In Digital Era, Music Spotters Feed a Machine

"For example, claiming publicly that something is unhackable is usually a good way to find out that..."

Tue, 14 Dec 2010 13:36:50 +0100

“For example, claiming publicly that something is unhackable is usually a good way to find out that it is.”

- The Real Lessons Of Gawker’s Security Mess

"We conclude that the current RDBMS code lines, while attempting to be a “one size fits all”..."

Sat, 11 Dec 2010 14:36:02 +0100

“We conclude that the current RDBMS code lines, while attempting to be a “one size fits all” solution, in fact, excel at nothing. Hence, they are 25 year old legacy code lines that should be retired in favor of a collection of “from scratch” specialized engines. The DBMS vendors (and the research community) should start with a clean sheet of paper and design systems for tomorrow’s requirements, not continue to push code lines and architectures designed for yesterday’s needs.”

- The End of an Architectural Era

"Exactly two things have made airplane travel safer since 9/11: reinforcing the cockpit door, and..."

Tue, 30 Nov 2010 09:42:37 +0100

“Exactly two things have made airplane travel safer since 9/11: reinforcing the cockpit door, and convincing passengers they need to fight back. Everything else has been a waste of money. Add screening of checked bags and airport workers and we’re done. Take all the rest of the money and spend it on investigation and intelligence.”

- Bruce Schneier

Monittr: A Ruby and Web Interface for Multiple Monit Instances

Sun, 28 Nov 2010 15:39:00 +0100

On my current contract, we have Monit set up to monitor a number of servers. Obviously, when you monitor something, you’d like to check its status from time to time. And while the default Monit web interface is good enough for a quick check of one system, it has several major shortcomings.

You have to remember or bookmark the URL and credentials for the interface. The interface is “good enough”, but the constant refreshing is kinda crazy.

Worse, you can check only one Monit instance at a time, which is impractical to do on a regular basis for more than one machine. Of course, there’s MMonit, but it adds another piece to the infrastructure, brings advanced features which are useless to us, and doesn’t display the system’s status in a concise way. We need something else altogether.

We need to display some Monit statistics from multiple servers inside our admin application. So, when I discovered that Monit has an XML output (while reading the sources of the monit gem), I knew it would be just a matter of parsing and displaying it.
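Parsing that output needs nothing beyond Ruby's standard library. Below is a minimal sketch using REXML against a trimmed-down sample document. It is not the monittr gem's code, and the element names (`service`, `name`, `status`, `monitor`) follow Monit's XML status format as we understand it; treat them as an assumption and check them against your Monit version (the XML is typically served at `/_status?format=xml` on the Monit web interface).

```ruby
require "rexml/document"

# A trimmed, hypothetical sample of Monit's XML status output.
SAMPLE = <<~XML
  <monit>
    <server><localhostname>web1</localhostname></server>
    <service type="5"><name>web1</name><status>0</status><monitor>1</monitor></service>
    <service type="3"><name>nginx</name><status>0</status><monitor>1</monitor></service>
    <service type="3"><name>unicorn</name><status>512</status><monitor>1</monitor></service>
  </monit>
XML

ServiceStatus = Struct.new(:name, :ok, :monitored)

# Returns [hostname, [ServiceStatus, ...]] for one Monit instance.
def parse_monit_status(xml)
  doc  = REXML::Document.new(xml)
  host = doc.elements["monit/server/localhostname"]&.text

  services = doc.get_elements("monit/service").map do |service|
    ServiceStatus.new(
      service.elements["name"].text,
      service.elements["status"].text.to_i.zero?,  # assumed: non-zero status means a failure flag is set
      service.elements["monitor"].text == "1"      # assumed: "1" means the service is monitored
    )
  end

  [host, services]
end

host, services = parse_monit_status(SAMPLE)
services.each do |s|
  puts "#{host} #{s.name.ljust(10)} #{s.ok ? 'OK' : 'FAILED'}"
end
```

Fetching the XML from several hosts and feeding each response through the same parser is all it takes to get a combined overview.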

The result is the monittr Rubygem.

Its main goal is to plug Monit status information from multiple systems into any Ruby application, specifically into a Sinatra web application.

You can either use the Ruby interface to retrieve the Monit statistics and display them as you wish, or use the provided Sinatra extension to embed them directly into your admin application, like this:

<p>This is your regular admin application...</p>
<div id="monittr">
<%= monittr.html %>
</div>

The project README provides extensive information about how to try out, use and customize it: https://github.com/karmi/monittr. You can customize the template and supply it as your own, changing the content, stylesheets, etc. It’s just an ERB file.

Any feedback, suggestions or patches are welcome via e-mail or Github Issues/Pull Requests.

Monittr source at Github

Explaining Ruby on Rails Infrastructure Alternatives

Fri, 29 Oct 2010 09:28:00 +0200

Last week, I held a workshop on Rails infrastructure, deployment and hosting for Active24, one of the major hosting companies in the Czech Republic.

I was confronted with the task of explaining and showing the various pieces (web server, application server, configuration, RubyGems management, etc) without browsing some previously set-up server, opening configuration files in Vim, and relying on blind luck that all of it would somehow click together for my audience.

So I decided early on that I needed some “recipe” definitions to show what the packages and components are, and then to use those recipes to install a full Rails stack on a clean machine. You’ll find them all in a Github repository: karmi/rails-deployment-setups-sprinkle.

I quickly rejected Chef or Puppet for this task: I wanted the attendees to be able to provision a clean VPS with a Rails stack themselves, without any overhead and lengthy introductions. I found that the Sprinkle provisioning tool was the perfect fit for the task.

In fact, just about everything about Sprinkle is awesome. Its language is very clear and elegant. It can install software from source or from packages on every major platform. Its recipes are idempotent, i.e. when you set the “verification” conditions properly, you can run the installer over and over again and it will install only the missing pieces. It can install software remotely just as well as locally. Its source code is a great example of a well-designed and well-written Ruby library.
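The "verify first, install only what's missing" idea is easy to model outside Sprinkle. The following plain-Ruby sketch is not Sprinkle's API: `Package`, `provision` and the simulated `system_state` set are hypothetical stand-ins for real checks, such as looking for an installed executable.

```ruby
require "set"

# Hypothetical model of an idempotent installer: each package carries a
# verification check; installation runs only when verification fails.
Package = Struct.new(:name, :verify, :install)

# Returns the names of the packages actually installed during this run.
def provision(packages)
  packages.each_with_object([]) do |pkg, installed_now|
    next if pkg.verify.call   # already present: nothing to do
    pkg.install.call
    installed_now << pkg.name
  end
end

# Simulated machine state instead of a real server.
system_state = Set.new(["ruby"])

packages = %w[ruby rubygems apache passenger].map do |name|
  Package.new(
    name,
    -> { system_state.include?(name) },  # the "verification" condition
    -> { system_state << name }          # the actual "installation"
  )
end

p provision(packages)   # => ["rubygems", "apache", "passenger"]
p provision(packages)   # => []
```

The second run installs nothing, which is exactly what makes it safe to re-run the installer after a partial failure.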

So, I went with Sprinkle.

Attendees needed only Ruby and RubyGems on their local machines. After cloning the repository with recipes, they installed various Rails stacks on their own fresh Ubuntu VPS in a matter of minutes.

Sprinkle recipes are concise, understandable and clarify the relations between various pieces of a Rails stack. Take, for instance, the definition of a classic Apache/Passenger stack:

Notice how the Sprinkle recipe definition clarifies what components are needed at all, so you can base your explanations and whiteboard schemes on them.
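To make the "relations between the pieces" point concrete without Sprinkle installed, here's a hypothetical plain-Ruby sketch: the stack is declared as components with their requirements, and the standard library's TSort derives an install order from them. The component names and dependencies are illustrative assumptions, not the contents of the repository's recipes.

```ruby
require "tsort"

# Hypothetical Apache/Passenger stack: each component lists what it requires.
class Stack
  include TSort

  def initialize(requirements)
    @requirements = requirements
  end

  # TSort protocol: enumerate all components...
  def tsort_each_node(&block)
    @requirements.each_key(&block)
  end

  # ...and, for each component, the components it depends on.
  def tsort_each_child(node, &block)
    @requirements.fetch(node, []).each(&block)
  end
end

stack = Stack.new(
  "build_essential" => [],
  "ruby"            => ["build_essential"],
  "rubygems"        => ["ruby"],
  "apache"          => [],
  "passenger"       => ["apache", "rubygems"],
  "rails"           => ["rubygems"]
)

# Dependencies come before the components that need them.
p stack.tsort
```

The same declaration doubles as a whiteboard diagram: each entry reads as "this piece needs those pieces".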

The configuration for the stack is comparably simple to follow:

That is extremely convenient, because you can use the same instructions to explain what you need to run a Rails application and how these pieces work together, and to actually install them on a server via the provisioning tool while talking about all of it. (And you have another argument for Ruby’s syntactic superiority, of course.)

To put it another way, Sprinkle recipes can be understood as executable instructions in the same sense as RSpec or Shoulda tests can be understood as executable specifications.

I am absolutely convinced that I couldn’t have explained everything any other way; after a half-day workshop, the operations guys walked away with a 100% understanding of how hosting Ruby on Rails works and what the alternatives and their advantages are, and they can continue their explorations with the provided recipes. (The first half of the workshop was dedicated to showing them how developing Rails applications works.)

If you need to explain or teach Rails infrastructure, either at commercial workshops or in education, you may well use and adapt these recipes; please refer to the README for how to use them.

Nevertheless, I tried hard to encourage best practices for hosting web applications, so you’ll find e.g. Apache and Nginx correctly configured with expires headers for static assets and gzip compression of text responses. Thus, you can use the recipes for provisioning a real box as well.

After you install one of the stacks, a place for your Rails applications is created in /var/applications. To actually check out if the stack works, create a simple demo application. Connect via SSH to the box and run:

$ cd /var/applications
$ rails new demo

$ cd demo
$ rm public/index.html
$ rails generate controller welcome index

Now, put something like render :text => 'Welcome!' inside the index action in app/controllers/welcome_controller.rb, and uncomment the root :to => "welcome#index" route in config/routes.rb.

You should see your new demo application’s greeting when you load the server IP or hostname in a browser. And it probably took 15 minutes or less.

Rails historically had a bad reputation in terms of deployment simplicity. As you can see, that is really a thing of the past. Whether you’re deploying to Heroku, Engine Yard, or your own server, the Ruby community has done a wonderful job of smoothing all the hard edges.

"(…) The logic is akin to sticking with telegrams and avoiding the voice telephone. Telegrams..."

Sun, 10 Oct 2010 08:43:20 +0200

“(…) The logic is akin to sticking with telegrams and avoiding the voice telephone. Telegrams are written records and therefore can be stored, they can be confirmed and therefore audited, and they are the standard… sound familiar?”

- John Fitzpatrick, “NoSQL means no SQL”

Addendum to the CouchDB Talk at Webexpo 2010

Sun, 03 Oct 2010 12:22:43 +0200

Thank you all for the online and offline feedback on my CouchDB talk. I tried to give plenty of room both to the more general questions around using “non-traditional” databases, which CouchDB represents well, and to the purely practical experience I’ve gained from using CouchDB intensively over the past year. In this article, I’d like to expand on or clarify some of the topics and answer some questions.

Examples of Using Schemaless Databases

First, I’d like to expand on the examples of using schemaless, or more precisely document-oriented, databases. The address book example was, I hope, illustrative enough: it is the most common example of heterogeneous data. Someone has two phones, someone else three, one person has Skype while another has Jabber, and so on. The example of a picture attached to a contact also nicely illustrates CouchDB’s advantage of storing binary data naturally, right with the document.

An equally good example, though, is something every web developer has built or encountered a dozen times: a “publishing system”. The quotes are intentional: there is no point in debating what counts as an editorial system, a publishing system or a content management system, so we’ll just use the abbreviation CMS from here on. Anyone who has ever built a CMS knows that a “page” (document) in it is not “a title and a big textarea with a WYSIWYG editor”. On the contrary, a page is usually an assembly of many elements, which can be text or images, and which can reference other entities (a short list of news items) or external resources (a video on YouTube).

A favourite textbook exercise for any CMS is storing hierarchical data (“trees”), and retrieving them efficiently. In a document database such as CouchDB, I can represent the “tree” directly as a JSON document, and avoid the tedious translation between tabular and hierarchical representations.

The big advantage of any schemaless database is that I can model pages essentially ad hoc, without having to design a sufficiently flexible schema up front, or having to modify and tune it later. In the case of CouchDB, the native support for binary attachments to documents comes in handy again.
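To illustrate the translation between tabular and hierarchical representations that a document database lets you skip, here's a Ruby sketch that rebuilds a page tree from flat `id`/`parent_id` rows (a common relational layout) and checks it against the same tree stored directly as one nested, JSON-style document. The page titles and field names are made up.

```ruby
require "json"

# Relational-style storage: one row per page, linked by parent_id.
ROWS = [
  { id: 1, parent_id: nil, title: "Home" },
  { id: 2, parent_id: 1,   title: "News" },
  { id: 3, parent_id: 1,   title: "About" },
  { id: 4, parent_id: 3,   title: "Team" }
]

# The translation step: rebuild the hierarchy from the flat rows.
def build_tree(rows, parent_id = nil)
  rows.select { |row| row[:parent_id] == parent_id }.map do |row|
    { "title" => row[:title], "children" => build_tree(rows, row[:id]) }
  end
end

# Document-style storage: the tree *is* the document, no translation needed.
DOCUMENT = {
  "title" => "Home",
  "children" => [
    { "title" => "News",  "children" => [] },
    { "title" => "About", "children" => [{ "title" => "Team", "children" => [] }] }
  ]
}

puts JSON.pretty_generate(build_tree(ROWS).first)
puts build_tree(ROWS).first == DOCUMENT   # => true
```

In CouchDB, `DOCUMENT` could be saved and loaded as-is, so the `build_tree` step simply disappears.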

Continuous “Stream” of HTTP Notifications About Database Changes

One of CouchDB’s most interesting features, which we touched on only briefly, is the continuous “stream” of database changes, the so-called _changes feed. It is not some gimmick added for effect: the _changes feed is the foundation of the whole replication infrastructure in CouchDB, as well as of indexing by CouchDB-Lucene. It is therefore as “robust” as you can imagine.

A clear advantage of the _changes feed is the possibility of a persistent connection between the client and the database server (with the parameter feed=continuous). The database opens one Erlang “thread” for each client, so the client doesn’t burden it with regular polling, as is customary. Moreover, this is a push notification: the client receives changes from the database, rather than constantly asking “anything new? anything new?”.

In the repository with the sample application for the talk, you’ll also find a sample Ruby implementation of a client for the continuous _changes feed. Run it with rake changes DATABASE=addressbook (see the README), then just open Futon in a window above the terminal, or change the data some other way, and you can watch in real time as the database pushes information about the changes to you.
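The repository's client isn't reproduced here. As an independent sketch, this Ruby code shows the core of such a client using only Net::HTTP: it requests `_changes` with `feed=continuous` and a `heartbeat`, splits the streamed body into lines, skips the blank heartbeat lines, and parses every remaining line as one JSON change. `ChangesParser` and `follow_changes` are hypothetical names, and the offline demonstration at the end feeds hand-written chunks, not real CouchDB output.

```ruby
require "json"
require "net/http"

# Splits a streamed _changes body into complete lines and yields each change.
class ChangesParser
  def initialize(&on_change)
    @buffer = +""
    @on_change = on_change
  end

  def <<(chunk)
    @buffer << chunk
    # Process every complete line; a trailing partial line stays buffered.
    while (newline = @buffer.index("\n"))
      line = @buffer.slice!(0..newline).strip
      next if line.empty?   # heartbeat: a bare newline
      @on_change.call(JSON.parse(line))
    end
    self
  end
end

# Follows the feed until the connection closes (no read timeout).
def follow_changes(db, host: "localhost", port: 5984, since: 0, &on_change)
  parser = ChangesParser.new(&on_change)
  path = "/#{db}/_changes?feed=continuous&heartbeat=10000&since=#{since}"

  Net::HTTP.start(host, port, read_timeout: nil) do |http|
    http.request(Net::HTTP::Get.new(path)) do |response|
      response.read_body { |chunk| parser << chunk }
    end
  end
end

# Offline demonstration: a change split across two chunks, plus a heartbeat.
seen = []
parser = ChangesParser.new { |change| seen << change["id"] }
parser << %({"seq":1,"id":"alice","changes":[{"rev":"1-abc"}]}\n\n{"seq":2,)
parser << %("id":"bob","changes":[{"rev":"1-def"}]}\n)
p seen
```

Against a running CouchDB, `follow_changes("addressbook") { |change| p change }` should print each change as it happens; the buffering matters because network chunks don't have to line up with change boundaries.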

The possible uses of the continuous feed are countless. Apart from the one that occurs to everyone first, chat, there is for example moving and aggregating data between databases, especially since the _changes feed can be filtered. It is ideally suited, though, for all kinds of data enrichment operations, where, for instance, we want to run some asynchronous task after a record is saved (converting a video, augmenting the document with information from a web service, etc).

And finally, it makes it possible to connect all sorts of external services to the database. The prime example is the ElasticSearch full-text engine, whose CouchDB support was added a few days after WebExpo, precisely by hooking into the _changes feed, which provides it with the ideal infrastructure for continuously indexing records.

CouchDB Lucene

I also noticed several questions about CouchDB-Lucene (CL), the technology for full-text indexing and searching of documents in CouchDB, as well as the peculiar opinion that it is surely a “disadvantage” of CouchDB when I have to use something like a full-text search engine for “more complex” queries.

Using CL is very simple. First, we define indexes for the document attributes we’re interested in, as a JavaScript function:

function(doc) {

  var result = new Document();
  if (doc.occupation) { result.add(doc.occupation, {"field":"occupation"}) }
  return result;

}
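In practice, an index function like this lives inside a design document. The following Ruby sketch builds such a document and the HTTP request that would store it, assuming CouchDB-Lucene's convention of a `fulltext` section with named indexes, each holding its function source under `index`. The `_design/person` document and `search` index names are chosen to match the query URL in this article; the request is only built, not sent.

```ruby
require "json"
require "net/http"

# The index function from above, stored as a string (CouchDB keeps
# design-document functions as source code).
INDEX_FUNCTION = <<~JS
  function(doc) {
    var result = new Document();
    if (doc.occupation) { result.add(doc.occupation, {"field": "occupation"}); }
    return result;
  }
JS

# Assumed CouchDB-Lucene layout: fulltext -> <index name> -> index function.
DESIGN_DOC = {
  "_id" => "_design/person",
  "fulltext" => {
    "search" => { "index" => INDEX_FUNCTION }
  }
}

# Builds (but does not send) the PUT request that would store the design doc.
def design_doc_request(db, doc)
  request = Net::HTTP::Put.new("/#{db}/#{doc['_id']}")
  request["Content-Type"] = "application/json"
  request.body = JSON.generate(doc)
  request
end

request = design_doc_request("addressbook", DESIGN_DOC)
puts "#{request.method} #{request.path}"
# To actually send it: Net::HTTP.start("localhost", 5984) { |http| http.request(request) }
```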

Once CL is hooked up to CouchDB’s notifications, we retrieve the data with an HTTP query against the full-text index:

$ curl "http://localhost:5984/addressbook/_fti/_design/person/search?q=occupation:supermodel&debug=true"

A treat for attentive HTTP fans: with the resulting JSON we again get an ETag for that particular list of results. In short, HTTP from the basement to the attic.

It’s worth noting that CouchDB-Lucene is not the only solution for full-text search and ad-hoc queries. Apart from a custom integration with, say, Solr, which can be done with any data store, there is for example an experimental project that uses Sphinx with CouchDB.

The ElasticSearch project mentioned above, however, deserves special attention. ElasticSearch doesn’t just use JSON, it understands it. So we don’t have to declare specific attributes for indexing; we simply let ElasticSearch fetch whole documents from the _changes feed, and we can query straight into the “depth” of the JSON document:

$ curl "http://localhost:9200/addressbook/_search?q=occupation:supermodel%20AND%20addresses.work.city:Eichmannburgh"
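The `addresses.work.city` field name in that query is simply the path through the nested JSON. A small Ruby sketch that lists such dotted paths for a document can help when working out what to query; the sample contact is made up, and this only illustrates the naming convention, not how ElasticSearch indexes documents internally.

```ruby
# Turns a nested Hash into { "dotted.path" => value } pairs.
# Arrays are kept as plain values in this sketch.
def dotted_paths(hash, prefix = nil)
  hash.each_with_object({}) do |(key, value), paths|
    path = prefix ? "#{prefix}.#{key}" : key.to_s
    if value.is_a?(Hash)
      paths.merge!(dotted_paths(value, path))
    else
      paths[path] = value
    end
  end
end

# Hypothetical contact document in the spirit of the address book example.
contact = {
  "name" => "Jane Doe",
  "occupation" => "supermodel",
  "addresses" => {
    "home" => { "city" => "Prague" },
    "work" => { "city" => "Eichmannburgh" }
  }
}

dotted_paths(contact).each { |path, value| puts "#{path}: #{value}" }
```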

Conflicts

Several people were also surprised by the multi-version concurrency control implementation, which prevents you from modifying a document unless you have its latest revision. When I try to save a document in CouchDB that has changed in the database in the meantime, I get the HTTP response 409 Conflict. Many attendees I spoke with were not only surprised, but downright scared by the existence of some kind of “conflicts”. As if it were a taboo word, as if I should only mention such a horror in a whisper.

Let’s set aside the necessity of keeping revisions, and of the very concept of a conflict, in a heavily distributed/decentralized world such as CouchDB’s. Another aspect is more important: this actually makes much more sense than working with a “traditional” database, which doesn’t have these scary “conflicts” (and instead has something much scarier, such as read and write locks).

Consider this scenario: a user starts editing a document and makes a larger change that takes a lot of time. In the meantime, before the first user saves the record in the application, another user comes along and makes a smaller change, for instance just correcting a customer’s phone number in a CRM system. The first user, however, has reformatted and expanded the original phone number (along with the rest of the customer information). As soon as they save these changes, the corrected phone number is overwritten with the wrong one. From the user’s point of view, that is not intuitive at all. Quite the opposite.

The database doesn’t know, and cannot know, how to resolve such a situation. But people do. The application programmer can easily display a screen with the differences between “my” version and the version in the database, and let the user choose the correct data. This is precisely because in CouchDB we think of a document as an actual document, not as a set of relations (tables) linked by foreign keys.
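The following Ruby sketch simulates the behaviour described above in memory: every save must name the revision it was based on, a stale revision is rejected the way CouchDB answers with 409 Conflict, and a small diff helper finds the fields the user would have to choose between. `RevisionStore`, `ConflictError`, `differences` and the integer revisions are hypothetical; real CouchDB revisions look different and are handled over HTTP.

```ruby
# In-memory imitation of CouchDB-style revision checks; not CouchDB's code.
class ConflictError < StandardError; end

class RevisionStore
  def initialize
    @docs = {}
  end

  # Returns a copy, including the "_rev" the caller must send back on save.
  def get(id)
    @docs.fetch(id).dup
  end

  # Saves doc, which must carry the "_rev" it was based on (none for new docs).
  def save(id, doc)
    current = @docs[id]
    if current && current["_rev"] != doc["_rev"]
      # CouchDB would answer with HTTP 409 Conflict here.
      raise ConflictError, "409 Conflict: #{id} changed after revision #{doc['_rev']}"
    end
    new_rev = current ? current["_rev"] + 1 : 1
    @docs[id] = doc.merge("_rev" => new_rev)
  end
end

# Fields whose values differ between "my" version and the stored one.
def differences(mine, theirs)
  (mine.keys | theirs.keys)
    .reject { |k| k == "_rev" || mine[k] == theirs[k] }
    .to_h { |k| [k, { mine: mine[k], theirs: theirs[k] }] }
end

store = RevisionStore.new
store.save("customer", { "name" => "ACME", "phone" => "123 45" })

first  = store.get("customer")   # user 1 opens the record for a long edit
second = store.get("customer")   # user 2 opens it for a quick fix
store.save("customer", second.merge("phone" => "123 456"))   # user 2 saves first

edited = first.merge("name" => "ACME Ltd.", "phone" => "+420 123 45")
begin
  store.save("customer", edited)   # user 1 saves on top of a stale revision
rescue ConflictError => e
  puts e.message
  # Instead of silently overwriting, show user 1 what differs.
  p differences(edited, store.get("customer"))
end
```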

Incidentally, at this point we can also (exceptionally) make a direct comparison with MongoDB’s approach. MongoDB lets you modify and save just part of a document (a so-called partial update), or put one “document” inside another (so-called embedded documents). Depending on your beliefs, knowledge and needs, such a feature may seem like an advantage or a disadvantage. Why? Because you cannot judge such a feature in isolation, in the sense of “ooh, that’s nice!”, but only in the context of the overall architecture and design, factoring in, for example, the use of a non-standard format (BSON), the consequences for replication, and so on. That’s why there’s probably no better demonstration that you don’t understand NoSQL than talking about choosing between CouchDB and MongoDB as choosing between “a blonde and a brunette”. (It also betrays clear tendencies toward male chauvinism, but in the IT world that is not only tolerated, but often outright cherished.)

Miscellaneous

Some questions also came up in the hallways, and I’d like to answer them again, very briefly, here.

First, view definitions are stored in the database just like other documents, in a special kind of document called a design document (_design/<NAME>).

Second, you don’t have to write view definitions in JavaScript. You can also write them in Erlang. No, that’s not a joke :) You can also write them in Ruby or Python, for which experimental so-called view servers exist, but for serious deployments you’ll probably, for now, do what everyone else does and write them in JavaScript (or Erlang).

Third, for further reasoning it pays not to think of views as queries in the world of *SQL databases: queries correspond much better to the full-text queries shown above. Views actually correspond very precisely to indexes in the world of *SQL databases, which becomes obvious when you think about it for a moment. The terms view and index are therefore interchangeable in the CouchDB world.
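To see why a view behaves like an index, here's a Ruby imitation. The design document holds the map function as JavaScript source, following CouchDB's `views`/`map` layout, while a Ruby lambda stands in for actually running it. The "view" is just the emitted key/value rows sorted by key, and querying it is a range scan over those keys, similar in spirit to `startkey`/`endkey`. The documents, the `by_occupation` view name and the `query_range` helper are made up for illustration.

```ruby
# What a design document with one view looks like (stored like any other doc).
DESIGN = {
  "_id" => "_design/person",
  "language" => "javascript",
  "views" => {
    "by_occupation" => {
      "map" => "function(doc) { if (doc.occupation) emit(doc.occupation, doc.name); }"
    }
  }
}

# Ruby stand-in for the JavaScript map function above: returns [key, value] rows.
MAP = ->(doc) { doc["occupation"] ? [[doc["occupation"], doc["name"]]] : [] }

docs = [
  { "_id" => "1", "name" => "Ann",  "occupation" => "supermodel" },
  { "_id" => "2", "name" => "Bob",  "occupation" => "astronaut" },
  { "_id" => "3", "name" => "Cleo" },                              # no occupation: nothing emitted
  { "_id" => "4", "name" => "Dan",  "occupation" => "programmer" }
]

# Building the view: run map over every document and sort the rows by key,
# which is essentially what an index is.
INDEX = docs.flat_map { |doc| MAP.call(doc).map { |k, v| { key: k, value: v, id: doc["_id"] } } }
            .sort_by { |row| row[:key] }

# Querying the view: a range scan over sorted keys (cf. startkey/endkey).
# A real index would binary-search; select keeps the sketch short.
def query_range(index, startkey, endkey)
  index.select { |row| row[:key] >= startkey && row[:key] <= endkey }
end

p query_range(INDEX, "a", "q").map { |row| row[:value] }   # => ["Bob", "Dan"]
```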

If you have any further questions, don’t hesitate to ask them in the discussion below this article, or in the discussion on the WebExpo page.

"For four years we have offered the synchronization service for no charge, predicated on the..."

Thu, 30 Sep 2010 09:55:00 +0200

“For four years we have offered the synchronization service for no charge, predicated on the hypothesis that a business model would emerge to support the free service. With that investment thesis thwarted, there is no way to pay expenses, primarily salary and hosting costs. Without the resources to keep the service going, we must shut it down.”

- End of the Road for Xmarks

"Shirky’s argument is that this is the kind of thing that could never have happened in the..."

Tue, 28 Sep 2010 14:40:41 +0200

“Shirky’s argument is that this is the kind of thing that could never have happened in the pre-Internet age—and he’s right. (…) The story, to Shirky, illustrates “the ease and speed with which a group can be mobilized for the right kind of cause” in the Internet age. Shirky ends the story of the lost Sidekick by asking, portentously, “What happens next?”—no doubt imagining future waves of digital protesters. But he has already answered the question. What happens next is more of the same. A networked, weak-tie world is good at things like helping Wall Streeters get phones back from teen-age girls.”

- Malcolm Gladwell: Small Change. Why the revolution will not be tweeted

"ZUCK: yea so if you ever need info about anyone at harvard ZUCK: just ask ZUCK: i have over 4000..."

Tue, 14 Sep 2010 13:29:11 +0200

“

ZUCK: yea so if you ever need info about anyone at harvard

ZUCK: just ask

ZUCK: i have over 4000 emails, pictures, addresses, sns

FRIEND: what!? how’d you manage that one?

ZUCK: people just submitted it

ZUCK: i don’t know why

ZUCK: they “trust me”

ZUCK: dumb fucks

”

- Jose Antonio Vargas: “The Face of Facebook”, The New Yorker

"I firmly believe that a significant (but certainly not the only) part of Rails and Django’s..."

Sat, 04 Sep 2010 16:29:06 +0200

“I firmly believe that a significant (but certainly not the only) part of Rails and Django’s success comes from the fact that they are frameworks meant to help a team write specific applications. In contrast, ASP.NET is a framework designed with vague (and initially very flawed) ideas of what application development would look like. Rails and Django are built by application developers. ASP.NET is built by framework developers. The difference in the finished products because of that is staggering.”

- How I would fix ASP.NET

"Normally, you’d think something like this is a joke, an impossible dream of turning Rumcajs into Chow Yun-Fat,..."

Fri, 06 Aug 2010 17:48:10 +0200

“Normally, you’d think something like this is a joke, an impossible dream of turning Rumcajs into Chow Yun-Fat, but here it’s taken completely seriously.”

- Kamil Fila RULEZ :)

Rake task to launch multiple Resque workers in development/production with simple management included

Thu, 22 Jul 2010 19:15:53 +0200