<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">
 <channel>
  <title>antirez weblog</title>
  <link>http://antirez.com</link>
  <description>antirez weblog</description>
  <language>it-it</language>
  <atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.feedburner.com/antirez" /><feedburner:info uri="antirez" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><feedburner:emailServiceId>antirez</feedburner:emailServiceId><feedburner:feedburnerHostname>http://feedburner.google.com</feedburner:feedburnerHostname><item>
   <title>Redis Virtual Memory: the story and the code</title>
   <link>http://feedproxy.google.com/~r/antirez/~3/xkJF2JTsU_0/redis-virtual-memory-story.html</link>
   <guid isPermaLink="false">http://antirez.com/post/203</guid>
<description>&lt;div class="blogpost"&gt;&lt;div style="clear:both"&gt;&lt;/div&gt;&lt;div class="blogposttext"&gt;If you are reading this article you probably already know it: &lt;a href="http://code.google.com/p/redis"&gt;Redis&lt;/a&gt; is an in-memory DB. It is persistent, since it is disk-backed, but the disk is only used for persistence: all the data is held in the computer's RAM.
&lt;br/&gt;&lt;br/&gt;

I think the last few months showed that this was not a bad design decision. Redis proved to be very fast in real-world scenarios that need to scale an unhealthy amount of writes, and it supports advanced features like &lt;a href="http://code.google.com/p/redis/wiki/ZaddCommand"&gt;Sorted Sets&lt;/a&gt; and many other complex atomic operations precisely because it is in-memory and single-threaded. In other words, some of the features supported by Redis would be very complex to implement if the data had to be organized on disk for fast access while many concurrent threads accessed it. The Redis design made these two problems a non-issue, at the cost of holding all the data in memory.
&lt;br/&gt;&lt;br/&gt;

I really think that keeping data in memory is the way to go in many real-world scenarios, as eventually your most accessed data must be in memory anyway in order to scale (think of the memcached farms many companies are running right now). But be careful: I said &lt;i&gt;most accessed&lt;/i&gt;. Many datasets have something in common: they are accessed in a long-tail fashion, that is, a small percentage of the dataset gets the majority of the queries (let's call it the &lt;i&gt;hot spot&lt;/i&gt;). Still, from time to time, even data outside the hot spot is requested. With Redis we are forced (well, actually we &lt;i&gt;were&lt;/i&gt; forced) to keep all the data in memory, and this is a huge waste, as most of the time only the hot spot is stressed. So the logical question increasingly became this: &lt;b&gt;is there a way to free the memory used by rarely accessed data&lt;/b&gt;?
&lt;h3&gt;Virtual Memory&lt;/h3&gt;
Virtual Memory is an idea that originated in the operating systems world &lt;b&gt;50 years ago&lt;/b&gt;. It is probably one of the few non-trivial OS ideas that many non-technical people are aware of in some way: the &lt;i&gt;swap file&lt;/i&gt; is a famous object, and most Windows power users more or less understand how it works.
&lt;br/&gt;&lt;br/&gt;

Basically the memory is organized in pages, which are usually 4096 bytes in size. The OS is able to transfer these pages from memory to disk in order to free memory. When an application tries to access an address that maps to a page that was transferred to disk, the processor raises a page fault, and the OS routine in charge of handling it loads that page back into memory, so that the program can continue its execution.
&lt;br/&gt;&lt;br/&gt;
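To make the mechanism concrete, here is a toy model in Ruby. It is only a sketch: a real kernel uses hardware page tables, a page-fault handler and a replacement policy. The model splits an address into a 4096-byte page number and an offset. Any access to a page that is not resident counts as a fault, and the page is reloaded from a hash that stands in for the swap file. ToyPager and its methods are made-up names for illustration.
&lt;pre class="code"&gt;
```ruby
# Toy model of OS demand paging (illustration only, not how a kernel is built).
PAGE_SIZE = 4096

class ToyPager
  attr_reader :faults

  def initialize
    @resident = {}   # page number => page contents (in "RAM")
    @swap = {}       # page number => page contents (in the "swap file")
    @faults = 0
  end

  # Split a virtual address into [page number, offset within the page].
  def split(addr)
    [addr / PAGE_SIZE, addr % PAGE_SIZE]
  end

  # Move a resident page to the swap area to free memory.
  def swap_out(page)
    @swap[page] = @resident.delete(page)
  end

  # Read one byte. If its page is not resident, count a page fault and load
  # the page (or a zero-filled page on first touch) before completing.
  def read(addr)
    page, offset = split(addr)
    unless @resident.key?(page)
      @faults += 1
      @resident[page] = @swap.delete(page) || Array.new(PAGE_SIZE, 0)
    end
    @resident[page][offset]
  end

  def write(addr, byte)
    read(addr)                       # fault the page in if needed
    page, offset = split(addr)
    @resident[page][offset] = byte
  end
end

pager = ToyPager.new
pager.write(10_000, 42)   # address 10000 is page 2, offset 1808
pager.swap_out(2)         # page 2 now lives only in the "swap file"
puts pager.read(10_000)   # faults, reloads page 2, prints 42
puts pager.faults         # prints 2: the first touch plus the reload
```
&lt;/pre&gt;
&lt;br/&gt;&lt;br/&gt;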

OSes do not swap memory only when they are out of memory. They do it even when there is still some free memory, as more free memory can always be used for a very precious thing, the disk cache. This is a win if the pages transferred to the swap file were rarely accessed.
&lt;br/&gt;&lt;br/&gt;

So the question is: &lt;b&gt;why can't Redis just use the OS Virtual Memory&lt;/b&gt; instead of inventing its own VM at the application level? There are two main reasons:
&lt;br/&gt;&lt;br/&gt;

&lt;ul&gt;&lt;li&gt;The OS will only swap out rarely used pages, and a page is 4096 bytes. Redis uses hash tables, object sharing and caching, so a single Redis &amp;quot;value&amp;quot; (like a Redis List or Set) can be physically allocated across many different pages. The reverse is also true: a physical page will likely contain objects belonging to many Redis keys. So even if just 10% of the dataset is actively used, probably all the memory pages are accessed. Most memory pages may contain only a few bytes of our hot, frequently used data, but a single hot byte per page is enough to prevent swapping. Under memory pressure, it instead forces the OS to move pages back and forth between disk and memory.&lt;/li&gt;

&lt;li&gt;Redis objects, both simple and complex, take a lot more space when they are stored in RAM than when they are serialized on disk. On disk there are no pointers and no metadata, and Redis encodes the objects stored on disk pretty well, so an object is often as much as &lt;b&gt;10 times&lt;/b&gt; smaller once serialized. This means that, for the same amount of data, the Redis application-level VM needs to perform about ten times less disk I/O than the OS VM.&lt;/li&gt;

&lt;/ul&gt;
&lt;br/&gt;&lt;br/&gt;
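The first point can be checked with some quick arithmetic. The sketch below makes a deliberate simplification of a real allocator: objects are fixed-size and scattered at random across 4096-byte pages, and whether an object is hot has nothing to do with where it sits. Under that assumption a page stays cold only if every object on it is cold. That happens with probability (1 - hot_ratio) raised to the number of objects per page. The Ruby code computes this closed form and checks it with a small simulation.
&lt;pre class="code"&gt;
```ruby
# Toy model, assuming fixed-size objects placed at random across 4096-byte
# pages and "hotness" independent of placement.
PAGE_SIZE = 4096

# Closed form: a page is cold only if all of its objects are cold.
def expected_touched_fraction(object_size:, hot_ratio:)
  per_page = PAGE_SIZE / object_size
  1 - (1 - hot_ratio)**per_page
end

# Monte Carlo check of the same model.
def touched_page_fraction(pages:, object_size:, hot_ratio:, rng: Random.new(1))
  per_page = PAGE_SIZE / object_size
  touched = 0
  pages.times do
    # rng.rand is uniform in [0, 1), so each object is hot with p = hot_ratio.
    hot = Array.new(per_page) { hot_ratio > rng.rand }
    touched += 1 if hot.any?
  end
  touched.fdiv(pages)
end

[64, 512, 4096].each do |size|
  closed = expected_touched_fraction(object_size: size, hot_ratio: 0.10)
  simulated = touched_page_fraction(pages: 10_000, object_size: size, hot_ratio: 0.10)
  puts format("%4d-byte objects: %.4f expected, %.4f simulated", size, closed, simulated)
end
```
&lt;/pre&gt;
With 64-byte objects and 10% hot data, the closed form gives about 0.9988. In other words, almost every page holds some hot data and cannot be usefully swapped. Only when an object fills a whole page does the touched fraction drop to the 10% we would hope for.
&lt;br/&gt;&lt;br/&gt;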

While the OS VM can't help much here, the idea behind Virtual Memory is very helpful. All I needed to do was to move the concept of Virtual Memory from kernel space to user space.
&lt;h3&gt;Virtual Memory: the Redis way&lt;/h3&gt;
There are many design details involved in implementing Virtual Memory in a key-value store, but the basic concept is pretty straightforward. When we are out of memory, we transfer the values belonging to keys that were not recently used from memory to disk. When a Redis command tries to access a key whose value is swapped out, the value is loaded back into memory.
&lt;br/&gt;&lt;br/&gt;

It's as simple as that, but the description above already contains the first of many design decisions: &lt;b&gt;only values are swapped, not keys&lt;/b&gt;. This is actually the direct result of another, much more important design principle I set at the start: dealing with in-memory keys should be more or less as fast as when VM is disabled.
&lt;br/&gt;&lt;br/&gt;
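Here is a minimal sketch of the idea, under some loud assumptions:
&lt;ul&gt;&lt;li&gt;Memory is measured as a count of resident values. max_resident is a made-up knob, not a Redis option.&lt;/li&gt;
&lt;li&gt;The least recently accessed value is the victim, which simplifies the sampling described later.&lt;/li&gt;
&lt;li&gt;Ruby's Marshal stands in for the .rdb serialization.&lt;/li&gt;
&lt;/ul&gt;
Keys always stay in the in-memory hash and only their values move to the simulated swap area. A swapped value is loaded back, blocking, on access.
&lt;pre class="code"&gt;
```ruby
# Toy application-level VM: keys never leave RAM, only values are swapped.
# Assumptions: "memory" = number of resident values, LRU victim selection,
# and Marshal as a stand-in for the .rdb serialization format.
class ToyVMStore
  Entry = Struct.new(:value, :swapped, :last_access)

  def initialize(max_resident:)
    @max_resident = max_resident
    @dict = {}    # key => Entry (the top-level hash table, always in RAM)
    @swap = {}    # key => serialized value (stand-in for the swap file)
    @clock = 0
  end

  def set(key, value)
    @swap.delete(key)
    @dict[key] = Entry.new(value, false, tick)
    swap_out_until_within_limit
  end

  def get(key)
    entry = @dict[key]
    return nil if entry.nil?
    if entry.swapped
      # Blocking load: the value is deserialized before the command goes on.
      entry.value = Marshal.load(@swap.delete(key))
      entry.swapped = false
    end
    entry.last_access = tick
    value = entry.value
    swap_out_until_within_limit
    value
  end

  def swapped?(key)
    @dict.fetch(key).swapped
  end

  private

  def tick
    @clock += 1
  end

  # Swap out least recently used values until we are back within the limit.
  def swap_out_until_within_limit
    loop do
      resident = @dict.reject { |_k, e| e.swapped }
      break unless resident.size > @max_resident
      key, victim = resident.min_by { |_k, e| e.last_access }
      @swap[key] = Marshal.dump(victim.value)
      victim.value = nil
      victim.swapped = true
    end
  end
end

store = ToyVMStore.new(max_resident: 2)
store.set("a", [1, 2, 3])
store.set("b", "hello")
store.set("c", { "x" => 1 })   # "a" is swapped out; its key stays in RAM
p store.swapped?("a")           # true
p store.get("a")                # loaded back: [1, 2, 3]
```
&lt;/pre&gt;
&lt;br/&gt;&lt;br/&gt;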

What this means is that you need enough RAM to hold at least all the key objects. That is the bad news. The good news is that Redis will be mostly as fast as you know it is when accessing in-memory keys. So if your dataset has the famous &amp;quot;long tail&amp;quot; access pattern and your hot spot fits in the available RAM, Redis will be as fast as it is with VM disabled.
&lt;br/&gt;&lt;br/&gt;

OK, it's time to show some numbers, so you can start doing the math about the real-world impact of Redis VM: when it is practical, and when it still needs too much memory.
&lt;pre class="code"&gt;
VM off: 300k keys, 4096 bytes values: 1.3G used
VM on:  300k keys, 4096 bytes values: 73M used
VM off: 1 million keys, 256 bytes values: 430.12M used
VM on:  1 million keys, 256 bytes values: 160.09M used
VM on:  1 million keys, values as large as you want, still: 160.09M used
&lt;/pre&gt;
Guess what? With VM on (and Redis VM configured to use as little memory as possible), it does not matter how big the values are: 1 million keys will always use 160 MB. You can store huge lists or sets inside, or tiny string values. Every value will be swapped out, but the keys and the top-level hash table will still use RAM. So will the &amp;quot;page table bitmap&amp;quot;, a bit array held in Redis memory that records which pages of the swap file are used and which are free.
&lt;br/&gt;&lt;br/&gt;

So a very important question is: when VM is enabled, how much memory do we use for every additional million keys? More or less 160 MB per million keys, so at minimum you need:
&lt;pre class="code"&gt;
1M keys: 160 MB
10M keys: 1.6 GB
100M keys: 16 GB
&lt;/pre&gt;
If you have 16 GB of RAM you can store 100M keys, and every key can hold values as large and complex as you want (Lists, Sets, JSON-encoded objects, and so forth) without changing the memory requirements.
&lt;br/&gt;&lt;br/&gt;
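The arithmetic above fits in a small estimator. It treats the roughly 160 MB per million keys measured above as a fixed per-key cost, then adds the page bitmap at one bit per swap-file page. The per-key cost is an assumption: in practice it depends on key length and on the Redis build. The swap page count used in the example, 2^27 pages, is purely hypothetical.
&lt;pre class="code"&gt;
```ruby
# Back-of-the-envelope RAM estimate with VM on and every value swapped out.
# Assumptions: a flat ~160 MB per million keys (the figure measured above)
# and a page bitmap of 1 bit per swap-file page.
MB = 1024 * 1024

def keys_overhead_bytes(keys)
  keys * 160 * MB / 1_000_000          # integer math: 160 MB per 1M keys
end

def page_bitmap_bytes(swap_pages)
  (swap_pages + 7) / 8                 # 1 bit per page, rounded up to bytes
end

def min_ram_bytes(keys:, swap_pages:)
  keys_overhead_bytes(keys) + page_bitmap_bytes(swap_pages)
end

swap_pages = 2**27                     # hypothetical swap file size in pages
[1_000_000, 10_000_000, 100_000_000].each do |keys|
  total = min_ram_bytes(keys: keys, swap_pages: swap_pages)
  puts format("%d keys: %.1f MB", keys, total.fdiv(MB))
end
```
&lt;/pre&gt;
With this hypothetical swap file, the bitmap adds a constant 16 MB. As the key count grows, the per-key cost is what dominates.
&lt;br/&gt;&lt;br/&gt;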

Consider this: even with MySQL it is not trivial to run a database with 100 million rows on less than 16 GB of RAM, and with Redis all the top-level keys stay in memory, which is a big speed gain.
&lt;h3&gt;A reversed memcached&lt;/h3&gt;
When I started working on Redis one year ago, I often compared it to memcached in order to tell people what Redis was about: &amp;quot;it's like memcached, but persistent and with more ops&amp;quot;.
&lt;br/&gt;&lt;br/&gt;

This description was not wrong from a pragmatic point of view, but in some philosophical sense Redis with VM is the &lt;i&gt;exact opposite&lt;/i&gt; of MySQL + memcached.
&lt;br/&gt;&lt;br/&gt;

Using memcached to cache SQL queries is a well-established pattern. My SQL DB is slow, so I write an application layer that keeps the frequently accessed data in memcached (handling invalidation by hand), and I query this faster cache instead of the DB. The idea is to keep the data on disk, but to cache the hot spot in memory for fast access.
&lt;br/&gt;&lt;br/&gt;

Redis + VM is exactly the reverse. You keep your data in memory, but what is &lt;i&gt;not&lt;/i&gt; in the hot spot is moved to disk in order to free memory for more interesting data. In both models the frequently accessed data stays in memory, but the process is reversed, with the following benefits:
&lt;br/&gt;&lt;br/&gt;

&lt;ul&gt;&lt;li&gt;There is no invalidation to do. There is only one system to interact with, Redis, and the data is not duplicated in two places as it is with MySQL + memcached.&lt;/li&gt;

&lt;li&gt;This model can scale writes as well as reads, while MySQL + memcached mainly scales read queries.&lt;/li&gt;

&lt;li&gt;Once you write the memcached layer in your application, you discover that, after all, you are trying to access more and more data by unique key, or something like it. Parametrized data is not handy to cache if the parameter space is large enough, and invalidation is crazy. Even caching a simple pagination query can be hard, let alone more complex queries. So most of the benefits of SQL are lost in some way: you are silently turning your application into a key-value business! But memcached can't offer the higher-level operations Redis offers. To return to the pagination example, &lt;a href="http://code.google.com/p/redis/wiki/LrangeCommand"&gt;LRANGE&lt;/a&gt; and &lt;a href="http://code.google.com/p/redis/wiki/ZrangeCommand"&gt;ZRANGE&lt;/a&gt; are your friends.&lt;/li&gt;

&lt;/ul&gt;
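To see why invalidation gets painful, here is a sketch of the MySQL + memcached pattern applied to the pagination example. Two Ruby hashes stand in for the database and the cache, and the class and key names are invented for illustration. Every cached page is keyed by its parameters, so each write has to find and delete every cached page it might affect.
&lt;pre class="code"&gt;
```ruby
# Cache-aside sketch: @db is the authoritative store, @cache duplicates hot
# results. Both are plain hashes standing in for MySQL and memcached.
class CommentsCacheAside
  attr_reader :db_reads

  def initialize
    @db = Hash.new { |h, k| h[k] = [] }   # post id => list of comments
    @cache = {}                           # "comments:post:page:per_page" => page
    @db_reads = 0
  end

  def comments_page(post, page_no, per_page)
    key = "comments:#{post}:#{page_no}:#{per_page}"
    return @cache[key] if @cache.key?(key)
    @db_reads += 1                        # cache miss: run the "SQL query"
    @cache[key] = @db[post].slice(page_no * per_page, per_page) || []
  end

  def add_comment(post, text)
    @db[post].push(text)
    # Hand-written invalidation: any cached page of this post may be stale.
    prefix = "comments:#{post}:"
    @cache.delete_if { |k, _v| k.start_with?(prefix) }
  end
end

c = CommentsCacheAside.new
%w[a b c].each { |t| c.add_comment(1, t) }
p c.comments_page(1, 1, 2)   # ["c"], read from the "DB" and cached
c.add_comment(1, "d")
p c.comments_page(1, 1, 2)   # ["c", "d"]; without delete_if it would be stale
```
&lt;/pre&gt;
With Redis the list lives in one place and a page is just an LRANGE over it, so there is nothing to invalidate.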
&lt;h3&gt;The code&lt;/h3&gt;
I implemented VM in two stages. Given that Redis is single-threaded, the first logical step was a blocking implementation. When we are out of memory, it swaps values out while all the other clients are blocked. It swaps only as many objects as needed to get back within the memory limit, so in practice the server does not appear to block. The blocking implementation also loads values synchronously when a client accesses a swapped-out key (or rather, a key associated with a swapped-out value).
&lt;br/&gt;&lt;br/&gt;

This implementation took very little time, as I reused the functions that serialize and unserialize Redis objects in the .rdb files Redis uses to persist data on disk. A few more details:
&lt;br/&gt;&lt;br/&gt;

&lt;ul&gt;&lt;li&gt;The swap file is divided into pages, and the page size can be configured.&lt;/li&gt;

&lt;li&gt;The page allocation table is kept in memory. It's a bitmap, so every page takes 1 bit of actual RAM.&lt;/li&gt;

&lt;li&gt;When VM is enabled, Redis objects are allocated with a few extra fields, one of which records the last time the object was accessed. When Redis is out of memory and something must be swapped, we sample a few random objects from the dataset and transfer the one with the highest &lt;i&gt;swappability&lt;/i&gt; to disk. The &lt;i&gt;swappability&lt;/i&gt; is currently computed using the formula Object.age*Logarithm(Object.used_memory).&lt;/li&gt;

&lt;li&gt;The page allocation algorithm is one I found while reading the source code of the Linux VM subsystem. Basically we try to allocate pages sequentially up to a given limit; when this limit is reached we start again from page 0. This tries to improve locality. I added another trick: if I can't find free pages for a while, I start to &lt;i&gt;fast forward&lt;/i&gt; with random jumps.&lt;/li&gt;

&lt;li&gt;When Redis fork()s in order to save the dataset on disk (Redis relies on copy-on-write semantics to take the snapshot of the DB), VM is suspended. Only loads are allowed and writes to the swap file are blocked, so the child can access the swap file without trouble. The same happens when the &lt;a href="http://code.google.com/p/redis/wiki/AppendOnlyFileHowto"&gt;Append Only File&lt;/a&gt; is enabled and you issue a &lt;a href="http://code.google.com/p/redis/wiki/BgrewriteaofCommand"&gt;BGREWRITEAOF&lt;/a&gt; command.&lt;/li&gt;

&lt;/ul&gt;
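The bitmap and page allocation points above can be sketched together. This is a hedged reconstruction from the description, not the Redis source. It uses a bitmap with one bit per swap-file page and a cursor that allocates sequentially and wraps back to page 0. After too many misses it fast-forwards with a random jump. MAX_MISSES, the class name and the search for a contiguous run of free pages are assumptions of this sketch.
&lt;pre class="code"&gt;
```ruby
# Sketch of a swap-file page allocator (MAX_MISSES, the class name and the
# contiguous-run search are illustrative assumptions, not Redis's code).
class SwapPageAllocator
  MAX_MISSES = 16  # hypothetical: misses tolerated before a random jump

  def initialize(pages, rng: Random.new(42))
    @pages = pages
    @bitmap = Array.new((pages + 7) / 8, 0)  # 1 bit of RAM per swap page
    @next = 0                                # where the next search starts
    @rng = rng
  end

  def used?(page)
    (@bitmap[page / 8] >> (page % 8)).odd?
  end

  # Returns the first of n contiguous free pages, or nil if the bounded
  # search fails (swap file full or too fragmented).
  def allocate(n)
    return nil if n > @pages
    misses = 0
    examined = 0
    while @pages > examined
      start = @next + n > @pages ? 0 : @next   # past the end: wrap to page 0
      run = (start...start + n).take_while { |p| !used?(p) }.size
      if run == n
        (start...start + n).each { |p| set(p, true) }
        @next = start + n                      # keep allocating sequentially
        return start
      end
      examined += run + 1
      misses += 1
      if misses > MAX_MISSES
        @next = @rng.rand(@pages)              # fast-forward with a random jump
        misses = 0
      else
        @next = start + run + 1                # continue after the used page
      end
    end
    nil
  end

  def free(start, n)
    (start...start + n).each { |p| set(p, false) }
  end

  private

  # Flip the page's bit with XOR, only when it differs from the wanted state.
  def set(page, used)
    @bitmap[page / 8] ^= 2**(page % 8) if used?(page) != used
  end
end

alloc = SwapPageAllocator.new(1024)
p alloc.allocate(3)   # 0
p alloc.allocate(2)   # 3
```
&lt;/pre&gt;
&lt;br/&gt;&lt;br/&gt;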
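Victim selection can be sketched in the same spirit. The snippet samples a few random resident objects and swaps out the one with the highest swappability, computed with the formula from the list above (age times the logarithm of the used memory). It repeats until resident memory is back within the limit, which mirrors swapping just as many objects as needed. The Obj struct, the sample size of 3 and the byte accounting are hypothetical. used_memory is assumed to be at least 1 so the logarithm is not negative.
&lt;pre class="code"&gt;
```ruby
# Sketch of sampling-based victim selection. Obj, the sample size and the
# byte accounting are hypothetical; used_memory is assumed to be at least 1.
Obj = Struct.new(:key, :last_access, :used_memory, :swapped)

# swappability = age * log(used_memory)
def swappability(obj, now)
  (now - obj.last_access) * Math.log(obj.used_memory)
end

def pick_victim(objects, now, samples: 3, rng: Random.new)
  resident = objects.reject { |o| o.swapped }
  resident.sample(samples, random: rng).max_by { |o| swappability(o, now) }
end

# Swap out just enough objects to get back within the memory limit.
def swap_until_within(objects, limit, now, rng: Random.new)
  loop do
    used = objects.reject { |o| o.swapped }.sum { |o| o.used_memory }
    break if limit >= used
    victim = pick_victim(objects, now, rng: rng)
    break if victim.nil?
    victim.swapped = true   # a real VM would serialize it to the swap file
  end
end

objects = [
  Obj.new("old_big", 0, 1000, false),
  Obj.new("new_big", 99, 1000, false),
  Obj.new("old_small", 0, 10, false),
]
swap_until_within(objects, 1100, 100)
p objects.select { |o| o.swapped }.map { |o| o.key }   # ["old_big"]
```
&lt;/pre&gt;
Sampling keeps the cost of each decision constant, at the price of sometimes missing the globally best victim.
&lt;br/&gt;&lt;br/&gt;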
&lt;h3&gt;I/O threads&lt;/h3&gt;
The blocking implementation worked very well, but in the real world there are applications where it is not good at all. It's perfect if you are using Redis with a few clients to perform batch computations, but what about web applications with N clients? Making every other client wait while one client's data is loaded from disk is hardly acceptable.
&lt;br/&gt;&lt;br/&gt;

Redis is a single-threaded multiplexing server, so a possible solution was to use non-blocking disk I/O. I didn't like this solution much, for one reason: it's not just a matter of I/O. Serializing and unserializing Redis objects to and from their disk representation is also a slow, CPU-intensive operation for lists or sets composed of many elements. The last resort was what everybody tries to avoid (and for good reasons!): multi-threaded programming.
&lt;br/&gt;&lt;br/&gt;

There are two obvious ways to do this: serve every client with a different thread, or make just the VM I/O threaded. I picked the second for two reasons. First, it keeps the implementation simpler and &lt;b&gt;self-contained&lt;/b&gt;, with no synchronization problems at all outside the VM subsystem. Second, it retains the raw speed of the single-threaded implementation when accessing values that are not swapped out.
&lt;br/&gt;&lt;br/&gt;

So in the final design the main thread communicates with a configurable number of I/O threads through a queue of I/O jobs. When a value must be swapped out, an I/O job to swap it is put in the queue. When a value must be loaded because a client is requesting it, the client is suspended and an I/O job to load the value back into memory is added to the queue. When all the keys needed by a given client are loaded, the client is &amp;quot;resumed&amp;quot;.
&lt;br/&gt;&lt;br/&gt;

Basically the main thread puts I/O jobs in the io_newjobs queue. After these jobs are processed, the I/O threads put them (filled with additional data) in the io_processed queue. The main thread then post-processes them, changing the status of the keys from swapped to in-memory or vice versa, and so forth.
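&lt;br/&gt;&lt;br/&gt;
Here is the shape of that design as a Ruby sketch, using the thread-safe Queue class from the standard library. The queue names follow the text. The IOJob struct, the two worker threads, a hash used as the swap file and Marshal as the serialization format are all assumptions. A real event-driven server would be notified when jobs complete, instead of blocking on pop as this toy main thread does.
&lt;pre class="code"&gt;
```ruby
# Sketch of the main thread / I/O threads split. The names io_newjobs and
# io_processed follow the article; everything else is an assumption.
IOJob = Struct.new(:type, :key, :value)

io_newjobs   = Queue.new   # main thread -> I/O threads
io_processed = Queue.new   # I/O threads -> main thread
swap_file    = {}          # key => serialized value (stand-in for the disk)
swap_lock    = Mutex.new   # only the I/O threads touch swap_file

io_threads = Array.new(2) do
  Thread.new do
    # A nil job is used here as a shutdown signal.
    while (job = io_newjobs.pop)
      if job.type == :swap
        blob = Marshal.dump(job.value)        # CPU-heavy, off the main thread
        swap_lock.synchronize { swap_file[job.key] = blob }
        job.value = nil
      else # :load
        blob = swap_lock.synchronize { swap_file.fetch(job.key) }
        job.value = Marshal.load(blob)
      end
      io_processed.push(job)                  # hand the filled-in job back
    end
  end
end

# Main thread: schedule work, then post-process completed jobs.
dataset = { "list" => (1..5).to_a, "set" => %w[x y z] }
dataset.each { |key, value| io_newjobs.push(IOJob.new(:swap, key, value)) }
dataset.size.times do
  job = io_processed.pop
  dataset[job.key] = :swapped                 # key stays, value is on "disk"
end

io_newjobs.push(IOJob.new(:load, "list"))
job = io_processed.pop
dataset[job.key] = job.value                  # key is back in memory

io_threads.size.times { io_newjobs.push(nil) }
io_threads.each { |t| t.join }
p dataset
```
&lt;/pre&gt;
Only serialization and "disk" access happen in the I/O threads. The dataset itself is touched solely by the main thread, which is what keeps the synchronization self-contained.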
&lt;h3&gt;Our main trick&lt;/h3&gt;
Resuming a client that is in the middle of a command execution is hard, but there was a simple solution, a probabilistic one.
&lt;br/&gt;&lt;br/&gt;

When a client issues a command like &lt;b&gt;GET mykey&lt;/b&gt;, we scan the arguments looking for swapped keys. If there is at least one swapped key, the client is suspended &lt;i&gt;before&lt;/i&gt; the command is executed at all. Once the keys are back in memory, the client is resumed.
&lt;br/&gt;&lt;br/&gt;

This trick reduces the complexity a lot, but it is just probabilistic. What if, once we resume a client, a key is swapped out again because we are in hard out-of-memory conditions? What about the &amp;quot;SORT BY&amp;quot; command, which accesses keys we can't guess beforehand? Well, that's simple: if a given key is still swapped out for some reason, Redis falls back to the blocking implementation.
&lt;br/&gt;&lt;br/&gt;

&lt;b&gt;The non-blocking VM is a blocking VM plus the trick of loading the keys in I/O threads, thanks to static command analysis&lt;/b&gt;. As simple as that, and it works very well for all commands except &lt;b&gt;SORT BY&lt;/b&gt;, which is a slow operation anyway.
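&lt;br/&gt;&lt;br/&gt;
A sketch of the suspend-before-execute idea follows. It uses a hypothetical per-command table that says which arguments are keys. When a command arrives, any swapped keys are queued for loading and the client is parked. The client becomes ready once every one of its keys has been loaded. Commands missing from the table are not analyzed, so they run immediately and any swapped key they touch would be loaded the blocking way, as the text describes for SORT BY. The table contents and the VMScheduler class are illustrative assumptions.
&lt;pre class="code"&gt;
```ruby
require "set"

# Hypothetical table: which arguments of each command are keys.
NO_KEYS = ->(_argv) { [] }
COMMAND_KEYS = {
  "GET"    => ->(argv) { [argv[1]] },
  "SET"    => ->(argv) { [argv[1]] },
  "MGET"   => ->(argv) { argv.drop(1) },
  "SINTER" => ->(argv) { argv.drop(1) },
}.freeze

class VMScheduler
  attr_reader :load_queue, :ready

  def initialize(swapped_keys)
    @swapped = swapped_keys.to_set   # keys whose values are on disk
    @waiting = {}                    # suspended client => keys still missing
    @load_queue = []                 # load jobs for the I/O threads
    @ready = []                      # clients that can now run their command
  end

  # Returns true if the command can run now, false if the client is suspended.
  def dispatch(client, argv)
    keys = COMMAND_KEYS.fetch(argv[0].upcase, NO_KEYS).call(argv)
    missing = keys.select { |k| @swapped.include?(k) }.uniq
    return true if missing.empty?
    @waiting[client] = missing
    missing.each { |k| @load_queue.push(k) unless @load_queue.include?(k) }
    false
  end

  # Called by the main thread when an I/O thread has loaded a key.
  def key_loaded(key)
    @swapped.delete(key)
    @waiting.transform_values! { |keys| keys - [key] }
    @waiting.select { |_c, keys| keys.empty? }.each_key do |client|
      @waiting.delete(client)
      @ready.push(client)            # resume: run the command from scratch
    end
  end
end

s = VMScheduler.new(["a", "b"])
p s.dispatch(:client1, %w[GET x])        # true: nothing swapped
p s.dispatch(:client2, %w[MGET a b x])   # false: suspended
s.key_loaded("a")
s.key_loaded("b")
p s.ready                                # [:client2]
```
&lt;/pre&gt;
Because the command is only executed after all of its keys are in memory, nothing has to be rolled back when a client is suspended.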
&lt;h3&gt;Still too complex&lt;/h3&gt;
The actual implementation is much more complex than that, as you can guess. What happens if a value is being swapped out by an I/O thread while a client is accessing it? And so forth. I had to design the system so that I/O operations can be invalidated at any time, and this was tricky.
&lt;br/&gt;&lt;br/&gt;

After the VM, I lost my feeling that Redis was trivial for the casual coder to grasp just by reading the source code. It's now 13k lines of code and there are many things to understand. Some functions are only a few lines long, yet carry a lot of comments just to explain what's going on. Just an example, from the function in charge of job invalidation:
&lt;pre class="code"&gt;
                switch(i) {
                case 0: /* io_newjobs */
                    /* If the job was yet not processed the best thing to do
                     * is to remove it from the queue at all */
                    freeIOJob(job);
                    listDelNode(lists[i],ln);
                    break;
                case 1: /* io_processing */
                    /* Oh Shi- the thread is messing with the Job:
                     *
                     * Probably it's accessing the object if this is a
                     * PREPARE_SWAP or DO_SWAP job.
                     * If it's a LOAD job it may be reading from disk and
                     * if we don't wait for the job to terminate before to
                     * cancel it, maybe in a few microseconds data can be
                     * corrupted in this pages. So the short story is:
                     *
                     * Better to wait for the job to move into the
                     * next queue (processed)... */

                    /* We try again and again until the job is completed. */
                    unlockThreadedIO();
                    /* But let's wait some time for the I/O thread
                     * to finish with this job. After all this condition
                     * should be very rare. */
                    usleep(1);
                    goto again;
                case 2: /* io_processed */
                    /* The job was already processed, that's easy...
                     * just mark it as canceled so that we'll ignore it
                     * when processing completed jobs. */
                    job-&amp;gt;canceled = 1;
                    break;
                }
&lt;/pre&gt;
&lt;br/&gt;&lt;br/&gt;

(Nazi Grammar Is Not Happy, I know). The complexity is self-contained, but there are still a number of non-trivial issues an external programmer must understand in order to hack on the VM.
&lt;br/&gt;&lt;br/&gt;
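The three cases in the excerpt above can be modeled in a few lines of Ruby. This is a hedged sketch, not a port of the Redis function. The job class, the three arrays guarded by one mutex and the 1 ms retry interval are assumptions. A job still in io_newjobs is dropped. A job in io_processed is flagged as canceled so that its result is ignored. A job in io_processing is waited for by releasing the lock and trying again.
&lt;pre class="code"&gt;
```ruby
# Hedged model of cancelling an I/O job that may be in any of three queues.
class IOJob
  attr_reader :key
  attr_accessor :canceled

  def initialize(key)
    @key = key
    @canceled = false
  end
end

class IOJobQueues
  def initialize
    @lock = Mutex.new
    @newjobs = []      # io_newjobs
    @processing = []   # io_processing
    @processed = []    # io_processed
  end

  def enqueue(job)
    @lock.synchronize { @newjobs.push(job) }
  end

  # Worker side: take a job, do the slow part without holding the lock,
  # then move it to the processed queue.
  def work_one
    job = @lock.synchronize do
      j = @newjobs.shift
      @processing.push(j) if j
      j
    end
    return nil if job.nil?
    yield job if block_given?          # disk I/O and (un)serialization go here
    @lock.synchronize do
      @processing.delete(job)
      @processed.push(job)
    end
    job
  end

  def cancel(key)
    loop do
      @lock.synchronize do
        job = @newjobs.find { |j| j.key == key }
        if job                                   # not started: drop it
          @newjobs.delete(job)
          return :removed
        end
        job = @processed.find { |j| j.key == key }
        if job                                   # already done: ignore result
          job.canceled = true
          return :marked_canceled
        end
        return :not_found if @processing.none? { |j| j.key == key }
      end
      sleep 0.001   # in progress: release the lock, wait a bit, try again
    end
  end
end

q = IOJobQueues.new
q.enqueue(IOJob.new("k1"))
p q.cancel("k1")   # :removed
```
&lt;/pre&gt;
Returning from inside Mutex#synchronize releases the lock, so every exit path leaves the queues unlocked.
&lt;br/&gt;&lt;br/&gt;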

Fortunately the VM needs very little maintenance work, because it reuses the serialization format Redis already uses to persist data on disk. That trick completely decoupled it from the other Redis subsystems. Want to implement a new type for Redis? Just write the commands that work with the new type and the functions to load/save it in the .rdb file, and you are done. The VM will do the rest without your help.
&lt;br/&gt;&lt;br/&gt;
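The decoupling can be illustrated with a tiny codec registry. In this sketch each type registers a save/load pair for the persistence format, and the VM swaps values by calling that same pair without ever looking inside them. The registry, the method names and the use of JSON in place of a real binary encoding are all hypothetical.
&lt;pre class="code"&gt;
```ruby
require "json"

# Hypothetical codec registry: each type registers how it is saved to and
# loaded from the persistence format; the VM reuses the same pair.
RDB_CODECS = {}

def register_type(name, save:, load:)
  RDB_CODECS[name] = { save: save, load: load }
end

# The VM never inspects values: swapping out is "save", swapping in is "load".
def vm_swap_out(type, value)
  RDB_CODECS.fetch(type)[:save].call(value)
end

def vm_swap_in(type, blob)
  RDB_CODECS.fetch(type)[:load].call(blob)
end

# A new type only needs its commands plus a save/load pair; JSON is used
# here purely as a stand-in for a real binary encoding.
register_type(:string, save: ->(s) { s.dup }, load: ->(b) { b.dup })
register_type(:sorted_set,
              save: ->(zset) { JSON.generate(zset) },
              load: ->(blob) { JSON.parse(blob) })

blob = vm_swap_out(:sorted_set, { "alice" => 1.5, "bob" => 3.0 })
p vm_swap_in(:sorted_set, blob)
```
&lt;/pre&gt;
&lt;br/&gt;&lt;br/&gt;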

OK, this article is already too long. I hope that Redis 2.0.0 will be released as stable code in two or three months at most. The VM needs a few more weeks of work and testing, but it is already working well. If you think you'll soon run out of memory without it, I encourage you to give it a try in a development environment ;)
&lt;br/&gt;&lt;br/&gt;

&lt;div class="emph"&gt;
You can discuss this article on &lt;a href="http://news.ycombinator.com/item?id=1097545"&gt;Hacker News&lt;/a&gt; and &lt;a href="http://www.reddit.com/r/programming/comments/axivr/redis_virtual_memory_the_story_and_the_code/"&gt;Programming Reddit&lt;/a&gt;.
&lt;/div&gt;&lt;/div&gt;&lt;div class="blogpostinfo"&gt;Posted at 11:47:15 &lt;a href="http://antirez.com/post/redis-virtual-memory-story.html"&gt;permalink&lt;/a&gt; | &lt;a href="http://antirez.com/post/redis-virtual-memory-story.html"&gt;33 comments&lt;/a&gt;&lt;/div&gt;&lt;/div&gt;</description>
   <dc:date>2010-02-03T11:47:15+00:00</dc:date>
  <feedburner:origLink>http://antirez.com/post/redis-virtual-memory-story.html</feedburner:origLink></item>
  <item>
   <title>Pseudo Random Number Generator with power law (long tail-alike) distribution</title>
   <link>http://feedproxy.google.com/~r/antirez/~3/EasF_AglfR0/PRNG-power-law-long-tail.html</link>
   <guid isPermaLink="false">http://antirez.com/post/202</guid>
<description>&lt;div class="blogpost"&gt;&lt;div style="clear:both"&gt;&lt;/div&gt;&lt;div class="blogposttext"&gt;Today I needed a PRNG with a biased distribution. Specifically, I needed one following the &lt;a href="http://en.wikipedia.org/wiki/Power_law"&gt;Power Law&lt;/a&gt; distribution, to simulate the &amp;quot;long tail&amp;quot; often seen in things like social network data, search, and so forth.
&lt;br/&gt;&lt;br/&gt;

I was not able to find ready-to-use code, just a few formulas, so I wrote it in Ruby (I'm going to port it to C for Redis).
&lt;br/&gt;&lt;br/&gt;
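The pl line in the code below is inverse-transform sampling. For a short derivation, write a for min and b for max + 1, which is the value max holds after the increment. Take a density proportional to x^n on [a, b] and invert its cumulative distribution F, then feed it a uniform random number u. The last line of the function mirrors the truncated sample inside the original range, so that the smallest values become the most frequent ones.
&lt;pre class="code"&gt;
```latex
f(x) = \frac{(n+1)\,x^{n}}{b^{n+1} - a^{n+1}}, \qquad a \le x \le b

F(x) = \int_a^x f(t)\,dt = \frac{x^{n+1} - a^{n+1}}{b^{n+1} - a^{n+1}}

F(x) = u,\quad u \sim \mathcal{U}[0,1)
\;\Longrightarrow\;
x = \left( \left(b^{n+1} - a^{n+1}\right) u + a^{n+1} \right)^{1/(n+1)}
```
&lt;/pre&gt;
&lt;br/&gt;&lt;br/&gt;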

Maybe this will be useful to somebody else:
&lt;br/&gt;&lt;br/&gt;

&lt;pre class="code"&gt;
# Power law (log tail) distribution
# Copyright(C) 2010 Salvatore Sanfilippo
# this code is under the public domain&lt;br /&gt;&lt;br /&gt;# min and max are both inclusive
# n is the distribution power: the higher, the more biased
def powerlaw(min,max,n)
    max += 1
    pl = ((max**(n+1) - min**(n+1))*rand() + min**(n+1))**(1.0/(n+1))
    (max-1-pl.to_i)+min
end&lt;br /&gt;&lt;br /&gt;freq = {}
100000.times {
    n = powerlaw(0,100,2).to_i
    freq[n] = 0 if !freq[n]
    freq[n] += 1
}
(0..100).each{|x|
    puts &amp;quot;#{x} =&amp;gt; #{freq[x]}&amp;quot;
}
&lt;/pre&gt;&lt;/div&gt;&lt;div class="blogpostinfo"&gt;Posted at 22:02:24 &lt;a href="http://antirez.com/post/PRNG-power-law-long-tail.html"&gt;permalink&lt;/a&gt; | &lt;a href="http://antirez.com/post/PRNG-power-law-long-tail.html"&gt;discuss&lt;/a&gt;&lt;/div&gt;&lt;/div&gt;</description>
   <dc:date>2010-01-19T22:02:24+00:00</dc:date>
  <feedburner:origLink>http://antirez.com/post/PRNG-power-law-long-tail.html</feedburner:origLink></item>
  <item>
   <title>One year of Redis</title>
   <link>http://feedproxy.google.com/~r/antirez/~3/eeREnltN0FI/one-year-of-redis.html</link>
   <guid isPermaLink="false">http://antirez.com/post/201</guid>
   <description>&lt;div class="blogpost"&gt;&lt;div style="clear:both"&gt;&lt;/div&gt;&lt;div class="blogposttext"&gt;The &lt;a href="http://code.google.com/p/redis/downloads/detail?name=redis-beta-1.tar.gz&amp;amp;can=1&amp;amp;q=#makechanges"&gt;first version&lt;/a&gt; of &lt;a href="http://code.google.com/p/redis"&gt;Redis&lt;/a&gt; was released to the public almost one year ago (10 months ago actually). This seems like a good time to look back at the development process.
&lt;br/&gt;&lt;br/&gt;

Redis can't be considered a successful project yet, as it's just too early. Still, here are a few things I learned in the last months that I'd like to share. I hope I'll be able to apply these ideas in the future, and that programmers interested in creating a new open source project will find something useful here and possibly avoid some mistakes.
&lt;h2&gt;Develop something that you think you'll use for years&lt;/h2&gt;
Redis is not my first open source project. My main past projects are &lt;a href="http://hping.org"&gt;hping&lt;/a&gt; and &lt;a href="http://jim.berlios.de"&gt;the Jim interpreter&lt;/a&gt;. Still, Redis is the first project I'm sure I'll develop for years, assuming it is successful, because this time I was wise enough to select something that I want to use myself for years to come.
&lt;br/&gt;&lt;br/&gt;

I stopped developing hping when I quit security, and I stopped developing Jim when I quit Tcl. I'm confident that I'll not quit databases, as I have been a MySQL user for more or less ten years. When you know you'll need what you are building for years to come, there is a profound shift in vision. You know your efforts are not wasted even if no one else wants to use your code, and that your users are unlikely to be left alone in a few months.
&lt;br/&gt;&lt;br/&gt;

So make a careful choice when starting a new open source project: don't pick something you are interested in &lt;i&gt;today&lt;/i&gt;, but something you'll probably be interested in for the next decade.
&lt;h2&gt;Early adopters&lt;/h2&gt;
Early adopters are vital for your project: the masses will not use what you build as long as there isn't a solid initial user base. This sounds like a chicken-and-egg problem, but it actually is not. There are smart guys who will be brave enough to use what you build if it's worthwhile.
&lt;br/&gt;&lt;br/&gt;

Your early adopters are not brave because they are irresponsible. They are simply able to evaluate something without needing to follow the crowd. So where should you search for your early adopters? Among the smartest guys around. Once you have something complete enough to give an initial feeling, posting your code on &lt;a href="http://news.ycombinator.com"&gt;Hacker News&lt;/a&gt; is a good idea.
&lt;br/&gt;&lt;br/&gt;

A wonderful side effect of all this is that, at least in the initial stages, you'll have a terrific community. I love the people around Redis; on average they are incredibly smart and interesting. I enjoy providing help via the Google group and trying to fix bugs in very little time, because with such a community it's worth the effort.
&lt;h2&gt;Simplicity matters&lt;/h2&gt;
Users don't like to read zillions of pages of documentation just to get started with your new open source project. They don't like compilation errors, nor complex ideas or protocols.
&lt;br/&gt;&lt;br/&gt;

Your project should be trivial to run, and your documentation should include, &lt;i&gt;on the first page&lt;/i&gt;, instructions for trying a Hello World usage in a few trivial steps. Once users have a working hello world they'll be willing to learn more and read the documentation, but most of the time not before.
&lt;br/&gt;&lt;br/&gt;

If the libraries you use are not available through Debian/Ubuntu apt-get and/or the Mac OS X package systems, it's better to ship them inside your source tree. Your users should not need more than five minutes to go from the download to a working hello world example.
&lt;br/&gt;&lt;br/&gt;

Redis is one of the rare NoSQL databases that compile almost everywhere with just &lt;i&gt;make&lt;/i&gt; and run with default settings and no configuration with just &lt;i&gt;./redis-server&lt;/i&gt;. It also uses a protocol simple enough to understand and implement in minutes, so many client libraries were available from the first weeks. I suspect all this is playing a very important role in the relatively good adoption Redis is experiencing, considering how young it is.
&lt;br/&gt;&lt;br/&gt;

Simplicity also matters in the &lt;i&gt;concepts&lt;/i&gt; your users are required to understand to get started. Redis is trivial, but I'm always surprised by the number of people who don't get it. I don't even want to think about how hard it is for the average user to understand a more complex NoSQL database.
&lt;br/&gt;&lt;br/&gt;

Of course there are also people who told me they don't like Redis because it's too simple to be powerful enough for their use cases. I don't buy this argument, and anyway I think it is a good tradeoff, but be prepared to hear this kind of argument if you take the simplicity path.
&lt;h2&gt;Be conservative about adding features&lt;/h2&gt;
It's very hard to understand whether a user request should be implemented or not. It's not a matter of development time: even if the feature request comes with a patch, maybe the right thing to do is not to merge it.
&lt;br/&gt;&lt;br/&gt;

Every user has specific needs. They are legitimate from the point of view of the user, but possibly not from the point of view of the project. Maybe there are other ways to solve the problem, or the problem is the result of a design error the user is making in the first place.
&lt;br/&gt;&lt;br/&gt;

Many times the feature request is just too particular and specific. It's legitimate, but only 1 user in 1000 will actually need it, while the feature still adds complexity and code to your project. Saying no to these feature requests is almost always the right thing to do.
&lt;br/&gt;&lt;br/&gt;

Other times you feel instead that the feature request is OK: general enough, not too hard to implement, and with no good way to address the problem otherwise. This may be a good feature to implement, and yet it is a good idea to wait a few weeks at least, to see if the addition still looks good after some time. Basically every non-trivial feature should stay in the TODO list for some time before being implemented.
&lt;h2&gt;Be pragmatic about your roadmap&lt;/h2&gt;
Real programmers love to solve hard problems, so it's easy to fall into the trap of implementing what's most fun to code instead of what's useful. Actually, if you love the problem domain, most things will be fun to code in the end, but getting the roadmap wrong is a huge mistake.
&lt;br/&gt;&lt;br/&gt;

For instance, I was quite convinced I should implement redis-cluster (a layer that provides automatic sharding and fault tolerance among N nodes), as it is a very interesting problem to solve: there are new things to study, and possibly a new algorithm variant to design that works well with the Redis semantics and data model. But in the short term most people need something else much more: the ability to use datasets bigger than RAM, that is, the Virtual Memory feature. So I changed my plans, and I'm going to work on VM during the first part of 2010. This means that most people will have a much simpler upgrade path once their datasets grow bigger (assuming accesses are not evenly distributed). Even if redis-cluster is nice and will be the next big thing to get into Redis after VM, it is not as important for most users. The order in which you implement features can change how users feel about your project, so the rule here is: solve problems according to the number of people who will benefit from the new implementation.
&lt;h2&gt;Don't expect tons of code that you can actually merge&lt;/h2&gt;
There is a myth in the open source world that you get a lot of code once you start to have a user base. This is not how it works: be prepared to write 95% of your project's code for the first &lt;i&gt;years&lt;/i&gt;. Somebody will actually contribute code, but most of the time it will be about features you don't want to implement, or it will not look sane enough to be merged without an in-depth review, or it will solve a good problem in a way that is not general enough, or simply the contributor does not know enough about the Redis internals or your future plans to provide an acceptable implementation.
&lt;br/&gt;&lt;br/&gt;

From time to time it is actually possible to merge a patch as it is, but this is rare. The idea of &amp;quot;let's implement a solid base so that other programmers will build all the rest&amp;quot; will never work.
&lt;h2&gt;BSD can be a strength even in the business side&lt;/h2&gt;
If you are going to develop something that targets not just end users but companies, BSD can be the best pick, as in many business environments it's much more comfortable to use code under a license that allows internal development without having to deal with distributing the changes.
&lt;br/&gt;&lt;br/&gt;

Most of the time it's not that these companies don't want to share their changes with the rest of the community, but that these changes are not ready for prime time or not well documented, or may reveal too much about corporate secrets, and so forth.
&lt;br/&gt;&lt;br/&gt;

The good thing is that you'll be free to provide, for instance, a closed source version of your project, even if you accept external patches. This can be a viable business model in many ways: the commercial version can include only things that are marginally useful for most users but important in corporate environments, or special features that are too specific to get into the &amp;quot;real&amp;quot; project but that are OK to support commercially, and so forth.
&lt;br/&gt;&lt;br/&gt;

In short, the BSD license does &lt;i&gt;not&lt;/i&gt; make it impossible to do business with your project, and your users can be much more comfortable using something that can't run into problems similar to the ones MySQL experienced lately.&lt;/div&gt;&lt;div class="blogpostinfo"&gt;&lt;div class="blogpoststats"&gt;post read 14579 times&lt;sup&gt;&lt;a href="/page/uniquevisitors"&gt;*&lt;/a&gt;&lt;/sup&gt; (average 192.6 visits/day)&lt;/div&gt;Posted at 14:43:01 &lt;a href="http://antirez.com/post/one-year-of-redis.html"&gt;permalink&lt;/a&gt; | &lt;a href="http://antirez.com/post/one-year-of-redis.html"&gt;16 comments&lt;/a&gt; | &lt;a href="/print.php?postid=201"&gt;print&lt;/a&gt; | &lt;a href="http://postli.com/post?t=One+year+of+Redis&amp;amp;u=http%3A%2F%2Fantirez.com%2Fpost%2Fone-year-of-redis.html"&gt;post it&lt;/a&gt; | &lt;a class="tr-linkcount" href="http://technorati.com/search/http://antirez.com/post/one-year-of-redis.html"&gt;View blog reactions&lt;/a&gt;&lt;/div&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/antirez/~4/eeREnltN0FI" height="1" width="1"/&gt;</description>
   <dc:date>2009-12-26T14:43:01+00:00</dc:date>
  <feedburner:origLink>http://antirez.com/post/one-year-of-redis.html</feedburner:origLink></item>
  <item>
   <title>Linux: still better for coding</title>
   <link>http://feedproxy.google.com/~r/antirez/~3/DQ-jgxZOKbc/linux-better-for-coding.html</link>
   <guid isPermaLink="false">http://antirez.com/post/200</guid>
   <description>&lt;div class="blogpost"&gt;&lt;div style="clear:both"&gt;&lt;/div&gt;&lt;div class="blogposttext"&gt;Something like one year ago I switched from Linux to Mac OS X. It was not an easy switch if you think that my desktop on Linux used to be &lt;a href="http://antirez.com/blogdata/119/desktop.png"&gt;this one&lt;/a&gt;, that is, just fvmw2 with a minimal configuration, super fast virtual desktop, border-less windows.
&lt;br/&gt;&lt;br/&gt;

Why did I switch after more than 12 years of Linux? Actually, I didn't decide to switch: I just got a MacBook because I wanted to hack a bit with the iPhone SDK and with Mac OS X itself. I installed Linux and Mac OS X in multi-boot, and for a few weeks I found the Mac OS X user interface impossible to use. But after a few more weeks I was using only Mac OS X for everything.
&lt;br/&gt;&lt;br/&gt;

For the first time I had a system that &lt;b&gt;worked out of the box&lt;/b&gt; in all of its parts, with a good desktop experience and consistent behavior. Screen sharing was great when working with remote people, and there was no need to hack configuration files or to figure out why my webcam was not working. Trying new programs was as simple as downloading the disk image, opening it, and &lt;b&gt;clicking on the icon&lt;/b&gt;. And, as a bonus, the system remained usable and responsive while performing a lot of I/O! Not something Linux users are used to seeing.
&lt;br/&gt;&lt;br/&gt;

Note that I'm not the kind of guy who doesn't want to hack on the system he is using. I even &lt;a href="http://www.kyuzz.org/antirez/s10sh.html"&gt;wrote&lt;/a&gt; &lt;a href="http://www.kyuzz.org/antirez/vga256fb.html"&gt;my own drivers&lt;/a&gt; &lt;a href="http://antirez.com/page/rt73.html"&gt;in the past&lt;/a&gt;. But that was years ago, when I was still willing to believe that Linux was young after all and improving on the desktop side. Now enough is enough: with my computer I just want to focus on what I'm doing, without spending hours trying to fix unrelated desktop stuff.
&lt;br/&gt;&lt;br/&gt;

&lt;h3&gt;The strength of Linux, or why Mac OS X is weak as development environment&lt;/h3&gt;
&lt;br/&gt;&lt;br/&gt;

When I say &lt;i&gt;environment&lt;/i&gt; I don't mean an IDE or anything like that. I used to program with a terminal and vim, and I still code this way on Mac OS X, so in this respect nothing changed. What I miss are a number of important tools that made the Linux experience so much more comfortable. So here is my list of what's wrong with Mac OS X or, if you prefer to read it the other way around, of what's great about Linux.
&lt;br/&gt;&lt;br/&gt;

&lt;b&gt;Valgrind&lt;/b&gt;. If you are serious about writing C code you can't live without &lt;a href="http://valgrind.org/"&gt;valgrind&lt;/a&gt;. Valgrind turns you into a better programmer: the code you produce is many times more reliable once you discover this tool. It is invaluable, and guess what... it became usable on Mac OS X only recently, and then Snow Leopard came out and broke everything again. There are patches that make it barely work, with issues (AFAIK things will get better when the new Darwin source code is released), but this is one of the most important tools of a C developer, together with the compiler: it must be rock solid.
&lt;br/&gt;&lt;br/&gt;

&lt;b&gt;Apt-get&lt;/b&gt;. Homebrew, Fink, and the like are great tools, but seriously, they are nowhere near as good as apt-get in a good distribution like Debian or Ubuntu. You can find tons of libraries and complete systems, with correct dependencies, all maintained by people who are serious users of these projects and who will tell you when something changed in a new release as you upgrade: &amp;quot;do you want to automatically fix your old config file?&amp;quot;, or things like this.
&lt;br/&gt;&lt;br/&gt;

And... the &lt;i&gt;coverage&lt;/i&gt; of tools is impressive. From mainstream software to almost unknown libraries, almost everything is there. Want the GD lib for Tcl? It's there. Want an old release of a lib? Sure. Documentation, full application servers, everything. I can turn a freshly installed Debian system into a production server with everything needed to run a web application in one hour or less (every system administrator can). It's wonderful. And guess what? To be productive, developers need to reproduce the systems they are developing for, quickly and without too much effort.
&lt;br/&gt;&lt;br/&gt;

&lt;b&gt;API stability&lt;/b&gt;. The Mac OS X APIs, excluding the POSIX calls, simply change too often for my taste, and there is nowhere near the same amount of information you can find about Linux. For instance, yesterday I compiled nmap for Snow Leopard and, guess what, it did not work: it was unable to open the interfaces. The binary I downloaded from insecure.org worked against the loopback interface but not against the wifi one. The interface to capture raw frames from the AirPort Extreme changed again with Snow Leopard, breaking valuable programs like &lt;a href="http://trac.kismac-ng.org/"&gt;Kismac&lt;/a&gt;. It's not an open platform, so these changes hurt even more.
&lt;br/&gt;&lt;br/&gt;

&lt;b&gt;/proc filesystem&lt;/b&gt;. I know Mac OS X exposes similar information in other ways, but I sincerely miss the proc filesystem: it was a quick way to explore what was going on behind the scenes without having to remember obscure names to grep for.
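&lt;br/&gt;&lt;br/&gt;

To give an idea of how cheap that exploration is, here is a minimal Python sketch that reads a process's own /proc entry as plain text. It assumes a Linux system with /proc mounted. The helper name proc_status is made up for this example, and the field names (Name, Pid, VmRSS, Threads) are the ones documented in the proc(5) man page.
&lt;pre class="code"&gt;
import os

def proc_status(pid="self"):
    # Parse /proc/PID/status, made of "Key:   value" lines, into a dict.
    # Linux only: the file does not exist on Mac OS X.
    fields = {}
    with open(os.path.join("/proc", str(pid), "status")) as f:
        for line in f:
            key, _, value = line.partition(":")
            fields[key] = value.strip()
    return fields

if os.path.isdir("/proc"):
    status = proc_status()
    # .get() because some kernels or sandboxes may omit memory fields
    print(status["Name"], status.get("VmRSS"), status.get("Threads"))
&lt;/pre&gt;
It is just file reading, which is why the same information is equally easy to get with cat and grep from a shell.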
&lt;br/&gt;&lt;br/&gt;

&lt;b&gt;Compiler&lt;/b&gt;. I don't know why, but gprof does not work with the standard GCC (the one installed by the official Apple SDK). I tried compiling applications with -arch i386 too, and it still did not work. I bet there is a trivial fix (I didn't google it), but it's not OK that profiling does not work out of the box with the official compiler.
&lt;br/&gt;&lt;br/&gt;

These are the things I can remember right now, but there are more: little problems I run into in my everyday programming on Mac OS X that I didn't have with Linux. I still use Mac OS X as my main development platform, because while I code I also want a good desktop to surf the web, tweet, chat, and so on. But for my paid work (mostly web development) and my open source work on Redis, I have to use a Linux box constantly in order to be more productive.&lt;/div&gt;&lt;div class="blogpostinfo"&gt;&lt;div class="blogpoststats"&gt;post read 14110 times&lt;sup&gt;&lt;a href="/page/uniquevisitors"&gt;*&lt;/a&gt;&lt;/sup&gt; (average 110.8 visits/day)&lt;/div&gt;Posted at 21:51:27 &lt;a href="http://antirez.com/post/linux-better-for-coding.html"&gt;permalink&lt;/a&gt; | &lt;a href="http://antirez.com/post/linux-better-for-coding.html"&gt;31 comments&lt;/a&gt; | &lt;a href="/print.php?postid=200"&gt;print&lt;/a&gt; | &lt;a href="http://postli.com/post?t=Linux%3A+still+better+for+coding&amp;amp;u=http%3A%2F%2Fantirez.com%2Fpost%2Flinux-better-for-coding.html"&gt;post it&lt;/a&gt; | &lt;a class="tr-linkcount" href="http://technorati.com/search/http://antirez.com/post/linux-better-for-coding.html"&gt;View blog reactions&lt;/a&gt;&lt;/div&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/antirez/~4/DQ-jgxZOKbc" height="1" width="1"/&gt;</description>
   <dc:date>2009-11-04T21:51:27+00:00</dc:date>
  <feedburner:origLink>http://antirez.com/post/linux-better-for-coding.html</feedburner:origLink></item>
  <item>
   <title>Why you should boycott Nokia</title>
   <link>http://feedproxy.google.com/~r/antirez/~3/vEml45Eb3E4/Why-you-should-boycott-Nokia.html</link>
   <guid isPermaLink="false">http://antirez.com/post/199</guid>
   <description>&lt;div class="blogpost"&gt;&lt;div style="clear:both"&gt;&lt;/div&gt;&lt;div class="blogposttext"&gt;&lt;a href="http://www.nokia.com/press/press-releases/showpressrelease?newsid=1349562"&gt;Nokia just annunced&lt;/a&gt; an action against Apple:
&lt;div class="emph"&gt;
Nokia announced that it has today filed a complaint against Apple with the Federal District Court in Delaware, alleging that Apple's iPhone infringes Nokia patents for GSM, UMTS and wireless LAN (WLAN) standards.
&lt;/div&gt;
To me this sounds like: if you can't compete on the quality of your products, you can try to beat your competitor by suing. This is not acceptable. Even sadder is that the Nokia patents are more or less unavoidable; this is what I understand from the Nokia press release itself:
&lt;div class="emph"&gt;
The ten patents in suit relate to technologies fundamental to making devices which are compatible with one or more of the GSM, UMTS (3G WCDMA) and wireless LAN standards. The patents cover wireless data, speech coding, security and encryption and are infringed by all Apple iPhone models shipped since the iPhone was introduced in 2007
&lt;/div&gt;
This is just too sad. I'll never purchase another Nokia phone in the future.&lt;/div&gt;&lt;div class="blogpostinfo"&gt;&lt;div class="blogpoststats"&gt;post read 3045 times&lt;sup&gt;&lt;a href="/page/uniquevisitors"&gt;*&lt;/a&gt;&lt;/sup&gt; (average 21.7 visits/day)&lt;/div&gt;Posted at 16:23:55 &lt;a href="http://antirez.com/post/Why-you-should-boycott-Nokia.html"&gt;permalink&lt;/a&gt; | &lt;a href="http://antirez.com/post/Why-you-should-boycott-Nokia.html"&gt;12 comments&lt;/a&gt; | &lt;a href="/print.php?postid=199"&gt;print&lt;/a&gt; | &lt;a href="http://postli.com/post?t=Why+you+should+boycott+Nokia&amp;amp;u=http%3A%2F%2Fantirez.com%2Fpost%2FWhy-you-should-boycott-Nokia.html"&gt;post it&lt;/a&gt; | &lt;a class="tr-linkcount" href="http://technorati.com/search/http://antirez.com/post/Why-you-should-boycott-Nokia.html"&gt;View blog reactions&lt;/a&gt;&lt;/div&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/antirez/~4/vEml45Eb3E4" height="1" width="1"/&gt;</description>
   <dc:date>2009-10-22T16:23:55+00:00</dc:date>
  <feedburner:origLink>http://antirez.com/post/Why-you-should-boycott-Nokia.html</feedburner:origLink></item>
  <item>
   <title>Hping wiki back online</title>
   <link>http://feedproxy.google.com/~r/antirez/~3/PCHCZXLobig/hping-wiki-back-online.html</link>
   <guid isPermaLink="false">http://antirez.com/post/198</guid>
   <description>&lt;div class="blogpost"&gt;&lt;div style="clear:both"&gt;&lt;/div&gt;&lt;div class="blogposttext"&gt;After a huge downtime, the &lt;a href="http://wiki.hping.org"&gt;Hping Wiki&lt;/a&gt; is back online, thanks to the emails of users pinging me about this issue. Finally I found two hours to restore its content, running under a very legacy code, a Tcl Wiki implementation I wrote myself many many years ago.
&lt;br/&gt;&lt;br/&gt;

Sorry for the delay...&lt;/div&gt;&lt;div class="blogpostinfo"&gt;&lt;div class="blogpoststats"&gt;post read 2677 times&lt;sup&gt;&lt;a href="/page/uniquevisitors"&gt;*&lt;/a&gt;&lt;/sup&gt; (average 16.4 visits/day)&lt;/div&gt;Posted at 10:51:44 &lt;a href="http://antirez.com/post/hping-wiki-back-online.html"&gt;permalink&lt;/a&gt; | &lt;a href="http://antirez.com/post/hping-wiki-back-online.html"&gt;discuss&lt;/a&gt; | &lt;a href="/print.php?postid=198"&gt;print&lt;/a&gt; | &lt;a href="http://postli.com/post?t=Hping+wiki+back+online&amp;amp;u=http%3A%2F%2Fantirez.com%2Fpost%2Fhping-wiki-back-online.html"&gt;post it&lt;/a&gt; | &lt;a class="tr-linkcount" href="http://technorati.com/search/http://antirez.com/post/hping-wiki-back-online.html"&gt;View blog reactions&lt;/a&gt;&lt;/div&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/antirez/~4/PCHCZXLobig" height="1" width="1"/&gt;</description>
   <dc:date>2009-09-30T10:51:44+00:00</dc:date>
  <feedburner:origLink>http://antirez.com/post/hping-wiki-back-online.html</feedburner:origLink></item>
  <item>
   <title>There is not just a single Apple</title>
   <link>http://feedproxy.google.com/~r/antirez/~3/3C2csFzldA8/there-is-not-just-a-single-Apple.html</link>
   <guid isPermaLink="false">http://antirez.com/post/197</guid>
   <description>&lt;div class="blogpost"&gt;&lt;div style="clear:both"&gt;&lt;/div&gt;&lt;div class="blogposttext"&gt;Obviously thanks to the recent moves of Apple a lot of guys are &lt;a href="http://www.techcrunch.com/2009/07/31/i-quit-the-iphone/"&gt;Qutting the iPhone&lt;/a&gt;. This may appear sensationalism, but I think that every time an hardware vendor is trying to say what software can run and can not on his hardware, there is just one thing to do: to protest loudly. The issue is so important for our future that there should be a law explicitly prohibiting all the possible forms of software censorship, at least about hardware that is sold to end users.
&lt;br/&gt;&lt;br/&gt;

My feelings about this issue appear to be pretty common: on one hand I &lt;i&gt;love&lt;/i&gt; this great device. On the other hand I'm really upset with Apple, and I smell the worst kind of marketing-driven decision process behind what they are doing with the App Store. Why do we have these mixed feelings, loving and hating the same company at the same time? &lt;b&gt;Because actually there is not a single Apple&lt;/b&gt;.
&lt;br/&gt;&lt;br/&gt;

Basically, the people designing and coding the phone are not the same people who are going to exclude Google Voice (or another application) from the Store. I know many programmers and designers, and they share an almost constant trait: they love freedom. Apple's designers and developers are creating a great device that makes our lives better, and I bet they love to see the latest great stuff running on the iPhone. As a developer and designer you are simply happy when something new and cool is built on top of what you made.
&lt;br/&gt;&lt;br/&gt;

But there is the other Apple, the dark side of the company, not happy with selling a lot of phones but wanting to earn even more by forming partnerships with carriers in order to profit from services too, and then being forced to obey carrier logic and needs.
&lt;br/&gt;&lt;br/&gt;

I don't want to quit the iPhone, but I like to imagine that there is a direct link between the iPhone designers and me: they want to bring me the best user experience, but part of their company is fighting against them and against customers. So the first thing I'll do is jailbreak my phone again to get the most out of the device, hoping that the marketing people at Apple, if they are not smart enough to realize that these are not the moves of the company that will dominate the future of mobile devices, will at least understand that some kinds of customer reaction are better avoided.&lt;/div&gt;&lt;div class="blogpostinfo"&gt;&lt;div class="blogpoststats"&gt;post read 4770 times&lt;sup&gt;&lt;a href="/page/uniquevisitors"&gt;*&lt;/a&gt;&lt;/sup&gt; (average 21.3 visits/day)&lt;/div&gt;Posted at 12:56:41 &lt;a href="http://antirez.com/post/there-is-not-just-a-single-Apple.html"&gt;permalink&lt;/a&gt; | &lt;a href="http://antirez.com/post/there-is-not-just-a-single-Apple.html"&gt;2 comments&lt;/a&gt; | &lt;a href="/print.php?postid=197"&gt;print&lt;/a&gt; | &lt;a href="http://postli.com/post?t=There+is+not+just+a+single+Apple&amp;amp;u=http%3A%2F%2Fantirez.com%2Fpost%2Fthere-is-not-just-a-single-Apple.html"&gt;post it&lt;/a&gt; | &lt;a class="tr-linkcount" href="http://technorati.com/search/http://antirez.com/post/there-is-not-just-a-single-Apple.html"&gt;View blog reactions&lt;/a&gt;&lt;/div&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/antirez/~4/3C2csFzldA8" height="1" width="1"/&gt;</description>
   <dc:date>2009-07-31T12:56:41+00:00</dc:date>
  <feedburner:origLink>http://antirez.com/post/there-is-not-just-a-single-Apple.html</feedburner:origLink></item>
  <item>
   <title>Some math about the Engineyard contest</title>
   <link>http://feedproxy.google.com/~r/antirez/~3/_5Rwf4S9sQQ/some-math-about-the-engineyard-contest.html</link>
   <guid isPermaLink="false">http://antirez.com/post/196</guid>
   <description>&lt;div class="blogpost"&gt;&lt;div style="clear:both"&gt;&lt;/div&gt;&lt;div class="blogposttext"&gt;I did some math, and I think this is the probability of two random SHA1 strings having the hamming dinstance of 'D':
&lt;pre class="code"&gt;
(1/2^160)*(160!/((160-d)!*d!))
&lt;/pre&gt;
Here is why. Imagine we want to compute the probability of a Hamming distance of 2. I want the first bit to differ, which happens with probability .5; the second bit to differ too, again .5; and all the others to be equal, .5 each. These are independent events, so we multiply their probabilities: any specific arrangement of equal and not-equal bits has a probability of .5^160.
&lt;br/&gt;&lt;br/&gt;

But a Hamming distance of 2 only requires that 2 bits differ and 158 are equal, in any order. How many ways are there to arrange 160 items, two black and 158 white? 160!/(158!*2!). So I have to multiply the probability of a single arrangement by the number of possible arrangements.
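&lt;br/&gt;&lt;br/&gt;

One way to double-check the formula is to compute it exactly and compare it against real SHA1 digests. The Python sketch below is only an illustration, not the code used for the contest. hd_probability evaluates C(160, d) / 2^160 with exact fractions, and empirical estimates the same probability by hashing random 16-byte inputs. Both function names are made up for this example.
&lt;pre class="code"&gt;
import hashlib
import math
import os
from fractions import Fraction

def hd_probability(d, bits=160):
    # Exact probability that two independent uniform bit strings
    # differ in exactly d positions: C(bits, d) / 2**bits
    return Fraction(math.comb(bits, d), 2 ** bits)

def hamming(a, b):
    # XOR the two digests as integers, then count the differing bits
    x = int.from_bytes(a, "big") ^ int.from_bytes(b, "big")
    return bin(x).count("1")

def empirical(d, trials=20000):
    # Fraction of random SHA1 pairs whose Hamming distance is exactly d
    hits = 0
    for _ in range(trials):
        a = hashlib.sha1(os.urandom(16)).digest()
        b = hashlib.sha1(os.urandom(16)).digest()
        if hamming(a, b) == d:
            hits += 1
    return hits / trials

# The probabilities of all distances 0..160 must add up to exactly 1
assert sum(hd_probability(d) for d in range(161)) == 1
print(float(hd_probability(80)), empirical(80))
&lt;/pre&gt;
For d = 80, the most likely distance, the formula gives about 0.063. A sample of 20000 random pairs should usually land within a few thousandths of that.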
&lt;br/&gt;&lt;br/&gt;

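OK, what are the practical implications of this? With an optimized C program you can compute around 3.25 million SHA1 hashes per second per core (this is what I got with my implementation). Since every hash is an independent trial, the expected number of hashes needed to hit a given distance is the inverse of its probability. So if you have a server farm of 250 quad-core computers (1000 cores, about 3.25 billion hashes per second), you need the following number of hours, on average, to get a given Hamming distance: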
&lt;pre class="code"&gt;
6 hours for an h.d. of 33
26 hours for an h.d. of 32
108 hours for an h.d. of 31
&lt;/pre&gt;
It looks like university clusters can find a solution with an HD of 32, or 31 with a bit of luck.
&lt;br/&gt;&lt;br/&gt;

People with just ten quad-core boxes can expect to reach an HD of 35 in 12 hours.
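&lt;br/&gt;&lt;br/&gt;

The hour estimates can be reproduced with a few lines of Python. This sketch takes the 3.25 million hashes per second per core from the text and, like the formula above, counts only hashes at exactly distance d. Because each hash is an independent trial, the expected number of hashes before the first hit is 1/P(d) (a geometric distribution). The constant and function names are illustrative.
&lt;pre class="code"&gt;
import math

HASHES_PER_CORE_PER_SEC = 3.25e6  # throughput reported in the post

def expected_hours(d, cores, bits=160):
    # Probability that a single hash lands at Hamming distance exactly d
    p = math.comb(bits, d) / 2 ** bits
    # Independent trials: on average 1/p hashes until the first success
    hashes_needed = 1 / p
    return hashes_needed / (HASHES_PER_CORE_PER_SEC * cores * 3600)

for d in (33, 32, 31):
    print(d, round(expected_hours(d, cores=250 * 4), 1))
print(35, round(expected_hours(35, cores=10 * 4), 1))
&lt;/pre&gt;
With 1000 cores this gives roughly 7, 27 and 108 hours for distances 33, 32 and 31, and about 13 hours for a distance of 35 on 40 cores, so the figures above appear to be rounded down. Accepting any distance less than or equal to d would shorten these times by roughly a quarter.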
&lt;br/&gt;&lt;br/&gt;

I would love to know whether my math is correct. I'm not particularly strong in probability theory... but it looks correct to me.&lt;/div&gt;&lt;div class="blogpostinfo"&gt;&lt;div class="blogpoststats"&gt;post read 6962 times&lt;sup&gt;&lt;a href="/page/uniquevisitors"&gt;*&lt;/a&gt;&lt;/sup&gt; (average 29.3 visits/day)&lt;/div&gt;Posted at 14:54:37 &lt;a href="http://antirez.com/post/some-math-about-the-engineyard-contest.html"&gt;permalink&lt;/a&gt; | &lt;a href="http://antirez.com/post/some-math-about-the-engineyard-contest.html"&gt;9 comments&lt;/a&gt; | &lt;a href="/print.php?postid=196"&gt;print&lt;/a&gt; | &lt;a href="http://postli.com/post?t=Some+math+about+the+Engineyard+contest&amp;amp;u=http%3A%2F%2Fantirez.com%2Fpost%2Fsome-math-about-the-engineyard-contest.html"&gt;post it&lt;/a&gt; | &lt;a class="tr-linkcount" href="http://technorati.com/search/http://antirez.com/post/some-math-about-the-engineyard-contest.html"&gt;View blog reactions&lt;/a&gt;&lt;/div&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/antirez/~4/_5Rwf4S9sQQ" height="1" width="1"/&gt;</description>
   <dc:date>2009-07-17T14:54:37+00:00</dc:date>
  <feedburner:origLink>http://antirez.com/post/some-math-about-the-engineyard-contest.html</feedburner:origLink></item>
  <item>
   <title>MongoDB and Redis: a different interpretation of what's wrong with Relational DBs</title>
   <link>http://feedproxy.google.com/~r/antirez/~3/44PqIKPyBig/MongoDB-and-Redis.html</link>
   <guid isPermaLink="false">http://antirez.com/post/195</guid>
   <description>&lt;div class="blogpost"&gt;&lt;div style="clear:both"&gt;&lt;/div&gt;&lt;div class="blogposttext"&gt;Working to Redis is a good feeling for me: it's not something about money, or deadlines, or customers not agreeing with me, but about trying to do my 2 cents in order to help the field to go forward. It's a joy to work to things you love, especially if you have the feeling that you don't want to win: even if a few ideas of your work will be useful for another experiment or implementation it is already worth it. It's like science, what matters is to know more, to find better solutions to problems, and so on.
&lt;br/&gt;&lt;br/&gt;

So I keep an eye on other key-value databases, and suggest them to people interested in something different from Redis. For instance, these &lt;a href="http://locomotivation.squeejee.com/post/117119353/mongodb-ruby-friendly-document-storage-that-doesnt-rhyme"&gt;MongoDB slides&lt;/a&gt; are good and worth a look. MongoDB seems an interesting project to me, and what's interesting is how Redis and MongoDB try to solve the same problem in theory, but with very different analyses of it.
&lt;br/&gt;&lt;br/&gt;

Both projects start from the idea that &lt;i&gt;there is something wrong if we use an RDBMS for every kind of work&lt;/i&gt;. Not all problems look like nails, but too many databases look like hammers, as the slides say, and indeed it's a colorful image. But it is remarkable how, in response to the same non-nail problems, these two tools took different paths.
&lt;h2&gt; MongoDB &lt;/h2&gt;
Before continuing, I want to say a few words about how MongoDB works. The idea is to have objects, which are collections of named fields with values. A MongoDB object looks like this:
&lt;pre class="code"&gt;
Name: Salvatore
Surname: Sanfilippo
Foo: yes
Bar: no
age: 32
&lt;/pre&gt;
This is actually very similar to a row in an RDBMS table. Then you can run interesting queries against your collections of objects:
&lt;pre class="code"&gt;
db.collection.find({'Name':'John'}) // Finds all Johns
db.collection.find({$where:'this.age &amp;gt;= 6 &amp;amp;&amp;amp; this.age &amp;lt;= 20'}) // Ages 6 to 20
&lt;/pre&gt;
You can have indexes on given fields, as in an RDBMS, sort query results by a field, get a range of results as you would with LIMIT, and so on. Basically the &lt;i&gt;data model&lt;/i&gt; is the same as an RDBMS, so in my opinion the main idea of the MongoDB developers is the following:
&lt;div class="emph"&gt;
What's wrong with RDBMS when used for (many) tasks that don't need all this complexity? They are bloated, and thus slow and a pain to replicate, shard, and so on. But the data model, with tables, indexes, and complex queries against the data, is right.
&lt;/div&gt;
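To make that data model concrete, here is a rough in-memory analogue of the two queries above in Python. It is not MongoDB or its driver API: documents are plain dicts, find is a hypothetical helper, the where argument stands in for the $where JavaScript expression, and the sample documents are made up.
&lt;pre class="code"&gt;
people = [
    {"Name": "Salvatore", "Surname": "Sanfilippo", "Foo": "yes", "Bar": "no", "age": 32},
    {"Name": "John", "Surname": "Doe", "age": 15},
    {"Name": "John", "Surname": "Smith", "age": 40},
]

def find(collection, query=None, where=None):
    # Keep documents whose fields equal every value in query and,
    # if a where predicate is given, for which it returns True
    query = query or {}
    return [
        doc for doc in collection
        if all(doc.get(k) == v for k, v in query.items())
        and (where is None or where(doc))
    ]

johns = find(people, {"Name": "John"})
# range(6, 21) covers ages 6 to 20 inclusive
teens = find(people, where=lambda doc: doc.get("age") in range(6, 21))
print([d["Surname"] for d in johns], [d["Surname"] for d in teens])
&lt;/pre&gt;
Queries are ad hoc filters over records with named fields, much like WHERE clauses over rows. An index would only make the same filter faster.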
&lt;h2&gt; The Redis path &lt;/h2&gt;
Redis tries to solve non-nail problems too, but in a different way: what Redis provides are data structures much closer to the ones you find in a computer science book, like Lists and Sets, plus server side operations on these kinds of values. Programming with Redis is just like doing everything with Lists and Hashes in memory with your favorite dynamic programming language, except that the dataset is persistent and of course not as fast as accessing your computer's memory directly (there is a networking layer in the middle).
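&lt;br/&gt;&lt;br/&gt;

To illustrate that contrast, here is a toy Python class that mimics the semantics of a few Redis list and set commands (LPUSH, LRANGE, SADD, SINTER) on plain in-process data structures. It is a sketch for intuition only. ToyStore is a made-up name, it has no networking or persistence, and it is not a Redis client.
&lt;pre class="code"&gt;
from collections import defaultdict, deque

class ToyStore:
    # Toy single-process stand-in for a few Redis list and set commands

    def __init__(self):
        self.lists = defaultdict(deque)
        self.sets = defaultdict(set)

    def lpush(self, key, value):
        # Prepend to the list stored at key, return the new length
        self.lists[key].appendleft(value)
        return len(self.lists[key])

    def lrange(self, key, start, stop):
        # stop is inclusive as in LRANGE; negative indexes count from the
        # tail. (stop + 1) or None turns stop = -1 into "up to the end".
        items = list(self.lists.get(key, ()))
        return items[start:(stop + 1) or None]

    def sadd(self, key, member):
        # Return 1 if the member was added, 0 if it was already there
        before = len(self.sets[key])
        self.sets[key].add(member)
        return len(self.sets[key]) - before

    def sinter(self, *keys):
        # Intersection of the given sets; a missing key acts as empty
        return set.intersection(*(self.sets.get(k, set()) for k in keys))

store = ToyStore()
for item in ("a", "b", "c"):
    store.lpush("news", item)
print(store.lrange("news", 0, -1))
&lt;/pre&gt;
Each command maps onto an operation on a well-known data structure, so its time complexity is easy to reason about.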
&lt;br/&gt;&lt;br/&gt;

So what's the Redis point of view?
&lt;div class="emph"&gt;
What's wrong with RDBMS when used for (many) tasks that don't need all this complexity? The data model: it is not scalable, the time complexity of queries is hard to predict, and it can't model many common problems well enough.
&lt;/div&gt;
I expect that in a few years it will be very clear what the real problem with RDBMS was, even if right now it can look confusing: there are different alternatives, and it is very hard to measure the relative value of the proposed solutions. This kind of change appears to be very fast, with the key-value hype growing every week, but it is actually going to take much longer before RDBMS alternatives are considered as &lt;i&gt;conceptually&lt;/i&gt; mature as RDBMS are today.&lt;/div&gt;&lt;div class="blogpostinfo"&gt;&lt;div class="blogpoststats"&gt;post read 11357 times&lt;sup&gt;&lt;a href="/page/uniquevisitors"&gt;*&lt;/a&gt;&lt;/sup&gt; (average 40.4 visits/day)&lt;/div&gt;Posted at 20:26:15 &lt;a href="http://antirez.com/post/MongoDB-and-Redis.html"&gt;permalink&lt;/a&gt; | &lt;a href="http://antirez.com/post/MongoDB-and-Redis.html"&gt;8 comments&lt;/a&gt; | &lt;a href="/print.php?postid=195"&gt;print&lt;/a&gt; | &lt;a href="http://postli.com/post?t=MongoDB+and+Redis%3A+a+different+interpretation+of+what%27s+wrong+with+Relational+DBs&amp;amp;u=http%3A%2F%2Fantirez.com%2Fpost%2FMongoDB-and-Redis.html"&gt;post it&lt;/a&gt; | &lt;a class="tr-linkcount" href="http://technorati.com/search/http://antirez.com/post/MongoDB-and-Redis.html"&gt;View blog reactions&lt;/a&gt;&lt;/div&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/antirez/~4/44PqIKPyBig" height="1" width="1"/&gt;</description>
   <dc:date>2009-06-03T20:26:15+00:00</dc:date>
  <feedburner:origLink>http://antirez.com/post/MongoDB-and-Redis.html</feedburner:origLink></item>
  <item>
   <title>Ruby on Rails for Microsoft developers</title>
   <link>http://feedproxy.google.com/~r/antirez/~3/lKV0skm_1So/ruby-on-rails-windows-book.html</link>
   <guid isPermaLink="false">http://antirez.com/post/194</guid>
   <description>&lt;div class="blogpost"&gt;&lt;div style="clear:both"&gt;&lt;/div&gt;&lt;div class="blogposttext"&gt;My friend &lt;a href="http://antoniocangiano.com"&gt;Antonio Cangiano&lt;/a&gt; just published a book about the development of Ruby on Rails applications for developers that come from Microsoft development. Usually Rails is a Linux/MacOSx geeks land, but thanks to the book of Antonio Microsoft Developers can read a book that exposes the Rails concepts from a point of view they are comfortable with. This is the editorial description of the book:
&lt;div class="emph"&gt;
This definitive guide examines how to take advantage of the new Agile methodologies offered when using Ruby on Rails (RoR). You'll quickly grasp the RoR methodology by focusing on the RoR development from the point of view of the beginner- to intermediate-level Microsoft developer. Plus, you'll get a reliable roadmap for migrating your applications, skill set, and development processes to the newer, more agile programming platform that RoR offers.
&lt;/div&gt;
I haven't read the book yet, but Antonio is a friend of mine and he is very skilled in many fields, from Agile development to Ruby and math. I'm sure you'll enjoy his book.
&lt;br/&gt;&lt;br/&gt;

You can buy the book &lt;a href="http://www.amazon.com/gp/product/0470374950?ie=UTF8&amp;amp;tag=zenruby-20&amp;amp;linkCode=as2&amp;amp;camp=1789&amp;amp;creative=9325&amp;amp;creativeASIN=0470374950"&gt;on Amazon&lt;/a&gt; (no referral link).&lt;/div&gt;&lt;div class="blogpostinfo"&gt;&lt;div class="blogpoststats"&gt;post read 6595 times&lt;sup&gt;&lt;a href="/page/uniquevisitors"&gt;*&lt;/a&gt;&lt;/sup&gt; (average 19.5 visits/day)&lt;/div&gt;Posted at 16:51:05 &lt;a href="http://antirez.com/post/ruby-on-rails-windows-book.html"&gt;permalink&lt;/a&gt; | &lt;a href="http://antirez.com/post/ruby-on-rails-windows-book.html"&gt;discuss&lt;/a&gt; | &lt;a href="/print.php?postid=194"&gt;print&lt;/a&gt; | &lt;a href="http://postli.com/post?t=Ruby+on+Rails+for+Microsoft+developers&amp;amp;u=http%3A%2F%2Fantirez.com%2Fpost%2Fruby-on-rails-windows-book.html"&gt;post it&lt;/a&gt; | &lt;a class="tr-linkcount" href="http://technorati.com/search/http://antirez.com/post/ruby-on-rails-windows-book.html"&gt;View blog reactions&lt;/a&gt;&lt;/div&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/antirez/~4/lKV0skm_1So" height="1" width="1"/&gt;</description>
   <dc:date>2009-04-07T16:51:05+00:00</dc:date>
  <feedburner:origLink>http://antirez.com/post/ruby-on-rails-windows-book.html</feedburner:origLink></item>
  <item>
   <title>Redis beta-7 released, new features implemented. Make sure to check the SORT command.</title>
   <link>http://feedproxy.google.com/~r/antirez/~3/rx1y0hZGYEo/redis-beta-7.html</link>
   <guid isPermaLink="false">http://antirez.com/post/193</guid>
   <description>&lt;div class="blogpost"&gt;&lt;div style="clear:both"&gt;&lt;/div&gt;&lt;div class="blogposttext"&gt;If you read my &lt;a href="http://antirez.com/post/Sorting-in-key-value-data-model.html"&gt;article called 'sorting in key value data model'&lt;/a&gt; and liked the idea, well, now it is implemented and usable in &lt;a href="http://code.google.com/p/redis/"&gt;Redis&lt;/a&gt; beta 7!
&lt;br/&gt;&lt;br/&gt;

This is not the only new feature: make sure to check the &lt;a href="http://code.google.com/p/redis/wiki/CommandReference"&gt;Redis Command Reference&lt;/a&gt;. Redis is one of the most feature-rich key-value DBs out there.&lt;/div&gt;&lt;div class="blogpostinfo"&gt;&lt;div class="blogpoststats"&gt;post read 6509 times&lt;sup&gt;&lt;a href="/page/uniquevisitors"&gt;*&lt;/a&gt;&lt;/sup&gt; (average 18.1 visits/day)&lt;/div&gt;Posted at 13:43:56 &lt;a href="http://antirez.com/post/redis-beta-7.html"&gt;permalink&lt;/a&gt; | &lt;a href="http://antirez.com/post/redis-beta-7.html"&gt;1 comment&lt;/a&gt;&lt;/div&gt;&lt;/div&gt;</description>
   <dc:date>2009-03-18T13:43:56+00:00</dc:date>
  <feedburner:origLink>http://antirez.com/post/redis-beta-7.html</feedburner:origLink></item>
  <item>
   <title>Sorting in key-value data model</title>
   <link>http://feedproxy.google.com/~r/antirez/~3/2sNvLB123tc/Sorting-in-key-value-data-model.html</link>
   <guid isPermaLink="false">http://antirez.com/post/192</guid>
   <description>&lt;div class="blogpost"&gt;&lt;div style="clear:both"&gt;&lt;/div&gt;&lt;div class="blogposttext"&gt;The key-value data model is an emerging paradigm in many forms, from key-value stores to caching systems, where data is represented in the form of a map between keys and values. Examples of real-world systems using this model are &lt;a href="http://www.danga.com/memcached/"&gt;memcached&lt;/a&gt;, a distributed caching system, and &lt;a href="http://tokyocabinet.sourceforge.net/index.html"&gt;Tokyo Cabinet&lt;/a&gt; and &lt;a href="http://code.google.com/p/redis/"&gt;Redis&lt;/a&gt;, two key-value databases.
&lt;br/&gt;&lt;br/&gt;

This data model is at the same time old and new: as old as LISP's associative lists, and even older, but at the same time a new emerging paradigm for scalable applications.
&lt;br/&gt;&lt;br/&gt;

In order to exploit the full potential of key-value stores, we need more advanced operations and design patterns on top of the basic Set-key, Get-key and Delete-key operations of dictionary data structures.
For instance, one very useful additional operation is atomic increment/decrement.
&lt;pre class="code"&gt;
set x 10
incr x =&amp;gt; 11
incr x =&amp;gt; 12
&lt;/pre&gt;
The atomic increment operation is particularly useful since it allows obtaining a unique identifier for an object.
&lt;pre class="code"&gt;
id = incr nextId =&amp;gt; 100
set obj_&amp;lt;id&amp;gt; &amp;quot;My new object&amp;quot;
&lt;/pre&gt;
Multiple instances of a program can access the key-value store simultaneously and still be sure to obtain unique references (keys) under which to store objects; these new objects can then be referenced from other parts of the key-value data space simply by their identifier.
&lt;br/&gt;&lt;br/&gt;
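A minimal Python sketch of why the atomic increment yields unique IDs, assuming a single-process toy store: ToyKV, new_object, demo and the key names are hypothetical illustrations, not the Redis API, and a lock stands in for a server that executes one command at a time. Several threads play the role of the program instances described above.
&lt;pre class="code"&gt;
```python
import threading


class ToyKV:
    """Hypothetical in-memory key-value store, used only to illustrate INCR."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def set(self, key, value):
        with self._lock:
            self._data[key] = value

    def get(self, key):
        with self._lock:
            return self._data.get(key)

    def incr(self, key):
        # The read, the +1 and the write happen under a single lock, so two
        # concurrent callers can never obtain the same value.
        with self._lock:
            value = int(self._data.get(key, 0)) + 1
            self._data[key] = value
            return value


def new_object(kv, payload):
    """id = incr nextId; set obj_ID payload (hypothetical helper)."""
    obj_id = kv.incr("nextId")
    kv.set(f"obj_{obj_id}", payload)
    return obj_id


def demo(n_threads=4, per_thread=500):
    """Several 'program instances' (threads) create objects concurrently."""
    kv = ToyKV()
    ids = []
    ids_lock = threading.Lock()

    def worker():
        for _ in range(per_thread):
            obj_id = new_object(kv, "My new object")
            with ids_lock:
                ids.append(obj_id)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return kv, ids
```
&lt;/pre&gt;
Without the lock, two threads could read the same value before either writes it back, and both would hand out the same ID.
&lt;br/&gt;&lt;br/&gt;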

Some key-value stores like &lt;a href="http://code.google.com/p/redis/"&gt;Redis&lt;/a&gt; support Lists as values. List support makes it simpler to model a large number of problems, especially when atomic operations to append or consume elements from the list are provided.
&lt;br/&gt;&lt;br/&gt;

These two features, atomic increments and lists as values, can be combined in a very powerful way. For example, imagine the following problem: the key-value store contains keys representing objects with a unique reference (identifier), and these objects are news items in a social news site like &lt;a href="http://reddit.com"&gt;Reddit&lt;/a&gt;.
&lt;pre class="code"&gt;
id = incr NextId =&amp;gt; 1
set news_url_&amp;lt;id&amp;gt; &amp;quot;http://foobar.org&amp;quot;
set news_title_&amp;lt;id&amp;gt; &amp;quot;My foobar story&amp;quot;
push mylist 1

id = incr NextId =&amp;gt; 2
set news_url_&amp;lt;id&amp;gt; &amp;quot;http://antirez.com&amp;quot;
set news_title_&amp;lt;id&amp;gt; &amp;quot;The blog you are reading right now&amp;quot;
push mylist 2

id = incr NextId =&amp;gt; 3
set news_url_&amp;lt;id&amp;gt; &amp;quot;http://someothersite.com&amp;quot;
set news_title_&amp;lt;id&amp;gt; &amp;quot;Some Other Site&amp;quot;
push mylist 3
&lt;/pre&gt;
Later I'll be able to ask for a specific range of the list (for example the latest 10 elements added) and populate a web page by fetching the news_url_&amp;lt;id&amp;gt; and news_title_&amp;lt;id&amp;gt; keys.
&lt;br/&gt;&lt;br/&gt;
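The whole pattern (INCR for the ID, one key per field, a list of IDs, then a range query to build the page) can be sketched in Python. Plain dicts stand in for the key-value store, itertools.count stands in for incr NextId (single process only), and push is assumed to append at the tail of the list; add_news and latest_news are hypothetical helper names.
&lt;pre class="code"&gt;
```python
from itertools import count

store = {}          # string keys, e.g. news_url_1 (stand-in for the store)
lists = {}          # list-valued keys, e.g. mylist
next_id = count(1)  # stand-in for "incr NextId" (not safe across processes)


def add_news(url, title, list_key="mylist"):
    news_id = next(next_id)
    store[f"news_url_{news_id}"] = url
    store[f"news_title_{news_id}"] = title
    lists.setdefault(list_key, []).append(news_id)  # "push mylist ID"
    return news_id


def latest_news(list_key="mylist", n=10):
    """Return (id, url, title) for the n most recently pushed ids, newest first."""
    ids = lists.get(list_key, [])
    newest_first = ids[max(len(ids) - n, 0):][::-1]
    return [(i, store[f"news_url_{i}"], store[f"news_title_{i}"])
            for i in newest_first]
```
&lt;/pre&gt;
Calling add_news with the three stories above and then latest_news(n=2) returns the entries for IDs 3 and 2, newest first.
&lt;br/&gt;&lt;br/&gt;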

While retrieving items in chronological or reverse chronological order is very useful (for example, the 'new submissions' page of a Reddit-like site needs exactly that), there are times when we want to sort these items. Generating the front page of a social news site is one such application.
&lt;br/&gt;&lt;br/&gt;

This article is a proposal for a sorting strategy in a key-value data model.
&lt;h3&gt;Sorting and weight keys&lt;/h3&gt;
In order to perform sorting we need another player in our data schema: weights. Our three news items with IDs 1, 2 and 3 can have different weights computed according to each item's age and score (the score may result from user votes). One component of our social news system will be in charge of updating these scores by performing set operations on score keys:
&lt;pre class="code"&gt;
set score_1 124
set score_2 4461
set score_3 -50
&lt;/pre&gt;
We can now define a new SORT operation on our key-value data model that takes as input a key holding a list, and a pattern used to retrieve the score of every element of the list:
&lt;pre class="code"&gt;
sort mylist by score_* =&amp;gt; 3 1 2
&lt;/pre&gt;
Elements of the list are compared by the value of the corresponding score_&amp;lt;id&amp;gt; key, so in ascending order we get 3 (score -50), 1 (score 124) and 2 (score 4461). A real-world implementation could provide a way to specify whether the sorting is ascending or descending, and the range of elements to retrieve, in order to implement pagination or to take only the top N items.
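&lt;br/&gt;&lt;br/&gt;
Here is a Python sketch of the proposed sort-by-pattern semantics, assuming string scores that parse as numbers and treating a missing score key as 0 (an assumption; the text does not say). The function name sort_by and the desc, start and count parameters are hypothetical, standing in for the ascending/descending and range options mentioned above.
&lt;pre class="code"&gt;
```python
def sort_by(store, lists, list_key, pattern, desc=False, start=0, count=None):
    """Sketch of "sort LIST by PATTERN".

    Each list element replaces the first '*' in pattern to build its weight
    key (score_* becomes score_1 for element 1). Missing weights are treated
    as 0, which is an assumption of this sketch.
    """
    def weight(elem):
        return float(store.get(pattern.replace("*", str(elem), 1), 0))

    ordered = sorted(lists[list_key], key=weight, reverse=desc)
    end = None if count is None else start + count
    return ordered[start:end]
```
&lt;/pre&gt;
With store = {"score_1": "124", "score_2": "4461", "score_3": "-50"} and lists = {"mylist": [1, 2, 3]}, sort_by(store, lists, "mylist", "score_*") returns [3, 1, 2], the same order as the example.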
&lt;h3&gt;Sorting in a distributed environment: key tags&lt;/h3&gt;
One of the greatest benefits of the key-value data model is that the dataset can be partitioned across N different servers simply by hashing the key. This can be a problem when sorting is required, since the scores needed to sort a given list may be distributed across different stores. One solution is to let the sorting server query the other stores to look up the score keys they hold: while this can work, the added latency may slow down the sorting operation too much.
&lt;br/&gt;&lt;br/&gt;

An alternative is to make sure that all the scores needed to sort a given key are stored on the same server as the key holding the list. To do so, instead of always hashing the whole key, only a specific part of the key is hashed when the key contains a special delimiter.
&lt;br/&gt;&lt;br/&gt;

For example, the key &amp;quot;foobar&amp;quot; is hashed as a full 6-byte key, but if the pattern [...] is found inside the key, only the part of the key between [ and ] is hashed. As long as the part of the string inside the [] delimiters is the same, multiple keys will hash to the same value and will be stored on the same server.
&lt;pre class="code"&gt;
Hash(mylist[homenews]) == Hash(score_10_[homenews]) == Hash(score_20_[homenews])
&lt;/pre&gt;
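The key-tag rule can be sketched in Python as follows. The helper names are hypothetical, CRC32 is an arbitrary stable hash chosen for the sketch, and treating an empty tag such as [] as no tag at all is an assumption the text leaves open.
&lt;pre class="code"&gt;
```python
import zlib


def hash_part(key):
    """Part of the key that gets hashed: the text between the first '['
    and the next ']' when present and non-empty, otherwise the whole key."""
    start = key.find("[")
    if start != -1:
        end = key.find("]", start + 1)
        if end > start + 1:
            return key[start + 1:end]
    return key


def server_for(key, n_servers):
    # CRC32 is an arbitrary stable hash picked for this sketch; any hash
    # that every client computes the same way would do.
    return zlib.crc32(hash_part(key).encode("utf-8")) % n_servers
```
&lt;/pre&gt;
All three keys from the example land on the same server for any number of servers, because only homenews is hashed.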
&lt;h3&gt;Implementation&lt;/h3&gt;
What was described in this article will soon appear in the Redis key-value store. If you have feedback about how to improve this model, please leave a comment here. Thank you.
&lt;br/&gt;&lt;br/&gt;

&lt;a href="http://www.reddit.com/r/programming/comments/84d95/sorting_in_keyvalue_data_model/"&gt;vote on reddit&lt;/a&gt; or &lt;a href="http://news.ycombinator.com/newest"&gt;hacker news&lt;/a&gt;&lt;/div&gt;&lt;div class="blogpostinfo"&gt;&lt;div class="blogpoststats"&gt;post read 17092 times&lt;sup&gt;&lt;a href="/page/uniquevisitors"&gt;*&lt;/a&gt;&lt;/sup&gt; (average 47.0 visits/day)&lt;/div&gt;Posted at 12:02:55 &lt;a href="http://antirez.com/post/Sorting-in-key-value-data-model.html"&gt;permalink&lt;/a&gt; | &lt;a href="http://antirez.com/post/Sorting-in-key-value-data-model.html"&gt;2 comments&lt;/a&gt;&lt;/div&gt;&lt;/div&gt;</description>
   <dc:date>2009-03-13T12:02:55+00:00</dc:date>
  <feedburner:origLink>http://antirez.com/post/Sorting-in-key-value-data-model.html</feedburner:origLink></item>
  <item>
   <title>Redis, my new open source project</title>
   <link>http://feedproxy.google.com/~r/antirez/~3/SqmlzJN4CUA/Redis-my-new-open-source-project.html</link>
   <guid isPermaLink="false">http://antirez.com/post/191</guid>
   <description>&lt;div class="blogpost"&gt;&lt;div style="clear:both"&gt;&lt;/div&gt;&lt;div class="blogposttext"&gt;&lt;img src="http://redis.googlecode.com/files/redis.png"&gt;
&lt;br/&gt;&lt;br/&gt;

I think this is the first time I talk about &lt;a href="http://code.google.com/p/redis/"&gt;Redis&lt;/a&gt; on this blog, but it certainly won't be the last: Redis will be the target of my hacking sessions for the next few years, I hope.
&lt;br/&gt;&lt;br/&gt;

So what's Redis about? It is a key-value database, but it is a bit different from memcache&lt;b&gt;db&lt;/b&gt; (a persistent version of memcached) and many others, since it works at a somewhat higher level. For instance, with many persistent key-value databases you can do things like this:
&lt;pre class="code"&gt;
SET mykey foobar
GET mykey
DEL mykey
&lt;/pre&gt;
And other similar operations, such as set-if-not-exists, that make lock-free algorithms simpler. What's different about Redis is that the value can be not only a String, but also a List or a Set. You can push/pop elements, perform intersections between sets, and so on. For instance:
&lt;pre class="code"&gt;
LPUSH user_100_messages &amp;quot;Indeed, you are right!&amp;quot;
LPUSH user_100_messages &amp;quot; .... &amp;quot;
LRANGE user_100_messages 0 9
&lt;/pre&gt;
The first two commands push elements onto the head of a list; the third one returns the first 10 elements of the list (the end index of LRANGE is inclusive, so 0 9 means ten elements). You can also add/remove elements from Sets, and ask for the intersection of N sets.
&lt;br/&gt;&lt;br/&gt;
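A small Python model of these list and set commands makes the index convention concrete. The functions lpush, lrange and sinter are hypothetical stand-ins named after the Redis commands and work on a plain dict; unlike real Redis, this lrange sketch only handles non-negative indexes.
&lt;pre class="code"&gt;
```python
def lpush(db, key, value):
    """LPUSH: insert value at the head of the list stored at key."""
    db.setdefault(key, []).insert(0, value)


def lrange(db, key, start, end):
    """LRANGE with non-negative indexes only: end is inclusive, so
    lrange(db, k, 0, 9) returns (at most) the first 10 elements."""
    return db.get(key, [])[start:end + 1]


def sinter(db, *keys):
    """SINTER: intersection of the sets stored at the given keys."""
    sets = [db.get(k, set()) for k in keys]
    return set.intersection(*sets) if sets else set()
```
&lt;/pre&gt;
After pushing 15 messages, lrange(db, "user_100_messages", 0, 9) returns the 10 most recent ones, while 0 10 would return 11.
&lt;br/&gt;&lt;br/&gt;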

The SVN version of Redis supports master &amp;lt;-&amp;gt; slave replication, and I'm working on auto-expiration of values. Redis is young but it will improve, and thanks to skilled people we have clients for Ruby, Python and Erlang, and soon PHP.
&lt;br/&gt;&lt;br/&gt;

You can get more information and the source code here: &lt;a href="http://code.google.com/p/redis/"&gt;http://code.google.com/p/redis/&lt;/a&gt;
&lt;br/&gt;&lt;br/&gt;

If you plan to use it, make sure to subscribe to the Google Group. Another way to stay updated about Redis progress is to &lt;a href="http://twitter.com/antirez"&gt;follow my Twitter account&lt;/a&gt;. All feedback is appreciated.&lt;/div&gt;&lt;div class="blogpostinfo"&gt;&lt;div class="blogpoststats"&gt;post read 9629 times&lt;sup&gt;&lt;a href="/page/uniquevisitors"&gt;*&lt;/a&gt;&lt;/sup&gt; (average 26.0 visits/day)&lt;/div&gt;Posted at 18:23:06 &lt;a href="http://antirez.com/post/Redis-my-new-open-source-project.html"&gt;permalink&lt;/a&gt; | &lt;a href="http://antirez.com/post/Redis-my-new-open-source-project.html"&gt;discuss&lt;/a&gt;&lt;/div&gt;&lt;/div&gt;</description>
   <dc:date>2009-03-06T18:23:06+00:00</dc:date>
  <feedburner:origLink>http://antirez.com/post/Redis-my-new-open-source-project.html</feedburner:origLink></item>
  <item>
   <title>Scalability of today resembles security of ten years ago</title>
   <link>http://feedproxy.google.com/~r/antirez/~3/k-AUOqRVapM/Scalability-of-today-resembles-security-of-ten-years-ago.html</link>
   <guid isPermaLink="false">http://antirez.com/post/190</guid>
   <description>&lt;div class="blogpost"&gt;&lt;div style="clear:both"&gt;&lt;/div&gt;&lt;div class="blogposttext"&gt;About eleven years ago I was 20, and I entered the computer world as a programmer thanks to the fascinating and wonderful field of information security. It was a very effervescent time: a lot of truly smart people and a lot of new things to learn and discover.
&lt;br/&gt;&lt;br/&gt;

Then, within a few years, security became a product. Need security? It's easy: install a firewall, an IDS, follow these guidelines, and blablablabla. A lot of money, a lot of false needs, and of course it didn't work. A few years later the slogan was &lt;i&gt;Security is not a Product but a Process&lt;/i&gt;, and anyway security was no longer fun for me: I had already moved on to programming languages for my hacking sessions, much more fun and a lot less money (tending to zero, actually).
&lt;br/&gt;&lt;br/&gt;

Today the new big word is Scalability. Web-scale scalability, to be more precise, since for the first time scaling is no longer a rare need. Having a big site is enough to put your web servers and databases under stress. Like Security in the past, scalability today looks too much like a recipe: use memcached, replication, try to scale horizontally, use MySQL as a stupid BTREE and use Blob fields to store your data JSON-encoded!
&lt;br/&gt;&lt;br/&gt;

It is very hard to find people talking about time complexity, data structures, and ultimately about the fact that &lt;b&gt;you should store your data in a way that makes it easy to access it the way your application needs it back&lt;/b&gt;. Maybe relational databases put this pattern in our heads: we store data in tables, and we'll see later what queries we need to get it back. But it does not work, just like Security as a Product ten years ago.
&lt;br/&gt;&lt;br/&gt;

&lt;a href="http://www.reddit.com/r/programming/comments/82dg8/scalability_of_today_resembles_security_of_ten/"&gt;vote on reddit&lt;/a&gt;
&lt;br/&gt;&lt;br/&gt;

&lt;img src="http://www.heathsidebuilders.com/images/stairs-long.jpg"&gt;&lt;/div&gt;&lt;div class="blogpostinfo"&gt;&lt;div class="blogpoststats"&gt;post read 8088 times&lt;sup&gt;&lt;a href="/page/uniquevisitors"&gt;*&lt;/a&gt;&lt;/sup&gt; (average 21.8 visits/day)&lt;/div&gt;Posted at 17:04:35 &lt;a href="http://antirez.com/post/Scalability-of-today-resembles-security-of-ten-years-ago.html"&gt;permalink&lt;/a&gt; | &lt;a href="http://antirez.com/post/Scalability-of-today-resembles-security-of-ten-years-ago.html"&gt;1 comment&lt;/a&gt;&lt;/div&gt;&lt;/div&gt;</description>
   <dc:date>2009-03-05T17:04:35+00:00</dc:date>
  <feedburner:origLink>http://antirez.com/post/Scalability-of-today-resembles-security-of-ten-years-ago.html</feedburner:origLink></item>
  <item>
   <title>A missing feature in most dynamic languages: take a random key from a hash table.</title>
   <link>http://feedproxy.google.com/~r/antirez/~3/k7vlbUCBmrc/A-missing-feature-in-most-dynamic-languages%3A-take-a-random-key-from-an-hash-table..html</link>
   <guid isPermaLink="false">http://antirez.com/post/189</guid>
   <description>&lt;div class="blogpost"&gt;&lt;div style="clear:both"&gt;&lt;/div&gt;&lt;div class="blogposttext"&gt;Dynamic languages changed programming in many ways. One of these ways is the rise of associative arrays as a mainstream data structure. Associative arrays, usually implemented via hash tables and also known as hashes or dictionaries, are so convenient and flexible that it is hard to find any non-trivial program written in Ruby or Python that does not use them.
&lt;br/&gt;&lt;br/&gt;

In my opinion the only reason hashes are not used in every non-trivial C program as well is that ANSI C does not provide an implementation of this data structure. It is hard to understand why there was no effort to make things such as dynamic strings, lists and hash tables standard C components. But this is another story... ;)
&lt;h3&gt;The missing feature&lt;/h3&gt;
Given that hashes are implemented using hash tables most of the time, let's focus on this data structure. What's cool about hash tables is that in the average case operations like INSERT, EXISTS? and DELETE are O(1). For example, you can efficiently count the occurrences of every word in a large text, or implement a fast caching system.
&lt;br/&gt;&lt;br/&gt;

Of course, dynamic languages expose all these great features of hash tables through their hash objects. However, under reasonable assumptions, hash tables can support another useful operation in O(1): GET A RANDOM ELEMENT.
&lt;h3&gt;Why is it useful to get a random element?&lt;/h3&gt;
Sometimes a problem needs too much memory or time. Let's go back to our example of counting word occurrences in a text: this is simple because the number of distinct words is finite &lt;i&gt;and&lt;/i&gt; small, so even if the text is very large, the memory we use is proportional to the number of words in the language the text is written in.
&lt;br/&gt;&lt;br/&gt;

What if you want to extract the most common 10-byte sequences of a huge binary file? In theory the problem is the same as word counting, but there are 2^(10*8) = 2^80 possible 10-byte sequences, that is 1208925819614629174706176 different elements. Run against a very large file, our program will exhaust all the available memory in little time.
&lt;h3&gt;Imperfect solutions&lt;/h3&gt;
Still, maybe we don't really need a perfect solution, only one that is good enough: we can trade precision for memory. For example, our program can be modified so that when the hash table reaches 100000 elements, we remove a random element from it. This will not give a perfect solution, and our &amp;quot;top 100 sequences&amp;quot; may change from one run of the program to another, but it is much better than nothing. We can make the algorithm even better: instead of removing a single random element, we pick three random elements and remove the sequence with the lowest count, which makes it less likely that we lose a frequent sequence from our hash table.
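&lt;br/&gt;&lt;br/&gt;
A Python sketch of the sampling variant, using the numbers from the text (100000 entries, 3 samples, top 100) as defaults; approx_top_sequences is a hypothetical name. Python dicts are a case in point for this article: they have no O(1) random-key operation, so the sketch has to copy the keys into a list on every eviction.
&lt;pre class="code"&gt;
```python
import random
from collections import Counter


def approx_top_sequences(data, seq_len=10, max_entries=100_000,
                         samples=3, top=100, seed=0):
    """Approximate the most frequent seq_len-byte sequences in data.

    Memory is bounded: whenever the table grows past max_entries, pick
    `samples` random keys and evict the one with the lowest count.
    """
    rng = random.Random(seed)
    counts = {}
    for i in range(len(data) - seq_len + 1):
        seq = data[i:i + seq_len]
        counts[seq] = counts.get(seq, 0) + 1
        if len(counts) > max_entries:
            # Python dicts have no O(1) "random key" operation, so this
            # sketch copies the keys into a list: O(n) per eviction.
            candidates = rng.sample(list(counts), min(samples, len(counts)))
            victim = min(candidates, key=counts.__getitem__)
            del counts[victim]
    return Counter(counts).most_common(top)
```
&lt;/pre&gt;
Once the table is full, almost every new sequence triggers an eviction, so that O(n) copy dominates the running time; an O(1) random-key primitive would remove exactly that cost.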
&lt;h3&gt;Is 'get a random key' O(1)?&lt;/h3&gt;
So this operation is useful on hash tables, even if most dynamic languages don't expose it in their hash interface. But is it really an O(1) operation? Only if the following holds: the ratio between the hash table size and the number of used buckets always stays within a given range (we also need a decent distribution of elements in the table, but that is an assumption of hash tables anyway).
&lt;br/&gt;&lt;br/&gt;

This always holds with most hash table implementations if you only add elements. The table starts small; once it gets too full, it is resized so that, for example, it is three times larger than the number of elements it contains, and so on. It is easy to see that this way the ratio &lt;i&gt;number-of-buckets&lt;/i&gt;/&lt;i&gt;number-of-elements&lt;/i&gt; always stays within a given range.
&lt;br/&gt;&lt;br/&gt;

But hash tables also support DELETE, and this can affect our operation badly. For example, if I fill the hash table with 10000 elements and then remove all of them but one, the ratio between the table size and the number of elements will be large, and 'get a random element' becomes an O(N) operation, where N is the maximum number of elements the hash table held during its life. Note, however, that with this usage pattern even the memory used by the hash table is no longer proportional to the number of elements, &lt;i&gt;so in the real world many hash table implementations resize the table once the elements are too few compared to its size&lt;/i&gt;.
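&lt;br/&gt;&lt;br/&gt;
The expected cost of the probing loop follows from a short calculation. Let p be the fraction of non-empty buckets; assuming each probe picks a bucket uniformly at random and independently of the others, the number of probes T is geometrically distributed:
&lt;pre class="code"&gt;
```latex
p = \frac{\text{non-empty buckets}}{\text{total buckets}}, \qquad
\Pr[T = k] = (1 - p)^{k-1} p \quad (k = 1, 2, \dots), \qquad
\mathbb{E}[T] = \sum_{k \ge 1} k (1 - p)^{k-1} p = \frac{1}{p}
```
&lt;/pre&gt;
With, say, 30000 buckets (three times 10000) and a single element left, p = 1/30000, so the loop needs 30000 probes on average: the O(N) case above. If instead the table is kept about three times larger than its number of elements, and keys hash uniformly into chained buckets, the fraction of non-empty buckets is about 1 - e^(-1/3), roughly 0.28, so E[T] is roughly 3.5 probes, a constant.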
&lt;h3&gt;The algorithm&lt;/h3&gt;
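In Python-like code, the algorithm to get a random element is as simple as this (table.buckets, table.size and table.used, the number of stored elements, are assumed fields of the hash table):
&lt;pre class="code"&gt;
import random

def get_random_element(table):
    # Test the number of elements, not the number of buckets: a table
    # with buckets but no elements would otherwise loop forever.
    if table.used == 0:
        return None
    while True:
        # Uniform bucket index in 0 .. table.size - 1
        index = random.randrange(table.size)
        if table.buckets[index] is not None:
            return table.buckets[index].key
&lt;/pre&gt;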
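To check the whole argument end to end, here is a self-contained toy table in Python. It is a sketch under its own assumptions, not how Ruby or Python implement their hashes: it uses open addressing, so each live key occupies exactly one slot and probing random slots until a live one is found returns every key with the same probability (with separate chaining, returning the head of a chain would never return keys deeper in the chain). RandomKeyTable is a hypothetical name, and the grow threshold (2/3 full), the shrink threshold (fewer than 1/8 live slots) and the 4x resize target are arbitrary choices.
&lt;pre class="code"&gt;
```python
import random

_EMPTY = object()    # slot never used
_DELETED = object()  # tombstone left behind by a deletion


class RandomKeyTable:
    """Toy open-addressing hash set with an O(1) expected random_key().

    Linear probing, tombstone deletion, and resizing in both directions so
    that at least 1/8 of the slots hold a live key whenever it is non-empty.
    """

    MIN_SIZE = 8

    def __init__(self):
        self.slots = [_EMPTY] * self.MIN_SIZE
        self.used = 0    # live keys
        self.filled = 0  # live keys plus tombstones

    def _live(self, slot):
        return slot is not _EMPTY and slot is not _DELETED

    def _find(self, key):
        """Index of the slot holding key, or None."""
        size = len(self.slots)
        i = hash(key) % size
        while self.slots[i] is not _EMPTY:
            if self._live(self.slots[i]) and self.slots[i] == key:
                return i
            i = (i + 1) % size
        return None

    def _resize(self, new_size):
        keys = [s for s in self.slots if self._live(s)]
        self.slots = [_EMPTY] * new_size
        self.used = self.filled = 0
        for key in keys:
            self.add(key)

    def add(self, key):
        if self._find(key) is not None:
            return
        if (self.filled + 1) * 3 > len(self.slots) * 2:  # keep load at most 2/3
            self._resize(max(self.MIN_SIZE, (self.used + 1) * 4))
        size = len(self.slots)
        i = hash(key) % size
        while self._live(self.slots[i]):  # first empty slot or tombstone
            i = (i + 1) % size
        if self.slots[i] is _EMPTY:
            self.filled += 1
        self.slots[i] = key
        self.used += 1

    def remove(self, key):
        i = self._find(key)
        if i is None:
            return
        self.slots[i] = _DELETED
        self.used -= 1
        # Shrink once fewer than 1/8 of the slots are live, so that
        # random_key() keeps needing a bounded number of probes.
        size = len(self.slots)
        if size > self.MIN_SIZE and size > self.used * 8:
            self._resize(max(self.MIN_SIZE, self.used * 4))

    def random_key(self, rng=random):
        """The article's loop: probe random slots until one holds a key."""
        if self.used == 0:
            return None
        size = len(self.slots)
        while True:
            slot = self.slots[rng.randrange(size)]
            if self._live(slot):
                return slot
```
&lt;/pre&gt;
Because the table shrinks as described in the previous section, at least 1/8 of its slots are live whenever it is non-empty, so random_key() needs at most 8 probes on average, no matter how many elements were deleted.
&lt;br/&gt;&lt;br/&gt;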
It's useful, it's simple, and it's O(1) in a lot of implementations: I hope to see it in Ruby and other dynamic languages ASAP!
&lt;br/&gt;&lt;br/&gt;

&lt;a href="http://www.reddit.com/r/programming/comments/7waot/a_missing_feature_in_most_dynamic_languages_take/"&gt;Vote or comment on this article on reddit!&lt;/a&gt;&lt;/div&gt;&lt;div class="blogpostinfo"&gt;&lt;div class="blogpoststats"&gt;post read 17106 times&lt;sup&gt;&lt;a href="/page/uniquevisitors"&gt;*&lt;/a&gt;&lt;/sup&gt; (average 43.3 visits/day)&lt;/div&gt;Posted at 15:37:24 &lt;a href="http://antirez.com/post/A-missing-feature-in-most-dynamic-languages%3A-take-a-random-key-from-an-hash-table..html"&gt;permalink&lt;/a&gt; | &lt;a href="http://antirez.com/post/A-missing-feature-in-most-dynamic-languages%3A-take-a-random-key-from-an-hash-table..html"&gt;12 comments&lt;/a&gt;&lt;/div&gt;&lt;/div&gt;</description>
   <dc:date>2009-02-10T15:37:24+00:00</dc:date>
  <feedburner:origLink>http://antirez.com/post/A-missing-feature-in-most-dynamic-languages%3A-take-a-random-key-from-an-hash-table..html</feedburner:origLink></item>
  <item>
   <title>[Italian] If you read Italian and are interested in what I write...</title>
   <link>http://feedproxy.google.com/~r/antirez/~3/fIZdoE6YWJA/se-leggi-italiano-e-ti-interessa-cosa-scrivo.html</link>
   <guid isPermaLink="false">http://antirez.com/post/188</guid>
   <description>&lt;div class="blogpost"&gt;&lt;div style="clear:both"&gt;&lt;/div&gt;&lt;div class="blogposttext"&gt;... then you should add &lt;a href="http://zzimma.antirez.com/rss"&gt;the feed address&lt;/a&gt; of &lt;a href="http://zzimma.antirez.com"&gt;my Italian blog&lt;/a&gt;, where I post regularly, to your feed reader. Things will show up here from time to time as well, but in English and meant for a wider audience. Thanks.&lt;/div&gt;&lt;div class="blogpostinfo"&gt;&lt;div class="blogpoststats"&gt;post read 15194 times&lt;sup&gt;&lt;a href="/page/uniquevisitors"&gt;*&lt;/a&gt;&lt;/sup&gt; (average 28.0 visits/day)&lt;/div&gt;Posted at 10:10:06 &lt;a href="http://antirez.com/post/se-leggi-italiano-e-ti-interessa-cosa-scrivo.html"&gt;permalink&lt;/a&gt; | &lt;a href="http://antirez.com/post/se-leggi-italiano-e-ti-interessa-cosa-scrivo.html"&gt;6 comments&lt;/a&gt;&lt;/div&gt;&lt;/div&gt;</description>
   <dc:date>2008-09-15T10:10:06+00:00</dc:date>
  <feedburner:origLink>http://antirez.com/post/se-leggi-italiano-e-ti-interessa-cosa-scrivo.html</feedburner:origLink></item>
  <item>
   <title>Developing for the iphone using the open toolchain and SDK 2.0 headers</title>
   <link>http://feedproxy.google.com/~r/antirez/~3/7QV-KRQUIFQ/iphone-gcc-guide.html</link>
   <guid isPermaLink="false">http://antirez.com/post/187</guid>
   <description>&lt;div class="blogpost"&gt;&lt;div style="clear:both"&gt;&lt;/div&gt;&lt;div class="blogposttext"&gt;&lt;h3&gt;Release notes&lt;/h3&gt;
&lt;ul&gt;&lt;li&gt;26 Sep: added a warning about the need to install at least one application (even a free one) from the App Store before trying this guide.&lt;/li&gt;

&lt;li&gt;22 Sep: added instructions for iPhones with the 2.1 firmware (and the SpringBoard cache), plus some information that was missing.&lt;/li&gt;

&lt;/ul&gt;
&lt;h3&gt;Introduction: iPhone development and the Open Toolchain&lt;/h3&gt;
It is possible to write native applications for the iPhone in Objective-C, but to do so you need a Mac with the SDK downloaded from Apple: you need a Mac OS X installation, you must be a registered Apple developer, download the SDK, use the graphical environment, and so on.
&lt;br/&gt;&lt;br/&gt;

Fortunately there is an alternative, the Open Toolchain: a special GCC able to emit code for the iPhone CPU, plus other tools to create the final executable.
&lt;br/&gt;&lt;br/&gt;

The Open Toolchain can be used from your Linux box, and this is how most developers use it. This guide, instead, is about using the Open Toolchain &lt;b&gt;directly inside the iPhone&lt;/b&gt;!
&lt;br/&gt;&lt;br/&gt;

The iPhone actually runs a Unix-like operating system, and the compiler has been ported to the iPhone itself. You can use SSH to work inside the iPhone, editing files with VIM, compiling with this GCC, and installing the resulting binary to test it on the real device.
&lt;br/&gt;&lt;br/&gt;

The rest of this document explains how to set up a development environment inside the iPhone, what you need to do and the files you need to download, and finally provides a Hello World example so that you can test something quickly and have a starting point to modify.
&lt;h3&gt;What you need&lt;/h3&gt;
&lt;ul&gt;&lt;li&gt;a jailbroken iPhone. To jailbreak it, check QuickPwn at &lt;a href="http://blog.iphone-dev.org/"&gt;blog.iphone-dev.org&lt;/a&gt;. &lt;b&gt;Warning: the jailbreaking process lets your phone run binaries not signed by Apple, but it may damage your phone. If you have problems and ruin your phone, I'm not responsible in any way&lt;/b&gt;.&lt;/li&gt;

&lt;li&gt;Cydia (basically an apt-get for the iPhone: a program that installs other programs). QuickPwn installs Cydia; just make sure to select it during the jailbreaking process.&lt;/li&gt;

&lt;li&gt;SDK 2.0 headers. GCC alone is not enough: you need the header files from the Apple iPhone SDK. Just download &lt;a href="http://www.demonoid.com/files/details/1613408/?rel=1220902039"&gt;this torrent&lt;/a&gt; or use &lt;a href="http://www.megaupload.com/?d=55ZNOCKI"&gt;this link from megaupload&lt;/a&gt; instead; what you need is the &lt;b&gt;sdk-2.0-headers.tar.gz&lt;/b&gt; file inside. The alternative way to obtain the headers is to download the full SDK and extract them from the SDK image. That is pretty hard and boring, so the torrent is a better solution for now.&lt;/li&gt;

&lt;li&gt;&lt;b&gt;Warning: YOU NEED TO INSTALL SOMETHING FROM THE APPSTORE!&lt;/b&gt; Even a free application is OK; otherwise, for some reason, your iPhone will not execute your compiled binaries.&lt;/li&gt;

&lt;/ul&gt;
&lt;h3&gt;Ok, now I assume you did the jailbreak, and your iPhone is free :)&lt;/h3&gt;
Perfect, we are ready! Your iPhone is jailbroken and everything is working fine. You can see the new Cydia application on your iPhone.
&lt;br/&gt;&lt;br/&gt;

Now it's time to connect the phone to a Wi-Fi network: Cydia needs to download from the Internet the packages it will install on your phone. This is what you need to do:
&lt;ul&gt;&lt;li&gt;Open Cydia and install &lt;b&gt;GCC&lt;/b&gt;. If you can't find it, just use the &lt;i&gt;search&lt;/i&gt; feature of Cydia. Then install &lt;b&gt;make&lt;/b&gt; and &lt;b&gt;ldid&lt;/b&gt; the same way (ldid is used to pseudo-sign binaries).&lt;/li&gt;

&lt;li&gt;Install SSH and BossPref (again from Cydia).&lt;/li&gt;

&lt;/ul&gt;
Well, first stage complete! Now you should be able to log into your
phone using SSH.
&lt;div class="emph"&gt;
Note that Cydia also installs a lot of other things when you run it the first time: all the standard Unix command line utilities including the 'make' command that we will use later to build our Hello World application.
&lt;/div&gt;
&lt;h3&gt;Let's log into the iPhone via SSH&lt;/h3&gt;
OK, now exit Cydia, open BossPref and check the IP address of your phone, which is shown on the BossPref start page. While in BossPref, also enable SSH.
&lt;br/&gt;&lt;br/&gt;

&lt;img src="http://antirez.com/misc/bosstool.jpg" /&gt;
&lt;br/&gt;&lt;br/&gt;

As you can see in the example, the IP address of the Wi-Fi interface
is 192.168.1.101. So... we are ready to log into the iPhone!
From your Linux box try this (Windows users can use
the PuTTY program instead):
&lt;pre class="code"&gt;
ssh root@192.168.1.101
&lt;/pre&gt;
Of course, use your iPhone's IP address instead of 192.168.1.101 ;)
If you are using PuTTY, use the IP address as the hostname and &amp;quot;root&amp;quot; as the username.
&lt;br/&gt;&lt;br/&gt;

&lt;b&gt;The default root password is &amp;quot;alpine&amp;quot;&lt;/b&gt;
&lt;br/&gt;&lt;br/&gt;

Use it to log into the iPhone.
Does it work? Uaaaaaaaaoo!!! This is a good start.
&lt;br/&gt;&lt;br/&gt;

Now it's time to transfer the SDK 2.0 headers to the iPhone.
To do so, use the scp program (Linux users) or SFTP (Windows users):
&lt;pre class="code"&gt;
scp sdk-2.0-headers.tar.gz root@192.168.1.101:/var
&lt;/pre&gt;
If you are a Windows user, make sure to copy the file into the &lt;b&gt;/var&lt;/b&gt; folder.
&lt;br/&gt;&lt;br/&gt;

OK, now log into your phone via SSH again and untar the headers.
These are the commands you need to run:
&lt;pre class="code"&gt;
ssh root@192.168.1.101 # Or use PuTTY if you are a Windows guy
cd /var
tar xvzf sdk-2.0-headers.tar.gz
mv include-2.0-sdk-ready-for-iphone include
&lt;/pre&gt;
Ok... now you should be ready to start compiling and testing applications.
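Before moving on, it can be handy to check that everything is in place. Below is a minimal POSIX shell sketch to run on the phone over SSH; check_toolchain and the HEADERS_DIR override are hypothetical names, and the check only verifies that gcc, make and ldid are on the PATH and that the /var/include directory exists, not that the tools actually work.

```shell
# Hypothetical helper: report which build tools are on the PATH and whether
# the SDK headers directory exists. Returns non-zero if anything is missing.
check_toolchain() {
    if [ "$#" -eq 0 ]; then
        set -- gcc make ldid          # the tools installed from Cydia above
    fi
    status=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>/dev/null; then
            echo "ok:      $tool"
        else
            echo "missing: $tool"
            status=1
        fi
    done
    # The headers were unpacked to /var/include; HEADERS_DIR overrides it.
    if [ ! -d "${HEADERS_DIR:-/var/include}" ]; then
        echo "missing: ${HEADERS_DIR:-/var/include}"
        status=1
    fi
    return "$status"
}
```

For example, running check_toolchain with no arguments prints one line per tool and returns a non-zero status if something is missing.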
&lt;h3&gt;Hello World!&lt;/h3&gt;
I hate tutorials that explain how to set up an environment without giving at least a small example, so here are the source code and the steps to create a Hello World Objective-C application, build it, install it and run it.
While I'm at it, I'll try to explain a bit how the Hello World program actually works, and the anatomy of an iPhone application.
&lt;h3&gt;Anatomy of an iPhone application&lt;/h3&gt;
Fortunately, Apple's designers took the clean and easy path of putting everything about a given application inside a single directory, a &lt;i&gt;self-contained&lt;/i&gt; world where the application keeps its executable, icons, sounds and, generally, all the data it needs in order to run.
&lt;br/&gt;&lt;br/&gt;

For example, our Hello World program will be installed on the iPhone under
the /Applications/HelloWorld.app directory. This is the layout of the HelloWorld.app directory:
&lt;pre class="code"&gt;
# ls -l HelloWorld.app/
-rw-r--r-- 1 root admin   108 Sep 11 12:45 Default.png
-rwxr-xr-x 1 root admin 14192 Sep 11 12:45 HelloWorld*
-rw-r--r-- 1 root admin   812 Sep 11 12:45 Info.plist
-rw-r--r-- 1 root admin     9 Sep 11 12:45 PkgInfo
-rw-r--r-- 1 root admin  2399 Sep 11 12:45 icon.png
&lt;/pre&gt;
These files are the minimal set you'll find inside a working application directory:
&lt;ul&gt;&lt;li&gt;Default.png is a PNG image that is shown on the screen while the application is loading. In our example it's just an empty PNG.&lt;/li&gt;

&lt;li&gt;HelloWorld is the binary executable, the result of compiling our Objective-C source code.&lt;/li&gt;

&lt;li&gt;Info.plist is an XML file that specifies the program name, version and other information about our program. This is what it looks like (it's not too complex):&lt;/li&gt;

&lt;pre class="code"&gt;
&amp;lt;?xml version=&amp;quot;1.0&amp;quot; encoding=&amp;quot;UTF-8&amp;quot;?&amp;gt;
&amp;lt;!DOCTYPE plist PUBLIC &amp;quot;-//Apple Computer//DTD PLIST 1.0//EN&amp;quot; &amp;quot;http://www.apple.com/DTDs/PropertyList-1.0.dtd&amp;quot;&amp;gt;
&amp;lt;plist version=&amp;quot;1.0&amp;quot;&amp;gt;
&amp;lt;dict&amp;gt;
        &amp;lt;key&amp;gt;CFBundleDevelopmentRegion&amp;lt;/key&amp;gt;
        &amp;lt;string&amp;gt;en&amp;lt;/string&amp;gt;
        &amp;lt;key&amp;gt;CFBundleExecutable&amp;lt;/key&amp;gt;
        &amp;lt;string&amp;gt;HelloWorld&amp;lt;/string&amp;gt;
        &amp;lt;key&amp;gt;CFBundleIdentifier&amp;lt;/key&amp;gt;
        &amp;lt;string&amp;gt;org.iphone.HelloWorldapp&amp;lt;/string&amp;gt;
        &amp;lt;key&amp;gt;CFBundleInfoDictionaryVersion&amp;lt;/key&amp;gt;
        &amp;lt;string&amp;gt;6.0&amp;lt;/string&amp;gt;
        &amp;lt;key&amp;gt;CFBundleName&amp;lt;/key&amp;gt;
        &amp;lt;string&amp;gt;HelloWorld&amp;lt;/string&amp;gt;
        &amp;lt;key&amp;gt;CFBundlePackageType&amp;lt;/key&amp;gt;
        &amp;lt;string&amp;gt;APPL&amp;lt;/string&amp;gt;
        &amp;lt;key&amp;gt;CFBundleShortVersionString&amp;lt;/key&amp;gt;
        &amp;lt;string&amp;gt;1.0.0&amp;lt;/string&amp;gt;
        &amp;lt;key&amp;gt;CFBundleSignature&amp;lt;/key&amp;gt;
        &amp;lt;string&amp;gt;????&amp;lt;/string&amp;gt;
        &amp;lt;key&amp;gt;CFBundleVersion&amp;lt;/key&amp;gt;
        &amp;lt;string&amp;gt;1.0&amp;lt;/string&amp;gt;
        &amp;lt;key&amp;gt;SignerIdentity&amp;lt;/key&amp;gt;
        &amp;lt;string&amp;gt;Apple iPhone OS Application Signing&amp;lt;/string&amp;gt;
&amp;lt;/dict&amp;gt;
&amp;lt;/plist&amp;gt;
&lt;/pre&gt;
&lt;/ul&gt;
&lt;br/&gt;&lt;br/&gt;

&lt;div class="emph"&gt;
You'll need to create a new Info.plist file for your new applications, but for the Hello World example this one is already good and working.
&lt;/div&gt;
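In the example Info.plist the only application-specific values are the HelloWorld strings (CFBundleExecutable, CFBundleName and the org.iphone.HelloWorldapp identifier), so one low-tech way to create a plist for a new application is to copy this one with the name substituted. This is a sketch under that assumption: make_info_plist is a hypothetical helper, and the new name is restricted to letters and digits so it can be passed safely to sed.

```shell
# Hypothetical helper: derive a new Info.plist from the Hello World one by
# replacing every occurrence of HelloWorld with the new application name.
make_info_plist() {
    src=$1     # existing Info.plist, e.g. HelloWorld/Info.plist
    name=$2    # new application name (letters and digits only)
    dest=$3    # where to write the new Info.plist
    case $name in
        ''|*[!A-Za-z0-9]*)
            echo "invalid name: $name"
            return 1 ;;
    esac
    sed "s/HelloWorld/$name/g" "$src" > "$dest"
}
```

For example, make_info_plist HelloWorld/Info.plist MyApp MyApp/Info.plist. The version strings and the other keys are copied unchanged, so edit them by hand if needed.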
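&lt;ul&gt;&lt;li&gt;PkgInfo: I don't know what this file is supposed to do; I just grabbed it from another Hello World example, and all it contains is the string &amp;quot;APPL????&amp;quot;, which is the CFBundlePackageType value (APPL) followed by the CFBundleSignature value (????) from Info.plist.&lt;/li&gt;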

&lt;li&gt;icon.png: your application icon; this is what you will see in the SpringBoard after the program is installed.&lt;/li&gt;

&lt;/ul&gt;
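The lists above double as a checklist. The following POSIX shell sketch (check_bundle is a hypothetical name) verifies that an .app directory contains Default.png, Info.plist, PkgInfo and icon.png, plus an executable. It assumes, as in the example, that the executable is named after the directory (HelloWorld.app contains HelloWorld); in general the name is the one given by CFBundleExecutable in Info.plist.

```shell
# Hypothetical helper: check an .app directory for the minimal set of files.
# Assumes the executable is named after the directory, as in the example.
check_bundle() {
    app=${1%/}                      # strip a trailing slash, if any
    exe=$(basename "$app" .app)     # HelloWorld.app -> HelloWorld
    status=0
    for f in Default.png Info.plist PkgInfo icon.png; do
        if [ ! -f "$app/$f" ]; then
            echo "missing file: $f"
            status=1
        fi
    done
    if [ ! -x "$app/$exe" ]; then
        echo "missing or not executable: $exe"
        status=1
    fi
    return "$status"
}
```

For example, check_bundle /Applications/HelloWorld.app prints nothing and returns 0 when the bundle is complete.&lt;br/&gt;&lt;br/&gt;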
Basically, all that's needed to install a new application is to copy the YourApplication.app directory into /Applications and restart the SpringBoard using the command &lt;b&gt;killall SpringBoard&lt;/b&gt;.
&lt;h3&gt;Source code layout&lt;/h3&gt;
What we saw in the last section is what the &lt;i&gt;installed application&lt;/i&gt; looks like. But we also need to deal with another directory layout, the one of our source code... Before continuing, you should &lt;a href="http://antirez.com/misc/iphone-helloworld-1.tar.gz"&gt;download our first Hello World example&lt;/a&gt; and extract it on the iPhone under the /var/mobile/src directory. The first step is to create the directory on the iPhone, so, as usual, log into your phone via SSH and type:
&lt;pre class="code"&gt;
mkdir /var/mobile/src
&lt;/pre&gt;
Then from your Linux box (or using SFTP from Windows) use the following command to transfer the file into /var/mobile/src:
&lt;pre class="code"&gt;
scp iphone-helloworld-1.tar.gz root@192.168.1.101:/var/mobile/src
&lt;/pre&gt;
Then log into your iPhone and use the following commands to extract the example:
&lt;pre class="code"&gt;
cd /var/mobile/src
tar xvzf iphone-helloworld-1.tar.gz
&lt;/pre&gt;
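If you do this often, the three steps above (creating the directory, copying the tarball and extracting it) can be run from the Linux box with one function. This is a sketch, not part of the example package: push_tarball, IPHONE and RUN are hypothetical names, IPHONE defaults to the example address 192.168.1.101, and setting RUN=echo turns it into a dry run that only prints the commands.

```shell
# Hypothetical settings: the phone's address, and RUN=echo for a dry run.
IPHONE=${IPHONE:-192.168.1.101}
RUN=${RUN:-}

# Create the remote directory, copy the tarball there and extract it.
push_tarball() {
    tarball=$1                  # local file, e.g. iphone-helloworld-1.tar.gz
    dir=$2                      # remote directory, e.g. /var/mobile/src
    name=$(basename "$tarball")
    $RUN ssh "root@$IPHONE" "mkdir -p $dir" || return 1
    $RUN scp "$tarball" "root@$IPHONE:$dir" || return 1
    $RUN ssh "root@$IPHONE" "cd $dir || exit 1; tar xvzf $name"
}
```

For example, push_tarball iphone-helloworld-1.tar.gz /var/mobile/src. Each ssh and scp call asks for the root password unless you set up SSH keys. The same helper works for the headers (push_tarball sdk-2.0-headers.tar.gz /var), after which you still need the mv command shown earlier.&lt;br/&gt;&lt;br/&gt;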
Now our first example is on the phone, ready to be compiled! But... hold your impatience for a moment and let's take a look at our source structure:
&lt;pre class="code"&gt;
ls -l HelloWorld
drwxr-xr-x 2 antirez antirez 4096 2008-09-11 12:47 build/
drwxr-xr-x 2 antirez antirez 4096 2008-09-11 12:47 Classes/
-rw-r--r-- 1 antirez antirez  812 2008-09-11 12:46 Info.plist
-rw-r--r-- 1 antirez antirez 2301 2008-09-11 12:46 Makefile
drwxr-xr-x 2 antirez antirez 4096 2008-09-11 12:46 Resources/
&lt;/pre&gt;
As you can see, there are several directories and files:
&lt;ul&gt;&lt;li&gt;build: the directory where the &lt;i&gt;make&lt;/i&gt; targets create the HelloWorld.app directory with everything it needs, so that &lt;i&gt;make install&lt;/i&gt; can copy the final directory under /Applications.&lt;/li&gt;

&lt;li&gt;Classes: where our source code lives (.m and .h files; Objective-C source files use the .m extension instead of .c).&lt;/li&gt;

&lt;li&gt;Info.plist... you already know it from the HelloWorld.app directory.&lt;/li&gt;

&lt;li&gt;Makefile contains the make targets to compile the program, create the build, install, and so on.&lt;/li&gt;

&lt;li&gt;Resources are just data files needed to create the final build, in our case just the two images Default.png and icon.png.&lt;/li&gt;

&lt;/ul&gt;
&lt;h3&gt;It's time to test the example&lt;/h3&gt;
OK, it's time to compile and try it... so in your shell use the &amp;quot;cd&amp;quot; command to enter the HelloWorld directory and run these commands:
&lt;pre class="code"&gt;
make
make install
&lt;/pre&gt;
Since make install also restarts the SpringBoard, you will see the progress indicator on your iPhone screen spinning for a few seconds (it may even take almost a minute if you have a lot of apps installed), and finally you should see the &amp;quot;HelloWorld&amp;quot; application icon... launch it! This is what you should see:
&lt;br/&gt;&lt;br/&gt;

&lt;img src="http://antirez.com/misc/iphone-helloworld-1.jpg" /&gt;
&lt;div class="emph"&gt;
&lt;b&gt;Important, 2.1 firmware users read this:&lt;/b&gt; due to changes in the iPhone SpringBoard, the iPhone now keeps a cache of applications, so just restarting the SpringBoard after &lt;i&gt;make install&lt;/i&gt; is not enough to show your new icon. Use the following command instead:
&lt;pre class="code"&gt;
/Applications/BossPrefs.app/Respring
&lt;/pre&gt;
(Needless to say, you need BossPrefs installed to issue this command ;)
&lt;br/&gt;&lt;br/&gt;

This forces the iPhone to notice that there is something new inside /Applications.
&lt;/div&gt;
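For reference, the manual install described earlier (copy the .app directory into /Applications, then refresh the SpringBoard) can be sketched as a POSIX shell function to run on the phone. This is an approximation, not the example's Makefile, which may do things differently: install_app, APPS_DIR and RESPRING are hypothetical names, and the function calls the BossPrefs Respring command when it is installed, falling back to killall SpringBoard otherwise.

```shell
# Hypothetical overrides, handy for testing somewhere other than the phone.
APPS_DIR=${APPS_DIR:-/Applications}
RESPRING=${RESPRING:-/Applications/BossPrefs.app/Respring}

# Copy a bundle into the applications directory, then refresh the SpringBoard.
install_app() {
    app=${1%/}                  # e.g. build/HelloWorld.app
    if [ ! -d "$app" ]; then
        echo "not a directory: $app"
        return 1
    fi
    cp -R "$app" "$APPS_DIR/" || return 1
    if [ -x "$RESPRING" ]; then
        "$RESPRING"             # needed on 2.1 firmware, see the note above
    else
        killall SpringBoard
    fi
}
```

For example, from the HelloWorld directory: install_app build/HelloWorld.app. Files removed from the bundle are not deleted from the installed copy, since cp only adds and overwrites.&lt;br/&gt;&lt;br/&gt;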
Herm... You're impressed, aren't you?! :)
OK, it's lame, but it's a good start, because the code is trivial and you can use it as a starting point for your hacks.
&lt;br/&gt;&lt;br/&gt;

&lt;h3&gt;Making the modify, recompile, install, test cycle a bit more fun&lt;/h3&gt;
&lt;br/&gt;&lt;br/&gt;

WORK IN PROGRESS...
&lt;h3&gt;Translations&lt;/h3&gt;
&lt;ul&gt;&lt;li&gt;&lt;a href="http://taraska.ru/?p=5"&gt;Russian translation&lt;/a&gt;&lt;/li&gt;

&lt;/ul&gt;
&lt;h3&gt;License&lt;/h3&gt;
&lt;a rel="license" href="http://creativecommons.org/licenses/by-nc/3.0/us/"&gt;&lt;img alt="Creative Commons License" style="border-width:0" src="http://i.creativecommons.org/l/by-nc/3.0/us/88x31.png" /&gt;&lt;/a&gt;&lt;br /&gt;This work is licensed under a &lt;a rel="license" href="http://creativecommons.org/licenses/by-nc/3.0/us/"&gt;Creative Commons Attribution-Noncommercial 3.0 United States License&lt;/a&gt;.&lt;/div&gt;&lt;div class="blogpostinfo"&gt;Page created on Monday, 08 September 08&lt;/div&gt;</description>
   <dc:date>2008-09-08T19:35:17+00:00</dc:date>
  <feedburner:origLink>http://antirez.com/page/iphone-gcc-guide.html</feedburner:origLink></item>
  <item>
   <title>Liquida.it is out</title>
   <link>http://feedproxy.google.com/~r/antirez/~3/Fx5UzrOAyEk/liquida-it-is-out.html</link>
   <guid isPermaLink="false">http://antirez.com/post/186</guid>
   <description>&lt;div class="blogpost"&gt;&lt;div style="clear:both"&gt;&lt;/div&gt;&lt;div class="blogposttext"&gt;I wrote my thoughts about Liquida in Italian in my &lt;a href="http://zzimma.antirez.com/post/pensieri-sparsi-su-liquida.html"&gt;other blog&lt;/a&gt;.&lt;/div&gt;&lt;div class="blogpostinfo"&gt;&lt;div class="blogpoststats"&gt;post read 13351 times&lt;sup&gt;&lt;a href="/page/uniquevisitors"&gt;*&lt;/a&gt;&lt;/sup&gt; (average 23.8 visits/day)&lt;/div&gt;Posted at 14:14:26 &lt;a href="http://antirez.com/post/liquida-it-is-out.html"&gt;permalink&lt;/a&gt; | &lt;a href="http://antirez.com/post/liquida-it-is-out.html"&gt;discuss&lt;/a&gt;&lt;/div&gt;&lt;/div&gt;</description>
   <dc:date>2008-08-27T14:14:26+00:00</dc:date>
  <feedburner:origLink>http://antirez.com/post/liquida-it-is-out.html</feedburner:origLink></item>
  <item>
   <title>Zzimma, my new blog</title>
   <link>http://feedproxy.google.com/~r/antirez/~3/4uQ09YkFGJ8/zzimma-my-new-blog.html</link>
   <guid isPermaLink="false">http://antirez.com/post/185</guid>
   <description>&lt;div class="blogpost"&gt;&lt;div style="clear:both"&gt;&lt;/div&gt;&lt;div class="blogposttext"&gt;&lt;a href="http://zzimma.antirez.com"&gt;Zzimma is my new blog&lt;/a&gt; in addition to this one. It is in Italian and it is about everything. The barrier to posting here was a bit too high to have just this blog: too technical, too English. So I'll use this blog to post the tech stuff I want to share with the reddit community, and the new blog for everything else. See you there.&lt;/div&gt;&lt;div class="blogpostinfo"&gt;&lt;div class="blogpoststats"&gt;post read 12803 times&lt;sup&gt;&lt;a href="/page/uniquevisitors"&gt;*&lt;/a&gt;&lt;/sup&gt; (average 21.6 visits/day)&lt;/div&gt;Posted at 07:36:00 &lt;a href="http://antirez.com/post/zzimma-my-new-blog.html"&gt;permalink&lt;/a&gt; | &lt;a href="http://antirez.com/post/zzimma-my-new-blog.html"&gt;2 comments&lt;/a&gt;&lt;/div&gt;&lt;/div&gt;</description>
   <dc:date>2008-07-28T07:36:00+00:00</dc:date>
  <feedburner:origLink>http://antirez.com/post/zzimma-my-new-blog.html</feedburner:origLink></item>
  <item>
   <title>tar.gz is the best package format for complex programs</title>
   <link>http://feedproxy.google.com/~r/antirez/~3/2RcJAQkpET8/tar-gz-is-the-best-package-format-for-complex-programs.html</link>
   <guid isPermaLink="false">http://antirez.com/post/184</guid>
   <description>&lt;div class="blogpost"&gt;&lt;div style="clear:both"&gt;&lt;/div&gt;&lt;div class="blogposttext"&gt;There are two kinds of Unix programs. The first kind are programs with few dependencies that compile out of the box with GCC and some basic libraries, like zlib and other libs that are installed by default in almost every Unix distribution. For this kind of application a binary distribution is not so important; the real problem is complex programs with tons of dependencies, like Firefox, aMule, OpenOffice and a lot of less famous Unix applications. Compiling these applications is hard for the newbie and tedious for the expert Unix user, so a binary distribution is mandatory if the
project's goal is to reach a big user base.
&lt;br/&gt;&lt;br/&gt;

To distribute complex programs in binary form there are two widely adopted alternatives: ship a tar.gz with a statically linked binary, plus the other library dependencies bundled with the program and linked at run time with some linker trick when you execute it (this is what Firefox does), or ship a package for every well-known distribution out there.
&lt;br/&gt;&lt;br/&gt;
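To make the linker trick concrete: on Linux a common way is to put the bundled libraries in front of the library search path with the LD_LIBRARY_PATH environment variable before starting the real binary. The following POSIX shell sketch assumes a hypothetical layout with bin/ and lib/ at the top of the unpacked tar.gz; run_bundled is a made-up name, and this is not Firefox's actual launcher.

```shell
# Hypothetical launcher: prefer the libraries shipped in $top/lib, then
# replace the current shell with the real program.
run_bundled() {
    top=$1      # directory where the tar.gz was unpacked
    prog=$2     # program name inside $top/bin
    shift 2
    LD_LIBRARY_PATH="$top/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    export LD_LIBRARY_PATH
    exec "$top/bin/$prog" "$@"
}
```

A launcher script at the top of the tarball could then end with run_bundled "$(dirname "$0")" myprogram "$@", where myprogram is a placeholder for the real binary name.&lt;br/&gt;&lt;br/&gt;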

Of course there are projects that do both, which is the best solution, but
it requires a lot of work. The point of this article is that projects that need to select just one binary shipping method &lt;b&gt;should choose the all-inside-tar-gz&lt;/b&gt;, and for good reasons.
&lt;br/&gt;&lt;br/&gt;

The tar.gz will work almost everywhere: all that's needed is to download it, unpack it anywhere and execute the program. Done! It is as simple as with Windows programs like uTorrent. It also makes it very easy to run different versions of the same program on the same box, which, for example, encourages
users to try beta versions and give feedback. It is much
simpler from the developers' point of view too: handling all the
major distributions out there is hard, so in order to provide
distribution-specific packages developers need to do a lot of work (which is usually not possible) or rely on the help of the community
(which gives you very little quality control). And you will likely not cover 100% of users even with the best of efforts.
&lt;br/&gt;&lt;br/&gt;

What I think is that the Unix world really needs a standard way of packaging
applications that run out of the box with everything inside the binary
directory. Until that is possible, at least the authors of big projects
should be encouraged to ship easy-to-run binary packages with everything inside
the tar.gz.&lt;/div&gt;&lt;div class="blogpostinfo"&gt;&lt;div class="blogpoststats"&gt;post read 11390 times&lt;sup&gt;&lt;a href="/page/uniquevisitors"&gt;*&lt;/a&gt;&lt;/sup&gt; (average 19.2 visits/day)&lt;/div&gt;Posted at 12:00:15 &lt;a href="http://antirez.com/post/tar-gz-is-the-best-package-format-for-complex-programs.html"&gt;permalink&lt;/a&gt; | &lt;a href="http://antirez.com/post/tar-gz-is-the-best-package-format-for-complex-programs.html"&gt;4 comments&lt;/a&gt;&lt;/div&gt;&lt;/div&gt;</description>
   <dc:date>2008-07-26T12:00:15+00:00</dc:date>
  <feedburner:origLink>http://antirez.com/post/tar-gz-is-the-best-package-format-for-complex-programs.html</feedburner:origLink></item>
 </channel>
</rss>
