<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">
 <channel>
  <title>antirez weblog</title>
  <link>http://antirez.com</link>
  <description>antirez weblog</description>
  <language>it-it</language>
  <atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.feedburner.com/antirez" /><feedburner:info uri="antirez" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><feedburner:emailServiceId>antirez</feedburner:emailServiceId><feedburner:feedburnerHostname>http://feedburner.google.com</feedburner:feedburnerHostname><item>
   <title>Redis persistence demystified</title>
   <link>http://feedproxy.google.com/~r/antirez/~3/lnXTUqToF-8/redis-persistence-demystified.html</link>
   <guid isPermaLink="false">http://antirez.com/post/251</guid>
   <description>&lt;div class="blogpost"&gt;&lt;div style="clear:both"&gt;&lt;/div&gt;&lt;div class="blogposttext"&gt;Part of my work on Redis is reading blog posts, forum messages, and the twitter time line for the &amp;quot;Redis&amp;quot; search. It is very important for a developer to have a feeling about what the community of users, and the community of &lt;i&gt;non&lt;/i&gt; users, think about the product he is developing. And my feeling is that there is no Redis feature that is as misunderstood as its persistence.
&lt;br/&gt;&lt;br/&gt;

In this blog post I'll make an effort to be truly impartial: no advertising of Redis, no attempt to skip the details that may put Redis in a &lt;i&gt;bad light&lt;/i&gt;. All I want is to provide a clear, understandable picture of how Redis persistence works, how reliable it is, and how it compares to other database systems.
&lt;h3&gt;The OS and the disk&lt;/h3&gt;
The first thing to consider is what we can expect from a database in terms of durability.
In order to do so we can visualize what happens during a simple write operation:
&lt;ul&gt;&lt;li&gt;1: The client sends a write command to the database (data is in client's memory).&lt;/li&gt;

&lt;li&gt;2: The database receives the write (data is in server's memory).&lt;/li&gt;

&lt;li&gt;3: The database calls the system call that writes the data on disk (data is in the kernel's buffer).&lt;/li&gt;

&lt;li&gt;4: The operating system transfers the write buffer to the disk controller (data is in the disk cache).&lt;/li&gt;

&lt;li&gt;5: The disk controller actually writes the data into a physical media (a magnetic disk, a Nand chip, ...).&lt;/li&gt;

&lt;/ul&gt;
&lt;br/&gt;&lt;br/&gt;

&lt;div class="emph"&gt;
Note: the above &lt;b&gt;is an oversimplification&lt;/b&gt; in many ways, because there are more levels of caching and buffers than that.
&lt;br/&gt;&lt;br/&gt;

Step 2 is often implemented as a complex caching system inside the database implementation, and sometimes writes are handled by a different thread or process. However, sooner or later the database will have to write data to disk, and this is what matters from our point of view. That is, data from memory has to be transmitted to the kernel (step 3) at some point.
&lt;br/&gt;&lt;br/&gt;

Another big omission of details is about step 3. The reality is more complex since most advanced kernels implement different layers of caching, usually the file system level caching (called the &lt;i&gt;page cache&lt;/i&gt; in Linux) and a smaller &lt;i&gt;buffer cache&lt;/i&gt; that holds the data waiting to be committed to the disk. Using special APIs it is possible to bypass both (see for instance the O_DIRECT and O_SYNC flags of the open system call on Linux). However, from our point of view we can consider this a single layer of opaque caching (that is, we don't know the details). It is enough to say that the page cache is often disabled when the database already implements its own caching, so that the database and the kernel do not try to do the same work at the same time (with bad results). The buffer cache is usually never turned off, because then every write to the file would result in data being committed to the disk, which is too slow for almost all applications.
&lt;br/&gt;&lt;br/&gt;

What databases usually do instead is call system calls that commit the buffer cache to the disk only when absolutely needed, as we'll see later in more detail.
&lt;/div&gt;
&lt;br/&gt;&lt;br/&gt;

&lt;h3&gt;When is our write safe along the line?&lt;/h3&gt;
If we consider a failure that involves just the database software (the process gets killed by the system administrator or crashes) and does not touch the kernel, the write can be considered safe right after step 3 completes successfully, that is, after the write(2) system call (or any other system call used to transfer data to the kernel) returns successfully. After this step, even if the database process crashes, the kernel will still take care of transferring data to the disk controller.
&lt;br/&gt;&lt;br/&gt;

If we consider instead a more catastrophic event like a power outage, we are safe only at step 5 completion, that is, when data is actually transferred to the physical device storing it.
&lt;br/&gt;&lt;br/&gt;

We can summarize by saying that the important stages for data safety are 3, 4, and 5. That is:
&lt;ul&gt;&lt;li&gt;How often will the database software transfer its user-space buffers to the kernel buffers using the write (or equivalent) system call?&lt;/li&gt;

&lt;li&gt;How often will the kernel flush the buffers to the disk controller?&lt;/li&gt;

&lt;li&gt;And how often will the disk controller write data to the physical media?&lt;/li&gt;

&lt;/ul&gt;
&lt;br/&gt;&lt;br/&gt;

&lt;div class="emph"&gt;
Note: when we talk about &lt;i&gt;disk controller&lt;/i&gt; we actually mean the caching performed by the controller &lt;i&gt;or&lt;/i&gt; the disk itself. In environments where durability is important system administrators usually disable this layer of caching.
&lt;br/&gt;&lt;br/&gt;

Disk controllers by default only perform write-through caching on most systems (i.e. only reads are cached, not writes). It is safe to enable the write-back mode (caching of writes)
only when you have batteries or a super-capacitor device protecting the data in case of power shutdown.
&lt;/div&gt;
&lt;br/&gt;&lt;br/&gt;

&lt;h3&gt;POSIX API&lt;/h3&gt;
From the point of view of the database developer the path that the data follows before being actually written to the physical device is interesting, but even more interesting is &lt;i&gt;the amount of control&lt;/i&gt; the programming API provides along the path.
&lt;br/&gt;&lt;br/&gt;

Let's start from step 3. We can use the &lt;i&gt;write&lt;/i&gt; system call to transfer data to the kernel buffers, so from this point of view we have good control using the POSIX API. However we don't have much control over how much time this system call will take before returning successfully. The kernel write buffer is limited in size: if the disk is not able to cope with the application write bandwidth, the kernel write buffer will reach its maximum size and the kernel will block our write. When the disk is able to receive more data, the write system call will finally return. After all, the goal is to eventually reach the physical media.
&lt;br/&gt;&lt;br/&gt;

Step 4: in this step the kernel transfers data to the disk controller. By default it will try to avoid doing it too often, because it is faster to transfer it in bigger pieces. For instance Linux by default will actually commit writes after &lt;b&gt;30 seconds&lt;/b&gt;. This means that if there is a failure, all the data written in the last 30 seconds can potentially be lost.
&lt;br/&gt;&lt;br/&gt;

The POSIX API provides a family of system calls to force the kernel to write buffers to the disk: the most famous of the family is probably the &lt;i&gt;fsync&lt;/i&gt; system call (see also &lt;i&gt;msync&lt;/i&gt; and &lt;i&gt;fdatasync&lt;/i&gt; for more information).
Using fsync the database system has a way to force the kernel to actually commit data on disk, but as you can guess this is a very expensive operation: &lt;b&gt;fsync will initiate a write operation&lt;/b&gt; every time it is called and there is some data pending on the kernel buffer. Fsync() also blocks the process for all the time needed to complete the write, and if this is not enough, on Linux it will also block all the other threads that are writing against the same file.
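&lt;br/&gt;&lt;br/&gt;

To make steps 3 and 4 concrete, here is a minimal sketch in Python, whose os.write and os.fsync functions are thin wrappers around the system calls of the same name. The durable_append function name and the file path are just illustrative assumptions, this is not code taken from any database.
&lt;pre class="code"&gt;
```python
import os
import tempfile


def durable_append(path, payload):
    """Append payload to path and return only once fsync has completed."""
    # O_APPEND makes every write land at the current end of the file.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        # Step 3: hand the bytes to the kernel. write(2) may accept fewer
        # bytes than requested, so loop until everything has been handed over.
        view = memoryview(payload)
        while view:
            written = os.write(fd, view)
            view = view[written:]
        # Step 4: ask the kernel to push its buffers for this file to the disk.
        os.fsync(fd)
    finally:
        os.close(fd)


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "demo.log")
    durable_append(demo_path, b"first record\n")
    durable_append(demo_path, b"second record\n")
    with open(demo_path, "rb") as f:
        print(f.read())
```
&lt;/pre&gt;
If the process is killed after the write loop but before os.fsync returns, the data is already safe against a process crash, but not yet against a power outage.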
&lt;h3&gt;What we can't control&lt;/h3&gt;
So far we learned that we can control steps 3 and 4, but what about 5? Well, formally speaking we don't have control from this point of view using the POSIX API. Maybe some kernel implementation will try to tell the drive to actually commit data on the physical media, or maybe the controller will instead re-order writes for the sake of speed, and will not really write data on disk ASAP, but will wait a couple of milliseconds more. This is simply out of our control.
&lt;br/&gt;&lt;br/&gt;

In the rest of this article we'll thus simplify our scenario to two data safety levels:
&lt;ul&gt;&lt;li&gt;Data written to kernel buffers using the write(2) system call (or equivalent) that gives us &lt;b&gt;data safety against process failure&lt;/b&gt;.&lt;/li&gt;

&lt;li&gt;Data committed to the disk using the fsync(2) system call (or equivalent) that gives us, virtually, &lt;b&gt;data safety against complete system failure&lt;/b&gt; like a power outage. We actually know that there is no guarantee because of the possible disk controller caching, but we'll not consider this aspect because this is an invariant among all the common database systems. Also system administrators often can use specialized tools in order to control the exact behavior of the physical device.&lt;/li&gt;

&lt;/ul&gt;
&lt;br/&gt;&lt;br/&gt;

&lt;div class="emph"&gt;
Note: not all the databases use the POSIX API. Some proprietary databases use a kernel module that allows a more direct interaction with the hardware. However the main shape of the problem remains the same. You can use user-space buffers, kernel buffers, but sooner or later you have to write data on disk to make it safe (and this is a slow operation). A notable example of a database using a kernel module is Oracle.
&lt;/div&gt;
&lt;br/&gt;&lt;br/&gt;

&lt;h3&gt;Data corruption&lt;/h3&gt;
In the previous paragraphs we analyzed the problem of ensuring data is actually transferred to the disk by the higher level layers: the application and the kernel. However this is not the only concern about durability. Another one is the following: is the database still readable after a catastrophic event, or can its internal structure get corrupted in some way so that it may no longer be read correctly, or requires a recovery step in order to reconstruct a valid representation of data?
&lt;br/&gt;&lt;br/&gt;

For instance many SQL and NoSQL databases implement some form of on-disk tree data structure that is used to store data and indexes. This data structure is manipulated on writes. If the system stops working in the middle of a write operation, is the tree representation still valid?
&lt;br/&gt;&lt;br/&gt;

In general there are three levels of safety against data corruption:
&lt;br/&gt;&lt;br/&gt;

&lt;ul&gt;&lt;li&gt;Databases that write to the disk representation not caring about what happens in the event of failure,  asking the user to use a replica for data recovery, and/or providing tools that will try to reconstruct a valid representation &lt;i&gt;if possible&lt;/i&gt;.&lt;/li&gt;

&lt;li&gt;Database systems that use a log of operations (a journal) so that they'll be able to recover to a consistent state after a failure.&lt;/li&gt;

&lt;li&gt;Database systems that never modify already written data, but only work in &lt;i&gt;append only&lt;/i&gt; mode, so that &lt;b&gt;no corruption is possible&lt;/b&gt;.&lt;/li&gt;

&lt;/ul&gt;
Now we have all the elements we need to evaluate a database system in terms of reliability of its persistence layer. It's time to check how Redis scores in this regard.
Redis provides two different persistence options, so we'll examine both one after the other.
&lt;h3&gt;Snapshotting&lt;/h3&gt;
Redis snapshotting is the simplest Redis persistence mode. It produces point-in-time snapshots of the dataset when specific conditions are met, for instance if the previous snapshot was created more than 2 minutes ago and there are already at least 100 new writes, a new snapshot is created. These conditions can be controlled by the user configuring the Redis instance, and can also be modified at runtime without restarting the server. Snapshots are produced as a single &lt;i&gt;.rdb&lt;/i&gt; file that contains the whole dataset.
&lt;br/&gt;&lt;br/&gt;
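The condition described above can be sketched as a small Python function. This is only an illustration of the rule, not the actual Redis code: the should_snapshot name and the (seconds, changes) tuple layout are assumptions made for the example.
&lt;pre class="code"&gt;
```python
import time


def should_snapshot(last_save_time, changes_since_save, save_points, now=None):
    """Return True if any (seconds, changes) save point is satisfied."""
    if now is None:
        now = time.time()
    elapsed = now - last_save_time
    # A save point fires when enough time has passed AND enough writes happened.
    return any(
        elapsed > seconds and changes_since_save >= changes
        for seconds, changes in save_points
    )
```
&lt;/pre&gt;
With the save point from the example, should_snapshot(0, 100, [(120, 100)], now=121) is true, while the same call with only 99 changes is false no matter how much time has passed.
&lt;br/&gt;&lt;br/&gt;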

The durability of snapshotting is limited to what the user specified as &lt;i&gt;save points&lt;/i&gt;. If the dataset is saved every 15 minutes, then in the event of a Redis instance crash or a more catastrophic event, up to 15 minutes of writes can be lost. From the point of view of Redis transactions, snapshotting guarantees that either a MULTI/EXEC transaction is fully written into a snapshot, or it is not present at all (as already said, RDB files represent exact &lt;i&gt;point in time&lt;/i&gt; images of the dataset).
&lt;br/&gt;&lt;br/&gt;

The RDB file cannot get corrupted, because it is produced by a child process in an append-only way, starting from the image of data in the Redis memory. A new RDB snapshot is created as a temporary file, and is renamed to the destination file using the atomic rename(2) system call only once it has been successfully generated by the child process (and only after it has been synced to disk using the fsync system call).
&lt;br/&gt;&lt;br/&gt;
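The temporary file, fsync, rename sequence is a general pattern, and can be sketched in a few lines of Python. Note the assumptions: JSON is used here only as a stand-in for the real RDB format, everything runs in a single process instead of a forked child, and the save_snapshot name is made up for the example.
&lt;pre class="code"&gt;
```python
import json
import os
import tempfile


def save_snapshot(dataset, final_path):
    """Write dataset to a temp file, fsync it, then atomically rename it."""
    directory = os.path.dirname(os.path.abspath(final_path))
    # The temp file lives in the same directory, so the rename stays on one
    # file system, which is what makes it atomic.
    fd, tmp_path = tempfile.mkstemp(prefix="temp-", suffix=".rdb", dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(json.dumps(dataset, sort_keys=True).encode("utf-8"))
            f.flush()             # user-space buffer to kernel
            os.fsync(f.fileno())  # kernel buffers to disk
        os.replace(tmp_path, final_path)  # rename(2) on POSIX systems
    except BaseException:
        # On any failure the old snapshot is left untouched.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```
&lt;/pre&gt;
A reader opening final_path at any moment sees either the complete old snapshot or the complete new one, never a half-written file.
&lt;br/&gt;&lt;br/&gt;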

Redis snapshotting does NOT provide good durability guarantees if up to a few minutes of data loss is not acceptable in case of incidents, so its usage is limited to applications and environments where losing recent data is not very important.
&lt;br/&gt;&lt;br/&gt;

However even when using the more advanced persistence mode that Redis provides, called &amp;quot;AOF&amp;quot;, it is still advisable to also turn snapshotting on, because to have a single compact RDB file containing the whole dataset is extremely useful to perform backups, to send data to remote data centers for disaster recovery, or to easily roll-back to an old version of the dataset in the event of a dramatic software bug that compromised the content of the database in a serious way.
&lt;br/&gt;&lt;br/&gt;

It's worth noting that RDB snapshots are also used by Redis when performing a master -&amp;gt; slave synchronization.
&lt;br/&gt;&lt;br/&gt;

&lt;div class="emph"&gt;
One of the additional benefits of RDB is the fact that for a given database size,
the number of I/Os on the system is bounded, whatever the activity on the database is.
This is a property that most traditional database systems (and the other Redis persistence option, the AOF) do not have.
&lt;/div&gt;
&lt;br/&gt;&lt;br/&gt;

&lt;h3&gt;Append only file&lt;/h3&gt;
The Append Only File, usually called simply AOF, is the main Redis persistence option.
The way it works is extremely simple: every time a write operation that modifies the dataset in memory is performed, the operation gets logged. The log is produced exactly in the same format used by clients to communicate with Redis, so the AOF can be even piped via netcat to another instance, or easily parsed if needed. At restart Redis re-plays all the operations to reconstruct the dataset.
&lt;br/&gt;&lt;br/&gt;

To show how the AOF works in practice I'll run a simple experiment, setting up a new Redis 2.6 instance with append only file enabled:
&lt;pre class="code"&gt;
./redis-server --appendonly yes
&lt;/pre&gt;
Now it's time to send a few write commands to the instance:
&lt;pre class="code"&gt;
redis 127.0.0.1:6379&amp;gt; set key1 Hello
OK
redis 127.0.0.1:6379&amp;gt; append key1 &amp;quot; World!&amp;quot;
(integer) 12
redis 127.0.0.1:6379&amp;gt; del key1
(integer) 1
redis 127.0.0.1:6379&amp;gt; del non_existing_key
(integer) 0
&lt;/pre&gt;
The first three operations actually modified the dataset, the fourth did not, because there was no key with the specified name. This is what our append only file looks like:
&lt;pre class="code"&gt;
$ cat appendonly.aof 
*2
$6
SELECT
$1
0
*3
$3
set
$4
key1
$5
Hello
*3
$6
append
$4
key1
$7
 World!
*2
$3
del
$4
key1
&lt;/pre&gt;
As you can see the final DEL is missing, because it did not produce any modification to the dataset.
&lt;br/&gt;&lt;br/&gt;
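The format in the listing is simple enough to be generated in a few lines. The following Python function is a sketch (the encode_command name is an assumption made for this example): every command is an argument count prefixed by *, followed by each argument as a byte length prefixed by $ and the payload, with every line terminated by CRLF, which cat does not show.
&lt;pre class="code"&gt;
```python
def encode_command(*args):
    """Encode one command the way it appears in the AOF listing."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg if isinstance(arg, bytes) else str(arg).encode("utf-8")
        # The length is counted in bytes, not in characters.
        parts.append(b"$%d\r\n" % len(data))
        parts.append(data + b"\r\n")
    return b"".join(parts)


if __name__ == "__main__":
    print(encode_command("append", "key1", " World!"))
```
&lt;/pre&gt;
For instance encode_command("set", "key1", "Hello") returns the bytes of the second command in the file above.
&lt;br/&gt;&lt;br/&gt;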

It is as simple as that: new commands received will get logged into the AOF, but only if they have some effect on actual data. However not all the commands are logged as they are received. For instance blocking operations on lists are logged for their final effects as normal non blocking commands. Similarly INCRBYFLOAT is logged as SET, using the final value after the increment as payload, so that differences in the way floating point numbers are handled by different architectures will not lead to different results after reloading an AOF file.
&lt;br/&gt;&lt;br/&gt;

So far we know that the Redis AOF is an append only business, so no corruption is possible. However this desirable feature can also be a problem: in the above example after the DEL operation our instance is completely empty, still the AOF is a few bytes worth of data. The AOF is an &lt;i&gt;always growing file&lt;/i&gt;, so how to deal with it when it gets too big?
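&lt;br/&gt;&lt;br/&gt;

To see the problem in practice, here is a toy replay of the AOF from the experiment, written in Python. It is only a sketch under clear assumptions: the dataset is a plain dict, only SET, APPEND and DEL are understood, and the parse_commands and replay names are invented for the example. Real Redis obviously does much more when loading an AOF.
&lt;pre class="code"&gt;
```python
def parse_commands(data):
    """Split an AOF byte string into a list of commands (lists of bytes)."""
    commands = []
    pos = 0
    while pos != len(data):
        # "*N\r\n" announces a command with N arguments.
        end = data.index(b"\r\n", pos)
        assert data[pos:pos + 1] == b"*"
        argc = int(data[pos + 1:end])
        pos = end + 2
        args = []
        for _ in range(argc):
            # "$L\r\n" announces an argument of L bytes, followed by "\r\n".
            end = data.index(b"\r\n", pos)
            assert data[pos:pos + 1] == b"$"
            length = int(data[pos + 1:end])
            start = end + 2
            args.append(data[start:start + length])
            pos = start + length + 2
        commands.append(args)
    return commands


def replay(data):
    """Rebuild a toy dataset (a dict) from the logged commands."""
    dataset = {}
    for args in parse_commands(data):
        name = args[0].lower()
        if name == b"set":
            dataset[args[1]] = args[2]
        elif name == b"append":
            dataset[args[1]] = dataset.get(args[1], b"") + args[2]
        elif name == b"del":
            for key in args[1:]:
                dataset.pop(key, None)
        # SELECT and any other command are ignored in this toy model.
    return dataset


# The same bytes as the appendonly.aof file shown earlier.
AOF = (
    b"*2\r\n$6\r\nSELECT\r\n$1\r\n0\r\n"
    b"*3\r\n$3\r\nset\r\n$4\r\nkey1\r\n$5\r\nHello\r\n"
    b"*3\r\n$6\r\nappend\r\n$4\r\nkey1\r\n$7\r\n World!\r\n"
    b"*2\r\n$3\r\ndel\r\n$4\r\nkey1\r\n"
)

if __name__ == "__main__":
    print(len(AOF), replay(AOF))
```
&lt;/pre&gt;
Replaying the whole file yields an empty dict, even though the file itself is far from empty: all those bytes are needed to describe a dataset that contains nothing.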
&lt;h3&gt;AOF rewrite&lt;/h3&gt;
When an AOF is too big Redis will simply rewrite it from scratch in a temporary file. The rewrite is NOT performed by reading the old one, but directly accessing data in memory, so that Redis can create the shortest AOF that is possible to generate, and will not require read disk access while writing the new one.
&lt;br/&gt;&lt;br/&gt;

Once the rewrite is completed, the temporary file is synced to disk with fsync and is used to overwrite the old AOF file.
&lt;br/&gt;&lt;br/&gt;

You may wonder what happens to data that is written to the server while the rewrite is in progress. This new data is simply also written to the old (current) AOF file, and at the same time queued into an in-memory buffer, so that when the new AOF is ready we can write this missing part inside it, and finally replace the old AOF file with the new one.
&lt;br/&gt;&lt;br/&gt;

As you can see still everything is append only, and when we rewrite the AOF we still write everything inside the old AOF file, for all the time needed for the new to be created. This means that for our analysis we can simply avoid considering the fact that the AOF in Redis gets rewritten at all. So the real question is, how often we write(2), and how often we fsync(2).
&lt;br/&gt;&lt;br/&gt;
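The rewrite logic described in this section can be modeled with a small Python class. This is a toy model and not how Redis is implemented: the ToyAof class is invented for the example, the two AOFs are in-memory lists of command tuples instead of files, and the rewrite runs in the same process instead of a child process.
&lt;pre class="code"&gt;
```python
class ToyAof:
    """Toy model: the dataset is a dict, each AOF is a list of command tuples."""

    def __init__(self):
        self.dataset = {}
        self.aof = []               # the current (old) AOF
        self.new_aof = None         # the AOF being built by a rewrite
        self.rewrite_buffer = None  # not None only while a rewrite is running

    def execute(self, *command):
        name, key = command[0], command[1]
        if name == "set":
            self.dataset[key] = command[2]
        elif name == "del":
            if key not in self.dataset:
                return  # no effect on the data, so nothing is logged
            del self.dataset[key]
        else:
            raise ValueError("unsupported command: %r" % (name,))
        self.aof.append(command)  # always keep appending to the old AOF
        if self.rewrite_buffer is not None:
            self.rewrite_buffer.append(command)  # remember it for the new AOF

    def start_rewrite(self):
        # Built from the data in memory, not by reading the old AOF.
        self.new_aof = [("set", k, v) for k, v in sorted(self.dataset.items())]
        self.rewrite_buffer = []

    def finish_rewrite(self):
        self.new_aof.extend(self.rewrite_buffer)  # add what arrived meanwhile
        self.aof = self.new_aof                   # swap in the new AOF
        self.new_aof = None
        self.rewrite_buffer = None
```
&lt;/pre&gt;
Until finish_rewrite is called the old AOF keeps receiving every write, so a crash in the middle of a rewrite loses nothing: the old file is still complete.
&lt;br/&gt;&lt;br/&gt;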

&lt;div class="emph"&gt;
AOF rewrites are generated only using sequential I/O operations, so the
whole dump process is efficient even with rotational disks (no random I/O is performed).
This is also true for RDB snapshots generation.
The complete lack of Random I/O accesses is a rare feature among databases, and is possible mostly because Redis serves read operations from memory, so data on disk does not need to be organized for a random access pattern, but just for a sequential loading on restart.
&lt;/div&gt;
&lt;br/&gt;&lt;br/&gt;

&lt;h3&gt;AOF durability&lt;/h3&gt;
This whole article was written to reach this paragraph. I'm glad I'm here, and I'm even more glad you are &lt;i&gt;still&lt;/i&gt; here with me.
&lt;br/&gt;&lt;br/&gt;

The Redis AOF uses a user-space buffer that is populated with new data as new commands are executed. The buffer is usually flushed to the AOF file &lt;i&gt;every time we return back into the event loop&lt;/i&gt;, using a single write(2) call against the AOF file descriptor, but actually there are three different configurations that will change the exact behavior of write(2), and especially, of fsync(2) calls.
&lt;br/&gt;&lt;br/&gt;

These three configurations are controlled by the &lt;b&gt;appendfsync&lt;/b&gt; configuration directive, that can have three different values: no, everysec, always. This configuration can also be queried or modified at runtime using the CONFIG GET and CONFIG SET commands, so you can alter it whenever you want without stopping the Redis instance.
&lt;h3&gt;appendfsync no&lt;/h3&gt;
In this configuration Redis does not perform fsync(2) calls at all. However it will make sure that clients &lt;b&gt;not using&lt;/b&gt;  &lt;a href="http://redis.io/topics/pipelining"&gt;pipelining&lt;/a&gt;, that is, clients that wait to receive the reply of a command before sending the next one, will receive an acknowledgment that the command was executed correctly &lt;b&gt;only after the change is transferred to the kernel by writing the command to the AOF file descriptor, using the write(2) system call&lt;/b&gt;.
&lt;br/&gt;&lt;br/&gt;

Because in this configuration fsync(2) is not called at all, data will be committed to disk at kernel's wish, that is, every 30 seconds in most Linux systems.
&lt;h3&gt;appendfsync everysec&lt;/h3&gt;
In this configuration data will be both written to the file using write(2) and flushed from the kernel to the  disk using fsync(2) &lt;b&gt;one time every second&lt;/b&gt;. Usually the write(2) call will actually be performed every time we return to the event loop, but this is not guaranteed.
&lt;br/&gt;&lt;br/&gt;

However, if the disk can't cope with the write speed and the background fsync(2) call is taking longer than 1 second, Redis may delay the write for up to an additional second (in order to avoid the write blocking the main thread because of an fsync(2) running in the background thread against the same file descriptor). If a total of &lt;b&gt;two seconds&lt;/b&gt; elapses without fsync(2) completing, Redis finally performs a (likely blocking) write(2) to transfer data to the disk at any cost.
&lt;br/&gt;&lt;br/&gt;

So in this mode Redis guarantees that, in the worst case, within 2 seconds everything you write is going to be committed to the operating system buffers &lt;i&gt;and&lt;/i&gt; transferred to the disk. In the average case data will be committed every second.
&lt;h3&gt;appendfsync always&lt;/h3&gt;
In this mode, and if the client does not use pipelining but waits for the replies before issuing new commands, data is both written to the file and synced to disk using fsync(2) &lt;b&gt;before an acknowledgment is returned to the client&lt;/b&gt;.
&lt;br/&gt;&lt;br/&gt;

This is the highest level of durability that you can get, but is slower than the other modes.
&lt;br/&gt;&lt;br/&gt;

The default Redis configuration is &lt;b&gt;appendfsync everysec&lt;/b&gt;, which provides a good balance between speed (it is almost as fast as &lt;b&gt;appendfsync no&lt;/b&gt;) and durability.
&lt;br/&gt;&lt;br/&gt;
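The difference between the three policies can be summarized with a single-threaded Python sketch. This is a deliberate simplification and not the Redis implementation: the AofWriter class and its method names are invented for the example, partial writes are ignored, and the everysec fsync runs inline here, while Redis performs it in a background thread.
&lt;pre class="code"&gt;
```python
import os
import time


class AofWriter:
    """Single-threaded sketch of the three appendfsync policies."""

    def __init__(self, path, policy="everysec", clock=time.monotonic):
        if policy not in ("no", "everysec", "always"):
            raise ValueError("unknown appendfsync policy: %r" % (policy,))
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
        self.policy = policy
        self.clock = clock
        self.buffer = bytearray()  # user-space buffer filled by commands
        self.last_fsync = clock()
        self.fsync_calls = 0

    def feed(self, data):
        """Called while executing a command: only touches the user-space buffer."""
        self.buffer += data

    def before_sleep(self):
        """Called once per event loop iteration, before replying to clients."""
        wrote = bool(self.buffer)
        if wrote:
            # One write(2) per iteration (partial writes ignored in this sketch).
            os.write(self.fd, bytes(self.buffer))
            self.buffer.clear()
        now = self.clock()
        if self.policy == "always" and wrote:
            self._fsync(now)
        elif self.policy == "everysec" and now - self.last_fsync >= 1.0:
            self._fsync(now)
        # With policy "no" fsync is never called: the kernel decides when to flush.

    def _fsync(self, now):
        os.fsync(self.fd)
        self.last_fsync = now
        self.fsync_calls += 1

    def close(self):
        os.close(self.fd)
```
&lt;/pre&gt;
Note that several feed calls followed by one before_sleep call cost a single write and at most a single fsync, whatever the number of commands executed in that iteration.
&lt;br/&gt;&lt;br/&gt;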

&lt;div class="emph"&gt;
What Redis implements when appendfsync is set to &lt;b&gt;always&lt;/b&gt; is usually called &lt;b&gt;group commit&lt;/b&gt;.
This means that instead of using an fsync call for every write operation performed, Redis is able to &lt;i&gt;group&lt;/i&gt; these commits into a single write+fsync operation performed before sending the reply to the group of clients that issued a write operation during the latest event loop iteration.
&lt;br/&gt;&lt;br/&gt;

In practical terms it means that you can have hundreds of clients performing write operations at the same time: the fsync operations will be factorized, so even in this mode Redis should be able to support a thousand concurrent transactions per second while a rotational device can only sustain 100-200 write op/s.
&lt;br/&gt;&lt;br/&gt;

This feature is usually hard to implement in a traditional database, but Redis makes it remarkably simpler.
&lt;/div&gt;
&lt;br/&gt;&lt;br/&gt;

&lt;h3&gt;Why is pipelining different?&lt;/h3&gt;
The reason for handling clients using pipelining in a different way is that clients using pipelining &lt;i&gt;with writes&lt;/i&gt; are sacrificing the ability to read what happened with a given command, before executing the next one, in order to gain speed. There is no point in committing data before replying to a client that seems not interested in the replies before going forward, the client is asking for speed. However even if a client is using pipelining, writes and fsyncs (depending on the configuration) always happen when returning to the event loop.
&lt;h3&gt;AOF and Redis transactions&lt;/h3&gt;
AOF guarantees correct MULTI/EXEC transaction semantics, and will refuse to reload a file that contains a broken transaction at the end of the file. A utility shipped with the Redis server can trim the AOF file to remove the partial transaction at the end.
&lt;br/&gt;&lt;br/&gt;

Note: since the AOF file is populated using a single write(2) call at the end of every event loop iteration, an incomplete transaction can only appear if the disk where the AOF resides gets full while Redis is writing.
&lt;h3&gt;Comparison with PostgreSQL&lt;/h3&gt;
So how durable is Redis, with its main persistence engine (AOF) in its default configuration?
&lt;ul&gt;&lt;li&gt;Worst case: It guarantees that write(2) and fsync(2) are performed within two seconds.&lt;/li&gt;

&lt;li&gt;Normal case: it performs write(2) before replying to client, and performs an fsync(2) every second.&lt;/li&gt;

&lt;/ul&gt;
What is interesting is that in this mode Redis is still extremely fast, for a few reasons. One is that fsync is performed on a background thread, the other is that Redis only writes in append only mode, that is a big advantage.
&lt;br/&gt;&lt;br/&gt;

However if you need maximum data safety and your write load is not high, you can still have the best durability that is possible to obtain &lt;i&gt;in any database system&lt;/i&gt; using &lt;b&gt;appendfsync always&lt;/b&gt;.
&lt;br/&gt;&lt;br/&gt;

How does this compare to PostgreSQL, which is (with good reason) considered a good and very reliable database?
&lt;br/&gt;&lt;br/&gt;

Let's read some PostgreSQL documentation together (note, I'm only citing the interesting pieces, you can find the full documentation &lt;a href="http://www.postgresql.org/docs/9.1/static/runtime-config-wal.html#GUC-SYNCHRONOUS-COMMIT"&gt;here in the PostgreSQL official site&lt;/a&gt;)
&lt;br/&gt;&lt;br/&gt;

&lt;div class="emph"&gt;
&lt;b&gt;fsync (boolean)&lt;/b&gt;
&lt;br/&gt;&lt;br/&gt;

If this parameter is on, the PostgreSQL server will try to make sure that updates are physically written to disk, by issuing fsync() system calls or various equivalent methods (see wal_sync_method). This ensures that the database cluster &lt;b&gt;can recover to a consistent state after an operating system or hardware crash&lt;/b&gt;.
&lt;br/&gt;&lt;br/&gt;

[snip]
&lt;br/&gt;&lt;br/&gt;

In many situations, turning off synchronous_commit for noncritical transactions can provide much of the potential performance benefit of turning off fsync, without the attendant risks of data corruption.
&lt;/div&gt;
&lt;br/&gt;&lt;br/&gt;

So PostgreSQL needs to fsync data in order to avoid corruptions. Fortunately with Redis AOF we don't have this problem at all, no corruption is possible. So let's check the next parameter, that is the one that more closely compares with Redis fsync policy, even if the name is different:
&lt;br/&gt;&lt;br/&gt;

&lt;div class="emph"&gt;
&lt;b&gt;synchronous_commit (enum)&lt;/b&gt;
&lt;br/&gt;&lt;br/&gt;

Specifies whether transaction commit will wait for WAL records to be written to disk before the command returns a &amp;quot;success&amp;quot; indication to the client. Valid values are on, local, and off. The default, and safe, value is on. When off, there can be a delay between when success is reported to the client and when the transaction is really guaranteed to be safe against a server crash. (The maximum delay is three times wal_writer_delay.) Unlike fsync, setting this parameter to off does not create any risk of database inconsistency: an operating system or database crash might result in some recent allegedly-committed transactions being lost, but the database state will be just the same as if those transactions had been aborted cleanly.
&lt;/div&gt;
&lt;br/&gt;&lt;br/&gt;

Here we have something very similar to what we can tune with Redis. Basically the PostgreSQL guys are telling you: want speed? Then it is probably a good idea to disable synchronous commits.
That's like in Redis: want speed? Don't use &lt;b&gt;appendfsync always&lt;/b&gt;.
&lt;br/&gt;&lt;br/&gt;

Now if you disable synchronous commits in PostgreSQL you are in a situation very similar to Redis with &lt;b&gt;appendfsync everysec&lt;/b&gt;, because by default &lt;i&gt;wal_writer_delay&lt;/i&gt; is set to 200 milliseconds, and the documentation states that you need to multiply it by three to get the maximum delay of writes, which is thus 600 milliseconds, very near to the 1 second Redis default.
&lt;br/&gt;&lt;br/&gt;

&lt;div class="emph"&gt;
MySQL InnoDB has similar parameters the user can tune. From the documentation:
&lt;br/&gt;&lt;br/&gt;

If the value of innodb_flush_log_at_trx_commit is 0, the log buffer is written out to the log file once per second and the flush to disk operation is performed on the log file, but nothing is done at a transaction commit. When the value is 1 (the default), the log buffer is written out to the log file at each transaction commit and the flush to disk operation is performed on the log file. When the value is 2, the log buffer is written out to the file at each commit, but the flush to disk operation is not performed on it. However, the flushing on the log file takes place once per second also when the value is 2. Note that the once-per-second flushing is not 100% guaranteed to happen every second, due to process scheduling issues.
&lt;br/&gt;&lt;br/&gt;

You can &lt;a href="http://dev.mysql.com/doc/refman/5.1/en/innodb-parameters.html#sysvar_innodb_flush_log_at_trx_commit"&gt;read more here&lt;/a&gt;.
&lt;/div&gt;
&lt;br/&gt;&lt;br/&gt;

Long story short: even if Redis is an in memory database it offers good durability compared to other on disk databases.
&lt;br/&gt;&lt;br/&gt;

From a more practical point of view Redis provides both AOF and RDB snapshots, which can be enabled simultaneously (this is the advised setup, when in doubt), offering at the same time ease of operations and data durability.
&lt;br/&gt;&lt;br/&gt;

Everything we said about Redis durability applies not only when Redis is used as a datastore, but also when it is used to implement queues that need to persist on disk with good durability.
&lt;h3&gt;Credits&lt;/h3&gt;
&lt;a href="http://twitter.com/didier_06"&gt;Didier Spezia&lt;/a&gt; provided very useful ideas and insights for this blog post. The topic is huge and I'm sure I overlooked a lot of things, but surely thanks to Didier the current post is much better compared to the first draft.
&lt;br/&gt;&lt;br/&gt;

&lt;h3&gt;Addendum: a note about restart time&lt;/h3&gt;
I received a few requests about adding some information about restart time, since when a Redis instance is stopped and gets restarted it has to read the dataset from disk into memory. I think it is a good addition, because there are differences between RDB and AOF persistence, and between Redis 2.6 and Redis 2.4. Also it is interesting to see how Redis compares with PostgreSQL and MySQL in this regard.
&lt;br/&gt;&lt;br/&gt;

First of all it's worth mentioning why Redis needs to load the whole dataset into memory before it starts serving requests to clients: the reason is not, strictly speaking, that it is an in-memory DB. It is conceivable that a database that is in memory, but &lt;i&gt;uses the same representation of data in memory and on disk&lt;/i&gt;, could start serving data ASAP.
&lt;br/&gt;&lt;br/&gt;

The true reason is that we optimized the two representations for the different purposes they serve: on disk we have a compact append-only representation that is not suitable for random access. In memory we have the best possible representation for fast data fetching and modification. But this forces us to perform a &lt;i&gt;conversion step&lt;/i&gt; on loading. Redis reads keys one after the other from disk, and encodes the same keys and associated values using the in-memory representation.
&lt;br/&gt;&lt;br/&gt;

With an RDB file this process is very fast for a few reasons: RDB files are usually more compact, they are binary, and sometimes they even encode values in the same format they have in memory (this happens for small aggregate data types that are encoded as &lt;i&gt;ziplists&lt;/i&gt; or &lt;i&gt;intsets&lt;/i&gt;).
&lt;br/&gt;&lt;br/&gt;

CPU and disk speed make a big difference, but as a general rule you can assume that a Redis server will load an RDB file at a rate of 10 ~ 20 seconds per gigabyte of memory used, so loading a dataset of tens of gigabytes can take a few minutes.
&lt;br/&gt;&lt;br/&gt;
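
As a back-of-the-envelope aid, the rule of thumb above can be turned into a tiny estimator. This is only a sketch: the function name and the (best, worst) return value are made up for illustration, and the 10 and 20 seconds per gigabyte figures are simply the ones quoted above, not measurements of your hardware.

```python
# Rule-of-thumb figures quoted in the post (seconds per gigabyte of memory used).
RDB_SECONDS_PER_GB = (10, 20)

def estimate_rdb_load_seconds(dataset_gb):
    """Return a (best, worst) estimate of the RDB load time in seconds."""
    fast, slow = RDB_SECONDS_PER_GB
    return (dataset_gb * fast, dataset_gb * slow)

best, worst = estimate_rdb_load_seconds(30)
print(f"30 GB: between {best / 60:.0f} and {worst / 60:.0f} minutes")
```

For a 30 GB dataset this gives a range of 5 to 10 minutes, which is in line with the "a few minutes" figure above.
&lt;br/&gt;&lt;br/&gt;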

Loading an AOF file that was just rewritten by the server takes roughly twice as long per gigabyte in Redis 2.6, but of course if a lot of writes reached the AOF file &lt;i&gt;after&lt;/i&gt; the latest compaction it can take longer (however Redis in the default configuration triggers a rewrite automatically if the AOF size reaches 200% of the initial size).
&lt;br/&gt;&lt;br/&gt;

Restarting an instance is usually not needed, however: even in a setup with a single server it is a better idea to use replication in order to transfer control to the new Redis instance without service interruption.
For instance, in the case of an upgrade to a newer Redis version, the system administrator will usually set up the Redis instance running the new version as a slave of the old instance, then point all the clients to the new instance, turn this instance into a master, and finally shut down the old one.
&lt;br/&gt;&lt;br/&gt;

What about traditional on-disk databases? They don't need to load data in memory... or maybe they do?
Well, basically they do a better job than Redis in this regard, because when you start a MySQL server it is able to serve requests from the first second. However, if the database and index files are no longer in the operating system cache, what happens is a &lt;i&gt;cold restart&lt;/i&gt;. In this case the database will work from the start, but it will be very slow and may not be able to cope with the speed at which the application is requesting data. I saw this happening multiple times first-hand.
&lt;br/&gt;&lt;br/&gt;

What is happening in a cold restart is that the database is actually &lt;i&gt;reading&lt;/i&gt; data from disk to memory, very similarly to what Redis does, but incrementally.
&lt;br/&gt;&lt;br/&gt;

Long story short: Redis requires some time to restart if the dataset is big. On-disk databases are better in this regard, but you can't expect them to perform well in the case of a cold restart, and if they are under load it is easy to end up in a condition where the whole application is actually blocked for several minutes. On the other hand &lt;i&gt;once&lt;/i&gt; Redis starts, it starts at full speed.
&lt;br/&gt;&lt;br/&gt;

&lt;b&gt;Edit:&lt;/b&gt; want to learn more? &lt;a href="http://sqlite.org/atomiccommit.html"&gt;this article at sqlite.org about atomic commits&lt;/a&gt; is strongly suggested.&lt;/div&gt;&lt;div class="blogpostinfo"&gt;Posted at 10:08:08 | &lt;a href="http://antirez.com/post/redis-persistence-demystified.html"&gt;permalink&lt;/a&gt;&lt;/div&gt;&lt;/div&gt;</description>
   <dc:date>2012-03-26T10:08:08+00:00</dc:date>
  <feedburner:origLink>http://antirez.com/post/redis-persistence-demystified.html</feedburner:origLink></item>
  <item>
   <title>Redis reliable queues with Lua scripting</title>
   <link>http://feedproxy.google.com/~r/antirez/~3/6oWEoc4l1sE/250</link>
   <guid isPermaLink="false">http://antirez.com/post/250</guid>
   <description>&lt;div class="blogpost"&gt;&lt;div style="clear:both"&gt;&lt;/div&gt;&lt;div class="blogposttext"&gt;Redis 2.6 support for Lua scripting opens a lot of possibilities, basically because you can do atomically a lot of things that previously required paying a big performance hit. Now the price is much cheaper, so why not abuse our power?
&lt;br/&gt;&lt;br/&gt;

Even more important is that, before scripting, in order to turn non-atomic primitives into atomic primitives you needed the help of the Redis WATCH command, which is a &lt;i&gt;check and set&lt;/i&gt; style primitive. Since it is a form of optimistic locking, when there is high contention, like in the example of a queue with multiple workers (with many clients accessing a single key with WATCH), performance may be pretty bad.
&lt;br/&gt;&lt;br/&gt;

In this blog post I want to show a pattern based on the scripting capability that can be used to implement reliable queues.
&lt;br/&gt;&lt;br/&gt;

&lt;b&gt;Circular queue&lt;/b&gt;
&lt;br/&gt;&lt;br/&gt;

In our system there are only two players: &lt;b&gt;Producers&lt;/b&gt; and &lt;b&gt;Consumers&lt;/b&gt;, and we only push IDs. It's up to the consumer to agree with the producer about what these IDs really mean, similarly to Michel Martens' &lt;a href="https://github.com/soveran/ost"&gt;Ost library&lt;/a&gt;.
&lt;br/&gt;&lt;br/&gt;

An item is in &lt;i&gt;processing&lt;/i&gt; state if a client is already processing it but has not yet finished.
&lt;br/&gt;&lt;br/&gt;

Everything is based on the idea that tasks are never removed from the list, unless they were &lt;i&gt;actually&lt;/i&gt; processed. But instead of using a separate service list to hold tasks that are in the &lt;i&gt;processing&lt;/i&gt; state, we use a single list for everything.
&lt;br/&gt;&lt;br/&gt;

&lt;b&gt;Producer&lt;/b&gt;
&lt;br/&gt;&lt;br/&gt;

From the point of view of the producer, if there is a new object ID 123 that needs to be processed by a worker (consumer), only one operation is performed:
&lt;pre class="code"&gt;
LPUSH queue 123
&lt;/pre&gt;
So we add the item at the head of the list. Consumers fetch from the tail, so items at the head are processed last by workers, which makes this queue &lt;i&gt;First In First Out&lt;/i&gt;.
&lt;br/&gt;&lt;br/&gt;

&lt;b&gt;Consumer&lt;/b&gt;
&lt;br/&gt;&lt;br/&gt;

The interesting part is what the consumer does to access an item in the queue.
It runs a Lua script that does the following:
&lt;ul&gt;&lt;li&gt;Get the element on the tail of the list, for instance 45.&lt;/li&gt;

&lt;li&gt;Put the same element on the head of the list, but followed by a trailing asterisk to signal that the item is currently being processed, followed by the unix time (passed by the client to the scripting engine). So in the end we get &amp;quot;45&amp;quot; from the tail, and we put &amp;quot;45*&amp;lt;unixtime&amp;gt;&amp;quot; to the head.&lt;/li&gt;

&lt;li&gt;Return the element to the client (45 in this case).&lt;/li&gt;

&lt;/ul&gt;
&lt;br/&gt;&lt;br/&gt;

If the element currently on the tail was already followed by an asterisk and unix time, the script does not add an additional asterisk and unix time: the element is simply moved to the head and returned to the client, including the asterisk and the timestamp.
&lt;br/&gt;&lt;br/&gt;

So the client calling this script will either receive 45 (or any other ID actually), or an ID followed by a unix timestamp, like 45*1332014784.
&lt;br/&gt;&lt;br/&gt;
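
The rotation described above can be simulated in a few lines of Python. This is not the Lua script itself, just an in-memory sketch under my own assumptions: a deque stands in for the Redis list (left end is the head, right end is the tail) and the function name fetch is made up for illustration.

```python
from collections import deque

def fetch(queue, now):
    """Rotate the tail element to the head, marking it as being processed.

    `queue` is a deque whose left end is the head of the list and whose
    right end is the tail. Returns None when the queue is empty.
    """
    if not queue:
        return None
    item = queue.pop()                 # element on the tail, e.g. "45"
    if "*" in item:
        queue.appendleft(item)         # already marked: move it unchanged
        return item                    # e.g. "45*1332014784"
    queue.appendleft(f"{item}*{now}")  # mark with the client-supplied time
    return item                        # plain ID, e.g. "45"

queue = deque()
queue.appendleft("45")                 # producer: LPUSH queue 45
queue.appendleft("46")                 # producer: LPUSH queue 46
print(fetch(queue, 1332014784))        # 45
print(fetch(queue, 1332014785))        # 46
print(fetch(queue, 1332014786))        # 45*1332014784
```

The third call shows what happens after a full run of the list: the consumer gets back an item that somebody already started processing, together with the time at which it was fetched.
&lt;br/&gt;&lt;br/&gt;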

&lt;b&gt;What the consumer does with the returned value&lt;/b&gt;
&lt;br/&gt;&lt;br/&gt;

If the script returns a plain ID, the consumer simply starts processing it. If the item is in &lt;i&gt;processing&lt;/i&gt; state but is still young enough (no timeout), the consumer discards the returned value and calls the script again to fetch the next ID.
&lt;br/&gt;&lt;br/&gt;

Otherwise, if the item timed out, the consumer will check in an application-specific way whether the item was actually processed by the original client. If it was, the consumer removes it from the queue; otherwise it calls another script that atomically removes the old item and adds a new one with the new timestamp, and of course it starts processing it.
&lt;br/&gt;&lt;br/&gt;

When an item has been processed successfully it is removed from the queue using LREM.
&lt;br/&gt;&lt;br/&gt;
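
A consumer could classify the value returned by the script along these lines. Again this is only a sketch: the function name, the three action labels and the 60 seconds timeout are assumptions of mine, and the application-specific check about whether a timed out item was really processed is left out.

```python
def classify(value, now, timeout):
    """Decide what a consumer should do with a value returned by the script.

    Returns an (action, task_id) pair where action is one of
    "process", "skip" or "recover".
    """
    task_id, star, stamp = value.partition("*")
    if not star:
        return ("process", task_id)    # fresh ID: just start working on it
    age = now - int(stamp)
    if age > timeout:
        return ("recover", task_id)    # timed out: check it, then remove or requeue
    return ("skip", task_id)           # someone is still working on it

print(classify("45", 1332014900, 60))              # ('process', '45')
print(classify("45*1332014784", 1332014800, 60))   # ('skip', '45')
print(classify("45*1332014784", 1332014900, 60))   # ('recover', '45')
```
&lt;br/&gt;&lt;br/&gt;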

&lt;b&gt;Advantages&lt;/b&gt;
&lt;br/&gt;&lt;br/&gt;

The advantage of this system, which may actually be modeled in many different ways, is that you have a rotating list full of jobs to process or currently being processed. There is no way for a job to be lost. Also, clients will receive jobs that are still being processed every time a full run of the list is performed, so these jobs will be activated again if needed, but will still remain in the list as long as no one is able to complete them.
&lt;br/&gt;&lt;br/&gt;

&lt;b&gt;Improvements&lt;/b&gt;
&lt;br/&gt;&lt;br/&gt;

If tasks take a lot of time to complete, using LREM to delete the task may not be optimal. We may use an additional key holding a Redis Set where we store all the completed tasks, so that the Lua script removes an item every time it encounters one in the &lt;i&gt;processing&lt;/i&gt; state that is also in the Set.
&lt;br/&gt;&lt;br/&gt;

Another good use of an additional Set is to mark the items currently processed or waiting to be processed if we don't want to put the same ID multiple times (rarely useful).
&lt;br/&gt;&lt;br/&gt;

&lt;b&gt;Blocking VS polling&lt;/b&gt;
&lt;br/&gt;&lt;br/&gt;

This system requires some form of polling from the point of view of the consumer. Without some care a consumer will rotate the list as fast as possible without actually fetching anything interesting. To avoid this problem it is possible to use a sentinel to signal the end of the list (like a special task ID -1) so that clients will pause a bit when this element is encountered. Another solution is to simply sleep a bit if after N calls to the script no processable element was found.
&lt;br/&gt;&lt;br/&gt;

Another alternative is to use a second list just to notify that new tasks are available, using blocking pop and push.
&lt;br/&gt;&lt;br/&gt;

&lt;b&gt;Alternative implementations&lt;/b&gt;
&lt;br/&gt;&lt;br/&gt;

An alternative implementation is to use a list and a sorted set: the list contains new elements to process, while the sorted set contains elements that are in the &lt;i&gt;processing&lt;/i&gt; state, scored by unix time. Basically there are endless alternatives. The main point is that now with scripting we can fetch an element while adding it somewhere else, even with additional information (the unix time), without issues, so many new patterns are possible in the messaging area of Redis usage.&lt;/div&gt;&lt;div class="blogpostinfo"&gt;Posted at 20:15:02 | &lt;a href="http://antirez.com/post/250"&gt;permalink&lt;/a&gt;&lt;/div&gt;&lt;/div&gt;</description>
   <dc:date>2012-03-17T20:15:02+00:00</dc:date>
  <feedburner:origLink>http://antirez.com/post/250</feedburner:origLink></item>
  <item>
   <title>Redis 2.6 is near, and a few more updates</title>
   <link>http://feedproxy.google.com/~r/antirez/~3/IjpNQExovRo/redis-2.6-is-near.html</link>
   <guid isPermaLink="false">http://antirez.com/post/249</guid>
   <description>&lt;div class="blogpost"&gt;&lt;div style="clear:both"&gt;&lt;/div&gt;&lt;div class="blogposttext"&gt;Redis 2.6 was expected to go live in the first weeks of 2012, but today is the 24th of February and there are still no 2.6-rc1 tags around. What happened to it, you may ask!?
&lt;br/&gt;&lt;br/&gt;

Well, for once, a delay is not a sign that something is wrong. What happened is simply that we put a lot more than expected inside this release, so without further delay here is a list of new features:
&lt;br/&gt;&lt;br/&gt;

&lt;ul&gt;&lt;li&gt;Server side &lt;a href="http://redis.io/commands/eval"&gt;Lua scripting&lt;/a&gt;, probably the biggest and most exciting news, with built-in support for fast JSON and MessagePack encoding and decoding.&lt;/li&gt;

&lt;li&gt;Millisecond resolution expires, plus new commands with millisecond precision. This means that if you set an expire at 1 second, now the key will stop existing after exactly 1000 milliseconds, with an error of +/- 1 millisecond. At the same time you have new commands like &lt;b&gt;PEXPIRE&lt;/b&gt;, &lt;b&gt;PTTL&lt;/b&gt;, &lt;b&gt;PSETEX&lt;/b&gt;, that let you specify the timeout of a key in milliseconds. Want to throttle an API so that no more than two requests per 50 milliseconds are done? Now you can, easily.&lt;/li&gt;

&lt;li&gt;Hardcoded limits about max number of clients removed. Now your Redis instance can handle all the clients your OS is able to handle, without recompilations or other hard coded limits.&lt;/li&gt;

&lt;li&gt;AOF low level semantics is generally more sane, especially when used in slaves. This is an uncommon use case, and the misbehavior was subtle, but now the implementation and behavior are definitely more sane.&lt;/li&gt;

&lt;li&gt;Clients max output buffer soft and hard limits. You can specify different limits for different classes of clients (normal, pubsub, slave).&lt;/li&gt;

&lt;li&gt;AOF is now able to rewrite aggregate data types using variadic commands, often producing an AOF that is faster to save and load, and smaller in size. So what in 2.4 used to be N &lt;b&gt;LPUSH&lt;/b&gt; calls to reconstruct a list of N items is now N/64 calls, because variadic &lt;b&gt;LPUSH&lt;/b&gt; with (up to) 64 arguments is used.&lt;/li&gt;

&lt;li&gt;Every redis.conf directive is now accepted as a command line option for the redis-server binary, with the same name and number of arguments. You can write ./redis-server --slaveof 127.0.0.1 6379 --port 6380, and in general pass any possible option, exactly like it is specified in redis.conf.&lt;/li&gt;

&lt;li&gt;Hash table seed randomization for protection against collision attacks.&lt;/li&gt;

&lt;li&gt;Performance improved when writing large objects to Redis.&lt;/li&gt;

&lt;li&gt;Significant parts of the core refactored or rewritten. New internal APIs and core changes allowed us to develop Redis Cluster on top of the new code, however for 2.6 all the cluster code was removed, and will be released with Redis 3.0 when it is more complete and stable.&lt;/li&gt;

&lt;li&gt;Redis ASCII art logo added at startup. This is where our major efforts went in the latest months.&lt;/li&gt;

&lt;li&gt;redis-benchmark improvements: ability to run selected tests, CSV output, faster, better help, and support for pipelining giving awesome results. More about this later in this blog post.&lt;/li&gt;

&lt;li&gt;redis-cli improvements: --eval for comfortable development of Lua scripts.&lt;/li&gt;

&lt;li&gt;&lt;b&gt;SHUTDOWN&lt;/b&gt; now supports two optional arguments: &lt;b&gt;SAVE&lt;/b&gt; and &lt;b&gt;NOSAVE&lt;/b&gt;. They respectively force saving an RDB when no RDB persistence is configured, or skip saving when RDB persistence is configured.&lt;/li&gt;

&lt;li&gt;&lt;b&gt;INFO&lt;/b&gt; output split into sections, the command is now able to just show specific sections.&lt;/li&gt;

&lt;li&gt;New statistics about how many times a command was called, and how much execution time it used (&lt;b&gt;INFO commandstats&lt;/b&gt;).&lt;/li&gt;

&lt;li&gt;More predictable &lt;b&gt;SORT&lt;/b&gt; behavior in edge cases.&lt;/li&gt;

&lt;li&gt;&lt;b&gt;INCRBYFLOAT&lt;/b&gt; and &lt;b&gt;HINCRBYFLOAT&lt;/b&gt; commands, for atomic fast float counters.&lt;/li&gt;

&lt;li&gt;Virtual Memory was removed from the code (was already deprecated in 2.4)&lt;/li&gt;

&lt;li&gt;Much better bug report on crash, with stack trace, register dump, state of the client causing the crash, command vector and so forth. This was in part back ported to 2.4 releases.&lt;/li&gt;

&lt;/ul&gt;
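To give an idea of the throttling use case mentioned above for millisecond expires, here is a sketch of a fixed-window throttle. It does not talk to Redis at all: the class below is a hypothetical in-memory stand-in where a counter plays the role of a key that is incremented on every request and given a millisecond expire when it is first created. The class name, the method name and the explicit clock argument are all assumptions made for illustration.

```python
class MillisecondThrottle:
    """Allow at most `limit` calls per `window_ms` window (fixed window)."""

    def __init__(self, limit=2, window_ms=50):
        self.limit = limit
        self.window_ms = window_ms
        self.count = 0
        self.expires_at = None         # plays the role of the key's expire time

    def allow(self, now_ms):
        # The "key" stops existing once its millisecond expire is reached.
        if self.expires_at is not None and now_ms >= self.expires_at:
            self.count, self.expires_at = 0, None
        if self.expires_at is None:
            self.expires_at = now_ms + self.window_ms   # like PEXPIRE key 50
        self.count += 1                                  # like INCR key
        return not self.count > self.limit

throttle = MillisecondThrottle()
print([throttle.allow(t) for t in (0, 10, 20, 49, 50, 60, 70)])
# [True, True, False, False, True, True, False]
```

The first two requests in every 50 milliseconds window pass, the others are refused until the counter expires.
&lt;br/&gt;&lt;br/&gt;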
There are two features still to merge, but already implemented into branches:
&lt;ul&gt;&lt;li&gt;Small hashes now implemented using ziplists instead of zipmaps, for better performance when there are more than 253 fields but fewer than the number of fields needed to convert the zipmap into a full hash table.&lt;/li&gt;

&lt;li&gt;More coherent behavior of list blocking commands in presence of non trivial conditions and blocked clients.&lt;/li&gt;

&lt;/ul&gt;
&lt;h3&gt;And new internals...&lt;/h3&gt;
Redis 2.6 offers the above new features, but another interesting fact is that it is also a spinoff of the &lt;i&gt;unstable&lt;/i&gt; branch, the one that is going to be Redis 3.0 sooner or later. Instead 2.4 was a spinoff of the Redis 2.2 code base. This means that we now are working with a better code base that makes implementing certain features simpler.
&lt;br/&gt;&lt;br/&gt;

It will also make it much easier for us in the future to backport stuff from the unstable branch to 2.6. This means that we can either backport stuff from time to time into 2.6 releases, or create a 2.8 branch to merge all the interesting features that are already stable and produce an intermediate release a few months from now.
&lt;h3&gt;Redis benchmarks with pipelining support, impressive numbers, and stupid benchmarks&lt;/h3&gt;
After looking at yet another &lt;a href="http://hyperdex.org/performance/"&gt;set of benchmarks&lt;/a&gt; that were actually measuring everything but actual DB performance, I decided to go ahead and implement pipelining in the redis-benchmark tool to show some good numbers.
&lt;br/&gt;&lt;br/&gt;

redis-benchmark used to create 50 clients, and perform something like: send request, wait for reply, send request, wait for reply, with all those 50 clients. However Redis supports &lt;a href="http://redis.io/topics/pipelining"&gt;pipelining&lt;/a&gt;, that is, if you have N queries to do where you don't need the reply of the previous one to perform the next request, you can send N queries at once to Redis, and then read all the replies. This dramatically improves performance because fewer syscalls, fewer context switches, fewer TCP packets, and so forth are required.
&lt;br/&gt;&lt;br/&gt;
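
To make the idea concrete, here is a sketch of what a pipelined client puts on the wire: N commands encoded one after the other, so that they can be sent with a single write and the replies read afterwards. The function names are hypothetical and no socket is opened here; the only thing taken from Redis is the unified request protocol, where every command is an array of bulk strings.

```python
def encode_command(*args):
    """Encode one command in the Redis unified request protocol."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = str(arg).encode()
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)

def encode_pipeline(commands):
    """Concatenate many encoded commands so they can go out in one write."""
    return b"".join(encode_command(*cmd) for cmd in commands)

payload = encode_pipeline([
    ("SET", "key:1", "a"),
    ("GET", "key:1"),
    ("INCR", "counter"),
])
print(payload)
```

A client would write the whole payload to the socket at once and then read three replies in order, instead of paying a round trip for every command.
&lt;br/&gt;&lt;br/&gt;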

Most real world Redis applications use pipelining. Often you need to do things like paginate a list of objects, so you do LRANGE to get the IDs, and then a pipeline with all the GET or HGETALL calls and so forth. Or you want to write an object to the database and update its position in a sorted set.
&lt;br/&gt;&lt;br/&gt;

But so far redis-benchmark was not able to test pipelining, so when we said &lt;i&gt;Redis can do 150k requests per second on entry level hardware&lt;/i&gt; we were actually saying &lt;i&gt;... if you never use pipelining at all&lt;/i&gt;. But how does it perform if you can use it?
&lt;br/&gt;&lt;br/&gt;

Let's check with pipelining, using my glorious MBA 11&amp;quot; running OSX:
&lt;pre class="code"&gt;
$ redis-benchmark -P 64 -q -n 1000000
PING_INLINE: 540540.56 requests per second
PING_BULK: 636942.62 requests per second
SET: 301204.81 requests per second
GET: 430848.75 requests per second
INCR: 341530.06 requests per second
LPUSH: 305623.47 requests per second
LPOP: 296120.81 requests per second
SADD: 313774.72 requests per second
SPOP: 418060.22 requests per second
&lt;/pre&gt;
Wow, 430k GETs per second with a MacBook Air, and finally with this new benchmark not everything is the same: PING is faster than GET, which is faster than SET, and so forth.
This also means more ability to optimize commands on our side.
&lt;br/&gt;&lt;br/&gt;

If you test this on a Xeon, you get 650k GETs per second easily, or other impressive numbers even when reducing the pipelining from 64 to 32 or 16.
&lt;br/&gt;&lt;br/&gt;

Now to show how benchmarks can easily be turned into whatever you want: we have these numbers of Redis performing 500k operations per second, per core, but on the web site of HyperDex I read: &lt;i&gt;With 32 servers and sufficient clients to create a heavy workload, HyperDex is able to sustain 3.2 million ops/s.&lt;/i&gt;
&lt;br/&gt;&lt;br/&gt;

Hey dudes, I can do 1/6th of the ops/sec you do with 32 servers using just 1 core of my Xeon desktop. What does this mean? &lt;b&gt;nothing&lt;/b&gt;.
&lt;br/&gt;&lt;br/&gt;

Long story short, don't show benchmarks unless you have a very good methodology explained on the web site, and your methodology makes sense, otherwise it is just marketing that does not provide any value to the user.
&lt;br/&gt;&lt;br/&gt;

A better way to do benchmarks is to isolate a common real-world problem, and write a real world implementation of this problem using different databases, in the idiomatic way for every database, mixing both writes and reads in the same benchmark. Then test the different implementations with many simultaneous clients, with millions of objects.
&lt;br/&gt;&lt;br/&gt;

Those tests, performed independently by smart programmers, are what is making Redis very popular among people that have serious requests per second, and I hope that 2.6 with built-in server side scripting will allow them to get more out of Redis.&lt;/div&gt;&lt;div class="blogpostinfo"&gt;Posted at 16:00:03 | &lt;a href="http://antirez.com/post/redis-2.6-is-near.html"&gt;permalink&lt;/a&gt;&lt;/div&gt;&lt;/div&gt;</description>
   <dc:date>2012-02-24T16:00:03+00:00</dc:date>
  <feedburner:origLink>http://antirez.com/post/redis-2.6-is-near.html</feedburner:origLink></item>
  <item>
   <title>How my todo list works</title>
   <link>http://feedproxy.google.com/~r/antirez/~3/vr3uULaTLTE/my-todo-list.html</link>
   <guid isPermaLink="false">http://antirez.com/post/248</guid>
   <description>&lt;div class="blogpost"&gt;&lt;div style="clear:both"&gt;&lt;/div&gt;&lt;div class="blogposttext"&gt;There is a constant in my life for 335 days a year (let's assume that 30 days of vacation are a bit more relaxing): I've tons of things to do every day. Most are about my work, many are about handling home, health, family, and so forth.
&lt;br/&gt;&lt;br/&gt;

I'm not the kind of guy that is good at remembering that I need to do this and that. Usually I wake up, have breakfast, sit in front of my computer and start to write code, read issues, reply to emails, and so forth. So to get pending things done I absolutely need some form of todo list, and over the years I tried different things.
&lt;h3&gt;Paper and pen&lt;/h3&gt;
One of the systems that I used most was just a piece of A4 paper and a pen. I did this for years, and it's not too bad, but this system eventually does not scale: writing by hand is time consuming so you end up writing too little information to complete or remember the task well enough. Deletion often forces you to rewrite the items on a new piece of paper. Paper gets easily lost, and you don't have it with you if you move often (I always change work place, moving from home to a small office before lunch).
&lt;h3&gt;Computer notes&lt;/h3&gt;
Eventually I ended up trying different solutions using the computer and the keyboard: from a todo service I coded myself, to &amp;quot;Remember the milk&amp;quot;, and everything in between.
All those systems worked for a few weeks, but I always ended up with some kind of mess, too many different &amp;quot;lists&amp;quot;, and accessing a web site to look at or modify my TODO list was boring.
&lt;br/&gt;&lt;br/&gt;

However I discovered that the biggest problem was not the web service, how it was implemented, or how fast it was, the biggest of the problems was... myself. More specifically the way I used my TODO list.
&lt;br/&gt;&lt;br/&gt;

I finally found a system that works great for me, and has been working great for months. So at this point I want to share it with you. I won't try to get into details about why it works and so forth, I'll just describe it. If you are looking for an alternative for your todo list keeping business, try it and see for yourself.
&lt;h3&gt;My system&lt;/h3&gt;
&lt;ul&gt;&lt;li&gt;I write my TODO list using &lt;a href="http://www.evernote.com/"&gt;Evernote&lt;/a&gt;, in a single note called TODO. Evernote is great for two reasons in this context: it's fast because it is a resident program, and it gets synced, so you have your TODO list on all your computers, on your phone, and so forth.&lt;/li&gt;

&lt;li&gt;The note is split into three sub parts: daily, weekly, monthly.&lt;/li&gt;

&lt;li&gt;The last two items in the daily list are: &amp;quot;read the weekly list if it is Monday&amp;quot;, &amp;quot;read the monthly list if it is the first day of the month&amp;quot;.&lt;/li&gt;

&lt;li&gt;Every time you need to insert a new todo list item, just insert it at the end of the appropriate sub-list, daily, weekly, or monthly, depending on how urgent it is, or simply where you think it is more appropriate for the item to stay.&lt;/li&gt;

&lt;li&gt;READ THE DAILY LIST EVERY DAY once you sit in front of the computer &amp;lt;- this is the core of the system. Don't do anything before. No emails, no news sites, nothing. Read the list.&lt;/li&gt;

&lt;li&gt;When appropriate, move items between sublists. For instance if you are reading the monthly list and something is urgent now, move it to the daily part of the list.&lt;/li&gt;

&lt;li&gt;When needed, remove items, because you already completed the task or because it is no longer relevant or a priority.&lt;/li&gt;

&lt;/ul&gt;
&lt;br/&gt;&lt;br/&gt;

That's all. You don't need to complete at least one item per day or the like, as long as you keep reading the list every day. It's up to you when to act and how much you act: this system is not designed to fix your ability to get things done, it is designed just to fix the schedule, and to keep you informed with little effort about what you should do today.
&lt;br/&gt;&lt;br/&gt;

It's working well for me and I hope it works well for you as well. If you find ways to improve this system I would love to hear about them.&lt;/div&gt;&lt;div class="blogpostinfo"&gt;Posted at 13:49:00 | &lt;a href="http://antirez.com/post/my-todo-list.html"&gt;permalink&lt;/a&gt;&lt;/div&gt;&lt;/div&gt;</description>
   <dc:date>2012-02-07T13:49:00+00:00</dc:date>
  <feedburner:origLink>http://antirez.com/post/my-todo-list.html</feedburner:origLink></item>
  <item>
   <title>Redis Moka Awards 2011</title>
   <link>http://feedproxy.google.com/~r/antirez/~3/-uszrN6AsH0/redis-moka-awards-2011.html</link>
   <guid isPermaLink="false">http://antirez.com/post/246</guid>
   <description>&lt;div class="blogpost"&gt;&lt;div style="clear:both"&gt;&lt;/div&gt;&lt;div class="blogposttext"&gt;The Redis community is something special for me... it is full of great guys trying to participate in the development providing ideas, help, fixes and support.
&lt;br/&gt;&lt;br/&gt;

However there are a few users that clearly go the extra mile helping the project in a special way: by making the code more robust. Doing heroic debugging sessions, reviewing the code I commit pointing at bugs, suggesting the adoption of some new idea or library to make Redis better, that turns out to be the right idea.
&lt;br/&gt;&lt;br/&gt;

Don't get me wrong, all the other efforts are very appreciated, like the work done in the Redis Group trying to help newcomers or to come up with a solid design to attack a new problem with Redis, but I feel a need to recognize in a special way the efforts having a direct effect in the code quality.
&lt;br/&gt;&lt;br/&gt;

I planned to recognize these efforts with a special Redis t-shirt, but after many months I still don't have a good design, and after all... the t-shirt is a bit too obvious. So recently I had a new idea. After all coders need to stay awake to help with Redis and they usually love coffee, and Italy is good at coffee, so why not send Moka pots as an award? And here we are.
&lt;br/&gt;&lt;br/&gt;

&lt;img class="leftimage" src="http://antirez.com/blogdata/246/Moka.jpg" alt="Bialetti Moka" title="Bialetti Moka"/&gt;
&lt;br/&gt;&lt;br/&gt;

The winners of this year can choose between the classic Bialetti Moka pot and the induction variant, depending on the cooktop they have. If you already have one you can convert the prize into the equivalent amount of good coffee for your Moka (this is also useful in case the prize is awarded multiple times to the same user).
&lt;br/&gt;&lt;br/&gt;

This is the &lt;a href="http://en.wikipedia.org/wiki/Moka_pot"&gt;same Moka pot&lt;/a&gt; that millions of Italians use every day to make coffee: they work great, are reliable, and last literally for decades.
&lt;br/&gt;&lt;br/&gt;

And... the winners of this year are... :) (in alphabetical order):
&lt;ul&gt;&lt;li&gt;&lt;a href="http://twitter.com/didier_06"&gt;Didier Spezia&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href="http://twitter.com/d1ApRiL"&gt;Dominik Herbst&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href="http://twitter.com/jzawodn"&gt;Jeremy Zawodny&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href="http://twitter.com/tfengjun"&gt;Jokea&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href="http://twitter.com/anydot"&gt;Premysl Hrub&amp;yacute;&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href="http://twitter.com/HeyChinaski"&gt;Tom Martin&lt;/a&gt;&lt;/li&gt;

&lt;/ul&gt;
Thank you guys! Please send me the shipping address and the kind of cooktop you have (induction or normal). Also please send me info about the size you want. I recommend the three cups one, but if you plan to always be alone this will waste a lot of coffee and the one cup is better.
&lt;br/&gt;&lt;br/&gt;

p.s. please send me your address before 15th of January if you can!&lt;/div&gt;&lt;div class="blogpostinfo"&gt;Posted at 14:55:23 | &lt;a href="http://antirez.com/post/redis-moka-awards-2011.html"&gt;permalink&lt;/a&gt;&lt;/div&gt;&lt;/div&gt;</description>
   <dc:date>2011-12-28T14:55:23+00:00</dc:date>
  <feedburner:origLink>http://antirez.com/post/redis-moka-awards-2011.html</feedburner:origLink></item>
  <item>
   <title>Testing the new Redis AOF rewrite</title>
   <link>http://feedproxy.google.com/~r/antirez/~3/VkdtoxZT35E/new-aof-variadic-rewrite.html</link>
   <guid isPermaLink="false">http://antirez.com/post/245</guid>
   <description>&lt;div class="blogpost"&gt;&lt;div style="clear:both"&gt;&lt;/div&gt;&lt;div class="blogposttext"&gt;Redis 2.4 introduced variadic versions of many Redis commands, including SADD, ZADD, LPUSH, RPUSH. HMSET was already available in Redis 2.2. So for every Redis data type, we now have a way to add multiple items in a single command, for example:
&lt;ul&gt;&lt;li&gt;LPUSH mylist item1 item2 item3&lt;/li&gt;

&lt;li&gt;SADD myset A B C&lt;/li&gt;

&lt;li&gt;ZADD myzset 1 first 2 second 3 third&lt;/li&gt;

&lt;li&gt;HMSET myhash name foo surname bar&lt;/li&gt;

&lt;/ul&gt;
However this feature was not used when rewriting the AOF log (an operation performed automatically since Redis 2.4, but that the user can still trigger using the BGREWRITEAOF command, even if the server is not configured to use AOF).
&lt;br/&gt;&lt;br/&gt;

The AOF was still generated using a single command for every element inside an aggregate data type. For instance a three-element list required three different calls to LPUSH in the rewritten AOF file.
&lt;br/&gt;&lt;br/&gt;

Finally Redis 2.6 (that will be forked from the current unstable branch, just removing the cluster code) is introducing the use of variadic commands for AOF log rewriting. The result is that both rewriting and loading an AOF file containing aggregate types and not just plain key-&amp;gt;string pairs will be much faster.
&lt;br/&gt;&lt;br/&gt;
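The difference between the two rewrite strategies can be sketched in a few lines of Ruby. This is only an illustration and not the Redis source code: the helper name rewrite_list and the batch_size parameter are made up for this example, and RPUSH is used so that replaying the commands keeps the original order of the elements.

```ruby
# Hypothetical helper, not Redis source code: returns the commands an AOF
# rewrite could emit for a single list key.
# batch_size == 1 models the old rewrite (one command per element),
# a larger batch_size models the new variadic rewrite.
def rewrite_list(key, items, batch_size)
  items.each_slice(batch_size).map do |chunk|
    (['RPUSH', key] + chunk).join(' ')
  end
end

items = %w[item1 item2 item3]
old_aof = rewrite_list('mylist', items, 1)
new_aof = rewrite_list('mylist', items, 3)

puts old_aof   # three commands, one per element
puts new_aof   # a single variadic command
```

Fewer commands means fewer bytes to write during the rewrite and fewer commands to parse and execute when the file is loaded again.
&lt;br/&gt;&lt;br/&gt;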

&lt;b&gt;How much faster?&lt;/b&gt;
&lt;br/&gt;&lt;br/&gt;

We'll start by checking the speed gain that can be obtained with a real world dataset with very few keys containing aggregate data types, that is, the database of &lt;a href="http://llooggg.com"&gt;lloogg.com&lt;/a&gt;.
Since lloogg was designed in the early stage of Redis development, when Hashes were still not available, it stores a lot of user counters as separate keys, so there are a lot of keys just containing a string (a huge waste of memory, but I've still to find the time to modify the code). However around 5% of the keys are sorted sets. This is an excerpt &lt;a href="https://gist.github.com/1471636"&gt;from the full output&lt;/a&gt; of &lt;a href="http://github.com/antirez/redis-sampler"&gt;Redis Sampler&lt;/a&gt; against the lloogg live DB.
&lt;pre class="code"&gt;
TYPES
=====
 string: 95480 (95.48%)   zset: 4469 (4.47%)       list: 48 (0.05%)        
 set: 3 (0.00%)
&lt;/pre&gt;
As you can see this is far from the ideal dataset to make the new AOF changes look cool, but the result is still significant:
&lt;ul&gt;&lt;li&gt;Time needed to rewrite the AOF log, and size of the resulting file with the OLD rewrite: about 12 seconds, 569 MB&lt;/li&gt;

&lt;li&gt;Time needed to rewrite the AOF log, and size of the resulting file with the NEW rewrite: about 9 seconds, 479 MB&lt;/li&gt;

&lt;li&gt;Time to BGSAVE, for reference: about 9 seconds, file size: 344 MB.&lt;/li&gt;

&lt;/ul&gt;
Now let's check the loading time of all the three options:
&lt;ul&gt;&lt;li&gt;Time to load the RDB: 7.156 seconds&lt;/li&gt;

&lt;li&gt;Time to load the OLD AOF: 15.232 seconds&lt;/li&gt;

&lt;li&gt;Time to load the NEW AOF: 12.589 seconds&lt;/li&gt;

&lt;/ul&gt;
I think this is very good news if you consider that only a small fraction of the keys in this database hold aggregate values. Now what happens for users that have a lot of lists, hashes, sets, sorted sets?
&lt;br/&gt;&lt;br/&gt;

&lt;b&gt;Bigger gains&lt;/b&gt;
&lt;br/&gt;&lt;br/&gt;

To test the new code with a database that better represents a use case where most of the keys are aggregate values, I created a dataset with 1 million hashes containing 16 fields each. Fields are reasonably sized, like Field1: Value1, Field2: Value2, and so forth.
&lt;br/&gt;&lt;br/&gt;

I used this Lua script to create the dataset:
&lt;pre class="code"&gt;
local i, j
for i=1,1000000 do
    for j=1,16 do
        redis.call('hmset','key'..i,'field:'..j,'value:'..j)
    end
end
return {ok=&amp;quot;DONE&amp;quot;}
&lt;/pre&gt;
(Note, if you use the latest unstable branch you can run it using: redis-cli --eval /tmp/script.lua)
&lt;br/&gt;&lt;br/&gt;

Now the same metrics as above but against this new dataset:
&lt;ul&gt;&lt;li&gt;Time needed to rewrite the AOF log, and size of the resulting file with the OLD rewrite: about 17 seconds, 851 MB&lt;/li&gt;

&lt;li&gt;Time needed to rewrite the AOF log, and size of the resulting file with the NEW rewrite: about 10 seconds, 440 MB&lt;/li&gt;

&lt;li&gt;Time to BGSAVE, for reference: about 4 seconds, file size: 158 MB.&lt;/li&gt;

&lt;/ul&gt;
Now let's check the loading time of all the three options:
&lt;ul&gt;&lt;li&gt;Time to load the RDB: 1.888 seconds&lt;/li&gt;

&lt;li&gt;Time to load the OLD AOF: 31.946 seconds&lt;/li&gt;

&lt;li&gt;Time to load the NEW AOF: 17.512 seconds&lt;/li&gt;

&lt;/ul&gt;
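To compare the two datasets it helps to turn the raw measurements into ratios. The following Ruby sketch only redoes the arithmetic on the numbers reported in the two lists of results (the names RESULTS and ratio are made up for this example):

```ruby
# Load times (seconds) and rewritten AOF sizes (MB) copied from the lists above.
RESULTS = [
  { name: 'lloogg', old_load: 15.232, new_load: 12.589, old_mb: 569, new_mb: 479 },
  { name: '1M hashes', old_load: 31.946, new_load: 17.512, old_mb: 851, new_mb: 440 }
].freeze

# How many times bigger the old value is compared to the new one.
def ratio(old_value, new_value)
  (old_value.to_f / new_value).round(2)
end

RESULTS.each do |r|
  puts "#{r[:name]}: loading is #{ratio(r[:old_load], r[:new_load])}x faster, " \
       "the file is #{ratio(r[:old_mb], r[:new_mb])}x smaller"
end
```
&lt;br/&gt;&lt;br/&gt;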
As you can see, now both the AOF rewriting and loading times are reduced to almost half of the time required with Redis 2.4. However you can still see an amazing 1.888 seconds in the time needed to load the RDB. Why?
&lt;br/&gt;&lt;br/&gt;

Because since Redis 2.4 BGSAVE directly outputs the &lt;i&gt;encoded&lt;/i&gt; version of the value, if the value is encoded as a ziplist, an intset or a zipmap. This is a huge advantage, both while loading and saving the database, that could be easily implemented in the AOF rewrite. However I'm currently not doing it as probably in the next versions of Redis we'll have an option to rewrite the AOF log in RDB format itself... so with the unification of the two systems a lot of problems will be reduced.&lt;/div&gt;&lt;div class="blogpostinfo"&gt;&lt;div class="blogpoststats"&gt;18725 views&lt;sup&gt;&lt;a href="/page/uniquevisitors"&gt;*&lt;/a&gt;&lt;/sup&gt;&lt;/div&gt;Posted at 15:55:04 | &lt;a href="http://antirez.com/post/new-aof-variadic-rewrite.html"&gt;permalink&lt;/a&gt; | &lt;a href="http://antirez.com/post/new-aof-variadic-rewrite.html#disqus_thread"&gt;discuss&lt;/a&gt; | &lt;a href="/print.php?postid=245"&gt;print&lt;/a&gt;&lt;/div&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/antirez/~4/VkdtoxZT35E" height="1" width="1"/&gt;</description>
   <dc:date>2011-12-13T15:55:04+00:00</dc:date>
  <feedburner:origLink>http://antirez.com/post/new-aof-variadic-rewrite.html</feedburner:origLink></item>
  <item>
   <title>Redis for win32 and the Microsoft patch</title>
   <link>http://feedproxy.google.com/~r/antirez/~3/Ja2SYgbVuZ4/redis-win32-msft-patch.html</link>
   <guid isPermaLink="false">http://antirez.com/post/244</guid>
   <description>&lt;div class="blogpost"&gt;&lt;div style="clear:both"&gt;&lt;/div&gt;&lt;div class="blogposttext"&gt;A few days ago &lt;a href="https://gist.github.com/1439660"&gt;Microsoft released a patch&lt;/a&gt; to compile Redis under win32. The team working on this project used the &lt;a href="https://github.com/dmajkic/redis/"&gt;already existing win32/win64 port&lt;/a&gt; as a reference, and used the &lt;a href="https://github.com/joyent/libuv"&gt;libuv&lt;/a&gt; library that powers the node.js project.
&lt;br/&gt;&lt;br/&gt;

Yesterday the &lt;a href="http://news.ycombinator.com/item?id=3330847"&gt;story hit Hacker News&lt;/a&gt;, as I discovered a few hours later (I was away from the computer since December 8th is a holiday here in Italy), and as usual when you mix Microsoft, Open Source, and a news site, the result is some friction in the comments.
&lt;br/&gt;&lt;br/&gt;

I decided to write this blog post to clarify my opinion on the matter, and it is funny how this blog post will delay another that I've already written and I was ready to publish today, called &amp;quot;We are programmers, we need a revolution&amp;quot;, that in some way is related to what I want for the Redis future... but you'll see that post in a few days if you are interested. Now back to the win32 patch.
&lt;h3&gt;How good the patch is&lt;/h3&gt;
What the Microsoft team working on the patch did was to port Redis to libuv, which is mainly a library for evented programming based on libev, but cross platform. Actually libuv is ending up as a container of many useful programming tools to interact with the operating system that are usually different between POSIX and WIN32. It was developed for the node.js project in order to, eventually, contain every difference between POSIX and WIN32 inside a single library.
&lt;br/&gt;&lt;br/&gt;

Persistence was not properly addressed yet, but apparently the next version of the patch should handle it better. Currently persistence is basically unusable, since it blocks the main thread while Redis is saving a snapshot. There are also intermittent problems passing the tests, and it is not clear to the authors of the patch whether they are due to the testing engine itself or to actual issues in the patch.
&lt;br/&gt;&lt;br/&gt;

But in short: the patch is exactly as functional as the native win32/win64 port that was already provided by dmajkic: a port good enough to develop under Windows without the need of running Redis under a virtualized Linux install (not a big effort btw, in my opinion), but not good enough to use Redis in production under win32 systems.
&lt;br/&gt;&lt;br/&gt;

It is worth noting that the native port by dmajkic, which does not use libuv, has the remarkable advantage of adding just the minimal set of changes needed to port Redis to win32: it implements a new win32 backend in the event library we use, ae.c, which proved to be a very stable and performant component in our stack over the last two years. So it was a lot more compact, and I see this as an advantage.
&lt;h3&gt;Patches or pull requests?&lt;/h3&gt;
Microsoft was criticized for not sending a pull request, but a patch... I think the point here is that they provided some code: send a pull request, send a patch, or an email, it's the same, and IMHO there is very little point in these formal things. But I think that in this specific instance the patch in a gist was the right way to contribute, since the patch was huge and since in the past I stated many times that I don't want to add win32 support directly in the Redis main project, but I'll favor the creation of a satellite &amp;quot;Redis-win32&amp;quot; project that is separate from the main project.
&lt;br/&gt;&lt;br/&gt;

Also note that in our contributing guidelines we state that it is better to talk with me or Pieter before going forward with the development of significant code. After all I say no many times, so why waste effort? But Microsoft did not inform me, I guess simply because this was a project they wanted to do anyway, so sending a patch is even more appropriate. However I was informed by email that a patch would be published within 24 hours, and I appreciated it.
&lt;br/&gt;&lt;br/&gt;

In short: Microsoft's behavior as an OSS contributor in this case is fine from my point of view.
&lt;h3&gt;Why I'll not accept this patch&lt;/h3&gt;
I don't think Redis running under win32 is a very important feature. It is cool to have a win32 port that can be used for testing, as we had before, and as we have in a different implementation thanks to the Microsoft patch, so developers using Windows can easily test Redis and develop their projects. But what is the point in providing a &lt;i&gt;production quality win32 port&lt;/i&gt;?
&lt;br/&gt;&lt;br/&gt;

I think that Linux completely won as a platform to deploy software, and even if you want to run your code under win32 systems, what's wrong with installing Linux boxes to run Redis? For instance Stack Overflow runs their systems on a mix of Windows and Linux boxes, and they have no trouble using Linux to run Redis.
&lt;br/&gt;&lt;br/&gt;

Instead handling a win32 port directly in the main project means delaying everything else for the little gain of having, eventually, a production ready win32 port of Redis. In Redis we rely on a lot of subtle things about the operating system, from copy on write to the time needed to fork a process, to the way operating systems overcommit memory. If we add a new platform, in the future, exploiting the OS to do the best for our users will get harder and harder. It is really not worth it.
&lt;br/&gt;&lt;br/&gt;

However &lt;b&gt;I like the idea of a win32 port as a separate project&lt;/b&gt;, with a different set of developers, and not officially supported by the main project. That is just added value, and can provide a more reasonable port for development, or even for production at some point, without impacting the main project: so fork the code, and have fun. I'll help if needed, ask me questions, let's collaborate on general ideas. I'll also put a page about the win32 port on the redis.io site so that users will be aware of the port.
&lt;h3&gt;Let's merge just for libuv?&lt;/h3&gt;
In the last few days I also heard that it is a good idea to switch to libuv in general, win32 port or not. I beg to differ.
&lt;br/&gt;&lt;br/&gt;

To start, the node.js project is using libuv since they &lt;i&gt;are interested&lt;/i&gt; in multi platform code able to run under Win32 and POSIX. Apart from that, from the point of view of what Redis needs from an evented library, libuv does not offer interesting new things (we just use file events and timers with our ae.c library). If there is some interesting abstraction that we'll need to use in the future, like streams, we'll implement it in ae.c, but as far as I can tell we will NOT have that need.
&lt;br/&gt;&lt;br/&gt;

Also, I've an argument that for me is truly important:
&lt;pre class="code"&gt;
$ wc -l ae*.[ch]
     397 ae.c
     118 ae.h
      94 ae_epoll.c
      96 ae_kqueue.c
      72 ae_select.c
     777 total
&lt;/pre&gt;
What the above means is: the ability to resolve any possible bug with our events or timers in no time, instead of trying to understand a much bigger multi platform codebase.
&lt;br/&gt;&lt;br/&gt;

I avoid dependencies, but when dependencies are needed I don't have problems with them.
Redis includes a full Lua interpreter and the jemalloc allocator. It would have been idiotic to provide my own implementation of a programming language, or to rewrite an allocator that works as well as jemalloc does. When dependencies provide a lot of added value it is worth adding them. Instead when you need to switch to something bigger and more complex without any gain, why do it?
&lt;br/&gt;&lt;br/&gt;

Ah, and about the gain being some kind of feature only exciting for we code nerds and having zero effects on how a system works, please read my next article in a few days, we are programmers and we need a revolution.&lt;/div&gt;&lt;div class="blogpostinfo"&gt;&lt;div class="blogpoststats"&gt;38074 views&lt;sup&gt;&lt;a href="/page/uniquevisitors"&gt;*&lt;/a&gt;&lt;/sup&gt;&lt;/div&gt;Posted at 10:43:24 | &lt;a href="http://antirez.com/post/redis-win32-msft-patch.html"&gt;permalink&lt;/a&gt; | &lt;a href="http://antirez.com/post/redis-win32-msft-patch.html#disqus_thread"&gt;discuss&lt;/a&gt; | &lt;a href="/print.php?postid=244"&gt;print&lt;/a&gt;&lt;/div&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/antirez/~4/Ja2SYgbVuZ4" height="1" width="1"/&gt;</description>
   <dc:date>2011-12-09T10:43:24+00:00</dc:date>
  <feedburner:origLink>http://antirez.com/post/redis-win32-msft-patch.html</feedburner:origLink></item>
  <item>
   <title>Short term Redis plans</title>
   <link>http://feedproxy.google.com/~r/antirez/~3/wLWFNIKHG3k/short-term-redis-plans.html</link>
   <guid isPermaLink="false">http://antirez.com/post/242</guid>
   <description>&lt;div class="blogpost"&gt;&lt;div style="clear:both"&gt;&lt;/div&gt;&lt;div class="blogposttext"&gt;Users often ask me what is the Redis development roadmap, so it is probably time to write a blog post about our short/mid term plans, with the most important points.
&lt;br/&gt;&lt;br/&gt;

There are two major features that we are pushing forward: scripting and cluster, so let's start from these two.
&lt;br/&gt;&lt;br/&gt;

&lt;h3&gt;Scripting&lt;/h3&gt;
Scripting implements Lua scripting support for Redis. We &lt;a href="http://redis.io/commands/eval"&gt;already have detailed documentation for this feature&lt;/a&gt; that describes what is already implemented and what will likely go into the first stable release of Redis featuring scripting. The only part you should consider outdated both in the doc and the implementation is how scripts that run for too long are handled (this topic was extensively covered in the Redis Google group).
&lt;br/&gt;&lt;br/&gt;

Redis scripting will appear in Redis 2.6, and I'm trying hard to ship Redis 2.6 RC1 by the end of this year (2011). There are many other features planned for 2.6 as well. I'm not sure I'll be able to address everything, but at some point I think I'll try to do a time-driven release for 2.6 and just put in scripting and everything else that is already implemented/stable.
&lt;br/&gt;&lt;br/&gt;

&lt;h3&gt;Redis Cluster&lt;/h3&gt;
Redis cluster is definitely the next big thing, and you can read our &lt;a href="http://redis.io/topics/cluster-spec"&gt;Redis Cluster draft specification&lt;/a&gt; to get an idea about what it can and cannot do. But in short Redis Cluster is a distributed implementation of a &lt;b&gt;subset of Redis standalone&lt;/b&gt;.
Not all commands will be supported, in particular we don't support things like multi-key operations.
In general we are just implementing the subset of Redis that we are sure can be made to work in a solid way in a cluster setup, with predictable behaviors.
&lt;br/&gt;&lt;br/&gt;

Redis cluster will favor consistency over the ability to resist netsplits and failures in general. Basically it will tolerate a few instances going down well, but will not survive big netsplits as other, eventually consistent, systems are able to do.
&lt;br/&gt;&lt;br/&gt;

Redis cluster is as important as scripting, but will be delayed to Redis 3.0 since scripting is much simpler to implement and in our opinion of almost equal importance for most users (if not more), so we prioritized scripting.
&lt;br/&gt;&lt;br/&gt;

The current status of Redis cluster is that you can play with it already but not everything in the specification is implemented. It will take a few more months to reach beta, and then we'll work on the details in order to ship something solid. We'll try hard to resist shipping a system that is not mature just to say we are already cluster-ready. Redis is fortunately very useful already, so we want to make sure to ship the cluster version only when it will likely solve problems instead of creating new ones.
&lt;br/&gt;&lt;br/&gt;

The good news is that the Redis Cluster design is particularly simple in almost all aspects, which supports our hope of shipping a good system in a reasonable amount of time.
&lt;br/&gt;&lt;br/&gt;

&lt;h3&gt;Replication improvements&lt;/h3&gt;
As part of the work we are doing for Redis Cluster we'll need to improve replication. This part of Redis received improvements in almost every release, but with Redis Cluster we need an even better one. For instance the plan is to avoid a full resync every time the link goes down, if the downtime was reasonable and the differences can be accumulated. In short, when the slave disconnects the master does not kill the client representation of the slave, but continues sending data (that gets buffered).
When the slave reconnects we recognize it from a new per-instance ID that always changes after a restart (or after a SLAVEOF NO ONE command), and perform the incremental resync.
&lt;br/&gt;&lt;br/&gt;

These changes will either be shipped with Redis 3.0 or with a future version (the current replication is not optimal but probably already good enough for the first Redis Cluster release, so it is not clear if we'll be able to fix it before or after the first release).
&lt;br/&gt;&lt;br/&gt;

&lt;h3&gt;Persistence improvements&lt;/h3&gt;
Currently we have two persistence modes: append only files and RDB persistence.
The two have different tradeoffs. It is not clear what we'll do about it, but it is possible that we will either unify the two models and/or improve AOF a lot so that it no longer needs the online rewrite process in most use cases (but the log can be rewritten by an external process or simply by a Redis thread).
&lt;br/&gt;&lt;br/&gt;

Everything is very hypothetical in this area for now, but there are a lot of ideas that we accumulated over the last years that are surely worth experimenting with.
&lt;br/&gt;&lt;br/&gt;

We also want to work both on the communication (most users don't understand that Redis with both AOF and RDB enabled is &lt;i&gt;very&lt;/i&gt; durable already, and this is the setup we suggest) and on the implementation, to make sure that Redis AOF can be a very durable solution, as durable as the best SQL databases out there.
&lt;br/&gt;&lt;br/&gt;

This is definitely post-cluster stuff.
&lt;br/&gt;&lt;br/&gt;

&lt;h3&gt;More introspection&lt;/h3&gt;
There is a plan to use Pub/Sub in order to communicate events happening inside Redis, like a key that expired, clients connecting / disconnecting, operations performed against keys.
We'll probably allow the user to script this feature with Lua so that you can, for instance, push all the expired keys into a list as well, or do other things that can't be reliably done with clients and Pub/Sub, since the client is not guaranteed to get all the messages (it can get disconnected for some reason).
&lt;br/&gt;&lt;br/&gt;

&lt;h3&gt;High resolution expires&lt;/h3&gt;
I'm working on it already: we'll have high precision expires in Redis 2.6, so you can set an expire of just a few milliseconds for a given key. The current resolution is 1 second, which is ok for most applications but not for all.
&lt;br/&gt;&lt;br/&gt;

&lt;h3&gt;Performance improvements when reading/writing big objects&lt;/h3&gt;
If you check the 'slowset' branch there is work in this direction already. As part of this work I'm creating a speed regression test, since we really lack it. Note: by big objects I mean sets/gets in the range of 100 KB or 1 MB per element. Redis already performs very well with reads/writes of values of a few KB.
&lt;br/&gt;&lt;br/&gt;

&lt;h3&gt;Many other smaller things&lt;/h3&gt;
See the &lt;a href="https://github.com/antirez/redis/issues?labels=feature+request&amp;amp;sort=created&amp;amp;direction=desc&amp;amp;state=open&amp;amp;page=1"&gt;list of issues filtered by &amp;quot;new feature&amp;quot; tag on github&lt;/a&gt; to get an idea about the smaller things that are going to be implemented.
&lt;br/&gt;&lt;br/&gt;

I hope this helped. Please ask me questions in the blog comments if you want more details. I'll likely reply tomorrow morning since it is already late here ;)&lt;/div&gt;&lt;div class="blogpostinfo"&gt;&lt;div class="blogpoststats"&gt;28486 views&lt;sup&gt;&lt;a href="/page/uniquevisitors"&gt;*&lt;/a&gt;&lt;/sup&gt;&lt;/div&gt;Posted at 21:39:23 | &lt;a href="http://antirez.com/post/short-term-redis-plans.html"&gt;permalink&lt;/a&gt; | &lt;a href="http://antirez.com/post/short-term-redis-plans.html#disqus_thread"&gt;discuss&lt;/a&gt; | &lt;a href="/print.php?postid=242"&gt;print&lt;/a&gt;&lt;/div&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/antirez/~4/wLWFNIKHG3k" height="1" width="1"/&gt;</description>
   <dc:date>2011-11-07T21:39:23+00:00</dc:date>
  <feedburner:origLink>http://antirez.com/post/short-term-redis-plans.html</feedburner:origLink></item>
  <item>
   <title>On cryptography and dogmas</title>
   <link>http://feedproxy.google.com/~r/antirez/~3/_ETtJGfnIU0/crypto-dogmas.html</link>
   <guid isPermaLink="false">http://antirez.com/post/241</guid>
   <description>&lt;div class="blogpost"&gt;&lt;div style="clear:both"&gt;&lt;/div&gt;&lt;div class="blogposttext"&gt;Yesterday I finally made the initial public release of &lt;a href="http://github.com/antirez/lamernews"&gt;Lamer News&lt;/a&gt;, which is both a real world Redis programming example in the form of a Hacker News style site, and a project to run a completely independent (with a consortium) programming news site.
&lt;br/&gt;&lt;br/&gt;

The project was well received, and was on the top page of HN for some time. Thanks for providing your feedback.
&lt;br/&gt;&lt;br/&gt;

After the release of the code I got a few requests about changing the hash function I was using in order to hash passwords in the database:
&lt;pre class="code"&gt;
# Turn the password into an hashed one, using
# SHA1(salt|password).
def hash_password(password)
    Digest::SHA1.hexdigest(PasswordSalt+password)
end
&lt;/pre&gt;
The above code uses SHA1 with a salt. As others pointed out this is not the safest pick, since there are ways to compute SHA1 very fast. After some time a chorus of people started tweeting and commenting a single sentence: &amp;quot;Use bcrypt&amp;quot;. I proposed using nested SHA1 in a loop, in order to avoid adding more dependencies to the code (if you check the README, one of the goals is to keep the code simple and depending on just a few gems). And at this point it happened: the crypto dogma. No way to reason about crypto primitives and their possible applications and combinations, but just &amp;quot;use bcrypt&amp;quot;.
In the eyes of this crew programmers are just stupid drones applying guidelines, who can't in any way reason about cryptography. But I'll talk more about that later...
&lt;br/&gt;&lt;br/&gt;

For now let's take a step back... and show what the original problem is with all this, and how insecure the original code is.
&lt;h3&gt;The problem&lt;/h3&gt;
The problem is pretty easy to understand, but it is worth explaining in detail.
In order to avoid storing passwords in cleartext in the database, it is common practice to hash passwords. So:
&lt;pre class="code"&gt;
HP (hashed password) = HASH (password)
&lt;/pre&gt;
When the software needs to perform the user authentication it receives the plaintext password, hashes it again, and verifies that it matches the one in the database. If so the user is authenticated.
&lt;br/&gt;&lt;br/&gt;

However what happens if an attacker, let's call her Eve, steals the database and the hashed passwords are leaked?
Eve has a number of hashed passwords, let's call them HP1, HP2, HP3, ...
Her goal is to find an attack that can turn HP back into P.
&lt;br/&gt;&lt;br/&gt;

The hashing algorithm HASH is public, so the first thing Eve can do is to apply HASH to a dictionary composed of common words and check if HP matches the HASH(common_word). If there is a match the original password was found. Note that there are not so many words in the English dictionary, so this attack is very easy to perform, and super fast.
&lt;br/&gt;&lt;br/&gt;
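A minimal Ruby sketch of this dictionary attack, assuming an unsalted SHA1 as HASH and a tiny made-up word list (a real attacker would load a full dictionary file instead):

```ruby
require 'digest/sha1'

# Tiny stand-in word list, chosen only for illustration.
WORDS = %w[password letmein dragon monkey coffee].freeze

# Returns the word whose SHA1 matches the stolen hash, or nil if none does.
def dictionary_attack(stolen_hash, words)
  words.find { |word| Digest::SHA1.hexdigest(word) == stolen_hash }
end

# A password taken from the dictionary is recovered immediately...
puts dictionary_attack(Digest::SHA1.hexdigest('coffee'), WORDS).inspect
# ...while one that is not in the list survives this specific attack.
puts dictionary_attack(Digest::SHA1.hexdigest('k3!xQz'), WORDS).inspect
```
&lt;br/&gt;&lt;br/&gt;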

But maybe our user, Bob, picked a password that is not in the dictionary, but is not particularly long either.
&lt;br/&gt;&lt;br/&gt;

Eve can generate all the passwords of up to 6 characters and hash them with HASH, trying to find a match. This attack is computationally harder. If the password is a completely binary string, let's say of six characters, there are 256^6 passwords, that is, 281474976710656.
&lt;br/&gt;&lt;br/&gt;

If our attacker can hash one billion passwords per second (it is possible with modern GPUs without spending a fortune on it) cracking this password takes:
&lt;pre class="code"&gt;
281474976710656 / 1000000000 = 281474 seconds
&lt;/pre&gt;
This is just... three days, so one and a half days in the average case. Not good! It's too easy to crack. There is another problem: a user will hardly use all 256 byte values with equal probability. Let's consider the worst case of a user picking only the 26 lowercase letters, without numbers or symbols. This time let's consider an 8 character password.
&lt;br/&gt;&lt;br/&gt;

There are 26^8 possible passwords, that is: 208827064576 possible passwords.
This time our password can be cracked in 208 seconds (half that time on average).
&lt;br/&gt;&lt;br/&gt;

This is clearly not good. How long should a password over a 26 letter alphabet be in order to be out of reach for an attacker able to compute HASH 1 billion times per second?
&lt;br/&gt;&lt;br/&gt;

A 14 character password will resist for about 1023 years on average before being cracked.
For a 16 character password our attacker needs 1382824 years to try every combination.
&lt;br/&gt;&lt;br/&gt;

Just 12 chars will resist for a year and a half on average, definitely too little for most applications.
&lt;br/&gt;&lt;br/&gt;
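All these figures come from dividing the number of candidate passwords by the hash rate. A small Ruby sketch of the arithmetic, assuming the same 1 billion hashes per second and a 365 day year (the constant names and the helper seconds_to_exhaust are made up for this example):

```ruby
HASHES_PER_SECOND = 1_000_000_000
SECONDS_PER_YEAR = 3600 * 24 * 365

# Seconds needed to try every password of the given length (worst case).
# The average case is half of this value.
def seconds_to_exhaust(alphabet_size, length, rate = HASHES_PER_SECOND)
  (alphabet_size**length) / rate.to_f
end

puts "6 binary chars: #{seconds_to_exhaust(256, 6).floor} seconds"
puts "8 lowercase chars: #{seconds_to_exhaust(26, 8).floor} seconds"

[12, 14, 16].each do |length|
  years = seconds_to_exhaust(26, length) / SECONDS_PER_YEAR
  puts "#{length} lowercase chars: #{years.round(1)} years worst case, " \
       "#{(years / 2).round(1)} years on average"
end
```
&lt;br/&gt;&lt;br/&gt;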

&lt;b&gt;So is SHA1 secure for hashing passwords?&lt;/b&gt; Yes if the user picked a strong password of 14 chars or more. Otherwise not very secure. It all depends on the length of the password, and guess what, users have a bad habit of picking bad and short passwords.
&lt;h3&gt;It is worse than that&lt;/h3&gt;
Unfortunately it is worse than that. For instance the attack against our 12 chars password can be made instantaneous in an easy way: by spending three years to compute a table of all the 12 chars passwords and their resulting hash values. This is basically a big map between HASH(P) and P.
&lt;br/&gt;&lt;br/&gt;

However such a table takes space, &lt;i&gt;a lot&lt;/i&gt; of terabytes (86792 to be precise) to store the lookup table, assuming we have a compression algorithm so cool that it can use just one byte per HP,P pair (likely an unreachable goal). However this is a valid attack when the size of the table is reasonable.
&lt;br/&gt;&lt;br/&gt;

The point here is that many times in cryptography an attack can be made to work by using space instead of time.
&lt;br/&gt;&lt;br/&gt;
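To make the time for space trade concrete, here is a Ruby sketch that precomputes the table for a tiny password space (all the 3 letter lowercase passwords, an assumption chosen only so that it runs instantly), and then redoes the size estimate for the 12 chars table at one byte per pair:

```ruby
require 'digest/sha1'

# Precompute HASH(P) -> P for every 3 letter lowercase password.
table = {}
('aaa'..'zzz').each { |pw| table[Digest::SHA1.hexdigest(pw)] = pw }

# Once the table exists, reversing a stolen unsalted hash is a single lookup.
stolen = Digest::SHA1.hexdigest('cat')
puts table[stolen]

# Size of the equivalent table for 12 chars passwords at one byte per pair,
# expressed in terabytes of 2**40 bytes (integer division).
terabytes = (26**12) / (2**40)
puts terabytes
```
&lt;br/&gt;&lt;br/&gt;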

The good thing is that there is a way to prevent the attacker from precomputing &lt;b&gt;a single table that will work for all the sites using the same hash function&lt;/b&gt;, that is, using &lt;i&gt;a salt&lt;/i&gt;. A salt is an (assumed public) string we concatenate to our password before hashing it, so if our salt is &amp;quot;lame&amp;quot;, and the password is &amp;quot;foo&amp;quot; we will perform:
&lt;pre class="code"&gt;
HP = HASH(&amp;quot;foolame&amp;quot;)
&lt;/pre&gt;
&lt;br/&gt;&lt;br/&gt;

This way, for the table-based attack to work, the attacker needs to pre-compute a table with all the 12 char password combinations hashed with the same salt. This means the table is useless if Eve plans to attack another site with a different salt.
&lt;h3&gt;Random salts&lt;/h3&gt;
We can do even better than that, and not just store HP, but also a random salt.
When we create the user account we also generate a user-specific random salt, and store it along with the hashed password.
&lt;br/&gt;&lt;br/&gt;

With a per-user salt we are safer: an attacker who wants to precompute now needs one table &lt;b&gt;per user&lt;/b&gt;. Even more interesting, while a global salt may be leaked even if the user passwords are not, this is unlikely to happen if you have a different salt for every user.
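&lt;br/&gt;&lt;br/&gt;

A minimal sketch of the per-user salt scheme in Ruby, assuming SHA1 as HASH like in the original code. The names create_credentials and password_valid? and the record layout are made up for this example and are not taken from the Lamer News source:

```ruby
require 'digest/sha1'
require 'securerandom'

# Generate a random salt for a new account and hash the password with it.
# Both values are stored together in the user record.
def create_credentials(password)
  salt = SecureRandom.hex(16)
  { salt: salt, hash: Digest::SHA1.hexdigest(salt + password) }
end

# Recompute the hash with the stored salt and compare it with the stored one.
def password_valid?(record, password)
  Digest::SHA1.hexdigest(record[:salt] + password) == record[:hash]
end

alice = create_credentials('foo')
bob = create_credentials('foo')
puts password_valid?(alice, 'foo')
puts alice[:hash] == bob[:hash]
```

Two users picking the same password end up with different hashes, so a table precomputed for one of them is useless against the other.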
&lt;h3&gt;Making HASH slow&lt;/h3&gt;
However even if we stop all the table based attacks, there is still a fundamental problem: if the password is short and Eve is able to compute HASH 1 billion times per second we have problems.
&lt;br/&gt;&lt;br/&gt;

There is one thing we can do: use a hash function HASH that is MONKEY ASSES SLOW.
&lt;br/&gt;&lt;br/&gt;

There are algorithms that are very slow both in hardware and in software, for instance.
Or we can take an existing algorithm and make it very slow by running it in a loop.
&lt;br/&gt;&lt;br/&gt;

For example &lt;a href="http://en.wikipedia.org/wiki/Blowfish_(cipher)"&gt;Blowfish&lt;/a&gt; is an encryption algorithm with a slow key scheduling algorithm (the algorithm is pretty fast once you have performed the key scheduling, so Blowfish is a bad choice only if you want to encrypt many short messages with different keys, while it can be fast if you want to encrypt a big message with a single key).
&lt;br/&gt;&lt;br/&gt;

The fact that the Blowfish key scheduling algorithm is slow makes it a good candidate for HASH.
&lt;br/&gt;&lt;br/&gt;

So Niels Provos and David Mazi&amp;egrave;res designed an algorithm called &lt;a href="http://en.wikipedia.org/wiki/Bcrypt"&gt;Bcrypt&lt;/a&gt; that can be used to hash passwords. The algorithm was presented in 1999 and uses a &lt;i&gt;modified&lt;/i&gt; Blowfish key scheduling algorithm. I'm not sure if past analysis of Blowfish can be applied to Bcrypt after the modifications, nor how much analysis was performed on Bcrypt itself, so I can't comment on the security of the algorithm in question.
&lt;br/&gt;&lt;br/&gt;

However it is a popular pick, and Provos and Mazi&amp;egrave;res are two well known cryptographers, so the algorithm probably has no obvious flaws.
&lt;br/&gt;&lt;br/&gt;

Once you use a slow HASH the attacker starts to have a lot of trouble. For instance Bcrypt is &amp;quot;tunable&amp;quot;: you can modify an input parameter to make it slower or faster. If you make it slow enough that even with good hardware you can't compute more than 1000 hashes per second, it is still probably fast enough for your authentication servers to handle, but it is impractically slow for Eve to crack even an 8 character password, even using just 26 letters:
&lt;pre class="code"&gt;
26^8/1000/3600/24/365 = 6.6218627782
&lt;/pre&gt;
That is 6.6 years to try every combination, so 3.3 years on average to crack an 8 character password. Probably still a bit too weak but better than a few seconds...
&lt;br/&gt;&lt;br/&gt;
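
The same arithmetic can be replayed for other alphabets, lengths and hash rates with a few lines of Python. This is just a sketch: the function name is invented for the example, and the model assumes the attacker simply enumerates the whole keyspace at a constant rate.
&lt;pre class="code"&gt;
def years_to_exhaust(alphabet_size, length, hashes_per_second):
    # Time needed to try every password of the given length.
    keyspace = alphabet_size ** length
    seconds = keyspace / hashes_per_second
    return seconds / 3600 / 24 / 365

full = years_to_exhaust(26, 8, 1000)
print(round(full, 2))      # whole keyspace: 6.62
print(round(full / 2, 2))  # average case, half the keyspace: 3.31
&lt;/pre&gt;
&lt;br/&gt;&lt;br/&gt;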

&lt;b&gt;However note how we are still not secure against a dictionary attack&lt;/b&gt;. If the user picked a common word there is no hope. 30k hashes are still trivial to perform in a reasonable amount of time.
&lt;h3&gt;On dogmas&lt;/h3&gt;
So far I think we showed a few interesting points. First: there is no hashing scheme that will save users who pick very bad passwords. It is very important to force users to add non alphanumerical characters and a few capital letters to the password IF security is very important for your application.
&lt;br/&gt;&lt;br/&gt;

&lt;b&gt;It is important to understand how things work&lt;/b&gt;. And this brings us to the following point.
After the &amp;quot;use bcrypt&amp;quot; chorus I replied that I could use another solution instead, based on just iterating SHA1. But apparently, for many, cryptography is not a topic a programmer should understand. It is just a dogma. When you have dogmas you are probably going to be a bad programmer: what if your system does not have bcrypt support for some reason and you still want to build something useful?
&lt;br/&gt;&lt;br/&gt;

What I proposed was this trivial scheme:
&lt;pre class="code"&gt;
SHA1(SHA1(SHA1(...(SHA1(password|salt)))))
&lt;/pre&gt;
&lt;br/&gt;&lt;br/&gt;
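
Written down as code, the scheme is only a few lines. The following Python version is a sketch, not a vetted implementation: the function name and the default of 1000 iterations are arbitrary choices made for the example, and password and salt are assumed to be byte strings.
&lt;pre class="code"&gt;
import hashlib

def iterated_sha1(password, salt, iterations=1000):
    # Innermost call: SHA1(password|salt).
    digest = hashlib.sha1(password + salt).digest()
    # Every further iteration hashes the previous raw digest.
    for _ in range(iterations - 1):
        digest = hashlib.sha1(digest).digest()
    return digest.hex()

print(iterated_sha1(b'foo', b'lame'))
&lt;/pre&gt;
The iteration count plays the role of the tunable slowness parameter: doubling it doubles the work both for the authentication server and for Eve.
&lt;br/&gt;&lt;br/&gt;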

How heretical! I was labeled as stupid and as not understanding security, I was told that it is not safe to chain hash functions like this, and so forth. But if you think about it:
&lt;br/&gt;&lt;br/&gt;

SHA1 is a one way hash function. It is composed of a small computation step called &lt;b&gt;round&lt;/b&gt; that is iterated again and again. There is no key scheduling as it is not a block cipher, it just compresses a stream of bits into a fixed length output.
&lt;br/&gt;&lt;br/&gt;

It is very important to understand that &lt;i&gt;many&lt;/i&gt; crypto algorithms are based on the idea of taking a simpler function and iterating it many times to strengthen the effect it has. This concept is so important that sometimes an attack against an algorithm disappears (becomes impractical or requires more time than brute force) if you add more rounds. Sometimes cryptographers use a variant of the algorithm with a reduced number of rounds just to analyze the algorithm in a more attackable form, to understand better how strong the variant with the full number of rounds is.
&lt;br/&gt;&lt;br/&gt;

So why don't we just add a lot of rounds? Because it is slow. Even an amateur cryptographer could design an algorithm that is secure but slow. A good cryptographer will be able to find the tradeoff between security and speed.
&lt;br/&gt;&lt;br/&gt;

But... now we know this concept of rounds, and we know that in SHA1 there is no key scheduling algorithm, that the output of the function depends only on its input, and that SHA1 is not designed to be &lt;i&gt;inverted&lt;/i&gt;, as there is no decryption stage.
&lt;br/&gt;&lt;br/&gt;

So it is quite natural that the scheme I proposed of computing SHA1(SHA1(SHA1(..))) will do just that, adding rounds to SHA1. So given the fundamental properties of SHA1 it should be computationally infeasible to write a function SHA1000 that is equivalent to SHA1 nested 1000 times but that can be computed easily.
&lt;br/&gt;&lt;br/&gt;

Note that &lt;b&gt;the output&lt;/b&gt; of SHA1(SHA1(..)) is not the same as modifying the algorithm adding more rounds since there is a pre and post stage in the SHA1 algorithm that will make the output differ compared to a plain SHA1 with more rounds.
&lt;br/&gt;&lt;br/&gt;

But guess what? This morning I discovered that the algorithm PBKDF1 described in &lt;a href="http://tools.ietf.org/html/rfc2898"&gt;RFC2898&lt;/a&gt; actually does exactly what I proposed.
&lt;br/&gt;&lt;br/&gt;

There are people that are very happy to show you the way, but if you look at them more closely you discover they are clueless. So please use proven standards, try to write secure code, but use your mind, learn about cryptography and how you can combine primitives. Dogmas are lame.
&lt;br/&gt;&lt;br/&gt;

It is not a good idea for a programmer to try designing a block cipher and then use it for sensitive purposes, there are specialists doing that, but understanding what the building blocks are, what you can do with cryptography, and how to build protocols is a &lt;i&gt;very&lt;/i&gt; important skill for our community.
&lt;br/&gt;&lt;br/&gt;

Finally it is ok for me when people are rude with me when they are right. Arrogance can be handled if it is mixed with smartness. When instead it meets ignorance it is really just a sad affair.
&lt;br/&gt;&lt;br/&gt;

&lt;b&gt;Edit:&lt;/b&gt; two new interesting links from HN:
&lt;br/&gt;&lt;br/&gt;

&lt;ul&gt;&lt;li&gt;&lt;a href="http://eprint.iacr.org/2010/384"&gt;How SHA1 and SHA2 will lose some entropy at every iteration. Not a big problem for our application (too few iterations) but worth knowing&lt;/a&gt;.&lt;/li&gt;

&lt;li&gt;&lt;a href="https://groups.google.com/group/sci.crypt/msg/92fe3e4e1edf0d0f?hl=nl"&gt;How nesting SHA1 does not help if you want to slow down a pre-image attack&lt;/a&gt;.&lt;/li&gt;

&lt;/ul&gt;
About the second link, in the message the explanation is not very clear but this is what happens:
&lt;br/&gt;&lt;br/&gt;

Here the attack they want to mount is the following: find another string, ANY string, that will hash to the same output, but only 32 bits of the output.
&lt;br/&gt;&lt;br/&gt;

Since it is any string, it can also be a SHA1 itself. So what you do is to start with an &amp;quot;X&amp;quot; that can be ANY ANY value, even &amp;quot;foo&amp;quot;. And you start doing:
&lt;pre class="code"&gt;
    x = SHA1(x)
    x = SHA1(x)
    ... again and again ...
&lt;/pre&gt;
ok? Well in the average case after 2^31 iterations you find a collision, right?
&lt;br/&gt;&lt;br/&gt;

But the output 65536 iterations ago was it! That is the string that will produce that specific 32 bit output after SHA1() nested 65536 times. So you want to go backward but it is not possible, SHA1 can't be inverted.
So what do you do? You start again from &amp;quot;X&amp;quot; and stop exactly 65536 iterations before the point where you found the wanted value.
&lt;br/&gt;&lt;br/&gt;

Obviously doing 65536 more SHA1s of that string you get the previous output. So you found your string.
Why does the original poster say that the attack takes 2x time but can even be optimized? Because you can store the value of SHA1 at 10000 iterations, at 20000 and so forth. Then instead of re-running the whole iteration you start from the nearest cached value.&lt;/div&gt;&lt;div class="blogpostinfo"&gt;&lt;div class="blogpoststats"&gt;33581 views&lt;sup&gt;&lt;a href="/page/uniquevisitors"&gt;*&lt;/a&gt;&lt;/sup&gt;&lt;/div&gt;Posted at 14:14:09 | &lt;a href="http://antirez.com/post/crypto-dogmas.html"&gt;permalink&lt;/a&gt; | &lt;a href="http://antirez.com/post/crypto-dogmas.html#disqus_thread"&gt;discuss&lt;/a&gt; | &lt;a href="/print.php?postid=241"&gt;print&lt;/a&gt;&lt;/div&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/antirez/~4/_ETtJGfnIU0" height="1" width="1"/&gt;</description>
   <dc:date>2011-10-21T14:14:09+00:00</dc:date>
  <feedburner:origLink>http://antirez.com/post/crypto-dogmas.html</feedburner:origLink></item>
  <item>
   <title>What's wrong with the iPhone 4s, and why Jobs is not my hero</title>
   <link>http://feedproxy.google.com/~r/antirez/~3/HJgF2Kc2sos/iphone-4s-and-jobs.html</link>
   <guid isPermaLink="false">http://antirez.com/post/240</guid>
   <description>&lt;div class="blogpost"&gt;&lt;div style="clear:both"&gt;&lt;/div&gt;&lt;div class="blogposttext"&gt;The iPhone 4s is out, and apparently for the first time it disappointed many of the most addicted Apple fan boys, especially the ones that Apple has grown over the last decade in a sort of semi religious way, using bold statements like &lt;i&gt;this changes everything, again&lt;/i&gt;.
&lt;br/&gt;&lt;br/&gt;

Apparently for many, the 4s did not change everything enough: they probably expected some kind of tangible change like a bigger display, a redesign, or who knows... maybe a portable &lt;a href="http://en.wikipedia.org/wiki/Holodeck"&gt;holodeck&lt;/a&gt;.
&lt;br/&gt;&lt;br/&gt;

If you ask me, the iPhone 4s is a huge step forward, because of &lt;a href="http://www.apple.com/iphone/features/siri.html"&gt;Siri&lt;/a&gt;.
I'm not talking about what Siri is &lt;i&gt;currently&lt;/i&gt; able to do, or the current impact it can have on our interaction with a portable device. Even if it already seems like a great new way of interacting with computers, what makes me so excited is the fact that natural language processing finally hit a mass market product. What I hope is that the experience will already be good enough for average people that this will boost innovation in that area, both in industry and in academia.
&lt;br/&gt;&lt;br/&gt;

I'm pretty sure Google will end up involved in that game. After all, till yesterday Android was the state of the art in voice interaction. But apparently Google is not able to connect the dots: they solve a given problem in a great way, that is, translating speech to text, but they can't translate that into a great user experience for their users.
&lt;br/&gt;&lt;br/&gt;

Now that Apple, again, showed the world the obvious, other companies will compete in the same arena, improving the technology.
&lt;br/&gt;&lt;br/&gt;

In my opinion Siri will be integrated in the iPad soon as well, since the iPad needs Siri even more than the iPhone itself. With tablets you have a bigger screen that makes you feel like you can use it to accomplish actual work, but the limitation of typing on a virtual keyboard, in a non natural position, severely limits what you can do. If you can use your voice it is a different story.
&lt;br/&gt;&lt;br/&gt;

So if it is so cool what is wrong with the iPhone 4s? First: that Apple sells you new software while implicitly pretending it is selling you new hardware. Siri is the most interesting thing in the iPhone 4s and could run perfectly well on the iPhone 4 (it is mostly a server-side thing), but even if you purchased an iPhone 4 a few weeks ago, no way, you can't have it, even if you were willing to pay for a software update.
&lt;br/&gt;&lt;br/&gt;

No wait. It is worse than that. Siri was even available in the App Store as a standard application, but it has now been removed since it got integrated into the 4s.
&lt;br/&gt;&lt;br/&gt;

You may complain that Siri as application sucked, without integration there is no fun. Unfortunately the lack of integration is a result of a closed environment, the iOS environment, where there is a single entity dictating what you can run and what not, and how an application can interact with the device (usually in a very limited form).
&lt;br/&gt;&lt;br/&gt;

All this is happening at the same time as the world lost Steve Jobs. News sites are full of articles showing how great he was, and I think he actually contributed a lot to the technology world. But my heroes are different: they want a world where everybody has access to the best technology, to the best hospitals, and making money is a side effect of contributing in a non evil way to the development of our culture.
&lt;br/&gt;&lt;br/&gt;

Being among the creators of the marketing and business philosophy that Apple pushes forward made Steve Jobs a great CEO, but the world needs different kinds of heroes. Unfortunately when they happen to die you can expect some small news in major media, at most.&lt;/div&gt;&lt;div class="blogpostinfo"&gt;&lt;div class="blogpoststats"&gt;33001 views&lt;sup&gt;&lt;a href="/page/uniquevisitors"&gt;*&lt;/a&gt;&lt;/sup&gt;&lt;/div&gt;Posted at 13:29:00 | &lt;a href="http://antirez.com/post/iphone-4s-and-jobs.html"&gt;permalink&lt;/a&gt; | &lt;a href="http://antirez.com/post/iphone-4s-and-jobs.html#disqus_thread"&gt;discuss&lt;/a&gt; | &lt;a href="/print.php?postid=240"&gt;print&lt;/a&gt;&lt;/div&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/antirez/~4/HJgF2Kc2sos" height="1" width="1"/&gt;</description>
   <dc:date>2011-10-07T13:29:00+00:00</dc:date>
  <feedburner:origLink>http://antirez.com/post/iphone-4s-and-jobs.html</feedburner:origLink></item>
  <item>
   <title>Why the MBA 11 is now my sole computer</title>
   <link>http://feedproxy.google.com/~r/antirez/~3/sXklPWeyEyg/apple-mba11-my-sole-computer.html</link>
   <guid isPermaLink="false">http://antirez.com/post/239</guid>
   <description>&lt;div class="blogpost"&gt;&lt;div style="clear:both"&gt;&lt;/div&gt;&lt;div class="blogposttext"&gt;(Since I received a few questions about how I feel writing code with the MBA I'm posting this blog post.)
&lt;br/&gt;&lt;br/&gt;

I work from a number of different places. Mainly from home and from an office where I have a room of my own, here in Catania, shared with friends of mine who also write code, for another company.
&lt;br/&gt;&lt;br/&gt;

This way I'm sure I'll not spend the whole day at home, and when I've an interesting problem I've a few friends to share ideas with in front of a coffee. So usually I work at home in the morning, then go to the office for lunch, and stay at the office till the end of the working day.
&lt;br/&gt;&lt;br/&gt;

It is also common for me to work from the swimming pool while waiting for my son, or from my parents' home, and in many other places, especially during weekends and holidays. Clearly I need a computer that is good for mobility, and guess what: the Macbook PRO 13&amp;quot; that I own is not the best fit. It is simply too big both to carry around and to keep on your lap. It is not good for &amp;quot;couch browsing&amp;quot; when I need some info or to check if there are updates on some Redis matter that is particularly urgent.
&lt;br/&gt;&lt;br/&gt;

The iPad is also not an option: I spent a life in front of a keyboard, I'm good at it, I can type fast without effort. Any computing experience that makes me slow at typing is frustrating. Not to mention that the iPad is the worst computing device to write even a single line of code.
&lt;br/&gt;&lt;br/&gt;

So... when Apple released the first version of the 11&amp;quot; Macbook Air (Late 2010) I purchased one within the first week of the announcement, and my computing experience changed.
&lt;br/&gt;&lt;br/&gt;

&lt;h2&gt;The old MBA&lt;/h2&gt;
The old MBA was almost exactly what I needed. A small full-featured computer with a good enough keyboard, good battery life, and readable screen.
&lt;br/&gt;&lt;br/&gt;

I spend almost 90% of my time inside a Terminal or a web browser, and the 11&amp;quot; is not a problem for me in these contexts. I force myself to write code with an 80 column max line, so even using a bigger screen I tend to use small terminal windows. The 11&amp;quot; screen is enough to have a big font in the Terminal app and display 80 columns x 37 rows. Web browsing is also ok, just a matter of tuning the font size to read comfortably.
&lt;br/&gt;&lt;br/&gt;

The old MBA was so good from many points of view that I rapidly started using it to code, and every day I was using more MBA and less MBP. However it had a big problem: it was slow, very very slow.
Don't get me wrong, thanks to the SSD it is fast enough to do things like running the twitter client, browsing the web without waiting too much time, and even watching videos (as long as you don't have other stuff running in the background). But once you start using it to write C code and run unit tests... well it is an entirely different story. Compiling Redis after switching branch or running the test was too slow.
Not slow enough to stop using it given the advantages, but from time to time I found myself switching back to the MBP 13 in order to work more comfortably, especially during bug hunting or other tasks where there was a fast edit-compile-test loop.
&lt;br/&gt;&lt;br/&gt;

I was asking myself the same question again and again: &amp;quot;Why does Apple make small computers that are slow, and big computers that are fast?&amp;quot;. An obvious reason, I guess, was the battery life concern. If the computer is small the battery is small, and more computing power drains more battery. However the tradeoff was still not very clear. I was in need of a small, but fast, computer, even with a not so great battery life.
&lt;br/&gt;&lt;br/&gt;

&lt;h2&gt; The new MBA &lt;/h2&gt;
At some point it happened: Apple released an 11&amp;quot; computer that was as portable as the older model but was almost four times faster. Obviously I ordered one the same day it was announced, making sure to get the one with the i7 processor for maximum performance.
&lt;br/&gt;&lt;br/&gt;

My new MBA is actually two times faster than my old MBP 13 at running the Redis test.
Long story short I no longer use my MBP 13, which is now only connected to the TV in order to watch streaming stuff...
&lt;br/&gt;&lt;br/&gt;

The new MBA is virtually identical to the old one, except for the backlit keyboard, which is also a pro for many people (but not for me, as I think that if you can't see the keys you are harming your eyes, so I always use at least a dim light even when working at night. Btw I don't look at the keyboard while typing, like most of you I guess.).
&lt;br/&gt;&lt;br/&gt;

My feeling is that the battery life of my new MBA is not as good as my old MBA, but actually I never verified this experimentally, I'll try it since I've also the old one that is now used by my wife.
&lt;br/&gt;&lt;br/&gt;

In short if you want an MBA 11&amp;quot; and you are a programmer, concerned with screen size or speed, go for it, it is the best computer I ever owned for sure. Virtually all my Redis development is done with the MBA, but since it is so fast nothing prevents you from connecting it to a bigger screen when you are in the location where you usually work, turning it into a real desktop.
&lt;br/&gt;&lt;br/&gt;

For instance I have a 22&amp;quot; Samsung monitor and a USB keyboard that I use with it when I'm at the office. I just plug in the video cable and a USB hub with everything connected to switch from mobile to big-screen mode: a wireless mouse, an HD to do backups, a Dell keyboard, and so forth.
&lt;br/&gt;&lt;br/&gt;

I hope this helped somebody with mixed feelings about purchasing it or not :)&lt;/div&gt;&lt;div class="blogpostinfo"&gt;&lt;div class="blogpoststats"&gt;31108 views&lt;sup&gt;&lt;a href="/page/uniquevisitors"&gt;*&lt;/a&gt;&lt;/sup&gt;&lt;/div&gt;Posted at 13:53:18 | &lt;a href="http://antirez.com/post/apple-mba11-my-sole-computer.html"&gt;permalink&lt;/a&gt; | &lt;a href="http://antirez.com/post/apple-mba11-my-sole-computer.html#disqus_thread"&gt;discuss&lt;/a&gt; | &lt;a href="/print.php?postid=239"&gt;print&lt;/a&gt;&lt;/div&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/antirez/~4/sXklPWeyEyg" height="1" width="1"/&gt;</description>
   <dc:date>2011-09-30T13:53:18+00:00</dc:date>
  <feedburner:origLink>http://antirez.com/post/apple-mba11-my-sole-computer.html</feedburner:origLink></item>
  <item>
   <title>Everything about Redis 2.4</title>
   <link>http://feedproxy.google.com/~r/antirez/~3/RzVytEkThTw/everything-about-redis-24.html</link>
   <guid isPermaLink="false">http://antirez.com/post/238</guid>
   <description>&lt;div class="blogpost"&gt;&lt;div style="clear:both"&gt;&lt;/div&gt;&lt;div class="blogposttext"&gt;A few months ago I realized that the cluster support for Redis, currently in development in the &lt;i&gt;unstable&lt;/i&gt; branch, was going to require some time to be shipped into a stable release, and required significant changes in the Redis core.
&lt;br/&gt;&lt;br/&gt;

At the same time Pieter and I already had a number of good things not related to cluster in our development code: delaying everything for the cluster stable release was not acceptable.
So I took a different path, forking 2.2 into 2.4, and merging my and Pieter's developments (at least the ones compatible with the 2.2 code base) into this new branch. In other words 2.4 was possible because &lt;b&gt;git rules&lt;/b&gt;.
&lt;br/&gt;&lt;br/&gt;

2.4 delayed the work on unstable, but this was a good compromise after all. And now the effort has finally reached a form that is close to being stable, as we are at release candidate number five.
You can find Redis 2.4-rc5 in the &lt;a href="http://redis.io/download"&gt;Redis site download section&lt;/a&gt;, and in a few weeks this will be rebranded Redis 2.4.0-stable if no critical bugs are discovered.
&lt;br/&gt;&lt;br/&gt;

This article is going to show you in detail all the new things introduced in Redis 2.4. Before continuing, no... scripting is not included. It will be released with Redis 2.6 that will be based on the Redis unstable code base instead. Redis 2.6 is planned for this fall.
&lt;br/&gt;&lt;br/&gt;

The following is a summary of all the changes contained in 2.4.
We'll show every one in detail in the course of the article.
&lt;br/&gt;&lt;br/&gt;

&lt;ul&gt;&lt;li&gt;Small sorted sets now use significantly less memory.&lt;/li&gt;

&lt;li&gt;RDB Persistence is much much faster for many common data sets.&lt;/li&gt;

&lt;li&gt;Many write commands now accept multiple arguments, so you can add multiple items into a Set or List with just a single command. This can improve the performance in a pretty impressive way.&lt;/li&gt;

&lt;li&gt;Our new allocator is &lt;i&gt;jemalloc&lt;/i&gt;.&lt;/li&gt;

&lt;li&gt;Less memory is used by the saving child, as we reduced the amount of copy on write.&lt;/li&gt;

&lt;li&gt;INFO is more informative. However it is still the old 2.2-like INFO, not the new one in unstable composed of sub sections.&lt;/li&gt;

&lt;li&gt;The new OBJECT command can be used to introspect Redis values.&lt;/li&gt;

&lt;li&gt;The new CLIENT command allows for connected clients introspection.&lt;/li&gt;

&lt;li&gt;Slaves are now able to connect to the master instance in a non-blocking fashion.&lt;/li&gt;

&lt;li&gt;Redis-cli was improved in a few ways.&lt;/li&gt;

&lt;li&gt;Redis-benchmark was improved as well.&lt;/li&gt;

&lt;li&gt;Make is now colorized ;)&lt;/li&gt;

&lt;li&gt;VM has been deprecated.&lt;/li&gt;

&lt;li&gt;In general Redis is now faster than ever.&lt;/li&gt;

&lt;li&gt;We have a much improved Redis test framework.&lt;/li&gt;

&lt;/ul&gt;
&lt;br/&gt;&lt;br/&gt;

Everything on this list was coded by me and Pieter Noordhuis, but feedback from users was really helpful. Special thanks go to Hampus Wessman, who spotted and fixed &lt;i&gt;interesting&lt;/i&gt; bugs.
&lt;br/&gt;&lt;br/&gt;

&lt;a href="http://vmware.com"&gt;VMware&lt;/a&gt; kindly sponsored all our work as usually. Thanks!
&lt;h3&gt;Memory optimized Sorted Sets&lt;/h3&gt;
One of the most interesting changes in Redis 2.2 was the support for memory optimized small values. Why represent a Redis List as a linked list if it only has 10 elements, for instance? If you have a billion lists this is going to take a lot of space, since there are a lot of pointers, many allocations each with its own overhead, and so forth.
&lt;br/&gt;&lt;br/&gt;

So we introduced the ability to switch encoding on the fly. Lists, Sets, Hashes, all start encoded as a single blob that uses little memory, even if it requires O(N) algorithms to do things that are otherwise O(1). But once a given threshold is reached Redis converts these values into the old representation. So the amortized time to perform the operation on the element is still O(1), but we use a lot less memory. Many datasets are composed of millions of small lists, hashes, and so forth.
&lt;br/&gt;&lt;br/&gt;
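
To make the idea concrete, here is a toy model in Python. It is not Redis code: the class name, the flat list standing in for the compact blob, and the threshold of 4 entries are all invented for the example. It only shows the mechanism of starting compact and converting once a threshold is crossed.
&lt;pre class="code"&gt;
class TinyHash:
    # Toy model only. Small hashes live in one flat list that is
    # scanned linearly, big ones are converted into a real dict.
    MAX_COMPACT_ENTRIES = 4  # illustrative threshold

    def __init__(self):
        self.encoding = 'compact'
        self.data = []  # flat [field, value, field, value, ...]

    def set(self, field, value):
        if self.encoding == 'hashtable':
            self.data[field] = value
            return
        # O(N) scan of the flat representation.
        for i in range(0, len(self.data), 2):
            if self.data[i] == field:
                self.data[i + 1] = value
                return
        self.data += [field, value]
        # Entries grow one at a time, so an equality test is enough
        # to detect the moment the threshold is crossed.
        if len(self.data) // 2 == self.MAX_COMPACT_ENTRIES + 1:
            self.data = dict(zip(self.data[0::2], self.data[1::2]))
            self.encoding = 'hashtable'

    def get(self, field):
        if self.encoding == 'hashtable':
            return self.data.get(field)
        for i in range(0, len(self.data), 2):
            if self.data[i] == field:
                return self.data[i + 1]
        return None
&lt;/pre&gt;
While the value is small the linear scans touch only a handful of entries, so the O(N) cost is bounded by the threshold.
&lt;br/&gt;&lt;br/&gt;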

However in Redis 2.2 we applied this optimization to everything but Sorted Sets. Redis 2.4 finally brings this optimization to Sorted Sets as well, as we discovered that there are many users also using data sets with many many small sorted sets. And this brings us to the next point...
&lt;h3&gt;Faster RDB persistence&lt;/h3&gt;
If our small values are encoded as blobs, this means we can do something very interesting from the point of view of persistence: these values are already serialized!
&lt;br/&gt;&lt;br/&gt;

The kind of representation we use for small values does not have pointers or the like. The only change required was to put all the integers (lengths and relative offsets in the encoding format) in an endianness independent form. I used little endian encoding as this means no conversion most of the time.
&lt;br/&gt;&lt;br/&gt;

This is a huge win from the point of view of RDB persistence. In Redis 2.2, saving a hash with ten fields represented as a zipmap (this is one of our special encoding formats) required iterating the hash and saving every field and value as a different logical object in the RDB format.
&lt;br/&gt;&lt;br/&gt;

Now instead we save the serialized value as it is in memory!
Many datasets are now an order of magnitude faster to load and save. This also means that Redis 2.2 can't read datasets saved with 2.4.
&lt;h3&gt;Variadic write commands&lt;/h3&gt;
Finally many write commands are able to take multiple values!
This is the full list:
&lt;ul&gt;&lt;li&gt;&lt;b&gt;SADD set val1 val2 val3 ...&lt;/b&gt; -- now returns the number of elements added (not already present).&lt;/li&gt;

&lt;li&gt;&lt;b&gt;HDEL hash field1 field2 field3 ...&lt;/b&gt; -- now returns the number of elements removed.&lt;/li&gt;

&lt;li&gt;&lt;b&gt;SREM set val1 val2 val3 ... &lt;/b&gt; -- now returns the number of elements removed.&lt;/li&gt;

&lt;li&gt;&lt;b&gt;ZREM zset val1 val2 val3 ... &lt;/b&gt; -- now returns the number of elements removed.&lt;/li&gt;

&lt;li&gt;&lt;b&gt;ZADD zset score1 val1 score2 val2 ... &lt;/b&gt; -- now returns the number of elements added.&lt;/li&gt;

&lt;li&gt;&lt;b&gt;LPUSH/RPUSH list val1 val2 val3 ... &lt;/b&gt; -- return value is the new length of the list, as usual.&lt;/li&gt;

&lt;/ul&gt;
Redis's ability to process commands quickly is usually not bound by the time needed to alter the data set, but by the time spent in I/O, dispatching, and sending the reply back. This means that for some applications there is now an impressive speed improvement.
&lt;br/&gt;&lt;br/&gt;

Just an example:
&lt;pre class="code"&gt;
&amp;gt; redis-cli del mylist
(integer) 1
&amp;gt; ./redis-benchmark -n 100000 lpush mylist 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
====== lpush mylist 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 ======
  100000 requests completed in 1.28 seconds
  50 parallel clients
  3 bytes payload
  keep alive: 1

99.93% &amp;lt;= 1 milliseconds
99.95% &amp;lt;= 2 milliseconds
100.00% &amp;lt;= 2 milliseconds
78247.26 requests per second

&amp;gt; redis-cli llen mylist
(integer) 2101029
&lt;/pre&gt;
Yes, &lt;b&gt;we added two million items into a list in 1.28 seconds&lt;/b&gt;, with a networking layer between us and the server.  Just saying...
&lt;br/&gt;&lt;br/&gt;
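
To see where the saving comes from, here is a small Python sketch that encodes commands in the Redis request protocol. The helper name encode_command is made up for the example, and the sketch only builds the bytes, it does not talk to a server. Pushing 20 values one by one costs 20 requests and 20 replies, while the variadic form costs one of each.
&lt;pre class="code"&gt;
def encode_command(*args):
    # Request protocol: *(number of args), then for every argument
    # $(length in bytes) followed by the argument itself.
    out = [b'*%d\r\n' % len(args)]
    for arg in args:
        data = str(arg).encode()
        out.append(b'$%d\r\n%s\r\n' % (len(data), data))
    return b''.join(out)

values = list(range(20))
one_by_one = [encode_command('LPUSH', 'mylist', v) for v in values]
variadic = encode_command('LPUSH', 'mylist', *values)

print(len(one_by_one), 'requests,', sum(map(len, one_by_one)), 'bytes')
print(1, 'request,', len(variadic), 'bytes')
&lt;/pre&gt;
The byte count shrinks too, but the real win is in the number of round trips and of dispatches on the server side.
&lt;br/&gt;&lt;br/&gt;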

You may ask, why did we turn only a specific set of commands into variadic versions?
We did it for all the commands where the return value would not require a type change, nor depend on the number of arguments. For all the rest there will be scripting... doing this and more for you :)
&lt;h3&gt;Jemalloc FTW&lt;/h3&gt;
The jemalloc affair is one of our most fortunate uses of external code ever. If you follow Redis development you know I'm not exactly the kind of guy who gets excited about linking some big project to Redis without some huge gain. We don't use libevent, our data structures are implemented in small .c files, and so forth.
&lt;br/&gt;&lt;br/&gt;

But an allocator is a serious thing. Since we introduced the specially encoded data types Redis started suffering from fragmentation. We tried different things to fix the problem, but basically the Linux default allocator in glibc sucks really, really hard.
&lt;br/&gt;&lt;br/&gt;

Including jemalloc inside of Redis (no need to have it installed on your computer, just download the Redis tarball as usual and type make) was a huge win. Every single case of fragmentation in real world systems was fixed by this change, and the amount of memory used also dropped a bit.
&lt;br/&gt;&lt;br/&gt;

So now we build on Linux using Jemalloc by default. Thanks Jemalloc!
If you are on osx or *BSD you can still force a jemalloc build with make USE_JEMALLOC=yes, but those other systems have a sane libc malloc so usually this is not required. Also a few of those systems use jemalloc-derived libc malloc implementations.
&lt;h3&gt;Less copy-on-write&lt;/h3&gt;
Redis RDB persistence and the AOF log rewriting system are based on the fork() memory semantics of modern operating systems. While the child is writing the new AOF or RDB file, it is cool to have the operating system preserving a point-in-time copy of the dataset for us, but every time we change a page of memory in the parent process that page gets duplicated. This is known as copy-on-write, and is responsible for the additional memory used by the saving child in Redis.
&lt;br/&gt;&lt;br/&gt;
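
The point-in-time behavior is easy to observe from a script. The following Python sketch assumes a Unix system where os.fork() is available, and uses a made up function name and a one-key dictionary in place of the real dataset: the parent changes a value after the fork, and the child still saves the old one.
&lt;pre class="code"&gt;
import os

def fork_snapshot_demo():
    dataset = {'counter': 1}
    go_r, go_w = os.pipe()      # parent tells the child when to save
    out_r, out_w = os.pipe()    # child sends back what it saved
    pid = os.fork()
    if pid == 0:
        # Child: wait until the parent has modified its own copy,
        # then save the dataset as it was at fork() time.
        os.close(go_w)
        os.close(out_r)
        os.read(go_r, 1)
        os.write(out_w, str(dataset['counter']).encode())
        os._exit(0)
    # Parent: this write touches a memory page, so the kernel
    # duplicates it and the child keeps seeing the old content.
    os.close(go_r)
    os.close(out_w)
    dataset['counter'] = 2
    os.write(go_w, b'x')
    saved = int(os.read(out_r, 16))
    os.waitpid(pid, 0)
    os.close(go_w)
    os.close(out_r)
    return saved, dataset['counter']

print(fork_snapshot_demo())  # (1, 2)
&lt;/pre&gt;
Every page the parent modifies while the child is alive exists twice in memory, which is exactly the extra memory discussed here.
&lt;br/&gt;&lt;br/&gt;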

We made different changes in the past in order to reduce copy on write, but one last needed change was still not implemented, related to the internal working of our hash table iterator. Finally 2.4 has this change. It is interesting to note that I made the mistake of back porting this change into Redis 2.2. This was responsible for many bugs in the recent history of Redis 2.2. In the future I'll continue to be conservative as I was in the past and will make just the minimal changes in stable releases.
&lt;br/&gt;&lt;br/&gt;

The additional copy on write in Redis 2.2 looked like a bug, but fixing that bug with a patch involving several changes to the core was surely not a good idea. Waiting for the next release is almost always the right thing to do in the case of non critical bugs.
&lt;h3&gt;More fields in INFO&lt;/h3&gt;
The new INFO in unstable is much better than the one in 2.2 and 2.4. It was not a good idea to backport it into 2.4 as the code is too different, but the new 2.4 INFO has a few interesting new fields, especially these two:
&lt;pre class="code"&gt;
used_memory_peak:185680824
used_memory_peak_human:177.08M
&lt;/pre&gt;
Your RSS and your fragmentation rate are usually related to the &lt;i&gt;peak&lt;/i&gt; memory usage. Now Redis is able to hold this information, and this is very useful for memory related troubleshooting.
&lt;br/&gt;&lt;br/&gt;

So for instance if you have an RSS of 5 GB but your DB is almost empty, are you sure it has always been empty? Now you just need to look at this field.
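&lt;br/&gt;&lt;br/&gt;

As a rough illustration of how this field can be used, the following Python sketch parses INFO-style text and compares the current usage with the peak. The parse_info helper and the sample text are assumptions made for the example (the used_memory value is invented); only the used_memory and used_memory_peak field names come from INFO itself.

```python
def parse_info(text):
    """Turn INFO-style 'name:value' lines into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, section comments, and anything without a colon.
        if not line or line.startswith("#") or ":" not in line:
            continue
        name, value = line.split(":", 1)
        fields[name] = value
    return fields

# Hypothetical INFO excerpt; the used_memory value is made up for the example.
sample = """
used_memory:931041
used_memory_peak:185680824
"""

info = parse_info(sample)
used = int(info["used_memory"])
peak = int(info["used_memory_peak"])
# A peak far above the current usage suggests the data set used to be bigger.
print("peak is %.0f times the current usage" % (peak / used))
```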
&lt;h3&gt;Two new introspection commands: OBJECT and CLIENT&lt;/h3&gt;
The &lt;b&gt;DEBUG&lt;/b&gt; command was already able to show some interesting information about Redis objects. However you can't count on DEBUG, as this command is not required to be stable over time and should never be used except to hack on the Redis code base.
&lt;br/&gt;&lt;br/&gt;

The &lt;b&gt;OBJECT&lt;/b&gt; command exposes some interesting information about Redis values in a form that is accessible and usable by developers.
&lt;br/&gt;&lt;br/&gt;

You can find the &lt;a href="http://redis.io/commands/object"&gt;full documentation of the Object command here&lt;/a&gt;.
&lt;br/&gt;&lt;br/&gt;

Another interesting new command is the &lt;b&gt;CLIENT&lt;/b&gt; command. Using this command you are able to both list and kill clients. I'm sorry, but I still have to write the documentation for this command, so here I'll show an interactive usage example:
&lt;pre class="code"&gt;
redis 127.0.0.1:6379&amp;gt; client list
addr=127.0.0.1:49083 fd=5 idle=0 flags=N db=0 sub=0 psub=0
addr=127.0.0.1:49085 fd=6 idle=9 flags=N db=0 sub=0 psub=0
&lt;/pre&gt;
We got the list of clients, and some info about what they are doing (or not doing, see the idle field).
Now it's time to kill a client:
&lt;pre class="code"&gt;
redis 127.0.0.1:6379&amp;gt; client kill 127.0.0.1:49085
OK
redis 127.0.0.1:6379&amp;gt; client kill 127.0.0.1:49085
(error) ERR No such client
&lt;/pre&gt;
&lt;h3&gt;Non blocking slave connect&lt;/h3&gt;
Redis master-slave replication was already a non-blocking process for almost everything except the connect(2) call performed by the slave to the master.
&lt;br/&gt;&lt;br/&gt;

This is finally fixed. A small change, but with significantly better behavior compared to the past. We still need to fix a few things about replication, and we'll make other changes in order to make replication better for cluster. Redis Cluster uses replication in order to maintain copies of nodes, so you can expect that replication will evolve as cluster evolves.
&lt;h3&gt;Better redis-cli and redis-benchmark&lt;/h3&gt;
Redis-cli is now able to do more interesting things. For instance you can now prefix a command with a number to run the command multiple times:
&lt;pre class="code"&gt;
redis 127.0.0.1:6379&amp;gt; 4 ping
PONG
PONG
PONG
PONG
&lt;/pre&gt;
Another interesting change is the ability to reconnect to an instance if the link goes down and to retry the reconnection after every command typed.
&lt;br/&gt;&lt;br/&gt;

Finally redis-cli can now be used to monitor INFO parameters together with grep.
In the following example we display the memory usage every second.
&lt;pre class="code"&gt;
./redis-cli -r 10000 -i 1 info | grep used_memory_human
used_memory_human:909.22K
used_memory_human:909.22K
used_memory_human:909.22K
&lt;/pre&gt;
&lt;br/&gt;&lt;br/&gt;

Redis-benchmark was also improved: you can now specify the exact command to benchmark, which is an awesome change. You can see an example run of the new redis-benchmark in the paragraph about variadic commands.
&lt;h3&gt;Other improvements&lt;/h3&gt;
An important change in Redis 2.4 is that it is the last version of Redis featuring VM.
Redis will warn you that it is not a good idea to use VM, as we are no longer going to support it in future versions of Redis, as already discussed many times.
&lt;br/&gt;&lt;br/&gt;

Also the new test is much faster, we have a &lt;a href="http://antirez.com/post/redis-new-test-engine.html"&gt;full article about this change&lt;/a&gt;. The new &lt;a href="http://ci.redis.io"&gt;continuous integration&lt;/a&gt; is also helpful, running our code base under valgrind multiple times every hour.
&lt;br/&gt;&lt;br/&gt;

Another interesting change is the colorized make process ;) You may think this is just a fancy thing, but it actually makes compilation warnings much simpler to see.
&lt;br/&gt;&lt;br/&gt;

I hope you'll enjoy Redis 2.4, and a big thank you to all the Redis community!
Since it's Friday, have a good weekend :)&lt;/div&gt;&lt;div class="blogpostinfo"&gt;Posted at 17:46:47 | &lt;a href="http://antirez.com/post/everything-about-redis-24.html"&gt;permalink&lt;/a&gt;&lt;/div&gt;&lt;/div&gt;</description>
   <dc:date>2011-07-29T17:46:47+00:00</dc:date>
  <feedburner:origLink>http://antirez.com/post/everything-about-redis-24.html</feedburner:origLink></item>
  <item>
   <title>Redis Documentation Fiesta 2</title>
   <link>http://feedproxy.google.com/~r/antirez/~3/T3p9g6TStMc/redis-doc-fiesta-2.html</link>
   <guid isPermaLink="false">http://antirez.com/post/237</guid>
   <description>&lt;div class="blogpost"&gt;&lt;div style="clear:both"&gt;&lt;/div&gt;&lt;div class="blogposttext"&gt;A few months ago I asked my twitter followers to tell me something that could be improved in the Redis Documentation, with the promise to fix the problem in the next couple of days.
This was a good thing as I received good suggestions and spent some time just writing docs, which are a &lt;i&gt;fundamental&lt;/i&gt; part of any software project.
&lt;br/&gt;&lt;br/&gt;

So I'm trying it again. Tomorrow I'll publish a blog post about the improvements in Redis 2.4. At the same time the Redis 2.4 tar.gz will appear in the Redis.io download section. I'll be spending the rest of the day and part of the weekend writing documentation.
&lt;br/&gt;&lt;br/&gt;

So please if you have in mind something specific that needs some love in the Redis documentation leave a comment here. I'll try to fix it.
&lt;br/&gt;&lt;br/&gt;

One of the things I'll surely write is a better &amp;quot;quick start&amp;quot; documentation for newcomers.
I'll also update the pages of all the commands that, with 2.4, are now able to accept a variable number of arguments.
&lt;br/&gt;&lt;br/&gt;

I'll write to you tomorrow with a detailed post about the changes in 2.4 and how to make use of these changes.&lt;/div&gt;&lt;div class="blogpostinfo"&gt;Posted at 19:52:45 | &lt;a href="http://antirez.com/post/redis-doc-fiesta-2.html"&gt;permalink&lt;/a&gt;&lt;/div&gt;&lt;/div&gt;</description>
   <dc:date>2011-07-28T19:52:45+00:00</dc:date>
  <feedburner:origLink>http://antirez.com/post/redis-doc-fiesta-2.html</feedburner:origLink></item>
  <item>
   <title>Redis new test engine</title>
   <link>http://feedproxy.google.com/~r/antirez/~3/EZ3A4Ok4SLA/redis-new-test-engine.html</link>
   <guid isPermaLink="false">http://antirez.com/post/236</guid>
   <description>&lt;div class="blogpost"&gt;&lt;div style="clear:both"&gt;&lt;/div&gt;&lt;div class="blogposttext"&gt;If you used to follow the advice that Redis displays after you build it, that is, to run &lt;i&gt;make test&lt;/i&gt;, then you know that our test was not the fastest in the world, actually taking several minutes to complete in a not so fast computer.
&lt;br/&gt;&lt;br/&gt;

One reason the test used to be so slow is that we do a lot of &lt;a href="http://en.wikipedia.org/wiki/Fuzz_testing"&gt;Fuzz Testing&lt;/a&gt;. Almost all the bugs we discovered thanks to the test suite were found by fuzz tests, and very rarely by regression tests and unit tests. This is, I guess, pretty obvious: if you try to write correct code you are unlikely to make trivial errors. More likely, errors will arise from interactions you are not vulcan enough to spot while coding.
&lt;br/&gt;&lt;br/&gt;

Fuzz tests are slow, but another cause of the Redis test slowness is that it has to deal with the networking stack, since the test is actually a specialized Redis client sending commands to a live instance. This means that every time we issue a command we pay the round trip time.
&lt;br/&gt;&lt;br/&gt;

Whatever the reason was, a slow test is not good for developers, who need to run it constantly after significant changes (and the slower it is, the bigger a change has to be before you bother to run it), and of course for users, who will be disappointed to wait several minutes after installation just because they decided to play it safe and test the build before deploying it. Also our Continuous Integration environment can perform fewer test runs per hour if the test is slow, and this will prevent or delay the discovery of hard-to-catch bugs that only happen from time to time.
&lt;br/&gt;&lt;br/&gt;

Since before the 2.4 release candidate I want to get better testing, better CI, and better coverage, this was the right time to fix the test, making it faster and more valgrind friendly.
The result is already merged in the unstable branch. The following are a few notes about what I did in order to make the test faster.
&lt;h3&gt;Going parallel&lt;/h3&gt;
For a faster test to have a serious impact on how developers and users use it, a 30% faster test is not enough: you need a test that is an order of magnitude faster, so this was the kind of improvement I was looking for.
&lt;br/&gt;&lt;br/&gt;

Using a faster client, or trying to optimize some specific tests, would help a little bit, but not enough to reach the order of magnitude speedup I was looking for. However there was a simple thing to do in order to dramatically speed up the test execution: turning the test into a parallel one.
&lt;br/&gt;&lt;br/&gt;

The Redis test was already organized in separate units, such as &amp;quot;list&amp;quot;, &amp;quot;aof&amp;quot;, &amp;quot;replication&amp;quot; and so forth. Often the tests inside a single unit need to run sequentially, because sometimes a test uses the data set created by the previous test, so turning every test into a separate unit was too complex and possibly not worth it. What is simpler is to run the different units (composed of tens of tests each) in parallel.
&lt;br/&gt;&lt;br/&gt;

This was much simpler, as different units already used to start different instances of Redis.
&lt;h3&gt;Server Client model&lt;/h3&gt;
One of my goals was to reuse as much code as possible from the old test engine: Pieter Noordhuis and I have been working on the tests for two years, and that is not work to throw away without good reasons.
The previous engine was also perfectly able to run a single unit, which is part of what I wanted to accomplish. So what I did was to turn the old test into a &lt;b&gt;test client&lt;/b&gt;.
&lt;br/&gt;&lt;br/&gt;

This is the final design:
&lt;ul&gt;&lt;li&gt;The test starts a &lt;b&gt;test server&lt;/b&gt;, that is a process that will handle the execution of all the test units and report back to the user.&lt;/li&gt;

&lt;li&gt;The test server starts a number of &lt;b&gt;test clients&lt;/b&gt;. A test client is basically the old test suite, but with a networking interface. Every test client connects to the test server on startup via a TCP socket, and waits for commands.&lt;/li&gt;

&lt;li&gt;At this point the test server starts assigning tasks to the different clients, like &amp;quot;run the list test&amp;quot;.&lt;/li&gt;

&lt;li&gt;Clients running test units will report back to the test server. The test server uses an &lt;b&gt;event-driven&lt;/b&gt; design, so it can read the replies from all the clients with little work.&lt;/li&gt;

&lt;li&gt;Every time a test client finishes executing a test unit, it sends a &amp;quot;done&amp;quot; message to the server, which will re-use the test client to run the next test unit, if any.&lt;/li&gt;

&lt;li&gt;Eventually all the test units will be executed, and the test can exit with the appropriate exit code.&lt;/li&gt;

&lt;/ul&gt;
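The scheduling rule in the list above (an idle client gets the next pending unit) can be sketched in a few lines of Python. This is only a simulation under stated assumptions: it spawns no processes and opens no sockets, the simulate_run name is invented for the example, and the unit durations are hypothetical numbers.

```python
import heapq

def simulate_run(unit_seconds, num_clients):
    """Return the wall-clock time needed when idle clients pick up the next unit."""
    # Each heap entry is the time at which a client becomes free again.
    free_at = [0.0] * num_clients
    heapq.heapify(free_at)
    for seconds in unit_seconds:
        start = heapq.heappop(free_at)          # first client to report "done"
        heapq.heappush(free_at, start + seconds)
    # The run ends when the last busy client finishes.
    return max(free_at)

units = [30, 20, 20, 10, 10, 10]   # hypothetical unit durations, in seconds
print(simulate_run(units, 1))      # one client: units run one after another
print(simulate_run(units, 3))      # three clients working in parallel
```

With these made-up durations the longest unit sets a floor on the total time, which is one reason why splitting long running units helps parallelization.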
&lt;br/&gt;&lt;br/&gt;

Currently I'm spawning 16 test clients to run a little more than 20 units.
The new units are not exactly what they used to be, since tests that ran for too long are now split into N sub-units in order to improve the parallelization.
&lt;h3&gt;Less fuzz, more speed&lt;/h3&gt;
As I said at the beginning of this article we have a lot of fuzz testing, however these tests sometimes run for 10000 iterations and are very slow to execute. Running fuzz tests in the CI makes sense, as this helps discover rare bugs that need a few very particular events to be triggered, however if something is broken it will usually be evident after far fewer iterations.
&lt;br/&gt;&lt;br/&gt;

So in order to further speed up the test execution I introduced an --accurate option that the user can pass to the test suite. When this option is given the test is run (as it used to be) with a lot of iterations in the fuzz tests, while by default we use fewer iterations. The continuous integration uses --accurate, and so will we developers before a release (but anyway the test is running continuously inside the CI with --accurate, so if something is broken we'll be informed soon).
&lt;br/&gt;&lt;br/&gt;

This further reduced the execution time of a few of the more time consuming tests.
&lt;h3&gt;The actual speedup&lt;/h3&gt;
How much faster can we run the test now? &lt;i&gt;A lot faster!&lt;/i&gt;
&lt;br/&gt;&lt;br/&gt;

On the same (fast) box, what used to take 2 minutes and 54 seconds now runs in just 13 seconds. This is a &lt;b&gt;13x speed improvement!&lt;/b&gt;
&lt;br/&gt;&lt;br/&gt;

Even running with --accurate is significantly faster than it used to be, taking a total of 48 seconds to execute.
&lt;br/&gt;&lt;br/&gt;

This is the kind of speedup I was looking for, since it is the kind of speedup that can completely change the interaction between testing and users. A 13 second test suite running hundreds of tests with colorized output is fun to run, while a 3 minute test is frustrating, and many users will not run the tests or will be far from happy about doing it.
&lt;h3&gt;Implementation details&lt;/h3&gt;
This system is implemented in Tcl, using the built-in event-driven programming support that Tcl has had since... decades? Long before this paradigm started to gain popularity. You may remember the &lt;a href="http://www.csd.uoc.gr/~hy527/papers/threads-ousterhout.pdf"&gt;Why threads are a bad idea&lt;/a&gt; paper from Ousterhout (Tcl's father).
&lt;br/&gt;&lt;br/&gt;

For instance in Tcl you could write the following event-driven time server even 10 years ago:
&lt;pre class="code"&gt;
socket -server handle_client 1234
proc handle_client {fd host port} {
    fconfigure $fd -blocking 0
    puts $fd &amp;quot;Hello $host:$port! the current unix time is [clock seconds]&amp;quot;
    close $fd
}
vwait forever
&lt;/pre&gt;
If you telnet to port 1234 on 127.0.0.1 the result is: &lt;b&gt;Hello 127.0.0.1:65236! the current unix time is 1310419689&lt;/b&gt; as expected. That will handle 30k clients per second on your macbook without issues.
&lt;br/&gt;&lt;br/&gt;

Tcl event-driven programming supports sending data in the background automagically, timers, and so forth. Another feature I used is that in Tcl everything is a string, so it was trivial to exchange data between the test clients and the test server. I used the following function:
&lt;pre class="code"&gt;
proc send_data_packet {fd status data} {
    set payload [list $status $data]
    puts $fd [string length $payload]
    puts -nonewline $fd $payload
    flush $fd
}
&lt;/pre&gt;
The receiver of the data packet can easily decode the data, as it is a Tcl list and is represented as a string like any other data type in Tcl.
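&lt;br/&gt;&lt;br/&gt;

For readers who don't speak Tcl, here is a minimal Python sketch of the same length-prefixed framing, including the receiving side. It is not the code used by the test suite: JSON stands in for the Tcl list as an assumption, read_data_packet is a name invented for the example, and an in-memory buffer plays the role of the socket.

```python
import io
import json

def send_data_packet(stream, status, data):
    # JSON is used here in place of a Tcl list (an assumption of this sketch).
    payload = json.dumps([status, data]).encode("utf-8")
    # First a line with the payload length, then the payload itself.
    stream.write(str(len(payload)).encode("ascii") + b"\n")
    stream.write(payload)

def read_data_packet(stream):
    # Read the length line, then exactly that many bytes of payload.
    length = int(stream.readline().strip())
    status, data = json.loads(stream.read(length).decode("utf-8"))
    return status, data

buf = io.BytesIO()                    # stands in for the TCP socket
send_data_packet(buf, "ok", "LPUSH basics")
send_data_packet(buf, "err", "Expected 1, got 2")
buf.seek(0)
print(read_data_packet(buf))
print(read_data_packet(buf))
```

Sending the length first lets the receiver know exactly how many bytes to read, so the payload can contain any character, newlines included.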
&lt;br/&gt;&lt;br/&gt;

So for instance if a test worked as expected the test client sends a data packet to the server with &lt;i&gt;status&lt;/i&gt; &amp;quot;ok&amp;quot; using the data field to communicate the test name.
Otherwise if there was some problem an &amp;quot;err&amp;quot; status is sent, along with the details of the error.
&lt;br/&gt;&lt;br/&gt;

If instead, for some reason, a runtime error happens in a client, it gets trapped using Tcl exceptions and sent to the test server as an &amp;quot;exception&amp;quot; packet, which will halt the execution of everything.
&lt;h3&gt;Valgrind support&lt;/h3&gt;
Another thing I improved was the valgrind support. When the test is running under valgrind a few time-dependent tests may fail or produce slightly different outputs that result in false positives.
I simply added a few sleeps where needed and everything is now fine. As a result the Redis test is now constantly running under valgrind (at least the unstable branch so far). This was pretty important, as in recent months we spotted at least one bug that was detected by valgrind just by running the existing test suite.
&lt;br/&gt;&lt;br/&gt;

In short, testing is going to be more important in the Redis world, as I'm happier shipping a bit less, but rock solid, than a bit more but with potential issues. The release of 2.4 will probably be delayed a bit in order to add more tests and to test the 2.4 release better by hand, but I think this is in the best interest of our users.
&lt;br/&gt;&lt;br/&gt;

Also the 2.6 release will be based on the unstable branch, so that we'll try to restrict the number of significantly different source trees we are managing.
&lt;br/&gt;&lt;br/&gt;

I hope this overview of the test suite was interesting. If you have questions, please feel free to ask.&lt;/div&gt;&lt;div class="blogpostinfo"&gt;Posted at 21:39:51 | &lt;a href="http://antirez.com/post/redis-new-test-engine.html"&gt;permalink&lt;/a&gt;&lt;/div&gt;&lt;/div&gt;</description>
   <dc:date>2011-07-11T21:39:51+00:00</dc:date>
  <feedburner:origLink>http://antirez.com/post/redis-new-test-engine.html</feedburner:origLink></item>
  <item>
   <title>How to take advantage of Redis just adding it to your stack</title>
   <link>http://feedproxy.google.com/~r/antirez/~3/Oxrh1hj4h0Y/take-advantage-of-redis-adding-it-to-your-stack.html</link>
   <guid isPermaLink="false">http://antirez.com/post/235</guid>
   <description>&lt;div class="blogpost"&gt;&lt;div style="clear:both"&gt;&lt;/div&gt;&lt;div class="blogposttext"&gt;Redis is different than other database solutions in many ways: it uses memory as main storage support and disk only for persistence, the data model is pretty unique, it is single threaded and so forth. I think that another big difference is that in order to take advantage of Redis in your production environment you don't need to &lt;i&gt;switch&lt;/i&gt; to Redis. You can just use it in order to do new things that were not possible before, or in order to fix old problems.
&lt;br/&gt;&lt;br/&gt;

Switching to Redis is of course an option, and many users are using Redis as their primary database since they need its features, write speed, latency, or something else it offers, but as you can guess switching is a big step if you already have an application running in production. Also, for some kinds of applications Redis may not be the right database: for instance a Redis data set can't be bigger than available memory, so if you have some &lt;i&gt;big data&lt;/i&gt; application and a mostly-reads access pattern, Redis is not the right pick.
&lt;br/&gt;&lt;br/&gt;

However one thing I like about Redis is that it can solve a lot of problems just by &lt;b&gt;adding it to your stack&lt;/b&gt; to do things that were too slow or impossible with your existing database. This way you gain confidence with Redis incrementally, starting to use it just to optimize or to create new features in your application. This blog post explores a few use cases showing how people added Redis to existing environments to take advantage of the Redis feature set. I'll not report specific use cases with site names and exact configurations, I'll just try to show you classes of problems that Redis can solve without being your primary database.
&lt;br/&gt;&lt;br/&gt;

&lt;h3&gt;Slow latest items listings in your home page&lt;/h3&gt;
Can I have a penny for every instance of the following query that is running too slow please?
&lt;pre class="code"&gt;
SELECT * FROM foo WHERE ... ORDER BY time DESC LIMIT 10
&lt;/pre&gt;
Listings like &amp;quot;latest items added by our users&amp;quot; or &amp;quot;latest something else&amp;quot; are very common in web applications, and often a scalability problem. It is pretty counterintuitive that you need to sort stuff if you just want to list items in the same order they were created.
&lt;br/&gt;&lt;br/&gt;

Similar problems can be fixed using a Redis pattern that I'll show you with an example. We have a web application where we want to show the latest 20 comments posted by our users. Next to the latest comments box we also have a &amp;quot;show all&amp;quot; link that leads to a page showing more than the latest 20 comments, with pagination, so that I can see the whole comments &amp;quot;time line&amp;quot;.
&lt;br/&gt;&lt;br/&gt;

We also assume that every comment is stored in our database, and has a unique incremental ID field.
&lt;br/&gt;&lt;br/&gt;

We can make both the home page box and the comments time line page with pagination fast using a simple Redis pattern:
&lt;ul&gt;&lt;li&gt;Every time a new comment is added, we add its ID into a Redis list: &lt;b&gt;LPUSH latest.comments &amp;lt;ID&amp;gt;&lt;/b&gt;.&lt;/li&gt;

&lt;li&gt;We also trim the list to a given length, so that Redis will just hold the latest 5000 items: &lt;b&gt;LTRIM latest.comments 0 4999&lt;/b&gt;.&lt;/li&gt;

&lt;li&gt;Every time we need to get a range of items for our latest comments usages, we call a function that will do the following (in pseudo code):&lt;/li&gt;

&lt;pre class="code"&gt;
FUNCTION get_latest_comments(start,num_items):
    id_list = redis.lrange(&amp;quot;latest.comments&amp;quot;,start,start+num_items-1)
    IF id_list.length &amp;lt; num_items
        id_list = SQL_DB(&amp;quot;SELECT ... ORDER BY time DESC LIMIT ...&amp;quot;)
    END
    RETURN id_list
END
&lt;/pre&gt;
&lt;/ul&gt;
What we are doing here is simple. In Redis we are keeping a &lt;i&gt;live cache&lt;/i&gt;, always updated, of the latest IDs. But we are limited to 5000 IDs, and right after the system is started for the first time the list can even be empty, as it did not exist yet. So our new function to get the IDs of the latest comments will always try to ask Redis first. If our start/count parameters are out of range, we fall back to the database.
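&lt;br/&gt;&lt;br/&gt;

To make the fallback logic concrete, here is a small runnable Python sketch of the pattern. It does not talk to Redis or to an SQL database: two plain Python lists stand in for them, every name in it is hypothetical, and the cache size is shrunk to 5 so that the fallback is easy to trigger.

```python
CACHE_SIZE = 5            # the pattern above uses thousands; tiny here for the demo

database_ids = []         # hypothetical stand-in for the SQL table, oldest first
latest_comments = []      # hypothetical stand-in for the Redis list

def add_comment(comment_id):
    database_ids.append(comment_id)
    latest_comments.insert(0, comment_id)   # like LPUSH: newest ID first
    del latest_comments[CACHE_SIZE:]        # like LTRIM: keep CACHE_SIZE IDs

def query_database(start, num_items):
    # Like a SELECT ordered by time, newest first, with an offset and a limit.
    newest_first = database_ids[::-1]
    return newest_first[start:start + num_items]

def get_latest_comments(start, num_items):
    id_list = latest_comments[start:start + num_items]   # like LRANGE
    if len(id_list) != num_items:
        # Out of the cached range (or empty cache): fall back to the database.
        id_list = query_database(start, num_items)
    return id_list

for comment_id in range(1, 11):
    add_comment(comment_id)

print(get_latest_comments(0, 3))   # served by the cache
print(get_latest_comments(4, 3))   # beyond the cache, served by the database
```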
&lt;br/&gt;&lt;br/&gt;

We never need to &amp;quot;refresh&amp;quot; the cache with this system, and the SQL database (or other type of on-disk data store) will only be pinged if the user is paginating &amp;quot;far&amp;quot; intervals. So never for the home page, and never for the first pages of our comments time line.
&lt;br/&gt;&lt;br/&gt;

As you can see, here Redis is working as a new element. It is not working as a traditional cache: there are no cache refreshes and the info in the Redis instance is always coherent. It is not working as a database either, as you can flush the key and everything will continue working. I just call it a &amp;quot;live cache&amp;quot;, but I bet there are better names.
&lt;br/&gt;&lt;br/&gt;

&lt;b&gt;Deletion and filtering&lt;/b&gt;
&lt;br/&gt;&lt;br/&gt;

Note that it is possible to handle comment deletion using LREM. If deletions are pretty rare, another option is to just skip the entry when rendering the specific comment, since our DB query to fetch the comment by ID will tell us that the comment is no longer there.
&lt;br/&gt;&lt;br/&gt;

Also, many times you want to have different listings with different filters. When these filters are limited in number (for example categories) you can simply use a different Redis list for every filter you have. After all you are just keeping 5000 items per list, and Redis can hold millions of items with little memory. As usual it is a compromise, use your creativity!
&lt;br/&gt;&lt;br/&gt;

&lt;h3&gt;Leaderboards and related problems&lt;/h3&gt;
Another very common need that is hard to model with good performance in DBs that are not in-memory is to keep a list of items, sorted by a score, updated in real time, with many updates arriving every second.
&lt;br/&gt;&lt;br/&gt;

The classical example is the leaderboard in an online game, for instance a Facebook game, but this pattern can be applied to a number of different scenarios. In the online game example you receive a very high number of score updates from different users. With these scores you usually want to:
&lt;ul&gt;&lt;li&gt;Show a leaderboard with the top #100 scores.&lt;/li&gt;

&lt;li&gt;Show the user their current global rank.&lt;/li&gt;

&lt;/ul&gt;
These operations are trivial using a Redis sorted set, even if you have millions of users and millions of new scores per minute.
&lt;br/&gt;&lt;br/&gt;

This is how to implement this pattern: every time a new score is received for a user, we do:
&lt;pre class="code"&gt;
ZADD leaderboard &amp;lt;score&amp;gt; &amp;lt;username&amp;gt;
&lt;/pre&gt;
&lt;i&gt;Note: you may want to use the user ID instead of the username, it is up to your design&lt;/i&gt;
&lt;br/&gt;&lt;br/&gt;

To get the top 100 users by score is as easy as &lt;b&gt;ZREVRANGE leaderboard 0 99&lt;/b&gt;.
&lt;br/&gt;&lt;br/&gt;

Similarly, to tell the user their global rank you just do &lt;b&gt;ZREVRANK leaderboard &amp;lt;username&amp;gt;&lt;/b&gt;, which counts positions starting from the highest score (ZRANK would count from the lowest).
&lt;br/&gt;&lt;br/&gt;

Note that you can do more than this, for instance it is trivial to show the user the scores of users &amp;quot;near&amp;quot; his position, that is, to show the portion of the leaderboard that includes the score of our user.
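&lt;br/&gt;&lt;br/&gt;

The following Python sketch mimics the three leaderboard queries (top scores, rank, and the neighbours of a user) with a plain dictionary. It is only meant to show the logic: all the names and scores are made up, and unlike a real sorted set, which keeps its members ordered all the time, this toy version sorts everything again at every query.

```python
leaderboard = {}   # hypothetical stand-in for the sorted set: member to score

def zadd(member, score):
    leaderboard[member] = score            # a new score replaces the old one

def by_score_desc():
    return sorted(leaderboard, key=leaderboard.get, reverse=True)

def top(count):
    return by_score_desc()[:count]         # highest scores first

def rank(member):
    return by_score_desc().index(member)   # 0-based, highest score first

def neighbours(member, radius):
    # The slice of the leaderboard around the given member.
    position = rank(member)
    first = max(0, position - radius)
    return by_score_desc()[first:position + radius + 1]

for name, score in [("ann", 120), ("bob", 95), ("cid", 300), ("dee", 210), ("eve", 150)]:
    zadd(name, score)

print(top(3))
print(rank("ann"))
print(neighbours("ann", 1))
```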
&lt;h3&gt;Order by user votes and time&lt;/h3&gt;
A notable variation of the above leaderboard pattern is the implementation of a site like Reddit or Hacker News, where news items are ordered according to a formula similar to:
&lt;pre class="code"&gt;
score = points / time^alpha
&lt;/pre&gt;
So user votes will raise the news proportionally, while time will push the news down as a power of its age.
Well, the actual algorithm is up to you, this will not change our pattern.
&lt;br/&gt;&lt;br/&gt;

This pattern starts from the observation that probably only the latest, for instance, 1000 news are good candidates to stay on the home page, so we can ignore all the others.
The implementation is simple:
&lt;ul&gt;&lt;li&gt;Every time a news item is posted we add its ID into a list, with LPUSH + LTRIM in order to keep only the latest 1000 items.&lt;/li&gt;

&lt;li&gt;There is a worker that gets this list and continually computes the final score of every news in this set of 1000 news. The result is used to populate a sorted set with ZADD. Old news are removed from the sorted set in the mean time as a cleanup operation.&lt;/li&gt;

&lt;/ul&gt;
At this point we have a sorted set composed of 1000 news sorted by our score. This sorted set can be queried 100k times per second for the top news, so it will be easy to scale the site this way.
&lt;br/&gt;&lt;br/&gt;

The key idea here is that the sorting work, done by the background worker, is not proportional to the number of users watching the news site.
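&lt;br/&gt;&lt;br/&gt;

A minimal Python sketch of what the worker computes, using the formula given earlier in this section. The function names, the value of alpha, and the candidate news are all hypothetical, and ages are assumed to be greater than zero.

```python
def news_score(points, age_hours, alpha):
    # score = points / time^alpha, as in the formula above
    return points / age_hours ** alpha

def rank_news(news, alpha=1.5):
    """news maps an ID to (points, age in hours); returns the IDs, best first."""
    scored = {news_id: news_score(points, age, alpha)
              for news_id, (points, age) in news.items()}
    return sorted(scored, key=scored.get, reverse=True)

# Hypothetical candidates: ID mapped to (points, age in hours).
candidates = {"n1": (100, 10.0), "n2": (30, 2.0), "n3": (5, 0.5)}
print(rank_news(candidates))
```

With these made-up numbers the oldest item ends up last even though it has the most points, because its age weighs more than its votes.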
&lt;br/&gt;&lt;br/&gt;

For the &amp;quot;just posted&amp;quot; section the list of IDs can be used raw, or using the first pattern proposed in this blog post.
&lt;br/&gt;&lt;br/&gt;

&lt;h3&gt;Implement expires on items&lt;/h3&gt;
Another way to use sorted sets is to index stuff by time. We just use the unix time as score.
This can be used in general to index things by time, but a notable usage is to expire things in our main database when a given amount of time has elapsed.
&lt;br/&gt;&lt;br/&gt;

This is the pattern:
&lt;ul&gt;&lt;li&gt;Every time a new item is added to our (non Redis) database we add it into the sorted set. As score we use the time at which this item should expire, in other words the current_time+time_to_live.&lt;/li&gt;

&lt;li&gt;There is a background worker querying the sorted set, using for instance ZRANGE ... WITHSCORES to take the 10 items with the lowest scores, that is, the ones expiring soonest. If there are scores representing unix times already in the past, we delete these items from the database (and from the sorted set).&lt;/li&gt;

&lt;/ul&gt;
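The two steps above can be simulated in Python as follows. This is a sketch under stated assumptions: a heap of (expire time, item ID) pairs stands in for the sorted set, the function names are invented, and the times are arbitrary numbers rather than real clock readings.

```python
import heapq

expiry_index = []   # hypothetical stand-in for the sorted set: (expire_at, item_id)

def add_item(item_id, now, time_to_live):
    # The score is the time at which the item should expire.
    heapq.heappush(expiry_index, (now + time_to_live, item_id))

def pop_expired(now):
    """Remove and return the IDs whose expire time has been reached."""
    expired = []
    while expiry_index and now >= expiry_index[0][0]:
        _, item_id = heapq.heappop(expiry_index)
        expired.append(item_id)   # here the worker would delete it from the database
    return expired

add_item("a", now=1000, time_to_live=60)
add_item("b", now=1000, time_to_live=10)
add_item("c", now=1000, time_to_live=300)
print(pop_expired(now=1030))   # only "b" has expired
print(pop_expired(now=1100))   # now "a" has expired too
```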
&lt;br/&gt;&lt;br/&gt;

&lt;h3&gt;Counting stuff&lt;/h3&gt;
Redis is a good counter, thanks to INCRBY and other similar commands.
&lt;br/&gt;&lt;br/&gt;

How many times have you wanted to add new counters to your database, to gather statistics or to show new information to your users, but avoided it since it is a write-intensive task for your database? This happened to me many times in the past.
&lt;br/&gt;&lt;br/&gt;

Well, just use Redis and stop worrying! With atomic increments you can keep all your counts, reset them atomically with GETSET if needed, and put expires on your counters, so that you count events only as long as the time difference between consecutive events is less than a given number of seconds.
&lt;br/&gt;&lt;br/&gt;

For instance using just:
&lt;pre class="code"&gt;
INCR user:&amp;lt;id&amp;gt;
EXPIRE user:&amp;lt;id&amp;gt; 60
&lt;/pre&gt;
You can keep the count of how many page views the user did recently, without a pause greater than 60 seconds between page views. When this count reaches, for instance, 20, it is time to show some banner, or reminder, or tip, or whatever you want.
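&lt;br/&gt;&lt;br/&gt;

The behaviour of this counter can be simulated without a server. In the Python sketch below a small class (hypothetical, like every name in it) plays the role of the key: each hit increments the count and pushes the expire time forward, and a pause longer than the time to live makes the count start over.

```python
class ExpiringCounters:
    """Hypothetical in-memory stand-in for INCR followed by EXPIRE on a key."""

    def __init__(self):
        self.entries = {}   # key mapped to (count, expire_at)

    def hit(self, key, now, ttl=60):
        count, expire_at = self.entries.get(key, (0, 0))
        if now >= expire_at:
            count = 0                                # the key expired: start over
        self.entries[key] = (count + 1, now + ttl)   # INCR, then EXPIRE
        return count + 1

counters = ExpiringCounters()
print(counters.hit("user:42", now=0))
print(counters.hit("user:42", now=30))    # within 60 seconds: count goes on
print(counters.hit("user:42", now=200))   # long pause: the count restarts
```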
&lt;br/&gt;&lt;br/&gt;

&lt;h3&gt;Unique N items in a given amount of time&lt;/h3&gt;
Another interesting example of a statistic that is trivial to compute using Redis but is very hard with other kinds of databases is how many unique users visited a given resource in a given amount of time.
For instance I want to know the number of unique registered users, or IP addresses, that accessed a given article in an online newspaper.
&lt;br/&gt;&lt;br/&gt;

Every time I get a new pageview I just do the following:
&lt;pre class="code"&gt;
SADD page:day1:&amp;lt;page_id&amp;gt; &amp;lt;user_id&amp;gt;
&lt;/pre&gt;
Of course instead of day1 you may want to use the first second of today, as unix time, like: time()-(time()%(3600*24)), or something like that.
&lt;br/&gt;&lt;br/&gt;

Want to know the number of unique users? Just do &lt;b&gt;SCARD page:day1:&amp;lt;page_id&amp;gt;&lt;/b&gt;.
&lt;br/&gt;&lt;br/&gt;

Need to test if a specific user already accessed that page? Just do &lt;b&gt;SISMEMBER page:day1:&amp;lt;page_id&amp;gt; &amp;lt;user_id&amp;gt;&lt;/b&gt;.
&lt;br/&gt;&lt;br/&gt;
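
As a rough sketch of the whole pattern, here is a Python model in which a plain set stands in for the Redis Set. The function names, the article-7 page id and the user names are hypothetical; the day bucket is the first second of the day in UTC.
&lt;pre class="code"&gt;
SECONDS_PER_DAY = 3600 * 24

def day_bucket(now):
    '''First second (UTC) of the day containing the unix time now.'''
    return now - (now % SECONDS_PER_DAY)

pages = {}  # key: set of user ids, a stand-in for a Redis Set

def record_view(page_id, user_id, now):
    key = 'page:%d:%s' % (day_bucket(now), page_id)
    pages.setdefault(key, set()).add(user_id)    # SADD key user_id
    return key

now = 1309270916  # example unix time (28 June 2011, 14:21:56 UTC)
for user in ('alice', 'bob', 'alice'):
    key = record_view('article-7', user, now)

unique_users = len(pages[key])        # SCARD key
seen_bob = 'bob' in pages[key]        # SISMEMBER key bob
seen_carol = 'carol' in pages[key]    # SISMEMBER key carol
print(key, unique_users, seen_bob, seen_carol)
&lt;/pre&gt;
Adding the same user twice does not change the cardinality, which is exactly why a Set is the right structure for counting unique visitors.
&lt;br/&gt;&lt;br/&gt;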

&lt;h3&gt;Real time analysis of what is happening, for stats, anti spam, or whatever&lt;/h3&gt;
These were just a few examples, but if you study the Redis command set and combine the data structures in interesting ways you can model a huge number of real time stats with little effort, in order to power your anti spam systems, or to improve the quality of service you provide to users thanks to the new information.
&lt;br/&gt;&lt;br/&gt;

&lt;h3&gt;Pub/Sub&lt;/h3&gt;
Did you know that Redis includes a fairly high performance implementation of Pub/Sub?
&lt;br/&gt;&lt;br/&gt;

Redis Pub/Sub is very simple to use, stable, and fast, with support for pattern matching, the ability to subscribe to and unsubscribe from channels on the fly, and so forth. You can read more about it in the &lt;a href="http://redis.io/topics/pubsub"&gt;Redis PubSub official documentation&lt;/a&gt;.
&lt;br/&gt;&lt;br/&gt;

&lt;h3&gt;Queues&lt;/h3&gt;
You probably already noticed how Redis commands like list push and list pop make it suitable to implement queues, but you can do more than that: Redis has &lt;a href="http://redis.io/commands/blpop"&gt;blocking variants of list pop commands&lt;/a&gt; that will block if a list is empty.
&lt;br/&gt;&lt;br/&gt;

A common usage of Redis as a queue is the &lt;a href="https://github.com/blog/542-introducing-resque"&gt;Resque&lt;/a&gt; library, implemented and popularized by Github's folks.
&lt;br/&gt;&lt;br/&gt;

With the &lt;a href="http://redis.io/commands/rpoplpush"&gt;RPOPLPUSH&lt;/a&gt; list rotation command it is possible to implement queues with interesting semantics that will make your background workers happier! (For instance you can implement a rotating list to fetch RSS feeds again and again, so that every worker picks the feed that was fetched least recently, and thus needs to be updated ASAP.) Similarly, using sorted sets it is possible to implement priority queues easily.
&lt;br/&gt;&lt;br/&gt;
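
To make the rotating list idea concrete, here is a small Python sketch where a deque stands in for the Redis list. It assumes the rotation is done by using the same list as both source and destination of RPOPLPUSH; the feed names and the next_feed helper are made up for the example.
&lt;pre class="code"&gt;
from collections import deque

# Stand-in for a Redis list holding feed identifiers (hypothetical names).
feeds = deque(['feed-a', 'feed-b', 'feed-c'])

def next_feed(queue):
    '''Mimic RPOPLPUSH with the same list as source and destination:
    pop the tail element, push it back on the head, and return it.'''
    item = queue.pop()        # RPOP
    queue.appendleft(item)    # LPUSH
    return item

picked = [next_feed(feeds) for _ in range(4)]
print(picked)
&lt;/pre&gt;
Every call hands out the element that has been waiting longest at the tail and moves it to the head, so the list cycles forever without ever becoming empty.
&lt;br/&gt;&lt;br/&gt;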

&lt;h3&gt;Caching&lt;/h3&gt;
This section alone would deserve a specific blog post... so in short I'll just say that Redis can be used as a replacement for memcached in order to turn your cache into something that stores data in a way that is simpler to update, so that there is no need to regenerate the data every time. See for reference the first pattern published in this article.
&lt;br/&gt;&lt;br/&gt;

&lt;h3&gt;Redis can fix your problems now!&lt;/h3&gt;
You can use Redis &lt;b&gt;right now&lt;/b&gt; to do things that will make your users happier, your systems less complex, your site more responsive. You don't need to replace your current setup in order to use it, just start using Redis to do new things that were otherwise not possible, or hard, or too costly.
&lt;br/&gt;&lt;br/&gt;

Have fun!
&lt;br/&gt;&lt;br/&gt;

&lt;b&gt;You can discuss this entry here or into&lt;/b&gt; &lt;a href="http://news.ycombinator.com/item?id=2705475"&gt;Hacker News&lt;/a&gt;.&lt;/div&gt;&lt;div class="blogpostinfo"&gt;&lt;div class="blogpoststats"&gt;86063 views&lt;sup&gt;&lt;a href="/page/uniquevisitors"&gt;*&lt;/a&gt;&lt;/sup&gt;&lt;/div&gt;Posted at 14:21:56 | &lt;a href="http://antirez.com/post/take-advantage-of-redis-adding-it-to-your-stack.html"&gt;permalink&lt;/a&gt; | &lt;a href="http://antirez.com/post/take-advantage-of-redis-adding-it-to-your-stack.html#disqus_thread"&gt;discuss&lt;/a&gt; | &lt;a href="/print.php?postid=235"&gt;print&lt;/a&gt;&lt;/div&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/antirez/~4/Oxrh1hj4h0Y" height="1" width="1"/&gt;</description>
   <dc:date>2011-06-28T14:21:56+00:00</dc:date>
  <feedburner:origLink>http://antirez.com/post/take-advantage-of-redis-adding-it-to-your-stack.html</feedburner:origLink></item>
  <item>
   <title>An update on Redis and Lua</title>
   <link>http://feedproxy.google.com/~r/antirez/~3/TrOVfI0_Szc/an-update-on-redis-and-lua.html</link>
   <guid isPermaLink="false">http://antirez.com/post/234</guid>
   <description>&lt;div class="blogpost"&gt;&lt;div style="clear:both"&gt;&lt;/div&gt;&lt;div class="blogposttext"&gt;I released the scripting branch &lt;a href="http://antirez.com/post/scripting-branch-released.html"&gt;just two weeks ago&lt;/a&gt;, but it is already a super hot topic: people are hacking with it, using it in production, dropping forked versions of Redis with C implemented specific commands in favor of simple Lua scripts. I was very in doubt about the opportunity of adding scripting to Redis, it was clear that the benefit was huge, but was the downside huge as well?
&lt;br/&gt;&lt;br/&gt;

Apparently &lt;i&gt;the way&lt;/i&gt; scripting is added is the key factor. I think we got the API right, and this was already a good start, but there is more than that. A big question is: should you be able to interact with the external world from Redis scripts, like writing files, requiring libs, and so forth? I think the answer is no. Scripting should just allow users to do from the client side, using EVAL, what was previously possible only by creating specialized commands in C. Also in the previous blog post I explained how the lack of side effects is crucial for the correctness of AOF and Replication in a scripted environment.
&lt;br/&gt;&lt;br/&gt;

So the scripting feature will be limited to what you can do with vanilla Lua, with the redis() command that lets you interact with Redis of course, and with a few &lt;i&gt;standard&lt;/i&gt; additions that will be registered by Redis in the Lua interpreter (so you know every Redis instance with scripting support contains these features by default), basically:
&lt;br/&gt;&lt;br/&gt;

&lt;ul&gt;&lt;li&gt;Bitops lua library, that is, bitwise operations. Not a part of vanilla Lua.&lt;/li&gt;

&lt;li&gt;A Json library.&lt;/li&gt;

&lt;li&gt;SHA1&lt;/li&gt;

&lt;/ul&gt;
Why these additions? Because bit operations are more or less part of every language. In Lua they live in a library, as Lua is minimal by design, but they are very useful. Think for instance about implementing a bloom filter in Redis.
&lt;br/&gt;&lt;br/&gt;

As for Json, a lot of people store Json objects inside Redis. It is handy to be able to manipulate these objects from Lua. This means that with a one line script you can retrieve or modify just a field of your Json object.
&lt;br/&gt;&lt;br/&gt;

Another commonly useful thing is to take digests. Want to make sure a list is at a given version? Take the SHA1 of the elements from Lua and return just that. In general a good hash function is a Swiss Army Knife tool, and it is already part of Redis, so we'll provide a sha1() function that given a string returns the hexadecimal SHA1.
&lt;br/&gt;&lt;br/&gt;

There is always time to add more, but this should be enough for many needs.
&lt;h2&gt;News on Lua &amp;lt;-&amp;gt; Redis type conversion&lt;/h2&gt;
I just pushed a new commit in the scripting branch on github that changes how types are converted from/to Redis protocol types.
&lt;br/&gt;&lt;br/&gt;

A key feature is the ability to return from Lua scripts everything that is valid in the Redis protocol, so that your Lua scripts can return everything a C coded command is able to return. We were &lt;i&gt;almost&lt;/i&gt; there with the old scripting branch, but what was missing was the ability to return tables of tables. Now this is implemented as well.
&lt;br/&gt;&lt;br/&gt;

However there is another issue. What if you want to return a multi bulk reply from Lua, with an element that is a null bulk reply? Since we converted the Lua nil type into the Redis null bulk reply, something like this would be needed:
&lt;pre class="code"&gt;
EVAL &amp;quot;return {1,2,3,nil,5,6}&amp;quot; 0
&lt;/pre&gt;
But guess what? This does not work. In Lua there is a single aggregate data type that is also used to represent arrays, and when you ask for the element at position 4 there is no way to know whether it is nil because the element is not present (the array only has three elements), or because the element is actually set to the nil value.
&lt;br/&gt;&lt;br/&gt;

My solution was to convert null (multi) bulk replies into false, and the other way around: false returned from Lua is turned into a null bulk reply.
&lt;br/&gt;&lt;br/&gt;

This was the only sane option, as it allows natural things like:
&lt;pre class="code"&gt;
if not redis('get','foo') ...
&lt;/pre&gt;
&lt;br/&gt;&lt;br/&gt;

So this is the new full conversion table used by the latest version of the scripting branch:
&lt;br/&gt;&lt;br/&gt;

&lt;b&gt;Redis protocol to Lua type&lt;/b&gt;
&lt;ul&gt;&lt;li&gt;integer reply -&amp;gt; Lua number&lt;/li&gt;

&lt;li&gt;bulk reply -&amp;gt; Lua string&lt;/li&gt;

&lt;li&gt;Multi bulk reply -&amp;gt; Lua table (may nest other types, as in Redis protocol semantics)&lt;/li&gt;

&lt;li&gt;Status reply -&amp;gt; Lua table with a single 'ok' field containing a string&lt;/li&gt;

&lt;li&gt;Error reply -&amp;gt; Lua table with a single 'err' field containing a string&lt;/li&gt;

&lt;li&gt;Nil bulk reply and Nil multi bulk reply ($-1 and *-1) -&amp;gt; Lua false boolean type&lt;/li&gt;

&lt;/ul&gt;
&lt;br/&gt;&lt;br/&gt;

&lt;b&gt;Lua type to Redis protocol&lt;/b&gt;
&lt;ul&gt;&lt;li&gt;Lua number -&amp;gt; integer reply (removing everything after the point if it is not an int)&lt;/li&gt;

&lt;li&gt;Lua string -&amp;gt; Bulk reply&lt;/li&gt;

&lt;li&gt;Lua table (Array) -&amp;gt; Multi Bulk Reply (may nest other types, including other tables)&lt;/li&gt;

&lt;li&gt;Lua table with a single 'ok' field -&amp;gt; Status Reply&lt;/li&gt;

&lt;li&gt;Lua table with a single 'err' field -&amp;gt; Error reply&lt;/li&gt;

&lt;li&gt;Lua boolean true -&amp;gt; Integer reply with value 1&lt;/li&gt;

&lt;li&gt;Lua boolean false -&amp;gt; Nil bulk reply&lt;/li&gt;

&lt;/ul&gt;
&lt;br/&gt;&lt;br/&gt;
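
The second table can be read as a small encoder. The following Python sketch is not the code in scripting.c: it uses Python values as stand-ins for Lua values (a list for an array, a dict for a table with an 'ok' or 'err' field) and emits the corresponding Redis protocol reply as text. The to_reply name is an assumption made for the example.
&lt;pre class="code"&gt;
def to_reply(value):
    '''Encode a Python stand-in for a Lua value as a Redis protocol reply.'''
    # Booleans are checked first because in Python True and False are ints.
    if value is True:
        return ':1\r\n'                           # true: integer reply 1
    if value is False:
        return '$-1\r\n'                          # false: nil bulk reply
    if isinstance(value, (int, float)):
        return ':%d\r\n' % int(value)             # number: truncated integer
    if isinstance(value, str):
        size = len(value.encode('utf-8'))         # bulk length is in bytes
        return '$%d\r\n%s\r\n' % (size, value)    # string: bulk reply
    if isinstance(value, dict) and list(value) == ['ok']:
        return '+%s\r\n' % value['ok']            # status reply
    if isinstance(value, dict) and list(value) == ['err']:
        return '-%s\r\n' % value['err']           # error reply
    if isinstance(value, list):
        # Array: multi bulk reply, nested values are encoded recursively.
        return '*%d\r\n' % len(value) + ''.join(to_reply(v) for v in value)
    raise TypeError('no conversion for %r' % (value,))

print(repr(to_reply([1, 2, 3, False, 5, 6])))
&lt;/pre&gt;
Using false instead of nil for the fourth element keeps the array six elements long, so the null bulk reply ends up in the middle of the multi bulk reply as intended.
&lt;br/&gt;&lt;br/&gt;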

It's not a one to one map as Redis and Lua type semantics are a bit different, but I think it is good enough and the best we can get.
&lt;br/&gt;&lt;br/&gt;

Needless to say, after the response from the community and after playing with scripting more in these days, I'm pretty sure it will make it into a stable release as soon as possible.
&lt;br/&gt;&lt;br/&gt;

Enjoy scripting!&lt;/div&gt;&lt;div class="blogpostinfo"&gt;&lt;div class="blogpoststats"&gt;36968 views&lt;sup&gt;&lt;a href="/page/uniquevisitors"&gt;*&lt;/a&gt;&lt;/sup&gt;&lt;/div&gt;Posted at 15:50:51 | &lt;a href="http://antirez.com/post/an-update-on-redis-and-lua.html"&gt;permalink&lt;/a&gt; | &lt;a href="http://antirez.com/post/an-update-on-redis-and-lua.html#disqus_thread"&gt;discuss&lt;/a&gt; | &lt;a href="/print.php?postid=234"&gt;print&lt;/a&gt;&lt;/div&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/antirez/~4/TrOVfI0_Szc" height="1" width="1"/&gt;</description>
   <dc:date>2011-05-13T15:50:51+00:00</dc:date>
  <feedburner:origLink>http://antirez.com/post/an-update-on-redis-and-lua.html</feedburner:origLink></item>
  <item>
   <title>Scripting branch released</title>
   <link>http://feedproxy.google.com/~r/antirez/~3/qAXw8zkfgQ4/scripting-branch-released.html</link>
   <guid isPermaLink="false">http://antirez.com/post/233</guid>
   <description>&lt;div class="blogpost"&gt;&lt;div style="clear:both"&gt;&lt;/div&gt;&lt;div class="blogposttext"&gt;&lt;div class="emph"&gt;
&lt;b&gt;Warning&lt;/b&gt;: the scripting API was modified in recent versions of the 'scripting' and '2.2-scripting' branches. Now Lua can call Redis commands using redis.call('get',...) instead of redis('get',...).
Also the EVALSHA command is now available in both branches.
&lt;/div&gt;
&lt;br/&gt;&lt;br/&gt;

As expected I did not resist the temptation of implementing a branch with Lua scripting support for Redis, and this weekend was the right moment to attack the problem, regardless of the fact that in Italy the 1st of May is the &amp;quot;International Workers' Day&amp;quot; and nobody is supposed to work. But I actually had a lot of fun coding, and managed to attend a concert, go bowling, drink some good vodka, and watch the &amp;quot;Source Code&amp;quot; movie, so I'll call this a busy weekend :)
&lt;br/&gt;&lt;br/&gt;

So after some Lua C API crash course, roughly 400 lines of code, and 8 hours of work in two days, this is the result &lt;a href="https://github.com/antirez/redis/tree/scripting"&gt;in the scripting branch at github&lt;/a&gt;, but most of the code is self contained in the &lt;a href="https://github.com/antirez/redis/blob/scripting/src/scripting.c"&gt;scripting.c&lt;/a&gt; file if you want to understand how the implementation works (hint: it is really simple).
&lt;br/&gt;&lt;br/&gt;

&lt;b&gt;IMPORTANT:&lt;/b&gt; the fact that we now have a scripting branch does not imply that we'll ever see this in a stable branch. I wrote an actual implementation so that we can collectively test how good scripting is for Redis. However I'll keep the branch rebased against the unstable branch (the implementation is almost completely self-contained in a single file, so that will be trivial) AND I must admit, now that I have played with scripting and found a suitable API, I'm impressed by the potential.
&lt;br/&gt;&lt;br/&gt;

&lt;h2&gt;First steps&lt;/h2&gt;
Lua scripting is exposed to Redis as a single command called &lt;b&gt;EVAL&lt;/b&gt;.
&lt;pre class="code"&gt;
EVAL &amp;lt;body&amp;gt; &amp;lt;num_keys_in_args&amp;gt; [&amp;lt;arg1&amp;gt; &amp;lt;arg2&amp;gt; ... &amp;lt;arg_N&amp;gt;]
&lt;/pre&gt;
The &lt;i&gt;body&lt;/i&gt; argument is a Lua script. It is followed by a variable number of arguments, prefixed by the number of keys that are in those arguments. So for instance if I call &lt;b&gt;eval somescript 3 a b c d e f&lt;/b&gt; the arguments a, b, c are considered keys, all the rest of the arguments are considered as normal arguments. Key arguments are received by the Lua script as elements inside the &lt;b&gt;KEYS&lt;/b&gt; table, all the other arguments are stored into the &lt;b&gt;ARGV&lt;/b&gt; table instead.
&lt;br/&gt;&lt;br/&gt;

You may wonder why there is this distinction between keys and arguments. The reason is that if you respect this semantic, the EVAL command will be, for Redis, no different from any other command.
Redis only knows a few things about registered commands: the number of arguments, and which arguments are keys. For example knowing about keys makes Redis Cluster able to forward your requests to the right node. The EVAL command is designed to be no different. However this is not enforced. If you want to do strange stuff, you can access the whole key space from Lua scripts: we are free to shoot ourselves in the foot :)
&lt;pre class="code"&gt;
redis&amp;gt; eval &amp;quot;return {KEYS[1],KEYS[2],ARGV[1],ARGV[2]}&amp;quot; 2 key1 key2 arg1 arg2
1) &amp;quot;key1&amp;quot;
2) &amp;quot;key2&amp;quot;
3) &amp;quot;arg1&amp;quot;
4) &amp;quot;arg2&amp;quot;
&lt;/pre&gt;
The above example shows two important points: the first is that our arguments are correctly passed via the KEYS and ARGV tables. The second is that if you return a Lua Array (that is actually a Lua table indexed by incrementing integers starting from 1) from a Lua script, it will be returned to the user as a multi bulk reply.
&lt;br/&gt;&lt;br/&gt;
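
The way the arguments are divided can be sketched in a few lines of Python. This is only an illustration of the rule described above, with a hypothetical split_eval_args helper; remember that the real KEYS and ARGV are Lua tables indexed from 1, while Python lists start from 0.
&lt;pre class="code"&gt;
def split_eval_args(num_keys, args):
    '''Split EVAL arguments as described in the post:
    the first num_keys go to KEYS, all the rest go to ARGV.'''
    if 0 > num_keys or num_keys > len(args):
        raise ValueError('num_keys does not match the arguments given')
    keys = list(args[:num_keys])
    argv = list(args[num_keys:])
    return keys, argv

# Same call as: eval somescript 3 a b c d e f
keys, argv = split_eval_args(3, ['a', 'b', 'c', 'd', 'e', 'f'])
print(keys, argv)
&lt;/pre&gt;
Since the split depends only on num_keys, anything that needs to know which arguments are keys can find out without understanding the script body.
&lt;br/&gt;&lt;br/&gt;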

Note that we are sending the same script every time instead of storing a procedure. I discussed this in the previous blog post, but the point is that ending up with different instances holding different versions of the scripts is really bad. With our current EVAL semantics every instance is the same, regardless of whether it was just started or not, and regardless of its redis.conf file. However for the scripting engine to be fast we can't keep compiling the same script again and again, so internally Redis will compile the script only the first time it is seen. If the same body is seen again, the already compiled version is used. This makes EVAL a very fast command as we'll see later.
&lt;br/&gt;&lt;br/&gt;

However if bandwidth turns out to be an issue, I have an exit strategy that I'll discuss later in this post.
&lt;h2&gt;Returning Redis data types from Lua&lt;/h2&gt;
We already saw how returning an array from the Lua script generates a multi bulk reply, but here is the full list of returned types:
&lt;pre class="code"&gt;
redis&amp;gt; eval &amp;quot;return 10&amp;quot; 0
(integer) 10
redis&amp;gt; eval &amp;quot;return 'foobar'&amp;quot; 0
&amp;quot;foobar&amp;quot;
redis&amp;gt; eval &amp;quot;return {1,2,'a','b'}&amp;quot; 0
1) (integer) 1
2) (integer) 2
3) &amp;quot;a&amp;quot;
4) &amp;quot;b&amp;quot;
redis&amp;gt; eval &amp;quot;return {err='Some Error'}&amp;quot; 0
(error) Some Error
redis&amp;gt; eval &amp;quot;return {ok='This is a status reply'}&amp;quot; 0
This is a status reply
&lt;/pre&gt;
As you can see you can return everything a Redis command can return, including errors and status replies. So:
&lt;ul&gt;&lt;li&gt;A Lua number is returned as a Redis integer reply.&lt;/li&gt;

&lt;li&gt;A Lua string is returned as a Redis bulk reply.&lt;/li&gt;

&lt;li&gt;A Lua array is returned as a Redis multi bulk reply.&lt;/li&gt;

&lt;li&gt;A Lua table with an 'err' field is returned as a Redis error reply.&lt;/li&gt;

&lt;li&gt;A Lua table with an 'ok' field is returned as a Redis status code reply.&lt;/li&gt;

&lt;/ul&gt;
&lt;h2&gt;Accessing Redis from Lua&lt;/h2&gt;
Our interface to Redis is a single redis() function exposed to the Lua interpreter:
&lt;pre class="code"&gt;
redis&amp;gt; set x foo
OK
redis&amp;gt; eval &amp;quot;return redis('get',KEYS[1])&amp;quot; 1 x
&amp;quot;foo&amp;quot;
&lt;/pre&gt;
It is as simple as calling Redis with the command as first argument followed by all the other arguments.
The argument type can be either string or number. Numbers are automatically converted into strings, as Redis commands accept only string arguments from clients even when the actual meaning of the argument is a number.
Now the really interesting part is how the redis() function returns values to Lua: using exactly the reverse of the conversion table above. That is, Redis integer replies are passed to Lua as numbers, bulk replies as Lua strings, and so forth. Everything maps to a Lua type and the other way around, so we can return the result of the redis() function as we did in the example above.
Lua coders may complain that idiomatic Lua error handling is different, but the point here is that Redis errors are treated as a kind of value. For instance you can do things like:
&lt;pre class="code"&gt;
ret = redis(... some command ...)
... do more work ...
return ret
&lt;/pre&gt;
Without even caring about what ret was: a multi bulk reply, an error, or whatever.
Btw don't worry, nothing is set in stone: this is an experimental branch and everything is fluid and can be changed later if needed.
&lt;h2&gt;An actual example&lt;/h2&gt;
One of the reasons I'm positive about integrating scripting into Redis in the near future (but don't take this as a promise!) is that it is almost our only salvation from making Redis bloated.
For instance yesterday a user asked in the Redis mailing list how to conditionally decrement a counter, only if the current counter value is greater than zero.
&lt;br/&gt;&lt;br/&gt;

Our current solution is MULTI/EXEC/WATCH, but guess what? It is slow since we are forced to move data from the Redis server to the Redis client, inspect the current state, and finally issue the commands to modify our dataset. Not to mention that, like all optimistic locking approaches, it does not work well when there is too much contention. After all it is &lt;i&gt;so&lt;/i&gt; simple to conditionally decrement server side, right? But everybody has a different problem. How many commands should we add? With scripting all these specific problems are solved in a general way without making the Redis server a mess with a big number of commands, and without trying to implement our own &amp;quot;little language&amp;quot; that will later turn into an ill conceived real language.
&lt;br/&gt;&lt;br/&gt;

So welcome to our first example, decrementing a value only if the value is greater than a given value we pass as argument.
&lt;br/&gt;&lt;br/&gt;

&lt;script src="https://gist.github.com/950965.js?file=gistfile1.rb"&gt;&lt;/script&gt;
&lt;br/&gt;&lt;br/&gt;

The above program produces the following output as you could expect.
&lt;br/&gt;&lt;br/&gt;

&lt;pre class="code"&gt;
ruby cond-decr.rb
3
2
1
0
0
&lt;/pre&gt;
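&lt;br/&gt;&lt;br/&gt;

For readers who can't load the embedded gist, here is a rough Python model of the same logic. It is not the Ruby and Lua code of the gist: a dict stands in for the Redis dataset, and the counter key and its starting value of 4 are assumptions picked so that the run matches the output shown above.
&lt;pre class="code"&gt;
store = {'counter': 4}   # stand-in for the Redis dataset (hypothetical key)

def cond_decr(key, floor):
    '''Decrement store[key] only if it is greater than floor.
    In Redis the whole body would run atomically as a single script.'''
    value = int(store.get(key, 0))
    if value > floor:
        value -= 1
        store[key] = value
    return value

results = [cond_decr('counter', 0) for _ in range(5)]
for value in results:
    print(value)
&lt;/pre&gt;
The read, the test and the write happen in one step on the server side, so no WATCH and no retry loop are needed.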
&lt;h2&gt;Two more examples&lt;/h2&gt;
Another common request in the mailing list is to provide &amp;quot;NX&amp;quot; or &amp;quot;EX&amp;quot; versions of Redis commands, that is, commands that are executed only if the target key does not exist, or only if it exists.
The following example implements INCREX, that is, increment only if the counter already exists.
&lt;br/&gt;&lt;br/&gt;

&lt;script src="https://gist.github.com/951393.js?file=gistfile1.rb"&gt;&lt;/script&gt;
&lt;br/&gt;&lt;br/&gt;

Executing it will output:
&lt;pre class="code"&gt;
nil
false
11
&lt;/pre&gt;
&lt;br/&gt;&lt;br/&gt;

And finally a more complex example: List shuffling.
&lt;br/&gt;&lt;br/&gt;

&lt;script src="https://gist.github.com/951399.js?file=gistfile1.rb"&gt;&lt;/script&gt;
&lt;br/&gt;&lt;br/&gt;

&lt;h2&gt;How fast is it?&lt;/h2&gt;
Small Lua scripts are so fast that it is basically the same as implementing the command in C!
For instance a small script like the conditional DECR example takes 31 microseconds on my MBA 11&amp;quot; (that is slooow), compared to 14 microseconds for the real SET command. These timings only consider the execution of the command, without all the client I/O, protocol parsing, dispatching, and so forth. Basically once you add all the overhead, from the outside you can't tell which is faster, SET or the conditional DECR Lua script.
&lt;h2&gt;Atomicity&lt;/h2&gt;
Lua scripts are executed like C implemented commands, in a completely atomic way.
This means that you can do everything, but also that you should be aware that you are blocking the server if you execute scripts that are too slow.
&lt;h2&gt;Replication and AOF&lt;/h2&gt;
Instead of replicating (or writing in the AOF) the executed commands, whole scripts are replicated.
This means that if you want to use replication or AOF with scripting you should write scripts without side effects, not using time or any other external events in order to do their work.
&lt;br/&gt;&lt;br/&gt;

In other words a script should always produce the same result if the initial dataset is the same.
&lt;br/&gt;&lt;br/&gt;
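
A toy Python model shows why this matters. Scripts are plain functions here, a dict stands in for the dataset, and replay() plays the role of an instance executing the replicated script; all the names are made up for the illustration.
&lt;pre class="code"&gt;
import random

def replay(script, dataset):
    '''Run a script (a Python function here) against a copy of the dataset.'''
    copy = dict(dataset)
    script(copy)
    return copy

def deterministic(data):
    # The result depends only on the initial dataset.
    data['counter'] = data.get('counter', 0) + 1

def non_deterministic(data):
    # The result depends on something outside the dataset.
    data['token'] = random.random()

initial = {'counter': 10}
master = replay(deterministic, initial)
replica = replay(deterministic, initial)
print(master == replica)
&lt;/pre&gt;
Replaying the deterministic script twice gives two identical datasets, while every run of the non deterministic one stores a different value, so a master and its slave would silently diverge.
&lt;br/&gt;&lt;br/&gt;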

It is very important to do it this way instead of replicating all the single commands generated by the script for a simple reason: with scripting you can do things 10 or even 100 times faster than sending commands.  If we don't replicate scripts but single commands, the master -&amp;gt; slave link will get saturated in no time, OR, the slave instance will not be able to process things as fast as needed.
&lt;h2&gt;Bandwidth is an issue!&lt;/h2&gt;
I already received many tweets about how bandwidth can become an issue since we have to send the same script again and again. But at the same time it is very important to avoid registering scripts for the reasons explained: we really don't want to have to deal with versioning of scripts among different instances, or with application source code calling some poorly specified script stored inside a Redis instance.
&lt;br/&gt;&lt;br/&gt;

But can we have the best of both worlds? I think we actually can! If bandwidth turns out to be an issue I'll implement the &lt;b&gt;EVALSHA&lt;/b&gt; command. Basically the command will be exactly like EVAL, but instead of the script body it will take the SHA1 of the script body. If a script with such a hash was already seen, it is executed as usual, otherwise an error is returned.
&lt;br/&gt;&lt;br/&gt;

Client libraries will abstract all this from users. The user will always pass the full script, but the client library will try to use the hash. If an error is raised by Redis the client will use EVAL in order to define the script the first time. I don't want to add this right now. Ideas get better after some time, like red wine.
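&lt;br/&gt;&lt;br/&gt;

The client side fallback could look more or less like the following Python sketch. FakeServer is a toy stand-in for a Redis instance and run_script is a hypothetical client helper; neither is a real client library API, and the replies are placeholders.
&lt;pre class="code"&gt;
import hashlib

class FakeServer:
    '''Toy stand-in for a Redis instance that remembers the scripts it saw.'''

    def __init__(self):
        self.scripts = {}          # hex SHA1 of the body: script body
        self.bodies_received = 0   # how many times a full body was sent

    def eval(self, body):
        self.bodies_received += 1
        digest = hashlib.sha1(body.encode('utf-8')).hexdigest()
        self.scripts[digest] = body
        return 'executed ' + digest

    def evalsha(self, digest):
        if digest not in self.scripts:
            raise KeyError('unknown script')   # the server replies with an error
        return 'executed ' + digest

def run_script(server, body):
    '''Try the hash first, fall back to sending the full body.'''
    digest = hashlib.sha1(body.encode('utf-8')).hexdigest()
    try:
        return server.evalsha(digest)
    except KeyError:
        return server.eval(body)

server = FakeServer()
script = 'return {KEYS[1],KEYS[2],ARGV[1],ARGV[2]}'
replies = [run_script(server, script) for _ in range(3)]
print(server.bodies_received)
&lt;/pre&gt;
Only the first call pays the cost of sending the whole body, every later call sends just the 40 character hash.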
&lt;h2&gt;Have fun&lt;/h2&gt;
So now I'm back to Redis Cluster for the next weeks, but please if you like Redis scripting play with it and have fun. We need some more experience with it in order to understand what to do with it!
&lt;br/&gt;&lt;br/&gt;

I promise to keep the branch rebased against the unstable branch and to fix big issues if you find any.
&lt;br/&gt;&lt;br/&gt;

&lt;b&gt;Feedbacks? &lt;/b&gt; &lt;a href="http://news.ycombinator.com/item?id=2506027"&gt;You can comment this blog post on Hacker News&lt;/a&gt;.&lt;/div&gt;&lt;div class="blogpostinfo"&gt;&lt;div class="blogpoststats"&gt;49425 views&lt;sup&gt;&lt;a href="/page/uniquevisitors"&gt;*&lt;/a&gt;&lt;/sup&gt;&lt;/div&gt;Posted at 12:43:53 | &lt;a href="http://antirez.com/post/scripting-branch-released.html"&gt;permalink&lt;/a&gt; | &lt;a href="http://antirez.com/post/scripting-branch-released.html"&gt;65 comments&lt;/a&gt; | &lt;a href="/print.php?postid=233"&gt;print&lt;/a&gt;&lt;/div&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/antirez/~4/qAXw8zkfgQ4" height="1" width="1"/&gt;</description>
   <dc:date>2011-05-02T12:43:53+00:00</dc:date>
  <feedburner:origLink>http://antirez.com/post/scripting-branch-released.html</feedburner:origLink></item>
  <item>
   <title>Redis and scripting</title>
   <link>http://feedproxy.google.com/~r/antirez/~3/VVnZ8z8SdQ0/redis-and-scripting.html</link>
   <guid isPermaLink="false">http://antirez.com/post/232</guid>
   <description>&lt;div class="blogpost"&gt;&lt;div style="clear:both"&gt;&lt;/div&gt;&lt;div class="blogposttext"&gt;Before doing Redis I was completely addicted with one thing: implementing scripting languages.
I implemented a number of languages, for instance I wrote three implementations of the Tcl language (one is &lt;a href="http://openocd.berlios.de/doc/html/About-Jim_002dTcl.html"&gt;currently actively used&lt;/a&gt;), Scheme interpreters, interpreters for stack based languages similar to FORTH, an interpreter for the Joy language written in Tcl, a macro system for Tcl, and read the source code of Ruby, Python, and many other dynamic language implementations. When I said addicted I meant addicted.
&lt;br/&gt;&lt;br/&gt;

After all I consider Redis a DSL itself... so apart from persistence I'm not really out of my previous business :) So you may wonder why this passion for scripting did not show up in the internals of Redis, which is instead a project very near to the bare metal: C and only C, with a focus on efficiency and memory footprint.
&lt;br/&gt;&lt;br/&gt;

The reason is, adding scripting is a big step forward in some way. It means making everything more dynamic, and I was very concerned about adding scripting before having a good idea about Redis Cluster: would the scripting capability play well with the cluster? Other problems were related to the idea of defining commands. I don't like the idea of instances with commands defined inside a config file: every Redis instance should be capable of doing everything, without the problem of having different instances with different versions of user defined commands.
&lt;br/&gt;&lt;br/&gt;

For all these reasons I thought about scripting again and again in the last months... one step after the other I believe I fixed most of the problems I had with scripting, mainly:
&lt;br/&gt;&lt;br/&gt;

&lt;ul&gt;&lt;li&gt;What scripting language to use?&lt;/li&gt;

&lt;li&gt;What are the semantics of scripting? Should users define commands? Should the command definition be a command itself? How to make sure different commands are in sync in different instances?&lt;/li&gt;

&lt;li&gt;What about software engineering? When you read source code using Redis and you see something like &amp;quot;Redis.myStrangeCommand key1 value1&amp;quot; what do you do? Need to check the instance to see what the newly defined command does? That sucks.&lt;/li&gt;

&lt;li&gt;What about Redis Cluster? How do scripting and cluster interact?&lt;/li&gt;

&lt;/ul&gt;
&lt;br/&gt;&lt;br/&gt;

Finally I think I have good solutions for all these problems. So I think it is time to start working on a branch implementing scripting. For now just a branch, the experience of our brave users will tell us if the experiment will turn into a real feature or not. But the real question is &lt;b&gt;why scripting?&lt;/b&gt;.
&lt;br/&gt;&lt;br/&gt;

There are a few fundamental problems that scripting can fix in a wonderful way:
&lt;br/&gt;&lt;br/&gt;

&lt;ul&gt;&lt;li&gt;Scripting makes Redis much faster for some kinds of tasks. Many complex operations that now require some kind of read-compute-write workflow on the client side will just be simple commands that take a single exchange with the server. And bandwidth is very important... we discovered this talking with guys that are using Redis in big environments.&lt;/li&gt;

&lt;li&gt;Most Redis workloads tend to be I/O bound, and not CPU bound. And even when you see the CPU at 100% it is likely all about protocol handling. This is almost impossible to avoid as Redis commands are much faster than the I/O needed to serve them: the lookup of a key in a hash table, some trivial operation, and so forth. With scripting we can put our bandwidth and CPU power to much better use.&lt;/li&gt;

&lt;li&gt;But the fundamental problem is the following: we currently have to either deny features to avoid bloat and leave users unsatisfied, or bloat the server. The problem is, there are many things that you don't want as a command as they are very specific. But these guys actually need these commands, a lot, for their use case. With scripting this problem is completely solved: Redis exports only the general abstractions, what you need 99% of the time. For the 1% use case you write a simple script.&lt;/li&gt;

&lt;/ul&gt;
&lt;br/&gt;&lt;br/&gt;

And now... I claimed that the above problems with scripting are solved. How?
&lt;br/&gt;&lt;br/&gt;

&lt;b&gt;Don't define commands&lt;/b&gt;
&lt;br/&gt;&lt;br/&gt;

Instead of defining commands in some way we can simply send the script again and again. Redis scripts will usually be super short. People need to do things like: set this only if this key already contains that specific value. Or check all the elements of this sorted set in a given range and return the average value. And so forth. So we can do just:
&lt;pre class="code"&gt;
EVAL &amp;quot;... some script ...&amp;quot; arg1 arg2 arg3 arg4 ...
&lt;/pre&gt;
Redis will try to be smart enough to reuse an interpreter with the command already defined. But the point is, this solves a lot of problems in a single step! There is no longer the problem of defining commands, or of instances with different versions of the same command (especially in a cluster scenario), and everything is evident from the source code of the application.
&lt;br/&gt;&lt;br/&gt;

&lt;b&gt;Specify what arguments are keys&lt;/b&gt;
&lt;br/&gt;&lt;br/&gt;

Actually to deal well with cluster, with the experimental &amp;quot;disk backed Redis&amp;quot; things we are doing, and all the future stuff that we could do to make Redis a more interesting product, we only need to know one thing: which of the arguments are keys? To do so we can simply add a new argument to the EVAL command:
&lt;pre class="code"&gt;
EVAL &amp;quot;... some script ...&amp;quot; num_keys arg1 arg2 arg3 ...
&lt;/pre&gt;
Now we know that only the first num_keys arguments are keys, and we can treat EVAL exactly like all the other commands, without caring at all about the semantics of the script executed.
&lt;br/&gt;&lt;br/&gt;

&lt;b&gt;Use a sane language&lt;/b&gt;
&lt;br/&gt;&lt;br/&gt;

I think that for what we need Lua beats everybody else hands down. The language is not one that I particularly like, compared to Ruby for instance, but who cares? We are programmers and can code a short script in any language we want, but the point is, Lua is a wonderful implementation. Easy to embed, without even a configure script, like Redis! And FAST.
&lt;br/&gt;&lt;br/&gt;

It's really time to try this in a Redis branch ;) So stay tuned, as in the next few days I'm sure I'll get into the right swing to code a first implementation we can collectively play with, to refine our feelings.
&lt;br/&gt;&lt;br/&gt;

You can comment on this entry &lt;a href="http://news.ycombinator.com/item?id=2490068"&gt;in the Hacker News post&lt;/a&gt;&lt;/div&gt;&lt;div class="blogpostinfo"&gt;&lt;div class="blogpoststats"&gt;42232 views&lt;sup&gt;&lt;a href="/page/uniquevisitors"&gt;*&lt;/a&gt;&lt;/sup&gt;&lt;/div&gt;Posted at 16:25:06 | &lt;a href="http://antirez.com/post/redis-and-scripting.html"&gt;permalink&lt;/a&gt; | &lt;a href="http://antirez.com/post/redis-and-scripting.html"&gt;28 comments&lt;/a&gt; | &lt;a href="/print.php?postid=232"&gt;print&lt;/a&gt;&lt;/div&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/antirez/~4/VVnZ8z8SdQ0" height="1" width="1"/&gt;</description>
   <dc:date>2011-04-27T16:25:06+00:00</dc:date>
  <feedburner:origLink>http://antirez.com/post/redis-and-scripting.html</feedburner:origLink></item>
  <item>
   <title>Backporting into Redis 2.4 and other news</title>
   <link>http://feedproxy.google.com/~r/antirez/~3/0tm3jiO60nM/2-4-and-other-news.html</link>
   <guid isPermaLink="false">http://antirez.com/post/231</guid>
   <description>&lt;div class="blogpost"&gt;&lt;div style="clear:both"&gt;&lt;/div&gt;&lt;div class="blogposttext"&gt;I think I should write more about Redis development... lately I was so focused on writing the code and the Redis Book that finding the time to blog about Redis was really hard, but I'll try to improve in the next weeks. However today I want to provide some fresh news to Redis users: to have some insight into the near future of a project can be very interesting for developers planning to start a new project with Redis.
&lt;br/&gt;&lt;br/&gt;

Currently we have three branches of development: 2.2, 2.4, Redis Cluster (unstable branch).
&lt;br/&gt;&lt;br/&gt;

2.2 is a bugfix only development line, so we'll continue to ship 2.2.x versions only to fix bugs.
&lt;br/&gt;&lt;br/&gt;

2.4 is our new branch, it is just a few days old. Our old development model with a stable branch and an unstable branch did not work well, we needed something in the middle. There is simply a lot of stuff that can be backported from the unstable branch.
&lt;br/&gt;&lt;br/&gt;

The unstable branch, where Redis Cluster development is happening, will take time to reach stability as the cluster is a big project (our idea is to release a first stable version of Redis Cluster later this summer). 2.4 is a way to put something better than 2.2 in the hands of our users ASAP.
&lt;br/&gt;&lt;br/&gt;

We hope to ship 2.4 in an estimated time frame of 6 weeks. It will include the following changes compared to 2.2:
&lt;br/&gt;&lt;br/&gt;

&lt;ul&gt;&lt;li&gt;Memory optimized sorted sets. This means that small sorted sets will use little memory, as small hashes, lists, and sets composed of integers already do.&lt;/li&gt;

&lt;li&gt;Variadic versions of [LR]PUSH, SADD, ZADD, ... so you can, for instance, push multiple values into a list with a single command. I measured the difference with a few benchmarks and it is really dramatic compared to pipelining many LPUSH commands.&lt;/li&gt;

&lt;li&gt;Big improvements in .rdb persistence. Now specially encoded types are saved directly as they are. Just to give you an example, if you have a dataset composed of lists with an average of 100 elements you can expect 50x faster .rdb persistence.&lt;/li&gt;

&lt;/ul&gt;
&lt;br/&gt;&lt;br/&gt;

All the above stuff is already inside redis unstable of course, but with 2.4 it will be readily available to all the users in a short time. The current 2.4 branch only includes the first two changes, I'm working on merging the last one.
&lt;br/&gt;&lt;br/&gt;

&lt;h2&gt;How to play with Redis Cluster&lt;/h2&gt;
We also have some news about Redis Cluster. You can try out for yourself what we have already.
The following is a howto about testing Redis Cluster. Note: &lt;b&gt;Redis Cluster is not complete&lt;/b&gt;, it is currently an alpha with a lot of missing features, and it is not stable. Here the goal is just to provide a preview.
&lt;br/&gt;&lt;br/&gt;

To play with Redis Cluster fire up three instances with the following configuration:
&lt;pre class="code"&gt;
port 6379
cluster-enabled yes
cluster-config-file nodes-1.conf
&lt;/pre&gt;
&lt;br/&gt;&lt;br/&gt;

Use port 6379 for the first instance, and 6380 and 6381 for the other two.
Also make sure to use a different &lt;i&gt;cluster-config-file&lt;/i&gt; name for each instance: nodes-1.conf, nodes-2.conf, nodes-3.conf. The cluster config file is not something you should change by hand, it is a file where a cluster node saves the current configuration to reload the state at restart.
&lt;br/&gt;&lt;br/&gt;

Now that you have three instances running you can start issuing some commands:
&lt;pre class="code"&gt;
redis&amp;gt; connect 127.0.0.1 6379
redis&amp;gt; cluster info
cluster_state:fail
cluster_slots_assigned:0
cluster_slots_ok:0
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:1
redis&amp;gt; cluster nodes
0c2f029a52bec8d17c43b137c74205fade1b1921 :0 myself - 0 0 disconnected
&lt;/pre&gt;
As you can see this node only knows about a single node, that is, itself. You can see this from the &amp;quot;myself&amp;quot; flag in the &lt;b&gt;cluster nodes&lt;/b&gt; output. The &lt;b&gt;cluster info&lt;/b&gt; output instead shows that, out of the 4096 hash slots into which the key space is divided, none is assigned. This is why this node will not be happy to reply to queries:
&lt;pre class="code"&gt;
redis&amp;gt; get foo
(error) ERR The cluster is down. Check with CLUSTER INFO for more information
&lt;/pre&gt;
So the first thing to do is to join the cluster, that is, make nodes aware that there are other nodes around, as this is a completely new cluster.
&lt;br/&gt;&lt;br/&gt;

As a first step we join the instance running at 6379 with the instance running at 6380:
&lt;pre class="code"&gt;
redis&amp;gt; connect 127.0.0.1 6379
redis&amp;gt; cluster meet 127.0.0.1 6380
OK
redis&amp;gt; cluster nodes
0c2f029a52bec8d17c43b137c74205fade1b1921 :0 myself - 0 0 disconnected
96fad8c3b4df5f86ac4abe6205a253c640c751ef 127.0.0.1:6380 master - 1303136527 1303136527 connected
&lt;/pre&gt;
As you can see now 6379 knows about 6380, and this is true for 6380 as well of course, as the new nodes did a handshake:
&lt;pre class="code"&gt;
redis&amp;gt; connect 127.0.0.1 6380
redis&amp;gt; cluster nodes
0c2f029a52bec8d17c43b137c74205fade1b1921 127.0.0.1:6379 master - 1303136590 1303136590 connected
96fad8c3b4df5f86ac4abe6205a253c640c751ef :0 myself - 0 0 disconnected
&lt;/pre&gt;
&lt;br/&gt;&lt;br/&gt;

I can already see in your face the &amp;quot;WTF do these fields mean&amp;quot; expression... so every line of the 'cluster nodes' output is composed of the following fields, from left to right:
&lt;pre class="code"&gt;
node_id
latest_known_ip_address_and_port
role_in_cluster
node_id_of_master_if_it_is_a_slave
last_ping_sent_time
last_pong_received_time
link_status
&lt;/pre&gt;
Every node has an ID that will be used for the whole life of the node. All this information is saved in the nodes.conf file. The format of this file is exactly the same as the &lt;b&gt;cluster nodes&lt;/b&gt; output, as I was too lazy to invent something new, but this actually turned out to be an advantage (less code, and a more descriptive nodes file).
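&lt;br/&gt;&lt;br/&gt;

Since the format is just space separated fields, reading it back is easy. The following is a small Ruby sketch, not what Redis does internally: the field names are made up for the example and simply follow the order listed above, and treating any trailing fields as assigned slots is an assumption based on the output shown later in this post:
&lt;pre class="code"&gt;
# Field names invented for this sketch, in the order listed above.
FIELDS = [:id, :addr, :role, :master_id, :ping_sent, :pong_received, :link]

def parse_node_line(line)
    parts = line.split(" ")
    node = {}
    FIELDS.each_with_index { |name, i| node[name] = parts[i] }
    node[:slots] = parts.drop(FIELDS.length)   # anything left over (assumed: slots)
    node
end

line = "96fad8c3b4df5f86ac4abe6205a253c640c751ef 127.0.0.1:6380 master - 1303136527 1303136527 connected"
node = parse_node_line(line)
puts node[:addr]
puts node[:link]
&lt;/pre&gt;
This prints the address and the link status of the node, 127.0.0.1:6380 and connected.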
&lt;br/&gt;&lt;br/&gt;

Now Redis Cluster nodes are like bored old ladies, they gossip a lot about other nodes. But the good thing is that at least cluster nodes are very well informed, and only report information they are pretty sure about ;)
&lt;br/&gt;&lt;br/&gt;

Every second, every node sends a PING packet to some other node. Actually this node is not selected at random, but among the nodes that are believed to be OK, picking the one with the oldest &lt;b&gt;pong_received&lt;/b&gt; field in the node structure, so we tend to ping the nodes we have not chatted with for the longest time.
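&lt;br/&gt;&lt;br/&gt;

As a rough illustration of this selection rule, here is a Ruby sketch. The Node struct and its fields are hypothetical (the real node structure is C code inside Redis), and only the rule described above is modeled:
&lt;pre class="code"&gt;
# Hypothetical node record, just for the example.
Node = Struct.new(:name, :ok, :pong_received)

# Among the nodes believed to be OK, pick the one we heard from least recently.
def pick_ping_target(nodes)
    candidates = nodes.select { |n| n.ok }
    candidates.min_by { |n| n.pong_received }
end

nodes = [
    Node.new("6380", true, 1303137505),
    Node.new("6381", true, 1303137490),
    Node.new("6382", false, 1303137000)
]
puts pick_ping_target(nodes).name
&lt;/pre&gt;
Here 6381 is picked: 6382 has an older pong_received but is not believed to be OK, so it is skipped.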
&lt;br/&gt;&lt;br/&gt;

In every PING packet, and in the PONG reply, there is a gossip section where we inform the other node about what we know about other nodes. Also, when a node pings or pongs another node, the packet carries a lot of detailed information about the node sending it.
&lt;br/&gt;&lt;br/&gt;

For a node to be marked as failing we need both to detect that it has not replied to our pings for some time, AND to receive, thanks to the gossip section, a report that another node is having trouble with this node. When this happens the node marks this other node as failing, and sends a &amp;quot;mark-as-failed&amp;quot; message to all the other known nodes.
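&lt;br/&gt;&lt;br/&gt;

The two conditions can be written down as a tiny Ruby sketch. The function name, the timeout parameter and the time unit (seconds) are assumptions made for the example, not the actual implementation:
&lt;pre class="code"&gt;
# Sketch of the rule above: a node is marked as failing only if it has been
# silent for longer than the timeout AND another node reported trouble with it.
def should_mark_failing?(now, pong_received, timeout, reported_by_others)
    timed_out = (now - pong_received - timeout).positive?
    [timed_out, reported_by_others].all?
end

puts should_mark_failing?(1303137520, 1303137500, 15, true)
puts should_mark_failing?(1303137520, 1303137500, 15, false)
&lt;/pre&gt;
The first call prints true, the second prints false: our own timeout alone is not enough without a second opinion from the gossip section.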
&lt;br/&gt;&lt;br/&gt;

Let's test gossip in practice. Now we have 6379 joined with 6380. What happens if we join 6380 with 6381 is that 6379 and 6381 will also meet. But Redis nodes are like girls from good families, they only trust and meet with other nodes either already trusted (in their nodes table) or trusted by their friends. The &lt;b&gt;only way&lt;/b&gt; to make a Redis node talk with another node that is not already in the known nodes list, nor in the known nodes of another trusted node, is via the &lt;b&gt;CLUSTER MEET&lt;/b&gt; command.
&lt;pre class="code"&gt;

redis&amp;gt; connect 127.0.0.1 6381
redis&amp;gt; cluster meet 127.0.0.1 6380
OK
redis&amp;gt; connect 127.0.0.1 6379
redis&amp;gt; cluster nodes
8f1e863160f2627108451d0a0155127e8b1b4597 127.0.0.1:6381 noflags - 1303137505 1303137505 connected
0c2f029a52bec8d17c43b137c74205fade1b1921 :0 myself - 0 1303137500 disconnected
96fad8c3b4df5f86ac4abe6205a253c640c751ef 127.0.0.1:6380 master - 1303137505 1303137505 connected
&lt;/pre&gt;
&lt;br/&gt;&lt;br/&gt;

Now all the three nodes are connected and aware of their friends... however the nodes are still not able to reply to queries as hash slots are not assigned at all. To assign hash slots we need to send &amp;quot;CLUSTER ADDSLOTS&amp;quot; commands. We assign part of the 4096 slots to all the nodes, so that all the slots will be covered:
&lt;pre class="code"&gt;
$ echo '(0..1000).each{|x| puts &amp;quot;CLUSTER ADDSLOTS &amp;quot;+x.to_s}' | ruby | redis-cli -p 6379 &amp;gt; /dev/null
$ echo '(1001..2500).each{|x| puts &amp;quot;CLUSTER ADDSLOTS &amp;quot;+x.to_s}' | ruby | redis-cli -p 6380 &amp;gt; /dev/null
$ echo '(2501..4095).each{|x| puts &amp;quot;CLUSTER ADDSLOTS &amp;quot;+x.to_s}' | ruby | redis-cli -p 6381 &amp;gt; /dev/null
&lt;/pre&gt;
(note: actually CLUSTER ADDSLOTS can accept any number of hash slots as parameters, but redis-cli does not work well with huge command lines, so we send a command for every hash slot).
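&lt;br/&gt;&lt;br/&gt;

If you prefer to compute the ranges instead of picking them by hand, the following Ruby sketch splits the 4096 slots evenly among a given number of nodes. It is only an illustration: the function name is made up, and the split is even, unlike the hand picked ranges used above:
&lt;pre class="code"&gt;
HASH_SLOTS = 4096

# Return one contiguous range of slots per node, covering 0..4095 with no gaps.
def slot_ranges(num_nodes)
    bounds = (0..num_nodes).map { |i| i * HASH_SLOTS / num_nodes }
    bounds.each_cons(2).map { |first, after_last| (first..after_last - 1) }
end

slot_ranges(3).each_with_index do |range, i|
    puts "node #{i}: slots #{range.first}-#{range.last} (#{range.size} slots)"
end
&lt;/pre&gt;
With three nodes this gives 0-1364, 1365-2729 and 2730-4095, so every slot has exactly one owner.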
&lt;br/&gt;&lt;br/&gt;

Ok now we should have a much more interesting cluster. Let's try to ask some node about how things are going:
&lt;pre class="code"&gt;
redis&amp;gt; connect 127.0.0.1 6379
redis&amp;gt; cluster nodes
8f1e863160f2627108451d0a0155127e8b1b4597 127.0.0.1:6381 master - 1303138327 1303138327 connected 2501-4095
0c2f029a52bec8d17c43b137c74205fade1b1921 :0 myself - 0 1303138326 disconnected 0-1000
96fad8c3b4df5f86ac4abe6205a253c640c751ef 127.0.0.1:6380 master - 1303138326 1303138326 connected 1001-2500
redis&amp;gt; cluster info
cluster_state:ok
cluster_slots_assigned:4096
cluster_slots_ok:4096
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:3
&lt;/pre&gt;
Yes! Now our cluster state is OK. As you can see, at the end of every line of the cluster nodes output there is the list of assigned slots. This information was all propagated thanks to the gossip section of PING/PONG packets. We are ready to try some actual queries:
&lt;pre class="code"&gt;
redis&amp;gt; get foo
(error) MOVED 3990 127.0.0.1:6381
redis&amp;gt; get bar
(nil)
&lt;/pre&gt;
Now the nodes finally accept our requests. The first request was about hash slot 3990, as the key 'foo' hashes to that slot, so we got pointed to the right node. A good client will remember this and will directly hit the right node the next time.
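&lt;br/&gt;&lt;br/&gt;

Here is a Ruby sketch of what such a client could do. Two things are assumptions: that the slot is the CRC16 of the key (XMODEM variant) modulo 4096, which I did not explain above but which does reproduce the 3990 of the example, and the helper names, which are made up. The arithmetic is written with multiplications instead of bit shifts only to keep the listing free of special characters:
&lt;pre class="code"&gt;
HASH_SLOTS = 4096

# CRC16, XMODEM variant (polynomial 0x1021, initial value 0), bit by bit.
def crc16(data)
    crc = 0
    data.each_byte do |byte|
        crc ^= byte * 256                    # put the byte in the high 8 bits
        8.times do
            crc *= 2                         # shift left by one bit
            crc ^= 0x11021 if crc[16] == 1   # bit 16 set: clear it, apply the polynomial
        end
    end
    crc
end

def key_slot(key)
    crc16(key) % HASH_SLOTS
end

# Parse a redirection like "MOVED 3990 127.0.0.1:6381".
def parse_moved(error)
    kind, slot, addr = error.split(" ")
    return nil unless kind == "MOVED"
    [slot.to_i, addr]
end

slot_cache = {}
slot, addr = parse_moved("MOVED 3990 127.0.0.1:6381")
slot_cache[slot] = addr                      # remember who serves this slot

puts key_slot("foo")
puts slot_cache[key_slot("foo")]
&lt;/pre&gt;
This prints 3990 and then 127.0.0.1:6381, so the next query about 'foo' can go straight to the right node.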
&lt;br/&gt;&lt;br/&gt;

Ok, that's all for now. I hope that while I can't show a full solution for now this journey in the status of Redis Cluster was more interesting than just reading my tweets about &amp;quot;I'm working at cluster&amp;quot;.
&lt;br/&gt;&lt;br/&gt;

Also note that to operate on a cluster you'll actually never do this kind of stuff by hand. The &lt;b&gt;redis-trib&lt;/b&gt; program will do all this for you, but my thought was that it is a lot less instructive to just type 'redis-trib create ...'. I wanted to show a bit more of the inner workings.&lt;/div&gt;&lt;div class="blogpostinfo"&gt;&lt;div class="blogpoststats"&gt;35074 views&lt;sup&gt;&lt;a href="/page/uniquevisitors"&gt;*&lt;/a&gt;&lt;/sup&gt;&lt;/div&gt;Posted at 14:55:43 | &lt;a href="http://antirez.com/post/2-4-and-other-news.html"&gt;permalink&lt;/a&gt; | &lt;a href="http://antirez.com/post/2-4-and-other-news.html"&gt;6 comments&lt;/a&gt; | &lt;a href="/print.php?postid=231"&gt;print&lt;/a&gt;&lt;/div&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/antirez/~4/0tm3jiO60nM" height="1" width="1"/&gt;</description>
   <dc:date>2011-04-18T14:55:43+00:00</dc:date>
  <feedburner:origLink>http://antirez.com/post/2-4-and-other-news.html</feedburner:origLink></item>
  <item>
   <title>On the web server scalability and speed are almost the same thing</title>
   <link>http://feedproxy.google.com/~r/antirez/~3/5a7KF1fa6TE/scalability-and-speed-of-web-apps.html</link>
   <guid isPermaLink="false">http://antirez.com/post/230</guid>
   <description>&lt;div class="blogpost"&gt;&lt;div style="clear:both"&gt;&lt;/div&gt;&lt;div class="blogposttext"&gt;Today I complained on twitter that the obvious way to start writing an application with Ruby and Sinatra is too slow by default. Substituting a template calling the
&lt;pre class="code"&gt;
erb :index
&lt;/pre&gt;
takes &lt;i&gt;a few&lt;/i&gt; milliseconds (just substituting &amp;quot;hello&amp;quot; in a toy template). Benchmarking it with Apache Benchmark shows that a similar hello world app written in PHP serves 1500 requests per second, while using erb this trivial substitution can handle 250 requests per second (both benchmarks were run on my MBA 11&amp;quot;).
&lt;br/&gt;&lt;br/&gt;

Probably there is some lame reason why this happens, like opening the template file again and again or something like that (excluding the template, the method dispatch of Sinatra + Mongrel is able to serve 600 requests per second, so it is the template substitution).
&lt;br/&gt;&lt;br/&gt;

My point is, it is not ok that by default it is so damn slow. It's not Ruby that is slow, this is not computationally intensive code. It is simply that apparently nobody cares to make the basic solution a fast one. I'm sure that with a few tricks I can handle that, and be happy using Ruby, a language that I love semantically, instead of PHP, a language that I don't like at all.
&lt;br/&gt;&lt;br/&gt;

Many twitter replies were along the lines of &amp;quot;but speed is not equal to scalability&amp;quot;.
&lt;br/&gt;&lt;br/&gt;

I disagree about that. If there is something cool about web programming, it is that often the web side is &lt;i&gt;trivial to scale&lt;/i&gt; conceptually. Just add more web servers, there is no shared data.
&lt;br/&gt;&lt;br/&gt;

Your only bottlenecks in a web app should be: the databases, the workers. There are no excuses for the page generation to be slow. In the web &lt;b&gt;speed is scalability&lt;/b&gt; because every web server is conceptually a parallel computation unit. So if the web page generation takes 10 ms instead of 100 ms I can serve everything with just 10% of the hardware.
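&lt;br/&gt;&lt;br/&gt;

The arithmetic is trivial, but here it is as a few lines of Ruby. The load of 1000 requests per second is a made up number, and the sketch assumes the simplest possible model, where a server generates one page at a time:
&lt;pre class="code"&gt;
# Servers needed if every server generates one page at a time.
def servers_needed(requests_per_second, ms_per_page)
    busy_ms_per_second = requests_per_second * ms_per_page
    (busy_ms_per_second / 1000.0).ceil
end

puts servers_needed(1000, 100)
puts servers_needed(1000, 10)
&lt;/pre&gt;
This prints 100 and then 10: the same traffic, one tenth of the servers.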
&lt;br/&gt;&lt;br/&gt;

I love Ruby but this is just another instance of &amp;quot;slow by default&amp;quot; that I don't understand very well, and it is not a matter of optimizing for programmer's performance: you can substitute a simple template 1500 times per second on an MBA, as the lame PHP is teaching us.
&lt;br/&gt;&lt;br/&gt;

&lt;b&gt;Edit: the code I used as requested&lt;/b&gt;. The main point of these two code fragments is that they should be similar to what most people would write as a first example to accomplish that specific work. The code also tries to be conceptually equivalent, loading the template at every request, and performing a substitution involving parsing the template and evaluating the code.
&lt;br/&gt;&lt;br/&gt;

PHP code:
&lt;pre class="code"&gt;
&amp;lt;?
$user = $_GET['username'];
include(&amp;quot;template.php&amp;quot;);
?&amp;gt;&lt;br /&gt;&lt;br /&gt;The above runs at 1500 requests per second.&lt;br /&gt;&lt;br /&gt;template.php is:&lt;br /&gt;&lt;br /&gt;&amp;lt;html&amp;gt;
&amp;lt;body&amp;gt;
&amp;lt;? echo(&amp;quot;Hello &amp;quot;.$user); ?&amp;gt;
&amp;lt;/body&amp;gt;
&amp;lt;/html&amp;gt;
&lt;/pre&gt;
&lt;br/&gt;&lt;br/&gt;

Ruby code:
&lt;pre class="code"&gt;
require 'rubygems'
require 'sinatra'&lt;br /&gt;&lt;br /&gt;if 1
    require 'erubis'
    Tilt.register :erb, Tilt[:erubis]
end&lt;br /&gt;&lt;br /&gt;before do
    # Bla
end&lt;br /&gt;&lt;br /&gt;get '/slow/:username' do
    @var = &amp;quot;Hello #{params[:username]}&amp;quot;
    erb :index
end&lt;br /&gt;&lt;br /&gt;get '/fast/:username' do
    return &amp;quot;Hello #{params[:username]}&amp;quot;
end&lt;br /&gt;&lt;br /&gt;get '/subst/:username' do
    f = File.open(&amp;quot;template.tpl&amp;quot;)
    t = f.read
    res = t.sub(&amp;quot;%content%&amp;quot;,&amp;quot;Hello #{params[:username]}&amp;quot;);
    f.close
    return res
end&lt;br /&gt;&lt;br /&gt;The template is:&lt;br /&gt;&lt;br /&gt;Username = &amp;lt;%= @var %&amp;gt;&lt;br /&gt;&lt;br /&gt;Yes, just one line with one var.&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;
&lt;br/&gt;&lt;br/&gt;

&lt;ul&gt;&lt;li&gt;/fast is basically as fast as Sinatra/Mongrel dispatch, so it reaches 550 requests/second.&lt;/li&gt;

&lt;li&gt;/slow does what the PHP code actually does, and runs at 300 requests/second. This is the sad part. It should be more or less like 'fast'.&lt;/li&gt;

&lt;li&gt;/subst is (pretty incredibly) much faster than /slow: 450 requests/second, even if the same file is opened again and again in Ruby land.&lt;/li&gt;

&lt;/ul&gt;
&lt;br/&gt;&lt;br/&gt;

Another data point is this very blog you are reading. It's the lamest PHP (written by me in a day just to have a blog engine with a few special features I liked), using many many MySQL queries per page. It serves 250 pages per second in a blog post with 15 comments, including parsing the post, which is done with my own function that converts a markdown-like syntax into HTML. The same 250 requests per second we get with the default Hello World using the erb template.&lt;/div&gt;&lt;div class="blogpostinfo"&gt;&lt;div class="blogpoststats"&gt;46669 views&lt;sup&gt;&lt;a href="/page/uniquevisitors"&gt;*&lt;/a&gt;&lt;/sup&gt;&lt;/div&gt;Posted at 18:21:31 | &lt;a href="http://antirez.com/post/scalability-and-speed-of-web-apps.html"&gt;permalink&lt;/a&gt; | &lt;a href="http://antirez.com/post/scalability-and-speed-of-web-apps.html"&gt;49 comments&lt;/a&gt; | &lt;a href="/print.php?postid=230"&gt;print&lt;/a&gt;&lt;/div&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/antirez/~4/5a7KF1fa6TE" height="1" width="1"/&gt;</description>
   <dc:date>2011-04-01T18:21:31+00:00</dc:date>
  <feedburner:origLink>http://antirez.com/post/scalability-and-speed-of-web-apps.html</feedburner:origLink></item>
 </channel>
</rss>

