<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/atom10full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><feed xmlns="http://www.w3.org/2005/Atom" xmlns:openSearch="http://a9.com/-/spec/opensearch/1.1/" xmlns:blogger="http://schemas.google.com/blogger/2008" xmlns:georss="http://www.georss.org/georss" xmlns:gd="http://schemas.google.com/g/2005" xmlns:thr="http://purl.org/syndication/thread/1.0" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" gd:etag="W/&quot;AkUAQH86cSp7ImA9WhBaEEs.&quot;"><id>tag:blogger.com,1999:blog-834134852788085492</id><updated>2013-05-20T18:44:01.119+02:00</updated><title>RealTime Data Compression</title><subtitle type="html">Development blog on compression algorithms</subtitle><link rel="http://schemas.google.com/g/2005#feed" type="application/atom+xml" href="http://fastcompression.blogspot.com/feeds/posts/default" /><link rel="alternate" type="text/html" href="http://fastcompression.blogspot.com/" /><link rel="next" type="application/atom+xml" href="http://www.blogger.com/feeds/834134852788085492/posts/default?start-index=26&amp;max-results=25&amp;redirect=false&amp;v=2" /><author><name>Yann Collet</name><uri>https://plus.google.com/117014154967737150730</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><generator version="7.00" uri="http://www.blogger.com">Blogger</generator><openSearch:totalResults>85</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>25</openSearch:itemsPerPage><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/atom+xml" href="http://feeds.feedburner.com/FastDataCompression" /><feedburner:info uri="fastdatacompression" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><entry gd:etag="W/&quot;DUIDR34_fCp7ImA9WhBbEUU.&quot;"><id>tag:blogger.com,1999:blog-834134852788085492.post-5607006648873906616</id><published>2013-04-26T16:39:00.000+02:00</published><updated>2013-05-10T14:06:16.044+02:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2013-05-10T14:06:16.044+02:00</app:edited><title>Fighting code bloat (C template-style)</title><content type="html">&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://images.sixrevisions.com/2010/04/07-10_improve_seo_websites_code_bloat.jpg" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="61" src="http://images.sixrevisions.com/2010/04/07-10_improve_seo_websites_code_bloat.jpg" width="200" /&gt;&lt;/a&gt;&lt;/div&gt;
&amp;nbsp;A little detail has always annoyed me within the &lt;a href="https://code.google.com/p/lz4/"&gt;LZ4&lt;/a&gt;&amp;nbsp;source code : the presence of 2&amp;nbsp;compression&amp;nbsp;functions, and 2 decompression functions.&lt;br /&gt;
&lt;br /&gt;
In both cases, the 2 variants are very close to each other. But the differences are large enough to justify 2 separate functions (such as different underlying types).&amp;nbsp;While it's a minor annoyance, the situation could not last.&lt;br /&gt;
&lt;br /&gt;
Creating the second function was relatively easy : just copy/paste the first function, and modify whatever is necessary. Problems start with code maintenance. Whenever modifying or correcting one function, it's necessary to not forget to also modify the second one, whenever it's applicable. Doing this multiple times, it's likely that a few minor changes get their way&amp;nbsp;in, especially when the code is large (which, thankfully, it is not for LZ4 yet).&lt;br /&gt;
But that's only the beginning of the problems.&lt;br /&gt;
What if other variants are needed ?&lt;br /&gt;
Then, it will be necessary to create a new function almost similar to previous ones, or multiply the current number of variants by 2 if it is an&amp;nbsp;additional&amp;nbsp;on/off parameter. What if I need &lt;i&gt;another &lt;/i&gt;on/off parameter ? That's 8 similar functions. Obviously, this doesn't scale.&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;&lt;u&gt;C++ template&lt;/u&gt;&lt;/b&gt;&lt;br /&gt;
Dealing with a similar issue in C++ has its solution : it's called &lt;a href="http://en.wikipedia.org/wiki/Template_(C%2B%2B)"&gt;Template&lt;/a&gt;. Getting a list of parameters with predictable values in front of the function template will instruct the compiler to create as many versions of the function as required. It's a very handy way to write the function once, and have it&amp;nbsp;instantiated&amp;nbsp;many times with combination of modifications.&lt;br /&gt;
&lt;br /&gt;
Alas, C coders are not blessed with such a comprehensive tool. &lt;a href="http://en.wikipedia.org/wiki/C99"&gt;C99&lt;/a&gt;, which is the latest language update to care about, doesn't define them. (The more recent &lt;a href="http://en.wikipedia.org/wiki/C11_(C_standard_revision)"&gt;C11&lt;/a&gt;&amp;nbsp;is still too young for widespread deployment, and anyway, even the new&amp;nbsp;&lt;span style="background-color: white; font-family: sans-serif; font-size: 13px; line-height: 19.200000762939453px;"&gt;Type-generic&amp;nbsp;&lt;/span&gt;defined in C11 is a long shot from Template).&lt;br /&gt;
&lt;br /&gt;
Googling around shows this is a recurrent issue, with already multiple attempts at mimicking template behavior using C macros. With current limitations, it is the sole strategy to adopt. Most of these attempts are fairly complex, which doesn't help for debug, and come with various limitations, depending on attempted objectives (mostly focused on parameter types).&lt;br /&gt;
&lt;br /&gt;
LZ4 is going to require new functions very soon, so the problem of duplicated lines of code is going to be adamant. It becomes urgent to solve it.&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;&lt;u&gt;The case for inline functions&lt;/u&gt;&lt;/b&gt;&lt;br /&gt;
One potential way to do it is to use&lt;a href="http://www.greenend.org.uk/rjk/tech/inline.html"&gt; inline functions&lt;/a&gt;. Such functions will be&amp;nbsp;instantiated&amp;nbsp;in-place, where they are called. In many ways, inline functions behave the same as macros with parameters, but with the big advantage of being typed and compiled, resulting in much cleaner code to debug. By defining some parameters which mere objective is to enable or disable some parts of the code (typically some checks), the compiler will automatically create the right optimal code, removing useless branches. This is a good start.&lt;br /&gt;
&lt;br /&gt;
However, inline functions also come with limitations, especially in C.&lt;br /&gt;
First, inline is merely a compilation hint. The compiler is free to decide if the function will be instantiated or referenced, the programmer has no direct control on this decision. A basic rule of thumb is : if the function is very small, it will most probably be instantiated in-place, while if it is large, it most likely won't. However, there is a large gap between these two extreme situations, where it's more difficult to guarantee anything.&lt;br /&gt;
&lt;br /&gt;
This limitation push towards small inline functions, which by the way is the intention of the standard. So, instead of creating a very large all-encompassing function, it could be a good idea to cut it into smaller pieces. Alas, there is another problem : in contrast with macros, inline function do not modify their parameters (like normal functions do). But most of the time, the snippets of code have the&amp;nbsp;responsibility&amp;nbsp;to modify several variables, a single result is simply not enough. Since it's not possible to pass variables by references (this is C++ only), to reach this goal, it's possible to pass parameters as pointers to variables. It works (there is such an example into lz4hc). It just makes it even harder for the compiler to understand the intend and optimize such a mess, and therefore less likely to create the best fastest code.&lt;br /&gt;
&lt;br /&gt;
The last limitation is even less forgiving : what can be done when the difference between two versions concerns underlying types ?&lt;br /&gt;
A good example is the HTYPE of LZ4, which can be either an U16 (for the 64K version), a BYTE* pointer (for 32-bits), or an U32 (for 64-bits). The code is almost the same, just this particular type changes depending on the version. How to do that with an inline function ? The simple answer is : you can't.&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;&lt;u&gt;LZ4 solution&lt;/u&gt;&lt;/b&gt;&lt;br /&gt;
So here is my attempt. In order to reduce the number of functions&amp;nbsp;written&amp;nbsp;into LZ4 source code, preparing the creation of newer ones sharing almost the same code, each function is written into a dedicated file, included several times into the main .c source file, preceded by a set of #define which trigger specific behaviors.&lt;br /&gt;
&lt;br /&gt;
Hence were created lz4_encoder.h and lz4_decoder.h.&lt;br /&gt;
Both files are unusual : they are neither real "headers", nor compilable piece of code. They are meant to be included into another source file (*.c), preceded by a list of #define, some of them being mandatory (such as FUNCTION_NAME), and others being optional (such as LIMITED_OUTPUT).&lt;br /&gt;
Each time the *.h file is included, it creates a new function.&lt;br /&gt;
It becomes possible to write the function once, and create multiple variations of it, greatly simplifying code maintenance.&lt;br /&gt;
&lt;br /&gt;
The situation is not all rosy. First, the main function becomes more complex to write, since it encompasses all possible variations. It's sometimes necessary to refactor the code just to make it more readable, since a collection of :&lt;br /&gt;
#ifdef CONDITION&lt;br /&gt;
&amp;nbsp; &amp;nbsp; specific_code();&lt;br /&gt;
#endif&lt;br /&gt;
can quickly impact readability, resulting in a negative impact on code maintenance.&lt;br /&gt;
&lt;br /&gt;
Another drawback is the necessity for any user program to import several files (lz4.c +&amp;nbsp;lz4_encoder.h&amp;nbsp;+&amp;nbsp;&amp;nbsp;lz4_decoder.h), while a single&amp;nbsp;lz4.c&amp;nbsp;was enough up to now.&lt;br /&gt;
This increased file management complexity was the main reason I've avoided this strategy up to now. But, with the streaming interface in the near future, it was a necessity to ensure that new functions can easily be created while keeping the source code size under control.&lt;br /&gt;
&lt;br /&gt;
In the end, the code refactoring effort also created some immediate win. Performance of several functions improved, notably for the fast variant of the decompression function, the compression of small packets, and the High Compression variant. This is a consequence of "sanitizing" the code, removing useless tests from variants which don't need them, and finding minor differences between 2 versions, keeping only the better one.&lt;br /&gt;
&lt;br /&gt;
Next goal in list is inter-dependent block compression and decompression, an essential piece of the puzzle towards Streaming Interface.&lt;img src="http://feeds.feedburner.com/~r/FastDataCompression/~4/c_Wacpb7YFg" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://fastcompression.blogspot.com/feeds/5607006648873906616/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://fastcompression.blogspot.com/2013/04/fighting-code-bloat-c-template-style.html#comment-form" title="6 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/834134852788085492/posts/default/5607006648873906616?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/834134852788085492/posts/default/5607006648873906616?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/FastDataCompression/~3/c_Wacpb7YFg/fighting-code-bloat-c-template-style.html" title="Fighting code bloat (C template-style)" /><author><name>Yann Collet</name><uri>https://plus.google.com/117014154967737150730</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>6</thr:total><feedburner:origLink>http://fastcompression.blogspot.com/2013/04/fighting-code-bloat-c-template-style.html</feedburner:origLink></entry><entry gd:etag="W/&quot;AkEEQHs8eyp7ImA9WhBVGEo.&quot;"><id>tag:blogger.com,1999:blog-834134852788085492.post-8450256274687455105</id><published>2013-04-09T11:18:00.002+02:00</published><updated>2013-04-25T10:30:01.573+02:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2013-04-25T10:30:01.573+02:00</app:edited><title>LZ4 Streaming format : Final specifications</title><content type="html">&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;/div&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;/div&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://1.bp.blogspot.com/-nc1Peg5lUfo/UWwT9BNisSI/AAAAAAAAAtk/wFaUKzdMHlo/s1600/LZ4+Streaming+format.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://1.bp.blogspot.com/-nc1Peg5lUfo/UWwT9BNisSI/AAAAAAAAAtk/wFaUKzdMHlo/s1600/LZ4+Streaming+format.png" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
The LZ4 Streaming Format specification has progressed quite a bit since&lt;a href="http://fastcompression.blogspot.fr/2013/03/a-streaming-format-for-lz4.html"&gt; last post&lt;/a&gt;, taking into consideration most issues raised by commenters. It has now reached version &lt;i&gt;1.4 (see edit below)&lt;/i&gt;, which looks stable enough to start implementing the next version of &lt;a href="http://code.google.com/p/lz4/"&gt;LZ4&lt;/a&gt;.&lt;br /&gt;
&lt;br /&gt;
&lt;div style="text-align: center;"&gt;
&lt;a href="https://docs.google.com/document/d/1gZbUoLw5hRzJ5Q71oPRN6TO4cRMTZur60qip-TE7BhQ/edit"&gt;&lt;b&gt;LZ4 Streaming format : Specifications v1.4&lt;/b&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
As a consequence, save any last-minute important item raised by contributors, the currently published specification will be used in upcoming LZ4 releases.&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;[Edit]&lt;/b&gt; : and last-minute change there is. Following a &lt;a href="https://groups.google.com/forum/?fromgroups=#!topic/lz4c/9PKm5aqMX_I"&gt;suggestion by Takayuki Matsuoka&lt;/a&gt;, the header checksum is now slightly different, in an effort to become more friendly with read-only media, hopefully improving clarity in the process. Specification version is now raised to v1.3.&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;[Edit 2]&lt;/b&gt; : A first version of LZ4c, implementing the above specification, is&lt;a href="http://code.google.com/p/lz4/"&gt; available at Google Code&lt;/a&gt;.&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;[Edit 3]&lt;/b&gt; : Following recommendations from&amp;nbsp;&lt;a href="http://en.wikipedia.org/wiki/Mark_Adler"&gt;Mark Adler&lt;/a&gt;, version v1.4 re-introduce stream checksum. It's not correct to assume that block checksum makes stream checksum redundant : block&amp;nbsp;checksum&amp;nbsp;only validates that each block has no error, while stream checksum verify that all blocks are present and in correct order. Finally, stream checksum also validates the encoding/decoding stage itself.&lt;br /&gt;
v1.4 also introduces the definition of "skippable chunks", which can encapsulate user-defined data of any kind, and be integrated into a flow of LZ4 streams.&lt;br /&gt;
&lt;br /&gt;&lt;img src="http://feeds.feedburner.com/~r/FastDataCompression/~4/S33pQrA6Y-Y" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://fastcompression.blogspot.com/feeds/8450256274687455105/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://fastcompression.blogspot.com/2013/04/lz4-streaming-format-final.html#comment-form" title="26 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/834134852788085492/posts/default/8450256274687455105?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/834134852788085492/posts/default/8450256274687455105?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/FastDataCompression/~3/S33pQrA6Y-Y/lz4-streaming-format-final.html" title="LZ4 Streaming format : Final specifications" /><author><name>Yann Collet</name><uri>https://plus.google.com/117014154967737150730</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://1.bp.blogspot.com/-nc1Peg5lUfo/UWwT9BNisSI/AAAAAAAAAtk/wFaUKzdMHlo/s72-c/LZ4+Streaming+format.png" height="72" width="72" /><thr:total>26</thr:total><feedburner:origLink>http://fastcompression.blogspot.com/2013/04/lz4-streaming-format-final.html</feedburner:origLink></entry><entry gd:etag="W/&quot;CUUHQnwzeyp7ImA9WhBWFEs.&quot;"><id>tag:blogger.com,1999:blog-834134852788085492.post-3405700522591807243</id><published>2013-03-21T12:02:00.003+01:00</published><updated>2013-04-09T01:20:33.283+02:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2013-04-09T01:20:33.283+02:00</app:edited><title>A Streaming format for LZ4</title><content type="html">&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;/div&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://1.bp.blogspot.com/-L4lGvntswLU/UVDjrsRpCyI/AAAAAAAAAj8/-HAJ1STsm3E/s1600/LZ4+Streaming+format.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="72" src="http://1.bp.blogspot.com/-L4lGvntswLU/UVDjrsRpCyI/AAAAAAAAAj8/-HAJ1STsm3E/s400/LZ4+Streaming+format.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;div style="text-align: center;"&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;br /&gt;
It is a long time since I'm willing to produce a suitable streaming format for LZ4. As stated earlier, the format defined into &lt;i&gt;lz4demo &lt;/i&gt;was not meant to become widespread, and is too limited. It's also completely unsuitable for network-oriented protocols.&lt;br /&gt;
&lt;br /&gt;
As a consequence, you'll find in the following link a &lt;a href="https://docs.google.com/document/d/1WkQXV5kSS7n6yE33zf3LcFFE9_om8XJwqEGrXCBvVhI/edit"&gt;nearly-final specification of the LZ4 streaming format&lt;/a&gt;, in &lt;a href="http://en.wikipedia.org/wiki/OpenDocument"&gt;OpenDocument format&lt;/a&gt;.&lt;br /&gt;
&lt;br /&gt;
It's only "nearly" final, since there are still a few questions left, summarized at the end of each chapter.&lt;br /&gt;
&lt;br /&gt;
However, the main properties seem settled. The format accomodates for a large selection of buffer sizes, authorizes sequential and independant blocks, embed a few checksum options, accept arbitrary flushes at any point, and even define provisions for preset dictionaries mode.&lt;br /&gt;
&lt;br /&gt;
At this stage, the very last questions seem ready to be settled in the next few weeks. Inputs, comments are welcomed.&lt;br /&gt;
&lt;br /&gt;
&lt;div style="text-align: center;"&gt;
&lt;b&gt;&lt;a href="https://docs.google.com/document/d/1WkQXV5kSS7n6yE33zf3LcFFE9_om8XJwqEGrXCBvVhI/edit"&gt;LZ4 streaming format specification&lt;/a&gt;&lt;/b&gt;&lt;/div&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;&lt;i&gt;&lt;br /&gt;&lt;/i&gt;&lt;/b&gt;
&lt;b&gt;&lt;i&gt;[Edit]&amp;nbsp;&lt;/i&gt;&lt;/b&gt;progresses :&lt;br /&gt;
Settled thanks to your comments (here and direct email):&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;Endian convention : more votes for Little Endian.&lt;/li&gt;
&lt;li&gt;Stream Size : 8 bytes seems okay&lt;/li&gt;
&lt;li&gt;Stream checksum : removed (v0.7), block-level checksum seems enough&lt;/li&gt;
&lt;li&gt;High compression flag : removed (v0.8), seems not useful enough&lt;/li&gt;
&lt;li&gt;Block Maximum size : reduced table (v0.9), from 1KB to 4MB&lt;/li&gt;
&lt;li&gt;Block size : simplified compressed/uncompressed flags (v1.0)&lt;/li&gt;
&lt;/ul&gt;
&lt;br /&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;b&gt;&lt;i&gt;[Edit] &lt;/i&gt;&lt;/b&gt;answering spec-related questions directly into the post&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Jim&amp;gt; suggestion is to allow different checksums, with a 16-bit word identifying which hash&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
Actually, it was my initial intention.&lt;br /&gt;
But i eventually backed off. Why ?&lt;br /&gt;
&lt;br /&gt;
One of LZ4 strong points is its simple specification, which makes it possible for any programmer to produce an LZ4-compatible algorithm of its own, in any language. To reach this goal, complexity is always fought and reduced to a minimum.&lt;br /&gt;
&lt;br /&gt;
If multiple hash algorithms are possible, then the decoder will have to support them all to be compatible with the specification. It's a significant burden, which will deter or slow down people willing to produce, test and maintain their own decoding routine.&lt;br /&gt;
&lt;br /&gt;
Obviously,&amp;nbsp;the streaming format proposed here is not supposed to be "the most appropriate for any usage". I intend it to become "mainstream", cross-platform, and to replace the current "static format" of "lz4demo". But there will always be some specific circumstances in which another format will do a better job.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Matt&amp;gt; deflate format supports preset dictionaries, but nobody uses them. I would drop it.&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
Actually, I had a few requests for such a feature. The idea is that pre-set dictionaries can make a great difference when it comes to sending a lot of small independant blocks.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Matt&amp;gt; Do you need checksums at all?&lt;/i&gt;&lt;br /&gt;
&lt;span style="background-color: #fafafa; color: #333333; font-family: Verdana, Arial, Tahoma, Calibri, Geneva, sans-serif; font-size: 13px;"&gt;&lt;br /&gt;&lt;/span&gt;
I think so. The format is for both short-lived transmission data, and for storage one. Checksum is almost mandatory for the second use-case. Anyway, in the proposal, Checksum is still an "option", so it can be disabled by the sender if it seems better for its own use case.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Matt&amp;gt; Do you need both block and stream checksums? Probably not.&lt;/i&gt;&lt;br /&gt;
&lt;i&gt;Mark&amp;gt; Stream Checksum: I don't see the point if data block checksum gives the appropriate protection&amp;nbsp;&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
That's the good question. It's kind of 50/50 when it comes to evaluating comments.&lt;br /&gt;
The simplicity argument looks good though.&lt;br /&gt;
&lt;b&gt;&lt;i&gt;[Edit 2]&lt;/i&gt;&lt;/b&gt; Stream checksum is removed (v0.7+), keeping only block-level checksum&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Mark&amp;gt; Do you really think your block maximum size values make sense (...)&amp;nbsp;All in all, I tend to think it is a too wide choice anyway&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
Good point. In v0.9, the choice of values for block maximum size has been reduced.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Matt&amp;gt;&amp;nbsp;Do you need variable sized fields just to save a few bytes in the header? Probably not.&amp;nbsp;&lt;/i&gt;&lt;br /&gt;
&lt;i&gt;Mark&amp;gt;&amp;nbsp;I would make that "compressed flag" explicit, and thus keep only one "data size" field disregarding if the data was compressed or not. (...) .&amp;nbsp;I'm not even sure you would need two possible sizes just to save one byte per block.&amp;nbsp;&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
Good points. Apparently, this part looks too complex, and would deserve some re-thinking.&lt;br /&gt;
&lt;i&gt;[Edit 2]&lt;/i&gt; : the specification regarding block size is simplified in v1.0.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Adrien&amp;gt; Would it be better to let users select their own dictionary identifiers, rather than requiring Dict-ID to be the result of passing the dictionnary through xxHash-32 ?&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
Good point . This behavior mimics the spec of &lt;a href="http://www.ietf.org/rfc/rfc1950.txt"&gt;RFC1950 &lt;/a&gt;(&lt;a href="http://www.zlib.net/"&gt;zlib&lt;/a&gt;). The only advantage i can think of is that it allows the decoder (and encoder) to check if the dictionary is the right one. I'm unsure if this is really useful though....&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Takayuki&amp;gt;&amp;nbsp;LZ4 stream may be (pre) loaded on Ready Only Memory.&amp;nbsp;In this case, temporal masking 0 for BC.Checkbits is difficult.&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
Correct. The current proposal is to load the header into RAM in order to perform the masking and checksum there. Is that an issue ?&lt;br /&gt;
Another proposal could be to not check the checkbits for&amp;nbsp;ROM&amp;nbsp;pre-loaded streams, since potential transmission error is nullified for such scenario.&lt;br /&gt;
&lt;br /&gt;&lt;img src="http://feeds.feedburner.com/~r/FastDataCompression/~4/1Vd94NNvQaQ" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://fastcompression.blogspot.com/feeds/3405700522591807243/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://fastcompression.blogspot.com/2013/03/a-streaming-format-for-lz4.html#comment-form" title="22 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/834134852788085492/posts/default/3405700522591807243?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/834134852788085492/posts/default/3405700522591807243?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/FastDataCompression/~3/1Vd94NNvQaQ/a-streaming-format-for-lz4.html" title="A Streaming format for LZ4" /><author><name>Yann Collet</name><uri>https://plus.google.com/117014154967737150730</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://1.bp.blogspot.com/-L4lGvntswLU/UVDjrsRpCyI/AAAAAAAAAj8/-HAJ1STsm3E/s72-c/LZ4+Streaming+format.png" height="72" width="72" /><thr:total>22</thr:total><feedburner:origLink>http://fastcompression.blogspot.com/2013/03/a-streaming-format-for-lz4.html</feedburner:origLink></entry><entry gd:etag="W/&quot;AkMMR3YzfSp7ImA9WhBSF0k.&quot;"><id>tag:blogger.com,1999:blog-834134852788085492.post-8761751514451787226</id><published>2012-12-12T20:16:00.002+01:00</published><updated>2013-02-25T00:28:06.885+01:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2013-02-25T00:28:06.885+01:00</app:edited><title>xxHash : new version</title><content type="html">&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://www.blogger.com/blogger.g?blogID=834134852788085492" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="108" src="http://www.partow.net/images/hash_avalance.png" width="200" /&gt;&lt;/a&gt;&lt;/div&gt;
&amp;nbsp;It's a few monthes since the&lt;a href="http://fastcompression.blogspot.fr/2012/04/selecting-checksum-algorithm.html"&gt; initial release of xxHash&lt;/a&gt;. The algorithm has almost fullfilled its initial objective, which is to provide a Hash function for error detection fast enough to use within LZ4 streaming interface.&lt;br /&gt;
&lt;br /&gt;
Although the "fundamentals" were fine, a few details were still rough on the edges.&amp;nbsp;The most important "missing feature" was the ability to provide input data in several consecutive blocks. When hashing a large file for example, the allocated buffer might not be large enough to store the whole input within a single block.&lt;br /&gt;
&lt;br /&gt;
In order to accomodate this need, a &lt;a href="http://code.google.com/p/xxhash/"&gt;new version of xxHash&lt;/a&gt; has been created, which is BSD licensed.&lt;br /&gt;
&lt;br /&gt;
The way it works is by dividing the job into 3 parts :&lt;br /&gt;
&lt;a href="http://code.google.com/p/xxhash/source/browse/trunk/xxhash.h"&gt;XXH32_init()&lt;/a&gt; creates the context structure in which intermediate results will be stored.&lt;br /&gt;
This structure must be passed as an argument of function&amp;nbsp;&lt;a href="http://code.google.com/p/xxhash/source/browse/trunk/xxhash.h"&gt;XXH32_feed()&lt;/a&gt;, which is used to provide input in several consecutive blocks. Any number of blocks is possible, there is no limit.&lt;br /&gt;
When all data has been provided, it's time to retrieve the result, using&amp;nbsp;&lt;a href="http://code.google.com/p/xxhash/source/browse/trunk/xxhash.h"&gt;XXH32_result()&lt;/a&gt;. This function also takes care of de-allocating the context structure.&lt;br /&gt;
&lt;br /&gt;
A "single pass" function is also provided, both for simplicity (a simple function call is enough) and for performance. The latter is important if the amount of data to hash is small (&amp;lt;100 bytes, also called "small keys"). In this case, since there is no intermediate structure to allocate &amp;amp; maintain, the savings are significant.&lt;br /&gt;
&lt;br /&gt;
To simplify usage and interoperability, there is now a single xxHash version, which is "strong" (meaning it successfully pass all tests from &lt;a href="http://code.google.com/p/smhasher/wiki/SMHasher"&gt;SMHasher test suite&lt;/a&gt;). This is possible because the new version is also &lt;i&gt;&lt;u&gt;faster &lt;/u&gt;&lt;/i&gt;(5.4GB/s on my Core 2 Duo, to be compared with 4.2GB for the older one). The speed difference does no longer justify a "fast" version with lessened distribution guarantee.&lt;br /&gt;
&lt;br /&gt;
The framework is also more extensible, meaning that versions for 64-bits, 128-bits and 256-bits can appear in the future. But for the time being, the focus is really on the 32-bits version. It's designed to be very fast on all kind of 32-bits CPU, including embedded ones (such as ARM), with still the objective to become a companion error checker for LZ4 streaming.&lt;img src="http://feeds.feedburner.com/~r/FastDataCompression/~4/qeY4tiXj9jY" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://fastcompression.blogspot.com/feeds/8761751514451787226/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://fastcompression.blogspot.com/2012/12/xxhash-new-version.html#comment-form" title="1 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/834134852788085492/posts/default/8761751514451787226?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/834134852788085492/posts/default/8761751514451787226?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/FastDataCompression/~3/qeY4tiXj9jY/xxhash-new-version.html" title="xxHash : new version" /><author><name>Yann Collet</name><uri>https://plus.google.com/117014154967737150730</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>1</thr:total><feedburner:origLink>http://fastcompression.blogspot.com/2012/12/xxhash-new-version.html</feedburner:origLink></entry><entry gd:etag="W/&quot;CkcFQ38yeCp7ImA9WhJSEkQ.&quot;"><id>tag:blogger.com,1999:blog-834134852788085492.post-5884216700158511750</id><published>2012-07-03T06:21:00.000+02:00</published><updated>2012-07-03T06:33:32.190+02:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2012-07-03T06:33:32.190+02:00</app:edited><title>Log file compression</title><content type="html">&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://www.blackbeltcoder.com/Articles/files/a-convenient-logfile-class/LogFileSample.jpg" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="167" src="http://www.blackbeltcoder.com/Articles/files/a-convenient-logfile-class/LogFileSample.jpg" width="200" /&gt;&lt;/a&gt;&lt;/div&gt;
&amp;nbsp;Although i'm currently on holliday, with limited access to the world wide web,&lt;br /&gt;
i would like to link here an &lt;a href="https://groups.google.com/d/msg/lz4c/DcN5SgFywwk/AVMOPri0O3gJ" target="_blank"&gt;interesting contribution from Vitor Oliveira&lt;/a&gt;, an LZ4 user,&amp;nbsp;which seems to have found some pretty impressive results for log file compression&amp;nbsp;by using a simple technique :&lt;br /&gt;
multi-pass compression.&lt;br /&gt;
&lt;br /&gt;
More specifically, his method, which involves several pass of &lt;a href="http://fastcompression.blogspot.fr/p/lz4.html" target="_blank"&gt;LZ4 &lt;/a&gt;algorithms, seems to produce compressed file which are several times smaller than &lt;a href="http://zlib.net/" target="_blank"&gt;zlib&lt;/a&gt;, while requiring&amp;nbsp;only&amp;nbsp;a fraction of the computation cost.&lt;br /&gt;
&lt;br /&gt;
Some preliminary results :&lt;br /&gt;
zlib (one pass) : 54 MB, 265ms&lt;br /&gt;
LZ4 (one pass) : 56 MB, 6ms&lt;br /&gt;
LZ4 (multi-pass) : 4 MB, 16 ms&lt;br /&gt;
&lt;br /&gt;
Since log file compression is a relatively common scenario, i figure this was interesting to share :&lt;br /&gt;
&lt;a href="https://groups.google.com/d/msg/lz4c/DcN5SgFywwk/AVMOPri0O3gJ"&gt;https://groups.google.com/d/msg/lz4c/DcN5SgFywwk/AVMOPri0O3gJ&lt;/a&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;&lt;img src="http://feeds.feedburner.com/~r/FastDataCompression/~4/3QTD5LVGSW4" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://fastcompression.blogspot.com/feeds/5884216700158511750/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://fastcompression.blogspot.com/2012/07/log-file-compression.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/834134852788085492/posts/default/5884216700158511750?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/834134852788085492/posts/default/5884216700158511750?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/FastDataCompression/~3/3QTD5LVGSW4/log-file-compression.html" title="Log file compression" /><author><name>Yann Collet</name><uri>https://plus.google.com/117014154967737150730</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>0</thr:total><feedburner:origLink>http://fastcompression.blogspot.com/2012/07/log-file-compression.html</feedburner:origLink></entry><entry gd:etag="W/&quot;DkEMQ3s_fip7ImA9WhVbFE0.&quot;"><id>tag:blogger.com,1999:blog-834134852788085492.post-4630996975042771853</id><published>2012-05-30T21:04:00.000+02:00</published><updated>2012-05-30T21:04:42.546+02:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2012-05-30T21:04:42.546+02:00</app:edited><title>Compressed data transmission</title><content type="html">&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://www.maxnet.co.nz/uploads/images/products/wideareanetwork_1.jpg" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="135" src="http://www.maxnet.co.nz/uploads/images/products/wideareanetwork_1.jpg" width="200" /&gt;&lt;/a&gt;&lt;/div&gt;
&amp;nbsp;If there is a situation where data is inherently&amp;nbsp;short-lived, it is communication. Data starts its live on the sender side, and end it on the receiving side, a few milliseconds later.&lt;br /&gt;
&lt;br /&gt;
Or does it ? Sometimes, data comes from a file into a local storage, or can be stored at the receiving side. In such case, data is merely "traveling", but is not "short-lived".&lt;br /&gt;
&lt;br /&gt;
Does it make any difference ? In fact, yes, it does.&lt;br /&gt;
&lt;br /&gt;
When it comes to sending a file content, this data can be "prepared" in advance. Which means it can be compressed ahead of sending it. Very strong (asymmetric) algorithms can be used for compression, as long as decoding remains "fast enough" to cope with data speed. This leads to major bandwidth reduction, and therefore improve cost and perceived transmission speed.&lt;br /&gt;
&lt;br /&gt;
When it comes to sending "short-lived" data, it means this data did not exist before being produced, and the sole purpose of this data existence is to be sent, and (generally) consumed on receiving end. There is no way to "prepare" such data in advance, it must be compressed "on the fly", which means "fast".&lt;br /&gt;
&lt;br /&gt;
But there is another terrible side effect : compression performance primarily comes from its capacity to "learn patterns", and re-apply them in an optimal way. Which means, for compression to be effective, a minimum of "historic data" must have already been processed for the next data to be properly compressed. With a file content, the history can be the entire file itself, which could mean a lot of megabytes, and therefore excellent perspectives for compression.&lt;br /&gt;
The situation is much more severe when data is generated and compressed "on the fly" : maybe the message to be sent is only a few tens of bytes long. How to compress such a thing ?&lt;br /&gt;
&lt;br /&gt;
Let's study this use case.&lt;br /&gt;
A first "naive" implementation would simply create a message, make a packet out of it, compress it and then send it.&lt;br /&gt;
This implementation is unlikely to bring tangible benefits, since IP packets are typically small, trying to match &lt;a href="http://en.wikipedia.org/wiki/Maximum_transmission_unit"&gt;MTU &lt;/a&gt;in order to avoid &lt;a href="http://en.wikipedia.org/wiki/IP_fragmentation"&gt;fragmentation &lt;/a&gt;side-effects.&lt;br /&gt;
&lt;br /&gt;
A second, more compression-friendly, implementation, could try to amass enough information before starting to compress it, and then send the compressed data using as many packets as necessary.&lt;br /&gt;
This will certainly bring better compression performance, but introduces another problem, &lt;a href="http://en.wikipedia.org/wiki/Latency_(engineering)"&gt;latency&lt;/a&gt;. Waiting for "enough data to be compressed" can lead to unacceptable delays.&lt;br /&gt;
For example, in real-time games, player command must be sent basically a.s.a.p.&lt;br /&gt;
As another use case, some systems may generate little data (a temperature probe for example), separated by long cycle&amp;nbsp;duration.&lt;br /&gt;
Therefore, waiting for "enough data" is not a valid strategy in such circumstances.&lt;br /&gt;
&lt;br /&gt;
A third, more complex, strategy, would use all past transmitted&amp;nbsp;data&amp;nbsp;as a kind of "dictionary", to help compress the next packet to come.&lt;br /&gt;
This basically requires the dictionary to remain "synchronized" at both end, sender and receiver. This is achievable in an "ideal" environment (no loss, no error), which is quite common&amp;nbsp;in fact&amp;nbsp;when using &lt;a href="http://en.wikipedia.org/wiki/Transmission_Control_Protocol"&gt;TCP &lt;/a&gt;transmission.&lt;br /&gt;
&lt;br /&gt;
So, to sum up, we have some freshly generated data to send, of any size but typically small (a few hundreds of bytes), and we want to use all previously transmitted data as dictionary to improve compression, which requires some kind of "memory" at both sender and receiver end.&lt;br /&gt;
This looks possible.&lt;br /&gt;
In fact, this is a direct usage of &lt;a href="http://fastcompression.blogspot.fr/2012/05/members-properties.html"&gt;"variable block sizes" concept&lt;/a&gt; which i expressly ruled out as "not useful" in an earlier blog note :). Now seems a good time to consider it again...&lt;br /&gt;
&lt;br /&gt;
Such implementation would however require some new functions, able to re-use and save some "history data", instead of starting from freshly clean tables. This will require quite some work to achieve.&lt;br /&gt;
&lt;br /&gt;
As a side effect of such methodology, it also means that such compressed packet are not compatible with &lt;a href="http://en.wikipedia.org/wiki/Stateless_protocol"&gt;stateless &lt;/a&gt;protocols : since they depend on previously sent data, they are inherently stateful. But so are TCP sessions anyway...&lt;img src="http://feeds.feedburner.com/~r/FastDataCompression/~4/DHyniWisTmE" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://fastcompression.blogspot.com/feeds/4630996975042771853/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://fastcompression.blogspot.com/2012/05/compressed-data-transmission.html#comment-form" title="7 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/834134852788085492/posts/default/4630996975042771853?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/834134852788085492/posts/default/4630996975042771853?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/FastDataCompression/~3/DHyniWisTmE/compressed-data-transmission.html" title="Compressed data transmission" /><author><name>Yann Collet</name><uri>https://plus.google.com/117014154967737150730</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>7</thr:total><feedburner:origLink>http://fastcompression.blogspot.com/2012/05/compressed-data-transmission.html</feedburner:origLink></entry><entry gd:etag="W/&quot;CkQASH86eip7ImA9WhVbEkk.&quot;"><id>tag:blogger.com,1999:blog-834134852788085492.post-6045689488120221401</id><published>2012-05-28T23:20:00.001+02:00</published><updated>2012-05-28T23:25:49.112+02:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2012-05-28T23:25:49.112+02:00</app:edited><title>Members properties</title><content type="html">&lt;a href="http://www.sqs.com/en/sa/_images/6_2_SQS-PractiQ-EED.jpg" imageanchor="1" style="clear: left; display: inline !important; float: left; margin-bottom: 1em; margin-right: 1em; text-align: center;"&gt;&lt;img border="0" height="65" src="http://www.sqs.com/en/sa/_images/6_2_SQS-PractiQ-EED.jpg" width="200" /&gt;&lt;/a&gt;&lt;br /&gt;
&amp;nbsp;After spending some time on&amp;nbsp;expected&amp;nbsp;properties at streaming level, let's now get to the core of the objective, regarding the compressed data parameters.&lt;br /&gt;
&lt;br /&gt;
As &lt;a href="http://fastcompression.blogspot.fr/2012/05/useful-compressed-streaming-properties.html"&gt;stated previously&lt;/a&gt;, a compressed stream consists of several members, the most important ones being compressed data sets. Each member starts with a header, in order to identify its content. And each header starts with a magic number, a kind of 'ID tag'.&lt;br /&gt;
&lt;br /&gt;
We'll focus here on "LZ4 compressed data set". The stream design above allows adding any future compression algorithm at a later stage.&lt;br /&gt;
&lt;br /&gt;
And let's take as an example the framing format defined by&lt;a href="http://code.google.com/p/lz4/source/browse/trunk/lz4demo.c?spec=svn66&amp;amp;r=66"&gt; lz4demo&lt;/a&gt;.&lt;br /&gt;
&lt;br /&gt;
1) There is a magic number, which is&amp;nbsp;&lt;span style="background-color: white; color: #006666; font-family: Monaco, 'DejaVu Sans Mono', 'Bitstream Vera Sans Mono', 'Lucida Console', monospace; font-size: 13px; white-space: pre-wrap;"&gt;0x184C2102,&lt;/span&gt;in little endian format.&lt;br /&gt;
2) There are no explicit parameters. In fact, all parameters are implicit.&lt;br /&gt;
They are :&lt;br /&gt;
- The compressed data set is cut into blocks of 8MB&lt;br /&gt;
- Each block starts with a field giving its size (therefore, the compressed size)&lt;br /&gt;
- Blocks are&amp;nbsp;independent&lt;br /&gt;
- The original data size is not stored. It will be known on decoding completion&lt;br /&gt;
- There is no checksum&lt;br /&gt;
&lt;br /&gt;
Well, even with such limitations, the format nonetheless works perfectly fine. It's just a little too restricted to become a "generic format", and therefore, the objective of the specification is to provide more room for parameters selections.&lt;br /&gt;
&lt;br /&gt;
We have already established in previous blog posts that allowing&amp;nbsp;&lt;a href="http://fastcompression.blogspot.fr/2012/04/detecting-errors.html"&gt;checksum for Error detection&lt;/a&gt; is an important selectable feature.&lt;br /&gt;
Another important one is the ability to &lt;a href="http://fastcompression.blogspot.fr/2012/05/memory-buffer-management.html"&gt;select block size&lt;/a&gt;, since they directly control the amount of memory buffers necessary at decoding side.&lt;br /&gt;
&lt;br /&gt;
Let's now study and establish potential needs for a few other properties :&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;Source data size&lt;/b&gt;&lt;br /&gt;The original size of source data is not an absolute necessity : it's always possible to decode without it,&amp;nbsp;&lt;i&gt;as long as buffer sizes&lt;/i&gt;&amp;nbsp;are properly described.&lt;br /&gt;&lt;br /&gt;But it is nonetheless useful. For example, thanks to this information, the number of blocks within the current member can be calculated beforehand. Moreover the amount of data to decode from the last block is known.&lt;br /&gt;Or, if there is a single block, the exact amount of memory can be allocated, instead of the block maximum size.&lt;br /&gt;It is also useful to display the processing position (yep, we decoded 80MB, but does that represent 10% or 90% of the stream to decode ?)&lt;br /&gt;&lt;br /&gt;However, there are also circumstances in which this data is not known. For example, if the input was piped to the compressing process, then the size will be known only on hitting its end. This might be too late to "retrofit" the output.&lt;br /&gt;Another situation is when several compressed data sets are appended into a single stream : then the "source data size" field only applies to the current data set, but the total size is not known.&lt;br /&gt;&lt;br /&gt;Therefore, since it is useful but not compulsory, this information shall be present, but as an option only.&lt;/li&gt;
&lt;/ul&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;Uncompressed blocks&lt;/b&gt;&lt;br /&gt;A simple but important feature, since it avoids the bandwidth overhead and CPU consumption of the compression format when it's useless.&lt;br /&gt;This could be done very cheaply, by accepting that, if the size of the compressed block is the same as the defined one, then it's necessarily uncompressed.&lt;br /&gt;&lt;br /&gt;This suggestion looks simple enough for most blocks, except for the last one, which size is unknown (but capped). &lt;br /&gt;Therefore, it would be necessary to know the size of the last block to compare it to the compressed one, and therefore determine if the block is compressed or not.&lt;br /&gt;&lt;br /&gt;Another idea would be : let's give up this complexity, the last block is always compressed, even if compression is either useless or detrimental.&lt;br /&gt;Actually, it's not a good idea to "not handle the last block", since there is a&amp;nbsp;disastrous&amp;nbsp;corner case : supposed that the compressed size of the last block is exactly the size of an uncompressed full block : then the decoding will assume it's uncompressed, leading to data corruption.&lt;br /&gt;&lt;br /&gt;This corner case can be avoided by enforcing a simple rule : a compressed block is necessary smaller than original size. Therefore, as the last block has a size &amp;lt;= block size, its compressed size is necessarily &amp;lt; block size. Hence, if the size of this last block is the maximum size, then we are in the specific but valid corner case where the last block size is exactly the maximum size of a block, and is not compressible.&lt;br /&gt;&lt;br /&gt;OK, enough of corner cases, let's now be in the normal situation where the last block size is a fraction of the maximum block size. How could we know it is uncompressed ?&lt;br /&gt;&lt;br /&gt;This problem could be mitigated by inserting an information to know that we are dealing with the last block. For example, knowing the original size of the source data is enough for this need.&lt;br /&gt;&lt;br /&gt;But it's not always available. As said previously, this is just an option, since in some streaming mode, this information is unknown. Therefore we need another indicator.&lt;br /&gt;&lt;br /&gt;It could be something as simple as a bit, which simply tells that there is another block to follow, and as a consequence, the current block is full sized. As a bonus, this mechanism also protects against "silent block truncation" (when the compressed stream is cut exactly at the border between 2 blocks).&lt;br /&gt;On reaching the last block, we need another piece of information, either the uncompressed size of the block, or if the block is compressed. The latter seems more compact.&lt;/li&gt;
&lt;/ul&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;Zero-filled blocks&lt;/b&gt;&lt;br /&gt;This idea was actually proposed by&amp;nbsp;&lt;a href="http://cbloomrants.blogspot.fr/"&gt;Charles Bloom&lt;/a&gt;&amp;nbsp;: it's not rare, for a section of input data, to be filled with zeros.&lt;br /&gt;The idea would be to "mark" such blocks with a specific prefix, such as "0".&lt;br /&gt;For such situation to have reasonable chances to happen, the block size must be small enough. For example, this will probably almost never happen with lz4demo block size (8MB), while this is going to be much more frequent with very small blocks, such as 4KB ones.&lt;/li&gt;
&lt;/ul&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;Error correction&lt;/b&gt;&lt;br /&gt;While error detection has been much talked about, nothing has been said up to now about error correction.&lt;br /&gt;That's both because this feature is much more complex to provide and of questionable usefulness.&lt;br /&gt;&lt;br /&gt;Error correction is mostly useful in situations when there is no way to "resend" erroneous data. This applies to real-time&amp;nbsp;codec&amp;nbsp;(such as voice or video) and stored archive.&lt;br /&gt;The difficulty in both cases is that erroneous data tends to be "bursty". For example, when a storage sector fails, we don't lose just a few bytes, but an entire sector size, typically 4KB. Same for transmission, where the most common error is a missing packet.&lt;br /&gt;Dealing with large burst of errors requires some specific techniques, which unfortunately cost much&amp;nbsp;processing power and memory. As a consequence, the CPU and memory budget for error correction is way beyond LZ4 one, which makes the association a questionable choice.&lt;br /&gt;&lt;br /&gt;Therefore, it seems this feature is not expected to be "generic enough" to reserve a place into the generic framing format specification. Obviously, forking is always possible, and even recommended, to support specific features.&lt;/li&gt;
&lt;/ul&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;&lt;b&gt;Allow multi-threaded compression and decompression&lt;/b&gt;&lt;span style="font-weight: normal;"&gt;&lt;br /&gt;Multi-threaded compression is easily achievable thanks to the division of input data into "blocks".&lt;/span&gt;&lt;br style="font-weight: normal;" /&gt;&lt;span style="font-weight: normal;"&gt;Multi-threaded decoding is also possible if those blocks are "independent".&lt;/span&gt;&lt;br style="font-weight: normal;" /&gt;&lt;span style="font-weight: normal;"&gt;Both mode shall be possible, and selectable&lt;/span&gt;&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;Variable block sizes&lt;/b&gt;&lt;br /&gt;This one is tricky : up to now, we have been talking about "fixed size" blocks only, with only the last block of a compressed data set having an unknown (but capped) size.&lt;br /&gt;The idea here would be to&amp;nbsp;authorize&amp;nbsp;blocks of arbitrary size, instead of fixed ones.&lt;br /&gt;&lt;br /&gt;The benefits are two-fold :&lt;/li&gt;
&lt;ul&gt;
&lt;li&gt;Separate data on "natural boundaries", in order to improve compression ratio and speed&lt;/li&gt;
&lt;li&gt;Allow data insertion of any size&lt;br /&gt;&lt;br /&gt;The first point is simple to argue with : such benefit only occurs with very-high ratio (and slow) compression algorithms, such as CM, which "learn" the data behavior through statistics. There is no tangible benefit in trying to do the same for LZ4.&lt;br /&gt;&lt;br /&gt;The second benefit is more interesting, since it&amp;nbsp;authorizes some flexibility in archive management.&lt;br /&gt;Actually, this is mitigated by the possibility to concatenate compressed data sets (or "members") together in a single stream or file.&lt;br /&gt;Inserting data could therefore be&amp;nbsp;realized&amp;nbsp;by cutting the initial member into 2 parts, inserting the new member, and&amp;nbsp;concatenating&amp;nbsp;the 3 members together.&lt;br /&gt;As a consequence, it seems the format already supports such scenario, without needing variable block sizes.&lt;/li&gt;
&lt;/ul&gt;
&lt;/ul&gt;
&lt;ul&gt;&lt;br class="Apple-interchange-newline" /&gt;&lt;/ul&gt;
&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;Partial extraction and Quick Jump Table&lt;/b&gt;&lt;br /&gt;Another potential idea is that, within a member, one could need to only extract a specific portion.&lt;br /&gt;It's always possible to decode the full data set and get to the required location, but sometimes this might be overkill. For example, one may need a few MB of data which happen to be a few GB away from the starting point.&lt;br /&gt;&lt;br /&gt;However, the idea to decode just the necessary part introduces a few restrictions :&lt;/li&gt;
&lt;ul&gt;
&lt;li&gt;First, the input media should be seekable. It makes little sense to partially decode a piped streams, since the decoding process is likely faster than the pipe itself.&lt;/li&gt;
&lt;li&gt;Second, the compressed data shall be cut into&amp;nbsp;independent&amp;nbsp;blocks. Otherwise, it would be necessary to decode, and therefore read, all previous blocks&lt;/li&gt;
&lt;li&gt;Third, to avoid to decode "too much data", the blocks shall be small enough, with corresponding impact on compression ratio (the smaller the block, the lower the compression ratio).&lt;/li&gt;
&lt;li&gt;Fourth, since the i/o process is likely slower than LZ4 decoding, there is a benefit only if it is possible to quick-jump to the right location immediately.&lt;br /&gt;This can be achieved thanks to a table at the beginning of the compressed file. Such a table can only be filled after compression, and therefore is incompatible with non-seekable output.&lt;/li&gt;
&lt;li&gt;Fifth, such "table" mechanism at member level would be useless in members appending scenarios.&lt;br /&gt;&lt;br /&gt;These are quite many restrictions, for the benefit of a hardly-requested feature.&lt;br /&gt;So probably this capability shall be left to a dedicated framing format.&lt;br /&gt;Moreover, should the input stream be seekable, it's still possible to "hop" over blocks without reading/decoding them. This is still slower than a direct jump, but still a sensible potential speed improvement.&lt;b&gt;&lt;br /&gt;&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/ul&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;Error detection algorithm&lt;/b&gt;&lt;br /&gt;As a quick follow up of&amp;nbsp;&lt;a href="http://fastcompression.blogspot.fr/2012/04/selecting-checksum-algorithm.html"&gt;selecting-checksum-algorithm&lt;/a&gt;, one could note that i had not specified a&amp;nbsp;preferred&amp;nbsp;checksum algorithm, only a preferred checksum size (32-bits).&lt;br /&gt;Although at this stage i'm somewhat inclined to use&amp;nbsp;&lt;a href="http://code.google.com/p/xxhash/"&gt;xxhash-strong&lt;/a&gt;, due to its speed and very good distribution property, there is still a chance that the algorithm might be found unsuitable at a later stage. Therefore, some provision should be left to allow another algorithm to take over later on if need be.&lt;br /&gt;&lt;br /&gt;Pushing the idea a bit further, one could think "let the user select its own checksum algorithm". While the idea may sound nice, it goes against the principle of interoperability, which is exactly what this framing format tries to achieve. Therefore, only clearly defined checksum algorithms shall be allowed.&lt;/li&gt;
&lt;/ul&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
I believe this post went through most foreseeable requirements for the LZ4 framing format.&lt;br /&gt;
So now seems a reasonable time to start a header specification.&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/FastDataCompression/~4/zsP7o9Xz-ns" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://fastcompression.blogspot.com/feeds/6045689488120221401/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://fastcompression.blogspot.com/2012/05/members-properties.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/834134852788085492/posts/default/6045689488120221401?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/834134852788085492/posts/default/6045689488120221401?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/FastDataCompression/~3/zsP7o9Xz-ns/members-properties.html" title="Members properties" /><author><name>Yann Collet</name><uri>https://plus.google.com/117014154967737150730</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>0</thr:total><feedburner:origLink>http://fastcompression.blogspot.com/2012/05/members-properties.html</feedburner:origLink></entry><entry gd:etag="W/&quot;D0MBR348fCp7ImA9WhVUGUk.&quot;"><id>tag:blogger.com,1999:blog-834134852788085492.post-6855021350965638844</id><published>2012-05-25T13:27:00.000+02:00</published><updated>2012-05-25T13:30:56.074+02:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2012-05-25T13:30:56.074+02:00</app:edited><title>Useful compressed streaming properties</title><content type="html">&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://www.sqs.com/en/sa/_images/6_2_SQS-PractiQ-EED.jpg" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="65" src="http://www.sqs.com/en/sa/_images/6_2_SQS-PractiQ-EED.jpg" width="200" /&gt;&lt;/a&gt;&lt;/div&gt;
&amp;nbsp;The previous articles were primarily targeted at&amp;nbsp;&lt;a href="http://fastcompression.blogspot.fr/2012/04/detecting-errors.html"&gt;error detection&lt;/a&gt; and &lt;a href="http://fastcompression.blogspot.fr/2012/05/memory-buffer-management.html"&gt;memory buffer management&lt;/a&gt;, which&amp;nbsp;are, arguably, very important features. We'll continue digging into the properties which seem necessary to build a suitably universal compressed streaming format.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;Compressed data appending&lt;/b&gt;&lt;br /&gt;There are some use cases in which newly created compressed&amp;nbsp;data is simply appended to an existing file or stream. Some time later, the file will be decoded and/or filtered, searching for data of potential interest.&lt;br /&gt;The decoder must not choke with such input. It shall simply continue the decoding, dealing with each compressed data one after another.&lt;br /&gt;&lt;br /&gt;This feature is relatively simple to support : it is merely necessary to&amp;nbsp;&lt;i&gt;not assume&amp;nbsp;&lt;/i&gt;that, after the current compressed stream, an EOF marker will necessarily happen.&lt;br /&gt;&lt;br /&gt;The change is simple, but it also means that a single stream may host several compressed data sets. This situation is reminiscent of the well specified and documented &lt;a href="http://www.ietf.org/rfc/rfc1952.txt"&gt;gzip RFC 1952&lt;/a&gt;&amp;nbsp;:&lt;br /&gt;"&lt;i&gt;A gzip file consists of a series of "members" (compressed data sets).&lt;/i&gt;"&lt;br /&gt;This is a good fit for this situation, so we'll use the same naming convention.&lt;br /&gt;&lt;br /&gt;&lt;/li&gt;
&lt;li&gt;&lt;b&gt;Authorized members&lt;/b&gt;&lt;br /&gt;A typical compressed stream will have a single member.&lt;br /&gt;The member will, necessarily, start with a header, which allows its screening as a "valid" or "invalid" input.&lt;br /&gt;A simple way to achieve this selection is to start with a "magic number", which then is compared to a list. If it's not in the list, input is rejected as "invalid".&lt;br /&gt;The "magic number" is enough to tell what kind of decoder is necessary, and which header parameters follow.&lt;br /&gt;To ensure proper screening from invalid (noisy) input, the magic number shall be large enough for most of its values to be considered "invalid", thus reducing the perspective of "false positive" to negligible levels.&lt;br /&gt;&lt;br /&gt;There is however a specific category that could be defined before-hand : skippable members.&lt;br /&gt;&lt;br /&gt;Skippable members do not contain compressed data. Instead, they may, for example, contain user comments, or a description of the compressed content (for example, file properties), or an electronic signature, &amp;nbsp;it is even possible to have a program suitable for a certain environment, well, whatever.&lt;br /&gt;The point is that in some circumstances, these members are not strictly necessary, and may be skipped entirely to retrieve just the compressed content.&lt;br /&gt;&lt;br /&gt;Skipping such member requires, obviously, to know its size.&lt;br /&gt;Therefore, the specification shall allow :&amp;nbsp;&lt;/li&gt;
&lt;ul&gt;
&lt;li&gt;A range of authorized "magic number values", known as skippable members&lt;/li&gt;
&lt;li&gt;A mandatory "member size" field&lt;/li&gt;
&lt;li&gt;Beyond that, any data in the member is "user-specified"&lt;br /&gt;&lt;br /&gt;Any magic number which fall outside of the list of authorized member is considered "invalid", and stop the decoding process, flagging the input as "corrupted".&lt;br /&gt;&lt;br /&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/ul&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;Clear endian convention&lt;/b&gt;&lt;br /&gt;Some CPU read data using &lt;a href="http://en.wikipedia.org/wiki/Endianness"&gt;Little Endian&lt;/a&gt; convention, others use &lt;a href="http://en.wikipedia.org/wiki/Endianness"&gt;Big Endian&lt;/a&gt; convention.&lt;br /&gt;Other conventions exist too (such as PDP endian), but it's fair to say they are rare.&lt;br /&gt;&lt;br /&gt;Anyway, the point is that, to ensure all these platforms can exchange data between themselves, a single convention shall be selected and enforced&lt;br /&gt;&lt;br /&gt;In my view, Little Endian is the most common convention these days, with x86/x64 on one side and ARM on the other side. Together, they almost "own" the CPU market, TV game console being the main noticeable market outside of their reach.&lt;/li&gt;
&lt;/ul&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;Compatibility&amp;nbsp;with non-seekable streams&lt;/b&gt;&lt;br /&gt;Being compatible with "streaming mode" explicitly excludes&amp;nbsp;dependency&amp;nbsp;on "seekable" property.&lt;br /&gt;&lt;br /&gt;This use case targets pipes, in either input or output mode, or both.&lt;br /&gt;Piped input data is provided "on the fly" by a generating process, such as &lt;a href="http://en.wikipedia.org/wiki/Tar_%28file_format%29"&gt;tar &lt;/a&gt;for example. There is no way to "know its size". It will be discovered on hitting the end of the stream.&lt;br /&gt;Piped output data is immediately consumed by a user process. There is no way to "come back" to an earlier position and correct or add an information there : every new information must necessarily be appended.&lt;br /&gt;&lt;br /&gt;When both input and output are piped, it creates a set of restrictions.&lt;br /&gt;&lt;br /&gt;Hopefully, it is not so difficult to overcome.&lt;br /&gt;First, a stream doesn't work "byte by byte". It is in fact a succession of memory buffers. One of them is the compression buffer.&lt;br /&gt;Knowing the size of input source data is useful in one case : when its smaller than the default block size. This is fine, since in this case, all input can be "loaded" into the compression buffer, and then compressed. Since we then write the result to the output buffer, it's possible to write there the size of input. During decoding, this size will allow allocation of just the necessary amount of memory, instead of the full block-size.&lt;br /&gt;This is a useful but small benefit : we could have allocated the buffer at its maximum "block" size instead.&lt;br /&gt;&lt;br /&gt;When input is larger than a single block (=buffer size), we have a problem, and we can't "write" the size of the input, since we don't know it yet.&lt;br /&gt;Well, it's not that much of a problem. LZ4 doesn't "need" this size to decode. And the buffer is already being allocated to its maximum block size anyway.&lt;br /&gt;&lt;br /&gt;In other words, knowing the uncompressed source size is not mandatory. We can live without.&lt;/li&gt;
&lt;/ul&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;Enforce data alignment&lt;/b&gt;&lt;br /&gt;LZ4 does not require any data alignment, so it does not look necessary.&lt;/li&gt;
&lt;/ul&gt;
&lt;div&gt;
&lt;br /&gt;
The next article will look more closely at "members" properties.&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/FastDataCompression/~4/CeZ1-QItlpE" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://fastcompression.blogspot.com/feeds/6855021350965638844/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://fastcompression.blogspot.com/2012/05/useful-compressed-streaming-properties.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/834134852788085492/posts/default/6855021350965638844?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/834134852788085492/posts/default/6855021350965638844?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/FastDataCompression/~3/CeZ1-QItlpE/useful-compressed-streaming-properties.html" title="Useful compressed streaming properties" /><author><name>Yann Collet</name><uri>https://plus.google.com/117014154967737150730</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>0</thr:total><feedburner:origLink>http://fastcompression.blogspot.com/2012/05/useful-compressed-streaming-properties.html</feedburner:origLink></entry><entry gd:etag="W/&quot;DUEBQnsyfSp7ImA9WhVUEUQ.&quot;"><id>tag:blogger.com,1999:blog-834134852788085492.post-8820807341892276180</id><published>2012-05-15T23:26:00.000+02:00</published><updated>2012-05-16T21:47:33.595+02:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2012-05-16T21:47:33.595+02:00</app:edited><title>Dealing with blocks side-effects</title><content type="html">&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://www.record-producer.com/i2/latency.jpg" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="112" src="http://www.record-producer.com/i2/latency.jpg" width="200" /&gt;&lt;/a&gt;&lt;/div&gt;
&amp;nbsp;Continuing on the &lt;a href="http://fastcompression.blogspot.fr/2012/05/memory-buffer-management.html"&gt;previous post analysis&lt;/a&gt; of the &lt;a href="http://code.google.com/p/lz4/source/browse/trunk/lz4demo.c?spec=svn66&amp;amp;r=66"&gt;lz4demo&lt;/a&gt;'s current framing format, another side-effect created by this simple format is "&lt;a href="http://en.wikipedia.org/wiki/Latency_%28engineering%29"&gt;latency&lt;/a&gt;".&lt;br /&gt;
&lt;br /&gt;
Since the fixed block size is 8MB, the codec must wait for the first block 
to be completely filled before starting any decoding or compressing operation. It effectively defers processing by a few microseconds.&lt;br /&gt;
&lt;br /&gt;
This&amp;nbsp;issue may not&amp;nbsp;seem large, especially if underlying I/O is fast. Nonetheless, not all I/O are fast, and even in such cases, an 8MB "starting time" is bound to be measurably worse than a 256KB one for example.&lt;br /&gt;
&lt;br /&gt;
As a consequence, a framing format with a smaller block size would offer better and smoother processing throughput.&lt;br /&gt;
&lt;br /&gt;
Which
 leads us to a last and important issue :&amp;nbsp;independent&amp;nbsp;blocks.&lt;br /&gt;
While 
this strategy is good for simplicity and multi-threading, it's bad for 
compression : it translates into a worsened compression ratio on the first 
64KB of each block.&lt;br /&gt;
&lt;br /&gt;
With block sizes of 8MB, this effect is 
negligible (affecting compression ratio by less than 0.1%). However, the 
smaller the block size, &lt;a href="http://fastcompression.blogspot.fr/2011/01/multi-threaded-compression.html"&gt;the worse the side-effect&lt;/a&gt;. With small block sizes, this effect can no longer be neglected.&lt;br /&gt;
&lt;br /&gt;
Therefore, should the blocks remain independent&amp;nbsp;?&lt;br /&gt;
&lt;br /&gt;
Indeed. By
 making the next block depending on the previous one, 
it nullifies the problem of worsened compression ratio. But it
 also makes it impossible to decode a compressed block independently, 
with negative consequences on multi-threading and partial decoding 
capabilities.&lt;br /&gt;
Is that really an issue ? Well, it really depends on the use case.&lt;br /&gt;
&lt;br /&gt;
In many circumstances, 
such as simple file compression or stream compression, it does not matter, 
since data is encoded and decoded sequentially anyway. Throwing away the
 capability to multi-thread decompression seems bad, but in fact, most of the time, I/O is unable to 
cope with LZ4 decoding speed (around 1GB/s). So a single decoding thread is enough to handle almost any I/O load.&lt;br /&gt;
Since there is little need for partial decoding, nor for 
multithreaded decoding, the compression ratio gain looks more useful.&lt;br /&gt;
&lt;br /&gt;
There is just a little remaining problem :&lt;br /&gt;
While the decoding function will need few adaptation to handle this new use case, most of the complexity being located into the buffer manager, the compression function on the other hand has to be adapted.&lt;br /&gt;
&lt;br /&gt;
While each block were independant, compression could start with a pristine clean reference table.&lt;br /&gt;
But
 with sequentially dependant blocks, the&amp;nbsp;initialization&amp;nbsp;becomes more 
complex : the previous 64K needs to be copied in front of the next 
block, and then loaded/hashed into the reference table, before starting compression. It obviously costs CPU and time.&lt;br /&gt;
A variant is to just "translate" the references already loaded into 
the table as a result of compressing the previous block, but it's limited to "single thread" scenario.&lt;br /&gt;
&lt;br /&gt;
OK, but now that we can reference data from previous blocks, how far should we go ? The natural maximum distance is the "copy window size". This size is, by default, 64KB for LZ4. But it could happen that the receiving end of the compressed stream has not enough memory to store that much data. In such case, the compressor must be careful in not using references beyond the memory capacity of the receiver. In other words, it must&amp;nbsp;deliberately&amp;nbsp;discard long-distance copy operations.&lt;br /&gt;
&lt;br /&gt;
Should such a use case be part of the generic framing format or not ?&lt;br /&gt;
My answer would be : it's okay, as long as an easy solution can be found.&lt;br /&gt;
&lt;br /&gt;
How could that work ? Once again, let's focus on the decoder side.&lt;br /&gt;
I'll imagine a controller with only 4K memory available as buffer.&lt;br /&gt;
A simple way to handle such case is by dividing this space into 2 parts : 2K for the "previous block", and 2K for the "block to decode". So we end up with :&lt;br /&gt;
- Block size = 2K = memory limit / 2&lt;br /&gt;
- Maximum copy reference = lower limit of previous block&lt;br /&gt;
&lt;br /&gt;
Obviously, there are other possibilities, such as cutting data into even small parts (for example 1K blocks) and having a back reference of up to 3 previous blocks. But as a first approximation, it seems these variant will provide almost equivalent results while being more complex.&lt;br /&gt;
&lt;br /&gt;
This situation can be&amp;nbsp;summarized&amp;nbsp;with a simple rule : never reference data beyond one block distance.&lt;br /&gt;
&lt;br /&gt;
With only this simple rule in place, it seems the default LZ4 framing format could be made compatible even with environments with severely limited RAM, provided the encoder selects a suitable block size.&lt;br /&gt;
&lt;br /&gt;&lt;img src="http://feeds.feedburner.com/~r/FastDataCompression/~4/vo1_4IZM-ME" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://fastcompression.blogspot.com/feeds/8820807341892276180/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://fastcompression.blogspot.com/2012/05/dealing-with-blocks-side-effects.html#comment-form" title="2 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/834134852788085492/posts/default/8820807341892276180?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/834134852788085492/posts/default/8820807341892276180?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/FastDataCompression/~3/vo1_4IZM-ME/dealing-with-blocks-side-effects.html" title="Dealing with blocks side-effects" /><author><name>Yann Collet</name><uri>https://plus.google.com/117014154967737150730</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>2</thr:total><feedburner:origLink>http://fastcompression.blogspot.com/2012/05/dealing-with-blocks-side-effects.html</feedburner:origLink></entry><entry gd:etag="W/&quot;DUEMSXY5eyp7ImA9WhVUEE0.&quot;"><id>tag:blogger.com,1999:blog-834134852788085492.post-2468128674300593585</id><published>2012-05-14T16:00:00.001+02:00</published><updated>2012-05-14T17:01:28.823+02:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2012-05-14T17:01:28.823+02:00</app:edited><title>Memory buffer management</title><content type="html">&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://www.yaldex.com/java_tutorial/images/chap1016-01_0.jpg" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="174" src="http://www.yaldex.com/java_tutorial/images/chap1016-01_0.jpg" width="200" /&gt;&lt;/a&gt;&lt;/div&gt;
&amp;nbsp;One of the objectives for the &lt;a href="http://fastcompression.blogspot.fr/2012/04/file-container-format-for-lz4.html"&gt;LZ4 framing format&lt;/a&gt; is to instruct the &lt;a href="http://fastcompression.blogspot.fr/p/lz4.html"&gt;LZ4&lt;/a&gt; decoder how to best handle memory to decode the compressed data. &lt;br /&gt;
&lt;br /&gt;
To illustrate with a simple example, let's study the current framing format of &lt;a href="http://code.google.com/p/lz4/source/browse/trunk/lz4demo.c?spec=svn66&amp;amp;r=66"&gt;lz4demo&lt;/a&gt;. &lt;br /&gt;
&lt;br /&gt;
This framing has no option, and is therefore completely static. At the core of the specification is a "data split" methodology, which automatically split input data (typically files) into multiple blocks of 8MB each. That is to say, if the source data is larger than 8MB, then the first block will only hold the first 8MB of data, and the next block too, up to the point that only one block remains. This last one can have any size between 0 and 8 MB. The last block can also be the first one, if source data is &amp;lt; 8MB.&lt;br /&gt;
&lt;br /&gt;
Moreover, each block is completely independant. Which means each block can be compressed/decompressed in any order, without dependency, except to output data blocks in the same order as input ones. This property is very interesting for multithreading.&lt;br /&gt;
&lt;br /&gt;
Now, this simple scheme comes also with some problems of its own. Let's look into it.&lt;br /&gt;
&lt;br /&gt;
First, since we don't know the size of the source data, the decoder has to allocate 8MB of memory, no matter what. Maybe the file is only 4KB for example ? Well, the decoder don't know it.&lt;br /&gt;
To be fair, knowing the maximum compression ratio (255:1), it could guess that, if the compressed size is 2KB, then the decoded size cannot be more than 512KB. But this is awkward and awful trick.&lt;br /&gt;
&lt;br /&gt;
For modern computers, it seems this is a minor issue. However, in many other circumstances, such as embedded and low-power devices, this amount of memory is way too large.&lt;br /&gt;
&lt;br /&gt;
To limit the size of allocated memory, a potential work-around for the decoder is to use a "sliding buffer" methodology.&lt;br /&gt;
For example, data would be decoded into blocks of size 256KB, a lot smaller than the full 8MB size. On reaching the end of the small block, decoded data would be written to disk, keeping only the last 64KB ones (copy window size) for reference of the next small block.&lt;br /&gt;
This is workable, and would save a lot of memory. But there are a few drawbacks :&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;A specific decoding function is necessary to do this, which does not exist (yet).&lt;br /&gt;Such function needs to keep track of its "internal state", in order to resume decoding where it left.&lt;br /&gt;Which means, for example, that we may be in the middle of match copy operation.&lt;br /&gt;The "exit" and "error" conditions are also different and more numerous.&lt;br /&gt;As a consequence, the code for such function will be more complex and slower.&lt;/li&gt;
&lt;li&gt;Adding to the performance issue, the "sliding" mechanism involves repeateadly copying the last 64KB of each small block in preparation for the next one.&lt;/li&gt;
&lt;li&gt;Although 256KB is a lot smaller than 8MB, it's still too large for some low-memory applications.&lt;br /&gt;Alas there is a limit to this methodology, which is the size of the "window copy operation" (64KB) plus a reasonable amount of data to decode, typically another 64KB.&lt;br /&gt;So, the minimum decoding buffer for this methodology is 128KB. &lt;br /&gt;This is better, but still too large.&lt;/li&gt;
&lt;/ul&gt;
&lt;br /&gt;
As a consequence, it seems clear that there is no such thing as "one-size fits all". The size of blocks needs to be customizable, in order to fit a broader range of situations.&lt;br /&gt;
&lt;br /&gt;
Additionnally, it seems better to help the decoder allocating only what's needed, especially, when it can be smaller than the default block size. Which means that, for small data source (smaller than the block size), it would be interesting to provide the original source size : this will allow the decoder to only allocate what's necessary, instead of the default maximum block size.&lt;br /&gt;
&lt;br /&gt;&lt;img src="http://feeds.feedburner.com/~r/FastDataCompression/~4/C5ChTnDCFE8" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://fastcompression.blogspot.com/feeds/2468128674300593585/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://fastcompression.blogspot.com/2012/05/memory-buffer-management.html#comment-form" title="1 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/834134852788085492/posts/default/2468128674300593585?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/834134852788085492/posts/default/2468128674300593585?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/FastDataCompression/~3/C5ChTnDCFE8/memory-buffer-management.html" title="Memory buffer management" /><author><name>Yann Collet</name><uri>https://plus.google.com/117014154967737150730</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>1</thr:total><feedburner:origLink>http://fastcompression.blogspot.com/2012/05/memory-buffer-management.html</feedburner:origLink></entry><entry gd:etag="W/&quot;C0ADQ3c7eyp7ImA9WhVWGE0.&quot;"><id>tag:blogger.com,1999:blog-834134852788085492.post-917703461638955751</id><published>2012-04-30T18:00:00.000+02:00</published><updated>2012-04-30T18:02:52.903+02:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2012-04-30T18:02:52.903+02:00</app:edited><title>Selecting a Checksum algorithm</title><content type="html">&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://avtanski.net/projects/data_logger/checksumming.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="146" src="http://avtanski.net/projects/data_logger/checksumming.png" width="200" /&gt;&lt;/a&gt;&lt;/div&gt;
&amp;nbsp;Following the entry regarding &lt;a href="http://fastcompression.blogspot.fr/2012/04/detecting-errors.html"&gt;Error detection&lt;/a&gt;, it is still necessary to find a correct algorithm to do the job.&lt;br /&gt;
&lt;br /&gt;
In other circumstances, the solution would have been a simple reliance on &lt;a href="http://en.wikipedia.org/wiki/Cyclic_redundancy_check"&gt;CRC32&lt;/a&gt;. After all, this algorithm is already in use in many applications, including modem transmission (&lt;a href="http://en.wikipedia.org/wiki/V.42#Error_control_and_data_compression"&gt;ITU-T V42&lt;/a&gt;), &lt;a href="http://en.wikipedia.org/wiki/PKZIP"&gt;Zip&lt;/a&gt;, MPEG2, etc. So it looks solid.&lt;br /&gt;
&lt;br /&gt;
And solid it is, but fast it is not.&lt;br /&gt;
As&amp;nbsp;&lt;a href="http://fastcompression.blogspot.fr/2012/04/detecting-errors.html?showComment=1334332994499#c7502797290056519582"&gt;stated by Shelwien&lt;/a&gt;, a fast hard-wired version of CRC32 exists, within the most recent Intel CPUs. However, not only is it a limited choice in term of target CPU, it is also a different&amp;nbsp;variant&amp;nbsp;(i.e. incompatible)&amp;nbsp;from the normalized one. Therefore, software mode cannot be avoided.&lt;br /&gt;
&lt;br /&gt;
And what's the use of a very fast compression algorithm if the checksum takes much longer to be calculated ?&lt;br /&gt;
&lt;br /&gt;
To be more specific, &lt;a href="http://fastcompression.blogspot.fr/p/lz4.html"&gt;LZ4 &lt;/a&gt;decoder works roughly at 1GB/s on my test machine. So, to be almost negligible, a checksum algorithm would need to be ~10x faster, which would give 10GB/s. This is obviously not possible, keeping in mind that the RAM speed doesn't cope with such situation : a memcpy() operation is limited to 2.5GB/s on the very same machine.&lt;br /&gt;
&lt;br /&gt;
So OK, we can't reach the ideal situation, but what would be the best possible one, i.e. the fastest one ?&lt;br /&gt;
&lt;br /&gt;
This is not an easy question, since making a fast but weak checksum with very poor properties is the usual outcome. For example, sum all your bytes one by one, and here you have your checksum ! Except that its distribution is absolutely horrible, resulting in too many potentially undetected errors.&lt;br /&gt;
&lt;br /&gt;
In order to assess if a checksum algorithm features some good properties, it is necessary to test them, with an appropriate tool.&lt;br /&gt;
And we are lucky, such a tool exists.&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Austin Appleby&lt;/i&gt; created &lt;a href="http://code.google.com/p/smhasher/wiki/MurmurHash"&gt;MurmurHash&lt;/a&gt;, in a bid to provide the fastest possible hash algorithm. In order to test its own algorithm, he also created &lt;a href="http://code.google.com/p/smhasher/wiki/SMHasher"&gt;&lt;b&gt;SMHasher&lt;/b&gt;&lt;/a&gt;, which is an invaluable tool to assess the performance of any hash function. As can be guessed, MurmurHash answer perfectly to all criteria, but that doesn't mean that the tool is flawed, it's more a testament that MurmurHash was developed and improved along the lines of SMHasher. That being said, the methodology is sound, and results look reliable.&lt;br /&gt;
&lt;br /&gt;
I immediately used the opportunity to test different checksum algorithm provided within the package. Starting with CRC32 of course.&amp;nbsp;And the results are not excellent.&lt;br /&gt;
CRC32 speed is 430MB/s on the test system. That's not bad, but compared to LZ4 decoder, it is more than twice slower, which means the CPU would take more time to check the data rather than actually decoding it.&lt;br /&gt;
Furthermore, the distribution properties of CRC32 are not "best in class". Quite far from it in fact. The bit distribution is typically "biased", which means that the probably for a bit position to be 0 or 1 is not a near-perfect ~50%. Worse, bit independence is nonexistent. It fails some tests, such as the "2-bytes keyset", and the "HighBits keyset", which has devastating consequences on bit distribution.&lt;br /&gt;
This doesn't make CRC32 unusable. It just shows that this older algorithm is not perfect.&lt;br /&gt;
More importantly, as expected, it is not fast enough for our needs.&lt;br /&gt;
&lt;br /&gt;
Let's move on to another candidate,&amp;nbsp;&lt;a href="http://en.wikipedia.org/wiki/Fowler%E2%80%93Noll%E2%80%93Vo_hash_function"&gt;FNV&lt;/a&gt;,&amp;nbsp;which is also part of the package.&lt;br /&gt;
&lt;b&gt;FNV &lt;/b&gt;(Fowler-Noll-Vo) takes a reputation of being an extremely fast hash function. Many variants have appear around the web, with the same core principles, marrying XOR with multiplications in an inner loop.&lt;br /&gt;
The speed is indeed better, but at 550 MB/s, it's not that great.&lt;br /&gt;
Furthermore, the properties of FNV are no better than CRC32, featuring the same weaknesses, plus failing badly at "Cyclic test", having poor results on "sparse keyset", and very poor on both low and high bits keyset, etc. Well, the results are so bad, i could not advise this function to anyone.&lt;br /&gt;
&lt;br /&gt;
So, let's go straight to the king of the hill. The most recent version of &lt;b&gt;MurmurHash &lt;/b&gt;(v3.a) is said to be the fastest hash function available. We are only interested in the 32-bits version, so let's put that one on the grill.&lt;br /&gt;
And yes, fast it is. The speed is &lt;b&gt;2.7 GB/s&lt;/b&gt;. This is just huge. It's close enough to my RAM speed limit. Interestingly, the speed is depending on input being aligned on 4-bytes boundaries (32 bits). If not, the speed dips to 2.0 GB/s, which is still good.&lt;br /&gt;
More importantly, with this speed, we get all SMHasher properties verified, with no single trace of distribution weakness. That's great. We have a candidate !&lt;br /&gt;
&lt;br /&gt;
************************************************************************************&lt;br /&gt;
&lt;br /&gt;
The story could have ended here. But as usual, i couldn't resist. I would be willing to put some of my own function to the test, thanks to SMHasher.&lt;br /&gt;
An old "fast hash" of mine would end up with the same speed as MurmurHash, but much worse properties.&lt;br /&gt;
&lt;br /&gt;
That could have been the end. After all, MurmurHash is so fast, maybe it is merely limited by RAM speed after all ?&lt;br /&gt;
So i made another, simpler, attempt purely targeting speed. It resulted in an 8GB/s hash function. Yep. &amp;nbsp;This is much more than my RAM speed, so how could it be ?&lt;br /&gt;
Well, in fact, it is much more than memcpy(), but there's nothing magic : in fact, memcpy() not only read input from memory, but also &lt;i&gt;write to memory&lt;/i&gt;. This is something the hash function does not need to do. The input is simply "sunk" into registers. So, it can be that fast, and there is definitely some room left for more speed.&lt;br /&gt;
&lt;br /&gt;
I therefore spent a little more time studying what makes MurmurHash such a great hash function.&lt;br /&gt;
&lt;br /&gt;
I believe there are 2 keys parts :&lt;br /&gt;
&lt;br /&gt;
The first one "blends" the fed input bytes into registers, thanks to several simple operations, the most important of them being &lt;a href="http://msdn.microsoft.com/en-us/library/5cc576c4(v=vs.80).aspx"&gt;_rotl()&lt;/a&gt;.&lt;br /&gt;
_rotl() (Rotate left) ensures that the bits that were previously high now becomes low. So it effectively nullify the problem of high bits having a small contribution to the overall hash distribution.&lt;br /&gt;
&lt;br /&gt;
The second one digest the final blended result thanks to a serie of small steps known as "avalanche". It just destroys any kind of bit bias or correlation left that the blend would have left.&lt;br /&gt;
&lt;br /&gt;
With both being combined, we get an almost perfect hash distribution.&lt;br /&gt;
&lt;br /&gt;
So i started from there, re-using the above "fast simple hash" and improving it with the presented ideas. I ended up with a &lt;b&gt;7.5 GB/s&lt;/b&gt; hash function, which is about just Hellish.&lt;br /&gt;
It would pass all the tests except for a single one, "Keyset 'Sparse' - 2048-bit keys with up to 2 bits set". That's because the blending phase creates some "twin bits" far&amp;nbsp;apart&amp;nbsp;from each other, at varying distances depending on bit position. This is obviously bad for a crypto-hash, but a 32-bits checksum is anyway not meant for this task.&lt;br /&gt;
We simply want to detect errors, more probably transmission and storage failures, or manual glitches. For such cases, since twin bits are very far&amp;nbsp;apart, it seems it would be extremely unlucky to have random errors happening just exactly on twin bits pattern. In fact, the probability seems lower than getting a collision from two completely different inputs on the 32-bits&amp;nbsp;checksum. Which is the limit of this exercise.&lt;br /&gt;
&lt;br /&gt;
Anyway, for the sake of it, i also created a stronger version of the algorithm, which add an operation in the inner loop to make the twin bit pattern vanish. This version reaches &lt;u&gt;4.2 GB/s&lt;/u&gt;, which is still good, but noticeably slower than the fast version. Well, that's the price to pay for stronger properties.&lt;br /&gt;
&lt;br /&gt;
The final result of these investigation is called &lt;b&gt;xxHash&lt;/b&gt;, and is now an &lt;a href="http://code.google.com/p/xxhash/"&gt;open-sourced project, hosted on Google Code&lt;/a&gt;.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
************************************************************************************&lt;br /&gt;
&lt;br /&gt;
Now a decision still remains to be done : a checksum algorithm has to be selected&amp;nbsp;for the LZ4 framing format.&lt;br /&gt;
&lt;br /&gt;
I feel interested by the kind of speed that xxHash can offer, since it seems appropriate alongside the fast LZ4 decoder. But it's still a very young algorithm ...&lt;img src="http://feeds.feedburner.com/~r/FastDataCompression/~4/quo4CklRHjg" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://fastcompression.blogspot.com/feeds/917703461638955751/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://fastcompression.blogspot.com/2012/04/selecting-checksum-algorithm.html#comment-form" title="8 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/834134852788085492/posts/default/917703461638955751?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/834134852788085492/posts/default/917703461638955751?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/FastDataCompression/~3/quo4CklRHjg/selecting-checksum-algorithm.html" title="Selecting a Checksum algorithm" /><author><name>Yann Collet</name><uri>https://plus.google.com/117014154967737150730</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>8</thr:total><feedburner:origLink>http://fastcompression.blogspot.com/2012/04/selecting-checksum-algorithm.html</feedburner:origLink></entry><entry gd:etag="W/&quot;AkUHQnk6cSp7ImA9WhVXEkg.&quot;"><id>tag:blogger.com,1999:blog-834134852788085492.post-1850084842937940130</id><published>2012-04-12T00:27:00.000+02:00</published><updated>2012-04-12T21:17:13.719+02:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2012-04-12T21:17:13.719+02:00</app:edited><title>Detecting errors</title><content type="html">&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;/div&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;/div&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://www.sqs.com/en/sa/_images/6_2_SQS-PractiQ-EED.jpg" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="65" src="http://www.sqs.com/en/sa/_images/6_2_SQS-PractiQ-EED.jpg" width="200" /&gt;&lt;/a&gt;&lt;/div&gt;
&amp;nbsp;After
 thinking a blip second about it, it seems clear to me that an important feature expected from the &lt;a href="http://fastcompression.blogspot.fr/2012/04/file-container-format-for-lz4.html"&gt;framing format&lt;/a&gt; is a good enough guarantee that the regenerated data is correct.&lt;br /&gt;
Surely there are also valid usages of LZ4 which may not need error detection and its associated CPU and bandwidth overhead, hence this feature will remain optional in the final specification, but nonetheless recommended.&lt;br /&gt;
&lt;br /&gt;
This requirement is really very common. And a reasonable method to check for unintentional errors (such as&amp;nbsp;transmission or storage&amp;nbsp;ones)&amp;nbsp;can be provided with a simple&amp;nbsp;&lt;a href="http://en.wikipedia.org/wiki/Checksum"&gt;checksum&lt;/a&gt;. Typically, a good 32-bits checksum will make it relatively improbable (less than 1 in a billionth) that random changes stay unnoticed. This method is typically used by &lt;a href="http://en.wikipedia.org/wiki/Zip_(file_format)"&gt;Zip&lt;/a&gt;.&lt;br /&gt;
&lt;br /&gt;
One could note that the &lt;a href="http://fastcompression.blogspot.fr/p/lz4.html"&gt;LZ4&lt;/a&gt; decoder already includes some form of detection for malformed input. It's correct, but
 it's not enough, because it only checks that provided&amp;nbsp;instructions are valid. The main objective was to forbid any attempt to read or write outside of the memory buffers. Hence, it cannot detect a corrupted&amp;nbsp;literal&amp;nbsp;within the compressed stream for example.&lt;br /&gt;
&lt;br /&gt;
On the other side, one could also require the method to guarantee the input data never was modified, neither randomly nor&amp;nbsp;intentionally&amp;nbsp;(&lt;a href="http://en.wikipedia.org/wiki/Data_integrity"&gt;data integrity&lt;/a&gt;). Such objective, however, seems out of scope.&lt;br /&gt;
To begin with, it require an &lt;a href="http://en.wikipedia.org/wiki/Cryptographic_hash_function"&gt;electronic signature&lt;/a&gt;,
 at least 128-bits such as MD-5. But even MD-5 is now considered too weak for 
this use, and shall be superseded by stronger 160-bits SHA-1 and such.&lt;br /&gt;
One obvious issue is speed. What's the use of a 
super-fast compression / decompression algorithm if the checksum function is several times 
slower ?&lt;br /&gt;
But the most important issue is that the signature is not enough. The signature's mission is to make it almost impossible to find a modification to the original source file which 
produces the same electronic signature. But since the electronic 
signature is provided &lt;i&gt;within &lt;/i&gt;the framing format, a malicious attacker would 
just need to change both the content and the signature.&lt;br /&gt;
Therefore, an effective protection shall also require encryption, or a separate signature 
storage and transmission, which is way beyond the&amp;nbsp;objective of this&amp;nbsp;framing format.&lt;br /&gt;
So we'll just consider random &lt;a href="http://en.wikipedia.org/wiki/Data_corruption"&gt;data corruption&lt;/a&gt;&amp;nbsp;from now on.&lt;br /&gt;
&lt;br /&gt;
The next question is : what has to be&amp;nbsp;checksum-ed&amp;nbsp;?&lt;br /&gt;
Due to the popularity of the Zip format, an obvious answer could be "the original data".&lt;br /&gt;
But let's look into it, it's not that obvious.&lt;br /&gt;
&lt;br /&gt;
Take the &lt;a href="http://code.google.com/p/snappy/source/browse/trunk/framing_format.txt"&gt;snappy framing format&lt;/a&gt; for example. It checksums the original data of each chunk individually, with each chunk size being &amp;lt;= 32KB. The main advantage is that an incorrect chunk is detected immediately, there is no need to wait for the end of the stream for this result. This is especially important when the stream is very large (for example 1GB), and the consumer process wants to use the decoded data without waiting for the last chunk.&lt;br /&gt;
The main drawback, however, is silent truncation. Imagine for example that the last few&amp;nbsp;chunks are missing. There will be no way to detect that.&lt;br /&gt;
&lt;br /&gt;
Another possible answer is to checksum the compressed data instead of the original one.&lt;br /&gt;
This is valid. A compressed stream can only be decoded into a single result. Hence, protecting the compressed data from errors is the same as protecting the original data.&lt;br /&gt;
And it brings several advantages with it.&lt;br /&gt;
First, by checking the checksum against the compressed stream, it&amp;nbsp;warns against data corruption 
even&amp;nbsp;before&amp;nbsp;attempting to decode.&lt;br /&gt;
Second, since compressed data is generally smaller than original one, the checksum function will be faster to execute. Both properties have positive impact on &lt;a href="http://en.wikipedia.org/wiki/CPU_power_dissipation"&gt;CPU consumption&lt;/a&gt;.&lt;br /&gt;
&lt;br /&gt;
Note however that for the first property to be true, each chunk must be&amp;nbsp;checksumed, not just the entire stream, because the decoding starts immediately on reading the first chunk from i/o, both to save time and memory.&lt;br /&gt;
&lt;br /&gt;
OK, so considering these advantages, i'm almost sold to the method of checksuming the compressed chunks. The most important drawback i would still worry about is the silent truncation issue of snappy-frame.&lt;br /&gt;
&lt;br /&gt;
It seems however this can be easily solved. One just needs to ensure that the "continuation" information is provided. And there are several possibilities to do it. For example, provide the size of the original data into the framing format, and calculate the number of (fixed size) chunks. Or embed the continuation information into a field associated with each chunk, and make it part of the checksumed data. There are surely other possibilities. The main idea is that the "end of stream" information must be provided, not just detected on hitting the EOF marker.&lt;br /&gt;
&lt;br /&gt;
One last property that could be missing is using the checksum as a kind of "weak signature". For example, in the zip format, since the checksum is calculated from the original data, it is possible to "check" that the file present into the zip file is likely the expected one.&lt;br /&gt;
But i'm not sure if such usage has any real value, not even convenience. Has stated earlier, such "signature" is much too weak to be trusted, so it can merely be used as an "hint", not much else.&lt;br /&gt;
&lt;br /&gt;
As usual comments are welcomed.&lt;br /&gt;
There is another point to discuss, which is the checksum algorithm itself.&lt;br /&gt;
We'll keep that for another blog note.&lt;img src="http://feeds.feedburner.com/~r/FastDataCompression/~4/33-4cZeRnAA" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://fastcompression.blogspot.com/feeds/1850084842937940130/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://fastcompression.blogspot.com/2012/04/detecting-errors.html#comment-form" title="3 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/834134852788085492/posts/default/1850084842937940130?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/834134852788085492/posts/default/1850084842937940130?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/FastDataCompression/~3/33-4cZeRnAA/detecting-errors.html" title="Detecting errors" /><author><name>Yann Collet</name><uri>https://plus.google.com/117014154967737150730</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>3</thr:total><feedburner:origLink>http://fastcompression.blogspot.com/2012/04/detecting-errors.html</feedburner:origLink></entry><entry gd:etag="W/&quot;CEICQnkzeSp7ImA9WhVXEUo.&quot;"><id>tag:blogger.com,1999:blog-834134852788085492.post-2768898838327309266</id><published>2012-04-08T17:58:00.001+02:00</published><updated>2012-04-11T21:29:23.781+02:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2012-04-11T21:29:23.781+02:00</app:edited><title>A file container format for LZ4</title><content type="html">&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://sd-1.archive-host.com/membres/images/miniatures/182754578/SpeedCompare.png" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://sd-1.archive-host.com/membres/images/miniatures/182754578/SpeedCompare.png" /&gt;&lt;/a&gt;&lt;/div&gt;
&amp;nbsp;With increased usage of &lt;a href="http://fastcompression.blogspot.fr/p/lz4.html"&gt;LZ4&lt;/a&gt;&amp;nbsp;come new use cases, and with them new requirements.&lt;br /&gt;
&lt;br /&gt;
The recent ports of LZ4 to &lt;a href="http://pypi.python.org/pypi/lz4"&gt;Python &lt;/a&gt;and &lt;a href="http://search.cpan.org/dist/Compress-LZ4/"&gt;Perl&lt;/a&gt;&amp;nbsp;are
 two excellent examples, demultiplying the possible usages by making the
 algorithm available to a broader range of applications, including easy-to-deploy scripts. &lt;br /&gt;
&lt;br /&gt;
In this context, a simple question is asked : how can one be sure that a file compressed with, for example, the &lt;a href="http://pypi.python.org/pypi/lz4"&gt;Python&lt;/a&gt;&amp;nbsp;port of LZ4, will be decodable by the &lt;a href="https://github.com/stangelandcl/LZ4Sharp"&gt;C#&lt;/a&gt; port of LZ4 ?&lt;br /&gt;
&lt;br /&gt;
Which is indeed a very good question. And the sorry answer is that : there is no such guarantee. At least, not yet.&lt;br /&gt;
&lt;br /&gt;
LZ4
 has been developed&amp;nbsp;as an in-memory compression algorithm : it takes a 
memory buffer as an input, and provides the compressed result into an 
output memory buffer. Whatever is done before or after that operation is
 beyond its area of&amp;nbsp;responsibility. The fact that these buffers may be 
read, or written, to disk is just one of many possibilities.&lt;br /&gt;
&lt;br /&gt;
This
 is a correct positioning. There are too many possible&amp;nbsp;usages&amp;nbsp;for LZ4, 
and a lot of them will never bother with on-disk files or network streaming.&lt;br /&gt;
&lt;br /&gt;
Nevertheless,
 it happens that one natural usage of compression algorithm is file 
compression, and it applies to LZ4 too. In this case, the compressed 
result is written to a media, and may be decoded later, possibly by a 
completely different system. To ensure this operation will be always 
possible, even when using any other version of LZ4 written in any 
language, it is necessary to define a &lt;i&gt;file container format&lt;/i&gt;. &lt;br /&gt;
&lt;br /&gt;
To be more complete, a container format has been defined into &lt;a href="http://code.google.com/p/lz4/source/browse/trunk/lz4demo.c"&gt;lz4demo&lt;/a&gt;,
 but it was never meant to become a reference. It is too limited for 
this ambition. Therefore, a new container format specification is 
necessary.&lt;br /&gt;
&lt;br /&gt;
The ideas in this post will be largely borrowed from&lt;a href="http://fastcompression.blogspot.fr/2011/11/format-for-compressed-streams-part-2.html"&gt; this older one&lt;/a&gt;, where a container format was discussed and defined for &lt;a href="http://fastcompression.blogspot.fr/p/zhuff.html"&gt;Zhuff&lt;/a&gt;. Most of the points are still applicable for &lt;a href="http://fastcompression.blogspot.fr/p/lz4.html"&gt;LZ4&lt;/a&gt;.&lt;br /&gt;
&lt;br /&gt;
We'll start by defining the objectives. What are the&amp;nbsp;responsibilities&amp;nbsp;of a file container format ? Well, 
mostly, it shall ensure that the original file will be restored entirely
 and without errors. This can be translated into the following technical
 missions :&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;Detect valid containers from invalid input (compulsory)&lt;/li&gt;
&lt;ul&gt;
&lt;li&gt;Detect / control errors (recommended)&lt;/li&gt;
&lt;ul&gt;
&lt;li&gt;Correct errors (optional)&lt;/li&gt;
&lt;/ul&gt;
&lt;/ul&gt;
&lt;li&gt;Help the decoder to organize its memory buffers&amp;nbsp;&lt;/li&gt;
&lt;ul&gt;
&lt;li&gt;buffer sizes, and number of buffers&amp;nbsp;(compulsory)&lt;/li&gt;
&lt;li&gt;provide user control over buffer sizes (optional)&lt;/li&gt;
&lt;/ul&gt;
&lt;li&gt;Organize data into&amp;nbsp;independent&amp;nbsp;chunks for multi-threaded operations (optional)&lt;/li&gt;
&lt;li&gt;Allow to quick-jump to a specific data segment within the container (optional)&lt;/li&gt;
&lt;li&gt;Be compatible with non-seekable streams, typically pipe mode (compulsory)&lt;/li&gt;
&lt;li&gt;Enforce alignment rules (optional)&lt;/li&gt;
&lt;li&gt;Allow simplified concatenation of compressed streams (optional) &lt;br /&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;br /&gt;
This is already quite a few missions. But most of them will be possible, as we'll see later.&lt;br /&gt;
&lt;br /&gt;
Another important thing to define is what's&amp;nbsp;&lt;b&gt;&lt;i&gt;&lt;u&gt;not&lt;/u&gt;&lt;/i&gt; &lt;/b&gt;within this format missions :&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;Save the file name (debatable)&lt;/li&gt;
&lt;li&gt;Save file attributes and path&lt;/li&gt;
&lt;li&gt;Group several files (or directory) into a single compressed stream&lt;/li&gt;
&lt;li&gt;Provide optional user space, typically for comments (debatable)&lt;/li&gt;
&lt;li&gt;Split the compressed stream into several output files&lt;br /&gt;
&lt;/li&gt;
&lt;/ul&gt;
In my view, these elements should be part of an &lt;b&gt;&lt;i&gt;&lt;u&gt;archive &lt;/u&gt;&lt;/i&gt;&lt;/b&gt;format.&lt;br /&gt;
While the file container format takes care of&amp;nbsp;regenerating&amp;nbsp;original&amp;nbsp;&lt;i&gt;&lt;u&gt;content&lt;/u&gt;&lt;/i&gt;,
 the archive will also take care of directory structure. It will be able
 to save file names, and re-organize them in a better way to help 
compression. On decoding, it will regenerate the entire tree structure, 
placing each file at its correct location, with sometimes also original 
attributes.&lt;br /&gt;
An archive format therefore occupy a higher layer in the structure. And it typically depends itself on such container formats.&lt;br /&gt;
&lt;br /&gt;
Last
 but not least, we'll make the definition of "file" a bit broader, 
Unix-like, in order to also encompass "pipe'd data". The "file container
 format" then becomes a "stream container format", with files being 
one of the possible streams.&lt;br /&gt;
&lt;br /&gt;
All these elements can be provided in a relatively compact and efficient header.&lt;br /&gt;
But first, let's discuss the most important features.&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;[Note]&lt;/i&gt; Considering the comments, it seems the word "container" does not properly define the objective of this specification. Another word shall be found. "&lt;i&gt;Framing format&lt;/i&gt;" is the current candidate.&lt;br /&gt;
&lt;br /&gt;&lt;img src="http://feeds.feedburner.com/~r/FastDataCompression/~4/vngugnVEepE" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://fastcompression.blogspot.com/feeds/2768898838327309266/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://fastcompression.blogspot.com/2012/04/file-container-format-for-lz4.html#comment-form" title="5 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/834134852788085492/posts/default/2768898838327309266?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/834134852788085492/posts/default/2768898838327309266?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/FastDataCompression/~3/vngugnVEepE/file-container-format-for-lz4.html" title="A file container format for LZ4" /><author><name>Yann Collet</name><uri>https://plus.google.com/117014154967737150730</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>5</thr:total><feedburner:origLink>http://fastcompression.blogspot.com/2012/04/file-container-format-for-lz4.html</feedburner:origLink></entry><entry gd:etag="W/&quot;DUYCRns6eip7ImA9WhVRGUU.&quot;"><id>tag:blogger.com,1999:blog-834134852788085492.post-7383337496787896257</id><published>2012-01-08T21:42:00.000+01:00</published><updated>2012-03-29T04:12:47.512+02:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2012-03-29T04:12:47.512+02:00</app:edited><title>Probability Update</title><content type="html">&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;/div&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;/div&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://www.pr-owl.org/images/bn_belief_update.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="97" src="http://www.pr-owl.org/images/bn_belief_update.png" width="200" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;nbsp;Following the&amp;nbsp;&lt;a href="http://fastcompression.blogspot.com/2012/01/binary-arithmetic-coder.html" target="_blank"&gt;previous post on Binary Arithmetic Coder&lt;/a&gt;, we left with a simple-to-ask but difficult-to-answer question : how to update the probability that the next value will be a 0 or a 1 ?&lt;br /&gt;
&lt;br /&gt;
Indeed. Presuming that we can update P0 (probability that next value will be a zero) suppose that we accept a simple idea : the past is a strong indicator for the future.&lt;br /&gt;
&lt;br /&gt;
This may not always be true, but for compression it appears to work relatively well. At this stage, we can outline the strong relationship between compression and &lt;a href="http://en.wikipedia.org/wiki/Prediction" target="_blank"&gt;prediction&lt;/a&gt;&amp;nbsp;: compression ratio will only be as good as the algorithm can predict the future (the next bit in this case).&lt;br /&gt;
&lt;br /&gt;
Trying to predict the future is not limited to compression, and massive amount of investment have been and are currently spent around this applied concept. In this post however, we'll keep our mind focused on compression, considering other fields only if they can help this objective.&lt;br /&gt;
&lt;br /&gt;
A first simple way to evaluate P0 is to count the number of 0 and 1 previously&amp;nbsp;seen.&lt;br /&gt;
Then, simply set : P0 = N0 / (N0+N1).&lt;br /&gt;
It works, and is a simple way to achieve convergence towards optimal stationary statistics.&lt;br /&gt;
&lt;br /&gt;
But, in fact, no source is really static. They constantly evolve (otherwise, compression would be trivial). The probability should therefore adapt to the source, and consequently, more weight should be given to the most recent bits.&lt;br /&gt;
&lt;br /&gt;
A very simple way to achieve this property is by giving to new bits a fixed share of the updated probability. For example 20%. This is an easy method to gradually and smoothly "reduce" the weight of older records.&lt;br /&gt;
And we get :&lt;br /&gt;
newP0 = oldP0 x (1-share) + NewBitIsZero x share&lt;br /&gt;
&lt;br /&gt;
This works too, although we start to wonder what should be such "share" ?&lt;br /&gt;
20% ? 10% ?&lt;br /&gt;
&lt;br /&gt;
It is very tempting to believe that the optimal &lt;i&gt;adaptation share &lt;/i&gt;can be found as a unique value. And indeed, through brute force testings (such as &lt;a href="http://en.wikipedia.org/wiki/Genetic_algorithm" target="_blank"&gt;Genetic Algorithms&lt;/a&gt;) an optimal value can be found.&lt;br /&gt;
[Edit : an excellent example of&amp;nbsp; "optimal adaptation share" is provided by Shelwien here : &lt;a href="http://nishi.dreamhosters.com/u/book1bwt_wr.png"&gt;http://nishi.dreamhosters.com/u/book1bwt_wr.png&lt;/a&gt; ]&lt;br /&gt;
However, the optimal value will be true only for a given file, number of contexts and precision level. Change any parameter, and the optimal value will change too..&lt;br /&gt;
&lt;br /&gt;
Could such optimal &lt;i&gt;adaptation share &lt;/i&gt;be guessed beforehand, through calculation, rather than sheer experimentation ? Well, unfortunately no. It depends too much on the source, although some basic rules of thumb do apply : if the file is small, the share will be higher. If it is large, the share will be lower.&lt;br /&gt;
&lt;br /&gt;
This points towards something retrospectively obvious : at the beginning of the file, when statistics are still scarce, the adaptation must be faster. Then, the more we accumulate statistics, the more accurate they should become, so the adaptation share must become lower.&lt;br /&gt;
&lt;br /&gt;
It does not answer to the question "how much", but hints towards a moving value, becoming lower as we progress into the source file.&lt;br /&gt;
&lt;br /&gt;
In order to get closer to the "how much" answer,&amp;nbsp;I've&amp;nbsp;collected some statistics, that we'll see and comment in a later post...&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;&lt;img src="http://feeds.feedburner.com/~r/FastDataCompression/~4/HKmoB4DKqzE" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://fastcompression.blogspot.com/feeds/7383337496787896257/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://fastcompression.blogspot.com/2012/01/probability-update.html#comment-form" title="8 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/834134852788085492/posts/default/7383337496787896257?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/834134852788085492/posts/default/7383337496787896257?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/FastDataCompression/~3/HKmoB4DKqzE/probability-update.html" title="Probability Update" /><author><name>Yann Collet</name><uri>https://plus.google.com/117014154967737150730</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>8</thr:total><feedburner:origLink>http://fastcompression.blogspot.com/2012/01/probability-update.html</feedburner:origLink></entry><entry gd:etag="W/&quot;D04HRno4cSp7ImA9WhRWF0Q.&quot;"><id>tag:blogger.com,1999:blog-834134852788085492.post-3038968502799661011</id><published>2012-01-05T00:01:00.000+01:00</published><updated>2012-01-05T21:38:57.439+01:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2012-01-05T21:38:57.439+01:00</app:edited><title>Binary Arithmetic Coder</title><content type="html">&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://www.da.isy.liu.se/undergrad/exjobb/finished/cabac-2005/en/cabac1-2005.jpg" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="157" src="http://www.da.isy.liu.se/undergrad/exjobb/finished/cabac-2005/en/cabac1-2005.jpg" width="200" /&gt;&lt;/a&gt;&lt;/div&gt;
&amp;nbsp;There is a way to code an element using less than one bit of space. Although the idea might sound weird, it is in fact a proven method since a few decades now, under the name of &lt;a href="http://en.wikipedia.org/wiki/Arithmetic_coding" target="_blank"&gt;Arithmetic Coding&lt;/a&gt;.&lt;br /&gt;
&lt;br /&gt;
As a simple explanation, let's say that arithmetic coding is no longer about using the optimal number of bits to code an element, as does &lt;a href="http://en.wikipedia.org/wiki/Huffman_coding" target="_blank"&gt;Huffman&lt;/a&gt;, since this method can't do better than one bit per element. We will now define "ranges", within which each element will take a share (such as 2%, 75%, etc.). This method is extremely precise, since each element will cost exactly its &lt;a href="http://en.wikipedia.org/wiki/Information_entropy" target="_blank"&gt;entropy&lt;/a&gt;, such as 2.53 bits (precisely, not an approximation to 2 or 3), or 0.17 bits.&lt;br /&gt;
&lt;br /&gt;
For more details on how it works, i invite you to read the &lt;a href="http://en.wikipedia.org/wiki/Arithmetic_coding" target="_blank"&gt;Wikipedia entry&lt;/a&gt;, which is fairly good.&lt;br /&gt;
&lt;br /&gt;
The method, however, has some drawbacks. To begin with, defining the precise share of each element&amp;nbsp;incurs&amp;nbsp;a header cost. This situation is pretty visible with &lt;a href="http://fastcompression.blogspot.com/p/huff0-range0-entropy-coders.html" target="_blank"&gt;Range0&lt;/a&gt;, which is a block-based &lt;a href="http://en.wikipedia.org/wiki/Range_encoding" target="_blank"&gt;range coder&lt;/a&gt;. In order to compensate the (relatively) large size of the header, i had to accept block sizes of 128KB, which is much more than the equivalent Huff0 (16KB), resulting in a less precise entropy adaptation.&lt;br /&gt;
&lt;br /&gt;
A way to remove the header cost is to not transmit any header. Then, entropy adaptation is done dynamically, by recalculating shares after each update.&lt;br /&gt;
&lt;br /&gt;
Unfortunately, this method also has its costs. This time, speed is the price to pay. A lot of speed actually. Adjusting a single element requires renormalizing all others, which can be too much of a burden for a 256-element alphabet, such as a Byte.&lt;br /&gt;
&lt;br /&gt;
This is where the Binary Arithmetic Coder can help.&lt;br /&gt;
In contrast with previous coders, this one is limited to a 2-elements alphabet, 1 or 0. It's basically a yes/no switch. With this restriction in mind, the Binary Arithmetic Coder comes with a crucial advantage : a trivial update mechanism.&lt;br /&gt;
&lt;br /&gt;
A single value is enough to define the switch. It represents the share of 0, while 1 will take the rest. If 0 is very likely, the share will be high. If we believe that, on next update, the probability of 0 will be even higher, then it's enough to increase the value. Otherwise, decrease the value. That's it.&lt;br /&gt;
&lt;br /&gt;
With such a trivial update mechanism now in place, it's time to concentrate on the update rule. The question asked is "by how much should i increase/decrease the value" ? Simple question, but the answer to it will need quite some developments. And we'll keep that for another blog entry.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;&lt;img src="http://feeds.feedburner.com/~r/FastDataCompression/~4/U3k1FEB0-q0" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://fastcompression.blogspot.com/feeds/3038968502799661011/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://fastcompression.blogspot.com/2012/01/binary-arithmetic-coder.html#comment-form" title="5 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/834134852788085492/posts/default/3038968502799661011?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/834134852788085492/posts/default/3038968502799661011?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/FastDataCompression/~3/U3k1FEB0-q0/binary-arithmetic-coder.html" title="Binary Arithmetic Coder" /><author><name>Yann Collet</name><uri>https://plus.google.com/117014154967737150730</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>5</thr:total><feedburner:origLink>http://fastcompression.blogspot.com/2012/01/binary-arithmetic-coder.html</feedburner:origLink></entry><entry gd:etag="W/&quot;DEINRns9eSp7ImA9WhRVEEo.&quot;"><id>tag:blogger.com,1999:blog-834134852788085492.post-2272950933698858807</id><published>2011-12-19T16:27:00.000+01:00</published><updated>2012-01-09T03:36:37.561+01:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2012-01-09T03:36:37.561+01:00</app:edited><title>LZ4 into Hadoop-MapReduce</title><content type="html">&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://paulbarsch.files.wordpress.com/2011/11/mapreduce.jpg" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="142" src="http://paulbarsch.files.wordpress.com/2011/11/mapreduce.jpg" width="200" /&gt;&lt;/a&gt;&lt;/div&gt;
&amp;nbsp;After a very fast evaluation, &lt;a href="http://fastcompression.blogspot.com/p/lz4.html" target="_blank"&gt;LZ4&lt;/a&gt; has been recently integrated into the Apache project &lt;a href="http://hadoop.apache.org/" target="_blank"&gt;Hadoop - MapReduce&lt;/a&gt;.&lt;br /&gt;
&lt;br /&gt;
This is an important news, since, in my humble opinion, Hadoop is among the most advanced and ambitious projects to date (an opinion which is &lt;a href="http://www.guardian.co.uk/technology/2011/mar/25/media-guardian-innovation-awards-apache-hadoop" target="_blank"&gt;shared by some&lt;/a&gt;). It also serves as an excellent&amp;nbsp; illustration of LZ4 usage, as an in-memory compression algorithm for big server applications.&lt;br /&gt;
&lt;br /&gt;
But first, a few words on Hadoop.&lt;br /&gt;
By 2005, Google shook the IT world by presenting &lt;a href="http://en.wikipedia.org/wiki/BigTable" target="_blank"&gt;Big Table&lt;/a&gt;, its home-grown distributed database with &lt;a href="http://en.wikipedia.org/wiki/Eventual_consistency" target="_blank"&gt;eventual consistency&lt;/a&gt;, able to store virtually the entire web and queries it. BigTable was built on top of &lt;a href="http://en.wikipedia.org/wiki/Google_File_System" target="_blank"&gt;Google FS&lt;/a&gt;, a virtual file system covering the entire planet, tens of thousands of computers distributed in hundreds of datarooms all over the world, as if it was a single massive one. This limitless amount of stored data could then be processed in parallel, typically for query preparation, thanks to the &lt;a href="http://en.wikipedia.org/wiki/MapReduce" target="_blank"&gt;MapReduce &lt;/a&gt;framework, which allows to process &lt;a href="http://en.wikipedia.org/wiki/Petabyte" target="_blank"&gt;petabytes &lt;/a&gt;of data in a small amount of time (if you can afford the number of servers necessary for that).&lt;br /&gt;
&lt;br /&gt;
At this very moment, Google stealed the crown of programmatic champion from Microsoft. It was now clear that they were setting the future. Although most of these technologies were already studied, it was the first time they were executed together and so well, at such a huge scale for commercially available products. This gave Google literally years of advance over the competition, since most of its Web products were based on these capabilities.&lt;br /&gt;
&lt;br /&gt;
Since then, all other "big names" of IT, (namely Yahoo, Facebook, Amazon, IBM, Microsoft, Apple, etc.) have been willing to duplicate this architecture. The result of all these efforts finally converged into the open-source project Hadoop.&lt;br /&gt;
Hadoop now has most of the capabilities presented in 2005 by Google, including a Distributed File storage system (HDFS), a distributed Database (HBase), and the same distributed-computing framework as Google, MapReduce. &lt;br /&gt;
&lt;br /&gt;
So, where does that leave any place for &lt;a href="http://fastcompression.blogspot.com/p/lz4.html" target="_blank"&gt;LZ4 &lt;/a&gt;?&lt;br /&gt;
Well, in such architecture, compression is used as a performance enabler.&lt;br /&gt;
&lt;br /&gt;
As can be guessed, massive amounts of data are traveling between the different nodes. Moreover, each node is also processing a fair amount of data, more or less permanently.&lt;br /&gt;
In such situations, compression offers some advantages : less data to transfer means less time and cost to send/receive it. It also means that more data can be stored into RAM memory, or that more data can remain into the CPU cache. All this translates into better system speed.&lt;br /&gt;
&lt;br /&gt;
Or does it ? For this affirmation to be true, it is mandatory for the compression algorithm to be "unobtrusive", which means it should consume very little CPU cycles. Otherwise, the cost of compression can void or reverse the speed advantage. This basically means only fast compressors do qualify for the job.&lt;br /&gt;
&lt;br /&gt;
In the beginning, LZO was such a champion. It offered great speed, however with some important usage limitations, due to its GPL license.&lt;br /&gt;
Then early 2011, Google released Snappy, ex-zippy, the very same algorithm used by Google in its own BigTable implementation. It quickly became a great alternative, thanks to its better licensing policy and better performance.&lt;br /&gt;
&lt;br /&gt;
LZ4 was also released this year, just after Snappy. Google's notoriety means there was basically little attention left for competing algorithms. But it also raised awareness that Fast compression algorithms have a role in IT architecture. LZ4 gradually improved overtime, to the point of providing now better performance than Google's creation. One Hadoop's contributors, &lt;a href="https://issues.apache.org/jira/browse/HADOOP-7657" target="_blank"&gt;Binglin Chang, made the effort to implement LZ4&lt;/a&gt; as a JNI patch, and compare it directly to Snappy. LZ4 &lt;a href="https://issues.apache.org/jira/browse/HADOOP-7657?focusedCommentId=13170733&amp;amp;page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13170733" target="_blank"&gt;performance was found better &lt;/a&gt;than Snappy, even when using Snappy's own set of calibration tests.&lt;br /&gt;
In a relatively quick decision process, the LZ4 patch was then &lt;a href="https://issues.apache.org/jira/browse/HADOOP-7657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&amp;amp;focusedCommentId=13171869#comment-13171869" target="_blank"&gt;integrated into the main Hadoop - MapReduce source trunk&lt;/a&gt;.&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;/* &lt;u&gt;Update &lt;/u&gt;: Google's Snappy developer kindly reminds that benchmark figures depend on the tested configuration, and states that on their own test platform, Snappy keeps an edge with regards to compression speed. See comment : &amp;nbsp;&lt;a href="http://fastcompression.blogspot.com/2011/12/lz4-into-hadoop-mapreduce.html?showComment=1326071743955#c7649536680897239608"&gt;http://fastcompression.blogspot.com/2011/12/lz4-into-hadoop-mapreduce.html?showComment=1326071743955#c7649536680897239608&lt;/a&gt;&amp;nbsp;*/&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
The advantage of using fast compression algorithms, as does Hadoop, can be replicated into many server-side applications, for example DataBases. Recently, column-oriented databases have been dragging attention, since they make heavy usage of compression to grab some impressive performance advantage. The idea remains the same : compress data to keep more of it into RAM and into CPU cache : it directly translates into better performance.&lt;img src="http://feeds.feedburner.com/~r/FastDataCompression/~4/GKO3a-JE0PQ" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://fastcompression.blogspot.com/feeds/2272950933698858807/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://fastcompression.blogspot.com/2011/12/lz4-into-hadoop-mapreduce.html#comment-form" title="12 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/834134852788085492/posts/default/2272950933698858807?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/834134852788085492/posts/default/2272950933698858807?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/FastDataCompression/~3/GKO3a-JE0PQ/lz4-into-hadoop-mapreduce.html" title="LZ4 into Hadoop-MapReduce" /><author><name>Yann Collet</name><uri>https://plus.google.com/117014154967737150730</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>12</thr:total><feedburner:origLink>http://fastcompression.blogspot.com/2011/12/lz4-into-hadoop-mapreduce.html</feedburner:origLink></entry><entry gd:etag="W/&quot;DEcARX4_eip7ImA9WhVTFkk.&quot;"><id>tag:blogger.com,1999:blog-834134852788085492.post-1258481742661732755</id><published>2011-12-12T21:35:00.000+01:00</published><updated>2012-03-02T00:54:04.042+01:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2012-03-02T00:54:04.042+01:00</app:edited><title>Advanced Parsing Strategies</title><content type="html">&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://psycnet.apa.org/journals/abn/112/4/images/thumb_abn_112_4_578_fig5a.jpg" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="200" src="http://psycnet.apa.org/journals/abn/112/4/images/thumb_abn_112_4_578_fig5a.jpg" width="149" /&gt;&lt;/a&gt;&lt;/div&gt;
&amp;nbsp;Getting better compression ratio with LZ algorithms requires more than looking for long matches. It also requires some "parsing".&lt;br /&gt;
&lt;br /&gt;
To explain it simply, one would assume that once he finds a "best possible match" at current position, he just has to encode it and move forward.&lt;br /&gt;
&lt;br /&gt;
But in fact, there are better possibilities. The naive encoding strategy is called&amp;nbsp;"greedy match". A proven better proposition is to check if, at next position, a better (i.e. longer) match exists. If it does, then the current match is dropped in favor of the next one. This is called "lazy match", and typically improves the compression ratio by 1-2%, which is not bad considering that the &amp;nbsp;compression format remains unaffected.&lt;br /&gt;
&lt;br /&gt;
Lazy match is simple enough to understand, and clearly illustrates the trade-off at stake : compute more searches, in order to find better (longer) matches and improve compression ratio.&lt;br /&gt;
&lt;br /&gt;
On the other extreme side, there is a strategy called "Optimal parsing", which consists in computing a serie of matches at each position, and then select the best ones by a reverse traversal computation (like a "shortest path" algorithm). Optimal parsing requires some huge computation, especially at search step, since each and every position must be fully verified with a complete list of match candidates in ascending order.&lt;br /&gt;
&lt;br /&gt;
Not willing to pay too much on the CPU side, i've tried to find a middle way, which would mostly emulate the "optimal parsing" result but with a CPU cost closer to Lazy match.&lt;br /&gt;
&lt;br /&gt;
The main ideas is as follows :&lt;br /&gt;
Suppose i'm getting a match of length &lt;i&gt;ML &lt;/i&gt;at position &lt;b&gt;&lt;i&gt;P&lt;/i&gt;&lt;/b&gt;.&lt;br /&gt;
The only reason i would not want to encode it immediately is if it exists a better match between &lt;b&gt;&lt;i&gt;P&lt;/i&gt;&lt;/b&gt; and &lt;b&gt;&lt;i&gt;P+ML&lt;/i&gt;&lt;/b&gt;.&lt;br /&gt;
We take the assumption that, at position &lt;b&gt;&lt;i&gt;P&lt;/i&gt;&lt;/b&gt;, we have found the best possible match. Then, the smallest possible "better match" must have a length &lt;i&gt;ML+1&lt;/i&gt;&amp;nbsp;and start and position &lt;b&gt;&lt;i&gt;P+1&lt;/i&gt;&lt;/b&gt;.&lt;br /&gt;
Consequently, the "smallest and closest better match" must end at position (P+1) + (ML+1) =&amp;nbsp;&lt;b style="font-style: italic;"&gt;P+ML+2&lt;/b&gt;.&lt;br /&gt;
&lt;br /&gt;
Therefore, should such a better match exist, i'll find it by searching for the sequence stored at position &lt;b&gt;&lt;i&gt;P+ML+2-MINMATCH&lt;/i&gt;&lt;/b&gt;. Moreover, should any other better match exist, it will necessary include the sequence stored at position&amp;nbsp;&lt;i style="font-weight: bold;"&gt;P+ML+2-MINMATCH. &lt;/i&gt;So i will get all of them, and consequently the best of them, by simply searching for this sequence.&lt;br /&gt;
&lt;br /&gt;
Compared to Optimal Parsing, which requires a full search at each position, this is a huge speed improvement.&lt;br /&gt;
&lt;br /&gt;
However, the search is more complex than it sounds. To begin with, the "longest match" may start at any position between&lt;b&gt;&lt;i&gt; P&lt;/i&gt;&lt;/b&gt; and &lt;b&gt;&lt;i&gt;P+ML+2-MINMATCH&lt;/i&gt;&lt;/b&gt;. To find it, it will be necessary to check the match length in both forward and backward direction.&lt;br /&gt;
It means it's not possible to use advanced match finders, such as &lt;a href="http://en.wikipedia.org/wiki/Binary_search_tree" target="_blank"&gt;BST&lt;/a&gt; or &lt;a href="http://code.google.com/p/mmc/" target="_blank"&gt;MMC&lt;/a&gt;, which tend to eliminate unpromising candidates. It's not suitable here,&amp;nbsp;since the total match length may be achieved thanks to backward direction. Therefore, each and every stream which contains the searched sequence must be checked.&lt;br /&gt;
&lt;br /&gt;
In these circumstances, search algorithm is basically limited to Hash Chain, which is only suitable for "short range" searches. So, it's a first limitation.&lt;br /&gt;
&lt;br /&gt;
If no better match is found, i can safely write the current best match, and then start again the search.&lt;br /&gt;
&lt;br /&gt;
If a better match is found, it means we have overlapping matches, with the second one being better than the first.&amp;nbsp;It would be tempting to simply shorten the first match, in order to "make room" for the longer one, and then start again the process, but alas it's more complex than that. This decision will depend on P3.&lt;br /&gt;
&lt;br /&gt;
Let's analysed the situation more thoroughly.&lt;br /&gt;
We have 3 overlapping matches, at positions P1, P2, &amp;amp; P3, with :&lt;br /&gt;
ML3 &amp;gt; ML2 &amp;gt; ML1&lt;br /&gt;
P2 &amp;lt;= E1 (=P1+ML1)&lt;br /&gt;
P3 &amp;lt;= E2 (=P2+ML2)&lt;br /&gt;
&lt;br /&gt;
If P3&amp;lt;E1, then P2 is discarded, P3 becomes P2, and we search a new P3.&lt;br /&gt;
If P3=E1, then P1 can be encoded, P2 is discarded, P3 becomes P1, and we continue the search.&lt;br /&gt;
&lt;br /&gt;
Obviously, the situation is less trivial in the more common situation when P3 &amp;gt; E1.&lt;br /&gt;
&lt;br /&gt;
If P3 &amp;lt; E1 + VerySmallLength, then it's very probable that P2 will not pay for itself. A match costs an offset and a length, this can cost more than a few literals. Dealing with this case is almost equivalent to P3=E1, so we'll consider it closed.&lt;br /&gt;
&lt;br /&gt;
If P3-P2 &amp;gt; ML1, then we know that the 2nd match will be longer than the 1st one. So we can proceed with shortening P1&amp;nbsp;length&amp;nbsp;to ML1=P2-P1. Then we can encode P1, P2 becomes P1, P3 becomes P2, and we continue the search.&lt;br /&gt;
&lt;br /&gt;
So now we are left with the difficult case. We have 3 consecutive matches, the limit between P3 and P2 is not yet settled (we would need P4 to decide), and depending on this decision, it impacts the limit between P1 and P2.&lt;br /&gt;
&lt;br /&gt;
Let's give an example :&lt;br /&gt;
&lt;br /&gt;
P1 = 0&lt;br /&gt;
ML1 = 8&lt;br /&gt;
E1 = P1 + ML1 = 8&lt;br /&gt;
&lt;br /&gt;
P2 = 5&lt;br /&gt;
ML2 = 10&lt;br /&gt;
E2 = P2 + ML2 = 15&lt;br /&gt;
&lt;br /&gt;
P3 = 12&lt;br /&gt;
ML3 = 12&lt;br /&gt;
&lt;br /&gt;
P3-E1 = 4 : This is enough to "pay for P2"&lt;br /&gt;
&lt;br /&gt;
But should i do :&lt;br /&gt;
ML1 = 8 ; ML2 = 4&lt;br /&gt;
or&lt;br /&gt;
ML1 = 5; ML2 = 7 ?&lt;br /&gt;
&lt;br /&gt;
They look equivalent, but they are not, since the entropy difference between 4 and 5 is likely to be higher than the entropy difference between 7 and 8.&lt;br /&gt;
Therefore, if we keep the objective of a total length of 12, the first choice is the best one.&lt;br /&gt;
&lt;br /&gt;
However, it's not yet sure that P3 will effectively starts at 12. It all depends on P4, which is not yet known.&lt;br /&gt;
For example, if P4=E2=15, then obviously P3 disappears, and the better choice is ML1 = 5, ML2 = 10.&lt;br /&gt;
But, if no P4 is found, then P3=12, and the better choice is ML1 = 8, ML2 = 4.&lt;br /&gt;
&lt;br /&gt;
So, if we want to decide now, we have to accept that the best choice is not yet known, and we have to settle with something sub-optimal. As we are already working on an approximation, this should not be considered a big issue, as long as the approximation is "good enough".&lt;br /&gt;
For programming simplification, we'll assume that the second solution is the better one, shortening ML1 as is necessary to make room for P2.&lt;br /&gt;
It means we can now encode P1. P2 becomes P1, P3 becomes P2, and we continue the process.&lt;br /&gt;
&lt;br /&gt;
So here we have an algorithm, which tries to create a "nearly optimal" match distribution, taking local decisions to favor longer matches.&lt;br /&gt;
As a great property, it searches only N+1 list of matches per serie of N consecutive matches, which is a great time saver compared to Optimal Parsing.&lt;br /&gt;
&lt;br /&gt;
Time for some evaluation. How well does it fare ?&lt;br /&gt;
&lt;br /&gt;
Well, compared to the deep lazy match strategy implemented into &lt;a href="http://fastcompression.blogspot.com/p/zhuff.html" target="_blank"&gt;Zhuff &lt;/a&gt;(-c2 mode), surprisingly little. The compression ratio barely improved by about 1%. At least, it's a gain...&lt;br /&gt;
&lt;br /&gt;
There are several reasons to explain this disappointing result.&lt;br /&gt;
&lt;br /&gt;
First, the &lt;a href="http://fastcompression.blogspot.com/2011/05/zhuffmax-parsing-improvements.html" target="_blank"&gt;deep lazy match strategy of Zhuff v0.8 -c2&lt;/a&gt; is already an advanced parsing algorithm. It heavily favors larger matches, and is complemented by a small memory to re-fill skipped space. So it's pretty good actually, which limits remaining potential savings.&lt;br /&gt;
&lt;br /&gt;
But the Deep Lazy match strategy of Zhuff 0.8 requires on average 3xN forwards searches per N match. This is in contrast with the strategy in this post, which only requires N+1 searches, albeit more complex ones (forwards &amp;amp; backward). As a consequence, the new strategy is &lt;i&gt;&lt;b&gt;50% faster &lt;/b&gt;&lt;/i&gt;than the previous deep lazy match one. Now it looks better.&lt;br /&gt;
&lt;br /&gt;
Second, the proposed strategy is only based on maximizing match length, deliberately forgetting any other cost contributor, such as, for example, offset length.&lt;br /&gt;
In a variable offset representation, small offsets are very likely to cost less, if not much less than larger ones. It's still possible to use this property "tactically", by using the smallest known offset able to provide the selected match length, but that's just an opportunistic gain, not an advantage that the selection algorithm takes into consideration.&lt;br /&gt;
Moreover, the full list of candidates can only be built for the first position P1. For the next ones (P2, P3), only a reduced list is likely to be found, since we learn P2 and P3 positions during the backward search. As a consequence, quite frequently, only a single offset is known (the longest one). This further limits the "opportunistic" offset selection gain.&lt;br /&gt;
&lt;br /&gt;
For Zhuff, the offset cost vary from 9 to 22 bits. More realistically, it hovers between 12 and 21 bits. That's more than one byte of difference. Knowing that a literal is also compressed, with an average cost of about 6 bits per literal (depending on source file), then a long offset costs quite more than a small offset + 1 literal.&lt;br /&gt;
&lt;br /&gt;
In fact, to select the better choice, it would be necessary to properly estimate, at each step, the cost of each contributor, offset, match length, literals and flags.&lt;br /&gt;
&lt;br /&gt;
Well, that's the whole point of Optimal Parsing...&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;&lt;img src="http://feeds.feedburner.com/~r/FastDataCompression/~4/sDac07O1kzo" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://fastcompression.blogspot.com/feeds/1258481742661732755/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://fastcompression.blogspot.com/2011/12/advanced-parsing-strategies.html#comment-form" title="1 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/834134852788085492/posts/default/1258481742661732755?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/834134852788085492/posts/default/1258481742661732755?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/FastDataCompression/~3/sDac07O1kzo/advanced-parsing-strategies.html" title="Advanced Parsing Strategies" /><author><name>Yann Collet</name><uri>https://plus.google.com/117014154967737150730</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>1</thr:total><feedburner:origLink>http://fastcompression.blogspot.com/2011/12/advanced-parsing-strategies.html</feedburner:origLink></entry><entry gd:etag="W/&quot;AkIMSXwzeCp7ImA9WhNVFU4.&quot;"><id>tag:blogger.com,1999:blog-834134852788085492.post-5035695301776988234</id><published>2011-12-04T02:25:00.001+01:00</published><updated>2012-12-26T17:16:28.280+01:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2012-12-26T17:16:28.280+01:00</app:edited><title>Fast sequence comparison</title><content type="html">&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;/div&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://ars.els-cdn.com/content/image/1-s2.0-S016747819700170X-gr4.gif" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="88" src="http://ars.els-cdn.com/content/image/1-s2.0-S016747819700170X-gr4.gif" width="200" /&gt;&lt;/a&gt;&lt;/div&gt;
&amp;nbsp;If there is one thing that most Compression algorithms must do well and fast, it is comparing byte sequences. Finding a match depends on it, and finding the best match requires numerous comparisons to be&amp;nbsp;realized.&lt;br /&gt;
&lt;br /&gt;
A straightforward way to achieve this function using C code would look like this :&lt;br /&gt;
&lt;blockquote class="tr_bq"&gt;
&lt;span class="Apple-style-span" style="font-family: 'Courier New',Courier,monospace;"&gt;while (*(bytePos+len) == *(ComparePos+len)) len++;&lt;/span&gt;&lt;/blockquote&gt;
It works well, and is trivial to understand. However, we are giving away a critical property of modern CPU : they can process whole WORD in a single step. For 32 bits CPU, it means that 4 Bytes could be compared in a single instruction. This is expectedly faster than loading and comparing 4 times each single byte.&lt;br /&gt;
&lt;br /&gt;
An optimised comparison function then becomes like this ;&lt;br /&gt;
&lt;blockquote class="tr_bq"&gt;
&lt;span class="Apple-style-span" style="font-family: 'Courier New',Courier,monospace;"&gt;while (*(int32*)(bytePos+len) == *(int32*)(ComparePos+len)) len+=4;&lt;/span&gt;&lt;/blockquote&gt;
&lt;blockquote class="tr_bq"&gt;
&lt;span class="Apple-style-span" style="font-family: 'Courier New',Courier,monospace;"&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New',Courier,monospace;"&gt;if (*(int16*)(bytePos+len) == *(int16*)(ComparePos+len)) len+=2;&lt;/span&gt;&lt;/blockquote&gt;
&lt;blockquote class="tr_bq"&gt;
&lt;span class="Apple-style-span" style="font-family: 'Courier New',Courier,monospace;"&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New',Courier,monospace;"&gt;if (*(bytePos+len) == *(ComparePos+len)) len++;&lt;/span&gt;&lt;/blockquote&gt;
While more complex, this version will provide better performance, especially for longer matches.&amp;nbsp;It has however two drawbacks.&lt;br /&gt;
&lt;br /&gt;
The first problem comes from the very need to compare int32 WORD values. Most modern CPU will have no problem with that, but some, such as ARM, will require these values to be &lt;i&gt;aligned&lt;/i&gt;. This means that the WORD value must start at a boundary which is a multiple of 4. Well, since compression algorithms have to compare sequences which start at arbitrary position, this condition cannot be&amp;nbsp;accommodated. ARM will have to live with the simpler comparison loop.&lt;br /&gt;
&lt;br /&gt;
The second problem comes from the trailing comparisons. On leaving the main loop, since we know that we have less than 4 identical bytes, we still need to check if they are 0, 1, 2 or 3. This can be optimally achieved by using 2 comparisons.&lt;br /&gt;
&lt;br /&gt;
Not bad, but still, it means there is a minimum of 3 comparisons to exit this function, and comparisons aren't good for &lt;a href="http://en.wikipedia.org/wiki/Instruction_pipeline" target="_blank"&gt;CPU pipelines&lt;/a&gt;, especially unpredictable ones. The CPU will &lt;a href="http://en.wikipedia.org/wiki/Branch_prediction" target="_blank"&gt;try to "guess"&lt;/a&gt; the result of these comparisons, in order to keep its pipeline busy by speculatively executing the next instructions. But if it fails, it will have to stall and flush the whole pipeline, resulting in significant performance penalty.&lt;br /&gt;
&lt;br /&gt;
For this very reason, it can be preferable to use mathematical&amp;nbsp;formulas&amp;nbsp;to get the same result, even when they are more complex, since they avoid branching, ensuring a predictable CPU throughput.&lt;br /&gt;
&lt;br /&gt;
In our situation, we want to get rid of the trailing comparisons and end up with a mathematical formula which gives us the number of continuous identical bytes, starting from the first one. We'll consider that we live in a little endian world in the next part of this post, then the first byte becomes the lowest one.&lt;br /&gt;
&lt;br /&gt;
We know that at least one bit is different, since otherwise we would still be in the main loop.&lt;br /&gt;
To find this bit, we will use a simple XOR operation between the compared sequences :&lt;br /&gt;
&lt;br /&gt;
&lt;span class="Apple-style-span" style="font-family: 'Courier New',Courier,monospace;"&gt;difference = *(int32*)bytePos ^*(int32*)comparePos;&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
If (difference==0), then both sequences are identical.&lt;br /&gt;
Since they are not, one bit 1&amp;nbsp;at least&amp;nbsp;exists.&amp;nbsp;Finding how many bytes are identical between the 2 sequences is now equivalent to finding the rank of the smallest bit 1.&lt;br /&gt;
&lt;br /&gt;
A naive strategy to find lowest bit 1 rank would be to test bits recursively, such as :&lt;br /&gt;
&lt;br /&gt;
&lt;span class="Apple-style-span" style="font-family: 'Courier New',Courier,monospace;"&gt;while ((difference&amp;amp;1)==0) { difference&amp;gt;&amp;gt;=1; rank++; }&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
But obviously, this is not going to be our strategy, since we end up with even more comparisons than we started with.&lt;br /&gt;
&lt;br /&gt;
Enters Nicolaas De Bruijn.&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://upload.wikimedia.org/wikipedia/commons/a/af/Nicolaas_de_Bruijn.jpg" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="200" src="http://upload.wikimedia.org/wikipedia/commons/a/af/Nicolaas_de_Bruijn.jpg" width="139" /&gt;&lt;/a&gt;&lt;/div&gt;
The &lt;a href="http://en.wikipedia.org/wiki/De_Bruijn_sequence" target="_blank"&gt;De Bruijn sequence&lt;/a&gt; will help us to transform this problem into a mathematical formula. It states that, given an alphabet A with k elements, there is at least one cyclic sequence C within which any possible sub-sequence of size n using alphabet A exists exactly once. It even provides a methodology to build one.&lt;br /&gt;
Such a cyclic sequence can become terribly large. We are lucky that for computers, A is just a 2 elements alphabet, bits1 &amp;amp; 0. But we still need to manage n.&lt;br /&gt;
&lt;br /&gt;
We'll do so by keeping&amp;nbsp;only&amp;nbsp;the lowest bit 1 from the xor'ed difference vector. It can be achieved thanks to a simple trick :&lt;br /&gt;
&lt;br /&gt;
&lt;span class="Apple-style-span" style="font-family: 'Courier New',Courier,monospace;"&gt;LowestBitOne = difference &amp;amp; -difference;&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
It works because we are in a &lt;a href="http://en.wikipedia.org/wiki/Two%27s_complement" target="_blank"&gt;two's complement&lt;/a&gt; world. To summarize, given a value (i), its negative value (-i) can be obtained by&amp;nbsp;inverting&amp;nbsp;all bits, and then adding 1. As a consequence, all bits will be set to zero (since 1 &amp;amp; 0 = 0) except the last (lowest) bit 1, due to the +1.&lt;br /&gt;
&lt;br /&gt;
Now, only the lowest bit 1 remains,&amp;nbsp;which means we have only 32 possible values to consider, and they are all 2^N.&lt;br /&gt;
&lt;br /&gt;
Thanks to the De Bruijn theorem, we now can map these 32 values into a small table of size 32.&lt;br /&gt;
We will create&amp;nbsp;a DeBruijn bit sequence&amp;nbsp;which maps all values from 0 to 31 into it (n=5). Since multiplying by 2^N is equivalent to left-shifting by N, the analogy with DeBruijn becomes obvious : we want a bit sequence which, when shifted left bit by bit, produces all possible values between 0 and 31 exactly once.&lt;br /&gt;
&lt;a href="http://en.wikipedia.org/wiki/File:De_bruijn_graph-for_binary_sequence_of_order_4.svg" target="_blank"&gt;This image &lt;/a&gt;provides a construction method to build such a bit sequence, based on &lt;a href="http://en.wikipedia.org/wiki/Hamiltonian_path" target="_blank"&gt;Hamiltonian path&lt;/a&gt;. It's a bit complex, so here is one such solution for 5 bits :&lt;br /&gt;
&lt;br /&gt;
&lt;span class="Apple-style-span" style="font-family: 'Courier New',Courier,monospace;"&gt;00000111011111001011010100110001&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
In theory, the sequence is supposed to be cyclic, but since we will only shift left,&amp;nbsp;we fill the higher bits with 0&amp;nbsp;in order to "simulate" cyclic behavior.&lt;br /&gt;
It might not look obvious that all values from 0 to 31 are present in this chain, so you can also try it for yourself to verify it.&lt;br /&gt;
&lt;br /&gt;
Knowing the serie generated by shifting the above De Bruijn sequence is now equivalent to knowing the shift which was used. So&amp;nbsp;it provides the result we are looking for with a simple table lookup :&lt;br /&gt;
&lt;br /&gt;
&lt;span class="Apple-style-span" style="font-family: 'Courier New',Courier,monospace;"&gt;DeBruijnIndex = (0x077CB531 *&amp;nbsp;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New',Courier,monospace;"&gt;LowestBitOne&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New',Courier,monospace;"&gt;) &amp;gt;&amp;gt; 27;&lt;/span&gt;&lt;br /&gt;
&lt;span class="Apple-style-span" style="font-family: 'Courier New',Courier,monospace;"&gt;Result = DeBruijnTable[DeBruijnIndex];&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
And here we are, we have changed 2 comparisons into 3 mathematical operations and one table lookup. Is that a gain ?&lt;br /&gt;
&lt;br /&gt;
Well, according to &lt;a href="http://code.google.com/p/lz4/" target="_blank"&gt;LZ4 &lt;/a&gt;latest benchmark, it seems it is. Gains vary from negligible to measurable, but always positive, so on average it seems a step into the right direction.&lt;br /&gt;
This trick is now integrated into &lt;a href="http://code.google.com/p/lz4/" target="_blank"&gt;LZ4 r41 and later versions&lt;/a&gt;.&lt;br /&gt;
&lt;br /&gt;&lt;img src="http://feeds.feedburner.com/~r/FastDataCompression/~4/nrbSj8M5Jk0" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://fastcompression.blogspot.com/feeds/5035695301776988234/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://fastcompression.blogspot.com/2011/12/fast-sequence-comparison.html#comment-form" title="3 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/834134852788085492/posts/default/5035695301776988234?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/834134852788085492/posts/default/5035695301776988234?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/FastDataCompression/~3/nrbSj8M5Jk0/fast-sequence-comparison.html" title="Fast sequence comparison" /><author><name>Yann Collet</name><uri>https://plus.google.com/117014154967737150730</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>3</thr:total><feedburner:origLink>http://fastcompression.blogspot.com/2011/12/fast-sequence-comparison.html</feedburner:origLink></entry><entry gd:etag="W/&quot;DkUBR3c9cSp7ImA9WhRRFU8.&quot;"><id>tag:blogger.com,1999:blog-834134852788085492.post-3887063303952118541</id><published>2011-11-28T21:19:00.001+01:00</published><updated>2011-11-29T00:50:56.969+01:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-11-29T00:50:56.969+01:00</app:edited><title>Zhuff get upgraded (v0.8)</title><content type="html">&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://sd-1.archive-host.com/membres/images/182754578/ZhuffMax.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="150" src="http://sd-1.archive-host.com/membres/images/182754578/ZhuffMax.png" width="200" /&gt;&lt;/a&gt;&lt;/div&gt;
&amp;nbsp;As a first implementation of the &lt;a href="http://fastcompression.blogspot.com/2011/11/format-for-compressed-streams-part-2.html" target="_blank"&gt;recently proposed compressed stream format&lt;/a&gt;, here comes a new version of &lt;a href="http://fastcompression.blogspot.com/p/zhuff.html" style="font-weight: bold;" target="_blank"&gt;Zhuff, v0.8&lt;/a&gt;. It was indeed my main target, since the previous format was incompatible by design with Pipe mode.&lt;br /&gt;
&lt;br /&gt;
Therefore, this new version comes with all the features recently introduced into &lt;a href="http://fastcompression.blogspot.com/p/lz4.html" target="_blank"&gt;LZ4&lt;/a&gt;, such as :&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;All compression levels into a single binary (this removes the need for separate Zhuff-HC and Zhuff-Max binaries)&lt;/li&gt;
&lt;li&gt;Pipe mode support&lt;/li&gt;
&lt;li&gt;Windows Installer&lt;/li&gt;
&lt;li&gt;Context Menu integration (windows installer version only)&lt;/li&gt;
&lt;li&gt;Directory Compression with shar (windows installer version only)&lt;/li&gt;
&lt;/ul&gt;
&lt;br /&gt;
and some other minor improvements :&lt;br /&gt;
- New benchmark mode with multiple files&lt;br /&gt;
- Overwrite confirmation mode&lt;br /&gt;
- Silent/Verbose modes&lt;br /&gt;
- Pause on exit&lt;br /&gt;
&lt;br /&gt;
&lt;a href="http://fastcompression.blogspot.com/p/zhuff.html" target="_blank"&gt;Zhuff &lt;/a&gt;algorithm itself got a small change from v0.7.&lt;br /&gt;
It introduces an early "entropy estimator", which tries to evaluate if it is worthwhile to compress a sub-stream using entropy coder &lt;a href="http://fastcompression.blogspot.com/p/huff0-range0-entropy-coders.html" target="_blank"&gt;Huff0&lt;/a&gt;. It works pretty well, and save some CPU cycles, however effect remains small, barely noticeable, at about 2-3% more speed.&lt;br /&gt;
Therefore, this version main focus is about bringing more features.&lt;br /&gt;
&lt;br /&gt;
You can &lt;a href="http://fastcompression.blogspot.com/p/zhuff.html" target="_blank"&gt;download it here&lt;/a&gt;.&lt;br /&gt;
&lt;br /&gt;&lt;img src="http://feeds.feedburner.com/~r/FastDataCompression/~4/9ZKtj7Zqg9c" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://fastcompression.blogspot.com/feeds/3887063303952118541/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://fastcompression.blogspot.com/2011/11/zhuff-get-upgraded.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/834134852788085492/posts/default/3887063303952118541?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/834134852788085492/posts/default/3887063303952118541?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/FastDataCompression/~3/9ZKtj7Zqg9c/zhuff-get-upgraded.html" title="Zhuff get upgraded (v0.8)" /><author><name>Yann Collet</name><uri>https://plus.google.com/117014154967737150730</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>0</thr:total><feedburner:origLink>http://fastcompression.blogspot.com/2011/11/zhuff-get-upgraded.html</feedburner:origLink></entry><entry gd:etag="W/&quot;D0QBRno9fyp7ImA9WhRbGEw.&quot;"><id>tag:blogger.com,1999:blog-834134852788085492.post-734816986416137697</id><published>2011-11-24T23:07:00.003+01:00</published><updated>2012-02-09T20:22:37.467+01:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2012-02-09T20:22:37.467+01:00</app:edited><title>A format for compressed streams, part 2</title><content type="html">&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://doc.xceedsoft.com/products/Xceedfilesystem/Sources/CompressedStream.gif" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="57" src="http://doc.xceedsoft.com/products/Xceedfilesystem/Sources/CompressedStream.gif" width="200" /&gt;&lt;/a&gt;&lt;/div&gt;
&amp;nbsp;Following my &lt;a href="http://fastcompression.blogspot.com/2011/11/format-for-compressed-file.html" target="_blank"&gt;recent post&lt;/a&gt;, here is what i could come up with after a few minutes of thinking : 2 bytes might be enough for my need.&lt;br /&gt;
&lt;br /&gt;
Well, mostly. To begin with, a magic number is still necessary to distinguish a valid stream from random garbage. 4 bytes seems the standard way to go for that role. By using an unusual bit layout, it should make stream confusion pretty rare (almost 1 in 2^32= 4 billions if correctly selected).&lt;br /&gt;
&lt;br /&gt;
Since a stream identity must lead to a compatible compression decoder, this gave me the idea of merging decoder version with magic number. The more the number of supported versions, the less efficient is the 4 bytes header at distinguishing "garbage" streams. But well, since we start at 32 bits, even a reduced identifier length (28-30 bits) should do its job properly.&lt;br /&gt;
&lt;br /&gt;
The next thing we need is a header version. This is different from decoder version, since we may add new properties to the stream format while not modifying the compression decoder itself. It's impossible to guess how many versions will be developed in the future, so let's settle with 2 bits for now (4 values) and reserve a 3rd bit just in case. Even if we run out of version number, it will be possible to use one of the last values to add a few bytes to the header, where version number can be extended.&lt;br /&gt;
&lt;br /&gt;
Next thing, we need to retrieve the size of the (uncompressed) stream. Well, if it can be known. That's not so sure by the way, since in a pipe scenario, there is no way to "probe" for the size of the input stream.&lt;br /&gt;
So there is a bit to tell if the stream size is provided into the header. By the way, how much bytes should be used for this size ? 8 Bytes is all around (that's 16 &lt;a href="http://en.wikipedia.org/wiki/Exabyte"&gt;Exabytes&lt;/a&gt;, no less), but maybe too much in most circumstances, where 4-bytes (4 Gigabytes) is good enough. So i'll use 2 bits to distinguish between "No Size provided", 8-Bytes size and 4-Bytes size. While at it, since&amp;nbsp;I've&amp;nbsp;one more available&amp;nbsp;value, i'll open the possibility for 2-Bytes size (64KB).&lt;br /&gt;
&lt;br /&gt;
OK, so now, maybe the stream is cut into multiple&amp;nbsp;independent&amp;nbsp;pieces, called "chunks", for easier multi-threading and random access ? I need another bit to tell.&lt;br /&gt;
&lt;br /&gt;
If there are several "chunks", what are their size ?&lt;br /&gt;
For simplicity, in this first version, i will only allow identical  "uncompressed" chunk sizes into a single stream. But which one ?&lt;br /&gt;
Although&amp;nbsp;I've&amp;nbsp;got my own idea for file-compression scenario with&amp;nbsp;&lt;a href="http://fastcompression.blogspot.com/p/zhuff.html"&gt;Zhuff &lt;/a&gt;compression, maybe some users will have different needs. The smaller the chunk size, the worse the compression ratio, but the better for memory usage and random access. So, in order to open the stream container format to new usage scenarios, i'm willing to take a risk, and make chunk size selectable.&lt;br /&gt;
What values should be&amp;nbsp;authorized&amp;nbsp;?&lt;br /&gt;
For a first version, a sensible simplification is to only allow sizes which are power of 2, so a few bits will be enough.&lt;br /&gt;
Anyway, a "chunk size" should not be too small, otherwise compression ratio can be so bad that it is hardly effective. So let's say that the minimum chunk size shall be 1KB.&lt;br /&gt;
&lt;br /&gt;
OK, so if we use "chunk size = 0" to tell that there is no chunk (single continuous stream), in 4 bits, it is possible to represent any 2^n value from 1KB to 16MB, which seems enough for my needs. Let's keep a 5th bit in reserve, just in case new usages require larger chunks.&lt;br /&gt;
&lt;br /&gt;
Fine,&amp;nbsp;I've&amp;nbsp;got my main stream parameters. I still need some bits for other planned functions, such as the presence of an offset table for random access, header and data chunk checksums, and alignment rules. But there are still enough bits for that within a 2 bytes option header.&lt;br /&gt;
&lt;br /&gt;
So, it gives :&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;4 Bytes&lt;/b&gt; : Base Magic Number + Decoder Version&lt;br /&gt;
&lt;b&gt;2 Bytes&lt;/b&gt; : Options (detailed)&lt;br /&gt;
- bits 0-1 : Header Version&lt;br /&gt;
- bit 2 : Reserved (extended version number) (must be zero)&lt;br /&gt;
- bits 3-6 : Chunk Size (or no chunk)&lt;br /&gt;
- bit 7 : Reserved (extended chunk size)&amp;nbsp;(must be zero)&lt;br /&gt;
- bits 8-9 : Stream Uncompressed Size format (0, 2, 4 or 8 Bytes)&lt;br /&gt;
- bits 10-15 : Reserved (Offset Table, Checksums, Alignment) (must be zero)&lt;br /&gt;
&lt;b&gt;0, 2, 4 or 8&lt;/b&gt; &lt;b&gt;Bytes &lt;/b&gt;: Stream Uncompressed Size&lt;br /&gt;
&lt;br /&gt;
Then repeat N Times (with N = Number of chunks)&lt;br /&gt;
&lt;b&gt;4 &lt;/b&gt;&lt;b&gt;Bytes&amp;nbsp;&lt;/b&gt;: Compressed Chunk Size&lt;br /&gt;
&lt;b&gt;Data Chunk&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
So far so good. This format even allows me to naturally handle non-compressed chunks : if "compressed chunk size = uncompressed chunk size", then obviously chunk is uncompressed.&lt;br /&gt;
&lt;br /&gt;
But a non trivial difficulty emerges : should the Stream Uncompressed Size be unknown,&amp;nbsp;what happens for&amp;nbsp; the last chunk&amp;nbsp;?&lt;br /&gt;
Well, in this situation, the last chunk's uncompressed size is unknown.&lt;br /&gt;
This does not happen if we know Stream Uncompressed Size : we can automatically calculate the number of chunks, and the uncompressed size of the last chunk.&lt;br /&gt;
&lt;br /&gt;
To solve this situation, i will "flag" the last chunk. Let's use the highest bit for the flag. In this case, "compressed chunk size" will be followed by a 4-Bytes value, which is the "uncompressed chunk size". Problem solved.&lt;br /&gt;
&lt;br /&gt;
[&lt;i&gt;Edit&lt;/i&gt;] &lt;br /&gt;
Another trick regarding the last chunk would be to detect its end simply because we reach the end of input (EOF marker). Then, no flag would be needed. The "compressed size" could also be easily calculated, since the end of the compressed stream is known. It would save a few bytes.&lt;br /&gt;
&lt;br /&gt;
The problem is, it only works if the stream is the only data into the file or the pipe. Should it be followed by any other data, this detection method would not work.&lt;br /&gt;
&lt;br /&gt;
Could that happen ? Surely. For instance, multiple streams are stored into a single archive file. A potential way to solve the problem is then to require the "higher" format to know the size of each compressed streams. It would then be in charge to tell it to the stream decoder. &lt;br /&gt;
Such capability seems common for an encapsulating format, but i feel nonetheless uneasy with the requirement...&lt;img src="http://feeds.feedburner.com/~r/FastDataCompression/~4/nHX_mz-qg4k" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://fastcompression.blogspot.com/feeds/734816986416137697/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://fastcompression.blogspot.com/2011/11/format-for-compressed-streams-part-2.html#comment-form" title="4 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/834134852788085492/posts/default/734816986416137697?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/834134852788085492/posts/default/734816986416137697?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/FastDataCompression/~3/nHX_mz-qg4k/format-for-compressed-streams-part-2.html" title="A format for compressed streams, part 2" /><author><name>Yann Collet</name><uri>https://plus.google.com/117014154967737150730</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>4</thr:total><feedburner:origLink>http://fastcompression.blogspot.com/2011/11/format-for-compressed-streams-part-2.html</feedburner:origLink></entry><entry gd:etag="W/&quot;D0MAR34zcCp7ImA9WhRSF0k.&quot;"><id>tag:blogger.com,1999:blog-834134852788085492.post-4752803209647698365</id><published>2011-11-18T22:41:00.001+01:00</published><updated>2011-11-20T00:30:46.088+01:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-11-20T00:30:46.088+01:00</app:edited><title>A format for compressed streams</title><content type="html">&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://sd-1.archive-host.com/membres/images/182754578/aviContainerS.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://sd-1.archive-host.com/membres/images/182754578/aviContainerS.png" /&gt;&lt;/a&gt;&lt;/div&gt;
&amp;nbsp;I've been pretty careful at designing compression format &amp;amp; algorithm, but much less detailed when it comes to compressed stream format. A good example of this situation is given in &lt;a href="http://fastcompression.blogspot.com/2011/05/lz4-explained.html"&gt;this post&lt;/a&gt;, where &lt;a href="http://fastcompression.blogspot.com/2011/05/lz4-explained.html"&gt;LZ4 compressed format&lt;/a&gt; is precisely defined, but not a single word is&amp;nbsp;mentioned&amp;nbsp;regarding the stream format generated by&amp;nbsp;&lt;a href="http://fastcompression.blogspot.com/p/lz4.html"&gt;LZ4.exe&lt;/a&gt;.&lt;br /&gt;
&lt;br /&gt;
But there is a difference. The stream format has more&amp;nbsp;responsibilities&amp;nbsp;than "only" compression.&lt;br /&gt;
For a simple example, a stream typically starts with a "Magic Number", which hints the decoder that the stream being decoded is most probably a valid one : if the magic number does not match, then what follows will be nonsense to the decoder. It's one of many ways to avoid some basic error, such as feeding the decoder with random data.&lt;br /&gt;
&lt;br /&gt;
Format definitions are very important, since once settled, they are keys to create compatible applications. Therefore, once published, a format should either not change, or guarantee that any future change will not break existing applications. This is called forward and backward compatibility, and is not trivial.&lt;br /&gt;
Since a great deal of attention must be pushed into this definition, instead of rushing one, i want to underline in this post what are the objectives of the format, and how they are&amp;nbsp;achieved,&amp;nbsp;in an attempt to both clarify my mind and get comments.&lt;br /&gt;
&lt;br /&gt;
First definition : what is a stream ?&lt;br /&gt;
At its most basic level, a file satisfy the definition of a stream. These are ordered bytes, with a beginning and an end.&lt;br /&gt;
File is in fact a special stream with added properties. A file size can to be known beforehand. And in most circumstances, it is likely to be stored into a seekable media, such as an HDD.&lt;br /&gt;
But a stream is not limited to a file. It can also be a bunch of files grouped together (tar), or some dynamically generated data. It may be read from a streaming media, such as pipe, with no seek capability (no way to "jump" forward or backward) and no prior knowledge of its size : we'll learn that the stream is finished on hitting its end.&lt;br /&gt;
&lt;br /&gt;
OK, so now we are getting into the "hard" part : what are the missions of the Compressed Stream Format ?&lt;br /&gt;
Here is a list, from the top of my mind, sorted in priority order :&lt;br /&gt;
&lt;br /&gt;
1) Be compatible with both streaming and seekable media, in both directions (compression and decompression)&lt;br /&gt;
2) Detect valid streams from invalid ones&lt;br /&gt;
3) Designed for backward &amp;amp; forward compatibility&lt;br /&gt;
4) Control data validity (e.g. checksum)&lt;br /&gt;
5) Optionally slice the stream into multiple&amp;nbsp;independent&amp;nbsp;"chunks", for multi-threading purpose&lt;br /&gt;
6) Offer user control over chunk size&lt;br /&gt;
7) Allow to quick-jump to a specific location into the stream (&lt;i&gt;seekable media only&lt;/i&gt;)&lt;br /&gt;
8) Provide a way to correct transmission errors, or at least retrieve as much data as possible from a transmission error&lt;br /&gt;
9) Enforce alignment rules&lt;br /&gt;
&lt;br /&gt;
Options 1 to 5 seem really compulsory, while beyond that point, they may be questionable.&lt;br /&gt;
&lt;br /&gt;
There are other missions which i'm still unsure if they should join the list.&lt;br /&gt;
For example, should the stream format embed a definition of the compression algorithm ?&lt;br /&gt;
That's not what i'm doing currently : the "magic number" is also associated to a specific family of compatible decoders, and therefore already performs that role.&lt;br /&gt;
&lt;br /&gt;
Should it allow some user comments ?&lt;br /&gt;
Here, i'm expecting this feature to rather&amp;nbsp;be&amp;nbsp;part of an Archive format.&lt;br /&gt;
&lt;br /&gt;
What about file names ? Should there be a place for them into the format ?&lt;br /&gt;
In my opinion, this is exactly the role of an Archive format, listing, sorting, placing files names and directory structure. Furthermore, embedding a name into a compressed stream format disregards the fact that some streams are not single files.&lt;img src="http://feeds.feedburner.com/~r/FastDataCompression/~4/QdF7z8DAsPo" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://fastcompression.blogspot.com/feeds/4752803209647698365/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://fastcompression.blogspot.com/2011/11/format-for-compressed-file.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/834134852788085492/posts/default/4752803209647698365?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/834134852788085492/posts/default/4752803209647698365?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/FastDataCompression/~3/QdF7z8DAsPo/format-for-compressed-file.html" title="A format for compressed streams" /><author><name>Yann Collet</name><uri>https://plus.google.com/117014154967737150730</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>0</thr:total><feedburner:origLink>http://fastcompression.blogspot.com/2011/11/format-for-compressed-file.html</feedburner:origLink></entry><entry gd:etag="W/&quot;CkQFQX8_eip7ImA9WhRSE0w.&quot;"><id>tag:blogger.com,1999:blog-834134852788085492.post-2198797850829117155</id><published>2011-10-22T00:30:00.002+02:00</published><updated>2011-11-14T23:38:30.142+01:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-11-14T23:38:30.142+01:00</app:edited><title>Progressive Hash Series : a new method to find repetitions</title><content type="html">&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://www.mr-think.com/res/default/180pxCuckoo.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="200" src="http://www.mr-think.com/res/default/180pxCuckoo.png" width="85" /&gt;&lt;/a&gt;&lt;/div&gt;
&amp;nbsp;I've been investigating an old idea which was occupying my mind, in order to create a fast approximative match finder for long range searches. Although it took some time to "think" the problem correctly, in the end, the solution looks surprisingly simple.&lt;br /&gt;
&lt;br /&gt;
The idea starts from a familiar structure, a Hash Table.&lt;br /&gt;
In order to create a fast look-up algorithm, it is just needed to hash the "&lt;i&gt;minmatch&lt;/i&gt;" initial sequence, and store the position into the table. Later on, when checking for the same sequence, we'll find it in its cell.&lt;br /&gt;
&lt;br /&gt;
OK, that's fine, but even without collisions, it does only provide us with a single answer, which is the closest sequence starting with &lt;i&gt;minmatch &lt;/i&gt;bytes. But maybe, somewhere else farther, there is a better, i.e. longer match.&lt;br /&gt;
&lt;br /&gt;
Looking at these other possibilities can be done in several ways. A simple method consists in linking all positions sharing the same hash value into a list. It's called &lt;a href="http://en.wikipedia.org/wiki/Hash_chain"&gt;Hash Chain&lt;/a&gt;, and it works fairly well. But if the sequence is really redundant, the number of positions searched can become terribly high, and with increased depth of search, it becomes prohibitive.&lt;br /&gt;
&lt;br /&gt;
Hence the simple idea : in order to avoid waiting forever, the search is stopped after a number of attempts. The higher the number, the better the result. The lower the number, the faster the search. This is the trade-off.&lt;br /&gt;
Basically, it means that for large distance search, there is no shame in making a "partial search" in order to achieve acceptable speed.&lt;br /&gt;
&lt;br /&gt;
OK, so since the number of searches can be&amp;nbsp;arbitrarily&amp;nbsp;limited, why not storing them in a table in the first place ? The row number is given by the hash, and all corresponding positions are orderly stored into the row. This is much faster to run, since there are less memory jumps, and easier to setup.&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://www.ross.net/compression/binary/p035_03_drac_lee5.jpg" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://www.ross.net/compression/binary/p035_03_drac_lee5.jpg" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;a href="http://www.blogger.com/"&gt;&lt;/a&gt;&lt;span id="goog_1822180919"&gt;&lt;/span&gt;&lt;span id="goog_1822180920"&gt;&lt;/span&gt;&lt;br /&gt;
This method is not so new, and has been especially&amp;nbsp;championed&amp;nbsp;in the early 90's by &lt;a href="http://www.ross.net/compression/"&gt;Ross Williams&lt;/a&gt;, for its compressor LZRW-3a. Therefore we'll call it LZRW.&lt;br /&gt;
LZRW structure has in my opinion 2 advantages : one is controlled speed, through the size of rows (=Nb of attempts), and the other one is controlled memory, through the selection of table size.&lt;br /&gt;
&lt;br /&gt;
This latest property is critical : most "full search" methods require a huge amount of memory to work properly. By "huge", i mean something in the range of 10x (with the exception of Hash Chains, but they are too slow for long distance searches anyway). So you want to search within a 256MB&amp;nbsp;dictionary&amp;nbsp;? You need more than 2.5GB of memory. Farewell 32 bits systems.&lt;br /&gt;
&lt;br /&gt;
One has to wonder : is it the right method ? Such amount of memory is simply&amp;nbsp;prohibitive. Maybe by accepting a "less perfect" search but using less memory, we may nonetheless achieve a better compression rate thanks to the use of longer search distances. This is a position defended by Bulat Ziganshin, for its compressor &lt;a href="http://www.freearc.org/"&gt;FreeArc&lt;/a&gt;. For this scenario, LZRW comes as an obvious candidate : you can for example setup a 256MB search table, and use it to look into a 1GB dictionary. Now, long distance searches look affordable !&lt;br /&gt;
&lt;br /&gt;
OK, so LZRW works, but there is no miracle : the longer the search, the more time it costs. In the end, the return on investment can be quite low, and with large number of searches (say for example 256), the search becomes so slow as becoming unpractical.&lt;br /&gt;
This is to be expected, all that is guaranteed by the table is that all elements share the same row, hence the same Hash Value. But that's it. So after finding an element with &lt;i&gt;minmatch &lt;/i&gt;common bytes, we'll test another, and another, and so on. This is wasteful. What we want after finding &lt;i&gt;minmatch &lt;/i&gt;bytes is to find another position with at least &lt;i&gt;minmatch+1&lt;/i&gt; common bytes.&lt;br /&gt;
&lt;br /&gt;
In order to avoid testing each and every position, several methods are possible. One is to build a Binary Tree on top of the row. It is very efficient. A particularly good implementation of this idea was made by Christian Martelock for its RZM compressor (now discontinued).&amp;nbsp;Unfortunately, it also means even more memory, consumed for the BT structure. And in the end, it is not so much different from a full BT structure over the entire search window, except that we lose full search capability.&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://sd-1.archive-host.com/membres/images/182754578/grass200.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://sd-1.archive-host.com/membres/images/182754578/grass200.png" /&gt;&lt;/a&gt;&lt;/div&gt;
OK i'm looking for something simpler. After checking for a &lt;i&gt;minmatch &lt;/i&gt;candidate, i want to look for a &lt;i&gt;minmatch+1&lt;/i&gt; one, straight away. No search nor wandering.&lt;br /&gt;
This can be done quite simply : just hash the "&lt;i&gt;minmatch+1&lt;/i&gt;" sequence, and store its position directly into an hash Table.&lt;br /&gt;
OK, so there are several tables you say, one per sequence size ?&lt;br /&gt;
Well, why ? No, a single one.&lt;br /&gt;
&lt;br /&gt;
Just share the Hash Table among all the sequences. The simple idea here, is that a given position should be stored only once in the table, either with the hash of its "&lt;i&gt;minmatch&lt;/i&gt;" first bytes, or "&lt;i&gt;minmatch +1&lt;/i&gt;", or "&lt;i&gt;minmatch+2&lt;/i&gt;", well whatever.&lt;br /&gt;
So, which sequence size should be stored ? Well, just start with the "&lt;i&gt;minmatch&lt;/i&gt;" one. When a new equivalent sequence gets into the same table cell, we just move the old position at its "&lt;i&gt;minmatch+1&lt;/i&gt;" hash, replacing its previous slot with the new position. And next time a&amp;nbsp;"&lt;i&gt;minmatch+1&lt;/i&gt;" sequence is found, we move the old sequence to "&lt;i&gt;minmatch+2&lt;/i&gt;" and so on. So the position will move into the table, up to the point where it is considered "dead", either off limit, or because of elimination rule.&lt;br /&gt;
&lt;br /&gt;
We obviously have to take into consideration the natural&amp;nbsp;occurrence&amp;nbsp;of collisions into this search structure. And it can happen between any sequence size. For example, a newer "&lt;i&gt;minmatch&lt;/i&gt;" sequence could be willing to occupy the same slot as a different "&lt;i&gt;minmatch+2&lt;/i&gt;" sequence. Well, one has to move on, or be eliminated.&lt;br /&gt;
That's where different rules can be applied. The most basic one is that the youngest position always win. It's good enough, and i recommend it. But it can also be mixed with some more complex heuristic, to avoid losing long-distance anchors&amp;nbsp;too fast. Keeping in mind that this structure is created precisely to afford a partial search over a long distance, when there is not enough memory to keep one entry per position.&lt;br /&gt;
&lt;br /&gt;
So we've got our basic idea here, a cross-over between &lt;a href="http://fastcompression.blogspot.com/2010/11/cuckoo-hashing.html"&gt;cuckoo hashing&lt;/a&gt; and progressively longer hashed sequence. Hence the proposed name, &lt;b&gt;&lt;i&gt;Progressive Hash Series&lt;/i&gt;&lt;/b&gt;.&lt;br /&gt;
&lt;br /&gt;
Now is time to put this idea into practice. I've been creating a simple long-range match finder based on greedy matching strategy. It's a simple and fast way to compare above methods. The file tested is &lt;a href="http://mattmahoney.net/dc/textdata.html"&gt;&lt;b&gt;enwik9&lt;/b&gt;&lt;/a&gt;, used into &lt;a href="http://mattmahoney.net/dc/text.html"&gt;Matt's Mahoney LTCB&lt;/a&gt;. This file is very redundant, and particularly intensive for the match finder. LZRW and PHS are compared on their capacity to find longer matches (which translates into better compression), and on their respective speeds. Here are the results :&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://sd-1.archive-host.com/membres/images/182754578/ComparisonLZRW.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://sd-1.archive-host.com/membres/images/182754578/ComparisonLZRW.png" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
&lt;br /&gt;
So what do we learn from these figures ?&lt;br /&gt;
First, PHS looks more efficient than LZRW. For an identical amount of memory, it converges faster towards optimal match length. This is particularly telling for the 4-attempts version, which is about has efficient as a 80-attempts LZRW.&lt;br /&gt;
&lt;br /&gt;
Note however that for a given number of attempts, PHS is more complex, and requires more time. For example, the 4-attempts PHS is approximately the same speed as a 14-attempts LZRW. So, in the end, we've got the compression power of a 80-attempts LZRW for the cost of a 14-attempts one. It is still a gain.&lt;br /&gt;
&lt;br /&gt;
This is my first review of this new method, and i guess we have only scratched the surface of it. Further refinements are to be expected ahead.&lt;br /&gt;
I have not found yet this idea described somewhere on the net. It is not even mentioned in the &lt;a href="http://cbloomrants.blogspot.com/2011/09/09-30-11-string-match-results-part-5.html"&gt;thorough Match Finder review by Charles Bloom&lt;/a&gt;. So, maybe, after all, this idea might be new.&lt;br /&gt;
&lt;br /&gt;
[&lt;i&gt;Edit&lt;/i&gt;] As specified by Charles Bloom, merging several hashes of different lengthes into the same table is not new (see comments). However, the update mechanism presented here might be.&lt;img src="http://feeds.feedburner.com/~r/FastDataCompression/~4/RfNJoI11sNk" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://fastcompression.blogspot.com/feeds/2198797850829117155/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://fastcompression.blogspot.com/2011/10/progressive-hash-series-new-method-to.html#comment-form" title="6 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/834134852788085492/posts/default/2198797850829117155?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/834134852788085492/posts/default/2198797850829117155?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/FastDataCompression/~3/RfNJoI11sNk/progressive-hash-series-new-method-to.html" title="Progressive Hash Series : a new method to find repetitions" /><author><name>Yann Collet</name><uri>https://plus.google.com/117014154967737150730</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>6</thr:total><feedburner:origLink>http://fastcompression.blogspot.com/2011/10/progressive-hash-series-new-method-to.html</feedburner:origLink></entry><entry gd:etag="W/&quot;DUIMQnk_fyp7ImA9WhRbGEo.&quot;"><id>tag:blogger.com,1999:blog-834134852788085492.post-5635971988533746422</id><published>2011-10-02T23:53:00.001+02:00</published><updated>2012-02-10T13:39:43.747+01:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2012-02-10T13:39:43.747+01:00</app:edited><title>LZ4 in commercial application</title><content type="html">&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://www.console-arcade.com/wp-content/uploads/2011/10/1000-Tiny-Claws.jpg" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="78" src="http://www.console-arcade.com/wp-content/uploads/2011/10/1000-Tiny-Claws.jpg" width="200" /&gt;&lt;/a&gt;&lt;/div&gt;
&amp;nbsp;Time for great news. Although&amp;nbsp;i've been informed of a few open-source projects or commercial application working to integrate &lt;a href="http://fastcompression.blogspot.com/p/lz4.html"&gt;LZ4 &lt;/a&gt;into their production, this is actually the first time that a company has completed a product with it. A product that you can buy by the way.&amp;nbsp;Moreover, it's kind enough to tell publicly about it.&lt;br /&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
LZ4 is in-use within a video game called &lt;a href="http://blog.eu.playstation.com/2011/09/15/1000-tiny-claws-coming-to-ps-minis-this-october/"&gt;&lt;b&gt;1000 Tiny Claws&lt;/b&gt;&lt;/a&gt;, by &lt;a href="http://www.mediatonic.co.uk/about/"&gt;Mediatonic&lt;/a&gt;. The company is young but certainly not new, and has already received several good mentions for some of its previous productions, namely "&lt;a href="http://www.destructoid.com/review-who-s-that-flying--187822.phtml"&gt;Who's that Flying&lt;/a&gt;", and "&lt;a href="http://uk.psp.ign.com/articles/108/1085867p1.html"&gt;Monsters (probably) stole my princess&lt;/a&gt;" (i really love that title :). Therefore their new game stirs a lot of expectation. Let's wish them great success with their new sky-pirates adventures !&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://farm7.static.flickr.com/6200/6143751627_9b844c6577.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="150" src="http://farm7.static.flickr.com/6200/6143751627_9b844c6577.jpg" width="200" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
Within the game, LZ4 is used&amp;nbsp;mostly&amp;nbsp;to decompress resources. These are many sprites, background tiles, animations and graphics in "cartoon style". Resources are not limited to graphics, and some text files, profiles and models may also join the fray. But typically graphics tend to inflate memory budget quite much faster than the others.&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://www.psnstores.com/wp-content/uploads/2011/09/resize.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="106" src="http://www.psnstores.com/wp-content/uploads/2011/09/resize.jpg" width="200" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
For this use case, LZ4 can be useful thanks to its very fast decoding speed, making the decoding not just painless, but advantageous (i.e. faster !) compared to using resource&amp;nbsp;in plain uncompressed mode. The decoder is also memory-less, which also helps.&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://farm7.static.flickr.com/6184/6144302838_d2dcf7f23c.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="150" src="http://farm7.static.flickr.com/6184/6144302838_d2dcf7f23c.jpg" width="200" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
The compression part of LZ4 is not used within the game, but obviously, it has been necessary to compress resources before injecting them into the game. For this use, the &lt;a href="http://code.google.com/p/lz4hc/"&gt;high compression version LZ4-HC&lt;/a&gt; has been put to the task. Not only does it compress 20-30% better, saving space and bandwidth, it also makes the compressed data even easier to decode. So this is all gains for this use case.&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://sd-1.archive-host.com/membres/images/182754578/1000tc_credits.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="42" src="http://sd-1.archive-host.com/membres/images/182754578/1000tc_credits.png" width="200" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;i&gt;and of course, the all-important credit line...&lt;/i&gt;&lt;/div&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;i&gt;1000TC engine is work from &lt;b&gt;Gabor Dorka&lt;/b&gt;&lt;/i&gt;&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
Note that, at the time the game was created, LZ4-HC was GPL, but there is no problem with that, since LZ4-HC is not shipped within the game. Only the decoder is, which is BSD, and incurs no restriction.&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
The game is scheduled October 4th, on the PSN. Now is time to frag some monsters...&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/FastDataCompression/~4/3pPHQn73zXQ" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://fastcompression.blogspot.com/feeds/5635971988533746422/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://fastcompression.blogspot.com/2011/10/lz4-in-commercial-application.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/834134852788085492/posts/default/5635971988533746422?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/834134852788085492/posts/default/5635971988533746422?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/FastDataCompression/~3/3pPHQn73zXQ/lz4-in-commercial-application.html" title="LZ4 in commercial application" /><author><name>Yann Collet</name><uri>https://plus.google.com/117014154967737150730</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://farm7.static.flickr.com/6200/6143751627_9b844c6577_t.jpg" height="72" width="72" /><thr:total>0</thr:total><feedburner:origLink>http://fastcompression.blogspot.com/2011/10/lz4-in-commercial-application.html</feedburner:origLink></entry><entry gd:etag="W/&quot;DU4MQnc7eyp7ImA9WhdVFUg.&quot;"><id>tag:blogger.com,1999:blog-834134852788085492.post-3270917541354449053</id><published>2011-09-20T00:09:00.000+02:00</published><updated>2011-09-21T00:33:03.903+02:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-09-21T00:33:03.903+02:00</app:edited><title>Cost of accessing variables</title><content type="html">&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://www.cs.cf.ac.uk/Dave/C/solaris_thr.gif" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="130" src="http://www.cs.cf.ac.uk/Dave/C/solaris_thr.gif" width="200" /&gt;&lt;/a&gt;&lt;/div&gt;
&amp;nbsp;I finally found a way to make the -c1 compression mode of LZ4 compatible with multi-threading.&lt;br /&gt;
&lt;br /&gt;
This mode comes from the earlier LZ4-HC (v0.9) branch, which was created long ago, back in these days when none of my algorithms was multi-thread ready. The only objective was performance, and single thread execution. Heavy use of global variables were part of the deal to reach better speed.&lt;br /&gt;
&lt;br /&gt;
This lesson proved right once again.&lt;br /&gt;
&lt;br /&gt;
The first stage of enabling multi-threading is to get rid of all global and static variables. They simply cannot be shared by several threads. At least not easily.&lt;br /&gt;
To do this, you simply encapsulate all necessary variables in a container (a pointer towards a structure), and then pass the container as a parameter of the function. This way, each thread has its own set of data to handle.&lt;br /&gt;
&lt;br /&gt;
From a code perspective, the changes are very little : just make all prior variables points towards their equivalent in the structure, and there you go. This way, it is&amp;nbsp;surprisingly&amp;nbsp;fast and simple to make a code multi-thread compatible.&lt;br /&gt;
&lt;br /&gt;
But there was a catch : speed instantly dropped by 20%. This is a lot, and can be&amp;nbsp;easily&amp;nbsp;measured with benchmark tools.&lt;br /&gt;
&lt;br /&gt;
In a bid to solve this issue, i went back once again at the "global variable" effect. The code remained identical, so the change from Global Variable to Container was the only reason for the 20% difference drop. Why ? This is the complex part.&lt;br /&gt;
&lt;br /&gt;
In fact, when you call a variable, using for example :&lt;br /&gt;
&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;mytable[number] = 1;&lt;/span&gt;&lt;br /&gt;
&lt;span class="Apple-style-span"&gt;the compiler will generate very different ASM codes depending on the declaration of&amp;nbsp;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;mytable&lt;/span&gt;&lt;span class="Apple-style-span"&gt;.&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;span class="Apple-style-span"&gt;If&amp;nbsp;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;mytable&amp;nbsp;&lt;/span&gt;&lt;span class="Apple-style-span"&gt;is declared as a global or static variable, the compiler knows exactly where stands the beginning of&amp;nbsp;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;mytable&amp;nbsp;&lt;/span&gt;&lt;span class="Apple-style-span"&gt;in the heap. Then it knows the size of the cells. Therefore, it only needs to generate :&lt;/span&gt;&lt;br /&gt;
&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;mytable-baseAddress + number x&amp;nbsp;mytable-cellSize&lt;/span&gt;&lt;br /&gt;
to reach the proper memory address.&lt;br /&gt;
&lt;br /&gt;
&lt;span class="Apple-style-span"&gt;Now, if&amp;nbsp;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;mytable&amp;nbsp;&lt;/span&gt;&lt;span class="Apple-style-span"&gt;is passed as a parameter within a container, it is all different. The first thing to do is to create a pointer named &lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;mytable&amp;nbsp;&lt;/span&gt;and which points towards the beginning of the table within the container.&amp;nbsp;&lt;/span&gt;&lt;br /&gt;
Now there are 2 possible situations :&lt;br /&gt;
1)&amp;nbsp;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;mytable&lt;/span&gt;&amp;nbsp;base&amp;nbsp;Address&amp;nbsp;is stored into a function variable (temporary one, into the stack). It needs to be read to collect&amp;nbsp;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;mytable&lt;/span&gt;&amp;nbsp;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;-baseAddress,&lt;/span&gt;&amp;nbsp;and only&amp;nbsp;then&amp;nbsp;can the addition with&amp;nbsp;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;number x&amp;nbsp;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;mytable&lt;/span&gt;&amp;nbsp;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;-cellSize&amp;nbsp;&lt;/span&gt;happen.&amp;nbsp;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;table-cellSize&amp;nbsp;&lt;/span&gt;can be deduced from the pointer type.&lt;br /&gt;
This generates one more read for each call.&lt;br /&gt;
2) the compiler remembers the association. Therefore,&amp;nbsp;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;mytable&lt;/span&gt;&amp;nbsp;will not be stored as a variable into the stack, but dynamically translated into &lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;container-baseAddress + table-baseDelta&lt;/span&gt;.&lt;br /&gt;
&lt;span class="Apple-style-span"&gt;I doubt this is any better than the previous proposition.&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
In order to avoid such a situation, one solution is to declare some of these variables as local ones. They will be created into the stack. Then, the access cost is almost similar to global/static variables : a local variable is accessed as base local stack address&amp;nbsp;plus known delta. This is one more addition operation than for global/static variables, but that's ok, it is almost the same performance.&lt;br /&gt;
(Note : a post was created discussing this effect earlier this year :&amp;nbsp;&lt;a href="http://fastcompression.blogspot.com/2011/03/multi-threaded-entropy-experiment-huff0.html"&gt;http://fastcompression.blogspot.com/2011/03/multi-threaded-entropy-experiment-huff0.html&lt;/a&gt;)&lt;br /&gt;
&lt;br /&gt;
The real issue is that you can't create too much data into the stack. Its size is limited.&lt;br /&gt;
So, for larger&amp;nbsp;data sets, i see no other way than to allocate the necessary memory (using &lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;malloc()&lt;/span&gt;) and access it with a pointer. This is bound to cost some performance compared with global/static variable, since there is one more pointer read per call, but i have yet to find a way around it.&lt;br /&gt;
&lt;br /&gt;
Anyway, with a good mixture of local variables and allocated ones, the speed loss of -c1 mode was limited to 6% in single thread scenario. Not yet invisible, but still a noticeable improvement compared to previous situation. And at least, now, the code scales almost linearly with the number of CPU cores available.&lt;br /&gt;
&lt;br /&gt;
You can &lt;a href="http://fastcompression.blogspot.com/p/lz4.html"&gt;grab the latest LZ4 version (v1.1) at its homepage&lt;/a&gt;.&lt;br /&gt;
&lt;br /&gt;
[&lt;i&gt;Edit&lt;/i&gt;] thanks to Richard Sim for correcting the heap/stack confusion.&lt;img src="http://feeds.feedburner.com/~r/FastDataCompression/~4/-IYFPQdvUto" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://fastcompression.blogspot.com/feeds/3270917541354449053/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://fastcompression.blogspot.com/2011/09/cost-of-accessing-variables.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/834134852788085492/posts/default/3270917541354449053?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/834134852788085492/posts/default/3270917541354449053?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/FastDataCompression/~3/-IYFPQdvUto/cost-of-accessing-variables.html" title="Cost of accessing variables" /><author><name>Yann Collet</name><uri>https://plus.google.com/117014154967737150730</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>0</thr:total><feedburner:origLink>http://fastcompression.blogspot.com/2011/09/cost-of-accessing-variables.html</feedburner:origLink></entry><entry gd:etag="W/&quot;D0QHQ3g7eCp7ImA9WhdVFEs.&quot;"><id>tag:blogger.com,1999:blog-834134852788085492.post-9166517245849628142</id><published>2011-09-19T00:22:00.003+02:00</published><updated>2011-09-19T22:48:52.600+02:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-09-19T22:48:52.600+02:00</app:edited><title>LZ4 reaches v1.0</title><content type="html">&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://sd-1.archive-host.com/membres/images/miniatures/182754578/SpeedCompare.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://sd-1.archive-host.com/membres/images/miniatures/182754578/SpeedCompare.png" /&gt;&lt;/a&gt;&lt;/div&gt;
&amp;nbsp;Finally, a long planned update of LZ4 is out today, reaching v1.0. It's time to integrate all the various&amp;nbsp;compression&amp;nbsp;versions into a single binary, with the user able to select the compression level it wants by a simple switch (-c0, -c1 or -c2).&lt;br /&gt;
&lt;br /&gt;
It is also time to integrate newer versions of the algorithm, since there were quite many improvements since latest release. As a consequence, v1.0 is faster and more efficient, as can be seen in this table :&lt;br /&gt;
&lt;span class="Apple-style-span" style="background-color: #fafafa; color: #333333; font-family: Verdana, Arial, Tahoma, Calibri, Geneva, sans-serif; font-size: 13px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;
&lt;table class="stg_table tborder" style="-webkit-border-horizontal-spacing: 0px; -webkit-border-vertical-spacing: 0px; border-collapse: collapse; font-size: inherit; margin-bottom: 1em;"&gt;&lt;tbody&gt;
&lt;tr class="alt2"&gt;&lt;td style="border-bottom-color: rgb(0, 0, 0); border-bottom-style: solid; border-bottom-width: 1px; border-left-color: rgb(0, 0, 0); border-left-style: solid; border-left-width: 1px; border-right-color: rgb(0, 0, 0); border-right-style: solid; border-right-width: 1px; border-top-color: rgb(0, 0, 0); border-top-style: solid; border-top-width: 1px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; padding-bottom: 0.5em; padding-left: 0.5em; padding-right: 0.5em; padding-top: 0.5em;"&gt;&lt;/td&gt;&lt;td style="border-bottom-color: rgb(0, 0, 0); border-bottom-style: solid; border-bottom-width: 1px; border-left-color: rgb(0, 0, 0); border-left-style: solid; border-left-width: 1px; border-right-color: rgb(0, 0, 0); border-right-style: solid; border-right-width: 1px; border-top-color: rgb(0, 0, 0); border-top-style: solid; border-top-width: 1px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; padding-bottom: 0.5em; padding-left: 0.5em; padding-right: 0.5em; padding-top: 0.5em;"&gt;v0.9&lt;/td&gt;&lt;td style="border-bottom-color: rgb(0, 0, 0); border-bottom-style: solid; border-bottom-width: 1px; border-left-color: rgb(0, 0, 0); border-left-style: solid; border-left-width: 1px; border-right-color: rgb(0, 0, 0); border-right-style: solid; border-right-width: 1px; border-top-color: rgb(0, 0, 0); border-top-style: solid; border-top-width: 1px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; padding-bottom: 0.5em; padding-left: 0.5em; padding-right: 0.5em; padding-top: 0.5em;"&gt;v1.0&lt;/td&gt;&lt;/tr&gt;
&lt;tr class="alt1"&gt;&lt;td style="border-bottom-color: rgb(0, 0, 0); border-bottom-style: solid; border-bottom-width: 1px; border-left-color: rgb(0, 0, 0); border-left-style: solid; border-left-width: 1px; border-right-color: rgb(0, 0, 0); border-right-style: solid; border-right-width: 1px; border-top-color: rgb(0, 0, 0); border-top-style: solid; border-top-width: 1px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; padding-bottom: 0.5em; padding-left: 0.5em; padding-right: 0.5em; padding-top: 0.5em;"&gt;compression ratio&lt;/td&gt;&lt;td style="border-bottom-color: rgb(0, 0, 0); border-bottom-style: solid; border-bottom-width: 1px; border-left-color: rgb(0, 0, 0); border-left-style: solid; border-left-width: 1px; border-right-color: rgb(0, 0, 0); border-right-style: solid; border-right-width: 1px; border-top-color: rgb(0, 0, 0); border-top-style: solid; border-top-width: 1px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; padding-bottom: 0.5em; padding-left: 0.5em; padding-right: 0.5em; padding-top: 0.5em;"&gt;2.061&lt;/td&gt;&lt;td style="border-bottom-color: rgb(0, 0, 0); border-bottom-style: solid; border-bottom-width: 1px; border-left-color: rgb(0, 0, 0); border-left-style: solid; border-left-width: 1px; border-right-color: rgb(0, 0, 0); border-right-style: solid; border-right-width: 1px; border-top-color: rgb(0, 0, 0); border-top-style: solid; border-top-width: 1px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; padding-bottom: 0.5em; padding-left: 0.5em; padding-right: 0.5em; padding-top: 0.5em;"&gt;2.075&lt;/td&gt;&lt;/tr&gt;
&lt;tr class="alt2"&gt;&lt;td style="border-bottom-color: rgb(0, 0, 0); border-bottom-style: solid; border-bottom-width: 1px; border-left-color: rgb(0, 0, 0); border-left-style: solid; border-left-width: 1px; border-right-color: rgb(0, 0, 0); border-right-style: solid; border-right-width: 1px; border-top-color: rgb(0, 0, 0); border-top-style: solid; border-top-width: 1px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; padding-bottom: 0.5em; padding-left: 0.5em; padding-right: 0.5em; padding-top: 0.5em;"&gt;compression speed&lt;/td&gt;&lt;td style="border-bottom-color: rgb(0, 0, 0); border-bottom-style: solid; border-bottom-width: 1px; border-left-color: rgb(0, 0, 0); border-left-style: solid; border-left-width: 1px; border-right-color: rgb(0, 0, 0); border-right-style: solid; border-right-width: 1px; border-top-color: rgb(0, 0, 0); border-top-style: solid; border-top-width: 1px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; padding-bottom: 0.5em; padding-left: 0.5em; padding-right: 0.5em; padding-top: 0.5em;"&gt;260 MB/s&lt;/td&gt;&lt;td style="border-bottom-color: rgb(0, 0, 0); border-bottom-style: solid; border-bottom-width: 1px; border-left-color: rgb(0, 0, 0); border-left-style: solid; border-left-width: 1px; border-right-color: rgb(0, 0, 0); border-right-style: solid; border-right-width: 1px; border-top-color: rgb(0, 0, 0); border-top-style: solid; border-top-width: 1px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; padding-bottom: 0.5em; padding-left: 0.5em; padding-right: 0.5em; padding-top: 0.5em;"&gt;295 MB/s&lt;/td&gt;&lt;/tr&gt;
&lt;tr class="alt1"&gt;&lt;td style="border-bottom-color: rgb(0, 0, 0); border-bottom-style: solid; border-bottom-width: 1px; border-left-color: rgb(0, 0, 0); border-left-style: solid; border-left-width: 1px; border-right-color: rgb(0, 0, 0); border-right-style: solid; border-right-width: 1px; border-top-color: rgb(0, 0, 0); border-top-style: solid; border-top-width: 1px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; padding-bottom: 0.5em; padding-left: 0.5em; padding-right: 0.5em; padding-top: 0.5em;"&gt;decompression speed&lt;/td&gt;&lt;td style="border-bottom-color: rgb(0, 0, 0); border-bottom-style: solid; border-bottom-width: 1px; border-left-color: rgb(0, 0, 0); border-left-style: solid; border-left-width: 1px; border-right-color: rgb(0, 0, 0); border-right-style: solid; border-right-width: 1px; border-top-color: rgb(0, 0, 0); border-top-style: solid; border-top-width: 1px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; padding-bottom: 0.5em; padding-left: 0.5em; padding-right: 0.5em; padding-top: 0.5em;"&gt;810 MB/s&lt;/td&gt;&lt;td style="border-bottom-color: rgb(0, 0, 0); border-bottom-style: solid; border-bottom-width: 1px; border-left-color: rgb(0, 0, 0); border-left-style: solid; border-left-width: 1px; border-right-color: rgb(0, 0, 0); border-right-style: solid; border-right-width: 1px; border-top-color: rgb(0, 0, 0); border-top-style: solid; border-top-width: 1px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; padding-bottom: 0.5em; padding-left: 0.5em; padding-right: 0.5em; padding-top: 0.5em;"&gt;990 MB/s&lt;/td&gt;&lt;/tr&gt;
&lt;tr class="alt2"&gt;&lt;td style="border-bottom-color: rgb(0, 0, 0); border-bottom-style: solid; border-bottom-width: 1px; border-left-color: rgb(0, 0, 0); border-left-style: solid; border-left-width: 1px; border-right-color: rgb(0, 0, 0); border-right-style: solid; border-right-width: 1px; border-top-color: rgb(0, 0, 0); border-top-style: solid; border-top-width: 1px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; padding-bottom: 0.5em; padding-left: 0.5em; padding-right: 0.5em; padding-top: 0.5em;"&gt;memory usage&lt;/td&gt;&lt;td style="border-bottom-color: rgb(0, 0, 0); border-bottom-style: solid; border-bottom-width: 1px; border-left-color: rgb(0, 0, 0); border-left-style: solid; border-left-width: 1px; border-right-color: rgb(0, 0, 0); border-right-style: solid; border-right-width: 1px; border-top-color: rgb(0, 0, 0); border-top-style: solid; border-top-width: 1px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; padding-bottom: 0.5em; padding-left: 0.5em; padding-right: 0.5em; padding-top: 0.5em;"&gt;33MB&lt;/td&gt;&lt;td style="border-bottom-color: rgb(0, 0, 0); border-bottom-style: solid; border-bottom-width: 1px; border-left-color: rgb(0, 0, 0); border-left-style: solid; border-left-width: 1px; border-right-color: rgb(0, 0, 0); border-right-style: solid; border-right-width: 1px; border-top-color: rgb(0, 0, 0); border-top-style: solid; border-top-width: 1px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; padding-bottom: 0.5em; padding-left: 0.5em; padding-right: 0.5em; padding-top: 0.5em;"&gt;17MB&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
If you are using the&lt;a href="http://code.google.com/p/lz4/"&gt; open-source version of LZ4&lt;/a&gt;&amp;nbsp;or &lt;a href="http://code.google.com/p/lz4hc/"&gt;LZ4-HC&lt;/a&gt;, it is recommended to update to latest source release.&lt;br /&gt;
&lt;br /&gt;
You can grab this latest win32 binary at the &lt;a href="http://fastcompression.blogspot.com/p/lz4.html"&gt;LZ4 Homepage&lt;/a&gt;.&lt;img src="http://feeds.feedburner.com/~r/FastDataCompression/~4/Qto8QdapisQ" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://fastcompression.blogspot.com/feeds/9166517245849628142/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://fastcompression.blogspot.com/2011/09/lz4-reaches-v10.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/834134852788085492/posts/default/9166517245849628142?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/834134852788085492/posts/default/9166517245849628142?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/FastDataCompression/~3/Qto8QdapisQ/lz4-reaches-v10.html" title="LZ4 reaches v1.0" /><author><name>Yann Collet</name><uri>https://plus.google.com/117014154967737150730</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>0</thr:total><feedburner:origLink>http://fastcompression.blogspot.com/2011/09/lz4-reaches-v10.html</feedburner:origLink></entry></feed>
