<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/atom10full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><feed xmlns="http://www.w3.org/2005/Atom">
  <title>Krzysztof Kowalczyk blog</title>
  <link href="http://blog.kowalczyk.info/atom.xml" rel="alternate" />
  <id>http://blog.kowalczyk.info/atom.xml</id>
  <updated>2002-06-16T00:31:07-07:00</updated>
  <atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/atom+xml" href="http://feeds.feedburner.com/KrzysztofKowalczykBlog" /><feedburner:info xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" uri="krzysztofkowalczykblog" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><entry>
   <title>How I ported pigz from Unix to Windows</title>
   <link href="http://blog.kowalczyk.info/article/4/How-I-ported-pigz-from-Unix-to-Windows.html" rel="alternate" />
   <updated>2013-03-20T11:25:36-07:00</updated>
   <id>tag:blog.kowalczyk.info,1999:4</id>
   <content type="html">&lt;p&gt;I just finished &lt;a href="https://github.com/kjk/pigz"&gt;porting&lt;/a&gt; &lt;a href="http://zlib.net/pigz/"&gt;pigz&lt;/a&gt; from Unix to Windows (you can download &lt;a href="http://blog.kowalczyk.info/software/pigz-for-windows.html"&gt;pre-compiled binaries&lt;/a&gt;). This article describes how I did it.&lt;/p&gt;

&lt;p&gt;Pigz was clearly written with Unix in mind, with no thought given to cross-platform portability.&lt;/p&gt;

&lt;p&gt;Thankfully, it's a relatively simple command-line program that sticks to the standard C library.&lt;/p&gt;

&lt;h2&gt;Porting pthreads&lt;/h2&gt;

&lt;p&gt;Pigz uses pthreads for threading. Porting pthreads code to Windows would be a nightmare. Lucky me: someone already did all the hard work and &lt;a href="https://github.com/GerHobbelt/pthread-win32"&gt;implemented pthreads APIs&lt;/a&gt; on top of the Windows API, in only 20,000 lines of code. It seems to Just Work.&lt;/p&gt;

&lt;h2&gt;Porting dirent&lt;/h2&gt;

&lt;p&gt;Another Unix-only API that pigz uses is &lt;a href="http://pubs.opengroup.org/onlinepubs/007908799/xsh/dirent.h.html"&gt;dirent.h&lt;/a&gt;, for reading the contents of directories. I was lucky again: someone created a &lt;a href="http://www.two-sdg.demon.co.uk/curbralan/code/dirent/dirent.html"&gt;Windows port of dirent APIs&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;Misc fixes&lt;/h2&gt;

&lt;p&gt;A few functions that pigz uses are present in Visual Studio's C library, but under a different name (e.g. &lt;code&gt;stat&lt;/code&gt; is &lt;code&gt;_stat&lt;/code&gt;, &lt;code&gt;fstat&lt;/code&gt; is &lt;code&gt;_fstat&lt;/code&gt;, etc.). This is easily fixed with a &lt;code&gt;#define&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Visual C doesn't define &lt;code&gt;ssize_t&lt;/code&gt; and &lt;code&gt;PATH_MAX&lt;/code&gt;. A &lt;code&gt;typedef&lt;/code&gt; here, a &lt;code&gt;#define&lt;/code&gt; there solves those problems.&lt;/p&gt;

&lt;p&gt;I consolidated all such fixes in a single &lt;a href="https://github.com/kjk/pigz/blob/master/win32/wincompat.h"&gt;wincompat.h&lt;/a&gt; file.&lt;/p&gt;

&lt;p&gt;Pigz &lt;code&gt;#include&lt;/code&gt;s some .h files that are only available under Unix. I used &lt;code&gt;#ifndef _WIN32&lt;/code&gt; around those.&lt;/p&gt;

&lt;h2&gt;Build system&lt;/h2&gt;

&lt;p&gt;Pigz uses standard Unix build tools: gcc and make.&lt;/p&gt;

&lt;p&gt;While there are ports of GNU make to Windows and gcc-based compilers that can generate Windows binaries (MinGW), most Windows developers prefer using the Visual Studio IDE.&lt;/p&gt;

&lt;p&gt;Creating Visual Studio project files from scratch is time-consuming and annoying, especially when you want to support multiple versions of Visual Studio.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://industriousone.com/premake"&gt;Premake&lt;/a&gt; makes it easier. It's a meta-build system: you write a &lt;a href="https://github.com/kjk/pigz/blob/master/premake4.lua"&gt;text file&lt;/a&gt; that defines the project, which files to compile, compilation flags etc. and premake generates Visual Studio files from that description.&lt;/p&gt;

&lt;p&gt;Premake supports Visual Studio 2008 and 2010 (Visual Studio 2012 can open 2010 project files after converting them).&lt;/p&gt;

&lt;h2&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;As far as porting goes, this one was easy thanks to existing efforts that created compatibility shims for the hard parts.&lt;/p&gt;

&lt;p&gt;Premake is an interesting tool that saves time when creating and maintaining Visual Studio projects.&lt;/p&gt;
</content>
  </entry>
  <entry>
   <title>Pigz windows port</title>
   <link href="http://blog.kowalczyk.info/article/2/Pigz-windows-port.html" rel="alternate" />
   <updated>2013-03-19T18:13:02-07:00</updated>
   <id>tag:blog.kowalczyk.info,1999:2</id>
   <content type="html">&lt;p&gt;&lt;a href="http://zlib.net/pigz/"&gt;Pigz&lt;/a&gt; is a parallel gzip - it speeds up compression/decompression by using multiple threads.&lt;/p&gt;

&lt;p&gt;Until today it was only available for Unix. But not anymore!&lt;/p&gt;

&lt;p&gt;I've &lt;a href="http://blog.kowalczyk.info/software/pigz-for-windows.html"&gt;ported it to Windows&lt;/a&gt;. Get it while it's hot.&lt;/p&gt;
</content>
  </entry>
  <entry>
   <title>Thoughts on Go after writing 3 websites</title>
   <link href="http://blog.kowalczyk.info/article/uvw2/Thoughts-on-Go-after-writing-3-websites.html" rel="alternate" />
   <updated>2012-12-31T15:30:45-08:00</updated>
   <id>tag:blog.kowalczyk.info,1999:1441010</id>
   <content type="html">	&lt;h1&gt;Go for writing web servers&lt;/h1&gt;

	&lt;p&gt;&lt;b&gt;Summary&lt;/b&gt;: in my experience Go is a good language for building websites/web servers.&lt;/p&gt;

	&lt;p&gt;It&amp;#8217;s easy to get excited about new technology like Go. The question is: how does it stand up to scrutiny after daily use?&lt;/p&gt;

	&lt;p&gt;I&amp;#8217;ve written 3 web applications in Go and they&amp;#8217;ve been running in production for over a month, so I feel justified in publishing my opinion.&lt;/p&gt;

	&lt;p&gt;In the past I wrote web applications in Perl, PHP and Python (web.py, Tornado, App Engine), so those are the technologies I compare Go to.&lt;/p&gt;

	&lt;p&gt;Of the 3 websites, &lt;a href="http://www.apptranslator.org"&gt;AppTranslator&lt;/a&gt; is a web service for crowd-sourcing translations of software and was written completely from scratch.&lt;/p&gt;

	&lt;p&gt;&lt;a href="http://forums.fofou.org"&gt;Fofou&lt;/a&gt; is a simple forum and is a port of an earlier version I did for App Engine. Finally, &lt;a href="http://blog.kowalczyk.info"&gt;this web site&lt;/a&gt; is a blog engine (also a port of an earlier App Engine version).&lt;/p&gt;

	&lt;p&gt;One reason to migrate from App Engine to my own server was to save money. At my levels of traffic (~3 requests per second) I was paying ~$80/month, mostly for the frontend instance hours. &lt;/p&gt;

	&lt;p&gt;Another reason was to do more complex processing (App Engine is great as long as you don&amp;#8217;t have to do something that App Engine doesn&amp;#8217;t support).&lt;/p&gt;

	&lt;p&gt;Finally, I wanted to see how Go would handle a real-life project. The best way to test a new technology is on a project with a predictable (and relatively small) scope.&lt;/p&gt;

	&lt;p&gt;All websites run on the same &lt;a href="http://www.kimsufi.co.uk/"&gt;Kimsufi 24&lt;/a&gt; dedicated server (which is pretty beefy for its $60/month price). I&amp;#8217;m using the latest Ubuntu for the OS.&lt;/p&gt;

	&lt;h1&gt;Things you need for building a web application&lt;/h1&gt;

	&lt;p&gt;What functionality is typically needed for writing a web application and how does Go support it?&lt;/p&gt;

	&lt;h2&gt;HTTP server&lt;/h2&gt;

	&lt;p&gt;A very capable HTTP server is part of the standard library (&lt;a href="http://golang.org/pkg/net/http/"&gt;net/http&lt;/a&gt;).&lt;/p&gt;

	&lt;p&gt;It parses incoming HTTP requests and provides an easy way to send HTTP responses back.&lt;/p&gt;
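
	&lt;p&gt;As a minimal sketch (not code from these sites), a handler is just a function that receives the already-parsed request and writes the response. The handler name and the &lt;code&gt;name&lt;/code&gt; query parameter are hypothetical; &lt;code&gt;net/http/httptest&lt;/code&gt; is used to run the handler without opening a port:&lt;/p&gt;

	&lt;p&gt;&lt;pre class="prettyprint"&gt;
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// helloHandler is a hypothetical handler: net/http has already parsed
// the request, so reading a query parameter is a single call.
func helloHandler(w http.ResponseWriter, r *http.Request) {
	name := r.URL.Query().Get("name")
	if name == "" {
		name = "world"
	}
	w.Header().Set("Content-Type", "text/plain; charset=utf-8")
	fmt.Fprintf(w, "Hello, %s!", name)
}

func main() {
	// A real server would register the handler and listen on a port:
	//   http.HandleFunc("/", helloHandler)
	//   log.Fatal(http.ListenAndServe(":8080", nil))
	// Here the handler is exercised in-process instead.
	req, err := http.NewRequest("GET", "/?name=Go", nil)
	if err != nil {
		panic(err)
	}
	rec := httptest.NewRecorder()
	helloHandler(rec, req)
	fmt.Println(rec.Code, rec.Body.String()) // 200 Hello, Go!
}
&lt;/pre&gt;&lt;/p&gt;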

	&lt;h2&gt;URL routing&lt;/h2&gt;

	&lt;p&gt;High-level web frameworks provide URL routing, i.e. a way to say "call this function to handle this URL". Go has a simple built-in router (&lt;code&gt;http.ServeMux&lt;/code&gt;). I also use &lt;a href="http://gorilla-web.appspot.com/pkg/mux"&gt;mux&lt;/a&gt;, a router that plugs into the standard net/http server.&lt;/p&gt;
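
	&lt;p&gt;A sketch of routing with only the built-in &lt;code&gt;http.ServeMux&lt;/code&gt; (not gorilla mux), using hypothetical &lt;code&gt;/about&lt;/code&gt; and &lt;code&gt;/article/&lt;/code&gt; routes. A pattern without a trailing slash matches one exact path; a pattern ending in a slash matches the whole subtree below it:&lt;/p&gt;

	&lt;p&gt;&lt;pre class="prettyprint"&gt;
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// newRouter maps URL paths to handler functions. The routes are hypothetical.
func newRouter() *http.ServeMux {
	mux := http.NewServeMux()
	// Exact path: only "/about" matches.
	mux.HandleFunc("/about", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "about page")
	})
	// Trailing slash: matches "/article/" and everything below it.
	mux.HandleFunc("/article/", func(w http.ResponseWriter, r *http.Request) {
		id := strings.TrimPrefix(r.URL.Path, "/article/")
		fmt.Fprintf(w, "article %s", id)
	})
	return mux
}

// get sends a GET request for path through the router and returns the body.
func get(mux *http.ServeMux, path string) string {
	req, err := http.NewRequest("GET", path, nil)
	if err != nil {
		panic(err)
	}
	rec := httptest.NewRecorder()
	mux.ServeHTTP(rec, req)
	return rec.Body.String()
}

func main() {
	mux := newRouter()
	fmt.Println(get(mux, "/about"))     // about page
	fmt.Println(get(mux, "/article/4")) // article 4
}
&lt;/pre&gt;&lt;/p&gt;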

	&lt;h2&gt;Templates&lt;/h2&gt;

	&lt;p&gt;A lot of what a web server does is returning HTML. Constructing this HTML is greatly simplified by using templates. Go has a powerful &lt;a href="http://golang.org/pkg/html/template/"&gt;html/template&lt;/a&gt; library for that. To me it seems roughly equivalent to Django or Tornado templates.&lt;/p&gt;

	&lt;p&gt;There are other templating libraries for Go, but I found the above built-in package satisfactory, so I didn&amp;#8217;t even try to use them.&lt;/p&gt;
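
	&lt;p&gt;A small illustration of &lt;code&gt;html/template&lt;/code&gt;, assuming a hypothetical &lt;code&gt;Post&lt;/code&gt; type. The template is parsed once and executed per request; values inserted with &lt;code&gt;{{.Title}}&lt;/code&gt; are escaped according to their HTML context, which is the main advantage over &lt;code&gt;text/template&lt;/code&gt;:&lt;/p&gt;

	&lt;p&gt;&lt;pre class="prettyprint"&gt;
package main

import (
	"bytes"
	"fmt"
	"html/template"
)

// Post is a hypothetical data type passed to the template.
type Post struct {
	Title string
}

// listTmpl is parsed once at startup; template.Must panics on a syntax error.
var listTmpl = template.Must(template.New("list").Parse(
	"{{range .}}* {{.Title}}\n{{end}}"))

// renderList executes the template for a slice of posts.
func renderList(posts []Post) (string, error) {
	buf := new(bytes.Buffer)
	if err := listTmpl.Execute(buf, posts); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	out, err := renderList([]Post{{Title: "Go"}, {Title: "Python"}})
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
&lt;/pre&gt;&lt;/p&gt;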

	&lt;h2&gt;Cookies&lt;/h2&gt;

	&lt;p&gt;Basic support for cookies is part of the built-in http library. To generate cookies that cannot be spoofed or tampered with, I used the &lt;a href="http://gorilla-web.appspot.com/pkg/securecookie"&gt;securecookie&lt;/a&gt; library.&lt;/p&gt;
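
	&lt;p&gt;The core idea behind a tamper-proof cookie can be sketched with the standard library alone: append an HMAC of the value, computed with a server-side secret, and reject the cookie if the HMAC does not match. This is a hypothetical minimal version, not the securecookie API (which additionally supports encryption and timestamps); the key and cookie name are placeholders:&lt;/p&gt;

	&lt;p&gt;&lt;pre class="prettyprint"&gt;
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/base64"
	"fmt"
	"net/http"
	"strings"
)

// secretKey is a hypothetical placeholder; load a long random key from
// configuration in real code.
var secretKey = []byte("replace-with-a-long-random-key")

// sign returns "value|base64(HMAC-SHA256(value))".
func sign(value string) string {
	mac := hmac.New(sha256.New, secretKey)
	mac.Write([]byte(value))
	return value + "|" + base64.URLEncoding.EncodeToString(mac.Sum(nil))
}

// verify recomputes the signature and returns the original value if it matches.
func verify(signed string) (string, bool) {
	i := strings.LastIndex(signed, "|")
	if i == -1 {
		return "", false
	}
	value := signed[:i]
	// hmac.Equal compares in constant time, which avoids timing leaks.
	if !hmac.Equal([]byte(sign(value)), []byte(signed)) {
		return "", false
	}
	return value, true
}

func main() {
	c := http.Cookie{Name: "user", Value: sign("kjk"), HttpOnly: true}
	fmt.Println(c.String())
	if user, ok := verify(c.Value); ok {
		fmt.Println("verified user:", user)
	}
	// Changing the value without knowing the key invalidates the signature.
	_, ok := verify("admin" + c.Value[len("kjk"):])
	fmt.Println("forged cookie accepted:", ok) // false
}
&lt;/pre&gt;&lt;/p&gt;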

	&lt;h2&gt;Databases&lt;/h2&gt;

	&lt;p&gt;There are Go libraries for all of the popular databases (MySQL, PostgreSQL, MongoDB, Redis etc.) but I haven&amp;#8217;t used them.&lt;/p&gt;

	&lt;p&gt;I used what I call the NoDB approach, i.e. I wrote a very simple storage system that uses text files. Data is kept in memory and persisted in an append-only file.&lt;/p&gt;

	&lt;p&gt;This wouldn&amp;#8217;t be the right approach for services that require more sophisticated functionality but was good enough for my needs and didn&amp;#8217;t take long to implement.&lt;/p&gt;
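
	&lt;p&gt;A minimal sketch of the NoDB idea, assuming a simple key/value model. The &lt;code&gt;Store&lt;/code&gt; type and its JSON-lines log format are hypothetical, not the format my sites use: every change is appended to the log, and on startup the log is replayed to rebuild the in-memory map, so the last write for a key wins:&lt;/p&gt;

	&lt;p&gt;&lt;pre class="prettyprint"&gt;
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
)

// record is one line in the hypothetical append-only log.
type record struct {
	Key   string `json:"k"`
	Value string `json:"v"`
}

// Store keeps all data in memory and appends every change to a log.
type Store struct {
	data map[string]string
	log  *json.Encoder
}

// Open replays an existing log from r into memory; new records go to w.
// With a real file, open it with os.O_APPEND and pass it as both.
func Open(r io.Reader, w io.Writer) (*Store, error) {
	s := new(Store)
	s.data = make(map[string]string)
	s.log = json.NewEncoder(w)
	dec := json.NewDecoder(r)
	for {
		rec := new(record)
		err := dec.Decode(rec)
		if err == io.EOF {
			break
		}
		if err != nil {
			return nil, err
		}
		s.data[rec.Key] = rec.Value // later records overwrite earlier ones
	}
	return s, nil
}

// Set persists the change first, then updates memory.
func (s *Store) Set(key, value string) error {
	if err := s.log.Encode(record{Key: key, Value: value}); err != nil {
		return err
	}
	s.data[key] = value
	return nil
}

// Get reads from memory only; the log is never read after startup.
func (s *Store) Get(key string) (string, bool) {
	v, ok := s.data[key]
	return v, ok
}

func main() {
	logFile := new(bytes.Buffer) // stands in for the on-disk file
	s, err := Open(logFile, logFile)
	if err != nil {
		panic(err)
	}
	s.Set("title", "draft")
	s.Set("title", "final")

	// Simulate a restart: rebuild the state from the log alone.
	s2, err := Open(bytes.NewReader(logFile.Bytes()), new(bytes.Buffer))
	if err != nil {
		panic(err)
	}
	v, _ := s2.Get("title")
	fmt.Println(v) // final
}
&lt;/pre&gt;&lt;/p&gt;

	&lt;p&gt;Because the log only grows, a long-running service would eventually want to compact it by rewriting just the current state.&lt;/p&gt;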

	&lt;h2&gt;OAuth&lt;/h2&gt;

	&lt;p&gt;There are three OAuth libraries that I know of. I used &lt;a href="https://github.com/garyburd/go-oauth"&gt;this one&lt;/a&gt;. I didn&amp;#8217;t have any particular reason to choose it over the others: I only needed it for implementing Twitter-based authentication, this library worked, so I used it.&lt;/p&gt;

	&lt;h2&gt;Generating Atom (RSS) feeds&lt;/h2&gt;

	&lt;p&gt;One cannot respect a blog engine that doesn&amp;#8217;t provide a full-text RSS feed. I couldn&amp;#8217;t find an existing package, so I built a simple (and small) library for &lt;a href="https://github.com/kjk/apptranslator/tree/master/ext/src/atom"&gt;generating Atom feeds&lt;/a&gt;.&lt;/p&gt;

	&lt;h2&gt;JSON and XML support&lt;/h2&gt;

	&lt;p&gt;Go has built-in support for &lt;a href="http://golang.org/pkg/encoding/json/"&gt;JSON&lt;/a&gt; and &lt;a href="http://golang.org/pkg/encoding/xml/"&gt;XML&lt;/a&gt;.&lt;br&gt;
There are APIs for raw parsing/serialization. If you know the shape of the data, you can serialize (marshal, in Go&amp;#8217;s parlance) data structures to JSON or XML and de-serialize (unmarshal) from JSON or XML into a struct.&lt;/p&gt;
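
	&lt;p&gt;For example, with a hypothetical &lt;code&gt;Article&lt;/code&gt; struct, &lt;code&gt;json.Marshal&lt;/code&gt; and &lt;code&gt;json.Unmarshal&lt;/code&gt; do the conversion in both directions; struct tags control the JSON key names:&lt;/p&gt;

	&lt;p&gt;&lt;pre class="prettyprint"&gt;
package main

import (
	"encoding/json"
	"fmt"
)

// Article is a hypothetical struct; the tags set the JSON key names and
// omitempty drops the field when it is empty.
type Article struct {
	ID    int      `json:"id"`
	Title string   `json:"title"`
	Tags  []string `json:"tags,omitempty"`
}

func main() {
	// Marshal: struct to JSON bytes.
	data, err := json.Marshal(Article{ID: 4, Title: "Porting pigz", Tags: []string{"c", "windows"}})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(data)) // {"id":4,"title":"Porting pigz","tags":["c","windows"]}

	// Unmarshal: JSON bytes into a struct (it needs a pointer to fill in).
	a := new(Article)
	if err := json.Unmarshal([]byte(`{"id":2,"title":"Pigz windows port"}`), a); err != nil {
		panic(err)
	}
	fmt.Println(a.ID, a.Title, len(a.Tags)) // 2 Pigz windows port 0
}
&lt;/pre&gt;&lt;/p&gt;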

	&lt;h2&gt;S3 access, support for zip files&lt;/h2&gt;

	&lt;p&gt;This one is not universally needed. My web apps have built-in backup functionality which stores data, sometimes in the form of a .zip file, in S3.&lt;/p&gt;

	&lt;p&gt;Go has support for reading and writing zip and tar archives in the standard library. For S3 support I use &lt;a href="https://launchpad.net/goamz"&gt;goamz&lt;/a&gt;.&lt;/p&gt;
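
	&lt;p&gt;A sketch of building a .zip archive in memory with &lt;code&gt;archive/zip&lt;/code&gt;, e.g. before uploading a backup (the S3 upload is not shown and the file names are hypothetical). Note that &lt;code&gt;Close()&lt;/code&gt; must be called, because it writes the archive's central directory:&lt;/p&gt;

	&lt;p&gt;&lt;pre class="prettyprint"&gt;
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"io"
)

// zipFiles packs name/content pairs into an in-memory .zip archive.
func zipFiles(files map[string]string) ([]byte, error) {
	buf := new(bytes.Buffer)
	zw := zip.NewWriter(buf)
	for name, content := range files {
		f, err := zw.Create(name)
		if err != nil {
			return nil, err
		}
		if _, err := f.Write([]byte(content)); err != nil {
			return nil, err
		}
	}
	// Close writes the central directory; without it the archive is invalid.
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// unzipFiles reads an archive back into a name/content map.
func unzipFiles(data []byte) (map[string]string, error) {
	zr, err := zip.NewReader(bytes.NewReader(data), int64(len(data)))
	if err != nil {
		return nil, err
	}
	out := make(map[string]string)
	for _, f := range zr.File {
		rc, err := f.Open()
		if err != nil {
			return nil, err
		}
		b, err := io.ReadAll(rc)
		rc.Close()
		if err != nil {
			return nil, err
		}
		out[f.Name] = string(b)
	}
	return out, nil
}

func main() {
	data, err := zipFiles(map[string]string{
		"translations.txt": "de:Hallo\n",
		"users.txt":        "kjk\n",
	})
	if err != nil {
		panic(err)
	}
	files, err := unzipFiles(data)
	if err != nil {
		panic(err)
	}
	fmt.Print(files["users.txt"]) // kjk
}
&lt;/pre&gt;&lt;/p&gt;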

	&lt;h2&gt;Unit tests&lt;/h2&gt;

	&lt;p&gt;This is not specific to writing a web server: all your important code should have unit tests.&lt;/p&gt;

	&lt;p&gt;Go has a &lt;a href="http://golang.org/doc/code.html#Testing"&gt;built-in API and tool support&lt;/a&gt; for writing and running tests.&lt;/p&gt;

	&lt;h1&gt;Deployment&lt;/h1&gt;

	&lt;p&gt;Deploying a new version of your code to the server is a pain in the ass regardless of the language used.&lt;/p&gt;

	&lt;p&gt;I wrote a relatively short deployment script using &lt;a href="http://fabfile.org"&gt;Fabric&lt;/a&gt; (a Python library and tool for running deployment scripts). It copies the source files to the server, compiles them on the server, runs unit tests, shuts down the existing instance of the service and launches the new version. It stops if anything goes wrong along the way.&lt;/p&gt;

	&lt;p&gt;It&amp;#8217;s really important to be able to quickly deploy new versions of the software, so these days I write the deployment script as the first thing in a project. Doing all those deployment steps manually would be very annoying.&lt;/p&gt;

	&lt;h1&gt;Server config and misc thoughts&lt;/h1&gt;

	&lt;p&gt;The overall setup of the server is pretty standard: each service runs as a separate process on a local port. Nginx runs on port 80 and proxies the traffic to a given service based on the Host header.&lt;/p&gt;

	&lt;h1&gt;Show me the code&lt;/h1&gt;

	&lt;p&gt;The source for all three projects is publicly available on GitHub under the liberal BSD license: &lt;a href="https://github.com/kjk/apptranslator"&gt;App Translator&lt;/a&gt;, &lt;a href="https://github.com/kjk/fofou"&gt;Fofou&lt;/a&gt;, &lt;a href="https://github.com/kjk/web-blog"&gt;blog&lt;/a&gt;.&lt;/p&gt;

	&lt;p&gt;Feel free to learn from the code or use it in your own projects.&lt;/p&gt;

	&lt;h1&gt;Parting thoughts&lt;/h1&gt;

	&lt;p&gt;I think writing non-trivial web services is a sweet spot for Go. &lt;/p&gt;

	&lt;p&gt;Most of the needed functionality is part of standard library. For almost everything else there are 3rd party libraries.&lt;/p&gt;

	&lt;p&gt;Writing in Go is almost as fast and fluent as writing in Python, but the code is an order of magnitude faster and uses less memory.&lt;/p&gt;</content>
  </entry>
  <entry>
   <title>SumatraPDF 2.2 released</title>
   <link href="http://blog.kowalczyk.info/article/1/SumatraPDF-22-released.html" rel="alternate" />
   <updated>2012-12-25T20:50:12-08:00</updated>
   <id>tag:blog.kowalczyk.info,1999:1</id>
   <content type="html">	&lt;p&gt;&lt;a href="https://code.google.com/p/sumatrapdf/source/browse/trunk/AUTHORS"&gt;We&lt;/a&gt; are happy to announce release 2.2 of &lt;a href="http://blog.kowalczyk.info/software/sumatrapdf"&gt;SumatraPDF&lt;/a&gt;, a multi-format PDF, XPS, ebook (ePub, MOBI), DjVu reader for Windows.&lt;/p&gt;

	&lt;p&gt;In version 2.2 we&amp;#8217;ve added support for &lt;a href="http://en.wikipedia.org/wiki/FictionBook"&gt;FictionBook&lt;/a&gt; ebook format. &lt;/p&gt;

	&lt;p&gt;We&amp;#8217;ve added support for PDF documents encrypted with Acrobat X.&lt;/p&gt;

	&lt;p&gt;In the rare cases where printouts look different from the display, you can now use the “Print as image” compatibility option in the print dialog.&lt;/p&gt;

	&lt;p&gt;For better viewing of manga comic books we now have a new command-line option: -manga-mode [1|true|0|false]. &lt;/p&gt;

	&lt;p&gt;Finally, we&amp;#8217;ve made many robustness fixes and improvements. They are too small to describe individually, but in aggregate they make a difference.&lt;/p&gt;</content>
  </entry>
  <entry>
   <title>Design and implementation of translation system for desktop software</title>
   <link href="http://blog.kowalczyk.info/article/wjb1/Design-and-implementation-of-translation-system-.html" rel="alternate" />
   <updated>2012-12-16T18:50:28-08:00</updated>
   <id>tag:blog.kowalczyk.info,1999:1518013</id>
   <content type="html">	&lt;h2&gt;A web-based, crowd-sourced translation system.&lt;/h2&gt;

	&lt;p&gt;I just finished building &lt;a href="http://www.apptranslator.org/"&gt;AppTranslator&lt;/a&gt;, which is the third iteration of a system that allows people to contribute translations for &lt;a href="http://blog.kowalczyk.info/software/sumatrapdf/"&gt;SumatraPDF&lt;/a&gt;.&lt;/p&gt;

	&lt;p&gt;This post describes the big picture of how it&amp;#8217;s designed and implemented. Everything I discuss is &lt;a href="https://github.com/kjk/apptranslator"&gt;open-source&lt;/a&gt;, so you can dig into code for more details.&lt;/p&gt;

	&lt;p&gt;The goals of the translation system were to make it easy for me to add new strings for translation and easy for other people to contribute translations.&lt;/p&gt;

	&lt;p&gt;I call it a "system" because it consists of several parts that are designed to act as a sensible whole.&lt;/p&gt;

	&lt;h2&gt;How translations are stored in C++ code&lt;/h2&gt;

	&lt;p&gt;I try to design for simplicity of implementation to minimize the amount of code to write. Often it means that I reject popular solutions if I can design a simpler one.&lt;/p&gt;

	&lt;p&gt;SumatraPDF is a C++ desktop application for Windows. There already is a popular system for maintaining translations for such programs: resource dlls, one dll per supported language. The translations are kept in .rc files as strings and referred to by numeric ids from code via the &lt;a href="http://msdn.microsoft.com/en-us/library/windows/desktop/ms647486.aspx"&gt;LoadString()&lt;/a&gt; API.&lt;/p&gt;

	&lt;p&gt;That would be very clumsy both for me (when adding new translations) and for potential translators (they have to edit files in a cryptic format). It also requires distributing multiple dlls (I prefer to ship my apps as single executables).&lt;/p&gt;

	&lt;h2&gt;Managing translation strings in C++ code&lt;/h2&gt;

	&lt;p&gt;The first part of the system was therefore a new design for managing translation strings in the app.&lt;/p&gt;

	&lt;p&gt;All translatable strings are marked with the &lt;a href="https://code.google.com/p/sumatrapdf/source/browse/trunk/src/Translations.h"&gt;_TR() macro&lt;/a&gt; in the code.&lt;/p&gt;

	&lt;p&gt;A &lt;a href="https://code.google.com/p/sumatrapdf/source/browse/trunk/scripts/update_translations.py"&gt;python script&lt;/a&gt; extracts those strings from the source. It then reads translations provided by contributors and generates a C++ file with strings. _TR() macro evaluates to calling a  simple supporting code in &lt;a href="https://code.google.com/p/sumatrapdf/source/browse/trunk/src/Translations.cpp"&gt;Translations.cpp&lt;/a&gt; that looks up a translation of a string given currently selected language.&lt;/p&gt;

	&lt;h2&gt;How translations are contributed by users&lt;/h2&gt;

	&lt;p&gt;SumatraPDF is an open-source project so I rely on contributions from users to keep it translated.&lt;/p&gt;

	&lt;p&gt;The system for collecting contributions had 3 iterations.&lt;/p&gt;

	&lt;p&gt;In the first iteration, I had a single text file with all translations.&lt;/p&gt;

	&lt;p&gt;It looked like this:&lt;/p&gt;

	&lt;p&gt;&lt;pre class="prettyprint"&gt;
String to translate
de:German translation
fr:French translation

Another string to translate
de:another German translation
&lt;/pre&gt;&lt;/p&gt;
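
	&lt;p&gt;The format is simple enough to parse in a few lines. The actual tooling is a Python script; the following is a hypothetical Go sketch that parses the format above and looks up a string, falling back to the English original when a translation is missing, which is roughly what a _TR()-style lookup has to do:&lt;/p&gt;

	&lt;p&gt;&lt;pre class="prettyprint"&gt;
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// Translations maps an English string to its translations, keyed by
// language code ("de", "fr", ...).
type Translations map[string]map[string]string

// parseTranslations reads groups separated by blank lines. The first line
// of a group is the English string; the rest are "lang:translation".
func parseTranslations(s string) (Translations, error) {
	res := make(Translations)
	cur := ""
	sc := bufio.NewScanner(strings.NewReader(s))
	for sc.Scan() {
		line := sc.Text()
		switch {
		case line == "":
			cur = "" // end of the current group
		case cur == "":
			cur = line // first line of a group: the string to translate
			res[cur] = make(map[string]string)
		default:
			// Split on the first ':' only; translations may contain ':'.
			parts := strings.SplitN(line, ":", 2)
			if len(parts) != 2 {
				return nil, fmt.Errorf("malformed line %q", line)
			}
			res[cur][parts[0]] = parts[1]
		}
	}
	return res, sc.Err()
}

// tr returns the translation of s for lang, or s itself if there is none.
func (t Translations) tr(s, lang string) string {
	if v, ok := t[s][lang]; ok {
		return v
	}
	return s
}

func main() {
	src := "String to translate\nde:German translation\nfr:French translation\n\n" +
		"Another string to translate\nde:another German translation\n"
	t, err := parseTranslations(src)
	if err != nil {
		panic(err)
	}
	fmt.Println(t.tr("Another string to translate", "de")) // another German translation
	fmt.Println(t.tr("Another string to translate", "fr")) // Another string to translate
}
&lt;/pre&gt;&lt;/p&gt;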

	&lt;p&gt;People would download the latest version of the file from svn, add missing translations and e-mail it to me. I would check it in to svn and re-run the script that rebuilds the C++ file with strings.&lt;/p&gt;

	&lt;p&gt;At some point the file became very big so I split it into multiple files, one per language.&lt;/p&gt;

	&lt;p&gt;It worked OK, but submitting a translation was time-consuming for translators and updating the code was time-consuming for me.&lt;/p&gt;

	&lt;p&gt;For that reason I built a &lt;a href="http://www.apptranslator.org"&gt;web-based service&lt;/a&gt; which makes it much easier to contribute a translation.&lt;/p&gt;

	&lt;p&gt;I also added API endpoints to the server so that scripts can automate uploading strings to translate and downloading the latest translations.&lt;/p&gt;

	&lt;h2&gt;The design of web-based UI&lt;/h2&gt;

	&lt;p&gt;A web-based UI for editing translations is not a novel idea. However, my brief research showed that few do it well.&lt;/p&gt;

	&lt;p&gt;I try to be pragmatic about things. If a decent 3rd-party system with enough flexibility to meet my needs already existed, I would rather use it than develop my own (writing code takes time).&lt;/p&gt;

	&lt;p&gt;I&amp;#8217;m tempted to say that I designed for simplicity but, while being true, it&amp;#8217;s also rather vague. More accurately: I designed for simplicity of the translator workflow.&lt;/p&gt;

	&lt;p&gt;You can judge the UI &lt;a href="http://www.apptranslator.org"&gt;yourself&lt;/a&gt;, but let me point out a few specific points:	&lt;ul&gt;
		&lt;li&gt;it takes 2 clicks from the main page to the point where you can submit a translation. The first click selects the project (AppTranslator is designed to support multiple projects) and the second selects the language&lt;/li&gt;
		&lt;li&gt;on the page for a given language, the untranslated strings are at the top; the rest are sorted&lt;/li&gt;
		&lt;li&gt;I don&amp;#8217;t require creating an AppTranslator account; translators authenticate with Twitter&lt;/li&gt;
	&lt;/ul&gt;&lt;/p&gt;

	&lt;p&gt;Those might seem like obvious points but I found that other systems I&amp;#8217;ve surveyed were baroque enough to inspire Kafka. &lt;/p&gt;

	&lt;p&gt;Ubuntu&amp;#8217;s &lt;a href="https://wiki.ubuntu.com/Translations"&gt;translation system&lt;/a&gt; gets special recognition for the amount of bureaucracy, complicated workflows (joining a translation team to submit a translation?) and bad copy (they feel it&amp;#8217;s important to inform you that in order to contribute a translation you need an internet connection, among many other useless bits of information).&lt;/p&gt;

	&lt;p&gt;Mozilla &lt;a href="https://wiki.mozilla.org/L10n:Home_Page"&gt;is marginally better&lt;/a&gt;.&lt;/p&gt;

	&lt;h2&gt;Technical specs&lt;/h2&gt;

	&lt;p&gt;AppTranslator is written in Go and &lt;a href="https://github.com/kjk/apptranslator"&gt;open-source&lt;/a&gt; (BSD license).&lt;/p&gt;

	&lt;p&gt;I run it on an Ubuntu server. It&amp;#8217;s possible, but &lt;a href="https://github.com/kjk/apptranslator/blob/master/docs/deploy_your_own.txt"&gt;not easy&lt;/a&gt;, to run your own instance.&lt;/p&gt;

	&lt;p&gt;SumatraPDF is also &lt;a href="https://code.google.com/p/sumatrapdf/"&gt;open-source&lt;/a&gt;. The code is in C++ with a bunch of Python helper scripts to automate interaction with the AppTranslator server.&lt;/p&gt;</content>
  </entry>
  <entry>
   <title>Speeding up Go (and C++) with custom allocators</title>
   <link href="http://blog.kowalczyk.info/article/u5o7/Speeding-up-Go-and-C-with-custom-allocators.html" rel="alternate" />
   <updated>2012-11-26T12:17:53-08:00</updated>
   <id>tag:blog.kowalczyk.info,1999:1407031</id>
   <content type="html">	&lt;h2&gt;Speeding up Go with custom allocators&lt;/h2&gt;

	&lt;p&gt;&lt;b&gt;Summary&lt;/b&gt;: using a custom allocator I was able to speed up an allocation heavy program (&lt;a href="http://shootout.alioth.debian.org/u64/performance.php?test=binarytrees"&gt;binary-trees benchmark&lt;/a&gt;) ~4x.&lt;/p&gt;

	&lt;p&gt;Allocation is expensive, and that holds true for all languages. At the time of this writing, Go (version 1.0.3) doesn&amp;#8217;t have a garbage collector that is as sophisticated as, say, the garbage collector in the JVM. There&amp;#8217;s work being done to improve Go&amp;#8217;s GC, but today allocations in Go are not as cheap as they could be.&lt;/p&gt;

	&lt;p&gt;This can be seen in the binary-trees benchmark, which has almost no computation but millions of allocations of small objects. As a result, the Java implementation is about 7x faster.&lt;/p&gt;

	&lt;p&gt;I was able to speed up Go code by about 4x by using a custom allocator.&lt;/p&gt;

	&lt;p&gt;The benchmark builds a large binary tree composed of nodes:&lt;/p&gt;

	&lt;p&gt;&lt;pre class="prettyprint "&gt;
type Node struct {
	item        int
	left, right *Node
}
&lt;/pre&gt;&lt;/p&gt;

	&lt;p&gt;To allocate a new node we use &lt;code&gt;&amp;Node{item, left, right}&lt;/code&gt;.&lt;/p&gt;

	&lt;h2&gt;Improving allocation-heavy code&lt;/h2&gt;

	&lt;p&gt;First, a correction. When I said that allocation is expensive, I over-simplified. &lt;/p&gt;

	&lt;p&gt;In garbage-collected languages allocation is actually very cheap. In a good allocator it&amp;#8217;s just a single arithmetic operation to bump a pointer, which is orders of magnitude cheaper than even the best &lt;code&gt;malloc()&lt;/code&gt; implementation. &lt;code&gt;malloc()&lt;/code&gt; has to maintain data structures that keep track of allocated and freed memory so that memory released with &lt;code&gt;free()&lt;/code&gt; can be reused.&lt;/p&gt;

	&lt;p&gt;The more complicated reality is that the garbage collection phase is what&amp;#8217;s expensive.&lt;/p&gt;

	&lt;p&gt;Garbage collection (gc) is triggered every once in a while. It recursively traces objects, starting from known roots (such as global variables and stacks) and chasing pointers, to find every object that is still reachable. Objects that are not reachable are freed (this is the "garbage" in garbage collection that has just been collected).&lt;/p&gt;

	&lt;p&gt;There are 2 insights we get from knowing how garbage collection works internally:	&lt;ul&gt;
		&lt;li&gt;the more objects there are, the more expensive garbage collection is&lt;/li&gt;
		&lt;li&gt;the more pointers we need to chase, the more expensive gc is&lt;/li&gt;
	&lt;/ul&gt;&lt;br&gt;
 &lt;br&gt;
Knowing what the problem is, we know what the solution should be:	&lt;ul&gt;
		&lt;li&gt;allocate fewer objects (e.g. by allocating them in bulk or re-using previously allocated objects)&lt;/li&gt;
		&lt;li&gt;don&amp;#8217;t use pointers&lt;/li&gt;
	&lt;/ul&gt;&lt;/p&gt;

	&lt;p&gt;As it happens, the majority of the 4x speedup I got in this particular benchmark came from not using pointers.&lt;/p&gt;

	&lt;h2&gt;Speeding up the binary-trees shootout benchmark&lt;/h2&gt;

	&lt;p&gt;We said that we should avoid pointers, so that the garbage collector doesn&amp;#8217;t have to chase them. The new definition of the &lt;code&gt;Node&lt;/code&gt; struct is:&lt;/p&gt;

	&lt;p&gt;&lt;pre class="prettyprint "&gt;
type NodeId int

type Node struct {
	item        int
	left, right NodeId
}
&lt;/pre&gt;&lt;/p&gt;

	&lt;p&gt;We changed the &lt;code&gt;left&lt;/code&gt; and &lt;code&gt;right&lt;/code&gt; fields from &lt;code&gt;*Node&lt;/code&gt; to a new integer type &lt;code&gt;NodeId&lt;/code&gt;, which is just a unique integer representing a node.&lt;/p&gt;

	&lt;p&gt;What &lt;code&gt;NodeId&lt;/code&gt; means is up to us to define and we define it thusly: it&amp;#8217;s an index into a &lt;code&gt;[]Node&lt;/code&gt; array. That array is the backing store (i.e. allocator) for our nodes.&lt;/p&gt;

	&lt;p&gt;When we need to allocate another node, we expand the &lt;code&gt;[]Node&lt;/code&gt; array and return the index to that node. We can trivially map &lt;code&gt;NodeId&lt;/code&gt; to &lt;code&gt;*Node&lt;/code&gt; by doing &lt;code&gt;&amp;array[nodeId]&lt;/code&gt;.&lt;/p&gt;
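
	&lt;p&gt;A minimal sketch of this simple, &lt;code&gt;append()&lt;/code&gt;-based variant, assuming index 0 is reserved to mean "no child", the way a nil pointer does. &lt;code&gt;bottomUpTree&lt;/code&gt; and &lt;code&gt;itemCheck&lt;/code&gt; are modeled on the benchmark's functions but operate on indices only:&lt;/p&gt;

	&lt;p&gt;&lt;pre class="prettyprint"&gt;
package main

import "fmt"

// NodeId is an index into nodes; 0 is reserved to mean "no child".
type NodeId int

type Node struct {
	item        int
	left, right NodeId
}

// nodes is the backing store for all nodes; slot 0 is the reserved nil slot.
var nodes = make([]Node, 1)

// allocNode appends a node and returns its index. When the slice runs out
// of capacity, append copies all existing nodes to a bigger array.
func allocNode(item int, left, right NodeId) NodeId {
	nodes = append(nodes, Node{item, left, right})
	return NodeId(len(nodes) - 1)
}

// bottomUpTree builds a complete binary tree of the given depth.
func bottomUpTree(item, depth int) NodeId {
	if 0 >= depth {
		return allocNode(item, 0, 0)
	}
	left := bottomUpTree(2*item-1, depth-1)
	right := bottomUpTree(2*item, depth-1)
	return allocNode(item, left, right)
}

// itemCheck walks the tree by index; Node contains no pointers, so the
// garbage collector has nothing to chase inside the nodes slice.
func itemCheck(id NodeId) int {
	n := nodes[id]
	if n.left == 0 {
		return n.item
	}
	return n.item + itemCheck(n.left) - itemCheck(n.right)
}

func main() {
	root := bottomUpTree(5, 2)
	fmt.Println(itemCheck(root), len(nodes)-1) // 4 7
}
&lt;/pre&gt;&lt;/p&gt;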

	&lt;p&gt;Our implementation is a bit more sophisticated. In Go it&amp;#8217;s easy to extend the array with &lt;code&gt;append()&lt;/code&gt;, but when the array runs out of capacity, &lt;code&gt;append()&lt;/code&gt; copies it to a bigger one. We avoid that by pre-allocating nodes in buckets and using an array of arrays for storage. The code is still relatively simple:&lt;/p&gt;

	&lt;p&gt;&lt;pre class="prettyprint "&gt;
const nodes_per_bucket = 1024 * 1024

var (
	all_nodes    [][]Node = make([][]Node, 0)
	nodes_left   int      = 0
	curr_node_id int      = 0
)

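// NodeFromId maps a NodeId to its Node. Ids start at 1, so
// NodeId 0 can be used to mean "no node" (like a nil pointer).
func NodeFromId(id NodeId) *Node {
	n := int(id) - 1
	bucket := n / nodes_per_bucket
	el := n % nodes_per_bucket
	return &amp;all_nodes[bucket][el]
}

// allocNode takes the next free slot, allocating a new bucket of
// nodes_per_bucket nodes when the current one is full. Existing
// buckets are never copied, so *Node pointers stay valid.
func allocNode(item int, left, right NodeId) NodeId {
	if 0 == nodes_left {
		new_nodes := make([]Node, nodes_per_bucket, nodes_per_bucket)
		all_nodes = append(all_nodes, new_nodes)
		nodes_left = nodes_per_bucket
	}
	nodes_left -= 1
	node := NodeFromId(NodeId(curr_node_id + 1))
	node.item = item
	node.left = left
	node.right = right

	curr_node_id += 1
	return NodeId(curr_node_id)
}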
&lt;/pre&gt;&lt;/p&gt;

	&lt;p&gt;Remaining changes to the code involve adding &lt;code&gt;NodeFromId()&lt;/code&gt; call in a few places.&lt;/p&gt;

	&lt;p&gt;You can compare &lt;a href="https://github.com/kjk/kjkpub/blob/master/gobench/bintree.go"&gt;original&lt;/a&gt; to my &lt;a href="https://github.com/kjk/kjkpub/blob/master/gobench/bintree3.go"&gt;faster version&lt;/a&gt;.&lt;/p&gt;

	&lt;p&gt;Another minor advantage of using integers instead of pointers in the Node struct is that on 64-bit machines we use only 4 bytes for an int vs. 8 bytes for a pointer (in Go 1.0, &lt;code&gt;int&lt;/code&gt; is 32 bits even on 64-bit platforms).&lt;/p&gt;
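
	&lt;p&gt;This saving depends on the Go version: since Go 1.1, &lt;code&gt;int&lt;/code&gt; is 64 bits on 64-bit platforms, so an &lt;code&gt;int&lt;/code&gt; id is as large as a pointer. A quick way to check, using hypothetical struct names and explicit &lt;code&gt;int32&lt;/code&gt; ids to keep the smaller layout:&lt;/p&gt;

	&lt;p&gt;&lt;pre class="prettyprint"&gt;
package main

import (
	"fmt"
	"unsafe"
)

// ptrNode uses pointers for children, as in the original benchmark.
type ptrNode struct {
	item        int
	left, right *ptrNode
}

// intNode uses int ids; since Go 1.1 an int is pointer-sized on 64-bit.
type intNode struct {
	item        int
	left, right int
}

// idNode uses explicit 32-bit ids, which stay 4 bytes on every platform.
type idNode struct {
	item        int
	left, right int32
}

func main() {
	// On a 64-bit platform this prints 24 24 16.
	fmt.Println(unsafe.Sizeof(ptrNode{}), unsafe.Sizeof(intNode{}), unsafe.Sizeof(idNode{}))
}
&lt;/pre&gt;&lt;/p&gt;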

	&lt;h2&gt;Drawbacks of custom allocators&lt;/h2&gt;

	&lt;p&gt;The biggest drawback is that we lose the ability to free individual objects. Memory we&amp;#8217;ve allocated is not returned to the OS until the program exits.&lt;/p&gt;

	&lt;p&gt;It&amp;#8217;s not a problem in this case, since the tree only grows and the program ends when it&amp;#8217;s done.&lt;/p&gt;

	&lt;p&gt;In different code this could be a much bigger issue. There are ways to free memory even with custom allocators, but they add complexity and tend to evolve into a custom garbage collector. At that point it might be better to go back to simple code and leave the work to the built-in garbage collector.&lt;/p&gt;

	&lt;h2&gt;How come Java is so much faster?&lt;/h2&gt;

	&lt;p&gt;It&amp;#8217;s a valid question: both Java and Go have garbage collectors, why is Java&amp;#8217;s so much better on this benchmark?&lt;/p&gt;

	&lt;p&gt;I can only speculate.&lt;/p&gt;

	&lt;p&gt;Java, unlike Go, uses a generational garbage collector, which has 2 arenas: one for young (newly allocated) objects, called the nursery, and one for old objects.&lt;/p&gt;

	&lt;p&gt;It has been observed that most objects die young. A generational garbage collector allocates objects in the nursery, and most collections only scan the nursery. Objects that survive collections in the nursery are moved to the second arena for old objects, which is collected at a much lower rate.&lt;/p&gt;

	&lt;p&gt;Go&amp;#8217;s collector is a simpler mark-and-sweep collector, which has to process the whole heap during every collection.&lt;/p&gt;

	&lt;p&gt;Generational garbage collectors have overhead because they have to copy objects in memory and update references between objects when that happens. On the other hand, they can also compact memory, improving cache locality, and they scan a smaller number of objects during each collection.&lt;/p&gt;

	&lt;p&gt;In this particular benchmark there are many Node objects and they never die, so they are promoted to the rarely collected arena for old objects, and each collection is cheaper because it only looks at a small number of recently allocated Node objects.&lt;/p&gt;

	&lt;p&gt;There&amp;#8217;s hope for Go, though. The implementors are aware that the garbage collector is not as good as it could be and there is ongoing work on implementing a better one.&lt;/p&gt;

	&lt;h2&gt;A win in C++ as well&lt;/h2&gt;

	&lt;p&gt;Optimizing by reducing the number of allocations or making allocations faster applies to non-gc languages like C and C++ as well, because &lt;code&gt;malloc()/free()&lt;/code&gt; are relatively slow functions.&lt;/p&gt;

	&lt;p&gt;Back in the day when I was working on Poppler, I achieved a significant ~19% speedup by &lt;a href="http://blog.kowalczyk.info/article/Performance-optimization-story.html"&gt;improving a string class&lt;/a&gt; to avoid an additional allocation in 90% of the cases. I now use this trick in my C++ code, e.g. in &lt;a href="https://code.google.com/p/sumatrapdf/source/browse/trunk/src/utils/Vec.h"&gt;SumatraPDF code&lt;/a&gt;.&lt;/p&gt;

	&lt;p&gt;I also managed to improve Poppler by another ~25% by using a simple &lt;a href="https://bugs.freedesktop.org/show_bug.cgi?id=7910"&gt;custom allocator&lt;/a&gt;.&lt;/p&gt;

	&lt;p&gt;Since then I use this trick whenever I can.&lt;/p&gt;</content>
  </entry>
  <entry>
   <title>Hiding duplicate content from your site via robots.txt</title>
   <link href="http://blog.kowalczyk.info/article/53n6/Hiding-duplicate-content-from-your-site-via-robo.html" rel="alternate" />
   <updated>2012-10-22T01:43:33-07:00</updated>
   <id>tag:blog.kowalczyk.info,1999:238002</id>
   <content type="html">	&lt;p&gt;Many blogs, including this one, generate duplicate content. For example, &lt;a href="http://blog.kowalczyk.info/blog"&gt;the archive pages&lt;/a&gt; duplicate the content of individual posts; they just show it in a different way (a couple of posts per page, as opposed to a single post per page).&lt;/p&gt;

	&lt;p&gt;That unfortunately clutters search results. Perfectionist that I am, I want a search for e.g. "15minutes" (my &lt;a href="/software/15minutes/"&gt;simple timer application&lt;/a&gt;) to lead people to the individual blog posts about it, not to aggregate pages with unrelated content.&lt;/p&gt;

	&lt;p&gt;Thankfully there&amp;#8217;s a way to tell search engines not to crawl parts of your site. It&amp;#8217;s &lt;a href="http://www.javascriptkit.com/howto/robots.shtml"&gt;quite simple&lt;/a&gt; and in five minutes I cooked up the following &lt;a href="/robots.txt"&gt;robots.txt&lt;/a&gt; for my site:&lt;br&gt;
&lt;pre class="prettyprint"&gt;
User-agent: *
Disallow: /page/
Disallow: /tag/
Disallow: /notes/
&lt;/pre&gt;&lt;/p&gt;

	&lt;p&gt;In my particular case, the URLs of archive pages all start with &lt;code&gt;/page/&lt;/code&gt; or &lt;code&gt;/notes/&lt;/code&gt;, and &lt;code&gt;/tag/&lt;/code&gt; is another namespace with duplicate content (it shows a list of articles with a given tag).&lt;/p&gt;
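
	&lt;p&gt;Each Disallow line is a path prefix: any URL path that starts with it is off-limits to crawlers matched by the User-agent line. A minimal Go sketch of that check, assuming only the simple rules above (no Allow lines, no wildcards) and a hypothetical disallowed() helper:&lt;/p&gt;

&lt;pre class="prettyprint"&gt;
package main

import (
	"fmt"
	"strings"
)

// The Disallow rules from the robots.txt above (User-agent: *).
var disallow = []string{"/page/", "/tag/", "/notes/"}

// disallowed reports whether a crawler honoring these rules should skip path.
// Matching is a plain prefix test; wildcards and Allow lines are not handled.
func disallowed(path string) bool {
	for _, prefix := range disallow {
		if strings.HasPrefix(path, prefix) {
			return true
		}
	}
	return false
}

func main() {
	paths := []string{
		"/page/2",
		"/tag/go",
		"/article/53n6/Hiding-duplicate-content-from-your-site-via-robo.html",
	}
	for _, p := range paths {
		fmt.Println(p, disallowed(p))
	}
}
&lt;/pre&gt;

	&lt;p&gt;The trailing slashes matter: /page/2 is blocked, but a URL such as /pages would not be.&lt;/p&gt;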

	&lt;p&gt;For this technique to work, the URLs of duplicate pages have to follow a pattern, but that&amp;#8217;s easy enough to ensure, especially if you write your own blog software, like &lt;a href="http://github.com/kjk/web-blog"&gt;I do&lt;/a&gt;.&lt;/p&gt;</content>
  </entry>
  <entry>
   <title>How I sped up Go by 20% (or is Go really slower than Java?)</title>
   <link href="http://blog.kowalczyk.info/article/u3d4/How-I-sped-up-Go-by-20-or-is-Go-really-slower-th.html" rel="alternate" />
   <updated>2012-09-16T01:55:48-07:00</updated>
   <id>tag:blog.kowalczyk.info,1999:1404040</id>
   <content type="html">	&lt;h2&gt;The story of a faulty benchmark&lt;/h2&gt;

	&lt;p&gt;I confess: I didn&amp;#8217;t really speed up Go by 20%. However, if you were to believe simplistic arguments, I did.&lt;/p&gt;

	&lt;p&gt;It all started when a comment on Hacker News claimed that Java is faster than Go and pointed to &lt;a href="http://shootout.alioth.debian.org/u32/benchmark.php?test=all&amp;lang=go&amp;lang2=java"&gt;those benchmarks&lt;/a&gt; as proof.&lt;/p&gt;

	&lt;p&gt;Indeed, every Go benchmark there is 2x to 10x slower than the corresponding Java 7 program.&lt;/p&gt;

	&lt;p&gt;That seemed wrong to me - Go compiles to native code. Java, ultimately, does too, but compilation happens at runtime, which is an overhead that Go doesn&amp;#8217;t have, so in principle Go should be faster.&lt;/p&gt;

	&lt;p&gt;I took a look at the &lt;a href="https://github.com/kjk/kjkpub/blob/master/gobench/revcomp.go"&gt;Go implementation&lt;/a&gt;. I&amp;#8217;m not an expert in implementing the reverse-complement algorithm, but I immediately noticed that the code does a lot of unnecessary memory allocation and copying.&lt;/p&gt;

	&lt;p&gt;I looked at the &lt;a href="http://shootout.alioth.debian.org/u64/program.php?test=revcomp&amp;lang=java&amp;id=6"&gt;Java implementation&lt;/a&gt; and what do you know: it&amp;#8217;s implemented more efficiently, without unnecessary memory allocations and copying. Plus, it parallelizes the computation.&lt;/p&gt;
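
	&lt;p&gt;Both ideas, working in place and processing independent sequences in parallel, can be sketched in a few lines of Go. This is a simplified illustration, not the benchmark program: it only complements A, C, G and T (the real benchmark also maps IUPAC ambiguity codes and deals with line-wrapped FASTA input), and ReverseComplement and ReverseComplementAll are hypothetical names.&lt;/p&gt;

&lt;pre class="prettyprint"&gt;
package main

import (
	"fmt"
	"sync"
)

// complement maps each byte to its complementary base; bytes without a
// mapping are left unchanged.
var complement [256]byte

func init() {
	for i := range complement {
		complement[i] = byte(i)
	}
	for _, p := range []string{"AT", "CG", "at", "cg"} {
		complement[p[0]], complement[p[1]] = p[1], p[0]
	}
}

// ReverseComplement reverses seq and complements every base in place,
// so no new buffer is allocated and nothing is copied.
func ReverseComplement(seq []byte) {
	for i, j := 0, len(seq)-1; j >= i; i, j = i+1, j-1 {
		// When i == j (odd length) this complements the middle byte once.
		seq[i], seq[j] = complement[seq[j]], complement[seq[i]]
	}
}

// ReverseComplementAll processes independent sequences concurrently,
// one goroutine per sequence.
func ReverseComplementAll(seqs [][]byte) {
	var wg sync.WaitGroup
	for _, s := range seqs {
		wg.Add(1)
		go func(s []byte) {
			defer wg.Done()
			ReverseComplement(s)
		}(s)
	}
	wg.Wait()
}

func main() {
	seqs := [][]byte{[]byte("ACCGT"), []byte("ggatc")}
	ReverseComplementAll(seqs)
	for _, s := range seqs {
		fmt.Println(string(s)) // ACGGT, then gatcc
	}
}
&lt;/pre&gt;

	&lt;p&gt;One goroutine per sequence is the simplest way to split the work; it only pays off when the input contains several large, independent sequences.&lt;/p&gt;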

	&lt;p&gt;I wrote &lt;a href="https://github.com/kjk/kjkpub/blob/master/gobench/revcomp8c.go"&gt;my own Go implementation&lt;/a&gt;, more similar to Java&amp;#8217;s. &lt;/p&gt;

	&lt;p&gt;The result? On my Mac Pro I got about a 20% speedup.&lt;/p&gt;

	&lt;p&gt;The HN commenter&amp;#8217;s argument was: look at those benchmarks, Go is 2x slower than Java.&lt;/p&gt;

	&lt;p&gt;It&amp;#8217;s a simplistic but unfortunately powerful argument: most people won&amp;#8217;t take the time to look at the code and notice that part of the issue is that the Java implementation is simply better than the Go one.&lt;/p&gt;

	&lt;p&gt;By that simplistic argument, I&amp;#8217;ve improved the speed of Go by 20% by writing a slightly better implementation of the benchmark.&lt;/p&gt;

	&lt;p&gt;The story doesn&amp;#8217;t end here. The above results were for a single-core 32-bit x86 machine. If you look at the results for a &lt;a href="http://shootout.alioth.debian.org/u64/benchmark.php?test=all&amp;lang=go&amp;lang2=java"&gt;single-core 64-bit machine&lt;/a&gt;, Go actually wins on this and 2 other benchmarks.&lt;/p&gt;

	&lt;p&gt;The explanation is simple: the 64-bit Go compiler is a little bit smarter than the 32-bit one.&lt;/p&gt;

	&lt;h2&gt;Is Go really slower than Java?&lt;/h2&gt;

	&lt;p&gt;There is no simplistic answer to this, but there are rules of thumb.&lt;/p&gt;

	&lt;p&gt;For equivalent programs, Go produces code that is similar in speed or faster. The 64-bit version will be slightly better than the 32-bit version.&lt;/p&gt;

	&lt;p&gt;Java&amp;#8217;s garbage collector is much better than Go&amp;#8217;s, which explains why allocation-heavy benchmarks (e.g. binary-trees) perform so much better in Java (there&amp;#8217;s almost no computation there, just allocation of a lot of tree nodes).&lt;/p&gt;

	&lt;p&gt;Those benchmarks aren&amp;#8217;t currently representative of Go performance: from a cursory look, it seems that the C++ and Java implementations are extremely optimized and the Go implementations aren&amp;#8217;t (I&amp;#8217;ve submitted my improved version; hopefully it&amp;#8217;ll get incorporated).&lt;/p&gt;

	&lt;p&gt;Go is already competitive with Java and will only get better. Let&amp;#8217;s not forget that Java has had over 20 years of investment in code generation and garbage collection, while Go is only 5 years old. There have already been compiler and garbage collector improvements since the latest released version.&lt;/p&gt;

	&lt;p&gt;One area where Go wins undeniably is memory usage: the Go programs use at least an order of magnitude less memory. Java is paying the cost of a sophisticated virtual machine.&lt;/p&gt;</content>
  </entry>
  <entry>
   <title>Websites with free ePub and mobi ebooks</title>
   <link href="http://blog.kowalczyk.info/article/nn2x/Websites-with-free-ePub-and-mobi-ebooks.html" rel="alternate" />
   <updated>2012-05-04T15:58:27-07:00</updated>
   <id>tag:blog.kowalczyk.info,1999:1103001</id>
   <content type="html">	&lt;p&gt;Since &lt;a href="http://blog.kowalczyk.info/software/sumatrapdf"&gt;Sumatra&lt;/a&gt; now supports the 2 most popular eBook formats (ePub and mobi), I&amp;#8217;ve started collecting a list of websites that offer &lt;a href="/articles/where-to-get-free-ebooks-epub-mobi.html"&gt;free ePub and mobi eBooks&lt;/a&gt;.&lt;/p&gt;</content>
  </entry>
  <entry>
   <title>SumatraPDF 2.1 released</title>
   <link href="http://blog.kowalczyk.info/article/nbie/SumatraPDF-21-released.html" rel="alternate" />
   <updated>2012-05-03T13:30:42-07:00</updated>
   <id>tag:blog.kowalczyk.info,1999:1088006</id>
   <content type="html">	&lt;p&gt;&lt;a href="http://blog.kowalczyk.info/software/sumatrapdf"&gt;SumatraPDF&lt;/a&gt; is a multi-format PDF, XPS, ebook (ePub, MOBI), DjVu etc. reader for Windows and &lt;a href="http://www.ohloh.net/p/4623/contributors"&gt;we&lt;/a&gt; are pleased to announce version 2.1.&lt;/p&gt;

	&lt;p&gt;The biggest change in this version is support for the ePub ebook format. ePub is one of the 2 most popular ebook standards (the other one, &lt;a href="http://blog.kowalczyk.info/articles/mobi-ebook-reader-viewer-for-windows.html"&gt;mobi&lt;/a&gt;, is already supported by Sumatra). Unfortunately, we can only read ebooks that are not DRM protected.&lt;/p&gt;

	&lt;p&gt;We&amp;#8217;ve added File/Rename menu item to rename currently viewed file (contributed by Vasily Fomin).&lt;/p&gt;

	&lt;p&gt;We&amp;#8217;ve added support for multi-page TIFF files, TGA images and more comic book (CBZ) metadata.&lt;/p&gt;

	&lt;p&gt;We&amp;#8217;ve added support for JPEG XR images (available on Windows Vista or later; on Windows XP, the &lt;a href="http://www.microsoft.com/en-us/download/details.aspx?id=32"&gt;Windows Imaging Component&lt;/a&gt; has to be installed).&lt;/p&gt;

	&lt;p&gt;The installer is now signed, for even better installation experience.&lt;/p&gt;</content>
  </entry>
  <entry>
   <title>Buying a certificate for signing windows applications</title>
   <link href="http://blog.kowalczyk.info/article/lh6f/Buying-a-certificate-for-signing-windows-applica.html" rel="alternate" />
   <updated>2012-04-15T22:28:01-07:00</updated>
   <id>tag:blog.kowalczyk.info,1999:1002039</id>
   <content type="html">	&lt;p&gt;Recently I bought a code signing certificate so that I can sign my Windows application &lt;a href="http://blog.kowalczyk.info/software/sumatrapdf/free-pdf-reader.html"&gt;SumatraPDF&lt;/a&gt; (I&amp;#8217;m hoping it&amp;#8217;ll reduce the number of false positives from anti-virus programs that claim SumatraPDF is suspect).&lt;/p&gt;

	&lt;p&gt;There were some things that I wish I had known before starting, so I&amp;#8217;m documenting the process here for posterity.&lt;/p&gt;

	&lt;p&gt;There are many places to buy a code signing certificate. I bought mine from &lt;a href="http://ksoftware.net/"&gt;K Software&lt;/a&gt; as they have low prices and there&amp;#8217;s enough info on the internet to assure me that it&amp;#8217;s the right kind of certificate for signing Windows apps. The certificate is actually issued by Comodo; K Software is only a reseller (but they have lower prices than Comodo, go figure).&lt;/p&gt;

	&lt;p&gt;I bought a certificate valid for 3 years ($245). The validity period only affects whether you can use the certificate to sign. Once you sign an app and timestamp the signature, the signature stays valid even after the certificate expires.&lt;/p&gt;

	&lt;p&gt;After the certificate expires, you can renew it. The shortest (and cheapest) validity period is 1 year. I opted for 3 years because it avoids the hassle of renewing every year.&lt;/p&gt;

	&lt;p&gt;Important note before you start: you also need to have an internet domain registered in your name (or in your organization&amp;#8217;s name) and, to minimize trouble, make sure you can receive e-mail at a valid address on that domain (I use Google Apps for that domain and use it to forward e-mails for that domain to my personal Gmail account).&lt;/p&gt;

	&lt;p&gt;The domain is necessary to complete the verification of your identity that Comodo does. It&amp;#8217;s a strange requirement for a certificate for signing applications, but I&amp;#8217;m guessing it&amp;#8217;s because certificates have their roots in SSL and the web, where a domain name is required.&lt;/p&gt;

	&lt;p&gt;Signing an app with a certificate basically serves as a stamp that says "this application has been signed by company/individual X". For the system to work, X must be a legitimate company/person and not, say, a hacker.&lt;/p&gt;

	&lt;p&gt;For that reason the organization that issues certificates (in this case Comodo) needs to verify the identity of the person buying the certificate so that e.g. I can&amp;#8217;t order a certificate that says my name is "Microsoft" and start signing my apps as coming from Microsoft.&lt;/p&gt;

	&lt;p&gt;The verification process starts after you purchase the certificate. The details depend on whether you&amp;#8217;re a company or an individual. I ordered as an individual and the verification process was:&lt;/p&gt;

	&lt;ul&gt;
		&lt;li&gt;They asked for a copy of a valid ID. I e-mailed them a photo of my driver&amp;#8217;s license (taken with an iPhone)&lt;/li&gt;
		&lt;li&gt;Then they asked for a copy of a phone bill. I e-mailed them the PDF bill I downloaded from AT&amp;amp;T&amp;#8217;s website&lt;/li&gt;
		&lt;li&gt;Then they called me on my phone to verify the phone number on the bill is my phone number&lt;/li&gt;
	&lt;/ul&gt;

	&lt;p&gt;All in all, the back and forth took the whole day.&lt;/p&gt;

	&lt;p&gt;After the certificate is issued, you need to download it to a file. To do that, you visit a web page that Comodo created for you, using a supported browser (Firefox or IE; Chrome is not supported; I used Firefox) on the same computer that was used to order the certificate.&lt;/p&gt;

	&lt;p&gt;That creates a certificate and adds it to the browser&amp;#8217;s certificate store. Finally, you export the certificate to a file. The steps are detailed at &lt;a href="http://blog.ksoftware.net/"&gt;http://blog.ksoftware.net/&lt;/a&gt;.&lt;/p&gt;

	&lt;p&gt;As to actual signing, I use K Software&amp;#8217;s ksign tool (the command-line version ksigncmd that I call from my build script).&lt;/p&gt;</content>
  </entry>
  <entry>
   <title>SumatraPDF 2.0 released</title>
   <link href="http://blog.kowalczyk.info/article/l41c/SumatraPDF-20-released.html" rel="alternate" />
   <updated>2012-04-02T14:54:09-07:00</updated>
   <id>tag:blog.kowalczyk.info,1999:985008</id>
   <content type="html">	&lt;p&gt;&lt;a href="http://blog.kowalczyk.info/software/sumatrapdf"&gt;SumatraPDF&lt;/a&gt; is a multi-format PDF, XPS, ebook (MOBI), DjVu etc. reader for Windows and &lt;a href="http://www.ohloh.net/p/4623/contributors"&gt;we&lt;/a&gt; are pleased to announce version 2.0.&lt;/p&gt;

	&lt;p&gt;The biggest change in this version is that we can now read ebooks in mobi format. Mobi is a format developed for MobiPocket Reader and made popular by Amazon&amp;#8217;s Kindle. Unfortunately, eBooks purchased from Amazon are protected by DRM, so SumatraPDF (or any other third-party reader) cannot read them. The good news is that there are other sources of ebooks in mobi format.&lt;/p&gt;

	&lt;p&gt;The UI for reading ebooks is different from the UI used for other documents. It&amp;#8217;s not hard to notice that the inspiration for the UI came from Kindle&amp;#8217;s PC application.&lt;/p&gt;

	&lt;p&gt;There were other changes as well:&lt;/p&gt;

	&lt;p&gt;We can now open CHM documents from network drives.&lt;/p&gt;

	&lt;p&gt;After selecting an area with the mouse, the area can be copied to the clipboard as an image with a right-click context menu.&lt;/p&gt;

	&lt;p&gt;Sumatra has always been a small and fast program, and because we applied &lt;a href="http://code.google.com/p/sumatrapdf/source/browse/trunk/src/ucrt/readme.txt"&gt;extreme size optimization techniques&lt;/a&gt;, it&amp;#8217;s smaller than ever. Without them, the installer would be about 9% (around 400 kB) bigger. As a bonus, the size-reducing code we developed is available to other programmers under a liberal BSD license.&lt;/p&gt;

	&lt;p&gt;And as always there are many smaller improvements: even better PDF support etc.&lt;/p&gt;</content>
  </entry>
  <entry>
   <title>A list of chm readers/viewers for Windows</title>
   <link href="http://blog.kowalczyk.info/article/gqmj/A-list-of-chm-readersviewers-for-Windows.html" rel="alternate" />
   <updated>2011-12-09T16:47:56-08:00</updated>
   <id>tag:blog.kowalczyk.info,1999:781003</id>
   <content type="html">	&lt;p&gt;I&amp;#8217;ve compiled a list of &lt;a href="http://blog.kowalczyk.info/articles/chm-reader-viewer-for-windows.html"&gt;CHM readers for Windows&lt;/a&gt;. It&amp;#8217;s a surprisingly long list.&lt;/p&gt;</content>
  </entry>
  <entry>
   <title>SumatraPDF 1.9 released</title>
   <link href="http://blog.kowalczyk.info/article/fyuh/SumatraPDF-19-released.html" rel="alternate" />
   <updated>2011-11-24T00:44:29-08:00</updated>
   <id>tag:blog.kowalczyk.info,1999:745001</id>
   <content type="html">	&lt;p&gt;&lt;a href="http://www.ohloh.net/p/4623/contributors"&gt;We&lt;/a&gt; are pleased to announce the 1.9 release of &lt;a href="http://blog.kowalczyk.info/software/sumatrapdf"&gt;SumatraPDF&lt;/a&gt;, a small, fast, free PDF, CHM, DjVu, XPS, CBZ and CBR reader for Windows.&lt;/p&gt;

	&lt;p&gt;The most significant addition in this release is support for CHM documents. Sumatra is slowly getting support for more and more document formats.&lt;/p&gt;

	&lt;p&gt;Robert Prouse contributed support for touch gestures (available on Windows 7 or later, if you have the right hardware, i.e. a touch-enabled screen).&lt;/p&gt;

	&lt;p&gt;Audio and video files linked from PDF documents are now opened in an external media player.&lt;/p&gt;

	&lt;p&gt;We&amp;#8217;ve also improved support for PDF transparency groups.&lt;/p&gt;</content>
  </entry>
  <entry>
   <title>Introducing Volante - a database for C# (.NET)</title>
   <link href="http://blog.kowalczyk.info/article/epbm/Introducing-Volante-a-database-for-C-NET.html" rel="alternate" />
   <updated>2011-09-22T17:12:27-07:00</updated>
   <id>tag:blog.kowalczyk.info,1999:686002</id>
   <content type="html">	&lt;p&gt;&lt;a href="http://blog.kowalczyk.info/software/volante/database.html"&gt;Volante&lt;/a&gt; is a small, fast, object-oriented, embeddable database designed for seamless integration with C# (and other .NET languages). Today I&amp;#8217;ve made its first public release.&lt;/p&gt;

	&lt;p&gt;Almost every program needs to persist some data. There are many ways to do that: serialize data as XML or JSON, use SQLite etc.&lt;/p&gt;

	&lt;p&gt;I&amp;#8217;ve been recently writing desktop .NET applications in C# and existing (free) options didn&amp;#8217;t meet my needs well.&lt;/p&gt;

	&lt;p&gt;The closest solution is SQLite, but it doesn&amp;#8217;t integrate well with C#: you have to convert your objects to and from SQL tables.&lt;/p&gt;

	&lt;p&gt;If I were the world&amp;#8217;s toughest programmer, I would write an object-oriented database engine from scratch, designed from the beginning for seamless integration with the .NET framework.&lt;/p&gt;

	&lt;p&gt;Thankfully, I didn&amp;#8217;t have to do that. What I needed already existed in the form of &lt;a href="http://en.wikipedia.org/wiki/Perst"&gt;Perst&lt;/a&gt; project.&lt;/p&gt;

	&lt;p&gt;There was one wrinkle: while early versions of Perst were under the BSD license, starting with version 2.50 the code was acquired by McObject, a company that now distributes it under the GPL; those who can&amp;#8217;t use the GPL can purchase a commercial license from McObject.&lt;/p&gt;

	&lt;p&gt;Not so great for one person who doesn&amp;#8217;t yet make money from his software.&lt;/p&gt;

	&lt;p&gt;I decided to adopt the Perst code base. I picked 2.49, the last version that was still licensed under BSD (the license of code that was already released can&amp;#8217;t be changed retroactively), and I&amp;#8217;ve spent the last couple of months writing comprehensive documentation, writing tests, fixing bugs discovered by the tests and modernizing the code base.&lt;/p&gt;

	&lt;p&gt;Today I&amp;#8217;ve reached a point where I&amp;#8217;m comfortable releasing this code publicly as version 0.9.&lt;/p&gt;

	&lt;p&gt;I&amp;#8217;ve retained the BSD license of early Perst versions so the code is free to use in both open-source and commercial projects.&lt;/p&gt;

	&lt;p&gt;Volante database serves the same niche as SQLite: an embedded database engine for your desktop C# applications. Like with SQLite, the database is in a single file. &lt;/p&gt;

	&lt;p&gt;There are significant differences from SQLite.&lt;/p&gt;

	&lt;p&gt;.NET is an object-oriented framework. Volante is an object-oriented database to offer the best integration with .NET. Volante uses B-Trees to implement indexes, which allows quickly finding objects with desired properties.&lt;/p&gt;

	&lt;p&gt;Volante is extremely small: Volante.dll is only 180 KB.&lt;/p&gt;

	&lt;p&gt;I distribute Volante.dll for Microsoft&amp;#8217;s .NET (works in .NET 2.0 and later) but it can also be compiled from sources and used under Mono.&lt;/p&gt;

	&lt;p&gt;I&amp;#8217;ve been dogfooding Volante from day one in my three .NET applications and it&amp;#8217;s been performing great.&lt;/p&gt;

	&lt;p&gt;I hope you&amp;#8217;ll find it useful too.&lt;/p&gt;</content>
  </entry>
  <entry>
   <title>SumatraPDF 1.8 released</title>
   <link href="http://blog.kowalczyk.info/article/ej5e/SumatraPDF-18-released.html" rel="alternate" />
   <updated>2011-09-18T19:41:14-07:00</updated>
   <id>tag:blog.kowalczyk.info,1999:678002</id>
   <content type="html">	&lt;p&gt;&lt;a href="http://www.ohloh.net/p/4623/contributors"&gt;We&lt;/a&gt; are pleased to announce the 1.8 release of &lt;a href="http://blog.kowalczyk.info/software/sumatrapdf"&gt;SumatraPDF&lt;/a&gt;, a small, fast, free PDF, DjVu, XPS, CBZ and CBR reader for Windows.&lt;/p&gt;

	&lt;p&gt;This is a smallish release.&lt;/p&gt;

	&lt;p&gt;We improved support for PDF form text fields.&lt;/p&gt;

	&lt;p&gt;We sped up handling of some types of DjVu files.&lt;/p&gt;

	&lt;p&gt;We&amp;#8217;ve made a bunch of minor improvements and bug fixes.&lt;/p&gt;</content>
  </entry>
  <entry>
   <title>SumatraPDF 1.7 released</title>
   <link href="http://blog.kowalczyk.info/article/cbo9/SumatraPDF-17-released.html" rel="alternate" />
   <updated>2011-07-17T18:28:26-07:00</updated>
   <id>tag:blog.kowalczyk.info,1999:575001</id>
   <content type="html">	&lt;p&gt;&lt;a href="http://www.ohloh.net/p/4623/contributors"&gt;SumatraPDF developers&lt;/a&gt; are pleased to announce the 1.7 release of &lt;a href="http://blog.kowalczyk.info/software/sumatrapdf"&gt;SumatraPDF&lt;/a&gt;, a PDF, DjVu, XPS, CBZ and CBR reader for Windows.&lt;/p&gt;

	&lt;p&gt;In this release we&amp;#8217;ve added user-defined favorites (i.e. bookmarks). You can create one or more favorites for a given file, navigate to a favorite and delete them.&lt;/p&gt;

	&lt;p&gt;Favorites are accessed either via menu items in the top-level Favorites menu or as a tree displayed in the sidebar.&lt;/p&gt;

	&lt;p&gt;We&amp;#8217;ve improved support for right-to-left languages, like Arabic.&lt;/p&gt;

	&lt;p&gt;Logical page numbers (such as i, ii, iii, etc.) are displayed and used, if the document provides them.&lt;/p&gt;

	&lt;p&gt;SumatraPDF&amp;#8217;s features can now be restricted with more granularity; see &lt;a href="http://code.google.com/p/sumatrapdf/source/browse/trunk/docs/sumatrapdfrestrict.ini"&gt;this document&lt;/a&gt; for more information.&lt;/p&gt;

	&lt;p&gt;The -named-dest command-line argument now also matches strings in the table of contents.&lt;/p&gt;

	&lt;p&gt;We&amp;#8217;ve improved support for EPS files (requires Ghostscript).&lt;/p&gt;

	&lt;p&gt;The installer is now more robust. Previously, an installation could fail if a web browser using Sumatra&amp;#8217;s web browser DLL was running. Now the installer detects this and asks you to close the browser before proceeding.&lt;/p&gt;

	&lt;p&gt;Until next release.&lt;/p&gt;</content>
  </entry>
  <entry>
   <title>How to make software crash less</title>
   <link href="http://blog.kowalczyk.info/article/c4qb/How-to-make-software-crash-less.html" rel="alternate" />
   <updated>2011-06-30T15:32:59-07:00</updated>
   <id>tag:blog.kowalczyk.info,1999:566003</id>
   <content type="html">	&lt;p&gt;You don&amp;#8217;t want your software to crash, do you? This post describes my experiences in making desktop Mac and Windows software crash less.&lt;/p&gt;

	&lt;h2&gt;Know thy crashes&lt;/h2&gt;

	&lt;p&gt;The most important step to fixing your crashes is to know about them. &lt;/p&gt;

	&lt;p&gt;Given how complex and varied our desktop operating systems are (3 major versions of Windows in active use, thousands of minor and major ways in which each Windows or Mac installation can differ), our ability to test software comprehensively is limited.&lt;/p&gt;

	&lt;p&gt;Sure, if you&amp;#8217;re Microsoft or Adobe you can reinvest some of the revenue to hire an army of testers, set up compatibility labs etc., but for a small company or a single developer this is not realistic.&lt;/p&gt;

	&lt;p&gt;Bugs most often lurk in untested code and even a very good testing effort is unlikely to encounter all the many things that can go wrong in real life.&lt;/p&gt;

	&lt;h2&gt;Get the crash reports automatically&lt;/h2&gt;

	&lt;p&gt;Very few people bother to tell software vendors about crashes. They just shrug and restart. The only realistic way to be informed about crashes is to automatically gather crash reports without user involvement.&lt;/p&gt;

	&lt;p&gt;This is a proven idea. Microsoft was one of the pioneers of this technique for the Windows OS, and they publicly credit it with letting them fix the most frequent crashes and increase the stability of Windows. Many clued-in organizations do it as well: Apple, Mozilla, Google.&lt;/p&gt;

	&lt;h2&gt;Getting the crashes - the mechanics&lt;/h2&gt;

	&lt;p&gt;Regardless of the platform, the solution involves two parts:	&lt;ul&gt;
		&lt;li&gt;a server, which accepts crash reports from the software&lt;/li&gt;
		&lt;li&gt;code inside the software itself. It gets activated when a crash happens and sends crash report to the server&lt;/li&gt;
	&lt;/ul&gt;&lt;/p&gt;

	&lt;h3&gt;The server&lt;/h3&gt;

	&lt;p&gt;The server part is simple. You can use any technology to write it, any protocol you want.&lt;/p&gt;

	&lt;p&gt;Personally, I use &lt;a href="https://appengine.google.com/"&gt;App Engine&lt;/a&gt;, reuse the HTTP POST protocol and run on the standard HTTP port (80) for maximum compatibility with client-side firewall software.&lt;/p&gt;

	&lt;p&gt;It&amp;#8217;s literally a few lines of code to parse incoming POST requests, store them in the database and provide a basic web interface for easy browsing. As an additional bonus, App Engine is free if your traffic is small enough.&lt;/p&gt;
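
	&lt;p&gt;For illustration, here is a minimal Go sketch of such a server using only the standard net/http package rather than App Engine. It keeps reports in memory instead of a database and has no browsing UI; the /crashreport path and the report form field are hypothetical and must match what the client sends.&lt;/p&gt;

&lt;pre class="prettyprint"&gt;
package main

import (
	"fmt"
	"log"
	"net/http"
	"sync"
)

// CrashStore keeps submitted reports in memory. A real server would put
// them in a database so they survive restarts.
type CrashStore struct {
	mu      sync.Mutex
	reports []string
}

// Add records one crash report; safe for concurrent use.
func (s *CrashStore) Add(report string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.reports = append(s.reports, report)
}

// Count returns the number of stored reports.
func (s *CrashStore) Count() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return len(s.reports)
}

// ServeHTTP accepts a multipart/form-data POST with a "report" field.
func (s *CrashStore) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodPost {
		http.Error(w, "POST only", http.StatusMethodNotAllowed)
		return
	}
	// Keep up to 10 MB of uploaded file parts in memory; the rest spills
	// to temporary files.
	if err := r.ParseMultipartForm(10 * 1024 * 1024); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	report := r.FormValue("report")
	if report == "" {
		http.Error(w, "missing report field", http.StatusBadRequest)
		return
	}
	s.Add(report)
	fmt.Fprintln(w, "ok")
}

func main() {
	http.Handle("/crashreport", new(CrashStore))
	// Port 80, as in the article, for compatibility with client firewalls.
	log.Fatal(http.ListenAndServe(":80", nil))
}
&lt;/pre&gt;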

	&lt;p&gt;If you write in PHP and run on a small VPS, it&amp;#8217;ll work just as well.&lt;/p&gt;

	&lt;h3&gt;Client side on Windows in C# apps&lt;/h3&gt;

	&lt;p&gt;In C# I set up a global exception handler (in WPF that means setting up handlers for App.DispatcherUnhandledException and AppDomain.CurrentDomain.UnhandledException).&lt;/p&gt;

	&lt;p&gt;I then use the HttpWebRequest class to send the exception message and callstack as an HTTP POST (using multipart/form-data).&lt;/p&gt;
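
	&lt;p&gt;The request itself is an ordinary form upload. The article&amp;#8217;s client is C# with HttpWebRequest; a minimal sketch of the same kind of request in Go, using the mime/multipart package, could look like this. The report field name and the server URL are hypothetical, and the 10-second timeout is an assumption added so that a crashing app does not hang on an unreachable server.&lt;/p&gt;

&lt;pre class="prettyprint"&gt;
package main

import (
	"bytes"
	"fmt"
	"io"
	"mime/multipart"
	"net/http"
	"time"
)

// buildReportBody encodes the crash report as multipart/form-data. It
// returns the body and the matching Content-Type header value, which
// carries the randomly generated part boundary.
func buildReportBody(report string) (io.Reader, string, error) {
	body := new(bytes.Buffer)
	mw := multipart.NewWriter(body)
	// "report" is a hypothetical field name; it must match the server.
	if err := mw.WriteField("report", report); err != nil {
		return nil, "", err
	}
	// Close writes the terminating boundary; without it the form is incomplete.
	if err := mw.Close(); err != nil {
		return nil, "", err
	}
	return body, mw.FormDataContentType(), nil
}

// submitReport POSTs the report to the crash server.
func submitReport(url, report string) error {
	body, contentType, err := buildReportBody(report)
	if err != nil {
		return err
	}
	client := http.Client{Timeout: 10 * time.Second}
	resp, err := client.Post(url, contentType, body)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("crash server returned %s", resp.Status)
	}
	return nil
}

func main() {
	// Hypothetical URL; point it at your own crash report server.
	err := submitReport("http://example.com/crashreport", "exception message and callstack")
	fmt.Println("submit error:", err)
}
&lt;/pre&gt;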

	&lt;p&gt;The code is less than a hundred lines and took mere hours to write.&lt;/p&gt;

	&lt;h3&gt;Client side on Windows in C++ apps&lt;/h3&gt;

	&lt;p&gt;In C++ everything is substantially harder. The big picture is similar: 	&lt;ul&gt;
		&lt;li&gt;install a global handler for unhandled exceptions (with SetUnhandledExceptionFilter())&lt;/li&gt;
		&lt;li&gt;in that handler, generate the crash report and submit it to the server via HTTP&lt;/li&gt;
	&lt;/ul&gt;&lt;/p&gt;
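
	&lt;p&gt;Go has no direct equivalent of SetUnhandledExceptionFilter(), but the same two steps can be sketched for panics with a deferred recover(). This is an analogy, not a port of the C++ code: it only catches panics in the goroutine that runs the deferred function (not panics in other goroutines, and not fatal runtime errors), and runWithCrashHandler is a hypothetical helper. runtime/debug.Stack() supplies the readable callstack that native C++ code has to reconstruct from symbols.&lt;/p&gt;

&lt;pre class="prettyprint"&gt;
package main

import (
	"fmt"
	"os"
	"runtime/debug"
)

// runWithCrashHandler runs f and, if it panics, builds a crash report with
// the panic value and the callstack of the panicking goroutine, then hands
// it to submit (for example, a function that POSTs it to the crash server).
// It returns true if a crash was reported.
func runWithCrashHandler(f func(), submit func(report string)) (crashed bool) {
	defer func() {
		if r := recover(); r != nil {
			// debug.Stack runs while the panic is still being handled, so the
			// trace includes the frame that panicked.
			report := fmt.Sprintf("panic: %v\n\n%s", r, debug.Stack())
			submit(report)
			crashed = true
		}
	}()
	f()
	return false
}

func main() {
	crashed := runWithCrashHandler(func() {
		var m map[string]int
		m["x"] = 1 // assignment to a nil map panics
	}, func(report string) {
		fmt.Fprint(os.Stderr, report)
	})
	if crashed {
		os.Exit(2)
	}
}
&lt;/pre&gt;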

	&lt;p&gt;The details are substantially more complicated. The code is more than a thousand lines and took me days to perfect.&lt;/p&gt;

	&lt;p&gt;The biggest issue was that, unlike in C#, you don&amp;#8217;t get a readable callstack in native code. You need symbols (.pdb files) for that. My solution is convoluted, but works:	&lt;ul&gt;
		&lt;li&gt;during the build process, I archive .pdb files on the server (I use S3, but any web server will do)&lt;/li&gt;
		&lt;li&gt;in the crash handler, I download the symbols locally, on demand&lt;/li&gt;
		&lt;li&gt;I create a crash report containing human-readable callstacks of all threads and some other useful information&lt;/li&gt;
		&lt;li&gt;I send it to the server with an HTTP POST&lt;/li&gt;
	&lt;/ul&gt;&lt;/p&gt;

	&lt;p&gt;You can re-use my work. The code is part of my &lt;a href="http://blog.kowalczyk.info/software/sumatrapdf/"&gt;SumatraPDF&lt;/a&gt; open-source project. Most of it is in &lt;a href="http://code.google.com/p/sumatrapdf/source/browse/trunk/src/CrashHandler.cpp"&gt;CrashHandler.cpp&lt;/a&gt; and, unlike most of Sumatra, is under liberal BSD license.&lt;/p&gt;

	&lt;h3&gt;Client side on Mac OS X&lt;/h3&gt;

	&lt;p&gt;The good thing about Mac is that it already creates human-readable crash reports for you. They are stored in ~/Library/Logs/CrashReporter/ directory.&lt;/p&gt;

	&lt;p&gt;There&amp;#8217;s no need to handle crashes yourself. At startup I just check whether there&amp;#8217;s a new crash report for my app from a previous run (the files are named after your application). If there is, I submit it to the server and delete it so that it isn&amp;#8217;t sent more than once.&lt;/p&gt;
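
	&lt;p&gt;A minimal Go sketch of that startup check. It assumes, beyond what is stated above, that report file names start with the application name followed by an underscore and end in .crash; the app name MyApp and the helper names are hypothetical. Because reports are deleted after submission, every matching file found at startup is a new one.&lt;/p&gt;

&lt;pre class="prettyprint"&gt;
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// matchCrashReports picks the crash report files that belong to app from a
// list of file names. The underscore keeps "MyApp" from matching
// "MyAppHelper" (assumed naming scheme).
func matchCrashReports(app string, names []string) []string {
	var matches []string
	for _, name := range names {
		if !strings.HasPrefix(name, app+"_") {
			continue
		}
		if strings.HasSuffix(name, ".crash") {
			matches = append(matches, name)
		}
	}
	return matches
}

// findCrashReports returns full paths of app's reports in the user's
// ~/Library/Logs/CrashReporter directory.
func findCrashReports(app string) ([]string, error) {
	home, err := os.UserHomeDir()
	if err != nil {
		return nil, err
	}
	dir := filepath.Join(home, "Library", "Logs", "CrashReporter")
	entries, err := os.ReadDir(dir)
	if os.IsNotExist(err) {
		return nil, nil // no crashes recorded yet
	}
	if err != nil {
		return nil, err
	}
	var names []string
	for _, e := range entries {
		names = append(names, e.Name())
	}
	var paths []string
	for _, name := range matchCrashReports(app, names) {
		paths = append(paths, filepath.Join(dir, name))
	}
	return paths, nil
}

func main() {
	paths, err := findCrashReports("MyApp")
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, p := range paths {
		// Submit the report here, then delete it so it is sent only once.
		fmt.Println("would submit and delete", p)
	}
}
&lt;/pre&gt;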

	&lt;h2&gt;The alternatives&lt;/h2&gt;

	&lt;p&gt;While the general idea is always the same, there are different ways of implementing it.&lt;/p&gt;

	&lt;p&gt;On Windows, a simpler solution is to capture so-called minidumps (using the MiniDumpWriteDump() Windows API) instead of going to the trouble of generating human-readable crash reports on the client side.&lt;/p&gt;

	&lt;p&gt;I did that too. The problem with that approach is that you have to inspect each crash dump manually in a debugger (e.g. WinDbg). I wrote a Python script that automated the process (you can script it by launching the cdb debugger with the right parameters and making it run !analyze -v).&lt;/p&gt;

	&lt;p&gt;Unfortunately, cdb is buggy and was hanging on some dump files. It&amp;#8217;s probably possible to work around that with a timeout in the Python script, but at that point I stopped caring.&lt;/p&gt;
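
	&lt;p&gt;Such a timeout is straightforward in Go, where exec.CommandContext kills the child process when its context expires. A minimal sketch, assuming cdb is on the PATH; -z opens a dump file and -c runs the given commands at startup (here !analyze -v followed by q to quit). The helper names and the two-minute limit are hypothetical.&lt;/p&gt;

&lt;pre class="prettyprint"&gt;
package main

import (
	"context"
	"errors"
	"fmt"
	"os"
	"os/exec"
	"time"
)

// cdbArgs builds the command line that opens a minidump, runs the
// automatic analysis and quits.
func cdbArgs(dumpPath string) []string {
	return []string{"-z", dumpPath, "-c", "!analyze -v; q"}
}

// analyzeDump runs cdb on one dump and returns its output. If cdb hangs,
// the context deadline kills it and an error is returned instead.
func analyzeDump(dumpPath string, limit time.Duration) (string, error) {
	ctx, cancel := context.WithTimeout(context.Background(), limit)
	defer cancel()
	out, err := exec.CommandContext(ctx, "cdb", cdbArgs(dumpPath)...).CombinedOutput()
	if errors.Is(ctx.Err(), context.DeadlineExceeded) {
		return string(out), fmt.Errorf("cdb timed out on %s", dumpPath)
	}
	return string(out), err
}

func main() {
	// Dump files to analyze are passed on the command line.
	for _, dump := range os.Args[1:] {
		out, err := analyzeDump(dump, 2*time.Minute)
		if err != nil {
			fmt.Println("skipping:", err)
			continue
		}
		fmt.Println(out)
	}
}
&lt;/pre&gt;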

	&lt;p&gt;Windows provides native support for minidumps. Google took minidump design and provided cross-platform implementation for Windows, Mac and Linux, as part of &lt;a href="http://code.google.com/p/google-breakpad/"&gt;breakpad&lt;/a&gt; project.&lt;/p&gt;

	&lt;p&gt;Breakpad is the crash reporting system used by Google for Chrome and Mozilla for Firefox. It contains both client and server parts for native (C/C++ or Objective C) code.&lt;/p&gt;

	&lt;p&gt;I used it once for a Mac app. For Objective C I prefer the approach described above as it&amp;#8217;s simpler to implement, but I&amp;#8217;m sure breakpad is a solid and well-tested solution.&lt;/p&gt;

	&lt;p&gt;On Windows, crash reports from your app are already sent to Microsoft as part of &lt;a href="http://en.wikipedia.org/wiki/Windows_Error_Reporting"&gt;Windows Error Reporting&lt;/a&gt;. Apparently, it&amp;#8217;s possible for third-party developers to get access to those reports, but I never did that, so I don&amp;#8217;t know what&amp;#8217;s involved in the process.&lt;/p&gt;

	&lt;h2&gt;The SumatraPDF experience&lt;/h2&gt;

	&lt;p&gt;So how well does it work in practice?&lt;/p&gt;

	&lt;p&gt;We&amp;#8217;ve implemented the system described here in Sumatra 1.5. Sumatra is a rather complicated piece of C++ code and quite popular (several thousand downloads per day).&lt;/p&gt;

	&lt;p&gt;Before 1.5 we had a system where we would save the minidump to disk and, after a crash, ask the user to report it in our bug tracker and attach the minidump to the bug report.&lt;/p&gt;

	&lt;p&gt;It became obvious that almost no one did that: we had only gotten a few crash reports from users in a few months. Our automated system was sending us tens of crash reports per day.&lt;/p&gt;

	&lt;p&gt;Once we knew about the problems, we could attempt to fix them. Some problems we could fix just by looking at the crash report. Some required writing stress tests to make them easier to reproduce locally. Some we can&amp;#8217;t fix (e.g. because they are caused by buggy printer drivers or other software that injects buggy DLLs into our process).&lt;/p&gt;

	&lt;p&gt;We do know that we fixed some of the bugs. We can see that a new release generates fewer crashes, and by looking at crash reports we can tell that some crashes that happened frequently in previous releases don&amp;#8217;t happen anymore.&lt;/p&gt;

	&lt;p&gt;Building an automated crash reporting system was the best investment we could have made for improving the reliability of SumatraPDF.&lt;/p&gt;</content>
  </entry>
  <entry>
   <title>Experience porting 4k lines of C code to go</title>
   <link href="http://blog.kowalczyk.info/article/af1h/Experience-porting-4k-lines-of-C-code-to-go.html" rel="alternate" />
   <updated>2011-06-01T23:40:38-07:00</updated>
   <id>tag:blog.kowalczyk.info,1999:486053</id>
   <content type="html">	&lt;p&gt;I&amp;#8217;ve been recently interested in &lt;a href="http://golang.org"&gt;Go&lt;/a&gt;, a new programming language.&lt;/p&gt;

	&lt;p&gt;The best way to learn a language is to use it in a small but real project.&lt;/p&gt;

	&lt;p&gt;I needed a program that generates HTML from Markdown. The path of least resistance would be to implement it in Python, as I&amp;#8217;ve coded a similar thing in the past. Instead I decided to use this as an opportunity to get more familiar with Go.&lt;/p&gt;

	&lt;p&gt;The biggest part of the problem was the Markdown to HTML conversion. There is no existing Go code for that, so I decided to port the &lt;a href="https://github.com/tanoku/upskirt"&gt;upskirt&lt;/a&gt; C library, as it does the job efficiently (it has a hand-written, disciplined parser, as opposed to most other solutions that just throw cryptic regular expressions at the problem).&lt;/p&gt;

	&lt;p&gt;The bottom line is: porting C code, at least in this case, was a fast, boring, mechanical process (and that is a good thing).&lt;/p&gt;

	&lt;p&gt;Go&amp;#8217;s syntax is heavily inspired by C. The differences that I&amp;#8217;ve encountered most frequently:&lt;/p&gt;

	&lt;ul&gt;
		&lt;li&gt;syntax for declaring variables is different (and better)&lt;/li&gt;
		&lt;li&gt;the while keyword is missing, replaced by a more versatile for loop&lt;/li&gt;
		&lt;li&gt;function declaration syntax is different&lt;/li&gt;
	&lt;/ul&gt;
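	&lt;p&gt;A minimal sketch of those three differences, using a hypothetical &lt;code&gt;countSpaces&lt;/code&gt; function (not code from upskirt), with a plausible C signature shown in a comment for comparison:&lt;/p&gt;

	&lt;pre&gt;&lt;code&gt;
```go
package main

import "fmt"

// countSpaces is a hypothetical example, not code from upskirt.
// In C it might be declared as:
//     int count_spaces(const char *s, size_t n)
// In Go the parameter type follows the parameter name and the
// result type follows the parameter list.
func countSpaces(s []byte) int {
	var count int // long declaration: name first, then type; starts at 0
	i := len(s)   // short declaration: the type (int) is inferred

	// Go has no while keyword; a for loop with only a condition replaces it.
	for i > 0 {
		i--
		if s[i] == ' ' {
			count++
		}
	}
	return count
}

func main() {
	fmt.Println(countSpaces([]byte("a b c"))) // prints 2
}
```
	&lt;/code&gt;&lt;/pre&gt;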

	&lt;p&gt;Fortunately, the transformations were simple and mostly mechanical.&lt;/p&gt;

	&lt;p&gt;It took me just a few days to manually translate around 4000 lines of C code into ~3200 lines of Go code.&lt;/p&gt;

	&lt;p&gt;Saving ~800 lines of code (20%) is good, but the interesting part is: where do the savings come from?&lt;/p&gt;

	&lt;p&gt;The core parsing/HTML generating logic didn&amp;#8217;t shrink much. The big savings came from the fact that Go has a built-in growable array type, while the upskirt C code had to spend 924 lines re-implementing one.&lt;/p&gt;
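	&lt;p&gt;To illustrate where those savings come from, here is a sketch of a hypothetical &lt;code&gt;renderList&lt;/code&gt; helper that writes into a growable output buffer. In C such a buffer needs its own struct and growth functions; in Go the built-in &lt;code&gt;append&lt;/code&gt; grows a &lt;code&gt;[]byte&lt;/code&gt; slice whenever its capacity runs out:&lt;/p&gt;

	&lt;pre&gt;&lt;code&gt;
```go
package main

import "fmt"

// renderList is a hypothetical helper, not code from upskirt or blackfriday.
// It writes one "* item" line per item into a growable byte buffer.
func renderList(items []string) []byte {
	var out []byte // nil slice: length 0, ready for append
	for _, item := range items {
		// append accepts a string after a []byte when followed by ...
		out = append(out, "* "...)
		out = append(out, item...)
		out = append(out, '\n')
	}
	return out
}

func main() {
	out := renderList([]string{"first", "second"})
	fmt.Print(string(out))
	// len is exact; how much spare capacity append reserves is up to the runtime
	fmt.Println(len(out), cap(out))
}
```
	&lt;/code&gt;&lt;/pre&gt;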

	&lt;p&gt;An unexpected advantage of Go was its safety. The C code implements parsing by partying on char * pointers. Such code is notorious for causing lots of subtle, hard-to-test bugs. Go doesn&amp;#8217;t allow this kind of pointer arithmetic and instead provides slices, which are a view into an underlying array.&lt;/p&gt;

	&lt;p&gt;Slices provide out-of-bounds checks. Just by recoding in Go I found an out-of-bounds access bug in the &lt;a href="https://github.com/tanoku/upskirt/issues/24"&gt;original C code&lt;/a&gt;.&lt;/p&gt;
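	&lt;p&gt;A small sketch of both points (the &lt;code&gt;at&lt;/code&gt; helper and the sample text are made up for illustration): a sub-slice is a view that shares the original array, and indexing past its length panics at run time, where C pointer arithmetic might silently read whatever memory follows:&lt;/p&gt;

	&lt;pre&gt;&lt;code&gt;
```go
package main

import "fmt"

// at returns data[i]. A deferred recover converts the runtime panic
// caused by an out-of-bounds index into an ordinary error.
func at(data []byte, i int) (b byte, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("caught: %v", r)
		}
	}()
	return data[i], nil
}

func main() {
	text := []byte("*emphasis*")
	inner := text[1:9] // "emphasis": a view into text, not a copy
	inner[0] = 'E'     // writes through to the shared array
	fmt.Println(string(text)) // *Emphasis*

	// len(inner) is 8, so index 8 is out of bounds even though the
	// underlying array still holds the closing '*' at that position.
	if _, err := at(inner, 8); err != nil {
		fmt.Println(err)
	}
}
```
	&lt;/code&gt;&lt;/pre&gt;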

	&lt;p&gt;Thanks to the similarity of Go and C syntax, porting algorithmic code from C is simple.&lt;/p&gt;

	&lt;p&gt;All things considered, Go is quickly becoming my new preferred language (taking the crown away from Python). It combines the good attributes of Python (lightweight syntax, garbage collection) with good attributes of C (fast execution thanks to compilation to native code and programmer&amp;#8217;s control over memory layout) and adds some unique capabilities of its own (concurrency via goroutines and channels).&lt;/p&gt;

	&lt;p&gt;BTW: if you want a Markdown implementation for Go, use &lt;a href="https://github.com/russross/blackfriday"&gt;blackfriday&lt;/a&gt;. It&amp;#8217;s also a direct port of upskirt, and I abandoned my port in favor of contributing to blackfriday, since it was slightly ahead and there&amp;#8217;s no need for two nearly identical projects.&lt;/p&gt;</content>
  </entry>
  <entry>
   <title>SumatraPDF 1.6 released</title>
   <link href="http://blog.kowalczyk.info/article/ao9k/SumatraPDF-16-released.html" rel="alternate" />
   <updated>2011-05-30T21:42:25-07:00</updated>
   <id>tag:blog.kowalczyk.info,1999:498008</id>
   <content type="html">	&lt;p&gt;&lt;a href="http://www.ohloh.net/p/4623/contributors"&gt;SumatraPDF developers&lt;/a&gt; are quite pleased to announce 1.6 release of &lt;a href="http://blog.kowalczyk.info/software/sumatrapdf"&gt;SumatraPDF&lt;/a&gt;, a PDF, XPS, DjVu, CBZ and CBR reader for Windows.&lt;/p&gt;

	&lt;p&gt;In this release we&amp;#8217;ve added support for &lt;a href="http://djvu.org/"&gt;DjVu&lt;/a&gt; file format.&lt;/p&gt;

	&lt;p&gt;When no document is open, we display a list of frequently read documents as thumbnails. This functionality is inspired by the new tab page in Chrome.&lt;/p&gt;

	&lt;p&gt;We&amp;#8217;ve added support for displaying PostScript documents. This requires a recent version of Ghostscript to be already installed; we don&amp;#8217;t bundle it ourselves.&lt;/p&gt;

	&lt;p&gt;We&amp;#8217;ve added support for displaying a folder containing images: drag the folder onto the SumatraPDF window.&lt;/p&gt;

	&lt;p&gt;We now support clickable links and a table of contents for XPS documents.&lt;/p&gt;

	&lt;p&gt;We now show printing progress and allow canceling a print job.&lt;/p&gt;

	&lt;p&gt;We&amp;#8217;ve added a Print toolbar button.&lt;/p&gt;

	&lt;p&gt;Experimental: we&amp;#8217;ve added previewing of PDF documents in Windows Vista and 7. This creates thumbnails and displays documents in Explorer&amp;#8217;s Preview pane. It needs to be explicitly selected during installation. We&amp;#8217;ve had reports that it doesn&amp;#8217;t work on 64-bit Windows, which is why we call it experimental.&lt;/p&gt;

	&lt;p&gt;This is how "frequently read" list looks like:&lt;/p&gt;

	&lt;p&gt;&lt;img src="http://kjkpub.s3.amazonaws.com/blog/sumatra/sum-shot-03-small.png" style="display: block; margin: 0 auto;" alt=""&gt;&lt;/p&gt;</content>
  </entry>
  <entry>
   <title>Easy vs. probable or how to make money with software</title>
   <link href="http://blog.kowalczyk.info/article/ahcj/Easy-vs-probable-or-how-to-make-money-with-softw.html" rel="alternate" />
   <updated>2011-05-14T17:15:17-07:00</updated>
   <id>tag:blog.kowalczyk.info,1999:489043</id>
   <content type="html">	&lt;h2&gt;It it hard to make $1000 a month?&lt;/h2&gt;

	&lt;p&gt;There&amp;#8217;s a question on Quora: &lt;a href="http://www.quora.com/Is-it-hard-to-build-market-and-maintain-a-web-app-that-makes-at-least-1000-a-month"&gt;Is it hard to build, market and maintain a web app that makes at least $1000 a month?&lt;/a&gt;&lt;/p&gt;

	&lt;p&gt;Most of the answers are happily saying "no, it&amp;#8217;s not hard at all", usually justifying it with wishful thinking ("it&amp;#8217;s not hard to get 50 people to pay you $20 a month").&lt;/p&gt;

	&lt;p&gt;Those answers are misleading.&lt;/p&gt;

	&lt;p&gt;I don&amp;#8217;t blame the respondents. They answered the question the way it was asked. &lt;/p&gt;

	&lt;h2&gt;What you ask vs. what you want to know&lt;/h2&gt;

	&lt;p&gt;The problem is that the person who asked the question really wanted to know how probable it is, not how hard. There&amp;#8217;s a big difference.&lt;/p&gt;

	&lt;h2&gt;A personal anecdote&lt;/h2&gt;

	&lt;p&gt;Many years ago, in the early days of Palm PDAs, I wrote an English dictionary for Palm OS. For more than 2 years I made several thousand dollars per month from it.&lt;/p&gt;

	&lt;p&gt;Was it hard? Not at all. It took me about 2 weeks to write the first version and there was nothing especially difficult about it - there are thousands of programmers that could have done it.&lt;/p&gt;

	&lt;p&gt;My website was simplistic and my marketing was limited to uploading my app to an app store-like website (PalmGear), which handled selling the app.&lt;/p&gt;

	&lt;p&gt;Nothing I did was difficult. However, my success wasn&amp;#8217;t very probable.&lt;/p&gt;

	&lt;p&gt;I wrote one of the first (if not the first) dictionaries for Palm OS. Things were different in 2001. Today everyone and their mother jumps on the iPhone or Android app bandwagon, but back then I had no competition from known dictionary publishers (publishers are not risk-taking, future-embracing organizations).&lt;/p&gt;

	&lt;p&gt;For a while I had the market to myself and that helped establish my program as one of the most popular programs for Palm OS. Eventually competition arrived and it did have a negative effect on my sales, but competitors had a really hard time catching up with me.&lt;/p&gt;

	&lt;p&gt;Lesson number 1: timing matters.&lt;/p&gt;

	&lt;p&gt;Getting the timing right is neither hard nor easy; it&amp;#8217;s simply unlikely. If the right timing is obvious, it&amp;#8217;ll be obvious to many people, which, paradoxically, means it&amp;#8217;s too late. If it&amp;#8217;s not obvious, there&amp;#8217;s a high probability that whatever you think the right timing is, you&amp;#8217;re wrong.&lt;/p&gt;

	&lt;p&gt;The right timing is only obvious in retrospect. &lt;/p&gt;

	&lt;p&gt;In my case, I didn&amp;#8217;t write my dictionary because I thought it was the right time to do it if I wanted to make money. I was just excited about new technology (Palm Pilots), wanted to write some program for it, and a dictionary seemed like a good choice.&lt;/p&gt;

	&lt;h2&gt;Do it once vs. do it many times&lt;/h2&gt;

	&lt;p&gt;Another way to look at it is repeatability.&lt;/p&gt;

	&lt;p&gt;The easy success of my app wasn&amp;#8217;t lost on me. If I could replicate that success several times, I would get quite rich with little effort. &lt;/p&gt;

	&lt;p&gt;So I made more apps for Palm OS. They were more complex than my dictionary. They took longer to write, but still weren&amp;#8217;t hard to write.&lt;/p&gt;

	&lt;p&gt;I assume that my marketing, sales and other business skills were at least as good as before. And yet, the other apps were all failures. &lt;/p&gt;

	&lt;p&gt;My first success was easy but it wasn&amp;#8217;t easily repeatable.&lt;/p&gt;

	&lt;p&gt;Many people who answered the Quora question did build successful websites making $1000 or more a month. Does that mean that their experience is good enough to declare that achieving what they did is easy?&lt;/p&gt;

	&lt;p&gt;No. Just like me, they all have only one or two successful products. If it&amp;#8217;s easy, then why don&amp;#8217;t they re-invest their capital in making more products? After all, if you have 100 products each making $1000 per month, you&amp;#8217;ll make more than a million a year.&lt;/p&gt;

	&lt;p&gt;The answers are a good example of selection bias. Even if something has a 1% probability of happening, if 100 people try it, on average 1 of them will succeed. Given the popularity of Quora, it&amp;#8217;s not hard to find lots of people who tried something with a low probability of success and succeeded. When they see someone asking if doing it was hard, they jump in saying that it wasn&amp;#8217;t.&lt;/p&gt;

	&lt;p&gt;They are truthful and trying to be helpful and encouraging, but we don&amp;#8217;t get the complete picture because we don&amp;#8217;t hear from the much bigger number of people who tried and failed.&lt;/p&gt;

	&lt;p&gt;The successes might have been easy, but they were not likely.&lt;/p&gt;

	&lt;h2&gt;Just ship it&lt;/h2&gt;

	&lt;p&gt;Making software (be it a website, a mobile app or a desktop app) that makes money is something with a low probability of success. It&amp;#8217;s not as bad as you think: making money in any self-directed business (as opposed to being employed) has a low probability of success.&lt;/p&gt;

	&lt;p&gt;More importantly: you have the biggest influence on that probability.&lt;/p&gt;

	&lt;p&gt;Will the person who asked this question be successful? The answer is unknowable, and knowing it would not make a difference.&lt;/p&gt;

	&lt;p&gt;You&amp;#8217;ll succeed or you&amp;#8217;ll fail, but you won&amp;#8217;t know until you&amp;#8217;ve tried. Spending any time calculating your chances actually decreases the probability, because it takes time and energy away from the things that matter.&lt;/p&gt;

	&lt;p&gt;The single most important thing when building a money-making software product is this: ship it.&lt;/p&gt;

	&lt;p&gt;My success might have been accidental, but it wouldn&amp;#8217;t have happened if I hadn&amp;#8217;t written the code, the website and the documentation, and made my program available for people to buy.&lt;/p&gt;

	&lt;p&gt;If the product fails, either improve it or make another one. The more times you try, the more likely you are to succeed.&lt;/p&gt;

	&lt;p&gt;When you start getting sales, double down. The unassailable logic of software business is that it&amp;#8217;s easier to make a successful product even more successful than to create another one that is just as successful.&lt;/p&gt;

	&lt;p&gt;Look at any software business. Even the biggest successes, like Microsoft or Google, have only a few extremely profitable products and, despite having the brightest developers, mountains of money and cross-promotional opportunities, they struggle to create more money-making products.&lt;/p&gt;</content>
  </entry>
  <entry>
   <title>90% of success is showing up - a proof</title>
   <link href="http://blog.kowalczyk.info/article/afrv/90-of-success-is-showing-up-a-proof.html" rel="alternate" />
   <updated>2011-05-07T18:05:04-07:00</updated>
   <id>tag:blog.kowalczyk.info,1999:487003</id>
   <content type="html">	&lt;p&gt;Woody Allen said that 90% of success is showing up. &lt;a href="http://www.nytimes.com/2011/05/08/technology/08class.html"&gt;This article&lt;/a&gt; is a proof of that.&lt;/p&gt;

	&lt;p&gt;In 2007 a Stanford professor gave his students a task: build a Facebook application and get people to use it.&lt;/p&gt;

	&lt;p&gt;The result? Some applications became so popular that students made good money out of them. Some even turned into companies.&lt;/p&gt;

	&lt;p&gt;The important part is that none of those students particularly wanted to build that app. It was simply something they were told to do.&lt;/p&gt;

	&lt;p&gt;The challenge in life is that no one tells us to do things that we&amp;#8217;ll own. Sure, at work bosses tell us what to do but the company owns the result of the work and all potential windfalls.&lt;/p&gt;

	&lt;p&gt;What the article shows is that just by doing something and shipping it to the world, we might create an unexpected success.&lt;/p&gt;</content>
  </entry>
  <entry>
   <title>SumatraPDF 1.5 released</title>
   <link href="http://blog.kowalczyk.info/article/9ile/SumatraPDF-15-released.html" rel="alternate" />
   <updated>2011-04-23T19:37:04-07:00</updated>
   <id>tag:blog.kowalczyk.info,1999:444002</id>
   <content type="html">	&lt;p&gt;A new version of &lt;a href="http://blog.kowalczyk.info/software/sumatrapdf/free-pdf-reader.html"&gt;SumatraPDF&lt;/a&gt;, a free, small, fast, open-source PDF reader for Windows, is ready.&lt;/p&gt;

	&lt;p&gt;What&amp;#8217;s new in version 1.5?&lt;/p&gt;

	&lt;p&gt;We&amp;#8217;ve added support for XPS documents.&lt;/p&gt;

	&lt;p&gt;We&amp;#8217;ve also added support for CBZ and CBR files (popular formats for comic book files).&lt;/p&gt;

	&lt;p&gt;We&amp;#8217;ve added File/Save Shortcut menu item to create a shortcut to a specific place in a document.&lt;/p&gt;

	&lt;p&gt;We&amp;#8217;ve added a right-click context menu for copying text, link addresses and comments. In the browser plugin, the context menu also has items for saving and printing.&lt;/p&gt;

	&lt;p&gt;We&amp;#8217;ve added folder browsing. Ctrl+Shift+Right opens the next PDF in the current folder, Ctrl+Shift+Left opens the previous PDF in the current folder. The current folder is the one containing the currently opened PDF document.&lt;/p&gt;

	&lt;p&gt;We&amp;#8217;ve also fixed handling of large PDF files in the browser plugin in Firefox.&lt;/p&gt;

	&lt;p&gt;SumatraPDF is a creation of &lt;a href="http://www.ohloh.net/p/4623/contributors"&gt;SumatraPDF developers&lt;/a&gt;.&lt;/p&gt;

	&lt;p&gt;Let the downloads begin.&lt;/p&gt;</content>
  </entry>
  <entry>
   <title>SumatraPDF 1.4 released</title>
   <link href="http://blog.kowalczyk.info/article/95h6/SumatraPDF-14-released.html" rel="alternate" />
   <updated>2011-03-12T16:18:46-08:00</updated>
   <id>tag:blog.kowalczyk.info,1999:427002</id>
   <content type="html">	&lt;p&gt;Good news everyone: a new version of &lt;a href="http://blog.kowalczyk.info/software/sumatrapdf/free-pdf-reader.html"&gt;SumatraPDF&lt;/a&gt;, a free, small, fast, open-source PDF viewer for Windows, is ready.&lt;/p&gt;

	&lt;p&gt;What&amp;#8217;s new in this version?&lt;/p&gt;

	&lt;p&gt;We&amp;#8217;ve added a browser plugin for Firefox/Chrome/Opera (Internet Explorer is not supported).&lt;/p&gt;

	&lt;p&gt;We&amp;#8217;ve added an IFilter that enables full-text search of PDF files in Windows Desktop Search (i.e. search from Windows Vista/7&amp;#8217;s Start Menu).&lt;/p&gt;

	&lt;p&gt;The browser plugin and IFilter are not installed by default, so you need to check the appropriate checkboxes in the installer. You can uninstall them by re-running the installer and de-selecting the options.&lt;/p&gt;

	&lt;p&gt;In 1.3 we improved text selection to use the left mouse button. The downside was that the left mouse button could no longer be used for scrolling when the mouse cursor is over text. To compensate for that, in 1.4 you can scroll with the right mouse button.&lt;/p&gt;

	&lt;p&gt;In 1.3 we introduced a new installer. Some of you missed the ability to choose a custom installation directory, which the 1.3 installer didn&amp;#8217;t implement. We&amp;#8217;ve re-introduced support for a non-standard installation directory in 1.4.&lt;/p&gt;

	&lt;p&gt;SumatraPDF focuses on reading PDFs and doesn&amp;#8217;t implement more advanced functionality like filling out forms or making annotations. If you need this functionality, we&amp;#8217;ve made it easy to re-open the current document in Adobe Reader (if it&amp;#8217;s installed) via the File menu. In 1.4 we also support Foxit and PDF-XChange.&lt;/p&gt;

	&lt;p&gt;To make SumatraPDF files smaller, we used to compress the executable with mpress. Unfortunately that caused some anti-virus programs to falsely report Sumatra as a virus. We no longer compress the executables that ship with the installer version, so don&amp;#8217;t be surprised that the files are now bigger. The portable, .zip version still ships as a single, compressed executable (so don&amp;#8217;t be surprised if it&amp;#8217;s flagged by some anti-virus software).&lt;/p&gt;

	&lt;p&gt;We&amp;#8217;ve removed the -title command-line option.&lt;/p&gt;

	&lt;p&gt;We&amp;#8217;ve added support for AES-256 encrypted PDF files, fixed an integer overflow reported by Jeroen van der Gun and made other fixes and improvements to PDF handling.&lt;/p&gt;

	&lt;p&gt;If you wonder why we don&amp;#8217;t have a browser plugin for Internet Explorer, the explanation is simple: no one has written the necessary code.&lt;/p&gt;

	&lt;p&gt;SumatraPDF has been brought to you by &lt;a href="http://www.ohloh.net/p/4623/contributors"&gt;SumatraPDF developers&lt;/a&gt;.&lt;/p&gt;</content>
  </entry>
  <entry>
   <title>XML is really, really slow</title>
   <link href="http://blog.kowalczyk.info/article/935t/XML-is-really-really-slow.html" rel="alternate" />
   <updated>2011-03-10T18:25:56-08:00</updated>
   <id>tag:blog.kowalczyk.info,1999:424001</id>
   <content type="html">	&lt;p&gt;It&amp;#8217;s about 8 years late but W3C finally produced a standard for &lt;a href="http://www.readwriteweb.com/archives/new_xml_standard_for_super-fast_lightweight_applic.php"&gt;efficient XML representation&lt;/a&gt;. They call it somewhat opaquely &lt;a href="http://www.w3.org/TR/2011/REC-exi-20110310/"&gt;Efficient XML Interchange&lt;/a&gt; (EXI). I guess it sounds better than what we used to call such things back in the day: binary XML (which is a better description what it&amp;#8217;s about).&lt;/p&gt;

	&lt;p&gt;The money quote from the announcement:&lt;/p&gt;

	&lt;blockquote&gt;
		&lt;p&gt;They&amp;#8217;ve achieved over 100-fold performance improvements...&lt;/p&gt;
	&lt;/blockquote&gt;

	&lt;p&gt;One way to interpret this is: wow, those guys are really smart.&lt;/p&gt;

	&lt;p&gt;There is a much simpler explanation: XML is really, really slow.&lt;/p&gt;

	&lt;h2&gt;Speed comes from architecture&lt;/h2&gt;

	&lt;p&gt;I&amp;#8217;m somewhat performance-oriented in my programming work. One of the reasons I disliked the popularity of XML was that I saw how often it blinded people to engineering realities. Choosing XML was often the cause of dramatic performance problems that then had to be heroically fixed.&lt;/p&gt;

	&lt;p&gt;Those performance problems were, however, mostly self-inflicted. XML is only one of the possible ways to store or exchange data, but it certainly is one of the slowest.&lt;/p&gt;

	&lt;p&gt;You cannot get 100x speed up over technology that wasn&amp;#8217;t incredibly inefficient to begin with.&lt;/p&gt;

	&lt;p&gt;Sometimes you don&amp;#8217;t have the choice (when you have to inter-operate with systems that only offer XML) but I&amp;#8217;ve seen many cases where people did have the choice and made the wrong one.&lt;/p&gt;

	&lt;p&gt;XML isn&amp;#8217;t such a hot buzzword anymore so I don&amp;#8217;t see that problem as often but I still do see it.&lt;/p&gt;

	&lt;p&gt;For example, more than one text editor decided to store its syntax highlighter definitions as XML, even though they could trivially be stored in a simple text format that can be parsed much faster, using less memory and, as a bonus, can actually be edited by human beings.&lt;/p&gt;
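	&lt;p&gt;To show how little code such a format needs, here is a sketch of a parser for a hypothetical line-based syntax definition format. The format, its keys and the &lt;code&gt;parseDefs&lt;/code&gt; function are made up for illustration; &lt;code&gt;strings.Cut&lt;/code&gt; requires Go 1.18 or later:&lt;/p&gt;

	&lt;pre&gt;&lt;code&gt;
```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// sample is a hypothetical syntax definition: "key = values",
// with # starting a comment line.
const sample = `# C-like language
keywords = if else for while
types = int char void
line_comment = //
`

// parseDefs maps each key to its whitespace-separated values.
// Repeated keys accumulate values; lines without "=" are skipped.
// sc.Err() is not checked here for brevity.
func parseDefs(src string) map[string][]string {
	defs := make(map[string][]string)
	sc := bufio.NewScanner(strings.NewReader(src))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		key, val, ok := strings.Cut(line, "=") // split at the first "="
		if !ok {
			continue
		}
		key = strings.TrimSpace(key)
		defs[key] = append(defs[key], strings.Fields(val)...)
	}
	return defs
}

func main() {
	defs := parseDefs(sample)
	fmt.Println(defs["keywords"])     // [if else for while]
	fmt.Println(defs["line_comment"]) // [//]
}
```
	&lt;/code&gt;&lt;/pre&gt;

	&lt;p&gt;Each line is handled on its own, with no nesting, entity decoding or document tree to build, which is a large part of why such a format is cheap to parse and easy to edit by hand.&lt;/p&gt;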

	&lt;p&gt;A bigger point is: speed comes from architecture. It&amp;#8217;s not that choosing the right architecture will make your program dramatically faster, but choosing the wrong architecture will make it dramatically slower.&lt;/p&gt;

	&lt;p&gt;Choosing XML over more efficient ways of storing data is just one, but particularly frequent, example of that rule.&lt;/p&gt;

	&lt;h2&gt;Binary XML (EXI) is too little too late&lt;/h2&gt;

	&lt;p&gt;As to the standard itself: don&amp;#8217;t pay attention to it. It&amp;#8217;s too little, way too late.&lt;/p&gt;

	&lt;p&gt;About 8 years ago I did my small part in expanding XML capabilities of SQL Server. At the time XML was hot so Oracle, Microsoft, IBM raced to add native XML handling to their databases. It was going to be the next big thing. Possibly even bigger than big.&lt;/p&gt;

	&lt;p&gt;Trust me, the ridiculous inefficiency of XML wasn&amp;#8217;t lost on developers working on XML technologies 8 years ago. Coming up with a more efficient binary XML format is easy. Microsoft had its version (if not several of them) and other companies had theirs.&lt;/p&gt;
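	&lt;p&gt;As an illustration of how simple the basic idea is (this is not EXI, which uses a considerably more elaborate encoding), here is a sketch that replaces tag names with small integer ids and prefixes each value with its length. The &lt;code&gt;field&lt;/code&gt; type, the tag table and the sample data are hypothetical:&lt;/p&gt;

	&lt;pre&gt;&lt;code&gt;
```go
package main

import (
	"encoding/binary"
	"fmt"
)

// field is one leaf element: a tag name and its text content.
type field struct {
	name, value string
}

// encode writes each field as uvarint(tag id), uvarint(len(value)), value.
// Tag names are replaced by integers from a table both sides share.
func encode(fields []field, ids map[string]uint64) []byte {
	var out []byte
	tmp := make([]byte, binary.MaxVarintLen64)
	for _, f := range fields {
		n := binary.PutUvarint(tmp, ids[f.name])
		out = append(out, tmp[:n]...)
		n = binary.PutUvarint(tmp, uint64(len(f.value)))
		out = append(out, tmp[:n]...)
		out = append(out, f.value...)
	}
	return out
}

// decode reverses encode. Every length is known up front, so the reader
// never scans for delimiters or decodes entities. Assumes well-formed input.
func decode(data []byte, names []string) []field {
	var fields []field
	for len(data) > 0 {
		id, n := binary.Uvarint(data)
		data = data[n:]
		size, m := binary.Uvarint(data)
		data = data[m:]
		fields = append(fields, field{names[id], string(data[:size])})
		data = data[size:]
	}
	return fields
}

// textSize is the size of the same fields written as XML elements:
// open tag (name + 2 bytes), text, close tag (name + 3 bytes).
// It ignores escaping, which could only make the text larger.
func textSize(fields []field) int {
	size := 0
	for _, f := range fields {
		size += 2*len(f.name) + 5 + len(f.value)
	}
	return size
}

func main() {
	names := []string{"title", "updated", "id"}
	ids := map[string]uint64{"title": 0, "updated": 1, "id": 2}
	fields := []field{
		{"title", "XML is really, really slow"},
		{"updated", "2011-03-10"},
		{"id", "424001"},
	}
	bin := encode(fields, ids)
	fmt.Println("text bytes:", textSize(fields), "binary bytes:", len(bin))
	fmt.Println(decode(bin, names)[0].value)
}
```
	&lt;/code&gt;&lt;/pre&gt;

	&lt;p&gt;For these three fields the sketch produces 48 bytes versus 85 bytes for the same elements written as XML text. The bigger win is likely decoding speed: the reader knows every length up front, so it never has to scan for delimiters or decode entities.&lt;/p&gt;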

	&lt;p&gt;The real problem was that politics prevented people from agreeing on a standard, so no standard emerged.&lt;/p&gt;

	&lt;p&gt;There was an eruption of XML-based standards (SOAP, XML Schema, XQuery). If you don&amp;#8217;t know what those terms mean, it&amp;#8217;s because they all failed (despite the fact that everyone was convinced they were going to be the next big thing).&lt;/p&gt;

	&lt;p&gt;In hindsight it was a terrible mistake to work on those big standards of speculative value while not solving a real problem people already had: an efficient, binary, standard format for storing XML.&lt;/p&gt;

	&lt;p&gt;If W3C had come up with EXI 8 years ago, maybe people wouldn&amp;#8217;t have felt the need to invent Protocol Buffers or Thrift, and EXI would have won.&lt;/p&gt;

	&lt;p&gt;But solving this problem today is almost comically late. It has no chance of adoption. &lt;/p&gt;

	&lt;p&gt;If you&amp;#8217;re planning to use a custom, binary way of storing XML, using EXI is probably better, but there&amp;#8217;s no way EXI will become as universally supported as XML, and universal support (in software libraries, books etc.) is really the main thing XML has going for it (speed and human-readable syntax, on the other hand, are not XML&amp;#8217;s strengths).&lt;/p&gt;</content>
  </entry>
 </feed>
