<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/atom10full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><feed xmlns="http://www.w3.org/2005/Atom" xmlns:openSearch="http://a9.com/-/spec/opensearch/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:gd="http://schemas.google.com/g/2005" xmlns:thr="http://purl.org/syndication/thread/1.0" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" gd:etag="W/&quot;CUcCSH44fyp7ImA9WhRQGE0.&quot;"><id>tag:blogger.com,1999:blog-1392748891339897691</id><updated>2011-12-13T20:04:29.037+01:00</updated><category term="test" /><category term="scala" /><category term="quickcheck" /><category term="java" /><category term="shell" /><category term="vcs" /><category term="subversion" /><title>Thomas Jung's Blog</title><subtitle type="html" /><link rel="http://schemas.google.com/g/2005#feed" type="application/atom+xml" href="http://theyougen.blogspot.com/feeds/posts/default" /><link rel="alternate" type="text/html" href="http://theyougen.blogspot.com/" /><link rel="next" type="application/atom+xml" href="http://www.blogger.com/feeds/1392748891339897691/posts/default?start-index=26&amp;max-results=25&amp;redirect=false&amp;v=2" /><author><name>Thomas Jung</name><uri>http://www.blogger.com/profile/13153488692248832865</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><generator version="7.00" uri="http://www.blogger.com">Blogger</generator><openSearch:totalResults>29</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>25</openSearch:itemsPerPage><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/atom+xml" href="http://feeds.feedburner.com/TheYoungGeneration" /><feedburner:info uri="theyounggeneration" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><link rel="license" type="text/html" href="http://creativecommons.org/licenses/by/2.0/" /><entry gd:etag="W/&quot;CUcCSH4_fSp7ImA9WhRQGE0.&quot;"><id>tag:blogger.com,1999:blog-1392748891339897691.post-6500357206813272687</id><published>2011-12-13T20:04:00.000+01:00</published><updated>2011-12-13T20:04:29.045+01:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-12-13T20:04:29.045+01:00</app:edited><title>Why I Created Daily Feed Recycler</title><content type="html">The motivation behind &lt;a href="http://daily-feed-recycler.appspot.com/"&gt;Daily Feed Recycler&lt;/a&gt; is that there is too much content on the Internet: good and bad. Once I found a good source, I do not have enough time to actually read everything. I just can’t read 5k word articles in a row. Even if I had the time to do it, I cannot mentally. It’s not fun.&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/-NEQR2LPdx2o/Tud8WNRYAVI/AAAAAAAAAFU/t2HHynFe5m8/s1600/recycle_feed.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://4.bp.blogspot.com/-NEQR2LPdx2o/Tud8WNRYAVI/AAAAAAAAAFU/t2HHynFe5m8/s1600/recycle_feed.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;
Feeds are a wonderful tool for authors and readers. They allow to stay informed about the changes on a page. Feed aggregators are part of this reading experience. They let you manage feeds: read articles, mark as read, subscribe and aggregate feeds. Applications creating feeds get this for free and there are a multitude of feed aggregators available.&lt;br /&gt;
&lt;br /&gt;
Feeds help with time problem to some extent, but they - wonderful as they are - have a dark side. Instead of solving the problem of organizing content in a way that you can read really good stuff, they organize content in a way that you can read the latest stuff. Yes, the new content is cool, but the classics are here to stay. You will not get a tweet from Goethe. With Daily Feed Recycler the good stuff is on an equal footing with the latest stuff. The content for every day is presented as new stuff in your feed aggregator. &lt;br /&gt;
&lt;br /&gt;
Content has to be presented in a digestible manner. Deep reading, not just skimming to get an overview. Time and mental energy has to be organized in a way to allow reading. The Daily Feed Recycler is thought of as a way to break down content in smaller parts and add a reminder that there’s still  good content to read. Using your feed aggregator you can decide when you  read it, if you ignore it or if you read it at all. Feed aggregators  are quite good nowadays and their flexibility is useful here.&lt;br /&gt;
&lt;br /&gt;
You can now create your own channel of daily content: the full bash reference, the mayor Linux man pages or the list of all decision biases from Wikipedia. Everything you like. I’m often looking for daily feeds but there are not that many of them out there, because it’s a lot of work to do and depends on a certain level of expertise. Daily Feed Recycler is not a competitor for the existing curated daily feeds. A curated feed can have a much better quality through an expert selection and logical ordering of content. &lt;br /&gt;
&lt;br /&gt;
Daily Feed Recycler follows the “Release early, Release often” philosophy. There are bugs, missing features and rough edges. I hope you find it useful nonetheless.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1392748891339897691-6500357206813272687?l=theyougen.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;div class="feedflare"&gt;
&lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=wNBDt12SuuE:ZrFxbPulESE:yIl2AUoC8zA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?d=yIl2AUoC8zA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=wNBDt12SuuE:ZrFxbPulESE:-BTjWOF_DHI"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?i=wNBDt12SuuE:ZrFxbPulESE:-BTjWOF_DHI" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=wNBDt12SuuE:ZrFxbPulESE:4cEx4HpKnUU"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?i=wNBDt12SuuE:ZrFxbPulESE:4cEx4HpKnUU" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=wNBDt12SuuE:ZrFxbPulESE:YwkR-u9nhCs"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?d=YwkR-u9nhCs" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=wNBDt12SuuE:ZrFxbPulESE:F7zBnMyn0Lo"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?i=wNBDt12SuuE:ZrFxbPulESE:F7zBnMyn0Lo" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=wNBDt12SuuE:ZrFxbPulESE:7Q72WNTAKBA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?d=7Q72WNTAKBA" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/TheYoungGeneration/~4/wNBDt12SuuE" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://theyougen.blogspot.com/feeds/6500357206813272687/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://theyougen.blogspot.com/2011/12/why-i-created-daily-feed-recycler.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/1392748891339897691/posts/default/6500357206813272687?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/1392748891339897691/posts/default/6500357206813272687?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/TheYoungGeneration/~3/wNBDt12SuuE/why-i-created-daily-feed-recycler.html" title="Why I Created Daily Feed Recycler" /><author><name>Thomas Jung</name><uri>http://www.blogger.com/profile/13153488692248832865</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://4.bp.blogspot.com/-NEQR2LPdx2o/Tud8WNRYAVI/AAAAAAAAAFU/t2HHynFe5m8/s72-c/recycle_feed.png" height="72" width="72" /><thr:total>0</thr:total><feedburner:origLink>http://theyougen.blogspot.com/2011/12/why-i-created-daily-feed-recycler.html</feedburner:origLink></entry><entry gd:etag="W/&quot;DU8NRXo_fSp7ImA9WhdTF00.&quot;"><id>tag:blogger.com,1999:blog-1392748891339897691.post-61200605591296853</id><published>2011-07-15T07:10:00.001+02:00</published><updated>2011-07-15T07:31:34.445+02:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-07-15T07:31:34.445+02:00</app:edited><title>Find cruft with a new Mercurial extension</title><content type="html">After some fun with the &lt;a href="http://theyougen.blogspot.com/2011/05/find-cruft-in-your-source-code.html"&gt;quick and untested shell script&lt;/a&gt; that finds the oldest code in a Subversion repository, it is the next step to write a Mercurial extension. The simple &lt;a href="https://bitbucket.org/blob79/find-cruft-hg/src/c448bfefb6c6/hgext/cruft.py"&gt;Mercurial extension cruft&lt;/a&gt; does basically the same job as the shell scripts for Subversion. Being an extension it’s nicely integrated into Mercurial as the other extensions.&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://www.flickr.com/photos/publius_ovidius/2906772611/" imageanchor="1" style="margin-left:1em; margin-right:1em"&gt;&lt;img border="0" height="313" width="400" src="http://2.bp.blogspot.com/-RqWx7thgM-g/Th_QTrEiUNI/AAAAAAAAAFM/o7P0DPZ1Ofg/s400/attic.jpg" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;
Python and Mercurial are relatively easy to get into. Mercurial provides the &lt;a href="http://mercurial.selenic.com/wiki/DeveloperInfo"&gt;Developer Info&lt;/a&gt; page which is really good. Additionally, there’s a &lt;a href="http://mercurial.selenic.com/wiki/WritingExtensions"&gt;guide how to write a Mercurial extension&lt;/a&gt;. The guide is good start for the Mecurial development. The rest can be easily picked up by reading the &lt;a href="https://bitbucket.org/blob79/find-cruft-hg/src/tip/mercurial/commands.py"&gt;code of other commands&lt;/a&gt; and extensions.&lt;br /&gt;
The code is readable and there are no big hurdles.&lt;br /&gt;
&lt;br /&gt;
The only thing I missed while writing the extension is type information in method signatures. As much as I like Python it’s ridiculous to write the type information in the pydoc and let the developer figure out the types. This one of the trade-offs you have to live with.&lt;br /&gt;
&lt;br /&gt;
&lt;h3&gt;Testing Mercurial extensions&lt;/h3&gt;&lt;br /&gt;
It suffices to understand the integration tests tool Mercurial uses to test the extension itself. There’s some &lt;a href="http://mercurial.selenic.com/wiki/WritingTests"&gt;documentation&lt;/a&gt; for this as well. The basic idea behind &lt;a href="http://pypi.python.org/pypi/cram"&gt;Cram&lt;/a&gt;  is to start a process and check against the expected output.&lt;br /&gt;
&lt;br /&gt;
The integration test tool defines a small language. All lines that have no indentation are comments. Indented lines starting with $ are executed and all other lines are the expected output. For example a test looks like this:&lt;br /&gt;
&lt;br /&gt;
&lt;pre class="brush: plain"&gt;init

 $ hg init

 $ cat &amp;lt;&amp;lt;EOF &amp;gt;&amp;gt;a
 &amp;gt; c1
 &amp;gt; c2
 &amp;gt; EOF
 $ hg ci -A -m "commit 0"
 adding a

cruft

 $ hg cruft
 0 a c1
 0 a c2
&lt;/pre&gt;&lt;br /&gt;
First a repository is initialized: a file called a with the content (c1,c2) is committed and then the Mercurial is started with the cruft command. Without options the cruft extension prints all lines with newest lines first. The expected output is (0 a c1, 0 a c2) which is means the revision 0 file a and line c1; revision 0 file a and line c2.&lt;br /&gt;
&lt;br /&gt;
It’s fairly easy to get started with this tool. The only downside in &lt;a href="https://bitbucket.org/blob79/find-cruft-hg/src/c448bfefb6c6/tests/test-cruft.t"&gt;my tests&lt;/a&gt; is that the they reuse the same test fixture and do not reset the fixture for each test. They are not executed in isolation, which has a whole range of problems - redundancy and readability for example - but I didn’t feel that it was worth the effort to structure the tests otherwise.&lt;br /&gt;
&lt;br /&gt;
&lt;h3&gt;Installing the extension&lt;/h3&gt;&lt;br /&gt;
The easiest way to install the extension is to download the cruft.py to a local folder and add a link to the extension file in the .hgrc file.&lt;br /&gt;
&lt;br /&gt;
&lt;pre class="brush: plain"&gt;[extensions]
cruft=~/.hgext/cruft.py
&lt;/pre&gt;&lt;br /&gt;
&lt;h3&gt;Using the extension&lt;/h3&gt;&lt;br /&gt;
After the installation you can execute pretty much the same commands as with the shell script version.&lt;br /&gt;
&lt;br /&gt;
&lt;pre class="brush: plain"&gt;hg help cruft

hg cruft

(no help text available)

options:

-l --limit VALUE   oldest lines taken into account
-c --changes       biggest change sets
-f --files         biggest changes per file
-X --filter VALUE  filter lines that match the regular expression
    --mq            operate on patch repository

use "hg -v help cruft" to show global options
&lt;/pre&gt;&lt;br /&gt;
I use here the &lt;a href="https://bitbucket.org/blob79/quickcheck/"&gt;quickcheck source&lt;/a&gt; code to show some sample output.&lt;br /&gt;
&lt;br /&gt;
&lt;pre class="brush: plain"&gt;hg cruft -l 5 -X "^(\s*}\s*|\s*/.*|\s*[*].*|\s*|\s*@Override\s*|.*class.*|import.*|package.*)$" quickcheck-core/src/main
&lt;/pre&gt;This finds the oldest 5 lines using the Java source code specific exclusion pattern (parentheses, imports, class definitions etc.) for the quickcheck-core/src/main folder. The output contains the revision number, source file and source code line.&lt;br /&gt;
&lt;br /&gt;
&lt;pre class="brush: plain"&gt;5 quickcheck-core/src/main/java/net/java/quickcheck/generator/support/TupleGenerator.java public Object[] next() {
5 quickcheck-core/src/main/java/net/java/quickcheck/generator/support/TupleGenerator.java ArrayList&amp;lt;Object&amp;gt; next = new ArrayList&amp;lt;Object&amp;gt;(generators.length);
5 quickcheck-core/src/main/java/net/java/quickcheck/generator/support/TupleGenerator.java for (Generator&amp;lt;?&amp;gt; gen : generators) {
5 quickcheck-core/src/main/java/net/java/quickcheck/generator/support/TupleGenerator.java next.add(gen.next());
5 quickcheck-core/src/main/java/net/java/quickcheck/generator/support/TupleGenerator.java return next.toArray();
&lt;/pre&gt;You can also find the biggest change sets for the last 500 lines.&lt;br /&gt;
&lt;br /&gt;
&lt;pre class="brush: plain"&gt;hg cruft -X "^(\s*}\s*|\s*/.*|\s*[*].*|\s*|\s*@Override\s*|.*class.*|import.*
|package.*)$" -l 500 -c quickcheck-core/src/main
&lt;/pre&gt;This prints the revisions number, number of changed lines and commit comment of the change set.&lt;br /&gt;
&lt;br /&gt;
&lt;pre class="brush: plain"&gt;49 41 removed getClassification method from Property interface
moved Classification into quickcheck.property package
177 43 MutationGenerator, CloningMutationGenerator and CloningGenerator added
139 50 fixed generic var arg array problems
5 53 initial check in
&lt;/pre&gt;&lt;br /&gt;
Finally, you can find the files with the most lines changed by a single change set (again with the filter and for the 500 oldest lines).&lt;br /&gt;
&lt;br /&gt;
&lt;pre class="brush: plain"&gt;hg cruft -X "^(\s*}\s*|\s*/.*|\s*[*].*|\s*|\s*@Override\s*|.*class.*|import.*
|package.*)$" -l 500 -f quickcheck-core/src/main
&lt;/pre&gt;This prints the revision number, file name, number of changes and change set commit comment.&lt;br /&gt;
&lt;br /&gt;
&lt;pre class="brush: plain"&gt;176 quickcheck-core/src/main/java/net/java/quickcheck/generator/support/AbstractTreeGenerator.java 27 added tree generator
177 quickcheck-core/src/main/java/net/java/quickcheck/generator/support/CloningGenerator.java 28 MutationGenerator, CloningMutationGenerator and CloningGenerator added
139 quickcheck-core/src/main/java/net/java/quickcheck/generator/support/DefaultFrequencyGenerator.java 36 fixed generic var arg array problems
49 quickcheck-core/src/main/java/net/java/quickcheck/characteristic/Classification.java 41 removed getClassification method from Property interface
moved Classification into quickcheck.property package
&lt;/pre&gt;&lt;br /&gt;
&lt;h3&gt;Conclusion&lt;/h3&gt;&lt;br /&gt;
Developing a Mercurial extension is relatively easy given Python, the good Mercurial documentation, the good readability of the code and integration test tool. If you’re using Mercurial you should give Mercurial extension development a try. I’ve only recently read into Python again so this is the Python beginner’s version of a Mercurial extension. Help to improve the implementation is always appreciated.&lt;br /&gt;
&lt;br /&gt;
Learning Python and seeing how things are implemented there is fun. Looking at the PEPs and the associated process, they feel much more accessible and open than JSRs. The PEPs are also a track record of the advances the language makes and problems it tries to solve one after the other. There’s stuff in Python that you’ll probably never see in Java like the generator expressions. Everyone who had to replace an internal loop with an iterator will understand that this is not a toy. The language features seem to sum up quite nicely and result in a productive environment. As always, some things are unfamiliar or missing but there’s no perfect platform.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1392748891339897691-61200605591296853?l=theyougen.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;div class="feedflare"&gt;
&lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=Nm0FWp6E5qc:XhufELDyMkY:yIl2AUoC8zA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?d=yIl2AUoC8zA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=Nm0FWp6E5qc:XhufELDyMkY:-BTjWOF_DHI"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?i=Nm0FWp6E5qc:XhufELDyMkY:-BTjWOF_DHI" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=Nm0FWp6E5qc:XhufELDyMkY:4cEx4HpKnUU"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?i=Nm0FWp6E5qc:XhufELDyMkY:4cEx4HpKnUU" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=Nm0FWp6E5qc:XhufELDyMkY:YwkR-u9nhCs"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?d=YwkR-u9nhCs" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=Nm0FWp6E5qc:XhufELDyMkY:F7zBnMyn0Lo"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?i=Nm0FWp6E5qc:XhufELDyMkY:F7zBnMyn0Lo" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=Nm0FWp6E5qc:XhufELDyMkY:7Q72WNTAKBA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?d=7Q72WNTAKBA" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/TheYoungGeneration/~4/Nm0FWp6E5qc" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://theyougen.blogspot.com/feeds/61200605591296853/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://theyougen.blogspot.com/2011/07/find-cruft-with-new-mercurial-extension.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/1392748891339897691/posts/default/61200605591296853?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/1392748891339897691/posts/default/61200605591296853?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/TheYoungGeneration/~3/Nm0FWp6E5qc/find-cruft-with-new-mercurial-extension.html" title="Find cruft with a new Mercurial extension" /><author><name>Thomas Jung</name><uri>http://www.blogger.com/profile/13153488692248832865</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://2.bp.blogspot.com/-RqWx7thgM-g/Th_QTrEiUNI/AAAAAAAAAFM/o7P0DPZ1Ofg/s72-c/attic.jpg" height="72" width="72" /><thr:total>0</thr:total><feedburner:origLink>http://theyougen.blogspot.com/2011/07/find-cruft-with-new-mercurial-extension.html</feedburner:origLink></entry><entry gd:etag="W/&quot;C0cARn48fip7ImA9WhZVF0k.&quot;"><id>tag:blogger.com,1999:blog-1392748891339897691.post-4786223076798863201</id><published>2011-05-30T09:35:00.002+02:00</published><updated>2011-05-30T09:37:27.076+02:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-05-30T09:37:27.076+02:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="subversion" /><category scheme="http://www.blogger.com/atom/ns#" term="vcs" /><category scheme="http://www.blogger.com/atom/ns#" term="shell" /><title>Find cruft in your source code repository</title><content type="html">Micheal Feathers wrote in his blog post &lt;a href="http://michaelfeathers.typepad.com/michael_feathers_blog/2011/05/the-carrying-cost-of-code-taking-lean-seriously.html"&gt;“The Carrying-Cost of Code: Taking Lean Seriously”&lt;/a&gt;&lt;br /&gt;
that is necessary to remove old code from your product to be able to add new features. His argument is that you get a better understanding of your production code this way. Rewriting your code constantly leads to more readable and compact code.&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/-q0PAdKdJjhM/Td-gjIqC7zI/AAAAAAAAAFA/xxe_Jda62Fc/s1600/attic.jpg" imageanchor="1" style="margin-left:1em; margin-right:1em"&gt;&lt;img border="0" height="299" width="400" src="http://4.bp.blogspot.com/-q0PAdKdJjhM/Td-gjIqC7zI/AAAAAAAAAFA/xxe_Jda62Fc/s400/attic.jpg" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;
&lt;blockquote&gt;"There are many places in the industry where existing mountains of code are a drag on progress.&lt;br /&gt;
[..]&lt;br /&gt;
Younger organizations without as much software infrastructure often have a competitive advantage provided they can ramp up to a base feature set quickly and provide value that more encumbered software-based companies can't.&amp;nbsp; It's a scenario that plays out over and over again, but people don't really talk about it.&lt;br /&gt;
[..]&lt;br /&gt;
I'd like to have code base where every line of code written disappears exactly three months after it is written.&lt;br /&gt;
[..]&lt;br /&gt;
I have the suspicion that a company could actually do better over the long term doing that, and the reason is because the costs of carrying code are real, but no one accounts for them."&lt;/blockquote&gt;&lt;br /&gt;
His reasoning goes so far as to ask product owners to remove features that are not needed. Software size seems to increase strictly monotonic. This makes maintenance harder and more costly. I’m not sure if you have to follow the advice strictly too improve your situation. Before you start arguing with your boss about removing features, it is a good idea to look for low-hanging fruit first: the oldest lines.&lt;br /&gt;
&lt;br /&gt;
&lt;h3&gt;Metric&lt;/h3&gt;&lt;br /&gt;
The heuristic comes from the observation that a) software has bugs and that b) if the software is actually used bugs will be found and fixed. Fixing the bugs leads to new code as does changes in coding style, new APIs etc. Old unchanged code is either bug-free, feature-complete and state-off-the-art or something nobody cares about. I’d say the metric is not too bad to find some victims. (A metric like this one should be a tool to find problems not an absolute measurement. Metrics should not be taken too seriously and nobody should be tempted to cheat.)&lt;br /&gt;
&lt;br /&gt;
To put the idea into practice I’ve hacked some scripts to find suspects in a subversion repository. The scripts are:&lt;br /&gt;
&lt;ul&gt;&lt;li&gt;find the oldest lines in your repository&lt;/li&gt;
&lt;li&gt;find biggest change sets in your repository considering the oldest lines&lt;/li&gt;
&lt;li&gt;find files that are changed the most by a change set considering the n oldest lines&lt;/li&gt;
&lt;/ul&gt;&lt;br /&gt;
Too get some data I’ll use the legacy subversion repository of the Quickcheck project. Quickcheck moved to Mercurial some time ago. It’s a test to see if something significant can be found with this metric.&lt;br /&gt;
&lt;br /&gt;
The &lt;a href="https://bitbucket.org/blob79/find-cruft/src/tip/README.txt"&gt;readme&lt;/a&gt; contains instructions how you can run the scripts with your subversion repository. The script are based on a local repository mirror to speed up analysis. The analysis can be execute on any subtree of the repository.&lt;br /&gt;
&lt;br /&gt;
&lt;h3&gt;Oldest lines&lt;/h3&gt;&lt;br /&gt;
Finding the oldest lines is quite simple first get all file names with svn list and then use svn blame to get the date for every line. These output is sorted by the revision (descending) of each line.&lt;br /&gt;
&lt;br /&gt;
The output of the &lt;a href="https://bitbucket.org/blob79/find-cruft/src/tip/oldest_lines.sh"&gt;oldest_lines.sh&lt;/a&gt; is unfiltered. To extract useful information it has to be filtered. The &lt;a href="https://bitbucket.org/blob79/find-cruft/src/tip/filter.sh"&gt;filter.sh&lt;/a&gt; does this for Java source code: removing empty lines, single closing braces, package declaration, imports and comments.&lt;br /&gt;
&lt;br /&gt;
These are the last lines of the filtered output for Quickcheck:&lt;br /&gt;
&lt;pre class="brush: plain"&gt;$ ./filter.sh | tail -5

6&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; blob79&amp;nbsp;&amp;nbsp;&amp;nbsp; public int compare(Pair&amp;lt;Object, Double&amp;gt; o1, Pair&amp;lt;Object, Double&amp;gt; o2) { File: characteristic/Classification.java Line: 162
6&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; blob79&amp;nbsp;&amp;nbsp;&amp;nbsp; next.add(gen.next()); File: generator/support/TupleGenerator.java Line: 34
6&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; blob79&amp;nbsp;&amp;nbsp;&amp;nbsp; ArrayList&amp;lt;Pair&amp;lt;Object, Double&amp;gt;&amp;gt; toSort) { File: characteristic/Classification.java Line: 150
6&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; blob79&amp;nbsp;&amp;nbsp;&amp;nbsp; @SuppressWarnings("unchecked") File: generator/CombinedGenerators.java Line: 126
6&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; blob79&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Object[] next = generator.next(); File: generator/CombinedGenerators.java Line: 128
&lt;/pre&gt;&lt;br /&gt;
A potential victim here is the Classification class. It’s rudiment from the original Quickcheck implementation but never was used heavily. It’s a nice idea to do statistical testing but Classification could be removed from Quickcheck without loosing a significant feature.&lt;br /&gt;
&lt;br /&gt;
&lt;h3&gt;Biggest change sets&lt;/h3&gt;&lt;br /&gt;
The &lt;a href="https://bitbucket.org/blob79/find-cruft/src/tip/top_change_sets.sh"&gt;second script top_change_sets.sh&lt;/a&gt; finds the biggest change sets considering only the n oldest lines. This results in an interesting output for the code base (oldest 1500 lines, top 5 change sets):&lt;br /&gt;
&lt;br /&gt;
&lt;pre class="brush: plain"&gt;$ ./top_change_sets.sh 1500 5

r182 | blob79 | 20071219 19:15:24 +0100 (Wed, 19 Dec 2007) | 1 line
basic failed test instances serialization feature implementation
136 changes

r270 | blob79 | 20090603 18:52:52 +0200 (Wed, 03 Jun 2009) | 1 line
added pojo (a.k.a object) generator for interfaces
104 changes

r6 | blob79 | 20070707 07:29:14 +0200 (Sat, 07 Jul 2007) | 1 line
initial check in
68 changes

r204 | blob79 | 20080323 19:29:28 +0100 (Sun, 23 Mar 2008) | 1 line
added svn keyword id
52 changes

r198 | blob79 | 20080323 18:29:26 +0100 (Sun, 23 Mar 2008) | 3 lines
fixed logging for serializing and deserializing runner
mandate a user set characteristic name (for serialization of test values)
added system property for number of runs
48 changes
&lt;/pre&gt;&lt;br /&gt;
Revision 182,198 were commits related to the obscure test data serialization and deserialization scheme. Something I’ve already removed in the latest release. The two changes resulted in 184 lines still present in the current source.&lt;br /&gt;
&lt;br /&gt;
The revision 270 is not less obscure. It’s a declarative POJO object generator. The revision is so high in the list because it forced a lot of changes. This is not a good sign: obscure feature and lots of changes. That’s something worth to investigate.&lt;br /&gt;
&lt;br /&gt;
Revision 6 is the initial check in. So this should be okay.&lt;br /&gt;
&lt;br /&gt;
The last open issue revision 204 is the attack of the code formatters. They should be used with prudence as long as the down-stream tools can’t handle the changes properly. (Source control system should understand the AST of the source language.)&lt;br /&gt;
&lt;br /&gt;
&lt;h3&gt;File changes&lt;/h3&gt;&lt;br /&gt;
Now we can take a look at the files with most changes from a single revision. If you execute the &lt;a href="https://bitbucket.org/blob79/find-cruft/src/tip/top_changes_in_file.sh"&gt;query top_changes_in_file.sh&lt;/a&gt; (500 oldest lines, top 5) for the Quickcheck source code you’ll see:&lt;br /&gt;
&lt;br /&gt;
&lt;pre class="brush: plain"&gt;$ ./top_changes_in_file.sh 500 5
r182 | blob79 | 2007-12-19 19:15:24 +0100 (Wed, 19 Dec 2007) | 1 line
basic failed test instances serialization feature implementation
34 changes | file: RunnerImpl.java

r180 | blob79 | 2007-12-07 18:59:59 +0100 (Fri, 07 Dec 2007) | 3 lines
MutationGenerator, CloningMutationGenerator and CloningGenerator added
26 changes | file: generator/support/CloningGenerator.java

r182 | blob79 | 2007-12-19 19:15:24 +0100 (Wed, 19 Dec 2007) | 1 line
basic failed test instances serialization feature implementation
24 changes | file: SerializingRunnerDecorator.java

r6 | blob79 | 2007-07-07 07:29:14 +0200 (Sat, 07 Jul 2007) | 1 line
initial check in
22 changes | file: characteristic/Classification.java

r179 | blob79 | 2007-10-13 09:14:30 +0200 (Sat, 13 Oct 2007) | 1 line
added tree generator
22 changes | file: generator/support/AbstractTreeGenerator.java
&lt;/pre&gt;&lt;br /&gt;
Besides the usual suspects serialization support and the Classification class two new suspects emerge: mutation generator and a tree generator. In favor of the tree generator and mutation generator implementation, they might be useful but aren’t widely used so this something worth to look at.&lt;br /&gt;
&lt;br /&gt;
&lt;h3&gt;Conclusion&lt;/h3&gt;&lt;br /&gt;
The metrics found multiple source files that are worth investigating. One feature that is already removed (serialization support), one likely victim (classification) and multiple places that are worth checking (mutation generator, tree generator, declarative POJO generator). The metrics seems to find unloved children in the code that are good candidates for removal or implementation improvements.&lt;br /&gt;
&lt;br /&gt;
I always like to remove code. Fewer lines of code means fewer spots where problems may emerge. Nobody can argue that if you can remove unused code that it’s better to keep the useless code - even if it’s tested and production-quality. That’s something like a reverse &lt;a href="http://en.wikipedia.org/wiki/You_ain%27t_gonna_need_it"&gt;YAGNI&lt;/a&gt;. If you really care the code will never disappear. You can find it in your source code management system. You should be okay with that fact that the old code will lose it’s relevance due to changes to the production system implementation. It can be a inspiration how it could be done if the world hadn’t changed. The burden of these changes are also the reason why it’s better to remove the code in the first place. Dragging it with you without any gain is plain waste.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1392748891339897691-4786223076798863201?l=theyougen.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;div class="feedflare"&gt;
&lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=GXF1Pg2WrfE:mNXjzz23UNw:yIl2AUoC8zA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?d=yIl2AUoC8zA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=GXF1Pg2WrfE:mNXjzz23UNw:-BTjWOF_DHI"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?i=GXF1Pg2WrfE:mNXjzz23UNw:-BTjWOF_DHI" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=GXF1Pg2WrfE:mNXjzz23UNw:4cEx4HpKnUU"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?i=GXF1Pg2WrfE:mNXjzz23UNw:4cEx4HpKnUU" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=GXF1Pg2WrfE:mNXjzz23UNw:YwkR-u9nhCs"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?d=YwkR-u9nhCs" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=GXF1Pg2WrfE:mNXjzz23UNw:F7zBnMyn0Lo"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?i=GXF1Pg2WrfE:mNXjzz23UNw:F7zBnMyn0Lo" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=GXF1Pg2WrfE:mNXjzz23UNw:7Q72WNTAKBA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?d=7Q72WNTAKBA" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/TheYoungGeneration/~4/GXF1Pg2WrfE" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://theyougen.blogspot.com/feeds/4786223076798863201/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://theyougen.blogspot.com/2011/05/find-cruft-in-your-source-code.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/1392748891339897691/posts/default/4786223076798863201?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/1392748891339897691/posts/default/4786223076798863201?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/TheYoungGeneration/~3/GXF1Pg2WrfE/find-cruft-in-your-source-code.html" title="Find cruft in your source code repository" /><author><name>Thomas Jung</name><uri>http://www.blogger.com/profile/13153488692248832865</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://4.bp.blogspot.com/-q0PAdKdJjhM/Td-gjIqC7zI/AAAAAAAAAFA/xxe_Jda62Fc/s72-c/attic.jpg" height="72" width="72" /><thr:total>0</thr:total><feedburner:origLink>http://theyougen.blogspot.com/2011/05/find-cruft-in-your-source-code.html</feedburner:origLink></entry><entry gd:etag="W/&quot;CkEBQ3g4cSp7ImA9WhZXGUs.&quot;"><id>tag:blogger.com,1999:blog-1392748891339897691.post-8772832643454009871</id><published>2011-05-09T18:55:00.001+02:00</published><updated>2011-05-09T19:04:12.639+02:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-05-09T19:04:12.639+02:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="java" /><category scheme="http://www.blogger.com/atom/ns#" term="quickcheck" /><title>Quickcheck 0.6 Release</title><content type="html">The &lt;a href="http://java.net/projects/quickcheck/pages/Home#Releases"&gt;version 0.6&lt;/a&gt; of &lt;a href="http://quickcheck.dev.java.net/"&gt;Quickcheck&lt;/a&gt; is ready to use. The main features are support for deterministic execution of generators and improvements on generators. The JUnit runner support was removed in this release.&lt;br /&gt;
&lt;br /&gt;
You can read an in detail description of &lt;a href="http://theyougen.blogspot.com/2011/03/using-deterministic-generators-with.html"&gt;deterministic execution in this blog post&lt;/a&gt;.&lt;br /&gt;
&lt;br /&gt;
The 0.6 release adds the following generators:&lt;br /&gt;
&amp;nbsp; - map generator maps(Generator&amp;lt;K&amp;gt; keys, Generator&amp;lt;V&amp;gt; values)&lt;br /&gt;
&amp;nbsp; - subset generator sets(Set&amp;lt;T&amp;gt;)&lt;br /&gt;
&amp;nbsp; - submap generator maps(Map&amp;lt;K, V&amp;gt;)&lt;br /&gt;
&amp;nbsp; - unique generator using a Comparator&amp;lt;T&amp;gt; to decide if two values are considered equivalent: uniqueValues(Generator&amp;lt;T&amp;gt;, Comparator&amp;lt;? super T&amp;gt;)&lt;br /&gt;
&amp;nbsp; - excluding generator based on a collection of input and a collection of excluded values: excludeValues(Collection&amp;lt;T&amp;gt; values, Collection&amp;lt;T&amp;gt; excluded&lt;br /&gt;
&amp;nbsp; - content generator type parameter are now co-variant for lists, iterators, sets and object arrays to allow creation of super type container generators (like Generator&amp;lt;List&amp;lt;Object&amp;gt;&amp;gt; = lists(integers()))&lt;br /&gt;
- PrimitiveGenerator added generators:&lt;br /&gt;
&amp;nbsp; - generator for java.lang.Object instances&lt;br /&gt;
&lt;br /&gt;
The dropped Junit Runner support means that the @ForAll annotation is no longer supported. Until lambda expression are supported&amp;nbsp; in Java (hopefully) the &lt;a href="http://theyougen.blogspot.com/2009/06/iterable-adapter-ways-to-define.html"&gt;Iterable adapter&lt;/a&gt; is a good workaround that allows to execute tests without too much boiler-plate. If you need all features the inner class will work just fine. Inner classes will become much better with the SAM-type conversion in Java 8 which is part of the language changes in Project Lambda.&lt;br /&gt;
&lt;br /&gt;
The general development direction and the main theme I’ve been working on besides the release was the support for generator expressions. This is a good way to implement tests for equals methods where a equals method should return false when one of the significant attributes of an object is not equal. You have to write a lot of boiler-plate to test a simple statement like: “This is not equal if one of the attributes is not equal” now. With generator expression this should become much easier.&lt;br /&gt;
&lt;br /&gt;
It’s quite tricky to create a nice API for expressions. One cul-de-sac was to implement it as a builder with a fluent interface. The method chaining is not an adequate. The chaining forces you to linerialize the definition of the expression. This does not fit well into the world of generators where delegation and nesting are natural concepts. I burned some time before I got that the underlying problem cannot be fixed with a clever API.&amp;nbsp; I hope the current approach terminates and the expression support is something you can work with in the 0.7 release.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1392748891339897691-8772832643454009871?l=theyougen.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;div class="feedflare"&gt;
&lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=gMsXYoNOiOk:kFw-jgCYZoM:yIl2AUoC8zA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?d=yIl2AUoC8zA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=gMsXYoNOiOk:kFw-jgCYZoM:-BTjWOF_DHI"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?i=gMsXYoNOiOk:kFw-jgCYZoM:-BTjWOF_DHI" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=gMsXYoNOiOk:kFw-jgCYZoM:4cEx4HpKnUU"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?i=gMsXYoNOiOk:kFw-jgCYZoM:4cEx4HpKnUU" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=gMsXYoNOiOk:kFw-jgCYZoM:YwkR-u9nhCs"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?d=YwkR-u9nhCs" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=gMsXYoNOiOk:kFw-jgCYZoM:F7zBnMyn0Lo"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?i=gMsXYoNOiOk:kFw-jgCYZoM:F7zBnMyn0Lo" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=gMsXYoNOiOk:kFw-jgCYZoM:7Q72WNTAKBA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?d=7Q72WNTAKBA" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/TheYoungGeneration/~4/gMsXYoNOiOk" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://theyougen.blogspot.com/feeds/8772832643454009871/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://theyougen.blogspot.com/2011/05/quickcheck-06-release.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/1392748891339897691/posts/default/8772832643454009871?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/1392748891339897691/posts/default/8772832643454009871?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/TheYoungGeneration/~3/gMsXYoNOiOk/quickcheck-06-release.html" title="Quickcheck 0.6 Release" /><author><name>Thomas Jung</name><uri>http://www.blogger.com/profile/13153488692248832865</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>0</thr:total><feedburner:origLink>http://theyougen.blogspot.com/2011/05/quickcheck-06-release.html</feedburner:origLink></entry><entry gd:etag="W/&quot;DkUGQHc_cSp7ImA9WhZTGE8.&quot;"><id>tag:blogger.com,1999:blog-1392748891339897691.post-121411166114261806</id><published>2011-03-22T20:48:00.001+01:00</published><updated>2011-03-22T21:03:41.949+01:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-03-22T21:03:41.949+01:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="java" /><category scheme="http://www.blogger.com/atom/ns#" term="test" /><category scheme="http://www.blogger.com/atom/ns#" term="quickcheck" /><title>Using deterministic generators with Quickcheck</title><content type="html">The 0.6 release of &lt;a href="http://java.net/projects/quickcheck"&gt;Quickcheck&lt;/a&gt; supports deterministic generators. The goal is to be able to make the generation of values reproducible. This is useful when you are working with a bug from your favourite continuous integration server or when you would like to run a piece of code in the debugger repeatedly with the same results.&lt;br /&gt;
&lt;br /&gt;
A non-goal of the support is to remove the random nature Quickcheck. Values are still random to allow a good coverage but reproducibility is supported when needed. This way you have the best of both worlds.&lt;br /&gt;
&lt;br /&gt;
Quickcheck uses internally the linear congruential random number generator (RNG) implemented in the Java’s Random class. The interesting property of the RNG in the context of reproducible values is stated in the javadoc.&lt;br /&gt;
&lt;br /&gt;
&lt;blockquote&gt;If two instances of Random are created with the same seed, and the same sequence of method calls is made for each, they will generate and return identical sequences of numbers.&lt;/blockquote&gt;&lt;br /&gt;
You can configure the seed used by Quickcheck with the &lt;a href="http://quickcheck.java.net/javadoc/net/java/quickcheck/generator/distribution/RandomConfiguration.html"&gt;RandomConfiguration&lt;/a&gt; class. It’s important to set the seed for every individual test method otherwise a RNG’s return values are dependent on execution order of the test methods. If you run different tests, add a new tests or execute the tests in a different order other values will be generated.&lt;br /&gt;
&lt;br /&gt;
The seed is generated randomly for the normal execution. This is the result of the RandomConfiguration.initSeed method call.  This way Quickcheck still produces random values. Use the setSeed method to set the seed for a test method.&lt;br /&gt;
&lt;br /&gt;
Instead of using the RandomConfiguration directly you should use the &lt;a href="http://quickcheck.java.net/javadoc/net/java/quickcheck/junit/SeedInfo.html"&gt;SeedInfo&lt;/a&gt; JUnit method rule that will run with every test method. Additionally, it adds the seed information, that is needed to reproduce the problem, into the AssertionError thrown.&lt;br /&gt;
&lt;br /&gt;
The SeedInfo can be used like every other JUnit method rule. It’s added as an member of the test class. The example generates values in a way that the assertion always fails.&lt;br /&gt;
&lt;br /&gt;
&lt;pre class="brush: java"&gt;@Rule public SeedInfo seed = new SeedInfo();

@Test public void run(){
  Generator&amp;lt;Integer&amp;gt; unique = uniqueValues(integers());
  assertEquals(unique.next(), unique.next());
}
&lt;/pre&gt;&lt;br /&gt;
An example error message is: &lt;br /&gt;
&lt;blockquote&gt;java.lang.AssertionError: expected:&amp;lt;243172514&amp;gt; but was:&amp;lt;-917691317&amp;gt; (Seed was 3084746326687106280L.) &lt;br /&gt;
&lt;/blockquote&gt;You can also use the SeedInfo instance to set the seed for a test method to reproduce the problem from the AssertError.&lt;br /&gt;
&lt;br /&gt;
&lt;pre class="brush: java"&gt;Rule public SeedInfo seed = new SeedInfo();

@Test public void restore(){
  seed.restore(3084746326687106280L);
  Generator&amp;lt;Integer&amp;gt; unique = uniqueValues(integers());
  assertEquals(unique.next(), unique.next());
}
&lt;/pre&gt;&lt;br /&gt;
Instead of setting the seed for individual tests you can also set the initial seed once for the random generator used by the JVM. If you run the test example from above (without the SeedInfo method rule member) and the configuration -Dnet.java.quickcheck.seed=42:&lt;br /&gt;
&lt;br /&gt;
&lt;pre class="brush: java"&gt;@Test public void run(){
   Generator&amp;lt;Integer&amp;gt; unique = uniqueValues(integers());
   assertEquals(unique.next(), unique.next());
}
&lt;/pre&gt;&lt;br /&gt;
You should get the result:&lt;br /&gt;
&lt;blockquote&gt;java.lang.AssertionError: expected:&amp;lt;977378563&amp;gt; but was:&amp;lt;786938819&amp;gt;&lt;/blockquote&gt;&lt;br /&gt;
The configuration of seed values replaces the serialization and deserialization support of earlier Quickcheck versions. Setting the seed is a much simpler way to reproduce values over multiple JVM executions.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1392748891339897691-121411166114261806?l=theyougen.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;div class="feedflare"&gt;
&lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=axYCqNZso5w:qE6YvmloczQ:yIl2AUoC8zA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?d=yIl2AUoC8zA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=axYCqNZso5w:qE6YvmloczQ:-BTjWOF_DHI"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?i=axYCqNZso5w:qE6YvmloczQ:-BTjWOF_DHI" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=axYCqNZso5w:qE6YvmloczQ:4cEx4HpKnUU"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?i=axYCqNZso5w:qE6YvmloczQ:4cEx4HpKnUU" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=axYCqNZso5w:qE6YvmloczQ:YwkR-u9nhCs"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?d=YwkR-u9nhCs" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=axYCqNZso5w:qE6YvmloczQ:F7zBnMyn0Lo"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?i=axYCqNZso5w:qE6YvmloczQ:F7zBnMyn0Lo" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=axYCqNZso5w:qE6YvmloczQ:7Q72WNTAKBA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?d=7Q72WNTAKBA" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/TheYoungGeneration/~4/axYCqNZso5w" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://theyougen.blogspot.com/feeds/121411166114261806/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://theyougen.blogspot.com/2011/03/using-deterministic-generators-with.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/1392748891339897691/posts/default/121411166114261806?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/1392748891339897691/posts/default/121411166114261806?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/TheYoungGeneration/~3/axYCqNZso5w/using-deterministic-generators-with.html" title="Using deterministic generators with Quickcheck" /><author><name>Thomas Jung</name><uri>http://www.blogger.com/profile/13153488692248832865</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>0</thr:total><feedburner:origLink>http://theyougen.blogspot.com/2011/03/using-deterministic-generators-with.html</feedburner:origLink></entry><entry gd:etag="W/&quot;A0YERnc9fSp7ImA9WhZTF08.&quot;"><id>tag:blogger.com,1999:blog-1392748891339897691.post-1867763600271849263</id><published>2011-03-21T18:38:00.000+01:00</published><updated>2011-03-21T18:38:27.965+01:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-03-21T18:38:27.965+01:00</app:edited><title>Revert - Sometimes going back is the way forward</title><content type="html">Revert is the reverse gear of your version control software. It removes all local changes and brings the local workspace back to clean state of a committed revision. It is an important tool in the revision control software tool box. Once in a while there is no way forward so you have to go backward to make progress.&lt;br /&gt;
&lt;br /&gt;
This may sound unintuitive. We are trying to make a change not  reverse it. Why should the tool that destroys all this hard work be the  best option in some circumstances? Firstly, you do not lose everything.  Even if you revert everything you gain some knowledge. At least that  this exact way does not work. This is a good data point. Secondly and  more obviously, revert let’s you start with a fresh state. More often  than not we are able to reach a working state again. Removing everything is  the fastest way to get to there.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="https://lh5.googleusercontent.com/-DbduyyTZDgo/TYTglFcw1KI/AAAAAAAAAE0/6vI6kbePqfw/s1600/shift.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="400" src="https://lh5.googleusercontent.com/-DbduyyTZDgo/TYTglFcw1KI/AAAAAAAAAE0/6vI6kbePqfw/s400/shift.jpg" width="300" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;
&lt;br /&gt;
I see mainly two scenarios for the revert command: planned mode and the accidental mode.&lt;br /&gt;
&lt;br /&gt;
&lt;h3&gt;Planned mode revert&lt;/h3&gt;&lt;h3&gt;&amp;nbsp;&lt;/h3&gt;Starting with a working state of your committed source code you can do some exploratory work. Find out what you where looking for and revert.&lt;br /&gt;
&lt;br /&gt;
Now you can start the work in a informed way from a working state. The artifacts of the exploration are removed. After reverting you do know that the state you are starting from works. To verify that an workspace state works you do need tools to catch the problems: a decent test coverage and other quality assurance measures.&lt;br /&gt;
&lt;br /&gt;
A corollary is that because you are planning to revert anyway you can change your workspace in every way you need for the exploration.&lt;br /&gt;
&lt;br /&gt;
&lt;h3&gt;Accidental mode revert&lt;/h3&gt;&lt;h3&gt;&amp;nbsp;&lt;/h3&gt;The first scenario was a bit too idyllic: you started your work with an exploratory mind set, found the precious information and clean up after yourself. Everything is planned, clean and controlled. This scenario is valid. You can do the exploratory work voluntarily. More often it is the case that you have dug yourself in. You need to find a way out.&lt;br /&gt;
&lt;br /&gt;
&lt;h3&gt;Is this a hole or the basement?&lt;/h3&gt;&lt;h3&gt;&amp;nbsp;&lt;/h3&gt;The first issue is to know when your in a hole and there is little chance to get out.&lt;br /&gt;
&lt;br /&gt;
Say you commit roughly every hour. Now you did not commit for four hours. Your change set becomes bigger and bigger. You see no way to get your tests running again. Different tests are broken after multiple trials to fix everything. Your in a hole.&lt;br /&gt;
&lt;br /&gt;
You made a change and it resulted in absolutely unexpected problems. Your tests are broken. You do not know why. There are red lights all over the place. Your in a hole.&lt;br /&gt;
&lt;br /&gt;
You made small, controlled, incremental changes for some time without committing. You did not bother to commit because everything was so simple. Now the changes become bigger you would like to commit but you can’t because you can’t bring the whole system to run again. You are in a hole.&lt;br /&gt;
&lt;br /&gt;
The commonality of the three examples is that your not in control of the process. The world you created determines your next steps. This happens to everyone. It’s normal. It happens all the time. Otherwise our work would be predictable day in and out - how boring. (I would go so fare as to say that in other circumstances it’s a good sign that you can follow the inherent conclusions of your system. This way it’s productive that you are determined by the conclusions of your system because it is consistent.)&lt;br /&gt;
&lt;br /&gt;
If there is such a thing as experience in hole digging it’s to see the problem coming and to stop early. If it happened often enough to you, you should know the signs. You’ll know that knee deep holes are deep enough to stop and that it’s not necessary to disappear completely.&lt;br /&gt;
&lt;br /&gt;
&lt;h3&gt;Ways out&lt;/h3&gt;&lt;h3&gt;&amp;nbsp;&lt;/h3&gt;Now after you found out that have a problem all energy should be put in it. Don’t try to be too smart. Solve this one problem. You have two options to get out of the hole: fixing the current state or revert.&lt;br /&gt;
&lt;br /&gt;
Fixing the current state can work. You find enough information to fix the problem. You’ll lose some time but nothing of your work. Once the current state works it’s good a idea to commit now. This creates a save point. If there are more problems lurking down the road you can always come back to this state. The problem is that you might not find the problem. Finding a way out now is hard. Your change set adds to the complexity of the underlying problem. Your changes obfuscate the problem and make it harder to analyze. Everything you do will increase the change set complexity further.&lt;br /&gt;
&lt;br /&gt;
When fixing the current state is too hard, you have to revert your work to keep up the pace. Now you have the problem that you have already sunk so much time and the next step is to roll everything back to the state you started from. This does not feel pleasant. The upside is that even though you reverted the code not everything is lost. You still have more knowledge about the problem. This knowledge can be used on the second and hopefully last attack. Make notes if you need them to remember the information you gathered.&lt;br /&gt;
&lt;br /&gt;
The first attempt was in the wrong direction and/or too big. It is a good idea to make smaller steps with interim commits to create save points you can revert to. This creates a safety net if you bump into the problems again. You can revert repeatedly to chop smaller portions of the problem until it is solved. You decrease the size of the changes until you can understand a problem. Once in a while strange things happen and a single line change has crazy effects. After removing such road blocks you can make bigger steps again.&lt;br /&gt;
&lt;br /&gt;
There is of course a middle way: trying to revert only partially. Without creating and applying patches you have only one direction to go (revert) and you'll swiftly have to revert everything (because your change history is lost). I’ll come back to an approach to use diff and patch to do partial reverts in a controlled way later.&lt;br /&gt;
&lt;br /&gt;
&lt;h3&gt;Bringing the costs of reverts down&lt;/h3&gt;&lt;br /&gt;
The problem with reverts is that they are expensive. Work you've already done is removed from the source tree. Not something we are especially proud of.&lt;br /&gt;
&lt;br /&gt;
The problem is only as big as the change set that is flushed down the toilet. You should commit as often as your infrastructure allows: the execution time of tests and the integration costs are the main factor here. (You can move some of the cost into the continuous integration environment you're using.) As always this is a trade-off between the work lost and the overhead created by frequent commits. Committing every hour is probably a good idea. Just do whatever fit’s your needs best.&lt;br /&gt;
&lt;br /&gt;
The other factor is the right attitude to the revert operation. If you have already spent a lot of time on a problem and could not find a fix, it’s likely you won’t find it in this direction and a fresh approach is needed. You can actually save a lot of effort by aborting this failed attempt. This will also bring the total costs of a inevitable later revert down.&lt;br /&gt;
&lt;br /&gt;
&lt;h3&gt;Conclusion&lt;/h3&gt;&lt;br /&gt;
Failed attempts are not the problem. We have to learn from our failures. They are just too often and valuable to loose. Making failures is okay. Samuel Beckett put it nicely:&lt;br /&gt;
&lt;br /&gt;
&lt;blockquote&gt;Ever tried. Ever failed. No matter. Try again. Fail again. Fail better.&lt;/blockquote&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1392748891339897691-1867763600271849263?l=theyougen.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;div class="feedflare"&gt;
&lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=mFXz5hjBGfg:heIDgYPwN1w:yIl2AUoC8zA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?d=yIl2AUoC8zA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=mFXz5hjBGfg:heIDgYPwN1w:-BTjWOF_DHI"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?i=mFXz5hjBGfg:heIDgYPwN1w:-BTjWOF_DHI" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=mFXz5hjBGfg:heIDgYPwN1w:4cEx4HpKnUU"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?i=mFXz5hjBGfg:heIDgYPwN1w:4cEx4HpKnUU" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=mFXz5hjBGfg:heIDgYPwN1w:YwkR-u9nhCs"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?d=YwkR-u9nhCs" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=mFXz5hjBGfg:heIDgYPwN1w:F7zBnMyn0Lo"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?i=mFXz5hjBGfg:heIDgYPwN1w:F7zBnMyn0Lo" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=mFXz5hjBGfg:heIDgYPwN1w:7Q72WNTAKBA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?d=7Q72WNTAKBA" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/TheYoungGeneration/~4/mFXz5hjBGfg" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://theyougen.blogspot.com/feeds/1867763600271849263/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://theyougen.blogspot.com/2011/03/revert-sometimes-going-back-is-way.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/1392748891339897691/posts/default/1867763600271849263?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/1392748891339897691/posts/default/1867763600271849263?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/TheYoungGeneration/~3/mFXz5hjBGfg/revert-sometimes-going-back-is-way.html" title="Revert - Sometimes going back is the way forward" /><author><name>Thomas Jung</name><uri>http://www.blogger.com/profile/13153488692248832865</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://lh5.googleusercontent.com/-DbduyyTZDgo/TYTglFcw1KI/AAAAAAAAAE0/6vI6kbePqfw/s72-c/shift.jpg" height="72" width="72" /><thr:total>0</thr:total><feedburner:origLink>http://theyougen.blogspot.com/2011/03/revert-sometimes-going-back-is-way.html</feedburner:origLink></entry><entry gd:etag="W/&quot;CE8ARns-eSp7ImA9Wx9UEUg.&quot;"><id>tag:blogger.com,1999:blog-1392748891339897691.post-4678114208247051321</id><published>2011-02-07T21:41:00.001+01:00</published><updated>2011-02-08T09:14:07.551+01:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-02-08T09:14:07.551+01:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="java" /><category scheme="http://www.blogger.com/atom/ns#" term="test" /><title>Members in unit tests considered harmful</title><content type="html">I’ve  been reading &lt;a href="http://martinfowler.com/books.html#dsl"&gt;Domain Specific Languages&lt;/a&gt; by Martin Fowler. It is a good  book to formalize API design and gives you a new perspective to think  about your software. You can buy this book even if you’re not  planning to implements DSLs in the future. A DSL is a differently  marketed API anyway - but that’s another blog.&lt;br /&gt;
&lt;br /&gt;
I  don’t like the style of the tests presented in the book. Martin  captures testing of DSLs at the beginning of the book (3.6. Testing DSLs  page 53). The tests use members heavily. I’ve seen this style already  in Robert C. Martin’s &lt;a href="http://www.pearsonhighered.com/educator/academic/product/1,3110,0132350882,00.html"/&gt;Clean Code&lt;/a&gt;. Both are TDD pundits so I thought  it’s not too widely known that you can write tests with better  readability.&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/_0LGgFlXE6gQ/TU6bCcs9zbI/AAAAAAAAAEs/6IpAVd2mOXw/s1600/members_only_low.jpg"&gt;&lt;img src="http://3.bp.blogspot.com/_0LGgFlXE6gQ/TU6bCcs9zbI/AAAAAAAAAEs/6IpAVd2mOXw/s400/members_only_low.jpg" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;
The code that is tested is a controller that uses a state machine to react on events with transitions to the next state.&lt;br /&gt;
&lt;br /&gt;
&lt;h3&gt;Use of members in unit tests&lt;/h3&gt;&lt;br /&gt;
The testing style I object to looks like this:&lt;br /&gt;
&lt;br /&gt;
&lt;pre class="brush: java"&gt;@Test public void event_causes_transition(){
    fire(trigger_a);
    assertCurrentState(a);
}
&lt;/pre&gt;&lt;br /&gt;
Sorry,  this looks nice but it’s not readable. What is the fixture? What is the  class tested and the methods called? What are the parameters of the  call and the result? What does the assertion check? You can figure it  out, but it’s not easy and you have to navigate a lot in the source  file.&lt;br /&gt;
&lt;br /&gt;
&lt;h3&gt;No members in unit tests&lt;/h3&gt;&lt;br /&gt;
Now contrast that with this member-less style:&lt;br /&gt;
&lt;br /&gt;
&lt;pre class="brush: java"&gt;@Test public void event_causes_transition(){
    Event trigger_a = anyEvent();
    State a = anyState();
    Controller controller = controller(stateMachine(transition(trigger_a, a)));

    controller.handle(trigger_a.getId());

    assertEquals(a, controller.getCurrentState());
}
&lt;/pre&gt;&lt;br /&gt;
It’s  longer than the original version, but everything is in one place. You  still setup an initial state but it is completely local. The naming is  kept in synch with the original version so you can correlate  implementations. The naming is local to the test. It can be specific in  member-less style: &amp;quot;a&amp;quot; should be a &amp;quot;state&amp;quot; and &amp;quot;trigger_a&amp;quot; a &amp;quot;trigger&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
As  you can see in the test, it creates an event and a state. The details  of both seem not to be significant for the test. This insignificance is  stated in the test explicitly. Then a controller is setup up with a  state machine that has one transition. After you have done the setup,  you can actually call the method you want to test and check the result. &lt;br /&gt;
&lt;br /&gt;
Every  test method following this style has the sequence: setup, execute,  assert while keeping everything in the scope of the test: no members!&lt;br /&gt;
&lt;br /&gt;
&lt;h3&gt;Code organization and readability&lt;/h3&gt;&lt;br /&gt;
After contrasting the two implementation styles let’s try to explain why the member-less style is more readable.&lt;br /&gt;
&lt;br /&gt;
If  tests make heavy use of members, they look nice at the surface: no  redundancy, nice method names and helper methods to keep the intend  clear. Why is this less readable that the member-less style? The problem  is the missing context and necessary navigation in the test code to get  the context before you can understand what is actually tested.&lt;br /&gt;
&lt;br /&gt;
If you recorded the navigation of a reader you would see something similar to the following:&lt;br /&gt;
&lt;ul&gt;&lt;li &gt;the setup methods&lt;/li&gt;
&lt;ul&gt;&lt;li &gt;memorize setup members&lt;/li&gt;
&lt;/ul&gt;&lt;li &gt;test method: execute statement (fire)&lt;/li&gt;
&lt;li &gt;execute helper implementation&lt;/li&gt;
&lt;ul&gt;&lt;li &gt;recall setup members&lt;/li&gt;
&lt;li &gt;memorize result members&lt;/li&gt;
&lt;/ul&gt;&lt;li &gt;test method: assert statement (assertCurrentState)&lt;/li&gt;
&lt;li &gt;assert statement implementation&lt;/li&gt;
&lt;ul&gt;&lt;li &gt;recall result members&lt;/li&gt;
&lt;/ul&gt;&lt;/ul&gt;&lt;br /&gt;
As  you can see, the reader has to navigate and remember a lot to get the  initial setup of the test and keep track of the transition triggered by  the execution.&lt;br /&gt;
&lt;br /&gt;
In  every place were the reader has to remember some information there’s  the potential that an additional navigation is necessary, because he  could simply not remember enough context. This becomes even harder when  you start to organize the tests into test class hierarchies. The context  to remember is the context of all super classes members, execution  method implementations and assertion method implementations. Naturally,  our ability to store information is limited. Less context information to  remember is better for the readability of a test.&lt;br /&gt;
&lt;br /&gt;
So the nice test from above actually looks like this in reality:&lt;br /&gt;
&lt;br /&gt;
&lt;pre class="brush: java"&gt;@Test a(){
    //a lot of navigation and to remember
    a();
    //dito
    b();
    //dito
}
&lt;/pre&gt;&lt;br /&gt;
&lt;h3&gt;Ideal organization of code&lt;/h3&gt;&lt;br /&gt;
This  is not readable. Ideally, something readable is a linear text laid out exactly in the way you need it to work on your current problem. You  build up the context as you go without loosing time navigating. A second  ideal of readability is that every information necessary is in one spot. All your context is visible to you at the same time.&lt;br /&gt;
&lt;br /&gt;
The  two ideals effect each other. Given a problem complex enough, you cannot have a compact linear representation. Either it’s linear or compact.&lt;br /&gt;
&lt;br /&gt;
Additionally, you make trade-offs to remove redundancy from code. If you  are asking yourself whether you reorganize to remove redundancy don’t  forget that the most important feature of a test is readability.  Introduce some redundancy as long as it helps to understand your test. &lt;br /&gt;
&lt;br /&gt;
&lt;h3&gt;Organize to allow effective navigation&lt;/h3&gt;&lt;br /&gt;
You  cannot organize code in a way it can be read linearly for all purposes, but you can organize it in a way that the navigation capabilities of  IDEs allow effective navigation. Every navigation starts in the test  method and is done by looking up the definition of a helper method. The  full context is kept in local variables, parameters and return values.&lt;br /&gt;
&lt;br /&gt;
You  can start to read a test class from every test method without losing  vital information. No navigation to members is necessary:&lt;br /&gt;
&lt;br /&gt;
&lt;pre class="brush: java"&gt;Event trigger = anyEvent();
State state = anyState();
Controller controller = controller(stateMachine(transition(trigger, state)));
&lt;/pre&gt;&lt;br /&gt;
The member-less style makes the setup explicit. Try to name all setup helper methods following the &lt;a href="http://en.wikipedia.org/wiki/Principle_of_least_astonishment"&gt;Principle of Least Astonishment&lt;/a&gt;.  They are nicely named and should state what their individual guarantee  is. Accordingly, a method that creates an valid event without further guarantees is named anyEvent().&lt;br /&gt;
&lt;br /&gt;
&lt;pre class="brush: java"&gt;controller.handle(trigger.getId());
&lt;/pre&gt;&lt;br /&gt;
Now  the actual method to test can be executed. The member-less style shows  exactly what is going on. It presents crystal clear the parameters used  and the result value received. There is no navigation overhead. The  instances from the setup are used to run the actual test. If there is an  execution result it is stored in the local context of the test method. &lt;br /&gt;
&lt;br /&gt;
&lt;pre class="brush: java"&gt;assertEquals(state, controller.getCurrentState());
&lt;/pre&gt;&lt;br /&gt;
After  the execution you can do the assertion on the resulting state or  result. This again uses only instances from the method scope. Sometimes  it is useful to introduce custom assertions to remove redundancy. In  this style the assertion method works only on the parameters given, e.g.  assertState(state, controller).&lt;br /&gt;
&lt;br /&gt;
If you tracked the navigation for the member-less test style you would get something like this:&lt;ul&gt;&lt;li &gt;test&lt;/li&gt;
&lt;li &gt;setup methods&lt;/li&gt;
&lt;ul&gt;&lt;li &gt;memorize state&lt;/li&gt;
&lt;/ul&gt;&lt;li &gt;test&lt;/li&gt;
&lt;ul&gt;&lt;li &gt;recall setup state&lt;/li&gt;
&lt;li &gt;execute method&lt;/li&gt;
&lt;li &gt;assert method&lt;/li&gt;
&lt;/ul&gt;&lt;/ul&gt;&lt;br /&gt;
&lt;br /&gt;
The  result is not the ideal of a linearly readable test, but the navigation  to the definition and back to the test is relatively swift. You can be  sure that you did not miss vital information as everything is in one  place. Removing uncertainty is the key to a simpler reasoning about the  test at hand.&lt;br /&gt;
&lt;br /&gt;
&lt;h3&gt;Avoid test class hierarchies&lt;/h3&gt;&lt;br /&gt;
The  member-less test setup also helps to avoid test class hierarchies that  are are only introduced to allow different fixtures for a sets of tests.  Sometimes this is avoided in member style test by doing some setup  globally in members and in the test method, but this complicates the  reasoning even more. With a member-less test you can have a different  setup for every test method. Normally, you organize the setup methods in  a way that they build on each other to avoid redundancy.&lt;br /&gt;
&lt;br /&gt;
&lt;h3&gt;Avoid isolation from the API&lt;/h3&gt;&lt;br /&gt;
If  you put the method execution and assertion into separate methods - like  the member heavy style does - you introduce the danger of hiding API  problems. One argument for TDD is to experience your API from the point  of view of a user. When the actual method calls are in helper methods  you do isolate yourself from the API reality your users are facing. It  should be nice enough to do the setup, execution and assertion directly  with it. Otherwise you have a design problem. The test code should be as  much as possible analogous to the code users of an API have to write.&lt;br /&gt;
&lt;br /&gt;
&lt;h3&gt;Conclusion&lt;/h3&gt;&lt;br /&gt;
I  hope I provided a new perspective to look at the organization of unit  tests. If you have objections with this approach feel free to add a  comment. If you are interested in this and other aspects of testing  software, you can have a look at &lt;a href="http://uselezz.blogspot.com/2010/10/test-principles-part-ii.html"&gt;Test Principles&lt;/a&gt; from my co-worker Gaetano Gallo. He covered this idea under &amp;quot;Self containment&amp;quot; in his blog.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1392748891339897691-4678114208247051321?l=theyougen.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;div class="feedflare"&gt;
&lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=AxRJcmz6iRA:9PMc2v0WclM:yIl2AUoC8zA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?d=yIl2AUoC8zA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=AxRJcmz6iRA:9PMc2v0WclM:-BTjWOF_DHI"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?i=AxRJcmz6iRA:9PMc2v0WclM:-BTjWOF_DHI" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=AxRJcmz6iRA:9PMc2v0WclM:4cEx4HpKnUU"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?i=AxRJcmz6iRA:9PMc2v0WclM:4cEx4HpKnUU" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=AxRJcmz6iRA:9PMc2v0WclM:YwkR-u9nhCs"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?d=YwkR-u9nhCs" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=AxRJcmz6iRA:9PMc2v0WclM:F7zBnMyn0Lo"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?i=AxRJcmz6iRA:9PMc2v0WclM:F7zBnMyn0Lo" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=AxRJcmz6iRA:9PMc2v0WclM:7Q72WNTAKBA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?d=7Q72WNTAKBA" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/TheYoungGeneration/~4/AxRJcmz6iRA" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://theyougen.blogspot.com/feeds/4678114208247051321/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://theyougen.blogspot.com/2011/02/members-in-unit-tests-concidered.html#comment-form" title="3 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/1392748891339897691/posts/default/4678114208247051321?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/1392748891339897691/posts/default/4678114208247051321?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/TheYoungGeneration/~3/AxRJcmz6iRA/members-in-unit-tests-concidered.html" title="Members in unit tests considered harmful" /><author><name>Thomas Jung</name><uri>http://www.blogger.com/profile/13153488692248832865</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://3.bp.blogspot.com/_0LGgFlXE6gQ/TU6bCcs9zbI/AAAAAAAAAEs/6IpAVd2mOXw/s72-c/members_only_low.jpg" height="72" width="72" /><thr:total>3</thr:total><feedburner:origLink>http://theyougen.blogspot.com/2011/02/members-in-unit-tests-concidered.html</feedburner:origLink></entry><entry gd:etag="W/&quot;DEYEQXw6fyp7ImA9Wx9VGUQ.&quot;"><id>tag:blogger.com,1999:blog-1392748891339897691.post-2182244496397112846</id><published>2010-12-07T20:32:00.000+01:00</published><updated>2011-02-06T13:41:40.217+01:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-02-06T13:41:40.217+01:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="java" /><category scheme="http://www.blogger.com/atom/ns#" term="test" /><category scheme="http://www.blogger.com/atom/ns#" term="quickcheck" /><title>Inverse Function Test Pattern in Quickcheck</title><content type="html">I already mentioned the inverse function test pattern in the &lt;a href="http://theyougen.blogspot.com/2010/11/alternative-test-approach-quickcheck.html"&gt;introduction to specification-based testing&lt;/a&gt; using &lt;a href="https://quickcheck.dev.java.net/"&gt;Quickcheck&lt;/a&gt;. Now I would like to present the pattern in more detail with an example.&lt;br /&gt;
&lt;br /&gt;
The basic idea of testing with inverse functions is simple. We have two functions f and its inverse function f&lt;span style="text-decoration: none; vertical-align: super;"&gt;-1&lt;/span&gt;. If we apply f to an input value and then take the result and apply it to f&lt;span style="text-decoration: none; vertical-align: super;"&gt;-1&lt;/span&gt; this should result in the input value again: f&lt;span style="text-decoration: none; vertical-align: super;"&gt;-1&lt;/span&gt;(f(x)) = x.&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/_0LGgFlXE6gQ/TPpl205aZkI/AAAAAAAAAEY/XPJL_BLycrU/s1600/200px-Yin_and_Yang.svg.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://1.bp.blogspot.com/_0LGgFlXE6gQ/TPpl205aZkI/AAAAAAAAAEY/XPJL_BLycrU/s1600/200px-Yin_and_Yang.svg.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;
&lt;br /&gt;
This test pattern is applicable for all kinds of functions for example compression and encryption. In the business application domain create and delete are examples of operations that can be tested this way. Another example are do and undo operations of command objects. In both examples, doing and undoing an action leaves the state of the world unchanged.&lt;br /&gt;
&lt;br /&gt;
There are some constraints on the applicability of the test pattern. The function f has to have a inversion function so it is a bijective function and one of the functions f or f&lt;span style="text-decoration: none; vertical-align: super;"&gt;-1&lt;/span&gt; have to be tested. Otherwise the intermediary result could be invalid. For example, if you test a create and a delete operation together, the inverse function test passes if both operations do nothing.&lt;br /&gt;
&lt;br /&gt;
The example used to illustrate the inverse function testing is a simple square function.&lt;br /&gt;
&lt;br /&gt;
&lt;pre class="brush: java"&gt;public double square(double a) {
 double square = a * a;&amp;nbsp;
 return square;
}
&lt;/pre&gt;&lt;br /&gt;
The inverse function test can be implemented with concrete values. We use the Math.sqrt() implementation as the inverse function.&lt;br /&gt;
&lt;br /&gt;
&lt;pre class="brush: java"&gt;@Test public void squareConcreteValues() {
  double a = 2;
  assertEquals(square(Math.sqrt(a)), a, precission);
}
&lt;/pre&gt;&lt;br /&gt;
This is okay but defining input values manually is not very productive, not readable and has not sufficient test coverage. You can instead employ some computing power to generate the values using Quickcheck.&lt;br /&gt;
&lt;br /&gt;
Firstly, the square function is not bijective as square(-x) = square(x). This is a fact we did not express in the example with the concrete values. It simply omitted this fact. To fix this the result is compared to the absolute value of x. Secondly, the function will overflow and test output like this:&lt;br /&gt;
java.lang.AssertionError: expected:&amp;lt;1.6482368012418589E307&amp;gt; but was:&amp;lt;Infinity&amp;gt;&lt;br /&gt;
is typical.&lt;br /&gt;
&lt;br /&gt;
&lt;pre class="brush: java"&gt;@Test public void squareWithOverflows() {
  for(double a : someDoubles()) {
    assertEquals(abs(a), Math.sqrt(square(a)), a * precission);
  }
}
&lt;/pre&gt;&lt;br /&gt;
Again, this aspect was not expressed in the concrete test. Even if you were not aware of the problem the failing test points to the overflow problem. This is a nice example how Quickcheck can help you to find bugs you did not anticipate. I admit that this is a simple example but give it a try. You'll see that you run into all kinds of problems you did not think of. You have to break your software to know it.&lt;br /&gt;
&lt;br /&gt;
Now we have to fix the overflow problem. Depending on the situation it can be easier to find and test a valid implementation that is more restrictive than theoretically possible but suffices your requirements. This is the trade-off between effort now and potential applicability later.&lt;br /&gt;
&lt;br /&gt;
For this example it is easy to find all legal input arguments. It is the largest double value that can be squared.&lt;br /&gt;
&lt;br /&gt;
&lt;pre class="brush: java"&gt;@Test public void squareWithBounds() {
  double max = Math.nextAfter(Math.sqrt(Double.MAX_VALUE), 0.0);
  for (double a : someDoubles(-max, max)) {
    assertEquals(abs(a), Math.sqrt(square(a)), a * precission);
  }
}
&lt;/pre&gt;&lt;br /&gt;
To finish this example let’s write the test for the illegal input arguments as well. All square arguments that cause an overflow  are invalid and should cause an IllegalArgumentException. The invalidArguments double generator defines all invalid values. These are the values greater than the largest valid value max and smaller than the smallest allowed value -max.&lt;br /&gt;
&lt;br /&gt;
&lt;pre class="brush: java"&gt;@Test public void squareInvalidArguments() {
  double max = Math.sqrt(Double.MAX_VALUE);
  double firstInvalid = Math.nextUp(max);
  Generator&amp;lt;Double&amp;gt; invalidArguments =
    oneOf(doubles(-Double.MAX_VALUE, -firstInvalid))
    .add(doubles(firstInvalid, Double.MAX_VALUE));
    
  for (double a : someEnsureValues(asList(firstInvalid, -firstInvalid), invalidArguments)) {
    try{
      square(a);
      fail();
    }catch(IllegalArgumentException e){ }
  }
}
&lt;/pre&gt;&lt;br /&gt;
The implemenation passing the tests is:&lt;br /&gt;
&lt;br /&gt;
&lt;pre class="brush: java"&gt;public double square(double a) {
  double square = a * a;
  if (Double.isInfinite(square)) { throw new IllegalArgumentException() };
  return square;
}
&lt;/pre&gt;&lt;br /&gt;
Testing with inversion functions can solve the dilemma of writing tests without repeating the production code. If you repeat the code in the test you have an additional check that you wrote the code down correctly. You have to repeat an error in both places to introduce a bug. This is all you can get out of such tests. If you test with an inverse function the test and implementation do not share code. This kind of test has the potential to find conceptual problems in your code. (Actually, the reality is not black and white. You can have a test that implements the same operation with a simpler test implementation. This implementation verifies that the more complex production version works. This is the idea of the analogous function test pattern. I will come to later.)&lt;br /&gt;
&lt;br /&gt;
If the inverse function pattern is applicable it can save you a lot of effort. For example the result of an operation can be very costly to verify like encryption functions. The exact representation may not be of interest for the given domain or change frequently leading to high maintenance costs with concrete test values. If you can define valid input values and have an tested inverse function the test effort is to setup the input value generator. The nice side-effect is that you test that the functions you think are inverse functions really are inverse.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1392748891339897691-2182244496397112846?l=theyougen.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;div class="feedflare"&gt;
&lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=AZu-IadMlrA:5SY0gVJoVX4:yIl2AUoC8zA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?d=yIl2AUoC8zA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=AZu-IadMlrA:5SY0gVJoVX4:-BTjWOF_DHI"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?i=AZu-IadMlrA:5SY0gVJoVX4:-BTjWOF_DHI" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=AZu-IadMlrA:5SY0gVJoVX4:4cEx4HpKnUU"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?i=AZu-IadMlrA:5SY0gVJoVX4:4cEx4HpKnUU" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=AZu-IadMlrA:5SY0gVJoVX4:YwkR-u9nhCs"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?d=YwkR-u9nhCs" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=AZu-IadMlrA:5SY0gVJoVX4:F7zBnMyn0Lo"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?i=AZu-IadMlrA:5SY0gVJoVX4:F7zBnMyn0Lo" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=AZu-IadMlrA:5SY0gVJoVX4:7Q72WNTAKBA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?d=7Q72WNTAKBA" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/TheYoungGeneration/~4/AZu-IadMlrA" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://theyougen.blogspot.com/feeds/2182244496397112846/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://theyougen.blogspot.com/2010/12/inverse-function-test-pattern-in.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/1392748891339897691/posts/default/2182244496397112846?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/1392748891339897691/posts/default/2182244496397112846?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/TheYoungGeneration/~3/AZu-IadMlrA/inverse-function-test-pattern-in.html" title="Inverse Function Test Pattern in Quickcheck" /><author><name>Thomas Jung</name><uri>http://www.blogger.com/profile/13153488692248832865</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://1.bp.blogspot.com/_0LGgFlXE6gQ/TPpl205aZkI/AAAAAAAAAEY/XPJL_BLycrU/s72-c/200px-Yin_and_Yang.svg.png" height="72" width="72" /><thr:total>0</thr:total><feedburner:origLink>http://theyougen.blogspot.com/2010/12/inverse-function-test-pattern-in.html</feedburner:origLink></entry><entry gd:etag="W/&quot;DkMBQHc9eip7ImA9Wx9bEE4.&quot;"><id>tag:blogger.com,1999:blog-1392748891339897691.post-4556393747614665207</id><published>2010-11-10T19:04:00.005+01:00</published><updated>2011-02-18T14:07:31.962+01:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-02-18T14:07:31.962+01:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="java" /><category scheme="http://www.blogger.com/atom/ns#" term="test" /><category scheme="http://www.blogger.com/atom/ns#" term="quickcheck" /><title>The easiest way to get started with Quickcheck</title><content type="html">You may download the &lt;a href="http://theyougen.blogspot.com/2010/11/alternative-test-approach-quickcheck.html"&gt;Quickcheck&lt;/a&gt; &lt;a href="https://quickcheck.dev.java.net/servlets/ProjectDocumentList"&gt;zip file&lt;/a&gt; from java.net. This zip contains the full Quickcheck distribution. Alternatively, the easiest way to &lt;a href="http://theyougen.blogspot.com/2010/11/alternative-test-approach-quickcheck.html"&gt;get started&lt;/a&gt; is to use the &lt;a href="http://download.java.net/maven/2/"&gt;Java.net maven repository&lt;/a&gt;. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Add the following definition to your pom.xml file:&lt;br /&gt;
&lt;br /&gt;
&lt;pre class="brush: xml"&gt;&amp;lt;project&amp;gt;
  &amp;lt;repositories&amp;gt;
    &amp;lt;repository&amp;gt;
      &amp;lt;id&amp;gt;maven2-repository.dev.java.net&amp;lt;/id&amp;gt;
      &amp;lt;name&amp;gt;Java.net Repository for Maven&amp;lt;/name&amp;gt;
      &amp;lt;url&amp;gt;http://download.java.net/maven/2/&amp;lt;/url&amp;gt;
      &amp;lt;layout&amp;gt;default&amp;lt;/layout&amp;gt;
    &amp;lt;/repository&amp;gt;
  &amp;lt;/repositories&amp;gt;
  &amp;lt;dependencies&amp;gt;
    &amp;lt;dependency&amp;gt;
      &amp;lt;groupId&amp;gt;net.java&amp;lt;/groupId&amp;gt;
      &amp;lt;artifactId&amp;gt;quickcheck&amp;lt;/artifactId&amp;gt;
      &amp;lt;version&amp;gt;0.5.1&amp;lt;/version&amp;gt;
      &amp;lt;scope&amp;gt;test&amp;lt;/scope&amp;gt;
    &amp;lt;/dependency&amp;gt;
  &amp;lt;/dependencies&amp;gt;
&amp;lt;/project&amp;gt;
&lt;/pre&gt;&lt;br /&gt;
Have fun.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1392748891339897691-4556393747614665207?l=theyougen.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;div class="feedflare"&gt;
&lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=CBLA3n1aUcs:30Eb1bYvFHg:yIl2AUoC8zA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?d=yIl2AUoC8zA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=CBLA3n1aUcs:30Eb1bYvFHg:-BTjWOF_DHI"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?i=CBLA3n1aUcs:30Eb1bYvFHg:-BTjWOF_DHI" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=CBLA3n1aUcs:30Eb1bYvFHg:4cEx4HpKnUU"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?i=CBLA3n1aUcs:30Eb1bYvFHg:4cEx4HpKnUU" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=CBLA3n1aUcs:30Eb1bYvFHg:YwkR-u9nhCs"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?d=YwkR-u9nhCs" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=CBLA3n1aUcs:30Eb1bYvFHg:F7zBnMyn0Lo"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?i=CBLA3n1aUcs:30Eb1bYvFHg:F7zBnMyn0Lo" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=CBLA3n1aUcs:30Eb1bYvFHg:7Q72WNTAKBA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?d=7Q72WNTAKBA" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/TheYoungGeneration/~4/CBLA3n1aUcs" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://theyougen.blogspot.com/feeds/4556393747614665207/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://theyougen.blogspot.com/2010/11/easiest-way-to-get-started-with.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/1392748891339897691/posts/default/4556393747614665207?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/1392748891339897691/posts/default/4556393747614665207?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/TheYoungGeneration/~3/CBLA3n1aUcs/easiest-way-to-get-started-with.html" title="The easiest way to get started with Quickcheck" /><author><name>Thomas Jung</name><uri>http://www.blogger.com/profile/13153488692248832865</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>0</thr:total><feedburner:origLink>http://theyougen.blogspot.com/2010/11/easiest-way-to-get-started-with.html</feedburner:origLink></entry><entry gd:etag="W/&quot;DEYEQXw6cCp7ImA9Wx9VGUQ.&quot;"><id>tag:blogger.com,1999:blog-1392748891339897691.post-6240923508054164468</id><published>2010-11-08T18:32:00.002+01:00</published><updated>2011-02-06T13:41:40.218+01:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-02-06T13:41:40.218+01:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="java" /><category scheme="http://www.blogger.com/atom/ns#" term="test" /><category scheme="http://www.blogger.com/atom/ns#" term="quickcheck" /><title>Alternative test approach: Quickcheck</title><content type="html">With this article I would like to shed some light on the question: Why should I implement tests with the specification-based testing approach proposed by &lt;a href="https://quickcheck.dev.java.net/"&gt;Quickcheck&lt;/a&gt;? I will show that the approach yield tests with better readability, more test coverage, helps to omit mock tests and manually maintained test fixtures.&lt;br /&gt;
&lt;br /&gt;
I'll illustrate the approach with an simple example taken from the Guava library to be able to discuss the pros and cons. The example I picked is the test of the &lt;a href="http://guava-libraries.googlecode.com/svn/trunk/javadoc/com/google/common/base/Splitter.html"&gt;com.google.common.base.Splitter&lt;/a&gt; class. I present three implementations of the Splitter tests. The test implementation as it can be found in Guava &lt;a href="http://code.google.com/p/guava-libraries/source/browse/trunk/test/com/google/common/base/SplitterTest.java?r=135"&gt;today&lt;/a&gt; (&lt;a href="https://quickcheck.dev.java.net/source/browse/quickcheck/trunk/quickcheck-examples/src/test/java/splitter/SplitterTest.java?rev=495&amp;amp;view=markup"&gt;translated to TestNG&lt;/a&gt;), a test implementation with TestNG @DataProvider and an implementation with Quickcheck for Java.&lt;br /&gt;
&lt;br /&gt;
The following code are tests as they are today. The tests check the behavior of simple split operations for different input values. These tests have a lot of code but are structurally very similar.&lt;br /&gt;
&lt;br /&gt;
&lt;pre class="brush: java"&gt;@Test public void testCharacterSimpleSplit() {
  String simple = "a,b,c";
  Iterable&amp;lt;String&amp;gt; letters = Splitter.on(',').split(simple);
  assertContentsInOrder(letters, "a", "b", "c");
}

@Test public void testCharacterSplitWithDoubleDelimiter() {
  String doubled = "a,,b,c";
  Iterable&amp;lt;String&amp;gt; letters = Splitter.on(',').split(doubled);
  assertContentsInOrder(letters, "a", "", "b", "c");
}

@Test public void testCharacterSplitWithDoubleDelimiterAndSpace() {
  String doubled = "a,, b,c";
  Iterable&amp;lt;String&amp;gt; letters = Splitter.on(',').split(doubled);
  assertContentsInOrder(letters, "a", "", " b", "c");
}

@Test public void testCharacterSplitWithTrailingDelimiter() {
  String trailing = "a,b,c,";
  Iterable&amp;lt;String&amp;gt; letters = Splitter.on(',').split(trailing);
  assertContentsInOrder(letters, "a", "b", "c", "");
}

@Test public void testCharacterSplitWithLeadingDelimiter() {
  String leading = ",a,b,c";
  Iterable&amp;lt;String&amp;gt; letters = Splitter.on(',').split(leading);
  assertContentsInOrder(letters, "", "a", "b", "c");
}

@Test public void testCharacterSplitWithMulitpleLetters() {
  Iterable&amp;lt;String&amp;gt; testCharacteringMotto =
     Splitter.on('-').split("Testing-rocks-Debugging-sucks");
     assertContentsInOrder(
      testCharacteringMotto, "Testing", "rocks", "Debugging", "sucks");
}
 
private void assertContentsInOrder(Iterable&amp;lt;String&amp;gt; actual,
     String... expected) {
  assertEquals(Arrays.asList(expected), Lists.newArrayList(actual));
}
&lt;/pre&gt;&lt;br /&gt;
From the readability perspective the problem with tests specifying concrete values is that the reader has to infer which part of the information is significant and which part is not. For example taking this test in isolation:&lt;br /&gt;
&lt;br /&gt;
&lt;pre class="brush: java"&gt;@Test public void testCharacterSimpleSplit() {
  String simple = "a,b,c";
  Iterable&amp;lt;String&amp;gt; letters = Splitter.on(',').split(simple);
  assertContentsInOrder(letters, "a", "b", "c");
}
&lt;/pre&gt;&lt;br /&gt;
One could come to the result that:&lt;br /&gt;
- split works only with one, alphanumeric character (“aa,bb” would not be a legal input string)&lt;br /&gt;
- the separator may not be an alphanumeric character&lt;br /&gt;
- splits works only for successive char values as the input is a - b - c&lt;br /&gt;
- all other characters are undefined&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The reader of the tests has to digest all tests to be able to infer the actual properties of the Splitter.  While reading he has also to remove the insignificant information and has to eliminate the invalid assumptions. (Actually, the tests say nothing about alphanumeric separators. So this assumption is still open.) As we know what split does this is trivial but for unknown functionality this is much harder.&lt;br /&gt;
&lt;br /&gt;
The test can be implemented with the &lt;a href="http://testng.org/doc/documentation-main.html#parameters-dataproviders"&gt;TestNG DataProviders&lt;/a&gt; in a &lt;a href="https://quickcheck.dev.java.net/source/browse/quickcheck/trunk/quickcheck-examples/src/test/java/splitter/DataProviderSplitterTest.java?rev=495&amp;amp;view=markup"&gt;more concise&lt;/a&gt; way:&lt;br /&gt;
&lt;br /&gt;
&lt;pre class="brush: java"&gt;static final String SPLIT = "split";

@DataProvider(name = SPLIT)
public Object[][] split() {
  Object[] simpleSplit = {"a,b,c", ",", new String[] { "a", "b", "c" } };
  Object[] splitWithDoubleDelimiter = {
    "a,,b,c", ",", new String[] { "a", "", "b", "c" } };
  Object[] splitWithDoubleDelimiterAndSpace = {
    "a,, b,c", ",", new String[] { "a", "", " b", "c" } };
  Object[] splitWithTrailingDelimiter = {
    "a,b,c,", ",", new String[] { "a", "b", "c", "" } };
  Object[] splitWithLeadingDelimiter = {
    ",a,b,c", ",", new String[] { "", "a", "b", "c" } };
  Object[] splitWithMulitpleLetters = {
    "Testing-rocks-Debugging-sucks", "-", new String[] { "Testing", "rocks", "Debugging", "sucks" } };
  return new Object[][] {
    simpleSplit, splitWithDoubleDelimiter, splitWithDoubleDelimiterAndSpace,
    splitWithTrailingDelimiter, splitWithLeadingDelimiter, splitWithMulitpleLetters };
}

@Test(dataProvider = SPLIT)
public void simpleSplit(String simple, String separator, String[] expected) {
  Iterable&amp;lt;String&amp;gt; letters = Splitter.on(separator).split(simple);
  assertEquals(Arrays.asList(expected), Lists.newArrayList(letters));
}
&lt;/pre&gt;&lt;br /&gt;
This test removes a lot of the boiler-plate in the original version. It tests the split method repeatedly with a set of input values. The more compact representation of the test helps as long as we are keeping test input data and the test in one place. (If you organize your source file in a way that the test data and test is separated the readability decreases significantly.) This test approach still suffers from the readability problems the initial test has: the reader has to derive the property of the Splitter implementation from the input and expected return values. This is like inferring the properties of multiplication from a set of input and return values.&lt;br /&gt;
&lt;br /&gt;
An completely different approach can be implemented using &lt;a href="https://quickcheck.dev.java.net/"&gt;Quickcheck&lt;/a&gt; for Java for &lt;a href="https://quickcheck.dev.java.net/javadoc/net/java/quickcheck/generator/package-summary.html"&gt;API&lt;/a&gt;. Quickcheck is build on the basic abstraction of the &lt;a href="https://quickcheck.dev.java.net/javadoc/net/java/quickcheck/Generator.html"&gt;Generator&amp;lt;T&amp;gt;&lt;/a&gt;. The only method of the Generator&amp;lt;T&amp;gt; interface is next(). It returns a new value of type T.&lt;br /&gt;
&lt;br /&gt;
The &lt;a href="https://quickcheck.dev.java.net/source/browse/quickcheck/trunk/quickcheck-examples/src/test/java/splitter/QuickcheckSplitterTest.java?rev=495&amp;amp;view=markup"&gt;Splitter test&lt;/a&gt; uses Quickcheck functionality to create values:&lt;br /&gt;
- &lt;a href="https://quickcheck.dev.java.net/javadoc/net/java/quickcheck/generator/PrimitiveGenerators.html#strings%28%29"&gt;PrimitiveGenerators.strings()&lt;/a&gt; is a Generator&amp;lt;String&amp;gt; for undefined string values&lt;br /&gt;
- &lt;a href="https://quickcheck.dev.java.net/javadoc/net/java/quickcheck/generator/CombinedGeneratorsIterables.html#someNonEmptyLists%28net.java.quickcheck.Generator%29"&gt;CombinedGeneratorIterators.someNonEmptyLists()&lt;/a&gt; is similar to a Generator&amp;lt;List&amp;lt;T&amp;gt;&amp;gt; but it is adapted to Iterable&amp;lt;List&amp;lt;T&amp;gt;&amp;gt; to be able to use it in a for-each loop. The lists returned have at least one element.&lt;br /&gt;
&lt;br /&gt;
Basic idea is to specify a test as follows: If you take any non empty list of words, join them with a distinct separator and then split the joined string the result will be the input list of words again.&lt;br /&gt;
&lt;br /&gt;
&lt;pre class="brush: java"&gt;@Test public void simpleSplit() {
  for (List&amp;lt;String&amp;gt; words : someNonEmptyLists(strings())) {
    char separator = anyDistinctCharacter(words);
    String input = Joiner.on(separator).join(words);
    Iterable&amp;lt;String&amp;gt; letters = Splitter.on(separator).split(input);
    assertEquals(words, Lists.newArrayList(letters));
  }
}
&lt;/pre&gt;&lt;br /&gt;
The test uses the anyDistinctCharacter helper method to create separators. Separators have to be distinct from the characters used in words otherwise the splitter splits words.  The helper uses the PrimitiveGenerators.characters() factory method to create a Generator&amp;lt;Character&amp;gt; for arbitrary characters. To create the separator values &lt;a href="https://quickcheck.dev.java.net/javadoc/net/java/quickcheck/generator/CombinedGenerators.html#excludeValues%28net.java.quickcheck.Generator,%20java.util.Collection%29"&gt;CombinedGenerator.excludeValues&lt;/a&gt; generator is used. It is build on a input generator but skips all generated values that are equal to the excluded values. (It skips all characters that are in words.)&lt;br /&gt;
&lt;br /&gt;
&lt;pre class="brush: java"&gt;private char anyDistinctCharacter(List&amp;lt;String&amp;gt; words) {
    char[] notAllowedAsSeperator = Joiner.on("").join(words).toCharArray();
    return excludeValues(characters(), asList(notAllowedAsSeperator)).next();
}
&lt;/pre&gt;&lt;br /&gt;
This specification-based test describes the contract and does not enumerate the specific test values. Compared to the two other implementations the way the test is written is more rigid. This is a positive thing, as I already tried to explain that when composing a test of concrete values there is place for misinterpretations. The other effect is that six tests boil down to one test.&lt;br /&gt;
&lt;br /&gt;
This test approach has of course it's trade-offs: it depends on join and is not deterministic.&lt;br /&gt;
- Join has to be tested so that this test is working. I think the join test is much easier than the split test. Once join is tested we can depend on it. By testing join/split together we can check that these two are symmetric (f(f&lt;span style="text-decoration: none; vertical-align: super;"&gt;-1&lt;/span&gt;(x)) = x). Testing symmetry comes for free with this implementation approach.&lt;br /&gt;
- The real concern is that we will run the test with randomly generated test data and it is not guaranteed to contain all of the enumerated test cases in &lt;b&gt;every&lt;/b&gt; test run.&lt;br /&gt;
&lt;br /&gt;
If non-determinism is your concern there are some ways to address this.&lt;br /&gt;
- You can use a different &lt;a href="https://quickcheck.dev.java.net/javadoc/net/java/quickcheck/generator/distribution/Distribution.html"&gt;distributions function&lt;/a&gt; in your generators. Using different distributions will make certain values more or less likely to occur.&lt;br /&gt;
- You can make the values used deterministic by using the CombinedGenerators.ensureValues(ensuredValues). This is still better than the tests with concrete values as you only have to specify the input values and not the expected result values.&lt;br /&gt;
- You can combine deterministic and random values. The generator created with &lt;a href="https://quickcheck.dev.java.net/javadoc/net/java/quickcheck/generator/CombinedGenerators.html#ensureValues%28java.util.Collection,%20net.java.quickcheck.Generator%29"&gt;CombinedGenerators.ensureValues(ensuredValues, otherValues)&lt;/a&gt; uses the enumerated values first and then the otherValues generator after the explicit values are exhausted.&lt;br /&gt;
The random nature of the generated values has of course an advantage: you may find bugs that you did not anticipate while writing the test. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Primitive type, enumerations and collection generators alone are not very useful. To test a real world application you have to be able to create values of your custom types. In the Splitter example we already used generators provided by the Quickcheck implementation.&lt;br /&gt;
&lt;br /&gt;
Here is a simple example to get you started.&lt;br /&gt;
&lt;br /&gt;
&lt;pre class="brush: java"&gt;class Name {
  private final String first;
  private final String last;

  public Name(String first, String last) {
    super();
    this.first = first;
    this.last = last;
  }
  public String getLast() { return last; }
  public String getFirst() { return first; }
}

class NameGenerator implements net.java.quickcheck.Generator&amp;lt;Name&amp;gt;{
  Generator&amp;lt;String&amp;gt; first = PrimitiveGenerators.strings();
  Generator&amp;lt;String&amp;gt; last = PrimitiveGenerators.strings();

  @Override public Name next() {
    return new Name(first.next(), last.next());
  }
}
&lt;/pre&gt;&lt;br /&gt;
The code defines the custom type Name and a Generator&amp;lt;Name&amp;gt; using generator implementations provided by the Quickcheck framework. This is analogous to the definition of types in Java you define your own types based on the primitive and other types provided by the JDK.&lt;br /&gt;
&lt;br /&gt;
The test then checks one property of the Name type. This test fails with the current implementation of the Name but this is easily fixed.&lt;br /&gt;
&lt;br /&gt;
&lt;pre class="brush: java"&gt;public class NameTest {
  @Test public void equals(){
    for(Name name : Iterables.toIterable(new NameGenerator())){
      assertEquals(name, new Name(name.getFirst(), name.getLast()));
    }
  }
}
&lt;/pre&gt;&lt;br /&gt;
The test uses an adapter to the Iterable interface to be able to use the generator in a for expression. This is one way to run the test. You can also use an inner class with a more capable runner (&lt;a href="https://quickcheck.dev.java.net/javadoc/net/java/quickcheck/QuickCheck.html#forAll%28net.java.quickcheck.Generator,%20net.java.quickcheck.Characteristic%29"&gt;QuickCheck.forAll(generator, characteristic)&lt;/a&gt;). Using the for expression is the trade-off given that the Java language has no closures yet.&lt;br /&gt;
&lt;br /&gt;
Analogous to your domain object hierarchy you can now build a generator hierarchy that is used to create the domain objects. This clearly adds some overhead as you now have an additional generator for every domain object. But this pays off as you can:&lt;br /&gt;
- use generators in unit tests.&lt;br /&gt;
- use generated values to replace hard coded fixtures in integration tests (for example tests that you dbunit).&lt;br /&gt;
- omit mock tests that are only in place because the object graph to start a test is to hard to setup manually&lt;br /&gt;
- use generated values to write inverse functions (f&lt;span style="text-decoration: none; vertical-align: super;"&gt;-1&lt;/span&gt;(f(x)) = x) and analogous functions (f(x) = g(x) where f or g is simpler) tests (for example zip, encryption, read/write, sorting, searching, etc.).&lt;br /&gt;
&lt;br /&gt;
I hope I could convince you that the approach helps you with writing more expressive tests by leveraging some computing power. You can use specification-based testing approach to test a range of software from the boring business software to more interesting areas like data-structures and algorithms. Maybe you could &lt;a href="http://theyougen.blogspot.com/2010/11/easiest-way-to-get-started-with.html"&gt;start by writing&lt;/a&gt; a test for the join method using Quickcheck?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1392748891339897691-6240923508054164468?l=theyougen.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;div class="feedflare"&gt;
&lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=-igOjUm4yPQ:oTw5WJ_jW60:yIl2AUoC8zA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?d=yIl2AUoC8zA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=-igOjUm4yPQ:oTw5WJ_jW60:-BTjWOF_DHI"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?i=-igOjUm4yPQ:oTw5WJ_jW60:-BTjWOF_DHI" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=-igOjUm4yPQ:oTw5WJ_jW60:4cEx4HpKnUU"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?i=-igOjUm4yPQ:oTw5WJ_jW60:4cEx4HpKnUU" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=-igOjUm4yPQ:oTw5WJ_jW60:YwkR-u9nhCs"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?d=YwkR-u9nhCs" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=-igOjUm4yPQ:oTw5WJ_jW60:F7zBnMyn0Lo"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?i=-igOjUm4yPQ:oTw5WJ_jW60:F7zBnMyn0Lo" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=-igOjUm4yPQ:oTw5WJ_jW60:7Q72WNTAKBA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?d=7Q72WNTAKBA" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/TheYoungGeneration/~4/-igOjUm4yPQ" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://theyougen.blogspot.com/feeds/6240923508054164468/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://theyougen.blogspot.com/2010/11/alternative-test-approach-quickcheck.html#comment-form" title="1 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/1392748891339897691/posts/default/6240923508054164468?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/1392748891339897691/posts/default/6240923508054164468?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/TheYoungGeneration/~3/-igOjUm4yPQ/alternative-test-approach-quickcheck.html" title="Alternative test approach: Quickcheck" /><author><name>Thomas Jung</name><uri>http://www.blogger.com/profile/13153488692248832865</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>1</thr:total><feedburner:origLink>http://theyougen.blogspot.com/2010/11/alternative-test-approach-quickcheck.html</feedburner:origLink></entry><entry gd:etag="W/&quot;D0YARHc5fCp7ImA9WxFbFUo.&quot;"><id>tag:blogger.com,1999:blog-1392748891339897691.post-3770247777880323371</id><published>2010-07-08T09:42:00.000+02:00</published><updated>2010-07-08T09:45:45.924+02:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2010-07-08T09:45:45.924+02:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="java" /><title>Java closures get type inference, hooray!</title><content type="html">According to the latest &lt;a href="http://mail.openjdk.java.net/pipermail/lambda-dev/2010-July/001775.html"&gt;State of the Lambda document&lt;/a&gt; closures in Java will get type inference.&lt;br /&gt;
&lt;br /&gt;
&lt;blockquote&gt;The return type and exception types of a lambda expression are inferred by the compiler; the parameter types may be explicitly specified or they may be inferred from the assignment context.&lt;/blockquote&gt;&lt;br /&gt;
So this expression will be valid without explicit type annotation:&lt;br /&gt;
&lt;pre class="brush: java"&gt;CallbackHandler cb = { c -&amp;gt; System.out.println("pippo") };
&lt;/pre&gt;Finally, they seem to have mercy with the humble Java developer.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1392748891339897691-3770247777880323371?l=theyougen.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;div class="feedflare"&gt;
&lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=leJgMlzppEs:z5x9kxoSRyQ:yIl2AUoC8zA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?d=yIl2AUoC8zA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=leJgMlzppEs:z5x9kxoSRyQ:-BTjWOF_DHI"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?i=leJgMlzppEs:z5x9kxoSRyQ:-BTjWOF_DHI" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=leJgMlzppEs:z5x9kxoSRyQ:4cEx4HpKnUU"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?i=leJgMlzppEs:z5x9kxoSRyQ:4cEx4HpKnUU" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=leJgMlzppEs:z5x9kxoSRyQ:YwkR-u9nhCs"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?d=YwkR-u9nhCs" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=leJgMlzppEs:z5x9kxoSRyQ:F7zBnMyn0Lo"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?i=leJgMlzppEs:z5x9kxoSRyQ:F7zBnMyn0Lo" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=leJgMlzppEs:z5x9kxoSRyQ:7Q72WNTAKBA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?d=7Q72WNTAKBA" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/TheYoungGeneration/~4/leJgMlzppEs" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://theyougen.blogspot.com/feeds/3770247777880323371/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://theyougen.blogspot.com/2010/07/java-closures-get-type-inference-hooray.html#comment-form" title="3 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/1392748891339897691/posts/default/3770247777880323371?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/1392748891339897691/posts/default/3770247777880323371?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/TheYoungGeneration/~3/leJgMlzppEs/java-closures-get-type-inference-hooray.html" title="Java closures get type inference, hooray!" /><author><name>Thomas Jung</name><uri>http://www.blogger.com/profile/13153488692248832865</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>3</thr:total><feedburner:origLink>http://theyougen.blogspot.com/2010/07/java-closures-get-type-inference-hooray.html</feedburner:origLink></entry><entry gd:etag="W/&quot;A0cARHw9cSp7ImA9WxFUGE8.&quot;"><id>tag:blogger.com,1999:blog-1392748891339897691.post-5112578277557330512</id><published>2010-06-29T18:29:00.002+02:00</published><updated>2010-06-29T18:30:45.269+02:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2010-06-29T18:30:45.269+02:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="java" /><title>Talk to me - have fun with your favorite podcasts</title><content type="html">Driving home the  other day I listened to a German radio show that made fun of chancellor  Angela Merkel. They cut together some of her speeches to let her say the  truth about here coalition with the FDP.&lt;br /&gt;
&lt;br /&gt;
I thought: how hard  can this be to create? It seems a &lt;a href="http://bitbucket.org/blob79/talk-to-me"&gt;simple version&lt;/a&gt; is not to hard to build  using the Java sound API and my favorite podcast.&lt;br /&gt;
&lt;br /&gt;
The idea is  simple: take input audio files and create a index of words contained in  these files. The index entry is the word, start time and end time. When a  new sentence is created it will be stitched together with single words  taken from the index.&lt;br /&gt;
&lt;br /&gt;
The &lt;a href="http://bitbucket.org/blob79/talk-to-me/src/e0aead27e065/src/main/java/org/bitbucket/talktome/FileIndex.java" id="y6-8" title="file index implementation"&gt;file index implementation&lt;/a&gt;  is straight forward. It reads the index information from a CSV file  that is on the classpath. The whole application configuration is done  with Guice so the index file name is injected. (I created only a &lt;a href="http://bitbucket.org/blob79/talk-to-me/src/e0aead27e065/src/main/resources/index" id="i-v_" title="small index"&gt;small index&lt;/a&gt; from the &lt;a href="http://bitbucket.org/blob79/talk-to-me/raw/e0aead27e065/src/main/resources/in.wav" id="l2et" title="Java Posse Podcast"&gt;Java Posse Podcast&lt;/a&gt; using  Audible.)&lt;br /&gt;
&lt;br /&gt;
The &lt;a href="http://java.sun.com/javase/6/docs/api/javax/sound/sampled/AudioInputStream.html" id="ey.p" title="AudioInputStream"&gt;AudioInputStream&lt;/a&gt; is the main  class to interact with the Java Sound API. You read audio data from it.  If you create audio data you do this by creating a AudioInputStream the &lt;a href="http://java.sun.com/javase/6/docs/api/javax/sound/sampled/AudioSystem.html" id="t8h0" title="AudioSystem"&gt;AudioSystem&lt;/a&gt; can read from. The actual  encoding is done by the AudioSystem implementation depending on the  output audio format.&lt;br /&gt;
&lt;br /&gt;
&lt;a href="http://bitbucket.org/blob79/talk-to-me/src/e0aead27e065/src/main/java/org/bitbucket/talktome/DefaultButcher.java" id="qpti" title="The Butcher"&gt;The Butcher&lt;/a&gt; class is the one  concerned with audio files. It can read and write audio files and create  AudioInputStreams from an input byte array. The other interesting think  the Butcher can is cutting samples from a AudioInputStream. The  AudioInputStream consists of frames that represent the samples of the &lt;a href="http://en.wikipedia.org/wiki/Pulse-code_modulation" id="y.yv" title="PCM signal"&gt;PCM signal&lt;/a&gt;. Frames have a length of multiple bytes. To cut a valid range of frames from the AudioInputStream one has  to take the frame size into account. The start and end time in  milliseconds have to be translated to start byte and end bytes of the  start frame and end frame. (The start and end data is stored as  timestamps to keep them independent from the underlying encoding of the  file used.)&lt;br /&gt;
&lt;br /&gt;
The Butcher implementation is simplified. It only  supports one WAV file AudioFormat and does no stream processing.&lt;br /&gt;
&lt;br /&gt;
The &lt;a href="http://bitbucket.org/blob79/talk-to-me/src/e0aead27e065/src/main/java/org/bitbucket/talktome/Composer.java"&gt;Composer&lt;/a&gt; creates the output file. For a given sentence it takes the  audio data for each word from the input files, concatenates the audio  data and writes the result to disk. The Composer is currently not very  sophisticated and takes the first word from the index it can find.&lt;br /&gt;
&lt;br /&gt;
After building with mvn assembly:assembly the Java application can be  run with&lt;br /&gt;
&lt;pre class="brush: plain"&gt;java -jar  target/talk-to-me-1.0-SNAPSHOT-jar-with-dependencies.jar [output file] [sentence]
&lt;/pre&gt;&lt;br /&gt;
There is still plenty of interesting  material to play around with. The current version can be improved in  different ways:&lt;br /&gt;
&lt;ul&gt;&lt;li&gt; Indexing an audio file is quite cumbersome.  If the start and end timestamp of a word could be detected from the  silence between words indexing would be much easier.&lt;/li&gt;
&lt;li&gt;The  amplitude of words and the length of silence should be normalized.&lt;/li&gt;
&lt;li&gt;Indexing  could be even simpler if some speech recognition on the words could be  performed.&lt;/li&gt;
&lt;li&gt;The output quality could be improved by finding  longest sequences of words in an input audio file that match the target  sentence (&lt;a href="http://en.wikipedia.org/wiki/Longest_common_substring_problem" id="d63h" title="longest common_substring_problem"&gt;longest  common_substring_problem&lt;/a&gt; and &lt;a href="http://en.wikipedia.org/wiki/Longest_common_subsequence_problem" id="l8-5" title="longest common subsequence problem"&gt;longest common  subsequence problem&lt;/a&gt;).&lt;/li&gt;
&lt;/ul&gt;&lt;/sentence&gt;&lt;/output&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1392748891339897691-5112578277557330512?l=theyougen.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;div class="feedflare"&gt;
&lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=oZjfyMylV_I:2p9NrXgPj-Y:yIl2AUoC8zA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?d=yIl2AUoC8zA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=oZjfyMylV_I:2p9NrXgPj-Y:-BTjWOF_DHI"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?i=oZjfyMylV_I:2p9NrXgPj-Y:-BTjWOF_DHI" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=oZjfyMylV_I:2p9NrXgPj-Y:4cEx4HpKnUU"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?i=oZjfyMylV_I:2p9NrXgPj-Y:4cEx4HpKnUU" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=oZjfyMylV_I:2p9NrXgPj-Y:YwkR-u9nhCs"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?d=YwkR-u9nhCs" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=oZjfyMylV_I:2p9NrXgPj-Y:F7zBnMyn0Lo"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?i=oZjfyMylV_I:2p9NrXgPj-Y:F7zBnMyn0Lo" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=oZjfyMylV_I:2p9NrXgPj-Y:7Q72WNTAKBA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?d=7Q72WNTAKBA" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/TheYoungGeneration/~4/oZjfyMylV_I" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://theyougen.blogspot.com/feeds/5112578277557330512/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://theyougen.blogspot.com/2010/06/talk-to-me-have-fun-with-your-fun-with.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/1392748891339897691/posts/default/5112578277557330512?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/1392748891339897691/posts/default/5112578277557330512?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/TheYoungGeneration/~3/oZjfyMylV_I/talk-to-me-have-fun-with-your-fun-with.html" title="Talk to me - have fun with your favorite podcasts" /><author><name>Thomas Jung</name><uri>http://www.blogger.com/profile/13153488692248832865</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>0</thr:total><feedburner:origLink>http://theyougen.blogspot.com/2010/06/talk-to-me-have-fun-with-your-fun-with.html</feedburner:origLink></entry><entry gd:etag="W/&quot;D0YBQHw4fSp7ImA9WxFbFUo.&quot;"><id>tag:blogger.com,1999:blog-1392748891339897691.post-7561545492762256429</id><published>2010-02-22T18:33:00.001+01:00</published><updated>2010-07-08T09:45:51.235+02:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2010-07-08T09:45:51.235+02:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="scala" /><title>Tuning the Bloom-Filter-based Spelling Corrector</title><content type="html">With the &lt;a href="http://theyougen.blogspot.com/2010/02/producing-candidates-for-spelling.html"&gt;candidate iterators and filter in place&lt;/a&gt; the Bloom  Filter-based spelling corrector can be implemented and the performance measured.&lt;br /&gt;
&lt;br /&gt;
The  base implementation is &lt;a href="http://bitbucket.org/blob79/scalaspellingcorrector/src/9020562ba4ce/src/main/scala/org/bitbucket/speller/Speller.scala" id="f.9w" title="here"&gt;here&lt;/a&gt;. The basic idea is described in the  first blog post about the &lt;a href="http://theyougen.blogspot.com/2010/02/faster-spelling-corrector.html" id="pp4b" title="Bloom Filter-based spelling corrector"&gt;Bloom  Filter-based spelling corrector&lt;/a&gt; so I won't get into detail about the  implementation.&lt;br /&gt;
&lt;br /&gt;
My benchmark is the &lt;a href="http://blog.lostlake.org/index.php?/archives/61-A-spell-checker-in-Scala.html" id="o-y3" title="implementation by David Pollack"&gt;implementation by  David Pollack&lt;/a&gt;. As far as I can tell he thought a bit about  performance so the comparison is not totally unfair.&lt;br /&gt;
&lt;br /&gt;
The numbers  aren't to bad. The Bloom Filter-based implementation is 2.5 times  faster. (The &lt;a href="http://bitbucket.org/blob79/scalaspellingcorrector/src/3e2bf0f3be47/src/test/scala/org/bitbucket/speller/Performance.scala" id="w1dq" title="performance comparison numbers"&gt;performance comparison  numbers&lt;/a&gt; are based on correcting words of edit distance 0 to 3  derived from the dictionary. The JVM is running with &lt;span style="font-family: Courier New;"&gt;&lt;span class="c1"&gt;-Xms256m -Xmx256m -server.&lt;/span&gt;&lt;/span&gt;) My  implementation corrects 115 words per seconds.&lt;br /&gt;
&lt;br /&gt;
bloom 182.04358  (115 Hz)&lt;br /&gt;
benchmark 466.560114 (45 Hz)&lt;br /&gt;
benchmark/bloom  2.562903421257701&lt;br /&gt;
&lt;br /&gt;
But there is still room for improvement.  Executing the application with a memory profiler reveals that a lot of  java.lang.Integer objects are allocated. The filter function Int =&amp;gt;  Boolean is translated by the Scala compiler to the generic trait &lt;a href="http://www.scala-lang.org/docu/files/api/scala/Function1.html" id="oejk" title="Function1[Integer, Boolean]"&gt;Function1[Integer,  Boolean]&lt;/a&gt;. This causes boxing of int values to Integer objects. (The  boxing of Boolean values is not a problem. There are only two values as  long as &lt;a href="http://java.sun.com/javase/6/docs/api/java/lang/Boolean.html#valueOf%28boolean%29" id="cwmi" title="Boolean.valueOf"&gt;Boolean.valueOf&lt;/a&gt; is used.) &lt;a href="http://bitbucket.org/blob79/scalaspellingcorrector/changeset/7aa960dc3f58/" id="c4:8" title="Replacing the filter function"&gt;Replacing the filter  function&lt;/a&gt; with a filter trait based on primitive types solves this  problem.&lt;br /&gt;
&lt;pre class="brush: scala"&gt;trait Filter{
  def contains(x : Int) : Boolean
}
&lt;/pre&gt;The  other source of boxed integers is the partial application of the  StringHasher.hashRange method. When the string parameter is applied to  the hashRange method a new &lt;a href="http://www.scala-lang.org/docu/files/api/scala/Function3.html" id="c5gv" title="Function3[Int,Int,Int,Int]"&gt;Function3[Int,Int,Int,Int]&lt;/a&gt;  is returned. This causes boxing as the previous Function1 object does.  The partial application made the code a bit more readable. We have to  compromise here on readability and replace the partial application with a  normal method that fits well with the JVM.&lt;br /&gt;
&lt;br /&gt;
Implementing these  improvements resulted in worse performance at the beginning. The problem  was that I used s(idx) instead of s.charAt(idx). The s(idx) converts  the String to a RichString and then calls the apply method on it. Both  methods are functionally identical but the first will cause conversions  from String to RichString whereas the charAt method is a direct method  call on the given String. This is a subtle problem the best solution  would be able to disable implicit conversions from String to RichString  defined in Predef for this file.&lt;br /&gt;
&lt;br /&gt;
With both improvements in place &lt;a href="http://bitbucket.org/blob79/scalaspellingcorrector/changeset/2506bbb223a2" id="o9da" title="the implementation"&gt;the implementation&lt;/a&gt; is 3.7  times faster.&lt;br /&gt;
&lt;br /&gt;
bloom 113.69269 (184 Hz)&lt;br /&gt;
benchmark 424.767132  (49Hz)&lt;br /&gt;
benchmark/bloom 3.736098881994964&lt;br /&gt;
&lt;br /&gt;
The next step is a  bit controversial: &lt;a href="http://bitbucket.org/blob79/scalaspellingcorrector/changeset/6c78cdca143e/" id="h5qa" title="basing the candidate production on char Arrays 
replacing Strings"&gt;basing the candidate production on char Arrays  replacing Strings&lt;/a&gt;. This yields good performance improvements the  final number is 4.9 times faster than the competing implementation but  it has clearly a number of drawbacks. The code becomes quite low-level  using System.arrayCopy instead of RichString methods. The char[] arrays  are mutable so a collaborator may manipulate the array content.&lt;br /&gt;
&lt;br /&gt;
bloom  89.082697 (235 Hz)&lt;br /&gt;
benchmark 437.214067 (48 Hz)&lt;br /&gt;
benchmark/bloom  4.907957232143522 &lt;br /&gt;
&lt;br /&gt;
Some further improvements are still possible.  With enough effort the BloomSpeller, Dictionary and CandidateIterators  could be based on a &lt;a href="http://java.sun.com/javase/6/docs/api/java/lang/CharSequence.html" id="ap3_" title="CharSequence"&gt;CharSequence&lt;/a&gt; &lt;a href="http://en.wikipedia.org/wiki/Flyweight_pattern" id="buj7" title="flyweight implementation"&gt;flyweight implementation&lt;/a&gt;. Having  this interface in place candidate iterators can return implementations  of CharSequence containing only the base string as common state and the  individual state of the candidate (index to manipulate, candidate hash  value and optionally the alphabet character to use). This would save  memory allocation, garbage collection for the candidate char[] array  instance and redundant hash value computation. (The candidate hash value  is computed a second time when the Speller implementation checks  whether the candidate is in the Dictionary.)&lt;br /&gt;
&lt;br /&gt;
The final step is to  tune the Bloom Filter configuration. In the &lt;a href="http://theyougen.blogspot.com/2010/02/faster-spelling-corrector.html" id="zo.2" title="first blog entry"&gt;first blog entry&lt;/a&gt; of this series I  calculated expected performance values based on the Bloom Filter worst  case (all candidates are contained in the Bloom Filter). The actual  numbers differ from the expected results as most candidate checks  against the Bloom Filter will abort earlier before k hash functions were  executed. Another factor not present in the model is the optimization  to compute only hash values for about &lt;a href="http://theyougen.blogspot.com/2010/02/producing-candidates-for-spelling.html#hashvalue_method"&gt;half of the string length&lt;/a&gt;.&lt;br /&gt;
&lt;br /&gt;
The  performance is surprisingly stable for an n/m ratio of 4 up to 256. For  an n/m up to 4 the production of false candidates dominates the  performance characteristics. For n/m ratio greater than 128 the  execution of hash functions starts to degrade performance. Over all the  configuration of an n/m ratio = 4 is good enough. Only small  improvements can be attained with changing this number.&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/_0LGgFlXE6gQ/S3ayA4DgX6I/AAAAAAAAACQ/D_YG2IFFDgY/s1600-h/perf_spell.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"&gt;&lt;img border="0" src="http://4.bp.blogspot.com/_0LGgFlXE6gQ/S3ayA4DgX6I/AAAAAAAAACQ/D_YG2IFFDgY/s320/perf_spell.jpg" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;table border="0" cellspacing="0" class="zeroBorder" cols="2" frame="VOID" rules="NONE"&gt;&lt;tbody&gt;
&lt;tr&gt;&lt;td align="left" height="17" style="border: 1px solid rgb(0, 0, 0);" width="86"&gt;&lt;b&gt;n/m ratio&lt;/b&gt;&lt;/td&gt;&lt;td align="left" style="border: 1px solid rgb(0, 0, 0);" width="131"&gt;&lt;b&gt;performance  ratio&lt;/b&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td align="right" height="17" style="border: 1px solid rgb(0, 0, 0);"&gt;1&lt;/td&gt;&lt;td align="right" style="border: 1px solid rgb(0, 0, 0);"&gt;1,85&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td align="right" height="17" style="border: 1px solid rgb(0, 0, 0);"&gt;2&lt;/td&gt;&lt;td align="right" style="border: 1px solid rgb(0, 0, 0);"&gt;3,07&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td align="right" height="17" style="border: 1px solid rgb(0, 0, 0);"&gt;4&lt;/td&gt;&lt;td align="right" style="border: 1px solid rgb(0, 0, 0);"&gt;4,05&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td align="right" height="17" style="border: 1px solid rgb(0, 0, 0);"&gt;8&lt;/td&gt;&lt;td align="right" style="border: 1px solid rgb(0, 0, 0);"&gt;5,07&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td align="right" height="17" style="border: 1px solid rgb(0, 0, 0);"&gt;16&lt;/td&gt;&lt;td align="right" style="border: 1px solid rgb(0, 0, 0);"&gt;4,75&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td align="right" height="17" style="border: 1px solid rgb(0, 0, 0);"&gt;32&lt;/td&gt;&lt;td align="right" style="border: 1px solid rgb(0, 0, 0);"&gt;4,83&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td align="right" height="17" style="border: 1px solid rgb(0, 0, 0);"&gt;64&lt;/td&gt;&lt;td align="right" style="border: 1px solid rgb(0, 0, 0);"&gt;4,93&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td align="right" height="17" style="border: 1px solid rgb(0, 0, 0);"&gt;128&lt;/td&gt;&lt;td align="right" style="border: 1px solid rgb(0, 0, 0);"&gt;4,69&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td align="right" height="17" style="border: 1px solid rgb(0, 0, 0);"&gt;256&lt;/td&gt;&lt;td align="right" style="border: 1px solid rgb(0, 0, 0);"&gt;3,89&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td align="right" height="17" style="border: 1px solid rgb(0, 0, 0);"&gt;512&lt;/td&gt;&lt;td align="right" style="border: 1px solid rgb(0, 0, 0);"&gt;1,78&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td align="right" height="17" style="border: 1px solid rgb(0, 0, 0);"&gt;1024&lt;/td&gt;&lt;td align="right" style="border: 1px solid rgb(0, 0, 0);"&gt;0,53&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;&lt;br /&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;&lt;br /&gt;
Scala is a flexible  language. You can write high-level and low-level code as needed. The  basic approach to start a high-level implementation and optimizing the  hot spots worked out quite well for this example. The byte code emitted  by the Scala compiler is good enough in most cases. Some abstractions of  the language do not fit to well with the Java platform. This is not a  problem for regular code and can be fixed with low-level implementations  for hot spots as we've seen. The new @specialized annotation may help  here in the future to lower the amount of low-level code.&lt;br /&gt;
&lt;br /&gt;
There  are still some areas worth to investigate. How to distribute the  candidate production and filtering over multiple cores? Using a &lt;a href="http://gee.cs.oswego.edu/dl/papers/fj.pdf" id="eu0l" title="join-fork  framework"&gt;join-fork framework&lt;/a&gt; or &lt;a href="http://www.scala-lang.org/docu/files/api/scala/actors/Futures$object.html#future%28%3D%3ET%29" id="er30" title="Scala's Future construct"&gt;Scala's Future construct&lt;/a&gt;  seem to fit quite well here. Another idea is to compare the Bloom  Filter-based spell checker implementation against a &lt;a href="http://en.wikipedia.org/wiki/BK-tree" id="gduk" title="BK-Tree"&gt;BK-Tree&lt;/a&gt;-based  implementation.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1392748891339897691-7561545492762256429?l=theyougen.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;div class="feedflare"&gt;
&lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=G5EMdMb1eC4:h1v0KsdGK3I:yIl2AUoC8zA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?d=yIl2AUoC8zA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=G5EMdMb1eC4:h1v0KsdGK3I:-BTjWOF_DHI"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?i=G5EMdMb1eC4:h1v0KsdGK3I:-BTjWOF_DHI" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=G5EMdMb1eC4:h1v0KsdGK3I:4cEx4HpKnUU"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?i=G5EMdMb1eC4:h1v0KsdGK3I:4cEx4HpKnUU" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=G5EMdMb1eC4:h1v0KsdGK3I:YwkR-u9nhCs"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?d=YwkR-u9nhCs" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=G5EMdMb1eC4:h1v0KsdGK3I:F7zBnMyn0Lo"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?i=G5EMdMb1eC4:h1v0KsdGK3I:F7zBnMyn0Lo" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=G5EMdMb1eC4:h1v0KsdGK3I:7Q72WNTAKBA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?d=7Q72WNTAKBA" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/TheYoungGeneration/~4/G5EMdMb1eC4" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://theyougen.blogspot.com/feeds/7561545492762256429/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://theyougen.blogspot.com/2010/02/tuning-bloom-filter-based-spelling.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/1392748891339897691/posts/default/7561545492762256429?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/1392748891339897691/posts/default/7561545492762256429?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/TheYoungGeneration/~3/G5EMdMb1eC4/tuning-bloom-filter-based-spelling.html" title="Tuning the Bloom-Filter-based Spelling Corrector" /><author><name>Thomas Jung</name><uri>http://www.blogger.com/profile/13153488692248832865</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://4.bp.blogspot.com/_0LGgFlXE6gQ/S3ayA4DgX6I/AAAAAAAAACQ/D_YG2IFFDgY/s72-c/perf_spell.jpg" height="72" width="72" /><thr:total>0</thr:total><feedburner:origLink>http://theyougen.blogspot.com/2010/02/tuning-bloom-filter-based-spelling.html</feedburner:origLink></entry><entry gd:etag="W/&quot;CkYARHY6fyp7ImA9WxBVGEs.&quot;"><id>tag:blogger.com,1999:blog-1392748891339897691.post-1201542276734458028</id><published>2010-02-16T20:11:00.002+01:00</published><updated>2010-02-22T18:35:45.817+01:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2010-02-22T18:35:45.817+01:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="scala" /><title>Producing candidates for a spelling corrector</title><content type="html">The next step to implement the &lt;a href="http://theyougen.blogspot.com/2010/02/faster-spelling-corrector.html" id="s3q9" title="Bloom Filter-based approach"&gt;Bloom Filter-based spelling corrector&lt;/a&gt; is to create and  check hash values for candidates against the filter without actually  creating candidate instances. If the filter step succeeds the candidate  String instances are created in the next step.&lt;br /&gt;
&lt;br /&gt;
All &lt;a href="http://bitbucket.org/blob79/scalaspellingcorrector/src/662ebe8e8896/src/main/scala/org/bitbucket/speller/CandidateIterator.scala" id="f2cc" title="candidate 
iterators"&gt;candidate iterators&lt;/a&gt; (for insertion, deletion,  transposition and  replacement) are based on the &lt;span class="misspell" suggestions="Candidate Iterator,Candidate-Iterator"&gt;CandidateIterator&lt;/span&gt;  trait. The &lt;span class="misspell" suggestions="Candidate 
Iterator,Candidate-Iterator"&gt;CandidateIterator&lt;/span&gt; can return a hash  value for a candidate with the &lt;span class="misspell" suggestions="hash 
Value,hash-Value,Nashville,Heshvan,harshly"&gt;hashValue&lt;/span&gt; method and a  candidate string with the candidate method. The next method scrolls to  the next value.&lt;br /&gt;
&lt;pre class="brush: scala"&gt;trait CandidateIterator{
  def next : Boolean
  def candidate : String
  def hashValue : Int
}&lt;/pre&gt;An  instructive implementation is the replacement &lt;span class="misspell" suggestions="Candidate Iterator,Candidate-Iterator"&gt;CandidateIterator&lt;/span&gt;.  It replaces each character of the base string with a character from the  alphabet to create a candidate. &lt;br /&gt;
&lt;br /&gt;
The candidate method  does exactly this for candidate strings. It takes the base string and  replaces the character at the &lt;span class="misspell" suggestions="ID,id,ix,Ida,IDs"&gt;idx&lt;/span&gt; position with the character  from the alphabet at the &lt;span class="misspell" suggestions=""&gt;alphabetIdx&lt;/span&gt;  position (the alphabet is defined as val alphabet = 'a' to 'z' toArray).&lt;br /&gt;
&lt;pre class="brush: scala"&gt;def candidate = (base take idx) + Dictionary.alphabet(alphabetIdx) + (base drop idx + 1)
&lt;/pre&gt;&lt;a href="" name="hashvalue_method"&gt;&lt;/a&gt;The  hashValue method is  not as easily readable as the candidate method. The implementation is  based on the insight that the hash function for &lt;a href="http://java.sun.com/javase/6/docs/api/java/lang/String.html#hashCode%28%29" id="yprh" title="Strings is the
 defined"&gt;Strings is the defined&lt;/a&gt; in a way that we can reused a base  hash value for the a &lt;span class="misspell" suggestions="sub 
string,sub-string,subsiding,sistering,lobstering"&gt;substring&lt;/span&gt; base.&lt;span class="misspell" suggestions="sub 
string,sub-string,subsiding,sistering,lobstering"&gt;substring&lt;/span&gt;(0, &lt;span class="misspell" suggestions="ID,id,ix,Ida,IDs"&gt;idx&lt;/span&gt;). With that  in place we can save about half of the hash computations. This has huge  performance benefits over an implementation that works directly with  Strings where hash values have to be computed from the scratch for every  candidate. Analogous to the String copy costs the hashing costs  increase nearly linearly with the String length. The performance  benefits will be especially relevant for long Strings that will then  produce a lot of candidate strings.&lt;br /&gt;
&lt;br /&gt;
The  next method takes care of the idx, alphabetIdx to point to valid  positions. The &lt;span class="misspell" suggestions="base Hash,base-Hash,beseech,Bosch,Busch"&gt;baseHash&lt;/span&gt;  value is calculated iteratively in the next method from the &lt;span class="misspell" suggestions="base Hash,base-Hash,beseech,Bosch,Busch"&gt;baseHash&lt;/span&gt;  of the last round if the &lt;span class="misspell" suggestions="ID,id,ix,Ida,IDs"&gt;idx&lt;/span&gt; value changes.&lt;br /&gt;
&lt;br /&gt;
The &lt;span class="misspell" suggestions="hash 
Value,hash-Value,Nashville,Heshvan,harshly"&gt;hashValue&lt;/span&gt;  implementation is now based on the &lt;span class="misspell" suggestions="base Hash,base-Hash,beseech,Bosch,Busch"&gt;baseHash&lt;/span&gt;  value. The &lt;a href="http://theyougen.blogspot.com/2010/02/implementing-hash-functions-in.html"&gt;hash is computed&lt;/a&gt; for the character taken from the alphabet  and the remaining string base.&lt;span class="misspell" suggestions="sub 
string,sub-string,subsiding,sistering,lobstering"&gt;substring&lt;/span&gt;(&lt;span class="misspell" suggestions="ID,id,ix,Ida,IDs"&gt;idx&lt;/span&gt; + 1,  base.length).&lt;br /&gt;
&lt;pre class="brush: scala"&gt;class Replacement(base : String) extends BaseCandidateIterator(base){
  private var idx = 0
  private var alphabetIdx = -1
  private var baseHash = 0
  def next = {
    alphabetIdx += 1
    if(alphabetIdx == Dictionary.alphabet.length){
      idx += 1
      alphabetIdx = 0
      baseHash = hashAt(baseHash, idx - 1)
    }
    idx &amp;lt; base.length
  }
  def candidate = (base take idx) + Dictionary.alphabet(alphabetIdx) + (base drop idx + 1)
  def hashValue = {
    val pivot = hashFor(baseHash, Dictionary.alphabet(alphabetIdx))
    val right = hashRange(pivot, idx + 1, base.length)
    right
  }
}&lt;/pre&gt;The &lt;span class="misspell" suggestions="Candidate Iterator,Candidate-Iterator"&gt;CandidateIterator&lt;/span&gt;  let's us iterate over candidate hash values without creating objects.  Externally to the &lt;span class="misspell" suggestions="Candidate 
Iterators,Candidate-Iterators"&gt;CandidateIterators&lt;/span&gt; the filtering  takes places calling the candidate to produce an instance method only  when the hash value was in the filter. This work is done by the &lt;a href="http://bitbucket.org/blob79/scalaspellingcorrector/src/6c78cdca143e/src/main/scala/org/bitbucket/speller/CandidateIterator.scala#cl-106" id="fts6" title="CandidateFilter"&gt;&lt;span class="misspell" suggestions="Candidate 
Filter,Candidate-Filter,Quantitatively"&gt;CandidateFilter&lt;/span&gt;&lt;/a&gt;.&lt;br /&gt;
&lt;pre class="brush: scala"&gt;class CandidateFilter(base : CandidateIterator, filter : Int =&amp;gt; Boolean) extends Iterator[String]{
  private var more : Boolean = false
  def hasNext : Boolean = {
    def hasMore : Boolean = {
      while(base.next) if(filter(base.hashValue)) return true
      false
    }
    more ||= hasMore
    more
  }

  def next : String = {
    more = false
    base.candidate
  }
}
&lt;/pre&gt;The class takes a &lt;span class="misspell" suggestions="Candidate Iterator,Candidate-Iterator"&gt;CandidateIterator&lt;/span&gt;  and a filter. It implements the Iterator trait. Values returned by the &lt;span class="misspell" suggestions="Candidate 
Filter,Candidate-Filter,Quantitatively"&gt;CandidateFilter&lt;/span&gt; are  candidate strings that passed the Bloom Filter test.&lt;br /&gt;
&lt;br /&gt;
Users of the  &lt;span class="misspell" suggestions="Candidate 
Filter,Candidate-Filter,Quantitatively"&gt;CandidateFilter&lt;/span&gt; and &lt;span class="misspell" suggestions="Candidate Iterator,Candidate-Iterator"&gt;CandidateIterator&lt;/span&gt;  can now use the Iterator convenience methods and for expressions to  work with candidate strings as usual.&lt;br /&gt;
&lt;br /&gt;
With the &lt;a href="http://bitbucket.org/blob79/scalaspellingcorrector/src/662ebe8e8896/src/main/scala/org/bitbucket/speller/CandidateIterator.scala"&gt;&lt;span class="misspell" suggestions="Candidate Iterator,Candidate-Iterator"&gt;CandidateIterator&lt;/span&gt;  implementation and the &lt;span class="misspell" suggestions="Candidate 
Filter,Candidate-Filter,Quantitatively"&gt;CandidateFilter&lt;/span&gt;&lt;/a&gt; in place  the spelling corrector &lt;a href="http://theyougen.blogspot.com/2010/02/tuning-bloom-filter-based-spelling.html"&gt;can be implementation in the final step&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1392748891339897691-1201542276734458028?l=theyougen.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;div class="feedflare"&gt;
&lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=bzMybiJNAmA:S6JkBoPViA4:yIl2AUoC8zA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?d=yIl2AUoC8zA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=bzMybiJNAmA:S6JkBoPViA4:-BTjWOF_DHI"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?i=bzMybiJNAmA:S6JkBoPViA4:-BTjWOF_DHI" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=bzMybiJNAmA:S6JkBoPViA4:4cEx4HpKnUU"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?i=bzMybiJNAmA:S6JkBoPViA4:4cEx4HpKnUU" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=bzMybiJNAmA:S6JkBoPViA4:YwkR-u9nhCs"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?d=YwkR-u9nhCs" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=bzMybiJNAmA:S6JkBoPViA4:F7zBnMyn0Lo"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?i=bzMybiJNAmA:S6JkBoPViA4:F7zBnMyn0Lo" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=bzMybiJNAmA:S6JkBoPViA4:7Q72WNTAKBA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?d=7Q72WNTAKBA" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/TheYoungGeneration/~4/bzMybiJNAmA" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://theyougen.blogspot.com/feeds/1201542276734458028/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://theyougen.blogspot.com/2010/02/producing-candidates-for-spelling.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/1392748891339897691/posts/default/1201542276734458028?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/1392748891339897691/posts/default/1201542276734458028?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/TheYoungGeneration/~3/bzMybiJNAmA/producing-candidates-for-spelling.html" title="Producing candidates for a spelling corrector" /><author><name>Thomas Jung</name><uri>http://www.blogger.com/profile/13153488692248832865</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>0</thr:total><feedburner:origLink>http://theyougen.blogspot.com/2010/02/producing-candidates-for-spelling.html</feedburner:origLink></entry><entry gd:etag="W/&quot;CkQHQ3cyeCp7ImA9WxBVFEU.&quot;"><id>tag:blogger.com,1999:blog-1392748891339897691.post-795249182171751640</id><published>2010-02-08T18:05:00.001+01:00</published><updated>2010-02-18T09:05:32.990+01:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2010-02-18T09:05:32.990+01:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="scala" /><title>Implementing hash functions in functional style</title><content type="html">The first ability needed to enable filtering with the &lt;a href="http://theyougen.blogspot.com/2010/02/faster-spelling-corrector.html" id="g2bx" title="Bloom Filter-based spell checker"&gt;Bloom Filter-based spell checker&lt;/a&gt; is to produce hash values for candidates.&lt;br /&gt;
&lt;br /&gt;
The basic functionality is to hash a character given a base hash value from a preceding invocation. The hashFor function implements this.&lt;br /&gt;
&lt;br /&gt;
&lt;pre class="brush: scala"&gt;def hashFor(h: Int, c: Char) : Int = h * 31 + c
&lt;/pre&gt;&lt;br /&gt;
Two further functions help with building candidate hashes: hashAt and hashRange.&lt;br /&gt;
&lt;br /&gt;
HashAt builds the hash for the a character of base string at a index position given a base hash value. &lt;br /&gt;
&lt;br /&gt;
&lt;pre class="brush: scala"&gt;def hashAt(base : String)(h: Int, i: Int) : Int= hashFor(h, base.charAt(i))&lt;/pre&gt;&lt;br /&gt;
The hashAt method is defined in a way to be able to use partial application of parameters. (It's not using clean currying. The Scala language allows to define functions with mixed arity: unary functions and functions with &lt;a href="http://en.wikipedia.org/wiki/Arity" id="te1t" title="arity"&gt;arity&lt;/a&gt; greater than 1.) &lt;br /&gt;
&lt;br /&gt;
The motivation to define the function in this way is to be able to use it with fewer parameters than the method definition &lt;code&gt;def hashAt(base : String, h: Int, i: Int)&lt;/code&gt; when it is used repeatedly with the same base string parameter.&lt;br /&gt;
&lt;br /&gt;
To get the function signature you can access the function object in the Scala console:&lt;br /&gt;
&lt;br /&gt;
&lt;pre class="brush: plain"&gt;scala&amp;gt; hashAt _
(String) =&amp;gt; (Int, Int) =&amp;gt; Int&lt;/pre&gt;&lt;br /&gt;
Alternatively the signature can be written as (String) =&amp;gt; ((Int, Int) =&amp;gt; Int) for better readability. The function takes one parameter and will return a function of type&amp;nbsp; (Int, Int) =&amp;gt; Int. &lt;br /&gt;
&lt;br /&gt;
To make any sense of this let's apply the parameters one by one:&lt;br /&gt;
&lt;br /&gt;
&lt;pre class="brush: scala"&gt;scala&amp;gt; val f = hashAt _
f: (String) =&amp;gt; (Int, Int) =&amp;gt; Int = &amp;lt;function&amp;gt;

scala&amp;gt; val g = f("s")
g: (Int, Int) =&amp;gt; Int =  &amp;lt;function&amp;gt;

scala&amp;gt; val h = g(0,0)
h: Int = 115
&lt;/pre&gt;&lt;br /&gt;
The parameters are applied left to right. The intermediary function definitions has each parameters bound until there are no parameter left to bind and the function is evaluated. The sequence is equivalent to calling hashAt("s")(0,0).&lt;br /&gt;
&lt;br /&gt;
HashRange hashes characters of a string from a start index to the end index (exclusively) given a base hash value.&lt;br /&gt;
&lt;br /&gt;
&lt;pre class="brush: scala"&gt;def hashRange(base : String)(h: Int, i: Int, end : Int) : Int =
&amp;nbsp;&amp;nbsp;&amp;nbsp; if(i &amp;lt; end) hashRange(base)(hashAt(base)(h, i), i + 1, end) else h&lt;/pre&gt;&lt;br /&gt;
Using the hashRange method you can produce the same hashValues as the java.lang.String implementation.&lt;br /&gt;
&lt;pre class="brush: java"&gt;public int hashCode() {
&amp;nbsp;&amp;nbsp; &amp;nbsp;int h = hash;
&amp;nbsp;&amp;nbsp; &amp;nbsp;if (h == 0) {
&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; int off = offset;
&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; char val[] = value;
&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; int len = count;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; for (int i = 0; i &amp;lt; len; i++) {
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; h = 31*h + val[off++];
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; hash = h;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;}
&amp;nbsp;&amp;nbsp;&amp;nbsp; return h;
}
&lt;/pre&gt;The equivalent definition based on the hashRange function is:&lt;br /&gt;
&lt;br /&gt;
&lt;pre class="brush: scala"&gt;def hash(s : String) = StringHasher.hashRange(s)(0, 0, s.length)&lt;/pre&gt;&lt;br /&gt;
The function is defined as a tail recursive method. A tail recursive function is a function that does only return values from recursive calls and won't manipulate these return values. With a tail recursive definition the Scala compiler is able to translate the recursive call into a loop. Defining a &lt;a href="http://theyougen.blogspot.com/2010/01/why-overrideable-tail-recursive-methods.html"&gt;tail recursive function is a bit tricky&lt;/a&gt; and you may inadvertently change a tail recursive version into a normal recursive one and even an optimized tail recursive function into a not optimized one. To omit the pitfalls of tail recursive functions Scala 2.8 introduces a @tailrec annotation. This produces a compiler failure if a function cannot be optimized. &lt;br /&gt;
&lt;br /&gt;
With the hash support methods hashAt, hashFor, hashRange in place we can implement hash values for candidates. I'll implement the insertion candidate production to demonstrate the application of the methods. The insert adds a characters from an alphabet at a given index.&lt;br /&gt;
&lt;br /&gt;
It is always good to write some tests (besides the fact that it tests the functionality) to be able to have a look at the API from the user's perspective. Testing the individual methods is not enough to get a feel of the API. Users will always use a combination of the three methods. One concern is that the hash code is calculated incrementally and the user has to define the hash functions in a incremental manner as well. Using the output of preceding execution as an input parameter for succeeding method calls which may be error prone.&lt;br /&gt;
&lt;br /&gt;
The law for the insert candidate hash can be defined with the equivalent hash function of java.lang.String implementation. It can be expressed in specification-based test:&lt;br /&gt;
&lt;br /&gt;
&lt;pre class="brush: scala"&gt;@Test def userSample {
  for{s &amp;lt;- strings
    index &amp;lt;- integers(0, s.length - 1)
    c &amp;lt;- basicLatinCharacters}{
    val insertCandidate = (s take index) + c + (s drop index)
    assertEquals(insertCandidate.hashCode, insertCandidateHash(s, index,c))
  }
}
&lt;/pre&gt;&amp;nbsp;Three methods of the specification-based test framework Quickcheck are used in the test:&lt;br /&gt;
- &lt;a href="https://quickcheck.dev.java.net/javadoc/net/java/quickcheck/generator/PrimitiveGenerators.html#strings%28%29" id="rgd7" title="strings"&gt;strings&lt;/a&gt; returns an arbitrary string value&lt;br /&gt;
- &lt;a href="https://quickcheck.dev.java.net/javadoc/net/java/quickcheck/generator/PrimitiveGenerators.html#integers%28int,%20int%29" id="g0jr" title="integer(min,max)"&gt;integer(min,max)&lt;/a&gt; returns&amp;nbsp; an integer value from min to max (inclusive)&lt;br /&gt;
- &lt;a href="https://quickcheck.dev.java.net/javadoc/net/java/quickcheck/generator/PrimitiveGenerators.html#basicLatinCharacters%28%29" id="mg1f" title="basicLatinCharacters"&gt;basicLatinCharacters&lt;/a&gt; returns a sample character from the from the Unicode Latin-1 &lt;a href="http://en.wikipedia.org/wiki/Code_page" id="xe6e" title="code page"&gt;code page&lt;/a&gt;.&lt;br /&gt;
All three methods return values through the &lt;a href="https://quickcheck.dev.java.net/javadoc/net/java/quickcheck/Generator.html" id="vt6w" title="Generator"&gt;Generator&lt;/a&gt; interface that is converted with a implicit type conversion to an Scala &lt;a href="http://www.scala-lang.org/docu/files/api/scala/Iterator.html" id="fria" title="Iterator"&gt;Iterator&lt;/a&gt; to be able to use it in a for expression.&lt;br /&gt;
&lt;br /&gt;
With the input string, insertion index and character to insert given the actual law is quite simple. We construct a string that is an equivalent insertion candidate and get its hash value. The hash value returned by the insertCandidateHash method under test has to be produce the same value. (This is a testing pattern in the context of specification-based test: testing based on a analogous functions. Stay tuned for a in depth description of the testing patterns.) This example shows that unpleasant tests with classical scenario-based TDD can be expressed in a compact way with specification-based testing.  The last piece of the puzzle is the implementation of the insertCandidateHash function.  &lt;br /&gt;
&lt;pre class="brush: scala"&gt;def insertCandidateHash(s : String, i : Int, c : Char) = {
  val hashRange = StringHasher.hashRange(s) _
  val left = hashRange(0, 0, i)
  val pivot = StringHasher.hashFor(left, c)
  val right = hashRange(pivot, i, s.length)
  right
}
&lt;/pre&gt;The sample code demonstrates the use partial application of function parameters in the first line. The string parameter is applied to the function returning a function with the signature (Int, Int, Int) =&amp;gt; Int. This function object can than be used for all hash values of the same base string. The hash value is calculated for the left, pivot and right characters of the string. The characters for the left and right hash values are unchanged from the base string. The hash value for the pivot element is one from the alphabet as one would expect for an insertion.&lt;br /&gt;
&lt;br /&gt;
In the &lt;a href="http://theyougen.blogspot.com/2010/02/producing-candidates-for-spelling.html"&gt;next blog I'll implement the candidate iterators&lt;/a&gt; for insertion, replacement, transposition and delete based on the StringHasher.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1392748891339897691-795249182171751640?l=theyougen.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;div class="feedflare"&gt;
&lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=kknknlaFHw8:ztEXoFh8ryY:yIl2AUoC8zA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?d=yIl2AUoC8zA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=kknknlaFHw8:ztEXoFh8ryY:-BTjWOF_DHI"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?i=kknknlaFHw8:ztEXoFh8ryY:-BTjWOF_DHI" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=kknknlaFHw8:ztEXoFh8ryY:4cEx4HpKnUU"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?i=kknknlaFHw8:ztEXoFh8ryY:4cEx4HpKnUU" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=kknknlaFHw8:ztEXoFh8ryY:YwkR-u9nhCs"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?d=YwkR-u9nhCs" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=kknknlaFHw8:ztEXoFh8ryY:F7zBnMyn0Lo"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?i=kknknlaFHw8:ztEXoFh8ryY:F7zBnMyn0Lo" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=kknknlaFHw8:ztEXoFh8ryY:7Q72WNTAKBA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?d=7Q72WNTAKBA" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/TheYoungGeneration/~4/kknknlaFHw8" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://theyougen.blogspot.com/feeds/795249182171751640/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://theyougen.blogspot.com/2010/02/implementing-hash-functions-in.html#comment-form" title="2 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/1392748891339897691/posts/default/795249182171751640?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/1392748891339897691/posts/default/795249182171751640?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/TheYoungGeneration/~3/kknknlaFHw8/implementing-hash-functions-in.html" title="Implementing hash functions in functional style" /><author><name>Thomas Jung</name><uri>http://www.blogger.com/profile/13153488692248832865</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>2</thr:total><feedburner:origLink>http://theyougen.blogspot.com/2010/02/implementing-hash-functions-in.html</feedburner:origLink></entry><entry gd:etag="W/&quot;AkYNQXk6cCp7ImA9WxBWFk0.&quot;"><id>tag:blogger.com,1999:blog-1392748891339897691.post-8034932662638456028</id><published>2010-02-07T13:56:00.001+01:00</published><updated>2010-02-08T06:49:50.718+01:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2010-02-08T06:49:50.718+01:00</app:edited><title>The moment electronic paper died</title><content type="html">Perhaps you did not &lt;a href="http://www.apple.com/ipad/" id="frp9" title="noticed it"&gt;noticed  it&lt;/a&gt; but last week the &lt;a href="http://en.wikipedia.org/wiki/Electronic_paper" id="tcxq" title="electronic paper"&gt;electronic  paper&lt;/a&gt; followed the &lt;a href="http://en.wikipedia.org/wiki/Maglev_%28transport%29" id="gqle" title="Maglev"&gt;&lt;span class="misspell" suggestions="Malva,Calve,Calv,Markov,Melva"&gt;Maglev&lt;/span&gt;&lt;/a&gt;  and became part of the museum of failed technical innovations. As the &lt;span class="misspell" suggestions="Malva,Calve,Calv,Markov,Melva"&gt;Maglev&lt;/span&gt;,  it started with a brilliant theoretical idea that nobody cared about  once it was there.&lt;br /&gt;
&lt;br /&gt;
The &lt;span class="misspell" suggestions="Malva,Calve,Calv,Markov,Melva"&gt;Maglev&lt;/span&gt; promised  faster train travel with a better efficiency. The problem of the &lt;span class="misspell" suggestions="Malva,Calve,Calv,Markov,Melva"&gt;Maglev&lt;/span&gt;  is that is does not fit well into the current infrastructure. New  separate tracks have to be build. In the meantime wheeled electric  trains are &lt;a href="http://en.wikipedia.org/wiki/Land_speed_record_for_railed_vehicles" id="ovkh"&gt;capable to travel at high speeds&lt;/a&gt; using and interfacing  better with the current infrastructure.&lt;br /&gt;
&lt;br /&gt;
Electronic paper has the  advantage that it does not consume electricity while displaying a single  page and works without &lt;span class="misspell" suggestions="back 
light,back-light,backlog,blight,backbit"&gt;backlight&lt;/span&gt;. In theory an  electronic paper-based device can be used much longer with recharging.  Readability for text is supposed to be better than a LCD as well.  But it cannot display charts, tables and photos well. Everything users  are used to know on the computer and smart phones. Additionally, they  switch slowly from one page to another and the display even flickers  annoyingly while changing the page content.&lt;br /&gt;
&lt;br /&gt;
Until the advent of  the &lt;span class="misspell" suggestions="IPA,pad,ipsa,Ida,paid"&gt;iPad&lt;/span&gt;  I thought of buying such a device and fantasized about caring around  all my IT-books, classical literature from Project Gutenberg and blogs  on this device around while commuting. It was just a matter of the right  price. But now I do not see a point anymore. The &lt;span class="misspell" suggestions="IPA,pad,ipsa,Ida,paid"&gt;iPad&lt;/span&gt; has proven that it is  possible to build a general purpose computing device that includes book  reading software. The electronic paper may be better for novels but I  don't care. The ability to use all kinds of software and browse the web  on such a device is definitely buying me more.&lt;br /&gt;
&lt;br /&gt;
That said the &lt;span class="misspell" suggestions="IPA,pad,ipsa,Ida,paid"&gt;iPad&lt;/span&gt; is  just the prove that it can be done. It is not platform I would like to  develop on.&lt;br /&gt;
&lt;br /&gt;
What's the point off having a general purpose  computing device that is controlled by one company? The platform for  such a device has to be reasonably open. And I think that will not only  be in the interest of application developers but of the provider of such  a platform as well. The development and deployment model imposed on  developers will stifle innovation. As innovative as the &lt;span class="misspell" suggestions="IPA,pad,ipsa,Ida,paid"&gt;iPad&lt;/span&gt; may be a  single company won't outperform a whole industry in the long run.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1392748891339897691-8034932662638456028?l=theyougen.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;div class="feedflare"&gt;
&lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=WkNlJtdiaYg:n48UdpBNIIg:yIl2AUoC8zA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?d=yIl2AUoC8zA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=WkNlJtdiaYg:n48UdpBNIIg:-BTjWOF_DHI"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?i=WkNlJtdiaYg:n48UdpBNIIg:-BTjWOF_DHI" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=WkNlJtdiaYg:n48UdpBNIIg:4cEx4HpKnUU"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?i=WkNlJtdiaYg:n48UdpBNIIg:4cEx4HpKnUU" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=WkNlJtdiaYg:n48UdpBNIIg:YwkR-u9nhCs"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?d=YwkR-u9nhCs" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=WkNlJtdiaYg:n48UdpBNIIg:F7zBnMyn0Lo"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?i=WkNlJtdiaYg:n48UdpBNIIg:F7zBnMyn0Lo" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=WkNlJtdiaYg:n48UdpBNIIg:7Q72WNTAKBA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?d=7Q72WNTAKBA" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/TheYoungGeneration/~4/WkNlJtdiaYg" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://theyougen.blogspot.com/feeds/8034932662638456028/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://theyougen.blogspot.com/2010/02/moment-electronic-paper-died.html#comment-form" title="2 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/1392748891339897691/posts/default/8034932662638456028?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/1392748891339897691/posts/default/8034932662638456028?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/TheYoungGeneration/~3/WkNlJtdiaYg/moment-electronic-paper-died.html" title="The moment electronic paper died" /><author><name>Thomas Jung</name><uri>http://www.blogger.com/profile/13153488692248832865</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>2</thr:total><feedburner:origLink>http://theyougen.blogspot.com/2010/02/moment-electronic-paper-died.html</feedburner:origLink></entry><entry gd:etag="W/&quot;DEQGQn05eyp7ImA9Wx5QGEk.&quot;"><id>tag:blogger.com,1999:blog-1392748891339897691.post-2455686083624856574</id><published>2010-02-02T20:22:00.014+01:00</published><updated>2010-09-07T09:58:43.323+02:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2010-09-07T09:58:43.323+02:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="scala" /><title>A faster spelling corrector</title><content type="html">&lt;div id="doc-contents"&gt;The &lt;a href="http://theyougen.blogspot.com/2009/12/peter-norvigs-spelling-corrector-in.html" id="w46g" title="compact spelling corrector implemention"&gt;compact spelling corrector implemention&lt;/a&gt; showed some of the features of Scala: for expression, higher order functions (fold and map), tuple support and the infix method syntax. The current implementation has terrible performance, so we try to develop a faster corrector implementation.&lt;br /&gt;
&lt;br /&gt;
Let's have a look at it at run-time with the VisualVM memory profiler.&lt;br /&gt;
&lt;br /&gt;
&lt;div id="yxat" style="text-align: left;"&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/_0LGgFlXE6gQ/S2h42lVmPNI/AAAAAAAAABw/xTpJEaZ_Qhs/s1600-h/visualvm.snapshot.cut.bmp" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="250" src="http://3.bp.blogspot.com/_0LGgFlXE6gQ/S2h42lVmPNI/AAAAAAAAABw/xTpJEaZ_Qhs/s400/visualvm.snapshot.cut.bmp" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;
Not very surprising, it creates a lot of objects. To correct 20 words of edit distance 0 to 3 randomly picked from the &lt;a href="http://norvig.com/big.txt" id="t:7s" title="big.txt"&gt;big.txt&lt;/a&gt; file it allocated 15 million char[] arrays and 9 million String instances. Allocation may be cheap but allocating millions of objects is not.&lt;br /&gt;
&lt;br /&gt;
One way to improve the performance of the spelling corrector is to allocate fewer objects. The spelling corrector is based on a brute force approach: calculate all possible candidates for a given edit distance. The most expensive operation is to create candidate string instances. The java.lang.String implementation is immutable. Every single change to a String will force the creation of a new instance.&lt;br /&gt;
&lt;br /&gt;
But we can do better than that. We can capitalize on the knowledge that the Strings won't be random. The candidates can be easily deduced from the base string. A candidate string has some additional private state depending on the operation (insert, transpose, replace, delete) performed. Phrasing it in this way directly leads to the &lt;a href="http://en.wikipedia.org/wiki/Flyweight_pattern" id="kc2h" title="flyweight pattern"&gt;flyweight pattern&lt;/a&gt;. Flyweights can be implemented for every operation implementing the &lt;a href="http://java.sun.com/javase/6/docs/api/java/lang/CharSequence.html" id="i:s." title="CharSequence interface"&gt;CharSequence interface&lt;/a&gt; that is already part of the JDK. This will gain quite good performance improvements. No additional char[] arrays will be allocated.&lt;br /&gt;
&lt;br /&gt;
I'll skip the flyweight implementation and leave it as an exercise. There is a better approach.&lt;br /&gt;
&lt;br /&gt;
The flyweight implementation will still create one object per candidate. Even this can be omitted. If we would pick only candidates that are promising the creation of objects would be further reduced and the performance increased. And happy coincidence a data structure can help here: the &lt;a href="http://theyougen.blogspot.com/2009/12/decent-bloom-filter-in-scala.html" id="vgxw" title="Bloom Filter"&gt;Bloom Filter&lt;/a&gt;. &lt;br /&gt;
&lt;br /&gt;
Additionally to dictionary of word frequencies that is already used in the base implementation of the spelling corrector a Bloom Filter is initialized with all words.&lt;br /&gt;
&lt;br /&gt;
The candidate generation will than be separated in multiple steps: candidate hashing, hash filtering with the Bloom Filter, candidate instantiation and dictionary filtering.&lt;br /&gt;
&lt;br /&gt;
Using the Bloom Filter most of the candidates will not survive the hash generation and hash filtering steps. The hashed candidates can be calculated without instantiating any objects, just using primitive int values that live on the stack. This has super performance characteristics: no memory allocation and garbage collecting needed.&lt;br /&gt;
&lt;br /&gt;
For every type of candidate: deletion, replacement, insertion and transposition an Iterator can be implemented that produces the candidate hash values. The iterator implementations are based on a base string and additional state (positions in the base string and positions in the alphabet) necessary for the candidate hash value production.&lt;br /&gt;
&lt;br /&gt;
All candidate hash values will be checked if they are contained in the Bloom Filter. If so the candidate string will be created and checked against the dictionary. The second check is necessary as the bloom filter produces false positives.&lt;br /&gt;
&lt;br /&gt;
The actual false positive rate can be freely chosen. It will have no impact on the functionality. It could be 100%. The false positive rate can be changed to tune the performance depending on memory consumption requirements, cost of hash functions used in the Bloom Filter implementation and object instantiation cost for candidate strings.&lt;br /&gt;
&lt;br /&gt;
To check on the basic assumption we can &lt;a href="http://bitbucket.org/blob79/scalaspellingcorrector/src/a082732c5b4c/src/test/scala/org/bitbucket/speller/BloomFilterVsStringAllocation.scala" id="tled" title="measure the performance characteristics of the basic string operations needed and the Bloom Filter"&gt;measure the performance characteristics of the basic string operations needed and the Bloom Filter&lt;/a&gt;. The limited performance test compares the run-time of a character array copy needed to create the base char[] for one insertion candidate string (creating a String candidate won't be faster than this)&lt;br /&gt;
&lt;br /&gt;
&lt;pre class="brush: scala"&gt;arraycopy(stringValue, 0, s, 0, 1) //insertion
s(1) = 'c'
arraycopy(stringValue, 1, s, 2, stringValue.length - 1)
&lt;/pre&gt;&lt;br /&gt;
and the Bloom Filter contains method performance.&lt;br /&gt;
&lt;br /&gt;
All presented durations lack an unit of measurement as all measurements are based on the same scale. This is sufficient to compare the values.&lt;br /&gt;
&lt;br /&gt;
The time needed to create the insertion candidate string grows linearly with the base string length.&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/_0LGgFlXE6gQ/S2h5T9jzw7I/AAAAAAAAAB4/q1A1Vkn8_bc/s1600-h/char+copy+time.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"&gt;&lt;img border="0" height="320" src="http://1.bp.blogspot.com/_0LGgFlXE6gQ/S2h5T9jzw7I/AAAAAAAAAB4/q1A1Vkn8_bc/s320/char+copy+time.jpg" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;
&lt;table border="0" cellspacing="0" class="zeroBorder" cols="2" frame="VOID" rules="NONE"&gt;&lt;tbody&gt;
&lt;tr&gt;    &lt;td align="LEFT" height="17" style="border: 1px solid rgb(0, 0, 0);" width="86"&gt;&lt;b&gt;char[] length &lt;/b&gt;&lt;/td&gt;    &lt;td align="LEFT" style="border: 1px solid rgb(0, 0, 0);" width="137"&gt;&lt;b&gt;character  copy time &lt;/b&gt;&lt;/td&gt;   &lt;/tr&gt;
&lt;tr&gt;    &lt;td align="RIGHT" height="17" style="border: 1px solid rgb(0, 0, 0);"&gt;2&lt;/td&gt;    &lt;td align="RIGHT" style="border: 1px solid rgb(0, 0, 0);"&gt;11,69&lt;/td&gt;   &lt;/tr&gt;
&lt;tr&gt;    &lt;td align="RIGHT" height="17" style="border: 1px solid rgb(0, 0, 0);"&gt;4&lt;/td&gt;    &lt;td align="RIGHT" style="border: 1px solid rgb(0, 0, 0);"&gt;11,70&lt;/td&gt;   &lt;/tr&gt;
&lt;tr&gt;    &lt;td align="RIGHT" height="17" style="border: 1px solid rgb(0, 0, 0);"&gt;8&lt;/td&gt;    &lt;td align="RIGHT" style="border: 1px solid rgb(0, 0, 0);"&gt;12,57&lt;/td&gt;   &lt;/tr&gt;
&lt;tr&gt;    &lt;td align="RIGHT" height="17" style="border: 1px solid rgb(0, 0, 0);"&gt;16&lt;/td&gt;    &lt;td align="RIGHT" style="border: 1px solid rgb(0, 0, 0);"&gt;15,95&lt;/td&gt;   &lt;/tr&gt;
&lt;tr&gt;    &lt;td align="RIGHT" height="17" style="border: 1px solid rgb(0, 0, 0);"&gt;32&lt;/td&gt;    &lt;td align="RIGHT" style="border: 1px solid rgb(0, 0, 0);"&gt;21,57&lt;/td&gt;   &lt;/tr&gt;
&lt;tr&gt;    &lt;td align="RIGHT" height="17" style="border: 1px solid rgb(0, 0, 0);"&gt;64&lt;/td&gt;    &lt;td align="RIGHT" style="border: 1px solid rgb(0, 0, 0);"&gt;30,21&lt;/td&gt;   &lt;/tr&gt;
&lt;tr&gt;    &lt;td align="RIGHT" height="17" style="border: 1px solid rgb(0, 0, 0);"&gt;128&lt;/td&gt;    &lt;td align="RIGHT" style="border: 1px solid rgb(0, 0, 0);"&gt;51,22&lt;/td&gt;   &lt;/tr&gt;
&lt;/tbody&gt; &lt;/table&gt;&lt;br /&gt;
&lt;div id="ye7:" style="text-align: left;"&gt;&lt;/div&gt;&lt;br /&gt;
The Bloom Filter time grows linearly with the number of hash functions (k) used. The &lt;a href="http://en.wikipedia.org/wiki/Bloom_filter#Probability_of_false_positives" id="cizz" title="number of hash functions"&gt;optimal number of hash functions&lt;/a&gt; used is determined according to the ration between expected elements (n) and bloom filter size (m). &lt;br /&gt;
&lt;br /&gt;
&lt;img alt="k = \frac{m}{n}  ln(2)" class="ee_img tr_noresize" eeimg="1" src="http://2.bp.blogspot.com/_0LGgFlXE6gQ/TIXs5Dv29yI/AAAAAAAAADA/SFzlaeo6aPc/s320/chart.png" style="vertical-align: middle;" /&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The measured time is the worst case: an entry (or a false positive) is in the Bloom Filter. In this scenario all k-hash functions are executed. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/_0LGgFlXE6gQ/S2h5fcgBtuI/AAAAAAAAACA/Zlud22Q-OwE/s1600-h/bloom+time.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"&gt;&lt;img border="0" height="320" src="http://1.bp.blogspot.com/_0LGgFlXE6gQ/S2h5fcgBtuI/AAAAAAAAACA/Zlud22Q-OwE/s320/bloom+time.jpg" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;table border="0" cellspacing="0" class="zeroBorder" cols="4" frame="VOID" rules="NONE"&gt;&lt;tbody&gt;
&lt;tr&gt;    &lt;td align="LEFT" height="17" style="border: 1px solid rgb(0, 0, 0);" width="120"&gt;&lt;b&gt;n/m ratio&lt;/b&gt;&lt;/td&gt;    &lt;td align="LEFT" style="border: 1px solid rgb(0, 0, 0);" width="86"&gt;&lt;b&gt;k&lt;/b&gt;&lt;/td&gt;     &lt;td align="LEFT" style="border: 1px solid rgb(0, 0, 0);" width="120"&gt;&lt;b&gt;false  positive rate&lt;/b&gt;&lt;/td&gt;    &lt;td align="LEFT" style="border: 1px solid rgb(0, 0, 0);" width="102"&gt;&lt;b&gt;time filter&lt;/b&gt;&lt;/td&gt;   &lt;/tr&gt;
&lt;tr&gt;    &lt;td align="RIGHT" height="17" style="border: 1px solid rgb(0, 0, 0);"&gt;&lt;b&gt;1&lt;/b&gt;&lt;/td&gt;    &lt;td align="RIGHT" style="border: 1px solid rgb(0, 0, 0);"&gt;1&lt;/td&gt;    &lt;td align="RIGHT" style="border: 1px solid rgb(0, 0, 0);"&gt;0,62&lt;/td&gt;    &lt;td align="RIGHT" style="border: 1px solid rgb(0, 0, 0);"&gt;&lt;b&gt;3,73&lt;/b&gt;&lt;/td&gt;   &lt;/tr&gt;
&lt;tr&gt;    &lt;td align="RIGHT" height="17" style="border: 1px solid rgb(0, 0, 0);"&gt;&lt;b&gt;2&lt;/b&gt;&lt;/td&gt;    &lt;td align="RIGHT" style="border: 1px solid rgb(0, 0, 0);"&gt;2&lt;/td&gt;    &lt;td align="RIGHT" style="border: 1px solid rgb(0, 0, 0);"&gt;0,39&lt;/td&gt;    &lt;td align="RIGHT" style="border: 1px solid rgb(0, 0, 0);"&gt;&lt;b&gt;5,70&lt;/b&gt;&lt;/td&gt;   &lt;/tr&gt;
&lt;tr&gt;    &lt;td align="RIGHT" height="17" style="border: 1px solid rgb(0, 0, 0);"&gt;&lt;b&gt;3&lt;/b&gt;&lt;/td&gt;    &lt;td align="RIGHT" style="border: 1px solid rgb(0, 0, 0);"&gt;3&lt;/td&gt;    &lt;td align="RIGHT" style="border: 1px solid rgb(0, 0, 0);"&gt;0,24&lt;/td&gt;    &lt;td align="RIGHT" style="border: 1px solid rgb(0, 0, 0);"&gt;&lt;b&gt;7,70&lt;/b&gt;&lt;/td&gt;   &lt;/tr&gt;
&lt;tr&gt;    &lt;td align="RIGHT" height="17" style="border: 1px solid rgb(0, 0, 0);"&gt;&lt;b&gt;4&lt;/b&gt;&lt;/td&gt;    &lt;td align="RIGHT" style="border: 1px solid rgb(0, 0, 0);"&gt;3&lt;/td&gt;    &lt;td align="RIGHT" style="border: 1px solid rgb(0, 0, 0);"&gt;0,14&lt;/td&gt;    &lt;td align="RIGHT" style="border: 1px solid rgb(0, 0, 0);"&gt;&lt;b&gt;7,71&lt;/b&gt;&lt;/td&gt;   &lt;/tr&gt;
&lt;tr&gt;    &lt;td align="RIGHT" height="17" style="border: 1px solid rgb(0, 0, 0);"&gt;&lt;b&gt;5&lt;/b&gt;&lt;/td&gt;    &lt;td align="RIGHT" style="border: 1px solid rgb(0, 0, 0);"&gt;4&lt;/td&gt;    &lt;td align="RIGHT" style="border: 1px solid rgb(0, 0, 0);"&gt;0,09&lt;/td&gt;    &lt;td align="RIGHT" style="border: 1px solid rgb(0, 0, 0);"&gt;&lt;b&gt;9,38&lt;/b&gt;&lt;/td&gt;   &lt;/tr&gt;
&lt;tr&gt;    &lt;td align="RIGHT" height="17" style="border: 1px solid rgb(0, 0, 0);"&gt;&lt;b&gt;6&lt;/b&gt;&lt;/td&gt;    &lt;td align="RIGHT" style="border: 1px solid rgb(0, 0, 0);"&gt;5&lt;/td&gt;    &lt;td align="RIGHT" style="border: 1px solid rgb(0, 0, 0);"&gt;0,05&lt;/td&gt;    &lt;td align="RIGHT" style="border: 1px solid rgb(0, 0, 0);"&gt;&lt;b&gt;11,59&lt;/b&gt;&lt;/td&gt;   &lt;/tr&gt;
&lt;tr&gt;    &lt;td align="RIGHT" height="17" style="border: 1px solid rgb(0, 0, 0);"&gt;&lt;b&gt;7&lt;/b&gt;&lt;/td&gt;    &lt;td align="RIGHT" style="border: 1px solid rgb(0, 0, 0);"&gt;5&lt;/td&gt;    &lt;td align="RIGHT" style="border: 1px solid rgb(0, 0, 0);"&gt;0,03&lt;/td&gt;    &lt;td align="RIGHT" style="border: 1px solid rgb(0, 0, 0);"&gt;&lt;b&gt;11,59&lt;/b&gt;&lt;/td&gt;   &lt;/tr&gt;
&lt;tr&gt;    &lt;td align="RIGHT" height="17" style="border: 1px solid rgb(0, 0, 0);"&gt;&lt;b&gt;8&lt;/b&gt;&lt;/td&gt;    &lt;td align="RIGHT" style="border: 1px solid rgb(0, 0, 0);"&gt;6&lt;/td&gt;    &lt;td align="RIGHT" style="border: 1px solid rgb(0, 0, 0);"&gt;0,02&lt;/td&gt;    &lt;td align="RIGHT" style="border: 1px solid rgb(0, 0, 0);"&gt;&lt;b&gt;13,80&lt;/b&gt;&lt;/td&gt;   &lt;/tr&gt;
&lt;tr&gt;    &lt;td align="RIGHT" height="17" style="border: 1px solid rgb(0, 0, 0);"&gt;&lt;b&gt;9&lt;/b&gt;&lt;/td&gt;    &lt;td align="RIGHT" style="border: 1px solid rgb(0, 0, 0);"&gt;7&lt;/td&gt;    &lt;td align="RIGHT" style="border: 1px solid rgb(0, 0, 0);"&gt;0,01&lt;/td&gt;    &lt;td align="RIGHT" style="border: 1px solid rgb(0, 0, 0);"&gt;&lt;b&gt;18,68&lt;/b&gt;&lt;/td&gt;   &lt;/tr&gt;
&lt;tr&gt;    &lt;td align="RIGHT" height="17" style="border: 1px solid rgb(0, 0, 0);"&gt;&lt;b&gt;10&lt;/b&gt;&lt;/td&gt;    &lt;td align="RIGHT" style="border: 1px solid rgb(0, 0, 0);"&gt;7&lt;/td&gt;    &lt;td align="RIGHT" style="border: 1px solid rgb(0, 0, 0);"&gt;0,01&lt;/td&gt;    &lt;td align="RIGHT" style="border: 1px solid rgb(0, 0, 0);"&gt;&lt;b&gt;18,68&lt;/b&gt;&lt;/td&gt;   &lt;/tr&gt;
&lt;/tbody&gt; &lt;/table&gt;&lt;br /&gt;
&lt;div id="rd6v" style="text-align: left;"&gt;&lt;/div&gt;In a real world scenario the average run-time of the Bloom Filter will be better as the execution can be aborted for values that are not in the Bloom Filter. The false positive probability another important figure of the Bloom Filter: &lt;br /&gt;
&lt;br /&gt;
&lt;img alt="p_f =(1-e^{-kn/m})^k" class="ee_img tr_noresize" eeimg="1" src="http://2.bp.blogspot.com/_0LGgFlXE6gQ/TIXtvMUW9sI/AAAAAAAAADI/swzx-kwX3Es/s1600/chart.png" /&gt;&lt;br /&gt;
&lt;br /&gt;
False positive causes additional work based on the assumption to work with a valid value. This run-time is wasted as the preceding check against the dictionary will eliminate the false positives.&lt;br /&gt;
&lt;br /&gt;
To get the expected resulting performance of the spelling corrector we construct a small model. The expected run-time is determined by the cost of the filter t_f,&amp;nbsp;the false positives probability p_f&amp;nbsp;the probability p_w&amp;nbsp;that a candidate is a word in the dictionary and the time t_s&amp;nbsp;to construct a candidate string. The time spend to construct strings which are false positives is p_f * t_s. The time spend to construct strings which are words is p_w*t_s. (To simplify the formula a constant string length is assumed.)&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/_0LGgFlXE6gQ/TIXv9HLwnMI/AAAAAAAAAEQ/qeFD2uFhT1U/s1600/t_form.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://4.bp.blogspot.com/_0LGgFlXE6gQ/TIXv9HLwnMI/AAAAAAAAAEQ/qeFD2uFhT1U/s320/t_form.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The values computed are based on candidate strings with a length of 8 chars (t_s=t_s8)&amp;nbsp;and the probability that 5% of the candidates are words contained in the dictionary (p_w=0.05).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/_0LGgFlXE6gQ/S2h5lUS5DgI/AAAAAAAAACI/QIR_7MXyw2E/s1600-h/expected+time.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"&gt;&lt;img border="0" height="320" src="http://1.bp.blogspot.com/_0LGgFlXE6gQ/S2h5lUS5DgI/AAAAAAAAACI/QIR_7MXyw2E/s320/expected+time.jpg" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div id="qspi" style="text-align: left;"&gt;&lt;/div&gt;&lt;table border="0" cellspacing="0" class="zeroBorder" cols="6" frame="VOID" rules="NONE"&gt;&lt;tbody&gt;
&lt;tr&gt;    &lt;td align="LEFT" height="17" style="border: 1px solid rgb(0, 0, 0);" width="120"&gt;&lt;b&gt;n/m ratio&lt;/b&gt;&lt;/td&gt;    &lt;td align="LEFT" style="border: 1px solid rgb(0, 0, 0);" width="86"&gt;&lt;b&gt;k&lt;/b&gt;&lt;/td&gt;     &lt;td align="LEFT" style="border: 1px solid rgb(0, 0, 0);" width="120"&gt;&lt;b&gt;false  positive rate (p_f)&lt;/b&gt;&lt;/td&gt;    &lt;td align="LEFT" style="border: 1px solid rgb(0, 0, 0);" width="102"&gt;&lt;b&gt;time filter (t_f)&lt;/b&gt;&lt;/td&gt;     &lt;td align="LEFT" style="border: 1px solid rgb(0, 0, 0);" width="156"&gt;&lt;b&gt;false  positive time (p_f * t_s)&lt;/b&gt;&lt;/td&gt;    &lt;td align="LEFT" style="border: 1px solid rgb(0, 0, 0);" width="98"&gt;&lt;b&gt;expected time (t)&lt;/b&gt;&lt;/td&gt;    &lt;/tr&gt;
&lt;tr&gt;    &lt;td align="RIGHT" height="17" style="border: 1px solid rgb(0, 0, 0);"&gt;&lt;b&gt;1&lt;/b&gt;&lt;/td&gt;    &lt;td align="RIGHT" style="border: 1px solid rgb(0, 0, 0);"&gt;1&lt;/td&gt;    &lt;td align="RIGHT" style="border: 1px solid rgb(0, 0, 0);"&gt;0,62&lt;/td&gt;    &lt;td align="RIGHT" style="border: 1px solid rgb(0, 0, 0);"&gt;3,73&lt;/td&gt;    &lt;td align="RIGHT" style="border: 1px solid rgb(0, 0, 0);"&gt;7,84&lt;/td&gt;     &lt;td align="RIGHT" style="border: 1px solid rgb(0, 0, 0);"&gt;&lt;b&gt;12,20&lt;/b&gt;&lt;/td&gt;    &lt;/tr&gt;
&lt;tr&gt;    &lt;td align="RIGHT" height="17" style="border: 1px solid rgb(0, 0, 0);"&gt;&lt;b&gt;2&lt;/b&gt;&lt;/td&gt;    &lt;td align="RIGHT" style="border: 1px solid rgb(0, 0, 0);"&gt;2&lt;/td&gt;    &lt;td align="RIGHT" style="border: 1px solid rgb(0, 0, 0);"&gt;0,39&lt;/td&gt;    &lt;td align="RIGHT" style="border: 1px solid rgb(0, 0, 0);"&gt;5,70&lt;/td&gt;    &lt;td align="RIGHT" style="border: 1px solid rgb(0, 0, 0);"&gt;4,88&lt;/td&gt;     &lt;td align="RIGHT" style="border: 1px solid rgb(0, 0, 0);"&gt;&lt;b&gt;11,21&lt;/b&gt;&lt;/td&gt;    &lt;/tr&gt;
&lt;tr&gt;    &lt;td align="RIGHT" height="17" style="border: 1px solid rgb(0, 0, 0);"&gt;&lt;b&gt;3&lt;/b&gt;&lt;/td&gt;    &lt;td align="RIGHT" style="border: 1px solid rgb(0, 0, 0);"&gt;3&lt;/td&gt;    &lt;td align="RIGHT" style="border: 1px solid rgb(0, 0, 0);"&gt;0,24&lt;/td&gt;    &lt;td align="RIGHT" style="border: 1px solid rgb(0, 0, 0);"&gt;7,70&lt;/td&gt;    &lt;td align="RIGHT" style="border: 1px solid rgb(0, 0, 0);"&gt;3,04&lt;/td&gt;     &lt;td align="RIGHT" style="border: 1px solid rgb(0, 0, 0);"&gt;&lt;b&gt;11,37&lt;/b&gt;&lt;/td&gt;    &lt;/tr&gt;
&lt;tr&gt;    &lt;td align="RIGHT" height="17" style="border: 1px solid rgb(0, 0, 0);"&gt;&lt;b&gt;4&lt;/b&gt;&lt;/td&gt;    &lt;td align="RIGHT" style="border: 1px solid rgb(0, 0, 0);"&gt;3&lt;/td&gt;    &lt;td align="RIGHT" style="border: 1px solid rgb(0, 0, 0);"&gt;0,14&lt;/td&gt;    &lt;td align="RIGHT" style="border: 1px solid rgb(0, 0, 0);"&gt;7,71&lt;/td&gt;    &lt;td align="RIGHT" style="border: 1px solid rgb(0, 0, 0);"&gt;1,76&lt;/td&gt;     &lt;td align="RIGHT" style="border: 1px solid rgb(0, 0, 0);"&gt;&lt;b&gt;10,10&lt;/b&gt;&lt;/td&gt;    &lt;/tr&gt;
&lt;tr&gt;    &lt;td align="RIGHT" height="17" style="border: 1px solid rgb(0, 0, 0);"&gt;&lt;b&gt;5&lt;/b&gt;&lt;/td&gt;    &lt;td align="RIGHT" style="border: 1px solid rgb(0, 0, 0);"&gt;4&lt;/td&gt;    &lt;td align="RIGHT" style="border: 1px solid rgb(0, 0, 0);"&gt;0,09&lt;/td&gt;    &lt;td align="RIGHT" style="border: 1px solid rgb(0, 0, 0);"&gt;9,38&lt;/td&gt;    &lt;td align="RIGHT" style="border: 1px solid rgb(0, 0, 0);"&gt;1,09&lt;/td&gt;     &lt;td align="RIGHT" style="border: 1px solid rgb(0, 0, 0);"&gt;&lt;b&gt;11,09&lt;/b&gt;&lt;/td&gt;    &lt;/tr&gt;
&lt;tr&gt;    &lt;td align="RIGHT" height="17" style="border: 1px solid rgb(0, 0, 0);"&gt;&lt;b&gt;6&lt;/b&gt;&lt;/td&gt;    &lt;td align="RIGHT" style="border: 1px solid rgb(0, 0, 0);"&gt;5&lt;/td&gt;    &lt;td align="RIGHT" style="border: 1px solid rgb(0, 0, 0);"&gt;0,05&lt;/td&gt;    &lt;td align="RIGHT" style="border: 1px solid rgb(0, 0, 0);"&gt;11,59&lt;/td&gt;    &lt;td align="RIGHT" style="border: 1px solid rgb(0, 0, 0);"&gt;0,66&lt;/td&gt;     &lt;td align="RIGHT" style="border: 1px solid rgb(0, 0, 0);"&gt;&lt;b&gt;12,88&lt;/b&gt;&lt;/td&gt;    &lt;/tr&gt;
&lt;tr&gt;    &lt;td align="RIGHT" height="17" style="border: 1px solid rgb(0, 0, 0);"&gt;&lt;b&gt;7&lt;/b&gt;&lt;/td&gt;    &lt;td align="RIGHT" style="border: 1px solid rgb(0, 0, 0);"&gt;5&lt;/td&gt;    &lt;td align="RIGHT" style="border: 1px solid rgb(0, 0, 0);"&gt;0,03&lt;/td&gt;    &lt;td align="RIGHT" style="border: 1px solid rgb(0, 0, 0);"&gt;11,59&lt;/td&gt;    &lt;td align="RIGHT" style="border: 1px solid rgb(0, 0, 0);"&gt;0,39&lt;/td&gt;     &lt;td align="RIGHT" style="border: 1px solid rgb(0, 0, 0);"&gt;&lt;b&gt;12,61&lt;/b&gt;&lt;/td&gt;    &lt;/tr&gt;
&lt;tr&gt;    &lt;td align="RIGHT" height="17" style="border: 1px solid rgb(0, 0, 0);"&gt;&lt;b&gt;8&lt;/b&gt;&lt;/td&gt;    &lt;td align="RIGHT" style="border: 1px solid rgb(0, 0, 0);"&gt;6&lt;/td&gt;    &lt;td align="RIGHT" style="border: 1px solid rgb(0, 0, 0);"&gt;0,02&lt;/td&gt;    &lt;td align="RIGHT" style="border: 1px solid rgb(0, 0, 0);"&gt;13,80&lt;/td&gt;    &lt;td align="RIGHT" style="border: 1px solid rgb(0, 0, 0);"&gt;0,25&lt;/td&gt;     &lt;td align="RIGHT" style="border: 1px solid rgb(0, 0, 0);"&gt;&lt;b&gt;14,68&lt;/b&gt;&lt;/td&gt;    &lt;/tr&gt;
&lt;tr&gt;    &lt;td align="RIGHT" height="17" style="border: 1px solid rgb(0, 0, 0);"&gt;&lt;b&gt;9&lt;/b&gt;&lt;/td&gt;    &lt;td align="RIGHT" style="border: 1px solid rgb(0, 0, 0);"&gt;7&lt;/td&gt;    &lt;td align="RIGHT" style="border: 1px solid rgb(0, 0, 0);"&gt;0,01&lt;/td&gt;    &lt;td align="RIGHT" style="border: 1px solid rgb(0, 0, 0);"&gt;18,68&lt;/td&gt;    &lt;td align="RIGHT" style="border: 1px solid rgb(0, 0, 0);"&gt;0,15&lt;/td&gt;     &lt;td align="RIGHT" style="border: 1px solid rgb(0, 0, 0);"&gt;&lt;b&gt;19,46&lt;/b&gt;&lt;/td&gt;    &lt;/tr&gt;
&lt;tr&gt;    &lt;td align="RIGHT" height="17" style="border: 1px solid rgb(0, 0, 0);"&gt;&lt;b&gt;10&lt;/b&gt;&lt;/td&gt;    &lt;td align="RIGHT" style="border: 1px solid rgb(0, 0, 0);"&gt;7&lt;/td&gt;    &lt;td align="RIGHT" style="border: 1px solid rgb(0, 0, 0);"&gt;0,01&lt;/td&gt;    &lt;td align="RIGHT" style="border: 1px solid rgb(0, 0, 0);"&gt;18,68&lt;/td&gt;    &lt;td align="RIGHT" style="border: 1px solid rgb(0, 0, 0);"&gt;0,09&lt;/td&gt;     &lt;td align="RIGHT" style="border: 1px solid rgb(0, 0, 0);"&gt;&lt;b&gt;19,40&lt;/b&gt;&lt;/td&gt;    &lt;/tr&gt;
&lt;/tbody&gt; &lt;/table&gt;&lt;br /&gt;
By increasing the n/m ratio (and hash functions k executed respectively) of the Bloom Filter we see an increase in the time spent to check if the filter contains a candidates string (time filter). On the other hand the time spent to produce false positives (false positive time) declines with a growing n/m ratio.&lt;br /&gt;
&lt;br /&gt;
For the given model the local minimum of the expected run-time is with a n/m ratio of 4 (false positive rate of 0,14). The expected worst case run-time of the Bloom Filter based implementation is better than the plain array copying run-time (with a of minimum 11,69) and the performance benefits increases dramatically with the string length.&lt;br /&gt;
&lt;br /&gt;
In the next blog entries I'll implement this Bloom Filter-based spelling corrector to see if one can capitalize on the theoretical advantages. The next blog entry will be about the &lt;a href="http://theyougen.blogspot.com/2010/02/implementing-hash-functions-in.html"&gt;hash function support needed&lt;/a&gt;.&lt;br /&gt;
&lt;br /&gt;
&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1392748891339897691-2455686083624856574?l=theyougen.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;div class="feedflare"&gt;
&lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=lA-2IxoXcGw:r9SLRQ3hP3M:yIl2AUoC8zA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?d=yIl2AUoC8zA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=lA-2IxoXcGw:r9SLRQ3hP3M:-BTjWOF_DHI"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?i=lA-2IxoXcGw:r9SLRQ3hP3M:-BTjWOF_DHI" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=lA-2IxoXcGw:r9SLRQ3hP3M:4cEx4HpKnUU"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?i=lA-2IxoXcGw:r9SLRQ3hP3M:4cEx4HpKnUU" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=lA-2IxoXcGw:r9SLRQ3hP3M:YwkR-u9nhCs"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?d=YwkR-u9nhCs" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=lA-2IxoXcGw:r9SLRQ3hP3M:F7zBnMyn0Lo"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?i=lA-2IxoXcGw:r9SLRQ3hP3M:F7zBnMyn0Lo" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=lA-2IxoXcGw:r9SLRQ3hP3M:7Q72WNTAKBA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?d=7Q72WNTAKBA" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/TheYoungGeneration/~4/lA-2IxoXcGw" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://theyougen.blogspot.com/feeds/2455686083624856574/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://theyougen.blogspot.com/2010/02/faster-spelling-corrector.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/1392748891339897691/posts/default/2455686083624856574?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/1392748891339897691/posts/default/2455686083624856574?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/TheYoungGeneration/~3/lA-2IxoXcGw/faster-spelling-corrector.html" title="A faster spelling corrector" /><author><name>Thomas Jung</name><uri>http://www.blogger.com/profile/13153488692248832865</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://3.bp.blogspot.com/_0LGgFlXE6gQ/S2h42lVmPNI/AAAAAAAAABw/xTpJEaZ_Qhs/s72-c/visualvm.snapshot.cut.bmp" height="72" width="72" /><thr:total>0</thr:total><feedburner:origLink>http://theyougen.blogspot.com/2010/02/faster-spelling-corrector.html</feedburner:origLink></entry><entry gd:etag="W/&quot;C08NRXY4eCp7ImA9WxBQGEU.&quot;"><id>tag:blogger.com,1999:blog-1392748891339897691.post-6293719930262424774</id><published>2010-01-18T21:05:00.002+01:00</published><updated>2010-01-19T07:18:14.830+01:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2010-01-19T07:18:14.830+01:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="scala" /><title>Why overrideable tail recursive methods must not be optimized</title><content type="html">Suppose you define a &lt;a href="http://en.wikipedia.org/wiki/Tail_recursion"&gt;tail recursive method&lt;/a&gt; x in a object A. You can use the @tailrec annotation to mark methods you expect optimization of the tail recursive code to happen.&lt;br /&gt;
&lt;pre class="brush: scala"&gt;import annotation._
object A{
&amp;nbsp;&amp;nbsp;&amp;nbsp; @tailrec def x(i : Int, s : String) : String = if(i &amp;gt; 0) x(i - 1, s + "a:" + i) else s
}
&lt;/pre&gt;Now you move the method x to a trait or class and suddenly the compilation fails.&lt;br /&gt;
&lt;pre class="brush: scala"&gt;import annotation._
trait A{
&amp;nbsp;&amp;nbsp;&amp;nbsp; @tailrec def x(i : Int, s : String) : String = if(i &amp;gt; 0) x(i - 1, s + "a:" + i) else s
}&lt;/pre&gt;&lt;pre class="brush: plain"&gt;&lt;console&gt;:8: error: could not optimize @tailrec annotated method
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; @tailrec def x(i : Int, s : String) : String = if(i &amp;gt; 0) x(i - 1, s + "a:" + i) else s
&lt;/pre&gt;I had exactly this problem. I defined a perfectly valid tail recursive function but the compiler would not optimize it to a loop. This was in Scala 2.7 that does not support the @tailrec annotation. You have the additional issue of spotting the problem in the first place.&lt;br /&gt;
&lt;br /&gt;
The problem is that overrideable tail recursive methods won't be optimized. If you change the code to&lt;br /&gt;
&lt;pre class="brush: scala"&gt;import annotation._
trait A{
&amp;nbsp;&amp;nbsp;&amp;nbsp; @tailrec final def x(i : Int, s : String) : String = if(i &amp;gt; 0) x(i - 1, s + "a:" + i) else s
}
&lt;/pre&gt;it will compile again. This is the rule. But why?&lt;br /&gt;
&lt;br /&gt;
Performance optimization must not change the behavior at run-time. So overriding an optimized tail recursive method will behave differently from an unoptimized tail recursive method.&lt;br /&gt;
&lt;br /&gt;
Let's define A in a way that it won't be optimized and override the tail recursive method x in the subclass B.&lt;br /&gt;
&lt;pre class="brush: scala"&gt;class A{
&amp;nbsp;&amp;nbsp;&amp;nbsp; def x(i : Int, s : String) : String = if(i &amp;gt; 0) x(i - 1, s + "a:" + i) else s
}

class B extends A{
&amp;nbsp;&amp;nbsp;&amp;nbsp; override def x(i : Int, s: String) : String = super.x(i, s + "b:")
}
&lt;/pre&gt;This will produce the output:&lt;br /&gt;
&lt;pre class="brush: plain"&gt;scala&amp;gt; new A().x(3, "")
res0: String = a:3a:2a:1

scala&amp;gt; new B().x(3, "")
res1: String = b:a:3b:a:2b:a:1b:
&lt;/pre&gt;The implementation of the method x in B invokes the method x in A. When the method x defined in A invokes the method x it calls the method x in B &lt;/console&gt;(for an instance of B)&lt;console&gt;. This behavior is due to late binding of calls to x.&lt;br /&gt;
&lt;br /&gt;
Suppose it was possible to optimize the method x in A. The compiler would define a class similar to this definition of A. The recursive invocation of x is optimized to a loop. With this definition of x there is no possibility to apply late binding to x. The optimization would hard wire the definition of x.&lt;br /&gt;
&lt;pre class="brush: scala"&gt;class A{
&amp;nbsp;&amp;nbsp;&amp;nbsp; def x(i : Int, s : String) : String = {
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; var j = i
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; var t = s
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; while(j &amp;gt; 0){
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; t = t + "a:" + j
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; j = j - 1
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; t
&amp;nbsp;&amp;nbsp;&amp;nbsp; }
}

class B extends A{
&amp;nbsp;&amp;nbsp;&amp;nbsp; override def x(i : Int, s: String) : String = super.x(i, "b:" + s)
}
&lt;/pre&gt;&lt;pre class="brush: plain"&gt;scala&amp;gt; new A().x(3, "")
res0: String = a:3a:2a:1

scala&amp;gt; new B().x(3, "")
res1: String = b:a:3a:2a:1
&lt;/pre&gt;The early binding of x in A changes the behavior of B. Once the method x in A is called it will never come back to B. This is the reason why the Scala compiler can't optimize overrideable tail recursive methods.&lt;br /&gt;
&lt;br /&gt;
Following this explanation, it should be clear that the @tailrec annotation is worthwhile and should be used in all your Scala 2.8 code you expect the optimization to happen. For Scala 2.7 it's a bit unfortunate that moving a tail recursive method from an object to trait will change the behavior significantly without a compiler error. So be warned.&lt;/console&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1392748891339897691-6293719930262424774?l=theyougen.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;div class="feedflare"&gt;
&lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=E64AeR-3eWw:12tDrJexzCI:yIl2AUoC8zA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?d=yIl2AUoC8zA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=E64AeR-3eWw:12tDrJexzCI:-BTjWOF_DHI"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?i=E64AeR-3eWw:12tDrJexzCI:-BTjWOF_DHI" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=E64AeR-3eWw:12tDrJexzCI:4cEx4HpKnUU"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?i=E64AeR-3eWw:12tDrJexzCI:4cEx4HpKnUU" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=E64AeR-3eWw:12tDrJexzCI:YwkR-u9nhCs"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?d=YwkR-u9nhCs" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=E64AeR-3eWw:12tDrJexzCI:F7zBnMyn0Lo"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?i=E64AeR-3eWw:12tDrJexzCI:F7zBnMyn0Lo" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=E64AeR-3eWw:12tDrJexzCI:7Q72WNTAKBA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?d=7Q72WNTAKBA" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/TheYoungGeneration/~4/E64AeR-3eWw" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://theyougen.blogspot.com/feeds/6293719930262424774/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://theyougen.blogspot.com/2010/01/why-overrideable-tail-recursive-methods.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/1392748891339897691/posts/default/6293719930262424774?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/1392748891339897691/posts/default/6293719930262424774?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/TheYoungGeneration/~3/E64AeR-3eWw/why-overrideable-tail-recursive-methods.html" title="Why overrideable tail recursive methods must not be optimized" /><author><name>Thomas Jung</name><uri>http://www.blogger.com/profile/13153488692248832865</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>0</thr:total><feedburner:origLink>http://theyougen.blogspot.com/2010/01/why-overrideable-tail-recursive-methods.html</feedburner:origLink></entry><entry gd:etag="W/&quot;DUEDSXw8cSp7ImA9WxBQEUQ.&quot;"><id>tag:blogger.com,1999:blog-1392748891339897691.post-4603709618359039784</id><published>2010-01-11T09:01:00.001+01:00</published><updated>2010-01-11T09:14:38.279+01:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2010-01-11T09:14:38.279+01:00</app:edited><title>Lessons from Stackoverflow: missing education should not stop you from being a good developer</title><content type="html">To put this blog into perspective: I've the weakest of all possible computer science degrees you can imagine. It's formally a degree in computer science but half of my curriculum was in communication engineering which is an interesting field (if you want to build a digital hearing aid for your grandma) but does not really help with software development.&lt;br /&gt;
&lt;br /&gt;
I do appreciate the work of Jeff Atwood at &lt;a href="http://stackoverflow.com/" id="zsf2" title="Stackoverflow"&gt;Stackoverflow&lt;/a&gt;. It's a great site. It is a viable programming resource. Judging based on his results he is a good developer.&lt;br /&gt;
&lt;br /&gt;
Ironically you can build a good product without been a topnotch computer scientist. This is really encouraging for me. &lt;br /&gt;
&lt;br /&gt;
I'd been reading &lt;a href="http://www.codinghorror.com/blog/" id="vnzp" title="Jeff's blog"&gt;Jeff Atwood's blog&lt;/a&gt; for more than two years because it was funny, entertaining, presented a down-to-earth view of IT and provided insights of a different community (.net and Windows) from my Java community. But around the time he started Stackoverflow I quit reading his blog. I cannot say exactly why, but I lost its appeal a bit. I liked to read more computer science related stuff (mostly from the functional programming world). Roughly one year later I'm back again. I really like Stackoverflow.&lt;br /&gt;
&lt;br /&gt;
But time and again listening to the Stackoverflow podcasts makes me wondering how's that possible. How could he write Stackoverlow? In the latest &lt;a href="http://blog.stackoverflow.com/2010/01/podcast-79" id="c6ci" title="podcast"&gt;Stackoverflow podcast #79&lt;/a&gt; &lt;a href="http://www.joelonsoftware.com/" id="db26" title="Joel S"&gt;Joel Spolsky&lt;/a&gt; tried to explain that you can parse only &lt;a href="http://en.wikipedia.org/wiki/Regular_language" id="ln9y" title="regular languages"&gt;regular languages&lt;/a&gt; with regexps parser. (Actually, most implementations allow more than that. For example &lt;a href="http://www.regular-expressions.info/brackets.html#usebackrefinregex" id="lsjk" title="back referencing"&gt;back referencing&lt;/a&gt; is more powerful than regular expressions.) This explanation goes on and on. This was like a &lt;a href="http://blog.stackoverflow.com/2009/10/podcast-71/" id="f-g8" title="déjà vu when he tried to explain"&gt;déjà vu when he tried to explain&lt;/a&gt; Jeff and &lt;a href="http://www.hanselminutes.com/" id="z.i6" title="Scott Hanselman"&gt;Scott Hanselman&lt;/a&gt; &lt;a href="http://en.wikipedia.org/wiki/Type_inference#Hindley-Milner_type_inference_algorithm" id="b.b6" title="Hindley-Milner type inference algorithm"&gt;Hindley-Milner type inference algorithm&lt;/a&gt; used by the Haskell compiler. (He tried to explain that you can deduce the return type of a method from the parameters and the functions called. Let's say he did not get too far.)&lt;br /&gt;
&lt;br /&gt;
Jeff explains why it's great to have his &lt;a href="http://code.google.com/p/markdownsharp/" id="e87p" title="first open-source project"&gt;first open-source project&lt;/a&gt;. How many bugs he fixed and that he is now at the point where the hairy problems start. Few people seem to be able to help fixing the problems in the C# port of the PHP and Perl implementations. Joel explains again and again that you need a &lt;a href="http://en.wikipedia.org/wiki/Lexical_analysis" id="t6ki" title="lexer"&gt;lexer&lt;/a&gt; and &lt;a href="http://en.wikipedia.org/wiki/Parser_generator" id="x1z0" title="parser generator"&gt;parser generator&lt;/a&gt; to parse &lt;a href="http://daringfireball.net/projects/markdown/" rel="nofollow"&gt;Markdown&lt;/a&gt; and every student with a parser course can tell you that this is right tool for this kind of problem. That a transformation from an &lt;a href="http://en.wikipedia.org/wiki/Abstract_syntax_tree" id="aacu" title="AST"&gt;AST&lt;/a&gt; is a piece of cake. I never had a parser course so don't expect a &lt;a href="http://www.cforcoding.com/2010/01/jmd-markdown-and-brief-overview-of.html"&gt;scientific explanation&lt;/a&gt; here, but I was at least able to parse a subset of SQL with &lt;a href="http://en.wikipedia.org/wiki/Parser_combinator" id="prk9" title="Parser Combinators"&gt;Parser Combinators&lt;/a&gt;. The moment Jeff started to explain his open source project. That the parsing is done with regular expressions I though: "Hell, why didn't he use a lexer/parser for that."&lt;br /&gt;
&lt;br /&gt;
But you can learn something from Jeff Atwood's example: Your meager education should not stop you from being a good developer. We can't all have ivy league education. (They wouldn't be ivy league if everyone would get a degree. The system does only work because it discriminates between students.) Let's get over it.&amp;nbsp; Put your enthusiasm in your work and learn the bits and pieces you have to learn to get the jobs done. Try to constantly improve your theoretical knowledge, your tool knowledge, quality of your code and your communication skills. Maintain your critical thinking to spot ways you can improve. You can't known everything but you &lt;a href="http://www.brainyquote.com/quotes/quotes/d/donaldrums148142.html" id="h0qf" title="should known what you don't known"&gt;should known what you don't known&lt;/a&gt;:&lt;br /&gt;
&lt;br /&gt;
&lt;div class="KonaBody"&gt;&lt;i&gt;There are known knowns. These are things we know that we know. There are known unknowns. That is to say, there are things that we know we don't know. But there are also unknown unknowns. There are things we don't know we don't know.&lt;br /&gt;
&lt;/i&gt;&lt;br /&gt;
The known unknowns will slow you down, but only the unknown unknowns can hurt you. These are the places where you put to much effort into a problem that is already solved like &lt;a href="http://en.wikipedia.org/wiki/List_of_parser_generators" id="f5w." title="parsing"&gt;parsing&lt;/a&gt;. But a star maverick developer can only develop so much. There will be times when a &lt;a href="http://www.joelonsoftware.com/items/2009/09/23.html" id="farl" title="Duct Tape Developer"&gt;Duct Tape Developer is needed&lt;/a&gt; and the pure theory boys won't get the job done. This is the one thing you can learn from the Stackoverflow story: &lt;a href="http://en.wikipedia.org/wiki/It%27s_the_economy,_stupid" id="e5.m" title="It's the motivation, stupid."&gt;It's the motivation, stupid&lt;/a&gt;.&lt;br /&gt;
&lt;br /&gt;
&lt;/div&gt;By the way, if you have doubts about your abilities, this is &lt;a href="http://en.wikipedia.org/wiki/Dunning-Kruger_effect" id="psot" title="probably a good sign"&gt;probably a good sign&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1392748891339897691-4603709618359039784?l=theyougen.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;div class="feedflare"&gt;
&lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=dXMxXu3MA7M:ZsnpTnaEMSY:yIl2AUoC8zA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?d=yIl2AUoC8zA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=dXMxXu3MA7M:ZsnpTnaEMSY:-BTjWOF_DHI"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?i=dXMxXu3MA7M:ZsnpTnaEMSY:-BTjWOF_DHI" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=dXMxXu3MA7M:ZsnpTnaEMSY:4cEx4HpKnUU"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?i=dXMxXu3MA7M:ZsnpTnaEMSY:4cEx4HpKnUU" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=dXMxXu3MA7M:ZsnpTnaEMSY:YwkR-u9nhCs"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?d=YwkR-u9nhCs" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=dXMxXu3MA7M:ZsnpTnaEMSY:F7zBnMyn0Lo"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?i=dXMxXu3MA7M:ZsnpTnaEMSY:F7zBnMyn0Lo" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=dXMxXu3MA7M:ZsnpTnaEMSY:7Q72WNTAKBA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?d=7Q72WNTAKBA" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/TheYoungGeneration/~4/dXMxXu3MA7M" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://theyougen.blogspot.com/feeds/4603709618359039784/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://theyougen.blogspot.com/2010/01/lessons-from-stackoverflow-missing.html#comment-form" title="5 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/1392748891339897691/posts/default/4603709618359039784?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/1392748891339897691/posts/default/4603709618359039784?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/TheYoungGeneration/~3/dXMxXu3MA7M/lessons-from-stackoverflow-missing.html" title="Lessons from Stackoverflow: missing education should not stop you from being a good developer" /><author><name>Thomas Jung</name><uri>http://www.blogger.com/profile/13153488692248832865</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>5</thr:total><feedburner:origLink>http://theyougen.blogspot.com/2010/01/lessons-from-stackoverflow-missing.html</feedburner:origLink></entry><entry gd:etag="W/&quot;C0cEQn0zfip7ImA9WxBQEU4.&quot;"><id>tag:blogger.com,1999:blog-1392748891339897691.post-3270178970350611452</id><published>2010-01-07T09:07:00.003+01:00</published><updated>2010-01-10T14:43:23.386+01:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2010-01-10T14:43:23.386+01:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="scala" /><title>How to setup a Maven Scala project with IntelliJ IDEA support</title><content type="html">Setting up a Maven and Scala project with IDEA is simple.&lt;br /&gt;
&lt;br /&gt;
Install &lt;a href="http://www.jetbrains.com/idea/free_java_ide.html"&gt;IntelliJ IDEA Community Edition&lt;/a&gt; and the Scala plugin (File -&amp;gt; Settings -&amp;gt; Plugins -&amp;gt; Available).&lt;br /&gt;
&lt;br /&gt;
Generate your maven project using the &lt;a href="http://scala-blogs.org/2008/01/maven-for-scala.html"&gt;simple Scala archetype&lt;/a&gt;:&lt;br /&gt;
&lt;br /&gt;
&lt;pre class="brush: plain"&gt;mvn org.apache.maven.plugins:maven-archetype-plugin:1.0-alpha-7:create 
    -DarchetypeGroupId=org.scala-tools.archetypes  
    -DarchetypeArtifactId=scala-archetype-simple
    -DarchetypeVersion=1.2
    -DremoteRepositories=http://scala-tools.org/repo-releases
    -DgroupId=group
    -DartifactId=artifact
&lt;/pre&gt;&lt;br /&gt;
Add the Scala compiler lib to your project dependencies in the pom.xml file:&lt;br /&gt;
&lt;pre class="brush: xml"&gt;&amp;lt;dependencies&amp;gt;
    &amp;lt;dependency&amp;gt;
        &amp;lt;groupId&amp;gt;org.scala-lang&amp;lt;/groupId&amp;gt;
        &amp;lt;artifactId&amp;gt;scala-compiler&amp;lt;/artifactId&amp;gt;
        &amp;lt;version&amp;gt;${scala.version}&amp;lt;/version&amp;gt;
        &amp;lt;scope&amp;gt;compile&amp;lt;/scope&amp;gt;
    &amp;lt;/dependency&amp;gt;
 &amp;lt;/dependencies&amp;gt;        
&lt;/pre&gt;&lt;br /&gt;
Update the Scala version configuration to a recent release:&lt;br /&gt;
&lt;pre class="brush: xml"&gt;&amp;lt;properties&amp;gt;
    &amp;lt;scala.version&amp;gt;2.7.7&amp;lt;/scala.version&amp;gt;
&amp;lt;/properties&amp;gt;
&lt;/pre&gt;Open the a new project. Select the pom.xml in the file dialog.&lt;br /&gt;
&lt;br /&gt;
An notification will pop up. Click &lt;i&gt;"Create Scala Facet"&lt;/i&gt;.&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;Done.&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
You can now compile and execute as usual and run the maven build &lt;code&gt;mvn install&lt;/code&gt; with the Maven Tool Window. (The &lt;i&gt;M2_HOME&lt;/i&gt; environment variable or Maven home settings (File -&amp;gt; Settings -&amp;gt; Maven -&amp;gt; Maven home directory) have to be present.)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1392748891339897691-3270178970350611452?l=theyougen.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;div class="feedflare"&gt;
&lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=IPH-DidUrVk:6vL0nSl0xQA:yIl2AUoC8zA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?d=yIl2AUoC8zA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=IPH-DidUrVk:6vL0nSl0xQA:-BTjWOF_DHI"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?i=IPH-DidUrVk:6vL0nSl0xQA:-BTjWOF_DHI" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=IPH-DidUrVk:6vL0nSl0xQA:4cEx4HpKnUU"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?i=IPH-DidUrVk:6vL0nSl0xQA:4cEx4HpKnUU" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=IPH-DidUrVk:6vL0nSl0xQA:YwkR-u9nhCs"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?d=YwkR-u9nhCs" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=IPH-DidUrVk:6vL0nSl0xQA:F7zBnMyn0Lo"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?i=IPH-DidUrVk:6vL0nSl0xQA:F7zBnMyn0Lo" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=IPH-DidUrVk:6vL0nSl0xQA:7Q72WNTAKBA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?d=7Q72WNTAKBA" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/TheYoungGeneration/~4/IPH-DidUrVk" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://theyougen.blogspot.com/feeds/3270178970350611452/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://theyougen.blogspot.com/2010/01/how-to-setup-maven-scala-project-with.html#comment-form" title="4 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/1392748891339897691/posts/default/3270178970350611452?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/1392748891339897691/posts/default/3270178970350611452?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/TheYoungGeneration/~3/IPH-DidUrVk/how-to-setup-maven-scala-project-with.html" title="How to setup a Maven Scala project with IntelliJ IDEA support" /><author><name>Thomas Jung</name><uri>http://www.blogger.com/profile/13153488692248832865</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>4</thr:total><feedburner:origLink>http://theyougen.blogspot.com/2010/01/how-to-setup-maven-scala-project-with.html</feedburner:origLink></entry><entry gd:etag="W/&quot;C0cARn8-eip7ImA9WxBQEU4.&quot;"><id>tag:blogger.com,1999:blog-1392748891339897691.post-5695459262623184884</id><published>2010-01-06T17:59:00.005+01:00</published><updated>2010-01-10T14:44:07.152+01:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2010-01-10T14:44:07.152+01:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="scala" /><title>The best Scala IDE: IntelliJ IDEA</title><content type="html">After years and years of using Eclipse (after Netbeans, JBuilder, Visual Cafe, Emacs and other spooky Java "editors") &lt;a href="http://www.jetbrains.com/idea/"&gt;IntelliJ IDEA&lt;/a&gt; is powerful, convenient and simple to use IDE for Scala development. Oh, yes it's free: &lt;a href="http://www.jetbrains.com/idea/free_java_ide.html"&gt;IntelliJ IDEA Community edition&lt;/a&gt;.&lt;br /&gt;
&lt;br /&gt;
I tried IDEA because the Scala plug-in for Eclipse is so horrible. I works so badly that using good old &lt;a href="http://www.jedit.org/"&gt;jEdit&lt;/a&gt; with some macros is way better. On a small project I had to clean the whole project for every compile. It's so bad. After several tries (starting sometime in 2008) I've given up on this thing.&lt;br /&gt;
&lt;br /&gt;
Before IDEA I also tried to use Netbeans, but it failed the dump-developer-that-does-not-read-the-manual-test. Basic functionality should be usable without a detailed study of the documentation. That worked for me with IDEA but I could not get the plugin Scala plugin to run in Netbeans. It's probably me and everything is fine with Netbeans. Still I've invested less time to get things done with IDEA. It simply worked.&lt;br /&gt;
&lt;br /&gt;
The simple summary: &lt;b&gt;every basic IDE feature works&lt;/b&gt;: syntax high lighting, compiling, debugging, navigation and outline.&lt;br /&gt;
&lt;br /&gt;
If you worried about productivity of Scala development because it was missing IDE support: &lt;b&gt;stop worrying&lt;/b&gt;. &lt;b&gt;The support is here.&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
I'm a heavy short cut user so using a new IDE slows me down naturally. Using the Eclipse key map was not an option for me. This works only 80% of the time. The main problem is that functionality with the same shortcuts behaves slightly different. That's maddening at the long run. So it's better to learn a new set of short cuts. It's like driving in Great Britain. It's easier when you are using a car with the steering wheel on the right. Driving a car with the steering wheel on the left is dangerous. There are the country side hedges where you'll see practically nothing in a left turn and chances are that you're going to start your trip on the right hand side on a calm country road. So better relearn.&lt;br /&gt;
&lt;br /&gt;
Interestingly IDEA has a code analysis tool that does spell checking in comments, methods and variables names. This is already works for Scala. Eclipse does spell checking only in comments. This is a boon for code quality. You do not want to read incorrectly spelled source code. This is plain annoying to read, fix and frankly to get right (camel case does not make it easier). (The code analysis tool for Java is very good. It finds tons and tons of problems after FindBugs. I have not heard about the analysis tool before. It would be nice if it could be integrated in build tools and used in a continuous integration environments.)&lt;br /&gt;
&lt;br /&gt;
The maven support seems to work out of the box. It reads a pom.xml and builds IDEA project files from it. It still uses its internal build infrastructure. With Eclipse you have to manually generate the .project and .classpath files which works but is an extra step and the eclipse specific configuration in pom files can get quite clunky.&lt;br /&gt;
&lt;br /&gt;
The essential things work for Scala. You have code completion without implicit type conversion information. I suppose the support for implicit type conversion is quite hard to implement. There are a lot of callable members with type conversion for a given object. I suppose the best way to implement this would be a 2-step process. First show all directly reachable members and after hitting Ctrl + Shift again show all reachable members. Methods parameters are missing for decompiled Scala classes and as code completion information.&lt;br /&gt;
&lt;br /&gt;
Besides these minor problems IntelliJ IDEA is a &lt;b&gt;wonderful tool&lt;/b&gt; to develop Scala with and it does not crash. &lt;a href="http://theyougen.blogspot.com/2010/01/how-to-setup-maven-scala-project-with.html"&gt;Go and try it yourself&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1392748891339897691-5695459262623184884?l=theyougen.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;div class="feedflare"&gt;
&lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=89C178lniRs:8CEvlIzry_M:yIl2AUoC8zA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?d=yIl2AUoC8zA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=89C178lniRs:8CEvlIzry_M:-BTjWOF_DHI"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?i=89C178lniRs:8CEvlIzry_M:-BTjWOF_DHI" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=89C178lniRs:8CEvlIzry_M:4cEx4HpKnUU"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?i=89C178lniRs:8CEvlIzry_M:4cEx4HpKnUU" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=89C178lniRs:8CEvlIzry_M:YwkR-u9nhCs"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?d=YwkR-u9nhCs" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=89C178lniRs:8CEvlIzry_M:F7zBnMyn0Lo"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?i=89C178lniRs:8CEvlIzry_M:F7zBnMyn0Lo" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=89C178lniRs:8CEvlIzry_M:7Q72WNTAKBA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?d=7Q72WNTAKBA" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/TheYoungGeneration/~4/89C178lniRs" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://theyougen.blogspot.com/feeds/5695459262623184884/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://theyougen.blogspot.com/2010/01/best-scala-ide-intellij-idea.html#comment-form" title="14 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/1392748891339897691/posts/default/5695459262623184884?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/1392748891339897691/posts/default/5695459262623184884?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/TheYoungGeneration/~3/89C178lniRs/best-scala-ide-intellij-idea.html" title="The best Scala IDE: IntelliJ IDEA" /><author><name>Thomas Jung</name><uri>http://www.blogger.com/profile/13153488692248832865</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>14</thr:total><feedburner:origLink>http://theyougen.blogspot.com/2010/01/best-scala-ide-intellij-idea.html</feedburner:origLink></entry><entry gd:etag="W/&quot;DEcDRHc8fyp7ImA9Wx9VGUQ.&quot;"><id>tag:blogger.com,1999:blog-1392748891339897691.post-7995092475447730702</id><published>2010-01-02T21:51:00.007+01:00</published><updated>2011-02-06T13:41:15.977+01:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-02-06T13:41:15.977+01:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="java" /><category scheme="http://www.blogger.com/atom/ns#" term="test" /><category scheme="http://www.blogger.com/atom/ns#" term="quickcheck" /><title>Quickcheck 0.5 release</title><content type="html">It's done: &lt;a href="https://quickcheck.dev.java.net/"&gt;Quickcheck&lt;/a&gt; &lt;a href="https://quickcheck.dev.java.net/files/documents/7367/146637/quickcheck.0.5.zip"&gt;version 0.5&lt;/a&gt; is ready. Again one of the nice little releases of Quickcheck. &lt;br /&gt;
&lt;br /&gt;
The 0.5 release added the following features mainly to provide better readability:&lt;br /&gt;
&lt;ul&gt;&lt;li&gt;Support for an &lt;a href="http://theyougen.blogspot.com/2009/06/iterable-adapter-ways-to-define.html"&gt;adapter from Generator&lt;t&gt; to Iterable&lt;t&gt;&lt;/t&gt;&lt;/t&gt;&lt;/a&gt; to allow the use of Java's for-expression in tests (replacing the inner classes).&lt;/li&gt;
&lt;li&gt;Generated source code for &lt;a href="http://theyougen.blogspot.com/2009/10/using-source-code-generation-to.html"&gt;@Iterables and @Samples&lt;/a&gt; annotations on generator factories. The @Iterables annotation was added to support for-expression directly with Generators without additional code. The @Samples annotation can be used to generate a factory for sample values from a Generator class.&lt;br /&gt;
&lt;/li&gt;
&lt;/ul&gt;&lt;br /&gt;
Minor changes are:&lt;br /&gt;
&lt;ul&gt;&lt;li&gt;Added a declarative &lt;a href="https://quickcheck.dev.java.net/javadoc/net/java/quickcheck/ObjectGenerator.html"&gt;Pojo generator for interfaces&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Implemented &lt;a href="https://quickcheck.dev.java.net/javadoc/net/java/quickcheck/generator/CombinedGenerators.html#excludeValues%28net.java.quickcheck.Generator,%20T...%29"&gt;excluding generator&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Added &lt;a href="https://quickcheck.dev.java.net/javadoc/net/java/quickcheck/generator/Generators.html#cast%28net.java.quickcheck.Generator%29"&gt;Generators.cast&lt;/a&gt; utility method. Generator&amp;lt;String&amp;gt; can now be casted to Generator&amp;lt;Object&amp;gt; to provide broader applicability of Generators.&lt;/li&gt;
&lt;li&gt;Maven repository support &lt;a href="http://download.java.net/maven/2/net/java/quickcheck"&gt;(http://download.java.net/maven/2/net/java/quickcheck)&lt;/a&gt;&lt;br /&gt;
&lt;/li&gt;
&lt;/ul&gt;&lt;br /&gt;
The version 0.6 of Quickcheck will be a major rework after more than 2 years of experience with the specification-based test approach and development. Mainly to support further development for example shrinking (simplification) of failed test values. A major feature still missing in Quickcheck version 0.5. &lt;br /&gt;
&lt;br /&gt;
When I started Quickcheck I would not have thought that it is possible to develop such a tiny little thing for more than 2 years. Still more ideas and things that could be implemented and written about and so little time.&lt;br /&gt;
&lt;br /&gt;
Recently, I've used the 0.5 version in my Scala projects I blogged about. I did not mention that I tested it in any way,  though. It was a pleasant experience given that QuickCheck was not designed with Scala's  language features in mind. It works with just a small implicit type conversion and adaption layer (10 lines or so).&lt;br /&gt;
&lt;br /&gt;
Distribution function implementations based on the &lt;a href="http://theyougen.blogspot.com/2009/12/decent-bloom-filter-in-scala.html"&gt;XorShift random number generator&lt;/a&gt; I've already worked with would be nice to have. When I'm at adding new distributions anyway, there are still the distribution functions based on heuristics to implement. For example a distribution function based on a boundary heuristic that starts production of values with minimum value, -epsilon, 0, +epsilon and maximum value before random generation kicks in. Another item on my ever growing list is the description of patterns related to specification-based testing and QuickCheck. Obviously there is still plenty of work left.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1392748891339897691-7995092475447730702?l=theyougen.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;div class="feedflare"&gt;
&lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=oyalBJ27icM:uC730lJm2Ds:yIl2AUoC8zA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?d=yIl2AUoC8zA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=oyalBJ27icM:uC730lJm2Ds:-BTjWOF_DHI"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?i=oyalBJ27icM:uC730lJm2Ds:-BTjWOF_DHI" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=oyalBJ27icM:uC730lJm2Ds:4cEx4HpKnUU"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?i=oyalBJ27icM:uC730lJm2Ds:4cEx4HpKnUU" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=oyalBJ27icM:uC730lJm2Ds:YwkR-u9nhCs"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?d=YwkR-u9nhCs" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=oyalBJ27icM:uC730lJm2Ds:F7zBnMyn0Lo"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?i=oyalBJ27icM:uC730lJm2Ds:F7zBnMyn0Lo" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=oyalBJ27icM:uC730lJm2Ds:7Q72WNTAKBA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?d=7Q72WNTAKBA" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/TheYoungGeneration/~4/oyalBJ27icM" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://theyougen.blogspot.com/feeds/7995092475447730702/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://theyougen.blogspot.com/2010/01/quickcheck-05-release.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/1392748891339897691/posts/default/7995092475447730702?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/1392748891339897691/posts/default/7995092475447730702?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/TheYoungGeneration/~3/oyalBJ27icM/quickcheck-05-release.html" title="Quickcheck 0.5 release" /><author><name>Thomas Jung</name><uri>http://www.blogger.com/profile/13153488692248832865</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>0</thr:total><feedburner:origLink>http://theyougen.blogspot.com/2010/01/quickcheck-05-release.html</feedburner:origLink></entry><entry gd:etag="W/&quot;D0EHSHs8fCp7ImA9WxBQGE0.&quot;"><id>tag:blogger.com,1999:blog-1392748891339897691.post-6978393066089795076</id><published>2009-12-31T13:55:00.007+01:00</published><updated>2010-01-18T10:07:19.574+01:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2010-01-18T10:07:19.574+01:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="scala" /><title>Peter Norvig's Spelling Corrector in Scala</title><content type="html">This is a Scala version of &lt;a href="http://norvig.com/spell-correct.html"&gt;Peter Norvig's spelling corrector&lt;/a&gt;. Reading his essay I realized that Python is quite a nice language and similar to Scala due to the support of for-expressions in both languages. Before that I've seen Python as a strange language that made you write self in your classes all the time.&lt;br /&gt;
&lt;br /&gt;
I tried to write the shortest version possible without introducing redundancy. The code is definitely not very readable.&lt;br /&gt;
&lt;br /&gt;
&lt;pre class="brush: scala"&gt;import util.matching.Regex.MatchIterator

val alphabet = 'a' to 'z' toArray
def train(features : MatchIterator) = (Map[String, Int]() /: features)((m, f) =&amp;gt; m + ((f, m.getOrElse(f, 0) + 1)))
def words(text : String) = ("[%s]+" format alphabet.mkString).r.findAllIn(text.toLowerCase)
val dict = train(words(io.Source.fromFile("big.txt").mkString))

def edits(s : Seq[(String, String)]) = (for((a,b) &amp;lt;- s; if b.length &amp;gt; 0) yield a + b.substring(1)) ++
&amp;nbsp; (for((a,b) &amp;lt;- s; if b.length &amp;gt; 1) yield a + b(1) + b(0) + b.substring(2)) ++
&amp;nbsp; (for((a,b) &amp;lt;- s; c &amp;lt;- alphabet if b.length &amp;gt; 0) yield a + c + b.substring(1)) ++
&amp;nbsp; (for((a,b) &amp;lt;- s; c &amp;lt;- alphabet) yield a + c + b)

def edits1(word : String) = edits(for(i &amp;lt;- 0 to word.length) yield (word take i, word drop i))
def edits2(word : String) = for(e1 &amp;lt;- edits1(word); e2 &amp;lt;-edits1(e1)) yield e2
def known(words : Seq[String]) = for(w &amp;lt;- words; found &amp;lt;- dict.get(w)) yield w
def or[T](candidates : Seq[T], other : =&amp;gt; Seq[T]) = if(candidates.isEmpty) other else candidates

def candidates(word: String) = or(known(List(word)), or(known(edits1(word)), known(edits2(word))))

def correct(word : String) = ((-1, word) /: candidates(word))(
&amp;nbsp; (max, word) =&amp;gt; if(dict(word) &amp;gt; max._1) (dict(word), word) else max)._2

List("osters", "musters", "mixters") map correct foreach println&amp;nbsp;
&lt;/pre&gt;&lt;br /&gt;
You can run the script directly with the Scala interpreter:&lt;br /&gt;
&lt;pre&gt;$ scala test.scala
posters
masters
matters
&lt;/pre&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1392748891339897691-6978393066089795076?l=theyougen.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;div class="feedflare"&gt;
&lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=nRuzOIpL8hc:9DX6pjBcZ1w:yIl2AUoC8zA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?d=yIl2AUoC8zA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=nRuzOIpL8hc:9DX6pjBcZ1w:-BTjWOF_DHI"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?i=nRuzOIpL8hc:9DX6pjBcZ1w:-BTjWOF_DHI" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=nRuzOIpL8hc:9DX6pjBcZ1w:4cEx4HpKnUU"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?i=nRuzOIpL8hc:9DX6pjBcZ1w:4cEx4HpKnUU" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=nRuzOIpL8hc:9DX6pjBcZ1w:YwkR-u9nhCs"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?d=YwkR-u9nhCs" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=nRuzOIpL8hc:9DX6pjBcZ1w:F7zBnMyn0Lo"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?i=nRuzOIpL8hc:9DX6pjBcZ1w:F7zBnMyn0Lo" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=nRuzOIpL8hc:9DX6pjBcZ1w:7Q72WNTAKBA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?d=7Q72WNTAKBA" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/TheYoungGeneration/~4/nRuzOIpL8hc" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://theyougen.blogspot.com/feeds/6978393066089795076/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://theyougen.blogspot.com/2009/12/peter-norvigs-spelling-corrector-in.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/1392748891339897691/posts/default/6978393066089795076?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/1392748891339897691/posts/default/6978393066089795076?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/TheYoungGeneration/~3/nRuzOIpL8hc/peter-norvigs-spelling-corrector-in.html" title="Peter Norvig's Spelling Corrector in Scala" /><author><name>Thomas Jung</name><uri>http://www.blogger.com/profile/13153488692248832865</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>0</thr:total><feedburner:origLink>http://theyougen.blogspot.com/2009/12/peter-norvigs-spelling-corrector-in.html</feedburner:origLink></entry><entry gd:etag="W/&quot;C0YFQnw8eyp7ImA9WxBWEUk.&quot;"><id>tag:blogger.com,1999:blog-1392748891339897691.post-7880057724013372504</id><published>2009-12-20T15:54:00.011+01:00</published><updated>2010-02-02T21:05:13.273+01:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2010-02-02T21:05:13.273+01:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="scala" /><title>A decent Bloom Filter in Scala</title><content type="html">The &lt;a href="http://en.wikipedia.org/wiki/Bloom_filter"&gt;Bloom Filter&lt;/a&gt; is an interesting probabilistic data structure. Its main advantage is that the Bloom Filter drastically reduces the memory consumption.&lt;br /&gt;
&lt;br /&gt;
It produces false positives. Which means that it falsely states that a value is contained. On the other hand it ensures that it produces no false negatives. This renders it useful for space efficient in-process filtering.&lt;br /&gt;
&lt;br /&gt;
The memory needed is not determined by the size of the object stored but by the number of elements and quality of the Bloom Filter (the false positive rate). Whole cars, houses and star ships fit into this little bloom filter. &lt;br /&gt;
&lt;br /&gt;
My Scala implementation is based on a &lt;a href="http://blog.locut.us/2008/01/12/a-decent-stand-alone-java-bloom-filter-implementation/"&gt;decent stand-alone Java Bloom Filter&lt;/a&gt; by Ian Clarke.&lt;br /&gt;
&lt;pre class="brush: scala"&gt;class BloomFilter(val size : Int, val expectedElements : Int ){
  require(size &amp;gt; 0)
  require(expectedElements &amp;gt; 0)

  val bitArray = new BitArray(size)
  val k = ceil((bitArray.size / expectedElements) * log(2.0)).toInt
  val expectedFalsePositiveProbability = pow(1 - exp(-k * 1.0 * expectedElements / bitArray.size), k)

  def add(hash : Int) {
    def add(i : Int, seed : Int) {
      if(i == k) return
      val next = xorRandom(seed)
      bitArray.set(next)
      add(i + 1, next)
    }
    add(0, hash)
  }

  def contains(hash : Int) : Boolean = {
    def contains(i : Int, seed : Int) : Boolean = {
      if(i == k) return true
      val next = xorRandom(seed)
      if (!bitArray.get(next)) return false
      return contains(i + 1, next)
    }
    contains(0, hash)
  }

  private def xorRandom(i : Int) = {
    var y = i
    y ^= y &amp;lt;&amp;lt; 13
    y ^= y &amp;gt;&amp;gt; 17
    y ^ y &amp;lt;&amp;lt; 5
  }
}
&lt;/pre&gt;It has some advantages over the base version:&lt;br /&gt;
- My implementation replaces java.util.Random with am in-lined &lt;a href="http://www.jstatsoft.org/v08/i14/paper"&gt;Xorshift Random Number Generator&lt;/a&gt; that has sufficient randomness for a Bloom Filter. Additionally the java.util.Random is thread-safe which decreases the performance unnecessarily.&lt;br /&gt;
- The BitSet implementation is replaced by a simplified bit array implementation. An idea I've borrowed from a Bloom Filter implementation by &lt;a href="http://smallwig.blogspot.com/"&gt;Kevin Bourrillion&lt;/a&gt;. The size of the BitArray is always a power of 2. This allows some performance optimizations. The modulo operations used to transform a hash function result into a bit set index can be replaced with bit shifting: &lt;code&gt;x % size = x &amp;amp; (size – 1)&lt;/code&gt; (when size is a power of 2).&lt;br /&gt;
&lt;pre class="brush: scala"&gt;class BitArray(bits : Int){
  val size = nextPow2(bits)
  require(isPowerOf2(size))
  private val data = new Array[Long](size &amp;gt;&amp;gt; 6)

  def set(index : Int) = data(idx(index)) |= (1L &amp;lt;&amp;lt; index)
  def get(index : Int) = (data(idx(index)) &amp;amp; (1L &amp;lt;&amp;lt; index)) != 0

  private val mask = size - 1
  private def idx(index : Int) = (index &amp;amp; mask) &amp;gt;&amp;gt; 6
  private def isPowerOf2(i : Int) = ((i - 1) &amp;amp; i) == 0
  private def nextPow2(i : Int) = {
    def highestBit(remainder : Int, c : Int) : Int = 
       if(remainder &amp;gt; 0) highestBit(remainder &amp;gt;&amp;gt; 1, c + 1) else c
    require(i &amp;lt;= (1 &amp;lt;&amp;lt; 30))
    val n = if(isPowerOf2(i)) i else 1 &amp;lt;&amp;lt; highestBit(i, 0)
    assert(n &amp;gt;= i &amp;amp;&amp;amp; i * 2 &amp;gt; n &amp;amp;&amp;amp; isPowerOf2(n))
    n
  }
}
&lt;/pre&gt;- The bloom filter works directly with hash values. I think that the Set semantics are overstretched when applied to Bloom Filters. A lot of the methods are not supported and the return values of add (and addAll) are misleading. Using the hash values directly is useful if you do not want to create objects to check if they are contained in the filter. The assumption here is that the object instantiation is far more expensive than calculating hash values and create only objects for the values contained in the filter. Normal uses are of course still possible with this interface.&lt;br /&gt;
&lt;br /&gt;
All the changes add up to an increase in performance of at least 300% compared to the base implementation having a similar false positive rate.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1392748891339897691-7880057724013372504?l=theyougen.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;div class="feedflare"&gt;
&lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=mHZlSrsdscQ:BHDHnQPsKuI:yIl2AUoC8zA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?d=yIl2AUoC8zA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=mHZlSrsdscQ:BHDHnQPsKuI:-BTjWOF_DHI"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?i=mHZlSrsdscQ:BHDHnQPsKuI:-BTjWOF_DHI" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=mHZlSrsdscQ:BHDHnQPsKuI:4cEx4HpKnUU"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?i=mHZlSrsdscQ:BHDHnQPsKuI:4cEx4HpKnUU" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=mHZlSrsdscQ:BHDHnQPsKuI:YwkR-u9nhCs"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?d=YwkR-u9nhCs" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=mHZlSrsdscQ:BHDHnQPsKuI:F7zBnMyn0Lo"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?i=mHZlSrsdscQ:BHDHnQPsKuI:F7zBnMyn0Lo" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=mHZlSrsdscQ:BHDHnQPsKuI:7Q72WNTAKBA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?d=7Q72WNTAKBA" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/TheYoungGeneration/~4/mHZlSrsdscQ" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://theyougen.blogspot.com/feeds/7880057724013372504/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://theyougen.blogspot.com/2009/12/decent-bloom-filter-in-scala.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/1392748891339897691/posts/default/7880057724013372504?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/1392748891339897691/posts/default/7880057724013372504?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/TheYoungGeneration/~3/mHZlSrsdscQ/decent-bloom-filter-in-scala.html" title="A decent Bloom Filter in Scala" /><author><name>Thomas Jung</name><uri>http://www.blogger.com/profile/13153488692248832865</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>0</thr:total><feedburner:origLink>http://theyougen.blogspot.com/2009/12/decent-bloom-filter-in-scala.html</feedburner:origLink></entry><entry gd:etag="W/&quot;DEcDRHc8fyp7ImA9Wx9VGUQ.&quot;"><id>tag:blogger.com,1999:blog-1392748891339897691.post-3500484831875086596</id><published>2009-10-19T14:07:00.002+02:00</published><updated>2011-02-06T13:41:15.977+01:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-02-06T13:41:15.977+01:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="java" /><category scheme="http://www.blogger.com/atom/ns#" term="test" /><category scheme="http://www.blogger.com/atom/ns#" term="quickcheck" /><title>Support for generator type conversion in Quickcheck</title><content type="html">A new utility method is now part of Quickcheck that lets you convert a Generator&amp;lt;A&amp;gt; to a Generator&amp;lt;B&amp;gt; given that A extends B.&lt;br /&gt;
&lt;br /&gt;
This is useful when you have a Generator&amp;lt;Integer&amp;gt; and would like to treat it as a generator of type Generator&amp;lt;Object&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
This can now be done using the Generators.cast method:&lt;br /&gt;
&lt;pre class="brush: java"&gt;Generator&amp;lt;Integer&amp;gt; integers = PrimitiveGenerators.integers();
Generator&amp;lt;Object&amp;gt; objects = Generators.&amp;lt;Object&amp;gt; cast(integers);
&lt;/pre&gt;This is valid as the type Generator is covariant. (It does only produce values but the state of the Generator&amp;lt;T&amp;gt; is not altered.)&lt;br /&gt;
&lt;br /&gt;
The cast is needed as the Java type does only support invariant generic types. Java cannot support the valid implicit type conversion from Generator&amp;lt;Integer&amp;gt; to Generator&amp;lt;Object&amp;gt;. This information is not present in the Java type system. The cast has to be performed explicitly. (If the generic type Generator could be annotated that it is covariant this sort of conversion could be supported. Scala supports the covariant and contravariant type conversion this way.)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1392748891339897691-3500484831875086596?l=theyougen.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;div class="feedflare"&gt;
&lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=kIzhYYGOYZU:Z2kki2fCh9o:yIl2AUoC8zA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?d=yIl2AUoC8zA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=kIzhYYGOYZU:Z2kki2fCh9o:-BTjWOF_DHI"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?i=kIzhYYGOYZU:Z2kki2fCh9o:-BTjWOF_DHI" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=kIzhYYGOYZU:Z2kki2fCh9o:4cEx4HpKnUU"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?i=kIzhYYGOYZU:Z2kki2fCh9o:4cEx4HpKnUU" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=kIzhYYGOYZU:Z2kki2fCh9o:YwkR-u9nhCs"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?d=YwkR-u9nhCs" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=kIzhYYGOYZU:Z2kki2fCh9o:F7zBnMyn0Lo"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?i=kIzhYYGOYZU:Z2kki2fCh9o:F7zBnMyn0Lo" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/TheYoungGeneration?a=kIzhYYGOYZU:Z2kki2fCh9o:7Q72WNTAKBA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/TheYoungGeneration?d=7Q72WNTAKBA" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/TheYoungGeneration/~4/kIzhYYGOYZU" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://theyougen.blogspot.com/feeds/3500484831875086596/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://theyougen.blogspot.com/2009/10/support-for-generator-type-conversion.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/1392748891339897691/posts/default/3500484831875086596?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/1392748891339897691/posts/default/3500484831875086596?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/TheYoungGeneration/~3/kIzhYYGOYZU/support-for-generator-type-conversion.html" title="Support for generator type conversion in Quickcheck" /><author><name>Thomas Jung</name><uri>http://www.blogger.com/profile/13153488692248832865</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>0</thr:total><feedburner:origLink>http://theyougen.blogspot.com/2009/10/support-for-generator-type-conversion.html</feedburner:origLink></entry></feed>

