<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/atom10full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><feed xmlns="http://www.w3.org/2005/Atom" xmlns:openSearch="http://a9.com/-/spec/opensearch/1.1/" xmlns:blogger="http://schemas.google.com/blogger/2008" xmlns:georss="http://www.georss.org/georss" xmlns:gd="http://schemas.google.com/g/2005" xmlns:thr="http://purl.org/syndication/thread/1.0" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" gd:etag="W/&quot;C0AFSH05fSp7ImA9WhBWF0k.&quot;"><id>tag:blogger.com,1999:blog-4583666011870993475</id><updated>2013-04-11T21:41:59.325-07:00</updated><category term="Redmine" /><category term="Fail" /><category term="Sqlite" /><category term="Lucene" /><category term="JUG" /><category term="Tika" /><category term="Java" /><category term="Test" /><category term="Groovy" /><category term="Elasticsearch" /><category term="Development" /><category term="Netbeans" /><category term="Scala" /><category term="Jetty" /><category term="Gradle" /><category term="Git" /><category term="Maven" /><category term="Akka" /><category term="OpenCms" /><category term="Spring" /><category term="Book" /><category term="Ruby on Rails" /><category term="Solr" /><category term="Event" /><category term="Mahout" /><title>Dev Time</title><subtitle type="html">Notes on development issues</subtitle><link rel="http://schemas.google.com/g/2005#feed" type="application/atom+xml" href="http://blog.florian-hopf.de/feeds/posts/default" /><link rel="alternate" type="text/html" href="http://blog.florian-hopf.de/" /><link rel="next" type="application/atom+xml" href="http://www.blogger.com/feeds/4583666011870993475/posts/default?start-index=26&amp;max-results=25&amp;redirect=false&amp;v=2" /><author><name>Florian Hopf</name><uri>http://www.blogger.com/profile/00629881090876630907</uri><email>noreply@blogger.com</email><gd:image 
rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><generator version="7.00" uri="http://www.blogger.com">Blogger</generator><openSearch:totalResults>35</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>25</openSearch:itemsPerPage><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/atom+xml" href="http://feeds.feedburner.com/florian-hopf/UjyC" /><feedburner:info uri="florian-hopf/ujyc" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><entry gd:etag="W/&quot;CUYDR3o-fSp7ImA9WhBSEks.&quot;"><id>tag:blogger.com,1999:blog-4583666011870993475.post-1445571298128382344</id><published>2013-02-19T00:39:00.001-08:00</published><updated>2013-02-19T00:39:36.455-08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2013-02-19T00:39:36.455-08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="Event" /><title>Softwerkskammer Rhein-Main Open Space</title><content type="html">&lt;p&gt;On Saturday I attended an &lt;a href="http://rhein-main-openspace.softwerkskammer.de/" target="_blank"&gt;Open Space in Wiesbaden&lt;/a&gt;, organized by members of Softwerkskammer Rhein-Main, a very active chapter of the German software craftsmanship community. The event took place in the offices of &lt;a href="http://www.seibert-media.net/" target="_blank"&gt;Seibert Media&lt;/a&gt; above a shopping mall, with a nice view of the city.&lt;/p&gt;

&lt;h4&gt;The Format&lt;/h4&gt;

&lt;p&gt;&lt;a href="http://en.wikipedia.org/wiki/Open-space_meeting" target="_blank"&gt;Open Space conferences&lt;/a&gt; are special as there is no predefined agenda. All the attendees can bring ideas and propose those in the opening session and choose a time slot and room. Sessions are not necessarily normal presentations but rather discussions so it's even OK to just propose a question that you have or a topic you'd like to learn more about from the attendees. Also, there are some guidelines and rules: sessions don't need to start and end in time, you can always leave a session in case you feel you can't contribute and you shouldn't be disappointed if nobody shows up for your proposed session.&lt;/p&gt;

&lt;h4&gt;Personal Kanban&lt;/h4&gt;

&lt;p&gt;Dennis Traub presented a session on &lt;a href="http://www.personalkanban.com/pk/" target="_blank"&gt;Personal Kanban&lt;/a&gt;. As I had already done Kanban-style development in one project, I was eager to learn how to apply the principles to personal organization. Basically it all works the same as normal Kanban. Tasks are visualized on a board where a swimlane defines the state of a task, with work items flowing from left (todo) to right (done). You can define swimlanes to fit your habits, e.g. one for todos, one for in progress and one for blocked. The in progress lane needs a Work in Progress limit, which is the number of tasks you start and process in parallel. An important aspect is that you don't have to put all your backlog items into the todo lane; you can also keep them in a separate place. This keeps you from getting overwhelmed when looking at the board.&lt;/p&gt;

&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://2.bp.blogspot.com/-nnBQsluqZeM/USMe5pA2fhI/AAAAAAAAAHI/WaRA1hgL1uA/s1600/2013-02-16+12.15.04.jpg" imageanchor="1" &gt;&lt;img border="0" src="http://2.bp.blogspot.com/-nnBQsluqZeM/USMe5pA2fhI/AAAAAAAAAHI/WaRA1hgL1uA/s320/2013-02-16+12.15.04.jpg" vertical-align="middle"/&gt;&lt;/a&gt;
&lt;/div&gt;

&lt;p&gt;Kanban sounds like a good way of organizing your daily life. For me personally the biggest hindrance is that I am working from my living room and I'd rather not put a Kanban board there. If I had a separate office, I guess I'd try it immediately.&lt;/p&gt;

&lt;h4&gt;Open Source&lt;/h4&gt;

&lt;p&gt;An attendee wanted to hear about experiences with Open Source communities. Two full time committers, &lt;a href="http://www.olivergierke.de/" target="_blank"&gt;Ollie&lt;/a&gt; for Spring and Marcel for Eclipse, shared some of theirs. I am still surprised that a lot of Open Source projects have quite a few bugs in their trackers that could easily be fixed by newcomers. A lot of people like Open Source software, but not that many seem to be interested in contributing to a project continuously. Most of the interaction with users in the issue trackers consists of one-time reports: people report one bug and move on. Even for big projects like Spring and Eclipse it's hard to find committers. One way to motivate people is to organize hack days where users learn to work with the sources of the project, but this also needs quite a bit of preparation.&lt;/p&gt;

&lt;h4&gt;Freelancing&lt;/h4&gt;

&lt;p&gt;The topic of freelancing came up throughout the day. &lt;a href="http://tckr.cc/" target="_blank"&gt;Markus Tacker&lt;/a&gt; presented his idea of the kybernetic agency, a plan to form a freelance network of people who can work on projects together. We discussed benefits and possible problems, mainly legal ones. It was quite an inspiring session that also made me think about the difference between freelancing in the Java enterprise world and in PHP development. Most of the freelancers I know would prefer not to work 5 days a week exclusively for one client, but that is often a prerequisite for projects in the enterprise world.&lt;/p&gt;

&lt;h4&gt;Learning&lt;/h4&gt;

&lt;p&gt;Learning is a topic that is very important to me, so I proposed a session on it. During the last months of my employment at synyx I had already switched from 5 to 4 days a week because I felt the need to invest more time in learning, which is often not possible when working on client projects. Even now as a freelancer I keep one day a week for learning only. What works best for me is writing blog posts that contain some sample code: I can build something, and writing the post makes sure that I have a deep understanding of the topic I am writing about. Other people also said that the most important aspect is to have something to work on; reading or watching screencasts alone is not sustainable. I also liked the technique of another freelancer: whenever he notices that he could do something differently on the current project, he stops tracking the time for the customer and tries to find ways to improve the project, probably learning a new approach. As a freelancer you often do this implicitly, spending some of your spare time thinking about client work, but I like this explicit approach.&lt;/p&gt;

&lt;h4&gt;Summary&lt;/h4&gt;

&lt;p&gt;All in all this was a really fruitful, but also exhausting, day. Though I chose meta topics exclusively, I gained a lot from attending. Thanks a lot to the organizers (mainly &lt;a href="http://www.squeakyvessel.com/" target="_blank"&gt;Benjamin&lt;/a&gt;), moderators, sponsors and all the attendees who made this event possible. I am looking forward to meeting a lot of the people again at &lt;a href="http://www.socrates-conference.de/" target="_blank"&gt;Socrates&lt;/a&gt; this year.&lt;/p&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog.florian-hopf.de/feeds/1445571298128382344/comments/default" title="Kommentare zum Post" /><link rel="replies" type="text/html" href="http://blog.florian-hopf.de/2013/02/softwerkskammer-rhein-main-open-space.html#comment-form" title="0 Kommentare" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/4583666011870993475/posts/default/1445571298128382344?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/4583666011870993475/posts/default/1445571298128382344?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/florian-hopf/UjyC/~3/GTb6LWHmicY/softwerkskammer-rhein-main-open-space.html" title="Softwerkskammer Rhein-Main Open Space" /><author><name>Florian Hopf</name><uri>http://www.blogger.com/profile/00629881090876630907</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://2.bp.blogspot.com/-nnBQsluqZeM/USMe5pA2fhI/AAAAAAAAAHI/WaRA1hgL1uA/s72-c/2013-02-16+12.15.04.jpg" height="72" width="72" /><thr:total>0</thr:total><feedburner:origLink>http://blog.florian-hopf.de/2013/02/softwerkskammer-rhein-main-open-space.html</feedburner:origLink></entry><entry 
gd:etag="W/&quot;DUcESH0ycSp7ImA9WhNaF0w.&quot;"><id>tag:blogger.com,1999:blog-4583666011870993475.post-5060361724781563749</id><published>2013-02-01T03:10:00.000-08:00</published><updated>2013-02-01T03:10:09.399-08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2013-02-01T03:10:09.399-08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="Book" /><category scheme="http://www.blogger.com/atom/ns#" term="Gradle" /><title>Book Review:  Gradle Effective Implementation Guide</title><content type="html">&lt;i&gt;PacktPub kindly offered me a free review edition of &lt;a target="_blank" href="http://www.packtpub.com/gradle-effective-implementation-guide/book"&gt;Gradle Effective Implementation Guide&lt;/a&gt; written by &lt;a target="_blank" href="http://mrhaki.blogspot.de/"&gt;mrhaki Hubert Klein Ikkink&lt;/a&gt;. As I planned to read it anyway I agreed to write a review of it.&lt;/i&gt;

&lt;p&gt;Maven was huge for Java development. It brought dependency management, sane conventions and platform independent builds to the mainstream. If there is a Maven pom file available for an open source project, you can be fairly sure that you'll manage to build it on your local machine in no time.&lt;/p&gt;

&lt;p&gt;But there are cases when it doesn't work that well. Its phase model is rather strict and the one-artifact-per-build restriction can get in your way for more unusual build setups. You can work around some of these problems using profiles and assemblies, but it feels like Maven is primarily suited to a certain kind of project.&lt;/p&gt;

&lt;p&gt;&lt;a target="_blank" href="http://www.gradle.org/"&gt;Gradle&lt;/a&gt; is different. It's more flexible but there's also a learning curve involved. Groovy as its build DSL is easy to read but probably not that easy to write at first because there are often multiple ways to do something. As a standard Java developer like me you might be unsure about the proper way of doing something.&lt;/p&gt;

&lt;p&gt;There are a lot of helpful resources online, namely the &lt;a target="_blank" href="http://forums.gradle.org/gradle"&gt;forum&lt;/a&gt; and the &lt;a target="_blank" href="http://www.gradle.org/docs/current/userguide/userguide.html"&gt;excellent user guide&lt;/a&gt;, but as I prefer to read longer texts offline I am really glad that there is now a book available that contains extensive information and can get you started with Gradle.&lt;/p&gt;

&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a target="_blank" href="http://www.packtpub.com/gradle-effective-implementation-guide/book" style="clear:left; float:left;margin-right:1em; margin-bottom:1em"&gt;&lt;img border="0" height="320" width="259" src="http://1.bp.blogspot.com/-e8x0L35XNKc/UQuc7l1B__I/AAAAAAAAAG4/wwbHwriWSeU/s320/gradle-effective-implementation-guide.jpg" /&gt;&lt;/a&gt;&lt;/div&gt;

&lt;h4&gt;Content&lt;/h4&gt;

&lt;p&gt;The book starts with a general introduction to Gradle. You'll get a high level overview of its features, learn how to install it and write your first build file. You'll also learn some important options of the gradle executable that I hadn't been aware of.&lt;/p&gt;

&lt;p&gt;Chapter 2 explains tasks and how to write build files. This is a very important chapter if you are not that deep into the Groovy language. You'll learn about the implicitly available Task and Project instances and the different ways of accessing methods and properties and of defining tasks and dependencies between them.&lt;/p&gt;

&lt;p&gt;Working with files is an important part of any build system. Chapter 3 contains detailed information on accessing and modifying files, file collections and file trees. This is also where the benefit of using Groovy becomes really obvious. The ease of working with collections can lead to very concise build definitions, while you still have all the power of Groovy and the JVM at hand. The different log levels are useful to know and can come in handy when you'd like to diagnose a build.&lt;/p&gt;

&lt;p&gt;While understanding tasks is an important foundation for working with Gradle, you most likely want to use it to build code written in some programming language. Nearly all of the remaining chapters cover different aspects of builds for JVM languages. Chapter 4 starts with a look at the Java plugin and its additional concepts. You'll see how you can compile and package Java applications and how to work with sourceSets.&lt;/p&gt;

&lt;p&gt;Hardly any application is an island. The Java world provides masses of useful libraries that can help you build your application. Proper dependency management, as introduced in Chapter 5, is important for easy build setups and for making sure that you do not introduce incompatible combinations of libraries. Gradle supports Maven, Ivy and local file based repositories. Configurations are used to group dependencies, e.g. to define dependencies that are only necessary for tests. If you need to influence the version you are retrieving for a certain dependency, you can configure resolution strategies, version ranges and exclusions for transitive dependencies.&lt;/p&gt;

&lt;p&gt;Automated testing is a crucial part of any modern software development process. Gradle can work with JUnit and TestNG out of the box. Test execution times can be improved a lot by the incremental build support and the parallelization of tests. I guess this can lead to dramatically shorter build times, something I plan to try on an example project with a lot of tests in the near future. This chapter also introduces the different ways to run an application and create distributions, and shows how to publish artifacts.&lt;/p&gt;

&lt;p&gt;The next chapter shows you how to structure your application as separate projects. Gradle has clever ways to find out which projects need to be rebuilt before and after building a certain project.&lt;/p&gt;

&lt;p&gt;Chapter 8 contains information on how to work with Scala and Groovy code. The necessary compiler versions can be defined in the build so there is no need to have additional installations. I've heard good things about the Scala integration so Gradle seems to be a viable alternative to sbt.&lt;/p&gt;

&lt;p&gt;The check task can be used to gather metrics on your project using many of the available open source projects for code quality measurement. Chapter 9 shows you how to include tools like Checkstyle, PMD and FindBugs to analyze your project sources, either standalone or by sending data to Sonar.&lt;/p&gt;

&lt;p&gt;If you need additional functionality that is not available you can start implementing your own tasks and plugins. Chapter 10 introduces the important classes for writing custom plugins and how to use them from Groovy and Java.&lt;/p&gt;

&lt;p&gt;Gradle can be used on several Continuous Integration systems. As I've worked exclusively with Hudson/Jenkins in recent years, it was interesting to also read about the commercial alternatives TeamCity and Bamboo in Chapter 11.&lt;/p&gt;

&lt;p&gt;The final chapter contains a lot of in-depth information on the Eclipse and IDEA plugins. Honestly, it contains more information on the Eclipse file format than I wanted to know, but I guess that can be really useful for some users. Unfortunately the &lt;a target="_blank" href="https://github.com/kelemen/netbeans-gradle-project"&gt;excellent Netbeans plugin&lt;/a&gt; is not described in the book.&lt;/p&gt;

&lt;h4&gt;Summary&lt;/h4&gt;

&lt;p&gt;The book is an excellent introduction to working effectively with Gradle. It has helped me to get a far better understanding of the concepts. If you are thinking about using Gradle or have already started working with it, I highly recommend that you &lt;a target="_blank" href="http://www.packtpub.com/gradle-effective-implementation-guide/book"&gt;get a copy&lt;/a&gt;. There are a lot of detailed example files that you can use immediately. Many of those are very close to real world use cases and can help you think about additional ways Gradle can be useful for organizing your builds.&lt;/p&gt;

</content><link rel="replies" type="application/atom+xml" href="http://blog.florian-hopf.de/feeds/5060361724781563749/comments/default" title="Kommentare zum Post" /><link rel="replies" type="text/html" href="http://blog.florian-hopf.de/2013/02/book-review-gradle-effective.html#comment-form" title="0 Kommentare" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/4583666011870993475/posts/default/5060361724781563749?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/4583666011870993475/posts/default/5060361724781563749?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/florian-hopf/UjyC/~3/76POs3Poy5g/book-review-gradle-effective.html" title="Book Review:  Gradle Effective Implementation Guide" /><author><name>Florian Hopf</name><uri>http://www.blogger.com/profile/00629881090876630907</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://1.bp.blogspot.com/-e8x0L35XNKc/UQuc7l1B__I/AAAAAAAAAG4/wwbHwriWSeU/s72-c/gradle-effective-implementation-guide.jpg" height="72" width="72" /><thr:total>0</thr:total><feedburner:origLink>http://blog.florian-hopf.de/2013/02/book-review-gradle-effective.html</feedburner:origLink></entry><entry gd:etag="W/&quot;D0QHSHc6fSp7ImA9WhNaEUw.&quot;"><id>tag:blogger.com,1999:blog-4583666011870993475.post-4384756732308598401</id><published>2013-01-24T00:49:00.000-08:00</published><updated>2013-01-25T04:02:19.915-08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2013-01-25T04:02:19.915-08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="Solr" /><title>Make your Filters Match: Faceting in Solr</title><content type="html">&lt;p&gt;&lt;a href="http://searchhub.org/2009/09/02/faceted-search-with-solr/" 
target="_blank"&gt;Facets&lt;/a&gt; are a great search feature that let users easily navigate to the documents they are looking for. Solr makes it really easy to use them though when naively querying for facet values you might see some unexpected behaviour. Read on to learn the basics of what is happening when you are passing in filter queries for faceting. Also, I'll show how you can leverage local params to choose a different query parser when selecting facet values.&lt;/p&gt;

&lt;h4&gt;Introduction&lt;/h4&gt;

&lt;p&gt;Facets are a way to display categories next to a user's search results, often with a count of how many results are in each category. The user can then select one of those facet values to retrieve only the results that are assigned to this category. This way he doesn't have to know what category he is looking for when entering the search term, as all the available categories are delivered with the search results. This approach is really popular on sites like Amazon and eBay and is a great way to guide the user.&lt;/p&gt;

&lt;p&gt;Solr brought faceting to the Lucene world and arguably the feature was an important driving factor for its success (&lt;a href="http://shaierera.blogspot.com/2012/11/lucene-facets-part-1.html" target="_blank"&gt;Lucene 3.4 introduced faceting as well&lt;/a&gt;). Facets can be built from terms in the index, custom queries and ranges, though in this post we will only look at field facets.&lt;/p&gt; 

&lt;p&gt;As a very simple example consider this schema definition:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;fields&amp;gt;&lt;br/&gt;    &amp;lt;field name=&amp;quot;id&amp;quot; type=&amp;quot;string&amp;quot; indexed=&amp;quot;true&amp;quot; stored=&amp;quot;true&amp;quot; required=&amp;quot;true&amp;quot; multiValued=&amp;quot;false&amp;quot; /&amp;gt; &lt;br/&gt;    &amp;lt;field name=&amp;quot;text&amp;quot; type=&amp;quot;text_general&amp;quot; indexed=&amp;quot;true&amp;quot; stored=&amp;quot;true&amp;quot;/&amp;gt;&lt;br/&gt;    &amp;lt;field name=&amp;quot;author&amp;quot; type=&amp;quot;string&amp;quot; indexed=&amp;quot;true&amp;quot; stored=&amp;quot;false&amp;quot;/&amp;gt;&lt;br/&gt;&amp;lt;/fields&amp;gt;&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;There are three fields: the id, a text field holding the title that we'd probably like to search on, and the author. The author is defined as a string field, which means it is not analyzed at all. The faceting mechanism uses the indexed term and not a stored value, so we want to make sure that the original value is preserved. I explicitly don't store the author information to make it clear that we are working with the indexed value.&lt;/p&gt;

&lt;p&gt;Let's index some book data with curl (see &lt;a href="https://github.com/fhopf/solr-facet-example" target="_blank"&gt;this GitHub repo&lt;/a&gt; for the complete example including some unit tests that execute the same functionality using Java).&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;curl http://localhost:8082/solr/update -H &amp;quot;Content-Type: text/xml&amp;quot; --data-binary \&lt;br/&gt;    '&amp;lt;add&amp;gt;&amp;lt;doc&amp;gt;&lt;br/&gt;            &amp;lt;field name=&amp;quot;id&amp;quot;&amp;gt;1&amp;lt;/field&amp;gt;&lt;br/&gt;            &amp;lt;field name=&amp;quot;text&amp;quot;&amp;gt;On the Shortness of Life&amp;lt;/field&amp;gt;&lt;br/&gt;            &amp;lt;field name=&amp;quot;author&amp;quot;&amp;gt;Seneca&amp;lt;/field&amp;gt;&lt;br/&gt;    &amp;lt;/doc&amp;gt; &lt;br/&gt;    &amp;lt;doc&amp;gt;&lt;br/&gt;            &amp;lt;field name=&amp;quot;id&amp;quot;&amp;gt;2&amp;lt;/field&amp;gt;&lt;br/&gt;            &amp;lt;field name=&amp;quot;text&amp;quot;&amp;gt;What I Talk About When I Talk About Running&amp;lt;/field&amp;gt;&lt;br/&gt;            &amp;lt;field name=&amp;quot;author&amp;quot;&amp;gt;Haruki Murakami&amp;lt;/field&amp;gt;&lt;br/&gt;    &amp;lt;/doc&amp;gt; &lt;br/&gt;    &amp;lt;doc&amp;gt;&lt;br/&gt;            &amp;lt;field name=&amp;quot;id&amp;quot;&amp;gt;3&amp;lt;/field&amp;gt;&lt;br/&gt;            &amp;lt;field name=&amp;quot;text&amp;quot;&amp;gt;The Dude and the Zen Master&amp;lt;/field&amp;gt;&lt;br/&gt;            &amp;lt;field name=&amp;quot;author&amp;quot;&amp;gt;Jeff &amp;quot;The Dude&amp;quot; Bridges&amp;lt;/field&amp;gt;&lt;br/&gt;    &amp;lt;/doc&amp;gt;&lt;br/&gt;    &amp;lt;/add&amp;gt;'&lt;br/&gt;curl http://localhost:8082/solr/update -H &amp;quot;Content-Type: text/xml&amp;quot; --data-binary '&amp;lt;commit /&amp;gt;'&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;And verify that the documents are available:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;curl http://localhost:8082/solr/query?q=*:*&lt;br/&gt;{&lt;br/&gt;  &amp;quot;responseHeader&amp;quot;:{&lt;br/&gt;    &amp;quot;status&amp;quot;:0,&lt;br/&gt;    &amp;quot;QTime&amp;quot;:3,&lt;br/&gt;    &amp;quot;params&amp;quot;:{&lt;br/&gt;      &amp;quot;q&amp;quot;:&amp;quot;*:*&amp;quot;}},&lt;br/&gt;  &amp;quot;response&amp;quot;:{&amp;quot;numFound&amp;quot;:3,&amp;quot;start&amp;quot;:0,&amp;quot;docs&amp;quot;:[&lt;br/&gt;      {&lt;br/&gt;        &amp;quot;id&amp;quot;:&amp;quot;1&amp;quot;,&lt;br/&gt;        &amp;quot;text&amp;quot;:&amp;quot;On the Shortness of Life&amp;quot;},&lt;br/&gt;      {&lt;br/&gt;        &amp;quot;id&amp;quot;:&amp;quot;2&amp;quot;,&lt;br/&gt;        &amp;quot;text&amp;quot;:&amp;quot;What I Talk About When I Talk About Running&amp;quot;},&lt;br/&gt;      {&lt;br/&gt;        &amp;quot;id&amp;quot;:&amp;quot;3&amp;quot;,&lt;br/&gt;        &amp;quot;text&amp;quot;:&amp;quot;The Dude and the Zen Master&amp;quot;}]&lt;br/&gt;  }}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;I'll omit parts of the response in the following examples. We can also have a look at the shiny new administration view of Solr 4 to see all terms that are indexed for the field author.&lt;/p&gt;

&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://4.bp.blogspot.com/-Gyieq4YvHEo/UQDpZCNB82I/AAAAAAAAAGk/AVL8zIszDzE/s1600/author-fields.png" imageanchor="1" style="margin-left:1em; margin-right:1em"&gt;&lt;img border="0" height="238" width="320" src="http://4.bp.blogspot.com/-Gyieq4YvHEo/UQDpZCNB82I/AAAAAAAAAGk/AVL8zIszDzE/s320/author-fields.png" /&gt;&lt;/a&gt;&lt;/div&gt;

&lt;p&gt;Each of the author names is indexed as one term.&lt;/p&gt;

&lt;h4&gt;Faceting&lt;/h4&gt;

&lt;p&gt;Let's move on to the faceting part. To let the user drill down on search results there are two steps involved. First you tell Solr that you would like to retrieve facets with the results. Facets are contained in an extra section of the response and consist of the indexed term as well as a count. As with most Solr parameters you can either send the necessary options with the query or preconfigure them in solrconfig.xml. This query has faceting on the author field enabled:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;curl &amp;quot;http://localhost:8082/solr/query?q=*:*&amp;amp;facet=on&amp;amp;facet.field=author&amp;quot;
{
  &amp;quot;responseHeader&amp;quot;:{...},
  &amp;quot;response&amp;quot;:{&amp;quot;numFound&amp;quot;:3,&amp;quot;start&amp;quot;:0,&amp;quot;docs&amp;quot;:[
      {
        &amp;quot;id&amp;quot;:&amp;quot;1&amp;quot;,
        &amp;quot;text&amp;quot;:&amp;quot;On the Shortness of Life&amp;quot;},
      {
        &amp;quot;id&amp;quot;:&amp;quot;2&amp;quot;,
        &amp;quot;text&amp;quot;:&amp;quot;What I Talk About When I Talk About Running&amp;quot;},
      {
        &amp;quot;id&amp;quot;:&amp;quot;3&amp;quot;,
        &amp;quot;text&amp;quot;:&amp;quot;The Dude and the Zen Master&amp;quot;}]
  },
  &amp;quot;facet_counts&amp;quot;:{
    &amp;quot;facet_queries&amp;quot;:{},
    &amp;quot;facet_fields&amp;quot;:{
      &amp;quot;author&amp;quot;:[
        &amp;quot;Haruki Murakami&amp;quot;,1,
        &amp;quot;Jeff \&amp;quot;The Dude\&amp;quot; Bridges&amp;quot;,1,
        &amp;quot;Seneca&amp;quot;,1]},
    &amp;quot;facet_dates&amp;quot;:{},
    &amp;quot;facet_ranges&amp;quot;:{}}}&lt;/code&gt;&lt;/pre&gt;
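&lt;p&gt;As the response shows, the values of a field facet come back as one flat array that alternates between the term and its count. If you consume the response in Java instead of with curl, a small sketch like the following could turn that array into display labels. The class and method names are hypothetical, and the sketch assumes that you have already extracted the array from the JSON with whatever parser you are using.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class FacetLabels {

    // Hypothetical helper: turns the flat facet array, e.g.
    // ["Haruki Murakami", 1, "Seneca", 1], into one label per line such as "Seneca (1)".
    public static String toLabels(Object[] flat) {
        // Terms and counts always come in pairs, so an odd length means malformed input.
        if (flat.length % 2 != 0) {
            throw new IllegalArgumentException("expected alternating term/count entries");
        }
        return IntStream.range(0, flat.length / 2)
                .mapToObj(i -> flat[2 * i] + " (" + flat[2 * i + 1] + ")")
                .collect(Collectors.joining("\n"));
    }
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Called with the author array from the response above, it would return one line per author, e.g. &lt;code&gt;Haruki Murakami (1)&lt;/code&gt;, in the order Solr returned them.&lt;/p&gt;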

&lt;p&gt;And this is what a configuration in solrconfig looks like:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;requestHandler name=&amp;quot;/select&amp;quot; class=&amp;quot;solr.SearchHandler&amp;quot;&amp;gt;
  &amp;lt;lst name=&amp;quot;defaults&amp;quot;&amp;gt;
    &amp;lt;str name=&amp;quot;q&amp;quot;&amp;gt;*:*&amp;lt;/str&amp;gt;  
    &amp;lt;str name=&amp;quot;echoParams&amp;quot;&amp;gt;none&amp;lt;/str&amp;gt;
    &amp;lt;int name=&amp;quot;rows&amp;quot;&amp;gt;10&amp;lt;/int&amp;gt;
    &amp;lt;str name=&amp;quot;df&amp;quot;&amp;gt;text&amp;lt;/str&amp;gt;
    &amp;lt;str name=&amp;quot;facet&amp;quot;&amp;gt;on&amp;lt;/str&amp;gt;
    &amp;lt;str name=&amp;quot;facet.field&amp;quot;&amp;gt;author&amp;lt;/str&amp;gt;
    &amp;lt;str name=&amp;quot;facet.mincount&amp;quot;&amp;gt;1&amp;lt;/str&amp;gt;
  &amp;lt;/lst&amp;gt;
&amp;lt;/requestHandler&amp;gt;&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This way we don't have to pass the parameters with the query anymore and can see which parts of the query change.&lt;/p&gt;

&lt;h4&gt;Common Filtering&lt;/h4&gt;

&lt;p&gt;When a user chooses a facet value, you issue the same query again, this time adding a filter query that restricts the search results to those documents that have this value set for the field. In our case the user would only see books by one particular author. Let's start simple and pretend that a user can't handle the massive amount of 3 search results and is only interested in books by Seneca:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;curl 'http://localhost:8082/solr/select?fq=author:Seneca'
{
  &amp;quot;responseHeader&amp;quot;:{...},
  &amp;quot;response&amp;quot;:{&amp;quot;numFound&amp;quot;:1,&amp;quot;start&amp;quot;:0,&amp;quot;docs&amp;quot;:[
      {
        &amp;quot;id&amp;quot;:&amp;quot;1&amp;quot;,
        &amp;quot;text&amp;quot;:&amp;quot;On the Shortness of Life&amp;quot;}]
  },
  &amp;quot;facet_counts&amp;quot;:{
    &amp;quot;facet_queries&amp;quot;:{},
    &amp;quot;facet_fields&amp;quot;:{
      &amp;quot;author&amp;quot;:[
        &amp;quot;Seneca&amp;quot;,1]},
    &amp;quot;facet_dates&amp;quot;:{},
    &amp;quot;facet_ranges&amp;quot;:{}}}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Works fine. We added a filter query that restricts the results to only those written by Seneca. Note that there is only one facet value left because the search results don't contain any books by other authors. Let's see what happens when we try to filter the results to see only books by Haruki Murakami. We need to URL-encode the blank; the rest of the query stays the same:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;curl 'http://localhost:8082/solr/select?fq=author:Haruki%20Murakami'
{
  &amp;quot;responseHeader&amp;quot;:{...},
  &amp;quot;response&amp;quot;:{&amp;quot;numFound&amp;quot;:0,&amp;quot;start&amp;quot;:0,&amp;quot;docs&amp;quot;:[]
  },
  &amp;quot;facet_counts&amp;quot;:{
    &amp;quot;facet_queries&amp;quot;:{},
    &amp;quot;facet_fields&amp;quot;:{
      &amp;quot;author&amp;quot;:[]},
    &amp;quot;facet_dates&amp;quot;:{},
    &amp;quot;facet_ranges&amp;quot;:{}}}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;No results. Why is that? The default query parser for filter queries is the Lucene query parser. It splits the query on whitespace before any analysis happens, so even though the author field is not analyzed, the parsed query is not the one we probably expected. Instead of a single term query as in our first example, the parser produces a boolean query that consists of two term queries: &lt;code&gt;author:Haruki text:murakami&lt;/code&gt;. If you are familiar with the &lt;a href="http://lucene.apache.org/core/4_0_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html" target="_blank"&gt;Lucene query syntax&lt;/a&gt; this won't surprise you: if you prefix a term with a field name and a colon, it is searched on that field; otherwise it is searched on the default field we declared in solrconfig.xml.&lt;/p&gt;
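&lt;p&gt;To make the splitting visible, here is a toy model in Java. It is only a sketch: the class name ClauseSplitter is made up, and unlike the real classic query parser it neither runs terms through the field's analyzer (which is why the real query contains the lowercased text:murakami) nor understands operators such as AND or OR. It only handles field prefixes, quoted phrases and backslash escapes.&lt;/p&gt;

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy model of how the classic Lucene query parser splits a query string into
// clauses. Hypothetical helper: no analysis (no lowercasing), no AND/OR/NOT.
final class ClauseSplitter {

    // optional "field:" prefix, followed by either a quoted phrase or a run of
    // non-whitespace characters in which "\x" (e.g. an escaped space) counts as one char
    private static final Pattern CLAUSE =
            Pattern.compile("(?:(\\w+):)?(\"[^\"]*\"|(?:\\\\.|\\S)+)");

    static String split(String query, String defaultField) {
        Matcher m = CLAUSE.matcher(query);
        StringBuilder clauses = new StringBuilder();
        while (m.find()) {
            // a clause without an explicit field is searched on the default field (df)
            String field = m.group(1) != null ? m.group(1) : defaultField;
            if (clauses.length() != 0) {
                clauses.append(' ');
            }
            clauses.append(field).append(':').append(m.group(2));
        }
        return clauses.toString();
    }
}
```

&lt;p&gt;For author:Haruki Murakami it returns author:Haruki text:Murakami, while a quoted phrase or a backslash-escaped space keeps the whole name in a single clause on the author field.&lt;/p&gt;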

&lt;p&gt;How can we fix it? Simple, just turn it into a phrase by surrounding the words with double quotes:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;curl 'http://localhost:8082/solr/select?fq=author:&amp;quot;Haruki%20Murakami&amp;quot;'
{
  &amp;quot;responseHeader&amp;quot;:{...},
  &amp;quot;response&amp;quot;:{&amp;quot;numFound&amp;quot;:1,&amp;quot;start&amp;quot;:0,&amp;quot;docs&amp;quot;:[
      {
        &amp;quot;id&amp;quot;:&amp;quot;2&amp;quot;,
        &amp;quot;text&amp;quot;:&amp;quot;What I Talk About When I Talk About Running&amp;quot;}]
  },
  &amp;quot;facet_counts&amp;quot;:{
    &amp;quot;facet_queries&amp;quot;:{},
    &amp;quot;facet_fields&amp;quot;:{
      &amp;quot;author&amp;quot;:[
        &amp;quot;Haruki Murakami&amp;quot;,1]},
    &amp;quot;facet_dates&amp;quot;:{},
    &amp;quot;facet_ranges&amp;quot;:{}}}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Alternatively, you can escape the space with a backslash, which yields the same result:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;curl 'http://localhost:8082/solr/select?fq=author:Haruki\%20Murakami'&lt;/code&gt;&lt;/pre&gt;
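&lt;p&gt;If you build these URLs in code instead of typing them into curl, java.net.URLEncoder can do the percent-encoding. A minimal sketch (the helper name FilterQueryUrl is hypothetical): URLEncoder applies form encoding, so a space becomes + rather than %20, which servlet containers such as Jetty decode back to a space in query parameters. Encoding only protects the value on its way over HTTP; the query parser still sees the decoded string, so the quotes or backslashes from above are still needed.&lt;/p&gt;

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: builds a select URL with a single, percent-encoded filter query.
final class FilterQueryUrl {

    static String build(String selectUrl, String filterQuery) {
        try {
            // form encoding: ':' becomes %3A, '"' becomes %22, '\' becomes %5C, a space becomes '+'
            return selectUrl + "?fq=" + URLEncoder.encode(filterQuery, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // cannot happen in practice, every JVM is required to support UTF-8
            throw new IllegalStateException(e);
        }
    }
}
```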

&lt;p&gt;Fun fact: I am not that good at picking examples. When filtering on our last author we are in for a surprise (at least I scratched my head for a while):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;curl 'http://localhost:8082/solr/select?fq=author:Jeff%20&amp;quot;The%20Dude&amp;quot;%20Bridges'
{
  &amp;quot;responseHeader&amp;quot;:{...},
  &amp;quot;response&amp;quot;:{&amp;quot;numFound&amp;quot;:1,&amp;quot;start&amp;quot;:0,&amp;quot;docs&amp;quot;:[
      {
        &amp;quot;id&amp;quot;:&amp;quot;3&amp;quot;,
        &amp;quot;text&amp;quot;:&amp;quot;The Dude and the Zen Master&amp;quot;}]
  },
  &amp;quot;facet_counts&amp;quot;:{
    &amp;quot;facet_queries&amp;quot;:{},
    &amp;quot;facet_fields&amp;quot;:{
      &amp;quot;author&amp;quot;:[
        &amp;quot;Jeff \&amp;quot;The Dude\&amp;quot; Bridges&amp;quot;,1]},
    &amp;quot;facet_dates&amp;quot;:{},
    &amp;quot;facet_ranges&amp;quot;:{}}}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This seems to work, although we neither turned the value into a phrase nor escaped the spaces. Looking at how the Lucene query parser handles this query shows why it returns a result. As in the last example, it is turned into a boolean query, but only the first clause is executed against the author field. The other two clauses search the default field, and since the clauses are combined with the default OR operator, a match on any of them is enough. In this case the phrase "The Dude" matches the text field: &lt;code&gt;author:Jeff text:"the dude" text:bridges&lt;/code&gt;. If you only want to match on the author field, escape the spaces (and the double quotes) as in the example before:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;curl 'http://localhost:8082/solr/select?fq=author:Jeff\%20\"The\%20Dude\"\%20Bridges'&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;I'll spare you the response.&lt;/p&gt;
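&lt;p&gt;Escaping by hand gets tedious for arbitrary author names. SolrJ ships a helper for this, ClientUtils.escapeQueryChars; the following standalone sketch approximates it without any dependencies. The class name and the exact set of special characters are assumptions based on the Lucene query syntax documentation, not a copy of the SolrJ code.&lt;/p&gt;

```java
// Standalone sketch of escaping Lucene query syntax, similar in spirit to
// SolrJ's ClientUtils.escapeQueryChars. The character list is an assumption.
final class QueryEscaper {

    // special characters of the classic query parser; \u0026 is the ampersand
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|/;\u0026";

    static String escape(String value) {
        StringBuilder escaped = new StringBuilder(value.length() * 2);
        for (char c : value.toCharArray()) {
            // whitespace must be escaped too, otherwise the parser splits the value into clauses
            if (SPECIAL.indexOf(c) != -1 || Character.isWhitespace(c)) {
                escaped.append('\\');
            }
            escaped.append(c);
        }
        return escaped.toString();
    }

    // builds a field:value clause whose value is treated as a single term
    static String fieldQuery(String field, String value) {
        return field + ":" + escape(value);
    }
}
```

&lt;p&gt;Applied to our last author, fieldQuery produces author:Jeff\ \"The\ Dude\"\ Bridges, the same filter as in the curl example above before URL encoding.&lt;/p&gt;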

&lt;h4&gt;Using Local Params to set the Query Parser&lt;/h4&gt;

&lt;p&gt;At ApacheCon Europe in November Erik Hatcher gave &lt;a href="http://archive.apachecon.com/eu2012/presentations/06-Tuesday/PR-Lucene/aceu-2012-query-parsing-tips-and-tricks.pdf" target="_blank"&gt;a really interesting presentation on query parsers in Solr&lt;/a&gt; where he introduced another, probably cleaner way to do this: you can use the &lt;a href="http://wiki.apache.org/solr/LocalParams" target="_blank"&gt;local params syntax&lt;/a&gt; to choose a different query parser. As we have learned, the query parser defaults to the Lucene query parser. You can change the parser for the main query by setting the defType parameter, either as a request parameter or in solrconfig.xml, but I am not aware of a way to set it for filter queries. As our author values are not analyzed, the correct thing to do is to use a TermQuery, which can be built using the &lt;a href="http://lucene.apache.org/solr/4_1_0/solr-core/org/apache/solr/search/TermQParserPlugin.html" target="_blank"&gt;TermQParserPlugin&lt;/a&gt;. To use this parser we can set it explicitly in the filter query:&lt;/p&gt;

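&lt;pre&gt;&lt;code&gt;# -g turns off curl's URL globbing of the curly braces,
# %27 encodes the single quotes so they don't end the shell string
curl -g 'http://localhost:8082/solr/select?fq={!term%20f=author%20v=%27Jeff%20"The%20Dude"%20Bridges%27}'&lt;/code&gt;&lt;/pre&gt;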

&lt;p&gt;For better readability, this is the filter query parameter without the URL encoding:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;fq={!term f=author v='Jeff "The Dude" Bridges'}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The local params are enclosed in curly braces and start with an exclamation mark. The value term is shorthand for type=term, f is the field the TermQuery should be built for, and v is the value. Though this might look quirky at first, it is a really powerful feature, especially since you can reference other request parameters from the local params. Consider this configuration of a request handler:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;requestHandler name=&amp;quot;/selectfiltered&amp;quot; class=&amp;quot;solr.SearchHandler&amp;quot;&amp;gt;
  &amp;lt;lst name=&amp;quot;defaults&amp;quot;&amp;gt;
    &amp;lt;str name=&amp;quot;q&amp;quot;&amp;gt;*:*&amp;lt;/str&amp;gt;  
    &amp;lt;str name=&amp;quot;echoParams&amp;quot;&amp;gt;explicit&amp;lt;/str&amp;gt;
    &amp;lt;int name=&amp;quot;rows&amp;quot;&amp;gt;10&amp;lt;/int&amp;gt;
    &amp;lt;str name=&amp;quot;wt&amp;quot;&amp;gt;json&amp;lt;/str&amp;gt;
    &amp;lt;str name=&amp;quot;indent&amp;quot;&amp;gt;true&amp;lt;/str&amp;gt;
    &amp;lt;str name=&amp;quot;df&amp;quot;&amp;gt;text&amp;lt;/str&amp;gt;
    &amp;lt;str name=&amp;quot;facet&amp;quot;&amp;gt;on&amp;lt;/str&amp;gt;
    &amp;lt;str name=&amp;quot;facet.field&amp;quot;&amp;gt;author&amp;lt;/str&amp;gt;
    &amp;lt;str name=&amp;quot;facet.mincount&amp;quot;&amp;gt;1&amp;lt;/str&amp;gt;
  &amp;lt;/lst&amp;gt;
  &amp;lt;lst name=&amp;quot;appends&amp;quot;&amp;gt;
    &amp;lt;str name=&amp;quot;fq&amp;quot;&amp;gt;{!term f=author v=$author}&amp;lt;/str&amp;gt;
  &amp;lt;/lst&amp;gt;
&amp;lt;/requestHandler&amp;gt;&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The defaults are the same as in the configuration we were using above. Only the appends section is new; it adds its parameters to every request. It contains the same local params we used with curl, but the value is now a reference to the request parameter $author. The author can be passed in cleanly via this aptly named parameter:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;curl 'http://localhost:8082/solr/selectfiltered?author=Jeff%20"The%20Dude"%20Bridges'&lt;/code&gt;&lt;/pre&gt;
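&lt;p&gt;Both variants can also be built from Java. The sketch below is hypothetical (class and method names are made up) and assumes that a single-quoted local params value may contain backslash-escaped quotes and backslashes; check the LocalParams documentation of your Solr version before relying on that. The second method shows why the request handler configuration is attractive: the raw author name needs nothing but URL encoding.&lt;/p&gt;

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class TermFilters {

    // Inline local params: the value is single-quoted, so only the quote character
    // and the backslash need escaping (assumed quoting rules, see above).
    static String inlineFilter(String field, String value) {
        String quoted = value.replace("\\", "\\\\").replace("'", "\\'");
        return "{!term f=" + field + " v='" + quoted + "'}";
    }

    // Request for the /selectfiltered handler: the raw author name is passed as a plain
    // parameter and dereferenced by the appended fq={!term f=author v=$author}.
    static String selectFilteredUrl(String solrBaseUrl, String author) {
        try {
            return solrBaseUrl + "/selectfiltered?author=" + URLEncoder.encode(author, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }
}
```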

&lt;p&gt;There are a lot of powerful features in Solr that are not that commonly used. To see this example in Java, have a look at &lt;a href="https://github.com/fhopf/solr-facet-example" target="_blank"&gt;the GitHub repository for this blog post&lt;/a&gt;.&lt;/p&gt;
</content><link rel="replies" type="application/atom+xml" href="http://blog.florian-hopf.de/feeds/4384756732308598401/comments/default" title="Kommentare zum Post" /><link rel="replies" type="text/html" href="http://blog.florian-hopf.de/2013/01/make-your-filters-match-faceting-in-solr.html#comment-form" title="0 Kommentare" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/4583666011870993475/posts/default/4384756732308598401?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/4583666011870993475/posts/default/4384756732308598401?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/florian-hopf/UjyC/~3/FKyffmDlN4E/make-your-filters-match-faceting-in-solr.html" title="Make your Filters Match: Faceting in Solr" /><author><name>Florian Hopf</name><uri>http://www.blogger.com/profile/00629881090876630907</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://4.bp.blogspot.com/-Gyieq4YvHEo/UQDpZCNB82I/AAAAAAAAAGk/AVL8zIszDzE/s72-c/author-fields.png" height="72" width="72" /><thr:total>0</thr:total><feedburner:origLink>http://blog.florian-hopf.de/2013/01/make-your-filters-match-faceting-in-solr.html</feedburner:origLink></entry><entry gd:etag="W/&quot;C0MFQX88fCp7ImA9WhNUGE0.&quot;"><id>tag:blogger.com,1999:blog-4583666011870993475.post-2776516946770882366</id><published>2013-01-09T23:03:00.000-08:00</published><updated>2013-01-09T23:03:30.174-08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2013-01-09T23:03:30.174-08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="Test" /><category scheme="http://www.blogger.com/atom/ns#" term="Elasticsearch" /><title>JUnit Rule for ElasticSearch</title><content type="html">&lt;p&gt;While I am using &lt;a 
href="http://lucene.apache.org/solr/" target="_blank"&gt;Solr&lt;/a&gt; a lot in my current engagement, I recently started a pet project with &lt;a href="http://www.elasticsearch.org/" target="_blank"&gt;ElasticSearch&lt;/a&gt; to learn more about it. Some of its functionality is rather different from Solr, so there is quite a bit of experimentation involved. I like to start small and write tests when I want to find out how things work (see &lt;a href="http://blog.florian-hopf.de/2012/06/running-and-testing-solr-with-gradle.html" target="_blank"&gt;this post on how to write tests for Solr&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;ElasticSearch internally uses &lt;a href="http://testng.org/doc/index.html" target="_blank"&gt;TestNG&lt;/a&gt; and the test classes are not available in the distributed jar files. Fortunately it is really easy to start an ElasticSearch instance from within a test so it's no problem to do something similar in JUnit. Felix Müller posted some &lt;a href="http://cupofjava.de/blog/2012/11/27/embedded-elasticsearch-server-for-tests/" target="_blank"&gt;useful code snippets&lt;/a&gt; on how to do this, obviously targeted at a Maven build. The ElasticSearch instance is started in a setUp method and stopped in a tearDown method:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;private EmbeddedElasticsearchServer embeddedElasticsearchServer;

@Before
public void startEmbeddedElasticsearchServer() {
    embeddedElasticsearchServer = new EmbeddedElasticsearchServer();
}

@After
public void shutdownEmbeddedElasticsearchServer() {
    embeddedElasticsearchServer.shutdown();
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;As it is rather cumbersome to add these methods to every test, I turned the code into a &lt;a href="http://kentbeck.github.com/junit/javadoc/4.10/org/junit/rules/TestRule.html" target="_blank"&gt;JUnit rule&lt;/a&gt;. Rules can execute code before and after a test is run and influence its execution. There are some base classes available that make it really easy to get started with custom rules.&lt;/p&gt;
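&lt;p&gt;Conceptually, ExternalResource wraps each test in a before/after pair, with the cleanup in a finally block. The following plain Java model shows that contract. It is not JUnit code: SimpleResource is a hypothetical stand-in, and unlike the real before() its methods don't throw checked exceptions.&lt;/p&gt;

```java
// Toy model of the contract a rule like ExternalResource provides:
// before() runs first, then the test, and after() runs even if the test fails.
abstract class SimpleResource {

    protected abstract void before();

    protected abstract void after();

    final void run(Runnable test) {
        before();
        try {
            test.run();
        } finally {
            after(); // clean up regardless of the test outcome
        }
    }
}
```

&lt;p&gt;Because after() sits in a finally block, the ElasticSearch node is closed and its data directory removed even when an assertion in the test fails.&lt;/p&gt;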

&lt;p&gt;Our ElasticSearch example can be easily modeled using the base class &lt;a href="http://kentbeck.github.com/junit/javadoc/4.10/org/junit/rules/ExternalResource.html" target="_blank"&gt;ExternalResource&lt;/a&gt; (see the &lt;a href="https://github.com/fhopf/elasticsearch-junit-rule" target="_blank"&gt;full example code on GitHub&lt;/a&gt;):&lt;/p&gt;

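&lt;pre&gt;&lt;code&gt;import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

import org.apache.commons.io.FileUtils;
import org.elasticsearch.client.Client;
import org.elasticsearch.common.settings.ImmutableSettings;
import org.elasticsearch.node.Node;
import org.elasticsearch.node.NodeBuilder;
import org.junit.rules.ExternalResource;

public class ElasticsearchTestNode extends ExternalResource {

    private Node node;
    private Path dataDirectory;
    
    @Override
    protected void before() throws Throwable {
        try {
            dataDirectory = Files.createTempDirectory("es-test");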
        } catch (IOException ex) {
            throw new IllegalStateException(ex);
        }

        ImmutableSettings.Builder elasticsearchSettings = ImmutableSettings.settingsBuilder()
                .put("http.enabled", "false")
                .put("path.data", dataDirectory.toString());

        node = NodeBuilder.nodeBuilder()
                .local(true)
                .settings(elasticsearchSettings.build())
                .node();
    }

    @Override
    protected void after() {
        node.close();
        try {
            FileUtils.deleteDirectory(dataDirectory.toFile());
        } catch (IOException ex) {
            throw new IllegalStateException(ex);
        }
    }
    
    public Client getClient() {
        return node.client();
    }
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The before method is executed before the test is run so we can use it to start ElasticSearch. All data is written to a temporary folder. The after method is used to stop ElasticSearch and delete the folder.&lt;/p&gt;
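&lt;p&gt;The rule relies on Commons IO's FileUtils.deleteDirectory for the cleanup. If you would rather avoid that dependency, the temporary directory handling can be done with the JDK alone. This is a sketch, not the code from the repository; the class name TempDataDirectory is made up:&lt;/p&gt;

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

// JDK-only sketch of creating and removing a temporary data directory.
final class TempDataDirectory {

    // Creates a new directory whose name starts with "es-test" in the default temp location.
    static File create() {
        try {
            return Files.createTempDirectory("es-test").toFile();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Deletes depth-first, because a directory can only be removed once it is empty.
    static void deleteRecursively(File file) {
        File[] children = file.listFiles(); // null if file is not a directory
        if (children != null) {
            for (File child : children) {
                deleteRecursively(child);
            }
        }
        if (!file.delete()) {
            throw new IllegalStateException("Could not delete " + file);
        }
    }
}
```

&lt;p&gt;In the rule, before() would then call create() and after() would pass the directory to deleteRecursively() once the node is closed.&lt;/p&gt;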

&lt;p&gt;In your test you can now simply use the rule, either with the @Rule annotation to have it triggered for each test method, or with @ClassRule on a public static field to execute it only once per class:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;public class CoreTest {

    @Rule
    public ElasticsearchTestNode testNode = new ElasticsearchTestNode();
    
    @Test
    public void indexAndGet() throws IOException {
        testNode.getClient().prepareIndex("myindex", "document", "1")
                .setSource(jsonBuilder().startObject().field("test", "123").endObject())
                .execute()
                .actionGet();
        
        GetResponse response = testNode.getClient().prepareGet("myindex", "document", "1").execute().actionGet();
        assertThat(response.getSource().get("test")).isEqualTo("123");
    }
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;As it is really easy to implement custom rules, I think this is a feature I will be using more often in the future.&lt;/p&gt;
</content><link rel="replies" type="application/atom+xml" href="http://blog.florian-hopf.de/feeds/2776516946770882366/comments/default" title="Kommentare zum Post" /><link rel="replies" type="text/html" href="http://blog.florian-hopf.de/2013/01/junit-rule-for-elasticsearch.html#comment-form" title="0 Kommentare" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/4583666011870993475/posts/default/2776516946770882366?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/4583666011870993475/posts/default/2776516946770882366?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/florian-hopf/UjyC/~3/TtmMuNhc_Pc/junit-rule-for-elasticsearch.html" title="JUnit Rule for ElasticSearch" /><author><name>Florian Hopf</name><uri>http://www.blogger.com/profile/00629881090876630907</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>0</thr:total><feedburner:origLink>http://blog.florian-hopf.de/2013/01/junit-rule-for-elasticsearch.html</feedburner:origLink></entry><entry gd:etag="W/&quot;DUEDQn4-fCp7ImA9WhNUEk0.&quot;"><id>tag:blogger.com,1999:blog-4583666011870993475.post-4066957055825785412</id><published>2013-01-03T01:52:00.000-08:00</published><updated>2013-01-03T02:07:53.054-08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2013-01-03T02:07:53.054-08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="Event" /><title>12 Conferences of 2012</title><content type="html">&lt;p&gt;I went to a lot, probably too many conferences in 2012. As the year is over now I'd like to summarize some of the impressions, maybe there's a conference you didn't know about and you'd like to attend this year.&lt;/p&gt;

&lt;h4&gt;FOSDEM&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://fosdem.org" target="_blank"&gt;FOSDEM&lt;/a&gt; is the Free and Open Source Software Developer European Meetup, a yearly event that takes place in Brussels, Belgium. There are multiple tracks and developer rooms on a multitude of topics ranging from databases to programming languages and open source tools. The rooms are spread across some buildings at the University so there might be some walking involved when switching tracks. What is rather special is that there's no registration involved, you just go there and that's it. The amount of people can be overwhelming, especially in the main entrance area. Unfortunately I was rather disappointed with the talks I chose. The event is a very good fit if you are working on an Open Source project and, as the name of the conference suggests, want to meet other developers of the project.&lt;/p&gt;

&lt;h4&gt;Berlin Expert Days&lt;/h4&gt;

&lt;p&gt;&lt;a href="http://www.bed-con.de/" target="_blank"&gt;BED-Con&lt;/a&gt; is a rather young conference organized by the Java User Group Berlin-Brandenburg. I haven't been there in the first year but in 2012 it still had a small and informal feeling. The conference takes place in three rooms of the Freie Universität Berlin, the content selection was an excellent mixture of technical and process/meta talks, most of them in German. If you can afford the trip to Berlin I'd definitively recommend going there.&lt;/p&gt;

&lt;h4&gt;JAX&lt;/h4&gt;

&lt;p&gt;&lt;a href="http://jax.de/" target="_blank"&gt;The largest and best known German Java conference.&lt;/a&gt; There are two editions, JAX in April in Mainz (the one I attended) and W-JAX in November in Munich. There's one huge hall and several smaller rooms and a wide variety of topics you can choose from. I never planned to go there as the admission fee is rather high (thanks to Software &amp;amp; Support for sponsoring my ticket) but I have to admit that it really can be worth the money. There were excellent talks by Charles Nutter, Tim Berglund and many more. The infrastructure (food, coffee, schedules) is very good, if you are on a business budget you can gain a lot by visiting.&lt;/p&gt; 

&lt;h4&gt;Berlin Buzzwords&lt;/h4&gt;

&lt;p&gt;&lt;a href="http://berlinbuzzwords.de/" target="_blank"&gt;A niche conference on Search, Scale and Big Data.&lt;/a&gt; A lot of people are coming from overseas just to visit. If you are interested in these technologies definitively go there. For more information see &lt;a href="http://blog.florian-hopf.de/2012/06/berlin-buzzwords-2012.html"&gt;this post&lt;/a&gt;.&lt;/p&gt;

&lt;h4&gt;Barcamp Karlsruhe&lt;/h4&gt;

&lt;p&gt;&lt;a href="http://www.barcamp-karlsruhe.de/" target="_blank"&gt;My first real Barcamp.&lt;/a&gt; Really fun event but of course there are always some sessions that are not as interesting as anticipated. Topics ranged from Computer and work stuff to more soft content. I always thought this is a nerd only event but as there is so much to choose from Barcamps might even be interesting for people who are not that much into computers. Very well organized, no admission fee, interesting sessions.&lt;/p&gt;

&lt;h4&gt;Socrates&lt;/h4&gt;

&lt;p&gt;&lt;a href="http://www.socrates-conference.de/"&gt;The International Software Craftmanship and Testing Conference.&lt;/a&gt; Awesome setting and the first open space conference I attended. It takes place in a seminar center in the middle of nowhere which makes it a very intense experience. Besides the sessions there are a lot of informal discussions going on around the day with the very enthusiastic attendants. The 2012 event started Thursday evening with a world cafe, Friday and Saturday open space and an optional code retreat on Sunday. I'd say there were three kinds of sessions: informal discussion rounds, practical hands on sessions and talks. It seems that most people liked the practical sessions best, so if I could choose again I'd go to more of those. Thanks to the sponsors all we had to pay was the accommodation and one meal which additionally makes it an incredible cheap event. Be quick with registration as space is limited.&lt;/p&gt;

&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://2.bp.blogspot.com/-8aVDubc9AV0/UOVNc4A7F5I/AAAAAAAAAFk/LAbJmWk6t0o/s1600/2012-08-03%2B18.01.43.jpg" imageanchor="1" style="margin-left:1em; margin-right:1em"&gt;&lt;img border="0" height="240" width="320" src="http://2.bp.blogspot.com/-8aVDubc9AV0/UOVNc4A7F5I/AAAAAAAAAFk/LAbJmWk6t0o/s320/2012-08-03%2B18.01.43.jpg" /&gt;&lt;/a&gt;&lt;/div&gt;

&lt;h4&gt;FrOSCon&lt;/h4&gt;

&lt;p&gt;The &lt;a href="http://www.froscon.de/startseite/" target="_blank"&gt;Free and Open Source Software Conference&lt;/a&gt; is a great community weekend event with different tracks on admin and development topics. I like it a lot because of the variation of talks and the very informal setting. It's a mixture of holiday and learning and for me a chance to get information on topics that are not presented at the other developer conferences I attend. Talks are partly english, partly german. You can either stay in St. Augustin or in Bonn, it's only a short tram ride.&lt;/p&gt;

&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://1.bp.blogspot.com/--dGD_kHmce8/UOVN5f7oYcI/AAAAAAAAAFw/joZu52SyOFM/s1600/2012-08-25%2B16.36.21.jpg" imageanchor="1" style="margin-left:1em; margin-right:1em"&gt;&lt;img border="0" height="240" width="320" src="http://1.bp.blogspot.com/--dGD_kHmce8/UOVN5f7oYcI/AAAAAAAAAFw/joZu52SyOFM/s320/2012-08-25%2B16.36.21.jpg" /&gt;&lt;/a&gt;&lt;/div&gt;

&lt;h4&gt;JAX On Tour&lt;/h4&gt;

&lt;p&gt;&lt;a href="http://jax-on-tour.de/" target="_blank"&gt;JAX On Tour&lt;/a&gt; is another event I attended because of the generous sponsorship of Software &amp;amp; Support. It's not a conference but a training event with longer talks that are grouped together. It's a small event and there's always time for questions. I learned a lot, mainly about documenting architectures. This is a really good alternative to a normal conference to grasp a topic in depth.&lt;/p&gt;

&lt;h4&gt;OpenCms-Days&lt;/h4&gt;

&lt;p&gt;If you are into OpenCms you have probably already heard about &lt;a href="http://www.opencms-days.org" target="_blank"&gt;OpenCms-Days&lt;/a&gt;, and if not, this is probably not for you. Two days of OpenCms only, used by Alkacon to present new features and by the community to present extensions and projects. I am always impressed that there are people who fly around the world just to attend, but of course this is the only conference of its kind worldwide. There is always something new to learn and it's fun to meet the community.&lt;/p&gt;

&lt;h4&gt;DevFest Karlsruhe&lt;/h4&gt;

&lt;p&gt;&lt;a href="http://www.devfest.info/event/ag9zfmRldmZlc3RnbG9iYWxyDQsSBUV2ZW50GNKsBgw"&gt;A one day event on Google technologies.&lt;/a&gt; There have been several events in different cities worldwide, this one organized by the Google Developer Group Karlsruhe. The organizers were really unlucky as multiple speakers canceled on short notice, nevertheless there were some really good talks. Kudos to the organizers who managed to get this event started in a really short time frame.&lt;/p&gt;

&lt;h4&gt;ApacheCon Europe&lt;/h4&gt;

&lt;p&gt;I originally went to &lt;a href="http://www.apachecon.eu/" target="_blank"&gt;ApacheCon&lt;/a&gt; because it took place in Sinsheim, which is close to Karlsruhe. Fortunately, this year the conference also hosted LuceneCon Europe, so there were lots of interesting talks for me. Additionally, I had been voted in as a committer to the &lt;a href="http://incubator.apache.org/odftoolkit/" target="_blank"&gt;ODFToolkit&lt;/a&gt; just before the conference, so I was able to meet some of the other people involved in the project. The location was really special (a soccer stadium), but I think a lot of non-locals had to suffer a bit because of the lack of hotels and taxis. This community event can be really interesting even if you are only a user of a project.&lt;/p&gt;

&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://3.bp.blogspot.com/-216RO9qNaj8/UOVQvGWfO-I/AAAAAAAAAGE/d_VYS67dRS8/s1600/2012-11-06%2B10.10.07.jpg" imageanchor="1" style="margin-left:1em; margin-right:1em"&gt;&lt;img border="0" height="240" width="320" src="http://3.bp.blogspot.com/-216RO9qNaj8/UOVQvGWfO-I/AAAAAAAAAGE/d_VYS67dRS8/s320/2012-11-06%2B10.10.07.jpg" /&gt;&lt;/a&gt;&lt;/div&gt;

&lt;h4&gt;Devoxx&lt;/h4&gt;

&lt;p&gt;&lt;a href="http://www.devoxx.com" target="_blank"&gt;Devoxx&lt;/a&gt; is the largest Java conference in Europe, organized by the Java User Group Belgium. They attract a lot of high class speakers so this is the place to keep you informed on the Java universe. Located in a large multiplex cinema in a suburb of Antwerp, very comfy chairs and huge screens. The week starts with two university days that contain longer talks and are usually less crowded. This year I went there on Wednesday, due to a train strike in Belgium I arrived quite late. I have to admit that it probably wasn't worth the hassle for only 1.5 days. If you are going there I recommend to stay the whole week.&lt;/p&gt;

&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://3.bp.blogspot.com/-dVO5wozSqGA/UOVRHYZ-qHI/AAAAAAAAAGQ/M3VAJRLzG2s/s1600/2012-11-14%2B17.51.50.jpg" imageanchor="1" style="margin-left:1em; margin-right:1em"&gt;&lt;img border="0" height="240" width="320" src="http://3.bp.blogspot.com/-dVO5wozSqGA/UOVRHYZ-qHI/AAAAAAAAAGQ/M3VAJRLzG2s/s320/2012-11-14%2B17.51.50.jpg" /&gt;&lt;/a&gt;&lt;/div&gt;

&lt;h4&gt;2013&lt;/h4&gt;

&lt;p&gt;So this was a lot last year. I don't plan to visit that many conferences again. I am sure that I will be going to Berlin Buzzwords, Socrates and FrOSCon. There will probably be more, but it won't be 13 this year :).&lt;/p&gt;
&lt;p&gt;Finally, I couldn't have afforded to pay for all those conferences myself, so thanks to &lt;a href="http://synyx.de/" target="_blank"&gt;synyx&lt;/a&gt;, who paid for FOSDEM and BED-Con when I was still employed there and provided me with a free ticket to OpenCms-Days. Thanks to &lt;a href="http://sandsmedia.com/" target="_blank"&gt;Software &amp;amp; Support&lt;/a&gt; for letting me attend JAX and JAX On Tour for free; those guys are fantastic supporters of the Java User Group Karlsruhe. Also, thanks to the Devoxx team for letting me attend as an ambassador of our JUG.&lt;/p&gt;



</content><link rel="replies" type="application/atom+xml" href="http://blog.florian-hopf.de/feeds/4066957055825785412/comments/default" title="Kommentare zum Post" /><link rel="replies" type="text/html" href="http://blog.florian-hopf.de/2013/01/12-conferences-of-2012.html#comment-form" title="0 Kommentare" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/4583666011870993475/posts/default/4066957055825785412?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/4583666011870993475/posts/default/4066957055825785412?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/florian-hopf/UjyC/~3/otR_U8uEoIU/12-conferences-of-2012.html" title="12 Conferences of 2012" /><author><name>Florian Hopf</name><uri>http://www.blogger.com/profile/00629881090876630907</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://2.bp.blogspot.com/-8aVDubc9AV0/UOVNc4A7F5I/AAAAAAAAAFk/LAbJmWk6t0o/s72-c/2012-08-03%2B18.01.43.jpg" height="72" width="72" /><thr:total>0</thr:total><feedburner:origLink>http://blog.florian-hopf.de/2013/01/12-conferences-of-2012.html</feedburner:origLink></entry><entry gd:etag="W/&quot;DkYNQnc6cCp7ImA9WhNWGUU.&quot;"><id>tag:blogger.com,1999:blog-4583666011870993475.post-3212644665984939632</id><published>2012-12-19T22:16:00.000-08:00</published><updated>2012-12-19T22:16:33.918-08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2012-12-19T22:16:33.918-08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="Test" /><category scheme="http://www.blogger.com/atom/ns#" term="Gradle" /><title>Gradle is too Clever for my Plans</title><content type="html">&lt;p&gt;While writing &lt;a 
href="http://blog.florian-hopf.de/2012/12/looking-at-plaintext-lucene-index.html"&gt;this post about the Lucene Codec API&lt;/a&gt;, I noticed something strange when running the tests with &lt;a href="http://www.gradle.org/" target="_blank"&gt;Gradle&lt;/a&gt;. When experimenting with a library feature, most of the time I write unit tests that validate my expectations. This is a habit I picked up from &lt;a href="http://www.manning.com/hatcher2/" target="_blank"&gt;Lucene in Action&lt;/a&gt;, and it can also be useful in real world scenarios, e.g. to make sure that nothing breaks when you update a library.&lt;/p&gt;

&lt;p&gt;OK, what happened? This time I didn't only want the test result; I also ran the test for a side effect: I wanted a Lucene index to be written to the /tmp directory so I could have a look at it manually. This worked fine the first time, but not afterwards, e.g. after my machine had been rebooted and the directory cleared.&lt;/p&gt;

&lt;p&gt;It turns out that the Gradle developers know that a test shouldn't be used to execute stuff. Once the test has run successfully, Gradle considers the test task up to date and doesn't run it again &lt;a href="http://gradle.1045684.n5.nabble.com/how-does-gradle-decide-when-to-run-tests-td3314172.html#none" target="_blank"&gt;until its inputs change&lt;/a&gt;! Though this bit me this time, it is a really nice feature that speeds up your builds. And if you really need to execute the tests again, you can always run &lt;code&gt;gradle cleanTest test&lt;/code&gt;.&lt;/p&gt;
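&lt;p&gt;A toy model of this behavior: a task is skipped when the fingerprint of its inputs hasn't changed since the last run. This is a simplified illustration with a made-up class name, not how Gradle is implemented. Gradle also tracks the declared outputs of a task, so deleting the test results (which is what cleanTest does) triggers a new run, while an index written to /tmp is not a declared output and goes unnoticed.&lt;/p&gt;

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Toy model of an incremental build check based on input fingerprints.
final class UpToDateCheck {

    private byte[] lastFingerprint;

    // Returns false (skip, like Gradle's UP-TO-DATE) if the inputs are unchanged.
    // Assumes every run that is started also succeeds.
    boolean shouldRun(String... inputs) {
        byte[] fingerprint = fingerprint(inputs);
        if (Arrays.equals(fingerprint, lastFingerprint)) {
            return false;
        }
        lastFingerprint = fingerprint;
        return true;
    }

    private static byte[] fingerprint(String... inputs) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            for (String input : inputs) {
                digest.update(input.getBytes(StandardCharsets.UTF_8));
                digest.update((byte) 0); // separator, so ("ab", "c") differs from ("a", "bc")
            }
            return digest.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is required on every JVM
        }
    }
}
```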
</content><link rel="replies" type="application/atom+xml" href="http://blog.florian-hopf.de/feeds/3212644665984939632/comments/default" title="Kommentare zum Post" /><link rel="replies" type="text/html" href="http://blog.florian-hopf.de/2012/12/gradle-is-too-clever-for-my-plans.html#comment-form" title="0 Kommentare" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/4583666011870993475/posts/default/3212644665984939632?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/4583666011870993475/posts/default/3212644665984939632?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/florian-hopf/UjyC/~3/vxusDtrxcc4/gradle-is-too-clever-for-my-plans.html" title="Gradle is too Clever for my Plans" /><author><name>Florian Hopf</name><uri>http://www.blogger.com/profile/00629881090876630907</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>0</thr:total><feedburner:origLink>http://blog.florian-hopf.de/2012/12/gradle-is-too-clever-for-my-plans.html</feedburner:origLink></entry><entry gd:etag="W/&quot;Ck8BQXYyeip7ImA9WhNXGEU.&quot;"><id>tag:blogger.com,1999:blog-4583666011870993475.post-8890575562863443290</id><published>2012-12-07T03:45:00.001-08:00</published><updated>2012-12-07T03:47:30.892-08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2012-12-07T03:47:30.892-08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="Java" /><category scheme="http://www.blogger.com/atom/ns#" term="Lucene" /><title>Looking at a Plaintext Lucene Index</title><content type="html">&lt;p&gt;The &lt;a href="http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/codecs/lucene40/package-summary.html#package_description" target="_blank"&gt;Lucene file format&lt;/a&gt; is one of the reasons why Lucene is as fast as it is. 
An index consists of several binary files that you can't really inspect without tools like the fantastic &lt;a href="http://code.google.com/p/luke/" target="_blank"&gt;Luke&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Starting with Lucene 4, the format of these files can be configured using the &lt;a href="http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/codecs/package-summary.html#package_description" target="_blank"&gt;Codec API&lt;/a&gt;. Several implementations ship with the release, among them the &lt;a href="http://lucene.apache.org/core/4_0_0/codecs/org/apache/lucene/codecs/simpletext/SimpleTextCodec.html" target="_blank"&gt;SimpleTextCodec&lt;/a&gt;, which writes the files in plaintext for learning and debugging purposes.&lt;/p&gt;

&lt;p&gt;To configure the Codec you just set it on the IndexWriterConfig:&lt;/p&gt;

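&lt;pre&gt;&lt;code&gt;import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.codecs.simpletext.SimpleTextCodec;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.util.Version;

StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_40);
IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_40, analyzer);
// recreate the index on each execution
config.setOpenMode(IndexWriterConfig.OpenMode.CREATE);
config.setCodec(new SimpleTextCodec());&lt;/code&gt;&lt;/pre&gt;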

&lt;p&gt;The rest of the indexing process stays exactly the same as it used to be:&lt;/p&gt;

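&lt;pre&gt;&lt;code&gt;import java.io.File;
import java.util.Arrays;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

File plaintextDir = new File("/tmp/lucene-plaintext");
Directory luceneDir = FSDirectory.open(plaintextDir);
// closing the writer at the end of the try block also commits the documents
try (IndexWriter writer = new IndexWriter(luceneDir, config)) {
    writer.addDocument(Arrays.asList(
            new TextField("title", "The title of my first document", Store.YES),
            new TextField("content", "The content of the first document", Store.NO)));

    writer.addDocument(Arrays.asList(
            new TextField("title", "The title of the second document", Store.YES),
            new TextField("content", "And this is the content", Store.NO)));
}&lt;/code&gt;&lt;/pre&gt;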

&lt;p&gt;After running this code, the index directory contains several files. These are not the same kinds of files that the default codec creates.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;ls /tmp/lucene-plaintext/
_1_0.len  _1_1.len  _1.fld  _1.inf  _1.pst  _1.si  segments_2  segments.gen&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The segments_x file is the starting point (x is the generation of the index: it starts with 1 and increases every time you commit to the index). It is still a binary file, but it records the name of the codec that was used to write each segment.&lt;/p&gt;
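&lt;p&gt;Finding the current commit point boils down to picking the segments file with the highest generation. Here is a small sketch of that rule. As far as I know Lucene writes the generation in base 36 (Character.MAX_RADIX), so segments_9 is followed by segments_a; treat that as an assumption. SegmentsFiles is a hypothetical helper, not Lucene API.&lt;/p&gt;

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical helper, not Lucene code: finds the latest commit point among index file names.
class SegmentsFiles {
    static final String PREFIX = "segments_";

    // Generation encoded in the file name; assumed to be written in base 36.
    static long generation(String fileName) {
        return Long.parseLong(fileName.substring(PREFIX.length()), Character.MAX_RADIX);
    }

    // The segments_N file with the highest generation describes the latest commit.
    static String latest(String... fileNames) {
        return Arrays.stream(fileNames)
                .filter(name -> name.startsWith(PREFIX)) // skips segments.gen and segment files
                .max(Comparator.comparingLong(SegmentsFiles::generation))
                .orElseThrow();
    }

    public static void main(String[] args) {
        System.out.println(latest("_1_0.len", "_1.pst", "segments_2", "segments.gen"));
        // In base 36, "a" (10) is a later generation than "9".
        System.out.println(latest("segments_9", "segments_a"));
    }
}
```

&lt;p&gt;For the listing above, latest(...) picks segments_2; segments.gen is ignored because it has no underscore after "segments".&lt;/p&gt;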

&lt;p&gt;The rest of the index files are all plaintext. They do not contain exactly the same information as their binary cousins. For example, the .pst file represents the complete postings list, the structure you usually mean when talking about an inverted index:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;field content
  term content
    doc 0
      freq 1
      pos 1
    doc 1
      freq 1
      pos 4
  term document
    doc 0
      freq 1
      pos 5
  term first
    doc 0
      freq 1
      pos 4
field title
  term document
    doc 0
      freq 1
      pos 5
    doc 1
      freq 1
      pos 5
  term first
    doc 0
      freq 1
      pos 4
  term my
    doc 0
      freq 1
      pos 3
  term second
    doc 1
      freq 1
      pos 4
  term title
    doc 0
      freq 1
      pos 1
    doc 1
      freq 1
      pos 1
END&lt;/code&gt;&lt;/pre&gt;
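&lt;p&gt;This structure is easy to rebuild by hand. The following sketch lowercases the text, splits it on non-alphanumeric characters and drops a small, assumed subset of the English stop words. TinyPostings is a hypothetical class, not Lucene code. Positions are counted before stop words are removed, which is why "title" sits at position 1 and "document" at position 5 in the listing above.&lt;/p&gt;

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.IntStream;

// Toy model of the term, doc, freq and pos structure in the .pst file.
// Not Lucene code: tokenizing and stop words are simplified assumptions.
class TinyPostings {

    // Hypothetical subset of English stop words, enough for the example documents.
    static boolean isStopWord(String token) {
        return List.of("the", "of", "and", "this", "is").contains(token);
    }

    // Builds the postings of a single field; docs[i] is the field value of document i.
    static String postings(String... docs) {
        // Lowercase and split on anything that is not a letter or digit.
        var tokenized = Arrays.stream(docs)
                .map(d -> d.trim().toLowerCase(Locale.ROOT).split("[^a-z0-9]+"))
                .toArray(String[][]::new);
        // Sorted, distinct terms without stop words: the term dictionary.
        var terms = Arrays.stream(tokenized)
                .flatMap(Arrays::stream)
                .filter(t -> !t.isEmpty())
                .filter(t -> !isStopWord(t))
                .distinct()
                .sorted()
                .toArray(String[]::new);
        var out = new StringBuilder();
        for (String term : terms) {
            out.append("  term ").append(term).append('\n');
            for (int doc : IntStream.range(0, tokenized.length).toArray()) {
                // Positions count every token, so removed stop words leave gaps.
                int[] positions = IntStream.range(0, tokenized[doc].length)
                        .filter(i -> tokenized[doc][i].equals(term))
                        .toArray();
                if (positions.length == 0) {
                    continue; // term does not occur in this document
                }
                out.append("    doc ").append(doc).append('\n');
                out.append("      freq ").append(positions.length).append('\n');
                for (int pos : positions) {
                    out.append("      pos ").append(pos).append('\n');
                }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(postings(
                "The title of my first document",
                "The title of the second document"));
    }
}
```

&lt;p&gt;For the two titles, main() prints the same terms, documents, frequencies and positions as the field title section of the .pst file. StandardAnalyzer does a lot more than this tokenizer, so don't expect identical output for arbitrary text.&lt;/p&gt;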

&lt;p&gt;The content that is marked as stored resides in the .fld file:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;doc 0
  numfields 1
  field 0
    name title
    type string
    value The title of my first document
doc 1
  numfields 1
  field 0
    name title
    type string
    value The title of the second document
END&lt;/code&gt;&lt;/pre&gt;
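&lt;p&gt;Only the title shows up here because the content was added with Store.NO: it is searchable through the postings, but its original text is not kept. A toy model of that rule reproduces the listing. StoredFieldsDump and its TextValue record are hypothetical, and using the position within the document as the field number is a simplification.&lt;/p&gt;

```java
import java.util.Arrays;
import java.util.stream.IntStream;

// Toy model of the stored fields listing in the .fld file; not Lucene code.
class StoredFieldsDump {
    // Hypothetical stand-in for a text field added with Store.YES or Store.NO.
    record TextValue(String name, String value, boolean stored) {}

    // docs[i] holds the fields of document i, in the order they were added.
    static String dump(TextValue[]... docs) {
        var out = new StringBuilder();
        for (int doc : IntStream.range(0, docs.length).toArray()) {
            TextValue[] fields = docs[doc];
            // Only stored fields end up here; Store.NO values live in the postings only.
            long numStored = Arrays.stream(fields).filter(TextValue::stored).count();
            out.append("doc ").append(doc).append('\n');
            out.append("  numfields ").append(numStored).append('\n');
            for (int number : IntStream.range(0, fields.length).toArray()) {
                TextValue field = fields[number];
                if (!field.stored()) {
                    continue;
                }
                // Assumption: the field number is the field's position in the document.
                out.append("  field ").append(number).append('\n');
                out.append("    name ").append(field.name()).append('\n');
                out.append("    type string\n");
                out.append("    value ").append(field.value()).append('\n');
            }
        }
        return out.append("END\n").toString();
    }

    public static void main(String[] args) {
        TextValue[] first = {
            new TextValue("title", "The title of my first document", true),
            new TextValue("content", "The content of the first document", false)};
        TextValue[] second = {
            new TextValue("title", "The title of the second document", true),
            new TextValue("content", "And this is the content", false)};
        System.out.print(dump(first, second));
    }
}
```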

&lt;p&gt;If you'd like to have a look at the rest of the files, check out the code on &lt;a href="https://github.com/fhopf/lucene-codec-example" target="_blank"&gt;GitHub&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The SimpleTextCodec is just an interesting byproduct. The Codec API can be used for a lot of useful things. For example, reading indices written by older Lucene versions is implemented using separate codecs. Also, you can mix several codecs in one index, so reindexing shouldn't be necessary immediately after a version update. I am sure more useful codecs will pop up in the future.&lt;/p&gt;
</content><link rel="replies" type="application/atom+xml" href="http://blog.florian-hopf.de/feeds/8890575562863443290/comments/default" title="Kommentare zum Post" /><link rel="replies" type="text/html" href="http://blog.florian-hopf.de/2012/12/looking-at-plaintext-lucene-index.html#comment-form" title="0 Kommentare" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/4583666011870993475/posts/default/8890575562863443290?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/4583666011870993475/posts/default/8890575562863443290?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/florian-hopf/UjyC/~3/oR4SUitKBFo/looking-at-plaintext-lucene-index.html" title="Looking at a Plaintext Lucene Index" /><author><name>Florian Hopf</name><uri>http://www.blogger.com/profile/00629881090876630907</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>0</thr:total><feedburner:origLink>http://blog.florian-hopf.de/2012/12/looking-at-plaintext-lucene-index.html</feedburner:origLink></entry><entry gd:etag="W/&quot;CEIARX4_eSp7ImA9WhJWF00.&quot;"><id>tag:blogger.com,1999:blog-4583666011870993475.post-1358192770538056280</id><published>2012-08-22T23:02:00.000-07:00</published><updated>2012-08-22T23:15:44.041-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2012-08-22T23:15:44.041-07:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="Java" /><category scheme="http://www.blogger.com/atom/ns#" term="Akka" /><category scheme="http://www.blogger.com/atom/ns#" term="Lucene" /><title>Getting rid of synchronized: Using Akka from Java</title><content type="html">&lt;p&gt;&lt;i&gt;I've been giving an &lt;a href="http://slideshare.net/fhopf/akka-presentation-schulesynyx" target="_blank"&gt;internal talk&lt;/a&gt; on &lt;a 
href="http://akka.io" target="_blank"&gt;Akka&lt;/a&gt;, the Actor framework for the JVM, at my former company &lt;a href="http://synyx.de" target="_blank"&gt;synyx&lt;/a&gt;. For the talk I implemented a small example application, kind of a web crawler, using Akka. I published the &lt;a href="https://github.com/fhopf/akka-crawler-example" target="_blank"&gt;source code on Github&lt;/a&gt; and will explain some of the concepts in this post.&lt;/i&gt;&lt;/p&gt;

&lt;h4&gt;Motivation&lt;/h4&gt;

&lt;p&gt;To see why you might need something like Akka, imagine you want to implement a simple web crawler for offline search. You download pages from a certain location, parse and index the content, and follow any links that you haven't parsed and indexed yet. I am using &lt;a href="http://htmlparser.sourceforge.net/" target="_blank"&gt;HtmlParser&lt;/a&gt; for downloading and parsing pages and &lt;a href="http://lucene.apache.org" target="_blank"&gt;Lucene&lt;/a&gt; for indexing them. The logic is contained in two service objects, PageRetriever and Indexer, that can be used from our main application.&lt;/p&gt;

&lt;p&gt;A simple sequential execution might then look something like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;public void downloadAndIndex(String path, IndexWriter writer) {
    VisitedPageStore pageStore = new VisitedPageStore();
    pageStore.add(path);
        
    Indexer indexer = new IndexerImpl(writer);
    PageRetriever retriever = new HtmlParserPageRetriever(path);
        
    String page;
    while ((page = pageStore.getNext()) != null) {
        PageContent pageContent = retriever.fetchPageContent(page);
        pageStore.addAll(pageContent.getLinksToFollow());
        indexer.index(pageContent);
        pageStore.finished(page);
    }
        
    indexer.commit();
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;We start with one page, extract its content and links, index the content, and store all links that are still to be visited in the VisitedPageStore. This class contains the logic to determine which links have already been visited. We loop as long as there are more links to follow; once we are done, we commit the Lucene IndexWriter.&lt;/p&gt;

&lt;p&gt;This implementation works fine: on my outdated laptop it finishes in around 3 seconds for an example page. (Note that the times I am giving are by no means meant as a benchmark; they are just there to give you an idea of the numbers.)&lt;/p&gt;

&lt;p&gt;So are we done? No, of course we can do better by making better use of the resources we have available. Let's try to improve this solution by splitting it into several tasks that can be executed in parallel.&lt;/p&gt;

&lt;h4&gt;Shared State Concurrency&lt;/h4&gt;

&lt;p&gt;The traditional way in Java would be to implement several Threads that each do part of the work and access the shared state via guarded blocks, e.g. synchronized methods. In our case, several Threads might access the global state that is stored in the VisitedPageStore.&lt;/p&gt;
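&lt;p&gt;To make the shared-state style concrete, here is a minimal sketch of the kind of bookkeeping a VisitedPageStore needs when several Threads use it. The class SharedPageCounter is hypothetical and not part of the example project. Every method that touches the counters is synchronized, including the reads; forgetting a single one would already make the code subtly wrong, because a Thread could then see stale values.&lt;/p&gt;

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.stream.IntStream;

// Hypothetical crawler bookkeeping in the "synchronize" style; not code from the example project.
class SharedPageCounter {
    private int inFlight; // guarded by "this"
    private int done;     // guarded by "this"

    synchronized void started() {
        inFlight++;
    }

    synchronized void finished() {
        inFlight--;
        done++;
    }

    // Reads must be synchronized too; otherwise a thread might see stale values.
    synchronized boolean isIdle() {
        return inFlight == 0;
    }

    synchronized int doneCount() {
        return done;
    }

    // Simulates several threads that each process a number of pages.
    static SharedPageCounter crawl(int threads, int pagesPerThread) {
        SharedPageCounter counter = new SharedPageCounter();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        IntStream.range(0, threads).forEach(t -> pool.execute(() ->
                IntStream.range(0, pagesPerThread).forEach(p -> {
                    counter.started();
                    // fetching, parsing and indexing would happen here
                    counter.finished();
                })));
        pool.shutdown();
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(ex);
        }
        return counter;
    }
}
```

&lt;p&gt;SharedPageCounter.crawl(4, 10000) should end with a doneCount() of 40000 and an idle counter. Remove synchronized from started() and finished() and updates can get lost, which is exactly the kind of bug that is hard to reproduce.&lt;/p&gt;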

&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://4.bp.blogspot.com/-c2Pir19FrKY/UC5gu58ACPI/AAAAAAAAAEk/GjHFDb7q2Fc/s1600/synchronize.png" imageanchor="1" style="margin-left:1em; margin-right:1em"&gt;&lt;img border="0" height="101" width="320" src="http://4.bp.blogspot.com/-c2Pir19FrKY/UC5gu58ACPI/AAAAAAAAAEk/GjHFDb7q2Fc/s320/synchronize.png" /&gt;&lt;/a&gt;&lt;/div&gt;

&lt;p&gt;This model is what Venkat Subramaniam calls Synchronize and Suffer in his great book &lt;a href="http://pragprog.com/book/vspcon/programming-concurrency-on-the-jvm" target="_blank"&gt;Programming Concurrency on the JVM&lt;/a&gt;. Working with Threads and building correct solutions might not seem that hard at first, but it is inherently difficult. I like these two tweets that illustrate the problem:&lt;/p&gt;

&lt;blockquote class="twitter-tweet tw-align-center" lang="de"&gt;&lt;p&gt;Adding the "synchronized" keyword to Java was a mistake. Makes people believe they can write multi-threaded code.&lt;/p&gt;&amp;mdash; Erik Dörnenburg (@erikdoe) &lt;a href="https://twitter.com/erikdoe/status/128817862268821504" data-datetime="2011-10-25T12:59:06+00:00"&gt;Oktober 25, 2011&lt;/a&gt;&lt;/blockquote&gt;

&lt;blockquote class="twitter-tweet tw-align-center" lang="de"&gt;&lt;p&gt;95% of syncronized code is broken. The other 5% is written by Brian Goetz. - Venkat Subramaniam at &lt;a href="https://twitter.com/search/?q=%23s2gx"&gt;&lt;s&gt;#&lt;/s&gt;&lt;b&gt;s2gx&lt;/b&gt;&lt;/a&gt;&lt;/p&gt;&amp;mdash; Ronny Løvtangen (@rlovtangen) &lt;a href="https://twitter.com/rlovtangen/status/129323480159232000" data-datetime="2011-10-26T22:28:15+00:00"&gt;Oktober 26, 2011&lt;/a&gt;&lt;/blockquote&gt;
&lt;script src="//platform.twitter.com/widgets.js" charset="utf-8"&gt;&lt;/script&gt;

&lt;p&gt;Brian Goetz, of course, is the lead author of the de facto standard book on the Java concurrency features introduced with Java 5, &lt;a href="http://jcip.net/" target="_blank"&gt;Java Concurrency in Practice&lt;/a&gt;.&lt;/p&gt;

&lt;h4&gt;Akka&lt;/h4&gt;

&lt;p&gt;So what is Akka? It's an Actor framework for the JVM that is implemented in Scala, but that is something you rarely notice when working from Java. It offers a nice Java API that provides most of the functionality in a convenient way.&lt;/p&gt;

&lt;p&gt;Actors are a concept that was introduced in the seventies but became widely known as one of the core features of Erlang, a language for building fault-tolerant, self-healing systems. Actors employ the concept of Message Passing Concurrency: Actors communicate only by means of messages that are passed into an Actor's mailbox. Actors can contain state that they shield from the rest of the system; the only way to change this state is by passing in messages. Akka executes Actors on a pool of Threads, and an Actor processes only one message at a time, so its state is never accessed concurrently. Still, Actors provide a much higher level of abstraction than working with Threads directly.&lt;/p&gt;
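&lt;p&gt;The mailbox idea can be sketched with nothing but the JDK. In this toy model a private single-threaded executor plays the role of the mailbox, so messages are processed one at a time and the actor's state needs no synchronized at all. This is not Akka: ToyActor, PageCountActor and GetCount are hypothetical names, and Akka does not give every Actor its own Thread but schedules many Actors on a shared pool. The dedicated thread here is a simplification.&lt;/p&gt;

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Toy model of an actor, not Akka: a private single-threaded executor is the mailbox.
abstract class ToyActor {
    private final ExecutorService mailbox = Executors.newSingleThreadExecutor();

    // Asynchronous, fire-and-forget send, similar in spirit to ActorRef.tell().
    final void tell(Object message) {
        mailbox.execute(() -> onReceive(message));
    }

    // Called on the mailbox thread, one message at a time, in arrival order.
    protected abstract void onReceive(Object message);

    // Lets queued messages finish, then releases the mailbox thread.
    final void stop() {
        mailbox.shutdown();
    }
}

// Hypothetical request for the current count; the answer is written to reply.
record GetCount(AtomicInteger reply, CountDownLatch answered) {}

class PageCountActor extends ToyActor {
    // Shielded state: only the mailbox thread touches it, so no synchronized is needed.
    private int pages;

    @Override
    protected void onReceive(Object message) {
        if (message instanceof String) {
            pages++; // a page URL
        } else if (message instanceof GetCount get) {
            get.reply().set(pages);
            get.answered().countDown();
        }
        // any other message is silently ignored in this toy
    }

    // Blocking helper for callers outside the actor: send GetCount and wait for the answer.
    static int count(PageCountActor actor) {
        GetCount request = new GetCount(new AtomicInteger(), new CountDownLatch(1));
        actor.tell(request);
        try {
            request.answered().await();
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(ex);
        }
        return request.reply().get();
    }
}
```

&lt;p&gt;After telling the actor three page URLs, PageCountActor.count(actor) returns 3, because the GetCount message waits in the mailbox behind them. Remember to call stop(), otherwise the mailbox thread keeps the JVM alive.&lt;/p&gt;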

&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://1.bp.blogspot.com/-hQwp7o8cH_s/UC5hEJ9IkAI/AAAAAAAAAEw/CVqfjT2Z0nY/s1600/actors.png" imageanchor="1" style="margin-left:1em; margin-right:1em"&gt;&lt;img border="0" height="151" width="320" src="http://1.bp.blogspot.com/-hQwp7o8cH_s/UC5hEJ9IkAI/AAAAAAAAAEw/CVqfjT2Z0nY/s320/actors.png" /&gt;&lt;/a&gt;&lt;/div&gt;

&lt;p&gt;When implementing an Actor with the Java API, you put its behaviour in the method onReceive(), which acts on incoming messages. From there it can reply asynchronously to the sender or send messages to any other Actor.&lt;/p&gt;

&lt;p&gt;For our problem at hand an Actor setup might look something like this:&lt;/p&gt;

&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://2.bp.blogspot.com/-I5TC1mL-bHI/UC5hMo0rtkI/AAAAAAAAAE8/6OGAL2hsBzg/s1600/actor-setup.png" imageanchor="1" style="margin-left:1em; margin-right:1em"&gt;&lt;img border="0" height="214" width="320" src="http://2.bp.blogspot.com/-I5TC1mL-bHI/UC5hMo0rtkI/AAAAAAAAAE8/6OGAL2hsBzg/s320/actor-setup.png" /&gt;&lt;/a&gt;&lt;/div&gt;

&lt;p&gt;There is one Master Actor that also contains the global state. It sends a message to fetch a certain page to a PageParsingActor, which asynchronously responds to the Master with the PageContent. The Master can then send the PageContent to an IndexingActor, which responds with another message. With this setup we have taken a first step towards scaling our solution: there are now three Actors that can run on different cores of your machine.&lt;/p&gt;

&lt;p&gt;Actors are instantiated from other Actors. At the top there's the ActorSystem that is provided by the framework. The MasterActor is instantiated from the ActorSystem:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;ActorSystem actorSystem = ActorSystem.create();
final CountDownLatch countDownLatch = new CountDownLatch(1);
ActorRef master = actorSystem.actorOf(new Props(new UntypedActorFactory() {

    @Override
    public Actor create() {
        return new SimpleActorMaster(new HtmlParserPageRetriever(path), writer, countDownLatch);
    }
}));

master.tell(path);
try {
    countDownLatch.await();
    actorSystem.shutdown();
} catch (InterruptedException ex) {
    throw new IllegalStateException(ex);
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Ignore the CountDownLatch; it is only included to make it possible to terminate the application. Note that we are not referencing an instance of our class but an ActorRef, a reference to an Actor. You will see later why this is important.&lt;/p&gt;

&lt;p&gt;The MasterActor contains references to the other Actors and creates them from its context. This makes the two Actors children of the Master:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;public SimpleActorMaster(final PageRetriever pageRetriever, final IndexWriter indexWriter,
    final CountDownLatch latch) {

    super(latch);
    this.indexer = getContext().actorOf(new Props(new UntypedActorFactory() {

        @Override
        public Actor create() {

            return new IndexingActor(new IndexerImpl(indexWriter));
        }
    }));

    this.parser = getContext().actorOf(new Props(new UntypedActorFactory() {

        @Override
        public Actor create() {

           return new PageParsingActor(pageRetriever);
        }
    }));
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The PageParsingActor acts on messages to fetch pages and sends a message with the result to the sender:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;public void onReceive(Object o) throws Exception {
    if (o instanceof String) {
        PageContent content = pageRetriever.fetchPageContent((String) o);
        getSender().tell(content, getSelf());
    } else {
        // hand unexpected messages to unhandled(), which publishes them to the event stream
        unhandled(o);
    }
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The IndexingActor holds state in the form of the Indexer. It acts on messages to index pages and to commit the index.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;public void onReceive(Object o) throws Exception {
    if (o instanceof PageContent) {
        PageContent content = (PageContent) o;
        indexer.index(content);
        getSender().tell(new IndexedMessage(content.getPath()), getSelf());
    } else if (COMMIT_MESSAGE == o) {
        indexer.commit();
        getSender().tell(COMMITTED_MESSAGE, getSelf());
    } else {
        unhandled(o);
    }
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Finally, the MasterActor orchestrates the other Actors in its onReceive() method. It starts with one page and sends it to the PageParsingActor. It keeps the valuable state of the application in the VisitedPageStore. When no more pages are to be fetched and indexed, it sends a commit message and terminates the application.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;public void onReceive(Object message) throws Exception {

    if (message instanceof String) {
        // start
        String start = (String) message;
        visitedPageStore.add(start);
        getParser().tell(visitedPageStore.getNext(), getSelf());
    } else if (message instanceof PageContent) {
        PageContent content = (PageContent) message;
        getIndexer().tell(content, getSelf());
        visitedPageStore.addAll(content.getLinksToFollow());

        if (visitedPageStore.isFinished()) {
            getIndexer().tell(IndexingActor.COMMIT_MESSAGE, getSelf());
        } else {
            for (String page : visitedPageStore.getNextBatch()) {
                getParser().tell(page, getSelf());
            }
        }
    } else if (message instanceof IndexedMessage) {
        IndexedMessage indexedMessage = (IndexedMessage) message;
        visitedPageStore.finished(indexedMessage.path);

        if (visitedPageStore.isFinished()) {
            getIndexer().tell(IndexingActor.COMMIT_MESSAGE, getSelf());
        }
    } else if (message == IndexingActor.COMMITTED_MESSAGE) {
        logger.info("Shutting down, finished");
        getContext().system().shutdown();
        countDownLatch.countDown();
    }
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;What happens if we run this example? Unfortunately, it now takes around 3.5 seconds on my dual core machine. Although we are now able to run on both cores, we have actually slowed the application down. This is an important lesson: when building scalable applications, you might introduce some overhead that decreases performance when running at a small scale. Scalability is not about increasing performance but about the ability to distribute the load.&lt;/p&gt;

&lt;p&gt;So was switching to Akka a failure? Not at all. It turns out that the application spends most of its time fetching and parsing pages, which includes waiting for the network. Indexing in Lucene is blazing fast, and the Master mostly just dispatches messages. So what can we do about it? We have already split our application into smaller chunks. Fortunately, the PageParsingActor doesn't contain any state at all, which means we can easily parallelize its tasks.&lt;/p&gt;

&lt;p&gt;This is where talking to references instead of instances pays off. For the sender it's transparent whether there is one Actor or a million Actors behind a reference: a router is addressed through a single ActorRef and forwards each message to one of its routees.&lt;/p&gt;

&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://3.bp.blogspot.com/-klg16pejZZs/UC5hjo8UpCI/AAAAAAAAAFI/DIeTzalkdEU/s1600/routing.png" imageanchor="1" style="margin-left:1em; margin-right:1em"&gt;&lt;img border="0" height="174" width="320" src="http://3.bp.blogspot.com/-klg16pejZZs/UC5hjo8UpCI/AAAAAAAAAFI/DIeTzalkdEU/s320/routing.png" /&gt;&lt;/a&gt;&lt;/div&gt;

&lt;p&gt;We only need to change the instantiation of the Actor; the rest of the application remains the same:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;parser = getContext().actorOf(new Props(new UntypedActorFactory() {

    @Override
    public Actor create() {

        return new PageParsingActor(pageRetriever);
    }
}).withRouter(new RoundRobinRouter(10)));&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;By using a router, Akka automatically takes care that there are 10 PageParsingActors available, and the RoundRobinRouter hands the messages to them in turn. This takes the runtime down to 2 seconds.&lt;/p&gt;
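&lt;p&gt;Round-robin routing itself is simple: each message goes to the next routee in turn, regardless of how busy it is (Akka also offers a SmallestMailboxRouter that takes mailbox sizes into account). Here is a minimal sketch of just the selection logic, assuming routees are identified by their index. The class RoundRobin is hypothetical and not Akka's implementation.&lt;/p&gt;

```java
import java.util.concurrent.atomic.AtomicLong;

// Toy model of round-robin routing; not Akka's RoundRobinRouter implementation.
class RoundRobin {
    private final int routees;
    private final AtomicLong counter = new AtomicLong();

    RoundRobin(int routees) {
        if (0 >= routees) {
            throw new IllegalArgumentException("need at least one routee");
        }
        this.routees = routees;
    }

    // Index of the routee for the next message: 0, 1, ..., routees - 1, 0, 1, ...
    // It ignores how busy a routee is; the atomic counter makes it safe for concurrent senders.
    int nextRoutee() {
        return (int) Math.floorMod(counter.getAndIncrement(), (long) routees);
    }

    public static void main(String[] args) {
        RoundRobin router = new RoundRobin(3);
        for (String page : new String[] {"/a", "/b", "/c", "/d", "/e"}) {
            System.out.println(page + " -> routee " + router.nextRoutee());
        }
    }
}
```

&lt;p&gt;Running main() sends /a to /e to routees 0, 1, 2, 0 and 1. Because the PageParsingActor is stateless, it doesn't matter which routee gets which page.&lt;/p&gt;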

&lt;h4&gt;A word on Blocking&lt;/h4&gt;

&lt;p&gt;Note that the way I am doing network requests here is not recommended in Akka. HTMLParser is doing blocking networking which should be carefully reconsidered when designing a reactive system. In fact, as this application is highly network bound, we might even gain more benefit by just using an asynchronous networking library. But hey, then I wouldn't be able to tell you how nice it is to use Akka. In a future post I will highlight some more Akka features that can help to make our application more robust and fault tolerant.&lt;/p&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog.florian-hopf.de/feeds/1358192770538056280/comments/default" title="Kommentare zum Post" /><link rel="replies" type="text/html" href="http://blog.florian-hopf.de/2012/08/getting-rid-of-synchronized-using-akka.html#comment-form" title="8 Kommentare" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/4583666011870993475/posts/default/1358192770538056280?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/4583666011870993475/posts/default/1358192770538056280?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/florian-hopf/UjyC/~3/xp_X3EDIdM4/getting-rid-of-synchronized-using-akka.html" title="Getting rid of synchronized: Using Akka from Java" /><author><name>Florian Hopf</name><uri>http://www.blogger.com/profile/00629881090876630907</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://4.bp.blogspot.com/-c2Pir19FrKY/UC5gu58ACPI/AAAAAAAAAEk/GjHFDb7q2Fc/s72-c/synchronize.png" height="72" width="72" /><thr:total>8</thr:total><feedburner:origLink>http://blog.florian-hopf.de/2012/08/getting-rid-of-synchronized-using-akka.html</feedburner:origLink></entry><entry 
gd:etag="W/&quot;DUcHQHw_cSp7ImA9WhJSFU0.&quot;"><id>tag:blogger.com,1999:blog-4583666011870993475.post-2637138147240757105</id><published>2012-07-05T09:50:00.001-07:00</published><updated>2012-07-05T09:50:31.249-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2012-07-05T09:50:31.249-07:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="Event" /><category scheme="http://www.blogger.com/atom/ns#" term="Solr" /><category scheme="http://www.blogger.com/atom/ns#" term="Tika" /><category scheme="http://www.blogger.com/atom/ns#" term="Lucene" /><category scheme="http://www.blogger.com/atom/ns#" term="Elasticsearch" /><title>Slides and demo code for my talk at JUG KA available</title><content type="html">&lt;p&gt;I just uploaded the &lt;a href="http://www.florian-hopf.de/files/lucene-solr-jugka-040712.pdf"&gt;(german) slides&lt;/a&gt; as well as the &lt;a href="https://github.com/fhopf/lucene-solr-talk"&gt;example code&lt;/a&gt; for yesterdays talk on &lt;a href="http://lucene.apache.org"&gt;Lucene&lt;/a&gt; and &lt;a href="http://lucene.apache.org/solr/"&gt;Solr&lt;/a&gt; at our local &lt;a href="http://jug-ka.de"&gt;Java User Group&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The demo application contains several subprojects for indexing and searching with Lucene and Solr as well as a simple Dropwizard application that demonstrates some search features. See the README files in the source tree to find out how to run the application.&lt;/p&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog.florian-hopf.de/feeds/2637138147240757105/comments/default" title="Kommentare zum Post" /><link rel="replies" type="text/html" href="http://blog.florian-hopf.de/2012/07/slides-and-demo-code-for-my-talk-at-jug.html#comment-form" title="0 Kommentare" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/4583666011870993475/posts/default/2637138147240757105?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/4583666011870993475/posts/default/2637138147240757105?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/florian-hopf/UjyC/~3/JbwAhZocwX0/slides-and-demo-code-for-my-talk-at-jug.html" title="Slides and demo code for my talk at JUG KA available" /><author><name>Florian Hopf</name><uri>http://www.blogger.com/profile/00629881090876630907</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>0</thr:total><feedburner:origLink>http://blog.florian-hopf.de/2012/07/slides-and-demo-code-for-my-talk-at-jug.html</feedburner:origLink></entry><entry gd:etag="W/&quot;CkIDQXk4fCp7ImA9WhJTGUo.&quot;"><id>tag:blogger.com,1999:blog-4583666011870993475.post-2889689672026265828</id><published>2012-06-29T00:57:00.000-07:00</published><updated>2012-06-29T04:49:30.734-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2012-06-29T04:49:30.734-07:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="Fail" /><title>Dropwizard Encoding Woes</title><content type="html">&lt;p&gt;I 
have been working on an example application for Lucene and Solr for my upcoming talk at the &lt;a href="http://jug-ka.de"&gt;Java User Group Karlsruhe&lt;/a&gt;. As a web framework I wanted to try &lt;a href="http://dropwizard.codahale.com"&gt;Dropwizard&lt;/a&gt;, a lightweight application framework that can expose resources via JAX-RS, provides out-of-the-box monitoring support and can render resource representations using Freemarker. It's really easy to get started; there's a good &lt;a href="http://dropwizard.codahale.com/getting-started/"&gt;tutorial&lt;/a&gt; as well as the &lt;a href="http://dropwizard.codahale.com/manual/"&gt;manual&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;An example resource might look like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;

@Path("/example")
@Produces(MediaType.TEXT_HTML)
public class ExampleResource {

    @GET
    public ExampleView illustrate() {
        return new ExampleView("Mot\u00f6rhead");
    }

}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The resource produces HTML using Freemarker, which works once you add the view bundle to the service. Its single method is called when the resource is accessed via GET. Inside the method we create a view object with a message that, in this case, contains the umlaut 'ö'. The view class that is returned by the method looks like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import com.yammer.dropwizard.views.View;

public class ExampleView extends View {

    private final String message;

    public ExampleView(String message) {
        super("example.fmt");
        this.message = message;
    }

    public String getMessage() {
        return message;
    }
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;It accepts a message as a constructor parameter and passes the template name to the parent class. The view's properties are then available in a Freemarker template; a simple one looks like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;html&amp;gt;
    &amp;lt;body&amp;gt;
        &amp;lt;h1&amp;gt;${message} rocks!&amp;lt;/h1&amp;gt;
    &amp;lt;/body&amp;gt;
&amp;lt;/html&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;If I run this on my machine and access it with Firefox, it doesn't work as expected. The umlaut is broken, something Lemmy surely wouldn't approve of:&lt;/p&gt;

&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://1.bp.blogspot.com/-I88R0n2P9EM/T-1fKj1_a4I/AAAAAAAAADo/S8VWbvJiY4g/s1600/broken.png" imageanchor="1" style="margin-left:1em; margin-right:1em"&gt;&lt;img border="0" height="99" width="320" src="http://1.bp.blogspot.com/-I88R0n2P9EM/T-1fKj1_a4I/AAAAAAAAADo/S8VWbvJiY4g/s320/broken.png" /&gt;&lt;/a&gt;&lt;/div&gt;

&lt;p&gt;Accessing the resource using curl works flawlessly:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;curl http://localhost:8080/example
&amp;lt;html&amp;gt;
    &amp;lt;body&amp;gt;
        &amp;lt;h1&amp;gt;Mot&amp;#246;rhead rocks!&amp;lt;/h1&amp;gt;
    &amp;lt;/body&amp;gt;
&amp;lt;/html&amp;gt;&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Why is that? It's Servlet Programming 101: you need to set the character encoding of the response. When no charset is declared, my Firefox falls back to ISO-8859-1, while curl just prints the raw bytes, which my terminal displays as UTF-8. How can we fix it? By telling the client which encoding we are using, which can be done using the Produces annotation:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;@Produces("text/html; charset=utf-8")&lt;/code&gt;&lt;/pre&gt;
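&lt;p&gt;The broken character is what you typically get when UTF-8 bytes are decoded as ISO-8859-1: the 'ö' is encoded as the two bytes 0xC3 0xB6, and ISO-8859-1 maps each of them to a character of its own. A small sketch using only the JDK reproduces the effect (the class Mojibake is hypothetical):&lt;/p&gt;

```java
import java.nio.charset.StandardCharsets;

class Mojibake {
    // Encodes text as UTF-8 (what the server sends) and decodes it as ISO-8859-1
    // (what a browser assumes when the response declares no charset).
    static String misread(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // The umlaut becomes two characters, U+00C3 and U+00B6.
        System.out.println(misread("Mot\u00f6rhead"));
    }
}
```

&lt;p&gt;misread turns "Motörhead" into "MotÃ¶rhead", while plain ASCII text survives the round trip unchanged. Declaring the charset in the Content-Type header removes the guesswork on the client side.&lt;/p&gt;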

&lt;p&gt;So what does it have to do with Dropwizard? Nothing really, it's a JAX-RS thing. All components in Dropwizard (Jetty and Freemarker notably) are using UTF-8 by default.&lt;/p&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog.florian-hopf.de/feeds/2889689672026265828/comments/default" title="Kommentare zum Post" /><link rel="replies" type="text/html" href="http://blog.florian-hopf.de/2012/06/dropwizard-encoding-woes.html#comment-form" title="0 Kommentare" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/4583666011870993475/posts/default/2889689672026265828?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/4583666011870993475/posts/default/2889689672026265828?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/florian-hopf/UjyC/~3/FAp-lIdjcJ0/dropwizard-encoding-woes.html" title="Dropwizard Encoding Woes" /><author><name>Florian Hopf</name><uri>http://www.blogger.com/profile/00629881090876630907</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://1.bp.blogspot.com/-I88R0n2P9EM/T-1fKj1_a4I/AAAAAAAAADo/S8VWbvJiY4g/s72-c/broken.png" height="72" width="72" /><thr:total>0</thr:total><feedburner:origLink>http://blog.florian-hopf.de/2012/06/dropwizard-encoding-woes.html</feedburner:origLink></entry><entry gd:etag="W/&quot;CEENR3g7fyp7ImA9WhJTEkw.&quot;"><id>tag:blogger.com,1999:blog-4583666011870993475.post-1587337737068567724</id><published>2012-06-20T05:28:00.000-07:00</published><updated>2012-06-20T10:18:16.607-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2012-06-20T10:18:16.607-07:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="Solr" /><category scheme="http://www.blogger.com/atom/ns#" 
term="Test" /><category scheme="http://www.blogger.com/atom/ns#" term="Jetty" /><category scheme="http://www.blogger.com/atom/ns#" term="Gradle" /><title>Running and Testing Solr with Gradle</title><content type="html">&lt;p&gt;A while ago I blogged about &lt;a href="http://blog.synyx.de/2011/01/integration-tests-for-your-solr-config/"&gt;testing Solr with Maven&lt;/a&gt; on the &lt;a href="http://blog.synyx.de"&gt;synyx blog&lt;/a&gt;. In this post I will show you how to set up a similar project with &lt;a href="http://gradle.org/"&gt;Gradle&lt;/a&gt; that can start the &lt;a href="http://lucene.apache.org/solr/"&gt;Solr&lt;/a&gt; webapp and execute tests against your configuration.&lt;/p&gt;
&lt;h4&gt;Running Solr&lt;/h4&gt;
&lt;p&gt;Solr runs as a webapp in any Java EE servlet container like Tomcat or Jetty. The index and search configuration resides in a directory commonly referred to as Solr home, which can be located outside of the webapp directory. This is also where the Lucene index files are created. The location of Solr home can be set using the Java system property &lt;em&gt;solr.solr.home&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;The Solr war file is available in Maven Central. &lt;a href="http://tadtech.blogspot.de/2012/03/run-jetty-on-dependency-resolved-war-in.html"&gt;This post&lt;/a&gt; describes how to use Gradle to run a war file that is deployed in a Maven repository. Let's see what the Gradle build file for running Solr looks like:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import org.gradle.api.plugins.jetty.JettyRunWar

apply plugin: 'java'
apply plugin: 'jetty'

repositories {
    mavenCentral()
}

// custom configuration for running the webapp
configurations {
    solrWebApp
}

dependencies {
    solrWebApp "org.apache.solr:solr:3.6.0@war"
}

// custom task that configures the jetty plugin
task runSolr(type: JettyRunWar) {
    webApp = configurations.solrWebApp.singleFile

    // jetty configuration
    httpPort = 8082
    contextPath = 'solr'
}

// executed before jetty starts
runSolr.doFirst {
    System.setProperty("solr.solr.home", "./solrhome")
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We create a custom configuration that contains the Solr war file. In the task runSolr we configure the Jetty plugin. To set the Solr home system property we can use the approach &lt;a href="http://www.sebastian.himberger.de/blog/2011/07/16/adding-custom-jvm-properties-to-gradles-jettyrun/"&gt;described by Sebastian Himberger&lt;/a&gt;: we add a code block that is executed before Jetty starts and sets the property using the standard Java mechanism, &lt;em&gt;System.setProperty()&lt;/em&gt;. You can now start Solr using &lt;em&gt;gradle runSolr&lt;/em&gt;. You will see some errors regarding multiple versions of slf4j that are most likely caused by &lt;a href="http://issues.gradle.org/browse/GRADLE-897"&gt;this bug&lt;/a&gt;.&lt;/p&gt;
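&lt;p&gt;Setting the property in &lt;em&gt;doFirst&lt;/em&gt; only works because, as far as I know, the Gradle Jetty plugin runs Jetty inside the Gradle JVM, so Solr sees the same system properties. The following plain Java sketch illustrates the lookup; the class and method names are made up, and the fallback to &lt;em&gt;solr/&lt;/em&gt; relative to the working directory is my assumption about what Solr 3.x does when neither JNDI nor the system property is configured.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;public class SolrHomeLookup {

    // Name of the system property that runSolr.doFirst sets
    static final String SOLR_HOME_PROPERTY = "solr.solr.home";

    // Assumed fallback if the property is missing (hypothetical constant)
    static final String DEFAULT_SOLR_HOME = "solr/";

    // Hypothetical helper: use the system property if present, else the default
    static String resolveSolrHome() {
        String home = System.getProperty(SOLR_HOME_PROPERTY);
        return home != null ? home : DEFAULT_SOLR_HOME;
    }

    public static void main(String[] args) {
        System.clearProperty(SOLR_HOME_PROPERTY);
        System.out.println(resolveSolrHome()); // solr/
        // Same call as in the build file; it has to happen before Solr starts up
        System.setProperty(SOLR_HOME_PROPERTY, "./solrhome");
        System.out.println(resolveSolrHome()); // ./solrhome
    }
}
&lt;/code&gt;&lt;/pre&gt;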

&lt;h4&gt;Testing the Solr configuration&lt;/h4&gt;
&lt;p&gt;Solr provides some classes that start an embedded instance using your configuration. You can use these classes in any setup as they do not depend on the Gradle Jetty plugin. Starting with Solr 3.2 the test framework is no longer included in solr-core but is available as the separate artifact solr-test-framework. This is what the relevant part of the dependency section looks like now:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;testCompile "junit:junit:4.10"
testCompile "org.apache.solr:solr-test-framework:3.6.0"
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now you can place a test in &lt;em&gt;src/test/java&lt;/em&gt; that either uses the convenience methods provided by SolrTestCaseJ4 or instantiates an EmbeddedSolrServer and executes arbitrary SolrJ actions. Both approaches use your custom configuration, so you can easily verify that configuration changes don't break existing functionality. An example using the convenience methods:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import org.apache.solr.SolrTestCaseJ4;
import org.apache.solr.client.solrj.SolrServerException;
import org.junit.BeforeClass;
import org.junit.Test;
import java.io.IOException;

public class BasicConfigurationTest extends SolrTestCaseJ4 {

    @BeforeClass
    public static void initCore() throws Exception {
        SolrTestCaseJ4.initCore("solrhome/conf/solrconfig.xml", "solrhome/conf/schema.xml", "solrhome/");
    }

    @Test
    public void noResultInEmptyIndex() throws SolrServerException {
        assertQ("test query on empty index",
                req("text that is not found")
                , "//result[@numFound='0']"
        );
    }

    @Test
    public void pathIsMandatory() throws SolrServerException, IOException {
        assertFailedU(adoc("title", "the title"));
    }

    @Test
    public void simpleDocumentIsIndexedAndFound() throws SolrServerException, IOException {
        assertU(adoc("path", "/tmp/foo", "content", "Some important content."));
        assertU(commit());

        assertQ("added document found",
                req("important")
                , "//result[@numFound='1']"
        );
    }

}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;We extend the class &lt;a href="http://lucene.apache.org/solr//api/test-framework/org/apache/solr/SolrTestCaseJ4.html"&gt;SolrTestCaseJ4&lt;/a&gt;, which is responsible for creating the core and instantiating the runtime using the paths we pass to the method &lt;em&gt;initCore()&lt;/em&gt;. Using the available assert methods you can index documents, execute queries and validate the results using XPath expressions.&lt;/p&gt;
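&lt;p&gt;To get a feeling for what such an XPath check does, here is a small sketch that only uses the JDK. It builds a heavily simplified, made-up Solr response by hand and evaluates the expression from the test above as a boolean, i.e. it checks whether the expression selects at least one node. As far as I can tell the test harness does something comparable for each expression passed to &lt;em&gt;assertQ()&lt;/em&gt;, but the class below is hypothetical and not part of Solr.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class XPathCheckDemo {

    // Builds a made-up, simplified response: a response element containing one result element
    static Document responseWithNumFound(String numFound) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element response = doc.createElement("response");
            Element result = doc.createElement("result");
            result.setAttribute("name", "response");
            result.setAttribute("numFound", numFound);
            response.appendChild(result);
            doc.appendChild(response);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // True if the expression selects at least one node (XPath boolean conversion)
    static boolean matches(Document doc, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return (Boolean) xpath.evaluate(expression, doc, XPathConstants.BOOLEAN);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        Document empty = responseWithNumFound("0");
        System.out.println(matches(empty, "//result[@numFound='0']")); // true
        System.out.println(matches(empty, "//result[@numFound='1']")); // false
    }
}
&lt;/code&gt;&lt;/pre&gt;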

&lt;p&gt;An example that instantiates an EmbeddedSolrServer might look like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import org.apache.solr.SolrTestCaseJ4;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.client.solrj.response.FacetField;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.common.params.SolrParams;
import org.junit.After;
import org.junit.Before;
import org.junit.BeforeClass;
import org.junit.Test;

import java.io.IOException;

public class ServerBasedTalkTest extends SolrTestCaseJ4 {

    private EmbeddedSolrServer server;

    @BeforeClass
    public static void initCore() throws Exception {
        SolrTestCaseJ4.initCore("solr/conf/solrconfig.xml", "solr/conf/schema.xml");
    }

    @Before
    public void initServer() {
        server = new EmbeddedSolrServer(h.getCoreContainer(), h.getCore().getName());
    }

    @Test
    public void queryOnEmptyIndexNoResults() throws SolrServerException {
        QueryResponse response = server.query(new SolrQuery("text that is not found"));
        assertTrue(response.getResults().isEmpty());
    }

    @Test
    public void singleDocumentIsFound() throws IOException, SolrServerException {
        SolrInputDocument document = new SolrInputDocument();
        document.addField("path", "/tmp/foo");
        document.addField("content", "Mein Hut der hat 4 Ecken");

        server.add(document);
        server.commit();

        SolrParams params = new SolrQuery("ecke");
        QueryResponse response = server.query(params);
        assertEquals(1L, response.getResults().getNumFound());
        assertEquals("/tmp/foo", response.getResults().get(0).get("path"));
    }

    @After
    public void clearIndex() {
        super.clearIndex();
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The tests can now be executed using &lt;em&gt;gradle test&lt;/em&gt;.&lt;/p&gt; 
&lt;p&gt;Testing your Solr configuration is important, as changes in one place can easily lead to side effects in other search functionality. I recommend adding tests even for basic functionality and evolving them along with your project.&lt;/p&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog.florian-hopf.de/feeds/1587337737068567724/comments/default" title="Kommentare zum Post" /><link rel="replies" type="text/html" href="http://blog.florian-hopf.de/2012/06/running-and-testing-solr-with-gradle.html#comment-form" title="0 Kommentare" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/4583666011870993475/posts/default/1587337737068567724?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/4583666011870993475/posts/default/1587337737068567724?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/florian-hopf/UjyC/~3/PCI93YrYNbY/running-and-testing-solr-with-gradle.html" title="Running and Testing Solr with Gradle" /><author><name>Florian Hopf</name><uri>http://www.blogger.com/profile/00629881090876630907</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>0</thr:total><feedburner:origLink>http://blog.florian-hopf.de/2012/06/running-and-testing-solr-with-gradle.html</feedburner:origLink></entry><entry gd:etag="W/&quot;Ck8CSX0yfCp7ImA9WhVaGE4.&quot;"><id>tag:blogger.com,1999:blog-4583666011870993475.post-2001702363580224109</id><published>2012-06-16T00:13:00.000-07:00</published><updated>2012-06-16T00:14:28.394-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2012-06-16T00:14:28.394-07:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="Java" /><category scheme="http://www.blogger.com/atom/ns#" term="Lucene" /><title>Reading term values for fields from a Lucene 
Index</title><content type="html">&lt;p&gt;Sometimes when using Lucene you might want to retrieve all term values for a given field. Think of categories that you want to display as search links or in a filtering dropdown box. Indexing might look something like this:&lt;/p&gt;

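&lt;pre&gt;&lt;code&gt;import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

// any Directory works; an in-memory one is handy for experiments
Directory directory = new RAMDirectory();
IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_36, new StandardAnalyzer(Version.LUCENE_36));
IndexWriter writer = new IndexWriter(directory, config);

// first document
Document doc = new Document();
doc.add(new Field("Category", "Category1", Field.Store.NO, Field.Index.NOT_ANALYZED));
doc.add(new Field("Category", "Category2", Field.Store.NO, Field.Index.NOT_ANALYZED));
doc.add(new Field("Author", "Florian Hopf", Field.Store.NO, Field.Index.NOT_ANALYZED));
writer.addDocument(doc);

// second document: create a new Document, otherwise the fields of the first one are indexed again
doc = new Document();
doc.add(new Field("Category", "Category3", Field.Store.NO, Field.Index.NOT_ANALYZED));
doc.add(new Field("Category", "Category2", Field.Store.NO, Field.Index.NOT_ANALYZED));
doc.add(new Field("Author", "Theo Tester", Field.Store.NO, Field.Index.NOT_ANALYZED));
writer.addDocument(doc);

writer.close();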
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;We are adding two documents, one that is assigned Category1 and Category2 and one that is assigned Category2 and Category3. Note that we add all fields unanalyzed, so the Strings are added to the index as they are. Afterwards, Lucene's index looks something like this:&lt;/p&gt;

&lt;table class="table"&gt;
&lt;tr&gt;&lt;th&gt;Field&lt;/th&gt;&lt;th&gt;Term&lt;/th&gt;&lt;th&gt;Documents&lt;/th&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Author&lt;/td&gt;&lt;td&gt;Florian Hopf&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;Theo Tester&lt;/td&gt;&lt;td&gt;2&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Category&lt;/td&gt;&lt;td&gt;Category1&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;Category2&lt;/td&gt;&lt;td&gt;1, 2&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;Category3&lt;/td&gt;&lt;td&gt;2&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;
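&lt;p&gt;Adding the values unanalyzed matters here: an analyzer such as StandardAnalyzer would split "Florian Hopf" into the two lowercased terms florian and hopf, which are useless as display values for links or a dropdown. The following plain Java sketch contrasts the two cases; the "analysis" in it is only a crude stand-in I made up (lowercasing and splitting on anything that is not a letter or digit), not Lucene's actual tokenization, which does more (e.g. stop word removal).&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import java.util.Arrays;
import java.util.Locale;

public class AnalyzedVsNotAnalyzed {

    // NOT_ANALYZED: the whole value becomes exactly one term, unchanged
    static String[] notAnalyzedTerms(String value) {
        return new String[] { value };
    }

    // Crude, hypothetical stand-in for an analyzer: lowercase, split on non-letters/digits
    static String[] roughlyAnalyzedTerms(String value) {
        return value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(notAnalyzedTerms("Florian Hopf")));     // [Florian Hopf]
        System.out.println(Arrays.toString(roughlyAnalyzedTerms("Florian Hopf"))); // [florian, hopf]
    }
}
&lt;/code&gt;&lt;/pre&gt;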

&lt;p&gt;The terms are sorted alphabetically, first by field name and then by term value. You can access them using the IndexReader's &lt;em&gt;terms()&lt;/em&gt; method, which returns a TermEnum. You can instruct the IndexReader to start at a certain term, so you can jump directly to the categories without having to iterate over all values. But before we do this, let's look at how we usually access Enumeration values in Java:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;Enumeration en = ...;
while(en.hasMoreElements()) {
    Object obj = en.nextElement();
    ...
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;In a while loop we check whether there is another element and retrieve it inside the loop body. As this pattern is so common, when iterating the terms with Lucene you might end up with something like this (note that all the examples here omit the stop condition: if the index contains fields that sort after Category, their terms will be iterated as well):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;TermEnum terms = reader.terms(new Term("Category"));
// this code is broken, don't use
while(terms.next()) {
    Term term = terms.term();
    System.out.println(term.text());
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The &lt;em&gt;next()&lt;/em&gt; method moves the enumeration to the next term and returns whether such a term exists. The &lt;em&gt;term()&lt;/em&gt; method can then be used to retrieve the current Term. But this doesn't work as expected: the code only finds Category2 and Category3 but skips Category1. Why is that? Lucene's TermEnum works differently from the Java Enumerations we are used to. The TermEnum returned by &lt;em&gt;terms(Term)&lt;/em&gt; already points to the first matching element (the first term that is greater than or equal to the given term), so by calling &lt;em&gt;next()&lt;/em&gt; first we skip that element.&lt;/p&gt;

&lt;p&gt;This snippet instead works correctly using a for loop:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;TermEnum terms = reader.terms(new Term("Category"));
for(Term term = terms.term(); term != null; terms.next(), term = terms.term()) {
    System.out.println(term.text());
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Or you can use a do while loop with a check for the first element:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;TermEnum terms = reader.terms(new Term("Category"));
if (terms.term() != null) {
    do {
        Term term = terms.term();
        System.out.println(term.text());
    } while(terms.next());
}
&lt;/code&gt;&lt;/pre&gt;
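&lt;p&gt;Both the cursor behaviour and the missing stop condition can be reproduced without Lucene. The following sketch uses a made-up SimpleTermEnum that mimics the term()/next() contract of TermEnum over a sorted array of field/term pairs, including a hypothetical Title field that sorts after Category. It compares the broken while loop with the for loop from above, extended by a stop condition that ends the iteration as soon as a term belongs to a different field.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;public class TermCursorDemo {

    // Hypothetical stand-in for Lucene's Term: a field name and a term text
    static final class SimpleTerm {
        final String field;
        final String text;

        SimpleTerm(String field, String text) {
            this.field = field;
            this.text = text;
        }
    }

    // Hypothetical stand-in for TermEnum: already positioned on a term when created
    static final class SimpleTermEnum {
        private final SimpleTerm[] terms;
        private int pos;

        SimpleTermEnum(SimpleTerm[] sortedTerms, String field) {
            terms = sortedTerms;
            // Seek to the first term whose field sorts at or after the given field,
            // like reader.terms(new Term(field)) with an empty term text
            while (pos != terms.length) {
                if (terms[pos].field.compareTo(field) >= 0) break;
                pos++;
            }
        }

        SimpleTerm term() {
            return pos == terms.length ? null : terms[pos];
        }

        boolean next() {
            if (pos == terms.length) return false;
            pos++;
            return pos != terms.length;
        }
    }

    // Sorted by field, then by text: the table above plus a made-up Title term
    static SimpleTerm[] sampleIndex() {
        return new SimpleTerm[] {
            new SimpleTerm("Author", "Florian Hopf"),
            new SimpleTerm("Author", "Theo Tester"),
            new SimpleTerm("Category", "Category1"),
            new SimpleTerm("Category", "Category2"),
            new SimpleTerm("Category", "Category3"),
            new SimpleTerm("Title", "Some Title")
        };
    }

    // Broken: calling next() first skips the term the enum already points to
    static String collectBroken(SimpleTermEnum terms, String field) {
        StringBuilder out = new StringBuilder();
        while (terms.next()) {
            SimpleTerm term = terms.term();
            if (!term.field.equals(field)) break; // stop condition
            if (out.length() != 0) out.append(',');
            out.append(term.text);
        }
        return out.toString();
    }

    // Correct: start with term(), advance afterwards, stop when the field changes
    static String collect(SimpleTermEnum terms, String field) {
        StringBuilder out = new StringBuilder();
        for (SimpleTerm term = terms.term(); term != null; terms.next(), term = terms.term()) {
            if (!term.field.equals(field)) break; // stop condition
            if (out.length() != 0) out.append(',');
            out.append(term.text);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        SimpleTerm[] index = sampleIndex();
        System.out.println(collectBroken(new SimpleTermEnum(index, "Category"), "Category")); // Category2,Category3
        System.out.println(collect(new SimpleTermEnum(index, "Category"), "Category"));       // Category1,Category2,Category3
    }
}
&lt;/code&gt;&lt;/pre&gt;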

&lt;p&gt;You can't really blame Lucene for this as the methods are aptly named. It's our habits that lead to minor errors like this. &lt;/p&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog.florian-hopf.de/feeds/2001702363580224109/comments/default" title="Kommentare zum Post" /><link rel="replies" type="text/html" href="http://blog.florian-hopf.de/2012/06/reading-term-values-for-fields-from.html#comment-form" title="0 Kommentare" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/4583666011870993475/posts/default/2001702363580224109?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/4583666011870993475/posts/default/2001702363580224109?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/florian-hopf/UjyC/~3/-Y6ihi8tgxw/reading-term-values-for-fields-from.html" title="Reading term values for fields from a Lucene Index" /><author><name>Florian Hopf</name><uri>http://www.blogger.com/profile/00629881090876630907</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>0</thr:total><feedburner:origLink>http://blog.florian-hopf.de/2012/06/reading-term-values-for-fields-from.html</feedburner:origLink></entry><entry gd:etag="W/&quot;A0EESH45cSp7ImA9WhVbGUQ.&quot;"><id>tag:blogger.com,1999:blog-4583666011870993475.post-3066661955350244611</id><published>2012-06-06T09:20:00.000-07:00</published><updated>2012-06-06T09:20:09.029-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2012-06-06T09:20:09.029-07:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="Event" /><category scheme="http://www.blogger.com/atom/ns#" term="Solr" /><category scheme="http://www.blogger.com/atom/ns#" term="Mahout" /><category scheme="http://www.blogger.com/atom/ns#" term="Tika" /><category 
scheme="http://www.blogger.com/atom/ns#" term="Lucene" /><category scheme="http://www.blogger.com/atom/ns#" term="Elasticsearch" /><title>Berlin Buzzwords 2012</title><content type="html">&lt;p&gt;&lt;a href="http://berlinbuzzwords.de/"&gt;Berlin Buzzwords&lt;/a&gt; is an annual conference on search, store and scale technology. I've heard good things about it before and finally got convinced to go there this year. The conference itself lasts for two days but there are additional events before and afterwards so if you like you can spend a whole week.&lt;/p&gt;

&lt;h4&gt;The Barcamp&lt;/h4&gt;

&lt;p&gt;As I had to travel on Sunday anyway, I took an earlier train to attend the &lt;a href="http://berlinbuzzwords.de/wiki/barcamp"&gt;barcamp&lt;/a&gt; in the early evening. It started with a short introduction to the concept and the scheduling. Participants could suggest topics, either ones they were willing to introduce themselves or anything they were interested in. Three rooms were prepared, a larger one and two smaller ones.&lt;/p&gt;

&lt;p&gt;Among others I attended sessions on HBase, designing hybrid applications, Apache Tika and Apache Jackrabbit Oak.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://hbase.apache.org/"&gt;HBase&lt;/a&gt; is a distributed database built on top of the Hadoop filesystem. It seems to be used more often than I would have expected. It was interesting to hear about other people's problems and solutions.&lt;/p&gt;

&lt;p&gt;The next session on hybrid relational and NoSQL applications stayed rather high level. I liked one participant's remark that &lt;a href="http://lucene.apache.org/solr/"&gt;Solr&lt;/a&gt;, the underdog of NoSQL, is often the first application where people are OK with giving up some guarantees regarding their data. Adding NoSQL should happen in exactly the same way.&lt;/p&gt;

&lt;p&gt;I only &lt;a href="http://blog.florian-hopf.de/2012/05/content-extraction-with-apache-tika.html"&gt;recently started&lt;/a&gt; using &lt;a href="http://tika.apache.org/"&gt;Tika&lt;/a&gt; directly, so it was really interesting to see where the project is heading. I was surprised to hear that there is now also a TikaServer that can do things similar to those I described for Solr. That's something I want to try out.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://wiki.apache.org/jackrabbit/Jackrabbit%203"&gt;Jackrabbit Oak&lt;/a&gt; is a next generation content repository that is mostly driven by the Day team of Adobe. Some of the ideas sound really interesting, but I got the feeling that it may still take some time until it can really be used. Jukka Zitting also gave a lightning talk on this topic at the conference; the &lt;a href="http://people.apache.org/~jukka/2012/Jackrabbit%20Oak.pdf"&gt;slides are available here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The atmosphere in the sessions was really relaxed, so even though I expected to only listen, I took the chance to participate and ask some questions. This is probably what makes a barcamp as effective as it is: as you are constantly participating, you stay really concentrated on the topic.&lt;/p&gt;

&lt;h4&gt;Day 1&lt;/h4&gt;

&lt;p&gt;The first day started with a great keynote by &lt;a href="http://hawthornlandings.org/"&gt;Leslie Hawthorn&lt;/a&gt; on building and maintaining communities. She compared many aspects of community work to gardening and introduced &lt;a href="http://openmrs.org/"&gt;OpenMRS&lt;/a&gt;, a successful project building a medical record platform. Though I am currently not actively involved in an open source project, I could relate to a lot of the situations she described. All in all an inspiring start to the main conference.&lt;/p&gt;

&lt;p&gt;Next I attended a talk on building hybrid applications with &lt;a href="http://www.mongodb.org/"&gt;MongoDb&lt;/a&gt;. Nothing new for me, but I am glad that a lot of people now recommend splitting monolithic applications into smaller services. This is also a way to experiment with different languages and techniques without having to migrate large parts of an application.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://berlinbuzzwords.de/sessions/jcr-view-world-everything-content-everything-tree"&gt;A JCR view of the world&lt;/a&gt; provided some examples of how to model different structures using a content tree. Though really introductory, it was interesting to see what kinds of applications can be built using a content repository. I also liked the speaker's attitude: the presentation was delivered using &lt;a href="http://sling.apache.org/site/index.html"&gt;Apache Sling&lt;/a&gt;, which uses JCR under the hood.&lt;/p&gt;

&lt;p&gt;Probably the highlight of the first day was the talk by Grant Ingersoll on &lt;a href="http://berlinbuzzwords.de/sessions/large-scale-search-discovery-and-analytics-hadoop-mahout-and-solr"&gt;Large Scale Search, Discovery and Analytics&lt;/a&gt;. He introduced all the parts that make up larger search systems and showed the open source tools he uses. To increase the relevance of the search results you have to integrate solutions that adapt to the behaviour of your users. That's probably one of my big takeaways from the whole conference: always collect data on your users' searches so you have it available when you want to tune the relevance, either manually or through some learning techniques. &lt;a href="http://www.slideshare.net/gsingers/large-scale-search-discovery-and-analytics-with-hadoop-mahout-and-solr-13203456"&gt;The slides of the talk&lt;/a&gt; are worth looking at.&lt;/p&gt;

&lt;p&gt;The rest of the day I attended several talks on the internals of &lt;a href="http://lucene.apache.org/"&gt;Lucene&lt;/a&gt;. Hardcore stuff; I would be lying if I said I understood everything, but it was interesting nevertheless. I am glad that some really smart people make sure that Lucene stays as fast and feature-rich as it is.&lt;/p&gt;

&lt;p&gt;The day ended with interesting discussions and some beer at the Buzz Party and BBQ.&lt;/p&gt;

&lt;h4&gt;Day 2&lt;/h4&gt;

&lt;p&gt;The first talk of the second day on &lt;a href="http://berlinbuzzwords.de/sessions/smart-autocompl"&gt;Smart Autocompl...&lt;/a&gt; by Anne Veling was fantastic. Anne demonstrated a rather simple technique for doing semantic analysis of search queries for specialized autocompletion for the largest travel information system in the Netherlands. The query gets tokenized and then each field of the index (e.g. street or city) is queried for each of the tokens. This way you can already guess which might be good field matches.&lt;/p&gt; 

&lt;p&gt;Another talk introduced a scalable tool for preprocessing of documents, &lt;a href="http://www.findwise.com/hydra"&gt;Hydra&lt;/a&gt;. It stores the documents as well as mapping data in a MongoDb instance and you can parallelize the processing steps. The concept sounds really interesting, I hope I can find time to have a closer look.&lt;/p&gt;

&lt;p&gt;In the afternoon I attended several talks on &lt;a href="http://www.elasticsearch.org/"&gt;Elasticsearch&lt;/a&gt;, the scalable search server. Interestingly a lot of people seem to use it more as a storage engine than for searching.&lt;/p&gt;

&lt;p&gt;One of the tracks was cancelled, so &lt;a href="http://tdunning.blogspot.de/"&gt;Ted Dunning&lt;/a&gt; presented new developments in &lt;a href="http://mahout.apache.org/"&gt;Mahout&lt;/a&gt; instead. He's a really funny speaker, and though I am not deep into machine learning, I was glad to hear that you are allowed to use and even contribute to Mahout without having a PhD.&lt;/p&gt; 

&lt;p&gt;In the last track of the day Alex Pinkin presented 10 problems you might encounter when building a large application with Solr, along with their solutions. Quite a bit of useful advice.&lt;/p&gt;

&lt;h4&gt;The location&lt;/h4&gt;

&lt;p&gt;The event took place at &lt;a href="http://www.urania.de/"&gt;Urania&lt;/a&gt;, a smaller conference center and theatre. For the most part it was well suited, but some of the talks were so full that you either had to sit on the floor or couldn't even enter the room. I understand that it is difficult to predict how many people will attend a certain talk, but some talks probably should have been scheduled in different rooms.&lt;/p&gt;

&lt;p&gt;The food was really good, and though at first it looked like its distribution would be a bottleneck, this worked pretty well.&lt;/p&gt;

&lt;h4&gt;The format&lt;/h4&gt;

&lt;p&gt;This year Berlin Buzzwords had a rather unusual format. Most of the talks were only 20 minutes long, with some exceptions that were 40 minutes long. I have mixed feelings about this: on the one hand it was great to have a lot of different topics. On the other hand, some of the concepts definitely would have needed more time to fully explain and grasp. Respect to all the speakers who had to decide what to cover in such a short timeframe.&lt;/p&gt;

&lt;p&gt;Berlin Buzzwords is a fantastic conference and I will definitely go there again.&lt;/p&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog.florian-hopf.de/feeds/3066661955350244611/comments/default" title="Kommentare zum Post" /><link rel="replies" type="text/html" href="http://blog.florian-hopf.de/2012/06/berlin-buzzwords-2012.html#comment-form" title="0 Kommentare" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/4583666011870993475/posts/default/3066661955350244611?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/4583666011870993475/posts/default/3066661955350244611?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/florian-hopf/UjyC/~3/K0FW64fzd8I/berlin-buzzwords-2012.html" title="Berlin Buzzwords 2012" /><author><name>Florian Hopf</name><uri>http://www.blogger.com/profile/00629881090876630907</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>0</thr:total><feedburner:origLink>http://blog.florian-hopf.de/2012/06/berlin-buzzwords-2012.html</feedburner:origLink></entry><entry gd:etag="W/&quot;DEcHR3g6eyp7ImA9WhJTEE0.&quot;"><id>tag:blogger.com,1999:blog-4583666011870993475.post-7890591294539577145</id><published>2012-05-11T10:58:00.003-07:00</published><updated>2012-06-18T00:53:56.613-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2012-06-18T00:53:56.613-07:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="Solr" /><category scheme="http://www.blogger.com/atom/ns#" term="Java" /><category scheme="http://www.blogger.com/atom/ns#" term="Tika" /><title>Content Extraction with Apache Tika</title><content type="html">&lt;p&gt;Sometimes you need access to the content of documents, whether you want to analyze it, store it in a database or 
index it for searching. Different formats like Word documents, PDFs and HTML documents need different treatment. &lt;a href="http://tika.apache.org/" target="_blank"&gt;Apache Tika&lt;/a&gt; is a project that combines several open source projects for reading content from &lt;a href="http://tika.apache.org/1.1/formats.html" target="_blank"&gt;a multitude of file formats&lt;/a&gt; and makes the textual content as well as some metadata available through a uniform API. I will show two ways to leverage the power of Tika in your projects.&lt;/p&gt;&lt;h4&gt;Accessing Tika programmatically&lt;/h4&gt;&lt;p&gt;First, Tika can of course be used as a library. Surprisingly, the user docs on the website explain a lot of the functionality you might need when writing custom parsers for Tika but don't directly show how to use it.&lt;/p&gt;&lt;p&gt;I am using Maven again, so I add a dependency for the most recent version:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;&amp;lt;dependency&amp;gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;groupId&amp;gt;org.apache.tika&amp;lt;/groupId&amp;gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;artifactId&amp;gt;tika-parsers&amp;lt;/artifactId&amp;gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;version&amp;gt;1.1&amp;lt;/version&amp;gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;type&amp;gt;jar&amp;lt;/type&amp;gt;&lt;br /&gt;&amp;lt;/dependency&amp;gt;&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;&lt;em&gt;tika-parsers&lt;/em&gt; also pulls in all the other projects that are used, so be patient while Maven fetches the transitive dependencies.&lt;/p&gt; &lt;p&gt;Let's look at some test code that extracts data from a PDF document called &lt;em&gt;slides.pdf&lt;/em&gt; that is available on the classpath:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;import java.io.InputStream;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.parser.pdf.PDFParser;
import org.apache.tika.sax.BodyContentHandler;
// inside a JUnit test method (assertEquals/assertTrue from org.junit.Assert):
Parser parser = new PDFParser();
BodyContentHandler handler = new BodyContentHandler();
Metadata metadata = new Metadata();
InputStream content = getClass().getResourceAsStream("/slides.pdf");
parser.parse(content, handler, metadata, new ParseContext());
assertEquals("Solr Vortrag", metadata.get(Metadata.TITLE));
assertTrue(handler.toString().contains("Lucene"));&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;First, we need to instantiate a &lt;a href="http://tika.apache.org/1.1/parser.html" target="_blank"&gt;Parser&lt;/a&gt; that is capable of reading the format, in this case PDFParser, which uses &lt;a href="http://pdfbox.apache.org/" target="_blank"&gt;PDFBox&lt;/a&gt; for extracting the content. The parse method expects an InputStream with the data of the document as well as some parameters that configure the parsing process. After parsing has finished, Metadata contains all the metadata of the document, e.g. the title or the author. &lt;/p&gt;&lt;p&gt;Tika uses XHTML as the internal representation for all parsed content. This XHTML document can be processed by a SAX ContentHandler. The custom implementation BodyContentHandler collects all the text in the body area, which is the main content. The last parameter, ParseContext, can be used to configure the underlying parser instance. &lt;/p&gt;&lt;p&gt;The Metadata class is a Map-like structure with some common keys like the title as well as optional format-specific information. You can look at its contents with a simple loop:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;for (String name: metadata.names()) { 
    System.out.println(name + ": " + metadata.get(name));
}&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This will produce an output similar to this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;xmpTPg:NPages: 17
Creation-Date: 2010-11-20T09:47:28Z
title: Solr Vortrag
created: Sat Nov 20 10:47:28 CET 2010
producer: OpenOffice.org 2.4
Content-Type: application/pdf
creator: Impress&lt;/code&gt;&lt;/pre&gt;
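&lt;p&gt;Since the parsers report the XHTML as a stream of SAX events, a handler that only keeps the body text is easy to picture. The following sketch is a simplified, made-up handler built on the JDK's SAX classes only; it is not Tika's actual BodyContentHandler, which as far as I know is based on a more general XPath-style filter. It feeds a few hand-written events (the body text is made up) and shows that text in the head, such as the title, is ignored.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import org.xml.sax.Attributes;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Simplified, hypothetical handler: collects character data inside the XHTML body only
public class BodyTextCollector extends DefaultHandler {

    private static final String XHTML = "http://www.w3.org/1999/xhtml";

    private final StringBuilder text = new StringBuilder();
    private int bodyDepth = 0; // number of currently open body elements

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("body".equals(localName)) bodyDepth++;
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("body".equals(localName)) bodyDepth--;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (bodyDepth != 0) text.append(ch, start, length);
    }

    @Override
    public String toString() {
        return text.toString();
    }

    // Emits events similar to what a parser might produce for a tiny document
    static BodyTextCollector feedSample() {
        BodyTextCollector handler = new BodyTextCollector();
        AttributesImpl none = new AttributesImpl();
        handler.startElement(XHTML, "html", "html", none);
        handler.startElement(XHTML, "head", "head", none);
        handler.startElement(XHTML, "title", "title", none);
        emit(handler, "Solr Vortrag");
        handler.endElement(XHTML, "title", "title");
        handler.endElement(XHTML, "head", "head");
        handler.startElement(XHTML, "body", "body", none);
        handler.startElement(XHTML, "p", "p", none);
        emit(handler, "Solr basiert auf Lucene");
        handler.endElement(XHTML, "p", "p");
        handler.endElement(XHTML, "body", "body");
        handler.endElement(XHTML, "html", "html");
        return handler;
    }

    private static void emit(BodyTextCollector handler, String s) {
        char[] chars = s.toCharArray();
        handler.characters(chars, 0, chars.length);
    }

    public static void main(String[] args) {
        System.out.println(feedSample()); // Solr basiert auf Lucene
    }
}
&lt;/code&gt;&lt;/pre&gt;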
&lt;p&gt;The textual content of the document can be retrieved by calling the toString() method on the BodyContentHandler.&lt;/p&gt;&lt;p&gt;This is all fine if you know exactly that you only want to retrieve data from PDF documents. But you probably don't want to introduce a huge switch block that picks the parser depending on the file name or some other information. Fortunately, Tika also provides an AutoDetectParser that employs different strategies for determining the content type of the document. The rest of the code stays the same, you just use a different parser:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Parser parser = new AutoDetectParser();&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This way you don't have to know what kind of document you are currently processing; Tika will provide you with the metadata as well as the content. You can pass in additional hints for the parser, e.g. the file name or the content type, by setting them in the Metadata object (for example under the keys Metadata.RESOURCE_NAME_KEY and Metadata.CONTENT_TYPE).&lt;/p&gt;
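&lt;p&gt;One of the ideas behind this kind of detection is looking at the first bytes of a document (its "magic bytes"), with the file name as an additional hint. The following toy sketch in plain Java only illustrates that idea; the class is made up and knows just a handful of types, whereas Tika's detection is far more complete (as far as I know it even looks inside ZIP containers to recognize formats like .docx).&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Toy content type detection in the spirit of autodetection (hypothetical, not Tika's code)
public class ToyTypeDetector {

    static String detect(byte[] head, String resourceName) {
        // ISO-8859-1 maps every byte to one char, so byte prefixes can be compared as strings
        String prefix = new String(head, StandardCharsets.ISO_8859_1);
        if (prefix.startsWith("%PDF-")) {
            return "application/pdf";
        }
        if (prefix.startsWith("PK\u0003\u0004")) {
            // ZIP container; OOXML and ODF documents start like this as well
            return "application/zip";
        }
        // No magic bytes matched: fall back to the file name hint
        if (resourceName != null) {
            String name = resourceName.toLowerCase(Locale.ROOT);
            if (name.endsWith(".txt")) {
                return "text/plain";
            }
        }
        return "application/octet-stream";
    }

    public static void main(String[] args) {
        byte[] pdf = "%PDF-1.4".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(detect(pdf, null));         // application/pdf
        byte[] text = "Solr Vortrag".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(detect(text, "notes.txt")); // text/plain
        System.out.println(detect(text, null));        // application/octet-stream
    }
}
&lt;/code&gt;&lt;/pre&gt;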
&lt;h4&gt;Extracting content using Solr&lt;/h4&gt;&lt;p&gt;If you are using &lt;a href="http://lucene.apache.org/solr/" target="_blank"&gt;the search server Solr&lt;/a&gt; you can also leverage its REST API for extracting the content. The default configuration has a &lt;a href="http://wiki.apache.org/solr/ExtractingRequestHandler" target="_blank"&gt;request handler&lt;/a&gt; configured for &lt;em&gt;/update/extract&lt;/em&gt; that you can send a document to; it returns the content it extracted using Tika. You just need to add the necessary libraries for the extraction. I am still using Maven, so I have to add some additional dependencies:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;dependency&amp;gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;groupId&amp;gt;org.apache.solr&amp;lt;/groupId&amp;gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;artifactId&amp;gt;solr&amp;lt;/artifactId&amp;gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;version&amp;gt;3.6.0&amp;lt;/version&amp;gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;type&amp;gt;war&amp;lt;/type&amp;gt;&lt;br /&gt;&amp;lt;/dependency&amp;gt;&lt;br /&gt;&amp;lt;dependency&amp;gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;groupId&amp;gt;org.apache.solr&amp;lt;/groupId&amp;gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;artifactId&amp;gt;solr-cell&amp;lt;/artifactId&amp;gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;version&amp;gt;3.6.0&amp;lt;/version&amp;gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;type&amp;gt;jar&amp;lt;/type&amp;gt;&lt;br /&gt;&amp;lt;/dependency&amp;gt;&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;This will include all of the Tika dependencies as well as all necessary third-party libraries.&lt;/p&gt;&lt;p&gt;Solr Cell, the request handler, is normally used to index binary files directly, but you can also use it just for extraction. To transfer the content you can use any tool that speaks HTTP; with curl this might look like this:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;curl -F "file=@slides.pdf" "localhost:8983/solr/update/extract?extractOnly=true&amp;amp;extractFormat=text"&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;By setting the parameter &lt;em&gt;extractOnly&lt;/em&gt; to true we tell Solr not to index the content but to return the extracted content in the response; &lt;em&gt;extractFormat=text&lt;/em&gt; asks for plain text instead of the default XHTML. 
The result will be the standard Solr XML format that contains the body content as well as the metadata.&lt;/p&gt;&lt;p&gt;You can also use the Java client library &lt;a href="http://wiki.apache.org/solr/Solrj" target="_blank"&gt;SolrJ&lt;/a&gt; for doing the same:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;ContentStreamUpdateRequest request = new ContentStreamUpdateRequest(&amp;quot;/update/extract&amp;quot;);
request.addFile(new File(&amp;quot;slides.pdf&amp;quot;));
request.setParam(&amp;quot;extractOnly&amp;quot;, &amp;quot;true&amp;quot;);
request.setParam(&amp;quot;extractFormat&amp;quot;, &amp;quot;text&amp;quot;);
NamedList&amp;lt;Object&amp;gt; result = server.request(request);&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The &lt;em&gt;NamedList&lt;/em&gt; contains an entry with the body content as well as another &lt;em&gt;NamedList&lt;/em&gt; with the metadata.&lt;/p&gt;&lt;br /&gt;&lt;h4&gt;Update&lt;/h4&gt;&lt;br /&gt;&lt;p&gt;Robert has asked in the comments what the response looks like.&lt;br /&gt;Solr uses configurable response writers to marshal the response. The default format is XML, but you can choose a different one by passing the &lt;em&gt;wt&lt;/em&gt; parameter with the request. A simplified standard response looks like this:&lt;/p&gt;&lt;br /&gt;&lt;pre&gt;&lt;code&gt;curl -F &amp;quot;file=@slides.pdf&amp;quot; &amp;quot;localhost:8983/solr/update/extract?extractOnly=true&amp;amp;extractFormat=text&amp;quot;&lt;br /&gt;&amp;lt;?xml version=&amp;quot;1.0&amp;quot; encoding=&amp;quot;UTF-8&amp;quot;?&amp;gt;&lt;br /&gt;&amp;lt;response&amp;gt;&lt;br /&gt;&amp;lt;lst name=&amp;quot;responseHeader&amp;quot;&amp;gt;&amp;lt;int name=&amp;quot;status&amp;quot;&amp;gt;0&amp;lt;/int&amp;gt;&amp;lt;int name=&amp;quot;QTime&amp;quot;&amp;gt;1952&amp;lt;/int&amp;gt;&amp;lt;/lst&amp;gt;&amp;lt;str name=&amp;quot;slides.pdf&amp;quot;&amp;gt;&lt;br /&gt;&lt;br /&gt;Features&lt;br /&gt;&lt;br /&gt;HTTP&amp;#173;Schnittstelle&lt;br /&gt;XML&amp;#173;basierte&amp;#160;Konfiguration&lt;br /&gt;Facettierung&lt;br /&gt;Sammlung&amp;#160;n&amp;#252;tzlicher&amp;#160;Lucene&amp;#173;Module/Dismax&lt;br /&gt;&lt;br /&gt;Features&lt;br /&gt;&lt;br /&gt;HTTP&amp;#173;Schnittstelle&lt;br /&gt;XML&amp;#173;basierte&amp;#160;Konfiguration&lt;br /&gt;Facettierung&lt;br /&gt;Sammlung&amp;#160;n&amp;#252;tzlicher&amp;#160;Lucene&amp;#173;Module/Dismax&lt;br /&gt;Java&amp;#173;Client&amp;#160;SolrJ&lt;br /&gt;&lt;br /&gt;[... more content ...]&lt;br /&gt;&lt;br /&gt;&amp;lt;/str&amp;gt;&amp;lt;lst name=&amp;quot;slides.pdf_metadata&amp;quot;&amp;gt;&amp;lt;arr name=&amp;quot;xmpTPg:NPages&amp;quot;&amp;gt;&amp;lt;str&amp;gt;17&amp;lt;/str&amp;gt;&amp;lt;/arr&amp;gt;&amp;lt;arr name=&amp;quot;Creation-Date&amp;quot;&amp;gt;&amp;lt;str&amp;gt;2010-11-20T09:47:28Z&amp;lt;/str&amp;gt;&amp;lt;/arr&amp;gt;&amp;lt;arr name=&amp;quot;title&amp;quot;&amp;gt;&amp;lt;str&amp;gt;Solr Vortrag&amp;lt;/str&amp;gt;&amp;lt;/arr&amp;gt;&amp;lt;arr name=&amp;quot;stream_source_info&amp;quot;&amp;gt;&amp;lt;str&amp;gt;file&amp;lt;/str&amp;gt;&amp;lt;/arr&amp;gt;&amp;lt;arr name=&amp;quot;created&amp;quot;&amp;gt;&amp;lt;str&amp;gt;Sat Nov 20 10:47:28 CET 2010&amp;lt;/str&amp;gt;&amp;lt;/arr&amp;gt;&amp;lt;arr name=&amp;quot;stream_content_type&amp;quot;&amp;gt;&amp;lt;str&amp;gt;application/octet-stream&amp;lt;/str&amp;gt;&amp;lt;/arr&amp;gt;&amp;lt;arr name=&amp;quot;stream_size&amp;quot;&amp;gt;&amp;lt;str&amp;gt;425327&amp;lt;/str&amp;gt;&amp;lt;/arr&amp;gt;&amp;lt;arr name=&amp;quot;producer&amp;quot;&amp;gt;&amp;lt;str&amp;gt;OpenOffice.org 2.4&amp;lt;/str&amp;gt;&amp;lt;/arr&amp;gt;&amp;lt;arr name=&amp;quot;stream_name&amp;quot;&amp;gt;&amp;lt;str&amp;gt;slides.pdf&amp;lt;/str&amp;gt;&amp;lt;/arr&amp;gt;&amp;lt;arr name=&amp;quot;Content-Type&amp;quot;&amp;gt;&amp;lt;str&amp;gt;application/pdf&amp;lt;/str&amp;gt;&amp;lt;/arr&amp;gt;&amp;lt;arr name=&amp;quot;creator&amp;quot;&amp;gt;&amp;lt;str&amp;gt;Impress&amp;lt;/str&amp;gt;&amp;lt;/arr&amp;gt;&amp;lt;/lst&amp;gt;&lt;br /&gt;&amp;lt;/response&amp;gt;&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;br /&gt;&lt;p&gt;The response contains a header with the status and the processing time (&lt;em&gt;QTime&lt;/em&gt;, in milliseconds), the content of the file, and the metadata extracted from the document.&lt;/p&gt;&lt;br /&gt;&lt;p&gt;If you pass the parameter &lt;em&gt;wt&lt;/em&gt; with the value &lt;em&gt;json&lt;/em&gt;, the response is returned as JSON:&lt;/p&gt;&lt;br /&gt;&lt;pre&gt;&lt;code&gt;curl -F &amp;quot;file=@slides.pdf&amp;quot; &amp;quot;localhost:8983/solr/update/extract?extractOnly=true&amp;amp;extractFormat=text&amp;amp;wt=json&amp;quot;&lt;br /&gt;{&amp;quot;responseHeader&amp;quot;:{&amp;quot;status&amp;quot;:0,&amp;quot;QTime&amp;quot;:217},&amp;quot;slides.pdf&amp;quot;:&amp;quot;\n\n\n\n\n\n\n\n\n\n\n\nSolr Vortrag\n\n&amp;#160; &amp;#160;\n\nEinfach mehr finden mit\n\nFlorian&amp;#160;Hopf\n29.09.2010\n\n\n&amp;#160; &amp;#160;\n\nSolr?\n\n\n&amp;#160; &amp;#160;\n\nSolr?\n\nServer&amp;#173;ization&amp;#160;of&amp;#160;Lucene\n\n\n&amp;#160; &amp;#160;\n\nApache Lucene?\n\nSearch&amp;#160;engine&amp;#160;library\n\n\n&amp;#160; &amp;#160;\n\nApache Lucene?\n\nSearch&amp;#160;engine&amp;#160;library\nTextbasierter&amp;#160;Index\n\n\n&amp;#160; &amp;#160;\n\nApache Lucene?\n\nSearch&amp;#160;engine&amp;#160;library\nTextbasierter&amp;#160;Index\nText&amp;#160;Analyzer\n\n\n&amp;#160; &amp;#160;\n\nApache Lucene?\n\nSearch&amp;#160;engine&amp;#160;library\nTextbasierter&amp;#160;Index\nText&amp;#160;Analyzer\nQuery&amp;#160;Syntax&amp;#160;\n\n\n&amp;#160; &amp;#160;\n\nApache Lucene?\n\nSearch&amp;#160;engine&amp;#160;library\nTextbasierter&amp;#160;Index\nText&amp;#160;Analyzer\nQuery&amp;#160;Syntax&amp;#160;\nScoring\n\n\n&amp;#160; 
&amp;#160;\n\nFeatures\n\nHTTP&amp;#173;Schnittstelle\n\n\n&amp;#160; &amp;#160;\n\nArchitektur\n\nClient SolrWebapp Lucene\nhttp\n\nKommunikation&amp;#160;&amp;#252;ber&amp;#160;XML,&amp;#160;JSON,&amp;#160;JavaBin,&amp;#160;Ruby,&amp;#160;...\n\n\n&amp;#160; &amp;#160;\n\nFeatures\n\nHTTP&amp;#173;Schnittstelle\nXML&amp;#173;basierte&amp;#160;Konfiguration\n\n\n&amp;#160; &amp;#160;\n\nFeatures\n\nHTTP&amp;#173;Schnittstelle\nXML&amp;#173;basierte&amp;#160;Konfiguration\nFacettierung\n\n\n&amp;#160; &amp;#160;\n\nFeatures\n\nHTTP&amp;#173;Schnittstelle\nXML&amp;#173;basierte&amp;#160;Konfiguration\nFacettierung\nSammlung&amp;#160;n&amp;#252;tzlicher&amp;#160;Lucene&amp;#173;Module/Dismax\n\n\n&amp;#160; &amp;#160;\n\nFeatures\n\nHTTP&amp;#173;Schnittstelle\nXML&amp;#173;basierte&amp;#160;Konfiguration\nFacettierung\nSammlung&amp;#160;n&amp;#252;tzlicher&amp;#160;Lucene&amp;#173;Module/Dismax\nJava&amp;#173;Client&amp;#160;SolrJ\n\n\n&amp;#160; &amp;#160;\n\nDemo\n\n\n&amp;#160; &amp;#160;\n\nWas noch?\nAdmin&amp;#173;Interface\nCaching\nSkalierung\nSpellchecker\nMore&amp;#173;Like&amp;#173;This\nData&amp;#160;Import&amp;#160;Handler\nSolrCell\n\n\n&amp;#160; &amp;#160;\n\nRessourcen\nhttp://lucene.apache.org/solr/\n\n\n\n&amp;quot;,&amp;quot;slides.pdf_metadata&amp;quot;:[&amp;quot;xmpTPg:NPages&amp;quot;,[&amp;quot;17&amp;quot;],&amp;quot;Creation-Date&amp;quot;,[&amp;quot;2010-11-20T09:47:28Z&amp;quot;],&amp;quot;title&amp;quot;,[&amp;quot;Solr Vortrag&amp;quot;],&amp;quot;stream_source_info&amp;quot;,[&amp;quot;file&amp;quot;],&amp;quot;created&amp;quot;,[&amp;quot;Sat Nov 20 10:47:28 CET 2010&amp;quot;],&amp;quot;stream_content_type&amp;quot;,[&amp;quot;application/octet-stream&amp;quot;],&amp;quot;stream_size&amp;quot;,[&amp;quot;425327&amp;quot;],&amp;quot;producer&amp;quot;,[&amp;quot;OpenOffice.org 
2.4&amp;quot;],&amp;quot;stream_name&amp;quot;,[&amp;quot;slides.pdf&amp;quot;],&amp;quot;Content-Type&amp;quot;,[&amp;quot;application/pdf&amp;quot;],&amp;quot;creator&amp;quot;,[&amp;quot;Impress&amp;quot;]]}&lt;/code&gt;&lt;/pre&gt;&lt;br /&gt;&lt;p&gt;Solr provides quite a few ResponseWriters for different languages, e.g. for Ruby. They are listed at the bottom of this page: &lt;a href="http://wiki.apache.org/solr/QueryResponseWriter"&gt;http://wiki.apache.org/solr/QueryResponseWriter&lt;/a&gt;&lt;/p&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog.florian-hopf.de/feeds/7890591294539577145/comments/default" title="Kommentare zum Post" /><link rel="replies" type="text/html" href="http://blog.florian-hopf.de/2012/05/content-extraction-with-apache-tika.html#comment-form" title="4 Kommentare" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/4583666011870993475/posts/default/7890591294539577145?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/4583666011870993475/posts/default/7890591294539577145?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/florian-hopf/UjyC/~3/WGLnZmbhOQs/content-extraction-with-apache-tika.html" title="Content Extraction with Apache Tika" /><author><name>Florian Hopf</name><uri>http://www.blogger.com/profile/00629881090876630907</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>4</thr:total><feedburner:origLink>http://blog.florian-hopf.de/2012/05/content-extraction-with-apache-tika.html</feedburner:origLink></entry><entry 
gd:etag="W/&quot;C0MGRHs-fCp7ImA9WhVUEUo.&quot;"><id>tag:blogger.com,1999:blog-4583666011870993475.post-6782370324296441724</id><published>2012-05-07T09:58:00.001-07:00</published><updated>2012-05-16T05:30:25.554-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2012-05-16T05:30:25.554-07:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="Solr" /><title>Importing Atom feeds in Solr using the Data Import Handler</title><content type="html">&lt;p&gt;I am working on a search solution that makes some of the content I am producing available through one search interface. One of the content stores is &lt;a href="http://fhopf.blogspot.com"&gt;the blog&lt;/a&gt; you are reading right now, which among other options makes the content available &lt;a href="http://fhopf.blogspot.com/feeds/posts/default?max-results=100"&gt;here&lt;/a&gt; using &lt;a href="http://en.wikipedia.org/wiki/Atom_%28standard%29"&gt;Atom&lt;/a&gt;.&lt;/p&gt; 

&lt;p&gt;&lt;a href="http://lucene.apache.org/solr/"&gt;Solr&lt;/a&gt;, my search server of choice, provides the &lt;a href="http://wiki.apache.org/solr/DataImportHandler"&gt;Data Import Handler&lt;/a&gt; that can be used to import data on a regular basis from sources like databases via JDBC or remote XML sources, like Atom.&lt;/p&gt;

&lt;p&gt;Data Import Handler used to be a core part of Solr but &lt;a href="http://www.lucidimagination.com/blog/2011/04/01/solr-powered-isfdb-part-8/"&gt;starting from 3.1&lt;/a&gt; it is shipped as a separate jar and not included in the standard war anymore. I am using &lt;a href="http://maven.apache.org"&gt;Maven&lt;/a&gt; with overlays for development so I have to add a dependency for it:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;dependencies&amp;gt;&lt;br/&gt;&amp;nbsp;&amp;nbsp;&amp;lt;dependency&amp;gt;&lt;br/&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;groupId&amp;gt;org.apache.solr&amp;lt;/groupId&amp;gt;&lt;br/&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;artifactId&amp;gt;solr&amp;lt;/artifactId&amp;gt;&lt;br/&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;version&amp;gt;3.6.0&amp;lt;/version&amp;gt;&lt;br/&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;type&amp;gt;war&amp;lt;/type&amp;gt;&lt;br/&gt;&amp;nbsp;&amp;nbsp;&amp;lt;/dependency&amp;gt;&lt;br/&gt;&amp;nbsp;&amp;nbsp;&amp;lt;dependency&amp;gt;&lt;br/&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;groupId&amp;gt;org.apache.solr&amp;lt;/groupId&amp;gt;&lt;br/&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;artifactId&amp;gt;solr-dataimporthandler&amp;lt;/artifactId&amp;gt;&lt;br/&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;version&amp;gt;3.6.0&amp;lt;/version&amp;gt;&lt;br/&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;type&amp;gt;jar&amp;lt;/type&amp;gt;&lt;br/&gt;&amp;nbsp;&amp;nbsp;&amp;lt;/dependency&amp;gt;&lt;br/&gt;&amp;lt;/dependencies&amp;gt;&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;To enable the data import handler you have to add a request handler to your &lt;em&gt;solrconfig.xml&lt;/em&gt;. Request handlers are registered for a certain URL path and, as the name suggests, are responsible for handling incoming requests:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;requestHandler&amp;nbsp;name=&amp;quot;/dataimport&amp;quot;&amp;nbsp;class=&amp;quot;org.apache.solr.handler.dataimport.DataImportHandler&amp;quot;&amp;gt;&lt;br/&gt;&amp;nbsp;&amp;nbsp;&amp;lt;lst&amp;nbsp;name=&amp;quot;defaults&amp;quot;&amp;gt;&lt;br/&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;str&amp;nbsp;name=&amp;quot;config&amp;quot;&amp;gt;data-config.xml&amp;lt;/str&amp;gt;&lt;br/&gt;&amp;nbsp;&amp;nbsp;&amp;lt;/lst&amp;gt;&lt;br/&gt;&amp;lt;/requestHandler&amp;gt;&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The file &lt;em&gt;data-config.xml&lt;/em&gt; that is referenced here contains the mapping logic as well as the endpoint to access:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;?xml&amp;nbsp;version=&amp;quot;1.0&amp;quot;&amp;nbsp;encoding=&amp;quot;UTF-8&amp;quot;&amp;nbsp;?&amp;gt;&lt;br/&gt;&amp;lt;dataConfig&amp;gt;&lt;br/&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;dataSource&amp;nbsp;type=&amp;quot;URLDataSource&amp;quot;&amp;nbsp;encoding=&amp;quot;UTF-8&amp;quot;&amp;nbsp;connectionTimeout=&amp;quot;5000&amp;quot;&amp;nbsp;readTimeout=&amp;quot;10000&amp;quot;/&amp;gt;&lt;br/&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;document&amp;gt;&lt;br/&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;entity&amp;nbsp;name=&amp;quot;blog&amp;quot;&lt;br/&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;pk=&amp;quot;url&amp;quot;&lt;br/&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;url=&amp;quot;http://fhopf.blogspot.com/feeds/posts/default?max-results=100&amp;quot;&lt;br/&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;processor=&amp;quot;XPathEntityProcessor&amp;quot;&lt;br/&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;forEach=&amp;quot;/feed/entry&amp;quot;&amp;nbsp;transformer=&amp;quot;DateFormatTransformer,HTMLStripTransformer,TemplateTransformer&amp;quot;&amp;gt;&lt;br/&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;field&amp;nbsp;column=&amp;quot;title&amp;quot;&amp;nbsp;xpath=&amp;quot;/feed/entry/title&amp;quot;/&amp;gt;&lt;br/&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;field&amp;nbsp;column=&amp;quot;url&amp;quot;&amp;nbsp;xpath=&amp;quot;/feed/entry/link[@rel='alternate']/@href&amp;quot;/&amp;gt;&lt;br/&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;!--&amp;nbsp;2012-03-07T21:35:51.229-08:00&amp;nbsp;--&amp;gt;&lt;br/&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;field&amp;nbsp;column=&amp;quot;last_modified&amp;quot;&amp;nbsp;xpath=&amp;quot;/feed/entry/updated&amp;quot;&amp;nbsp;&lt;br/&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;dateTimeFormat=&amp;quot;yyyy-MM-dd'T'HH:mm:ss.SSS&amp;quot;&amp;nbsp;locale=&amp;quot;en&amp;quot;/&amp;gt;&lt;br/&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;field&amp;nbsp;column=&amp;quot;text&amp;quot;&amp;nbsp;xpath=&amp;quot;/feed/entry/content&amp;quot;&amp;nbsp;stripHTML=&amp;quot;true&amp;quot;/&amp;gt;&lt;br/&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;field&amp;nbsp;column=&amp;quot;category&amp;quot;&amp;nbsp;xpath=&amp;quot;/feed/entry/category/@term&amp;quot;/&amp;gt;&lt;br/&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;field&amp;nbsp;column=&amp;quot;type&amp;quot;&amp;nbsp;template=&amp;quot;blog&amp;quot;/&amp;gt;&amp;nbsp;&lt;br/&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;/entity&amp;gt;&lt;br/&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;/document&amp;gt;&lt;br/&gt;&amp;lt;/dataConfig&amp;gt;&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;First we configure which &lt;em&gt;dataSource&lt;/em&gt; to use. If you were fetching documents from a database, this is where you would configure a different implementation, e.g. &lt;em&gt;JdbcDataSource&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;The &lt;em&gt;document&lt;/em&gt; element describes the fields that will be stored in the index. The attributes of the &lt;em&gt;entity&lt;/em&gt; element determine where and how to fetch the data, most importantly the &lt;em&gt;url&lt;/em&gt; and the &lt;em&gt;processor&lt;/em&gt;. &lt;em&gt;forEach&lt;/em&gt; contains an XPath expression that identifies the elements we'd like to loop over; each match becomes one Solr document. The &lt;em&gt;transformer&lt;/em&gt; attribute lists the classes that are then available when mapping the remote XML to the Solr fields.&lt;/p&gt;

&lt;p&gt;The &lt;em&gt;field&lt;/em&gt; elements contain the mapping between the Atom document and the Solr index fields. The &lt;em&gt;column&lt;/em&gt; attribute determines the name of the index field, &lt;em&gt;xpath&lt;/em&gt; determines the node to use in the remote XML document. You can also use more advanced XPath expressions, e.g. selecting an attribute only from those elements where another attribute has a certain value. For example, &lt;em&gt;/feed/entry/link[@rel='alternate']/@href&lt;/em&gt; selects the URL of the link that points to the alternate (HTML) representation of a blog post entry:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;feed&amp;nbsp;...&amp;gt;&amp;nbsp;&lt;br/&gt;&amp;nbsp;&amp;nbsp;...&lt;br/&gt;&amp;nbsp;&amp;nbsp;&amp;lt;entry&amp;gt;&amp;nbsp;&lt;br/&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;...&lt;br/&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;link&amp;nbsp;rel='alternate'&amp;nbsp;type='text/html'&amp;nbsp;href='http://fhopf.blogspot.com/2012/03/testing-akka-actors-from-java.html'&amp;nbsp;title='Testing&amp;nbsp;Akka&amp;nbsp;actors&amp;nbsp;from&amp;nbsp;Java'/&amp;gt;&lt;br/&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;...&lt;br/&gt;&amp;nbsp;&amp;nbsp;&amp;lt;/entry&amp;gt;&lt;br/&gt;...&lt;br/&gt;&amp;lt;/feed&amp;gt;&lt;/code&gt;&lt;/pre&gt;
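&lt;p&gt;To see what the predicate in this expression selects, here is a minimal sketch that evaluates the same XPath with the JDK's own &lt;em&gt;javax.xml.xpath&lt;/em&gt; API. It is only an illustration under assumptions, not how the Data Import Handler works internally: DIH uses its own streaming XPath implementation that supports just a subset of XPath. The feed is built in memory, and the class name and the &lt;em&gt;replies&lt;/em&gt; link URL are made up for the example.&lt;/p&gt;

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class AlternateLinkXPath {

    static final String ALTERNATE_URL =
            "http://fhopf.blogspot.com/2012/03/testing-akka-actors-from-java.html";

    // Builds a small Atom-like DOM in memory: one entry with a
    // (hypothetical) "replies" link followed by the "alternate" link.
    static Document sampleFeed() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element feed = doc.createElement("feed");
            Element entry = doc.createElement("entry");
            entry.appendChild(link(doc, "replies", "http://example.com/comments"));
            entry.appendChild(link(doc, "alternate", ALTERNATE_URL));
            feed.appendChild(entry);
            doc.appendChild(feed);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    static Element link(Document doc, String rel, String href) {
        Element element = doc.createElement("link");
        element.setAttribute("rel", rel);
        element.setAttribute("href", href);
        return element;
    }

    // Evaluates the expression used for the url column; the predicate
    // skips every link whose rel attribute is not 'alternate'.
    static String alternateHref(Document doc) {
        try {
            return (String) XPathFactory.newInstance().newXPath().evaluate(
                    "/feed/entry/link[@rel='alternate']/@href",
                    doc, XPathConstants.STRING);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(alternateHref(sampleFeed()));
    }
}
```

&lt;p&gt;Running it prints only the alternate URL; the &lt;em&gt;replies&lt;/em&gt; link comes first in the entry but is skipped because its &lt;em&gt;rel&lt;/em&gt; attribute doesn't match.&lt;/p&gt;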

&lt;p&gt;For the column &lt;em&gt;last_modified&lt;/em&gt; we transform the remote date format to the internal Solr representation using the DateFormatTransformer. I am not sure yet if this is the correct solution, as it seems I'm losing the time zone information: the pattern ends with the milliseconds, so the offset (&lt;em&gt;-08:00&lt;/em&gt; in the example) is never parsed. For the &lt;em&gt;text&lt;/em&gt; field we first remove all HTML elements contained in the blog post using the HTMLStripTransformer. Finally, the &lt;em&gt;type&lt;/em&gt; field contains a hardcoded value that is set using the TemplateTransformer.&lt;/p&gt;
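&lt;p&gt;The time zone problem can be reproduced outside of Solr. This sketch assumes that DateFormatTransformer parses with &lt;em&gt;java.text.SimpleDateFormat&lt;/em&gt; (an assumption, not verified against the DIH sources) and fixes the parser's time zone to UTC so the result is predictable. The pattern without an offset silently ignores the trailing &lt;em&gt;-08:00&lt;/em&gt;, whereas the &lt;em&gt;XXX&lt;/em&gt; pattern letter (available since Java 7) reads it. The sketch uses &lt;em&gt;HH&lt;/em&gt; (hour of day, 0-23), since &lt;em&gt;hh&lt;/em&gt; is the 12-hour clock. Whether a given DIH version accepts the &lt;em&gt;XXX&lt;/em&gt; pattern is untested here.&lt;/p&gt;

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

class AtomDateParsing {

    // Value of an Atom updated element, as in the comment in data-config.xml
    static final String UPDATED = "2012-03-07T21:35:51.229-08:00";

    // Formats a Date as an ISO-8601 UTC timestamp, the style Solr uses for dates
    static String toUtc(Date date) {
        SimpleDateFormat utc = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'", Locale.ENGLISH);
        utc.setTimeZone(TimeZone.getTimeZone("UTC"));
        return utc.format(date);
    }

    // Parses text with the given pattern. The parser's own zone is only used
    // when the pattern doesn't read an offset; trailing text is ignored.
    static Date parse(String pattern, String text, String parserZone) {
        SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.ENGLISH);
        format.setTimeZone(TimeZone.getTimeZone(parserZone));
        try {
            return format.parse(text);
        } catch (ParseException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Without an offset in the pattern, -08:00 is dropped
        System.out.println(toUtc(parse("yyyy-MM-dd'T'HH:mm:ss.SSS", UPDATED, "UTC")));
        // With XXX, the offset is taken into account
        System.out.println(toUtc(parse("yyyy-MM-dd'T'HH:mm:ss.SSSXXX", UPDATED, "UTC")));
    }
}
```

&lt;p&gt;The first line prints 2012-03-07T21:35:51.229Z, i.e. the local time is treated as if it were UTC; the second prints 2012-03-08T05:35:51.229Z, the actual instant.&lt;/p&gt;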

&lt;p&gt;To have everything in one place, let's see what the schema for our index looks like:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;field&amp;nbsp;name=&amp;quot;url&amp;quot;&amp;nbsp;type=&amp;quot;string&amp;quot;&amp;nbsp;indexed=&amp;quot;true&amp;quot;&amp;nbsp;stored=&amp;quot;true&amp;quot;&amp;nbsp;required=&amp;quot;true&amp;quot;/&amp;gt;&lt;br/&gt;&amp;lt;field&amp;nbsp;name=&amp;quot;title&amp;quot;&amp;nbsp;type=&amp;quot;text_general&amp;quot;&amp;nbsp;indexed=&amp;quot;true&amp;quot;&amp;nbsp;stored=&amp;quot;true&amp;quot;/&amp;gt;&lt;br/&gt;&amp;lt;field&amp;nbsp;name=&amp;quot;category&amp;quot;&amp;nbsp;type=&amp;quot;text_general&amp;quot;&amp;nbsp;indexed=&amp;quot;true&amp;quot;&amp;nbsp;stored=&amp;quot;true&amp;quot;&amp;nbsp;multiValued=&amp;quot;true&amp;quot;/&amp;gt;&lt;br/&gt;&amp;lt;field&amp;nbsp;name=&amp;quot;last_modified&amp;quot;&amp;nbsp;type=&amp;quot;date&amp;quot;&amp;nbsp;indexed=&amp;quot;true&amp;quot;&amp;nbsp;stored=&amp;quot;true&amp;quot;/&amp;gt;&lt;br/&gt;&amp;lt;field&amp;nbsp;name=&amp;quot;text&amp;quot;&amp;nbsp;type=&amp;quot;text_general&amp;quot;&amp;nbsp;indexed=&amp;quot;true&amp;quot;&amp;nbsp;stored=&amp;quot;false&amp;quot;&amp;nbsp;multiValued=&amp;quot;true&amp;quot;/&amp;gt;&lt;br/&gt;&amp;lt;field&amp;nbsp;name=&amp;quot;type&amp;quot;&amp;nbsp;type=&amp;quot;string&amp;quot;&amp;nbsp;indexed=&amp;quot;true&amp;quot;&amp;nbsp;stored=&amp;quot;false&amp;quot;/&amp;gt;&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Finally, how can you trigger the data import? There is an option described in the &lt;a href="http://wiki.apache.org/solr/DataImportHandler#Scheduling"&gt;Solr wiki&lt;/a&gt;, but a simpler solution might be enough for you. I am using a shell script that is triggered by a cron job. These are its contents:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;#!/bin/bash
# quote the URL so the shell doesn't treat ? as a glob character
curl "localhost:8983/solr/dataimport?command=full-import"&lt;/code&gt;&lt;/pre&gt;
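&lt;p&gt;If you'd rather trigger the import from a Java application than from cron, a minimal sketch could call the same URL with &lt;em&gt;HttpURLConnection&lt;/em&gt; and schedule it with a &lt;em&gt;ScheduledExecutorService&lt;/em&gt;. The class name, the hourly interval and the base URL are assumptions for illustration; the only DIH-specific part is the &lt;em&gt;command=full-import&lt;/em&gt; request that the script sends.&lt;/p&gt;

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class DataImportTrigger {

    // Builds e.g. http://localhost:8983/solr/dataimport?command=full-import
    static URL commandUrl(String solrBase, String command) {
        try {
            return URI.create(solrBase + "/dataimport?command=" + command).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Sends the command, drains the status response and returns the HTTP code
    static int send(URL url) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        try (InputStream in = connection.getInputStream()) {
            byte[] buffer = new byte[4096];
            while (in.read(buffer) != -1) {
                // discard the body, we only care that the request went through
            }
            return connection.getResponseCode();
        } finally {
            connection.disconnect();
        }
    }

    // Hypothetical schedule: run a full import now and then once per hour
    static ScheduledExecutorService scheduleHourly(URL url) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            try {
                System.out.println("full-import: HTTP " + send(url));
            } catch (IOException e) {
                System.err.println("full-import failed: " + e);
            }
        }, 0, 1, TimeUnit.HOURS);
        return scheduler;
    }

    public static void main(String[] args) {
        // Only prints the URL; call scheduleHourly(...) against a running Solr
        System.out.println(commandUrl("http://localhost:8983/solr", "full-import"));
    }
}
```

&lt;p&gt;The task catches the &lt;em&gt;IOException&lt;/em&gt; on purpose: if a run of a task scheduled with &lt;em&gt;scheduleAtFixedRate&lt;/em&gt; throws, all subsequent runs are suppressed.&lt;/p&gt;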

&lt;p&gt;The data import handler is really easy to set up, and you can use it to import data from quite a lot of different sources into your index. If you need more advanced crawling features you might want to have a look at &lt;a href="http://incubator.apache.org/connectors/"&gt;Apache ManifoldCF&lt;/a&gt;, a connector framework for plugging content repositories into search engines like Apache Solr.&lt;/p&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog.florian-hopf.de/feeds/6782370324296441724/comments/default" title="Kommentare zum Post" /><link rel="replies" type="text/html" href="http://blog.florian-hopf.de/2012/05/importing-atom-feeds-in-solr-using-data.html#comment-form" title="0 Kommentare" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/4583666011870993475/posts/default/6782370324296441724?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/4583666011870993475/posts/default/6782370324296441724?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/florian-hopf/UjyC/~3/Y1711IgmR-k/importing-atom-feeds-in-solr-using-data.html" title="Importing Atom feeds in Solr using the Data Import Handler" /><author><name>Florian Hopf</name><uri>http://www.blogger.com/profile/00629881090876630907</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>0</thr:total><feedburner:origLink>http://blog.florian-hopf.de/2012/05/importing-atom-feeds-in-solr-using-data.html</feedburner:origLink></entry><entry gd:etag="W/&quot;A0AHRXY6eSp7ImA9WhNbF0Q.&quot;"><id>tag:blogger.com,1999:blog-4583666011870993475.post-9085339176956797693</id><published>2012-03-07T01:02:00.008-08:00</published><updated>2013-01-21T12:22:14.811-08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2013-01-21T12:22:14.811-08:00</app:edited><category 
scheme="http://www.blogger.com/atom/ns#" term="Scala" /><category scheme="http://www.blogger.com/atom/ns#" term="Java" /><category scheme="http://www.blogger.com/atom/ns#" term="Test" /><category scheme="http://www.blogger.com/atom/ns#" term="Akka" /><title>Testing Akka actors from Java</title><content type="html">&lt;p&gt;&lt;em&gt;If you're looking for a general introduction into using Akka from Java have a look at &lt;a href="http://blog.florian-hopf.de/2012/08/getting-rid-of-synchronized-using-akka.html"&gt;this post&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In a recent project I've been using &lt;a href="http://akka.io/"&gt;Akka&lt;/a&gt; for a concurrent producer-consumer setup. Akka is an actor framework for the JVM that is implemented in Scala but provides a Java API, so normally you don't notice that you're dealing with a Scala library.&lt;/p&gt;

&lt;p&gt;Most of my business code is encapsulated in services that don't depend on Akka and can therefore be tested in isolation. But in some cases I wanted to test the behaviour of the actors themselves. As I struggled with this for a while and didn't find a real how-to on testing Akka actors from Java, I hope my notes are useful for other people as well.&lt;/p&gt;

&lt;p&gt;The main problem when testing actors is that they are managed objects and you can't just instantiate them. Akka comes with a testing module that is well documented for &lt;a href="http://akka.io/docs/akka/2.0/scala/testing.html"&gt;use from Scala&lt;/a&gt;. But apart from a note that it is possible, you won't find much information on using it from Java.&lt;/p&gt;

&lt;p&gt;When using Maven you need to make sure that you have the akka-testkit dependency in place:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;dependency&amp;gt;&lt;br /&gt;    &amp;lt;groupId&amp;gt;com.typesafe.akka&amp;lt;/groupId&amp;gt;&lt;br /&gt;    &amp;lt;artifactId&amp;gt;akka-testkit&amp;lt;/artifactId&amp;gt;&lt;br /&gt;    &amp;lt;version&amp;gt;2.1-SNAPSHOT&amp;lt;/version&amp;gt;&lt;br /&gt;    &amp;lt;scope&amp;gt;test&amp;lt;/scope&amp;gt;&lt;br /&gt;&amp;lt;/dependency&amp;gt;&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;I will show you how to implement a test for the actors that are introduced in the &lt;a href="http://akka.io/docs/akka/2.0/intro/getting-started-first-java.html"&gt;Akka Java tutorial&lt;/a&gt;. It involves one actor that calculates a partial result of Pi for a given start number and a given number of elements.&lt;/p&gt;
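&lt;p&gt;Following the idea of keeping business logic out of the actors, the calculation itself can live in a plain class that is tested without Akka. The sketch below is modelled after the tutorial's worker, assuming it sums a slice of the Leibniz series for Pi; the class name &lt;em&gt;PiCalculation&lt;/em&gt; and the &lt;em&gt;calculatePi&lt;/em&gt; helper are hypothetical.&lt;/p&gt;

```java
import java.util.stream.IntStream;

class PiCalculation {

    // Sums the Leibniz terms 4 * (-1)^i / (2i + 1) for
    // i = start * nrOfElements .. (start + 1) * nrOfElements - 1,
    // i.e. the slice of work covered by one message to a worker.
    static double calculatePiFor(int start, int nrOfElements) {
        return IntStream.range(start * nrOfElements, (start + 1) * nrOfElements)
                .mapToDouble(i -> 4.0 * (1 - (i % 2) * 2) / (2 * i + 1))
                .sum();
    }

    // Adds up nrOfMessages consecutive slices, as a master actor would
    static double calculatePi(int nrOfMessages, int nrOfElements) {
        return IntStream.range(0, nrOfMessages)
                .mapToDouble(start -> calculatePiFor(start, nrOfElements))
                .sum();
    }

    public static void main(String[] args) {
        System.out.println(calculatePi(1000, 1000));
    }
}
```

&lt;p&gt;With 1000 slices of 1000 elements each, the result is within 2e-6 of Math.PI, since the error of this alternating series is bounded by the first omitted term.&lt;/p&gt;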

&lt;p&gt;To test this actor we need a way to set it up. Akka-testkit provides the helper &lt;a href="http://akka.io/api/akka/2.0/akka/testkit/TestActorRef.html"&gt;TestActorRef&lt;/a&gt; for this. In Scala this seems to be rather simple:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;val testActor = TestActorRef[Worker]&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;If you try to do this from Java you will notice that you can't use a similar call. I have to admit that I am not quite sure yet what is going on. I would have expected an apply() method on the TestActorRef companion object that uses some kind of implicits to instantiate the Worker object. But when inspecting the sources, the closest thing to it is this definition:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;def apply[T &amp;lt;: Actor](factory: ⇒ T)(implicit system: ActorSystem)&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;No sign of implicit for the factory. Something I still have to investigate further.&lt;/p&gt;

&lt;p&gt;To use it from Java you can call the &lt;em&gt;apply&lt;/em&gt; method that takes a &lt;a href="http://www.scala-lang.org/api/current/scala/Function0.html"&gt;Function0&lt;/a&gt; and an actor system; the by-name parameter &lt;em&gt;factory: ⇒ T&lt;/em&gt; is compiled to a Function0. The actor system can easily be set up using&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;actorSystem = ActorSystem.apply();&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The &lt;a href="http://stackoverflow.com/questions/1223834/how-does-scalas-apply-method-magic-work/1223913#1223913"&gt;apply() method&lt;/a&gt; is very important in Scala, as it is a kind of default method for objects. For example, &lt;em&gt;myList(1)&lt;/em&gt; is internally translated to &lt;em&gt;myList.apply(1)&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;If you're like me and expect Function0 to be a single-method interface, you will be surprised. It contains a lot of strange-looking methods that you really don't want cluttering your test code:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;TestActorRef&amp;lt;Pi.Worker&amp;gt; workerRef = TestActorRef.apply(new Function0&amp;lt;Pi.Worker&amp;gt;() {&lt;br /&gt;&lt;br /&gt;    @Override&lt;br /&gt;    public Pi.Worker apply() {&lt;br /&gt;        throw new UnsupportedOperationException("Not supported yet.");&lt;br /&gt;    }&lt;br /&gt;&lt;br /&gt;    @Override&lt;br /&gt;    public void apply$mcV$sp() {&lt;br /&gt;        throw new UnsupportedOperationException("Not supported yet.");&lt;br /&gt;    }&lt;br /&gt;&lt;br /&gt;    @Override&lt;br /&gt;    public boolean apply$mcZ$sp() {&lt;br /&gt;        throw new UnsupportedOperationException("Not supported yet.");&lt;br /&gt;    }&lt;br /&gt;&lt;br /&gt;    @Override&lt;br /&gt;    public byte apply$mcB$sp() {&lt;br /&gt;        throw new UnsupportedOperationException("Not supported yet.");&lt;br /&gt;    }&lt;br /&gt;&lt;br /&gt;    @Override&lt;br /&gt;    public short apply$mcS$sp() {&lt;br /&gt;        throw new UnsupportedOperationException("Not supported yet.");&lt;br /&gt;    }&lt;br /&gt;&lt;br /&gt;    @Override&lt;br /&gt;    public char apply$mcC$sp() {&lt;br /&gt;        throw new UnsupportedOperationException("Not supported yet.");&lt;br /&gt;    }&lt;br /&gt;&lt;br /&gt;    @Override&lt;br /&gt;    public int apply$mcI$sp() {&lt;br /&gt;        throw new UnsupportedOperationException("Not supported yet.");&lt;br /&gt;    }&lt;br /&gt;&lt;br /&gt;    @Override&lt;br /&gt;    public long apply$mcJ$sp() {&lt;br /&gt;        throw new UnsupportedOperationException("Not supported yet.");&lt;br /&gt;    }&lt;br /&gt;&lt;br /&gt;    @Override&lt;br /&gt;    public float apply$mcF$sp() {&lt;br /&gt;        throw new UnsupportedOperationException("Not supported yet.");&lt;br /&gt;    }&lt;br /&gt;&lt;br /&gt;    @Override&lt;br /&gt;    public double apply$mcD$sp() {&lt;br /&gt;        throw new UnsupportedOperationException("Not supported yet.");&lt;br /&gt;    }&lt;br /&gt;}, actorSystem);&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The only method we are really interested in is the plain apply method. Where do all those other methods come from? There is no obvious hint in the Scaladoc. &lt;/p&gt;

&lt;p&gt;While searching for a solution I found a &lt;a href="http://scala-programming-language.1934581.n4.nabble.com/Problem-Java-and-Scala-interoperability-in-2-8-0-RC1-td2125847.html"&gt;mailing list thread&lt;/a&gt; that explains some of the magic. The methods are performance optimizations that avoid boxing and unboxing; the Scala compiler generates them automatically for the &lt;a href="http://www.scala-lang.org/archives/downloads/distrib/files/nightly/docs/library/scala/specialized.html"&gt;@specialized annotation&lt;/a&gt;. Still, I am not sure exactly why they show up here. Based on &lt;a href="http://days2010.scala-lang.org/node/138/151"&gt;this presentation&lt;/a&gt; I would have expected to get the specialized instance for Object; maybe this is something special about traits? &lt;/p&gt;

&lt;p&gt;Fortunately we don't need to implement the interface ourselves: there's an adapter class, &lt;a href="http://www.scala-lang.org/api/current/scala/runtime/AbstractFunction0.html"&gt;AbstractFunction0&lt;/a&gt;, that makes the code look much nicer:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;@Before&lt;br /&gt;public void initActor() {&lt;br /&gt;    actorSystem = ActorSystem.apply();&lt;br /&gt;    actorRef = TestActorRef.apply(new AbstractFunction0&amp;lt;Pi.Worker&amp;gt;() {&lt;br /&gt;&lt;br /&gt;        @Override&lt;br /&gt;        public Pi.Worker apply() {&lt;br /&gt;            return new Pi.Worker();&lt;br /&gt;        }&lt;br /&gt;&lt;br /&gt;    }, actorSystem);&lt;br /&gt;}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This is how I would have expected it to work in the first place.&lt;/p&gt;
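&lt;p&gt;To illustrate why the adapter helps, here is a simplified plain Java sketch of the pattern. The names (&lt;code&gt;Thunk&lt;/code&gt;, &lt;code&gt;AbstractThunk&lt;/code&gt;, &lt;code&gt;applyInt&lt;/code&gt;, &lt;code&gt;applyDouble&lt;/code&gt;) are hypothetical stand-ins, not Scala's real API, and the sketch assumes that the specialized variants simply unbox the result of the generic &lt;code&gt;apply()&lt;/code&gt;. An abstract base class implements them once, so the caller only has to override &lt;code&gt;apply()&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```java
// Hypothetical stand-in for a trait like Function0 as Java sees it:
// every method, including the "specialized" variants, must be implemented.
interface Thunk {
    Object apply();        // the generic method we actually care about
    int applyInt();        // specialized variant, similar in spirit to apply$mcI$sp
    double applyDouble();  // specialized variant, similar in spirit to apply$mcD$sp
}

// Hypothetical adapter in the spirit of AbstractFunction0:
// the specialized variants are implemented once by unboxing apply().
abstract class AbstractThunk implements Thunk {
    @Override
    public int applyInt() {
        return ((Number) apply()).intValue();
    }

    @Override
    public double applyDouble() {
        return ((Number) apply()).doubleValue();
    }
}

class ThunkDemo {
    public static void main(String[] args) {
        // Only apply() has to be written by hand.
        Thunk answer = new AbstractThunk() {
            @Override
            public Object apply() {
                return 42;  // autoboxed to Integer
            }
        };
        System.out.println(answer.applyInt());     // prints 42
        System.out.println(answer.applyDouble());  // prints 42.0
    }
}
```
&lt;/code&gt;&lt;/pre&gt;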

&lt;p&gt;Now that the test is set up, we can use the TestActorRef to actually test the actor. For example, we can check that the actor doesn't do anything when it receives a String message:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;@Test&lt;br /&gt;public void doNothingForString() {&lt;br /&gt;    TestProbe testProbe = TestProbe.apply(actorSystem);&lt;br /&gt;    actorRef.tell("Hello", testProbe.ref());&lt;br /&gt;&lt;br /&gt;    testProbe.expectNoMsg(Duration.apply(100, TimeUnit.MILLISECONDS));&lt;br /&gt;}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;&lt;a href="http://akka.io/api/akka/2.0/akka/testkit/TestProbe.html"&gt;TestProbe&lt;/a&gt; is another helper that can be used to check the messages that are sent between cooperating actors. In this example we are checking that no message is passed to the sender for 100 miliseconds, which should be enough for execution.&lt;/p&gt;

&lt;p&gt;Let's test some real functionality: send a message to the actor and check that the result message is sent back:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;@Test&lt;br /&gt;public void calculatePiFor0() {&lt;br /&gt;    TestProbe testProbe = TestProbe.apply(actorSystem);&lt;br /&gt;    Pi.Work work = new Pi.Work(0, 0);        &lt;br /&gt;    actorRef.tell(work, testProbe.ref());&lt;br /&gt;&lt;br /&gt;    testProbe.expectMsgClass(Pi.Result.class);     &lt;br /&gt;    TestActor.Message message = testProbe.lastMessage();&lt;br /&gt;    Pi.Result resultMsg = (Pi.Result) message.msg();&lt;br /&gt;    assertEquals(0.0, resultMsg.getValue(), 0.0000000001);&lt;br /&gt;}&lt;/code&gt;&lt;/pre&gt; 

&lt;p&gt;This time we use the TestProbe to block until a message arrives. Once it is there, we can inspect it using lastMessage().&lt;/p&gt;

&lt;p&gt;You can look at the rest of the test on &lt;a href="https://github.com/fhopf/akka/tree/master/akka-tutorials/akka-tutorial-first"&gt;GitHub&lt;/a&gt;. Comments are more than welcome, as I am pretty new to both Scala and Akka.&lt;/p&gt;

&lt;h4&gt;Update&lt;/h4&gt;
&lt;p&gt;As &lt;a href="https://twitter.com/#!/jboner/status/177488321784713216"&gt;Jonas Bonér points out&lt;/a&gt; I've been using the Scala API. Using the Props class the setup is easier:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;@Before&lt;br /&gt;public void initActor() {&lt;br /&gt;    actorSystem = ActorSystem.apply();&lt;br /&gt;    actorRef = TestActorRef.apply(new Props(Pi.Worker.class), actorSystem);&lt;br /&gt;}&lt;/code&gt;&lt;/pre&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog.florian-hopf.de/feeds/9085339176956797693/comments/default" title="Kommentare zum Post" /><link rel="replies" type="text/html" href="http://blog.florian-hopf.de/2012/03/testing-akka-actors-from-java.html#comment-form" title="2 Kommentare" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/4583666011870993475/posts/default/9085339176956797693?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/4583666011870993475/posts/default/9085339176956797693?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/florian-hopf/UjyC/~3/JvWmqqQg3yY/testing-akka-actors-from-java.html" title="Testing Akka actors from Java" /><author><name>Florian Hopf</name><uri>http://www.blogger.com/profile/00629881090876630907</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>2</thr:total><feedburner:origLink>http://blog.florian-hopf.de/2012/03/testing-akka-actors-from-java.html</feedburner:origLink></entry><entry gd:etag="W/&quot;CU8NQ3ozeCp7ImA9WhVUEUo.&quot;"><id>tag:blogger.com,1999:blog-4583666011870993475.post-6750336721789300941</id><published>2012-02-19T08:22:00.007-08:00</published><updated>2012-05-16T06:11:32.480-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2012-05-16T06:11:32.480-07:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="Event" /><title>Legacy Code Retreat</title><content type="html">&lt;p&gt;Yesterday I attended the first German Legacy Code Retreat in Bretten. 
The event was organized by &lt;a href="http://groupspaces.com/softwerkskammer/"&gt;Softwerkskammer&lt;/a&gt;, the German software craftsmanship community. &lt;/p&gt;

&lt;p&gt;A &lt;a href="http://whatis.legacycoderetreat.com/"&gt;legacy code retreat&lt;/a&gt; doesn't work like a common code retreat where you implement a certain functionality again and again. It instead starts with some really flawed code and the participants apply different refactoring steps to make it more testable and maintainable. There are six iterations of 45 minutes with different tasks or aims. For each iteration you work with a different partner and after a short retrospective with all participants you mostly start again from the original code. &lt;/p&gt;

&lt;p&gt;The &lt;a href="https://github.com/jbrains/trivia"&gt;github repository for the legacy code&lt;/a&gt; contains the code in several languages among which are Java, C++, C# and Ruby.&lt;/p&gt;

&lt;h4&gt;Iteration 1&lt;/h4&gt;

&lt;p&gt;The first iteration was used to get to know the functionality of the code. There were no real rules, so the participants were free to explore the code in any way they liked.&lt;/p&gt;

&lt;p&gt;I paired with &lt;a href="http://www.lesscode.de/"&gt;Heiko Seebach&lt;/a&gt; who I already knew to be a Ruby guy. We were looking at the code with a standard text editor, already quite unfamiliar to standard Java IDE work. I already got enough Ruby knowledge to understand code when I see it so this was no problem. For quite some time we tried to understand a certain aspect that was happening when running the code. It turned out that this was a bug in the Ruby-version of the code. Next we tried to setup RSpec and get starting with some tests.&lt;/p&gt;

&lt;p&gt;During this iteration I didn't learn that much about the legacy code but more about some Ruby stuff.&lt;/p&gt;

&lt;h4&gt;Iteration 2&lt;/h4&gt;

&lt;p&gt;The goal of the second iteration was to prepare a golden master test that could be used during all of the following iterations. The original legacy code is driven by random input (in the Java version using java.util.Random) and writes all its state to System.out. We were to capture the output for a given input sequence and write it to a file. This file can then be compared automatically to the output of a modified version. If both outputs are identical, the modification most likely introduced no regressions.&lt;/p&gt;
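&lt;p&gt;A golden master test along these lines could look like the following minimal sketch. It assumes a hypothetical &lt;code&gt;runGame(long seed)&lt;/code&gt; method standing in for the trivia code, which prints its state to &lt;code&gt;System.out&lt;/code&gt;. The output is captured by temporarily redirecting &lt;code&gt;System.out&lt;/code&gt;; writing the master to a file and reading it back is left out for brevity:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.util.Random;

class GoldenMaster {

    // Hypothetical stand-in for the legacy game: driven by a seeded Random,
    // it reports its state only via System.out.
    static void runGame(long seed) {
        Random random = new Random(seed);
        int place = 0;
        for (int turnsLeft = 5; turnsLeft > 0; turnsLeft--) {
            int roll = random.nextInt(6) + 1;
            place = (place + roll) % 12;
            System.out.println("Rolled " + roll + ", now at place " + place);
        }
    }

    // Runs the game with a fixed seed and returns everything it printed.
    static String captureOutput(long seed) {
        PrintStream original = System.out;
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        System.setOut(new PrintStream(buffer, true));
        try {
            runGame(seed);
        } finally {
            System.setOut(original);  // always restore the real console
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        // In a real golden master test the first run would be written to a file once,
        // and every later run (after refactoring) compared against that file.
        String master = captureOutput(42L);
        String current = captureOutput(42L);
        System.out.println(master.equals(current) ? "no regression" : "output changed");
    }
}
```
&lt;/code&gt;&lt;/pre&gt;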

&lt;p&gt;I paired with another Java guy, and we worked on my machine in Netbeans. I noticed how unfamiliar I am with the standard Netbeans project setup, as I use Maven most of the time. We wrote the test and started some refactorings; all in all a quite productive iteration. Things I learned: &lt;a href="http://docs.oracle.com/javase/1.4.2/docs/api/java/util/Random.html"&gt;java.util.Random&lt;/a&gt; derives its numbers solely from the seed, so if you use the same seed again and again, you always get the same sequence. Also, when doing file handling in plain Java I really miss &lt;a href="http://commons.apache.org/io/"&gt;commons-io&lt;/a&gt;.&lt;/p&gt;
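&lt;p&gt;The determinism of &lt;code&gt;java.util.Random&lt;/code&gt; is easy to check: two instances created with the same seed produce the same sequence. A minimal sketch, with seed and sequence length chosen arbitrarily (it uses &lt;code&gt;Random.ints&lt;/code&gt;, which requires Java 8 or later):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```java
import java.util.Arrays;
import java.util.Random;

class SeedDemo {

    // Draws `count` ints in the range 0..99 from a Random created with the given seed.
    static int[] draw(long seed, int count) {
        return new Random(seed).ints(count, 0, 100).toArray();
    }

    public static void main(String[] args) {
        int[] first = draw(42L, 5);
        int[] second = draw(42L, 5);
        System.out.println(Arrays.equals(first, second));  // prints true
    }
}
```
&lt;/code&gt;&lt;/pre&gt;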

&lt;h4&gt;Iteration 3&lt;/h4&gt;

&lt;p&gt;In iteration 3 we were supposed to use a testing antipattern: &lt;a href="http://c2.com/cgi/wiki?SubclassToTestAntiPattern"&gt;Subclass to Test&lt;/a&gt;. You take the original class and override some of the methods that are called by the method under test.&lt;/p&gt;

&lt;p&gt;It turned out that the original code is not well suited for this approach. Only a few methods really rely on other methods; most of them access the state via the fields directly. My partner and I therefore didn't really override methods but instead used an &lt;a href="http://docs.oracle.com/javase/tutorial/java/javaOO/initial.html"&gt;initializer block&lt;/a&gt; to prepare the state for the method calls. This is similar to an approach for Map initialization that I only started using recently:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;Map&amp;lt;String, String&amp;gt; data = new HashMap&amp;lt;String, String&amp;gt;() {&lt;br /&gt;    {&lt;br /&gt;        put("key", "value");&lt;br /&gt;    }&lt;br /&gt;};&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The approach worked quite well for the given code, but the resulting tests probably won't stay maintainable.&lt;/p&gt;
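&lt;p&gt;Applied to a class under test, the initializer trick looks roughly like this sketch. &lt;code&gt;Game&lt;/code&gt;, its fields and &lt;code&gt;currentPlayerName()&lt;/code&gt; are hypothetical simplifications of the trivia code, and the sketch assumes the fields are visible to the test (package-private):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```java
// Hypothetical, heavily simplified version of the class under test.
class Game {
    // Package-private state that the methods read directly, as in the legacy code.
    String[] players = new String[6];
    int playerCount = 0;
    int currentPlayer = 0;

    String currentPlayerName() {
        return players[currentPlayer];
    }
}

class GameSubclassDemo {
    public static void main(String[] args) {
        // Anonymous subclass: the instance initializer block prepares the state
        // the method under test works on, instead of overriding methods.
        Game game = new Game() {
            {
                players[0] = "Chet";
                players[1] = "Pat";
                playerCount = 2;
                currentPlayer = 1;
            }
        };
        System.out.println(game.currentPlayerName());  // prints Pat
    }
}
```
&lt;/code&gt;&lt;/pre&gt;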

&lt;h4&gt;Iteration 4&lt;/h4&gt;

&lt;p&gt;Iteration 4 built on the previous one. All the methods that had been overridden for testing were to be moved to delegates and passed into the original class using dependency injection.&lt;/p&gt;
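&lt;p&gt;A minimal sketch of that step, with hypothetical names: behavior that was previously overridden in a test subclass moves into a &lt;code&gt;QuestionDeck&lt;/code&gt; interface, and the game receives an implementation through its constructor, so a test can pass in a stub instead of subclassing the game:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```java
// Hypothetical delegate for behavior that used to be overridden in a test subclass.
interface QuestionDeck {
    String nextQuestion(String category);
}

class DeckGame {
    private final QuestionDeck deck;

    // Constructor injection: the collaborator is handed in instead of being created here.
    DeckGame(QuestionDeck deck) {
        this.deck = deck;
    }

    String askQuestion(String category) {
        return "Question: " + deck.nextQuestion(category);
    }
}

class DeckGameDemo {
    public static void main(String[] args) {
        // A test can pass a stub (here a lambda) instead of subclassing DeckGame.
        QuestionDeck stub = category -> category + " Question 0";
        System.out.println(new DeckGame(stub).askQuestion("Pop"));  // prints Question: Pop Question 0
    }
}
```
&lt;/code&gt;&lt;/pre&gt;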

&lt;p&gt;I paired with a C++ developer who does embedded programming in his day job, and we worked on his C++ code. It turned out that we had quite different opinions and experiences. He was really focused on performance and couldn't understand why you would want to move methods to another class just to delegate to them, as this introduces the overhead of an additional method call.&lt;/p&gt;

&lt;p&gt;I haven't done any C++ programming since university. Eclipse seems to be well suited for C++ development, but compared to its Java support it still lacks a lot of convenience features. &lt;/p&gt;

&lt;h4&gt;Iteration 5&lt;/h4&gt;

&lt;p&gt;In iteration 5 I paired with Tilman, a Clean Code aficionado whom I already knew from our &lt;a href="http://jug-karlsruhe.de/"&gt;local Java User Group&lt;/a&gt;. The task was to turn as many methods as possible into real functions that don't work on fields but only on their parameters.&lt;/p&gt;

&lt;p&gt;A lot of people struggled with this approach at first. But it turns out that once you have done this, you are in a really good position to do further refactorings much more easily. &lt;/p&gt;
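&lt;p&gt;To make the idea concrete, a small sketch with hypothetical names: the first method reads and writes a field, while the refactored version is a static function whose result depends only on its parameters, so it can be tested without setting up any object state:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```java
class Board {
    static final int BOARD_SIZE = 12;
    int place = 0;

    // Before: the method reads and writes the field directly.
    void move(int roll) {
        place = (place + roll) % BOARD_SIZE;
    }

    // After: a real function; the result depends only on the parameters.
    static int newPlace(int currentPlace, int roll) {
        return (currentPlace + roll) % BOARD_SIZE;
    }

    // The stateful method can then simply delegate to the function.
    void moveViaFunction(int roll) {
        place = newPlace(place, roll);
    }

    public static void main(String[] args) {
        System.out.println(newPlace(10, 5));  // prints 3
    }
}
```
&lt;/code&gt;&lt;/pre&gt;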

&lt;p&gt;My partner did most of the coding, with some input from me. We took some directions I wouldn't have taken myself, but the resulting code was really well structured and noticeably smaller. We also worked with an interesting Eclipse plugin I had seen before: &lt;a href="http://infinitest.github.com/"&gt;Infinitest&lt;/a&gt; continuously runs the tests in the background, so there's no need to run them manually. I have to check whether something similar is available for Netbeans as well.&lt;/p&gt;

&lt;h4&gt;Iteration 6&lt;/h4&gt;

&lt;p&gt;To be honest, I don't know what the goal of the sixth iteration really was. I paired with a developer who was still fighting with failing tests from the previous iteration, and we spent most of the iteration trying to get them running again. In the last few minutes we managed to extract some classes and clean up some code.&lt;/p&gt;

&lt;h4&gt;Conclusion&lt;/h4&gt;
&lt;p&gt;The first German legacy code retreat really was a great experience. I learned a lot and, probably even more importantly, had a lot of fun.&lt;/p&gt;
&lt;p&gt;Both the food and the location were excellent. Thanks to the organizers &lt;a href="http://pboop.wordpress.com/"&gt;Nicole and Andreas&lt;/a&gt; as well as the sponsors for making it possible. It's great to be able to attend a high-quality event completely free of charge.&lt;/p&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog.florian-hopf.de/feeds/6750336721789300941/comments/default" title="Kommentare zum Post" /><link rel="replies" type="text/html" href="http://blog.florian-hopf.de/2012/02/legacy-code-retreat.html#comment-form" title="1 Kommentare" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/4583666011870993475/posts/default/6750336721789300941?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/4583666011870993475/posts/default/6750336721789300941?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/florian-hopf/UjyC/~3/oCbGT7ePq_8/legacy-code-retreat.html" title="Legacy Code Retreat" /><author><name>Florian Hopf</name><uri>http://www.blogger.com/profile/00629881090876630907</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>1</thr:total><feedburner:origLink>http://blog.florian-hopf.de/2012/02/legacy-code-retreat.html</feedburner:origLink></entry><entry gd:etag="W/&quot;CEINSX89cSp7ImA9WhRVEk4.&quot;"><id>tag:blogger.com,1999:blog-4583666011870993475.post-664556238900210155</id><published>2012-01-10T13:29:00.000-08:00</published><updated>2012-01-10T13:56:38.169-08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2012-01-10T13:56:38.169-08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="Test" /><category scheme="http://www.blogger.com/atom/ns#" term="Maven" /><category scheme="http://www.blogger.com/atom/ns#" term="Netbeans" /><title>Running my Tests 
again</title><content type="html">For some time I've been bugged by a &lt;a href="http://netbeans.org"&gt;Netbeans&lt;/a&gt; problem that I couldn't find any solution for. When running a unit test from within Netbeans, every now and then the tests just failed. They seemed to be executed against an old state. Running them again didn't help either; it seemed that some parts of the project didn't get recompiled. When executing the tests in a Maven build from the command line there were never any problems, and afterwards the tests could be run from Netbeans again. The problem occurred only very infrequently, but it was nevertheless really annoying. I stopped running the tests from Netbeans altogether and used only Maven. That is not a good solution either, as you either run all tests or have to edit the command line every time you want to run a single test.&lt;br /&gt;&lt;br /&gt;Recently I noticed what caused the problem: Netbeans has its &lt;a href="http://wiki.netbeans.org/FaqCompileOnSave"&gt;compile on save&lt;/a&gt; feature enabled for tests. This means it uses its internal incremental compiler, which doesn't seem to work reliably, at least for some project setups. &lt;br /&gt;&lt;br /&gt;&lt;a href="http://4.bp.blogspot.com/-JL0unt9i9Zo/Twyz51UurNI/AAAAAAAAADA/Q5VIVX8HLxc/s1600/cos.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 320px; height: 86px;" src="http://4.bp.blogspot.com/-JL0unt9i9Zo/Twyz51UurNI/AAAAAAAAADA/Q5VIVX8HLxc/s320/cos.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5696125434864774354" /&gt;&lt;/a&gt;&lt;br /&gt;You can disable it in the project properties on the Build/Compile node. I haven't seen any problems since disabling it. 
Being able to run the tests from the IDE again saves me a lot of time.</content><link rel="replies" type="application/atom+xml" href="http://blog.florian-hopf.de/feeds/664556238900210155/comments/default" title="Kommentare zum Post" /><link rel="replies" type="text/html" href="http://blog.florian-hopf.de/2012/01/running-my-tests-again.html#comment-form" title="0 Kommentare" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/4583666011870993475/posts/default/664556238900210155?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/4583666011870993475/posts/default/664556238900210155?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/florian-hopf/UjyC/~3/b5xZj7OxQIs/running-my-tests-again.html" title="Running my Tests again" /><author><name>Florian Hopf</name><uri>http://www.blogger.com/profile/00629881090876630907</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://4.bp.blogspot.com/-JL0unt9i9Zo/Twyz51UurNI/AAAAAAAAADA/Q5VIVX8HLxc/s72-c/cos.png" height="72" width="72" /><thr:total>0</thr:total><feedburner:origLink>http://blog.florian-hopf.de/2012/01/running-my-tests-again.html</feedburner:origLink></entry><entry gd:etag="W/&quot;DE4FQHg7eip7ImA9WhRWEUs.&quot;"><id>tag:blogger.com,1999:blog-4583666011870993475.post-59383568412412420</id><published>2011-12-29T01:22:00.000-08:00</published><updated>2011-12-29T05:55:11.602-08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-12-29T05:55:11.602-08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="Event" /><title>Talking about Code</title><content type="html">Yesterday I attended the &lt;a href="http://groupspaces.com/softwerkskammer/"&gt;Softwerkskammer&lt;/a&gt; Karlsruhe meetup for the first time. 
Softwerkskammer aims to connect the software craftsmanship community in Germany.&lt;br /&gt;&lt;br /&gt;The topic for the evening was simple: More Code. We looked at a lot of samples from a real project and discussed what's wrong with them and what could be done better. There were a lot of different opinions, but that's a good thing, as it made me question some of the habits I have when programming.&lt;br /&gt;&lt;br /&gt;This was the first time I've been to a meeting with a lively discussion like this. The conferences and &lt;a href="http://jug-ka.de"&gt;user groups&lt;/a&gt; I attend mostly have classic talks with one speaker and far less audience participation. Talking about code is a really good way to learn, and this won't be the last meetup I attend. Thanks to the organizers.</content><link rel="replies" type="application/atom+xml" href="http://blog.florian-hopf.de/feeds/59383568412412420/comments/default" title="Kommentare zum Post" /><link rel="replies" type="text/html" href="http://blog.florian-hopf.de/2011/12/talking-about-code.html#comment-form" title="0 Kommentare" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/4583666011870993475/posts/default/59383568412412420?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/4583666011870993475/posts/default/59383568412412420?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/florian-hopf/UjyC/~3/XXuf_aLeDiA/talking-about-code.html" title="Talking about Code" /><author><name>Florian Hopf</name><uri>http://www.blogger.com/profile/00629881090876630907</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>0</thr:total><feedburner:origLink>http://blog.florian-hopf.de/2011/12/talking-about-code.html</feedburner:origLink></entry><entry 
gd:etag="W/&quot;Dk4HQng-eCp7ImA9WhVUEUo.&quot;"><id>tag:blogger.com,1999:blog-4583666011870993475.post-3431038754084065309</id><published>2011-12-26T07:46:00.000-08:00</published><updated>2012-05-16T06:28:53.650-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2012-05-16T06:28:53.650-07:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="Java" /><category scheme="http://www.blogger.com/atom/ns#" term="Spring" /><category scheme="http://www.blogger.com/atom/ns#" term="Book" /><title>Spring in Action</title><content type="html">&lt;p&gt;Sometimes it's comfortable not to be an absolute expert in a certain technology. This makes it really easy to learn new things, even by mundane means like reading a book. Even if you are a Spring expert, you are still likely to take something away from the latest edition of &lt;a href="http://www.manning.com/walls4/"&gt;Spring in Action by Craig Walls&lt;/a&gt;, as it covers a wide range of topics. I haven't read any of the previous editions, but people told me that those are even better.&lt;/p&gt; 

&lt;p&gt;Having recently finished the book, I want to take the time to write down two small but interesting configuration features I learned from it.&lt;/p&gt;

&lt;h4&gt;p-Namespace&lt;/h4&gt;

&lt;p&gt;A feature I didn't know before, but that seems quite useful, is the &lt;a href="http://static.springsource.org/spring/docs/3.0.6.RELEASE/spring-framework-reference/html/beans.html#beans-p-namespace"&gt;p-Namespace&lt;/a&gt;. It's a namespace that is not backed by a schema and allows you to configure beans in a really concise way. For example, look at how a bean might normally be configured:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;    &amp;lt;bean id="foo" class="foo.bar.Baz"&amp;gt;&lt;br /&gt;        &amp;lt;property name="myLongProperty" value="2"/&amp;gt;&lt;br /&gt;        &amp;lt;property name="myStringProperty" value="Hallo"/&amp;gt;&lt;br /&gt;    &amp;lt;/bean&amp;gt;&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The properties we'd like to set are children of the bean node. Netbeans comes with nice autocompletion support for the property names, as you can see in the screenshot.&lt;/p&gt;

&lt;a href="http://2.bp.blogspot.com/-gFYvhLg2vwo/TvmGWCb6GRI/AAAAAAAAACw/AFwJLfc-5eU/s1600/blog-netbeans-spring-property.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 320px; height: 86px;" src="http://2.bp.blogspot.com/-gFYvhLg2vwo/TvmGWCb6GRI/AAAAAAAAACw/AFwJLfc-5eU/s320/blog-netbeans-spring-property.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5690727317328501010" /&gt;&lt;/a&gt;

&lt;p&gt;The p-Namespace is a more concise version where the property names themselves become attributes of the bean node:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;    &amp;lt;bean id="foo" class="foo.bar.Baz"&lt;br /&gt;        p:myLongProperty="2" p:myStringProperty="Hallo"/&amp;gt;&lt;/code&gt;&lt;/pre&gt; 

&lt;p&gt;Netbeans is clever enough to offer code completion here as well. &lt;/p&gt;

&lt;a href="http://4.bp.blogspot.com/-cwq2dtJ9eEw/TvmGIRSzexI/AAAAAAAAACk/RcPWwIeLjY0/s1600/blog-netbeans-spring-pnamespace.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 320px; height: 116px;" src="http://4.bp.blogspot.com/-cwq2dtJ9eEw/TvmGIRSzexI/AAAAAAAAACk/RcPWwIeLjY0/s320/blog-netbeans-spring-pnamespace.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5690727080798681874" /&gt;&lt;/a&gt;

&lt;p&gt;I am not sure whether I will use the short form of the p-Namespace a lot. Consistent use of features within a project is quite important, so if the short form is used, I think it should be used everywhere in the project.&lt;/p&gt;
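&lt;p&gt;Both configuration styles, nested &lt;code&gt;property&lt;/code&gt; elements and &lt;code&gt;p:&lt;/code&gt; attributes, inject values through JavaBean setters. A hypothetical &lt;code&gt;foo.bar.Baz&lt;/code&gt; matching the example could look like the following sketch (the package declaration is omitted so it compiles standalone). Spring converts the string values from the XML, e.g. &lt;code&gt;"2"&lt;/code&gt; to a &lt;code&gt;long&lt;/code&gt;; the &lt;code&gt;main&lt;/code&gt; method only mimics that roughly:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```java
// Would live in package foo.bar to match class="foo.bar.Baz" (omitted here).
class Baz {
    private long myLongProperty;
    private String myStringProperty;

    // Target of property name="myLongProperty" as well as p:myLongProperty
    public void setMyLongProperty(long myLongProperty) {
        this.myLongProperty = myLongProperty;
    }

    // Target of property name="myStringProperty" as well as p:myStringProperty
    public void setMyStringProperty(String myStringProperty) {
        this.myStringProperty = myStringProperty;
    }

    public long getMyLongProperty() {
        return myLongProperty;
    }

    public String getMyStringProperty() {
        return myStringProperty;
    }

    public static void main(String[] args) {
        // Roughly what the container does for the example configuration.
        Baz baz = new Baz();
        baz.setMyLongProperty(Long.parseLong("2"));
        baz.setMyStringProperty("Hallo");
        System.out.println(baz.getMyLongProperty() + " " + baz.getMyStringProperty());  // prints 2 Hallo
    }
}
```
&lt;/code&gt;&lt;/pre&gt;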

&lt;h4&gt;Accessing Constants&lt;/h4&gt;

&lt;p&gt;Sometimes you need to access constants in your Spring configuration files. There are several ways to handle this; one of them is the util-Namespace:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;property name="day"&amp;gt;&lt;br /&gt;    &amp;lt;util:constant static-field="java.util.Calendar.WEDNESDAY"/&amp;gt;&lt;br /&gt;&amp;lt;/property&amp;gt;&lt;/code&gt;&lt;/pre&gt;  

&lt;p&gt;Another way is to access it using the Spring Expression Language:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;property name="day" value="#{T(java.util.Calendar).WEDNESDAY}"/&amp;gt;&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;I think the EL variant is more widely applicable, as the value doesn't need to be declared as a child element. For example, I had some problems using util:constant as a key or value in a util:map; that would have been easy with the EL version.&lt;/p&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog.florian-hopf.de/feeds/3431038754084065309/comments/default" title="Kommentare zum Post" /><link rel="replies" type="text/html" href="http://blog.florian-hopf.de/2011/12/spring-in-action.html#comment-form" title="0 Kommentare" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/4583666011870993475/posts/default/3431038754084065309?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/4583666011870993475/posts/default/3431038754084065309?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/florian-hopf/UjyC/~3/ouNU8EAMMFE/spring-in-action.html" title="Spring in Action" /><author><name>Florian Hopf</name><uri>http://www.blogger.com/profile/00629881090876630907</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://2.bp.blogspot.com/-gFYvhLg2vwo/TvmGWCb6GRI/AAAAAAAAACw/AFwJLfc-5eU/s72-c/blog-netbeans-spring-property.png" height="72" width="72" /><thr:total>0</thr:total><feedburner:origLink>http://blog.florian-hopf.de/2011/12/spring-in-action.html</feedburner:origLink></entry><entry gd:etag="W/&quot;DE4HQ30zfSp7ImA9WhRWEUs.&quot;"><id>tag:blogger.com,1999:blog-4583666011870993475.post-5469074833229820472</id><published>2011-12-07T13:46:00.000-08:00</published><updated>2011-12-29T05:55:32.385-08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-12-29T05:55:32.385-08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="Event" 
/><category scheme="http://www.blogger.com/atom/ns#" term="Java" /><category scheme="http://www.blogger.com/atom/ns#" term="JUG" /><title>Not another Diamond Operator Introduction</title><content type="html">I just returned from the talk "Lucky Seven" at our local &lt;a href="http://jug-ka.de/"&gt;Java User Group&lt;/a&gt;. It was far better than I expected. Not that I expected Wolfgang Weigend to be a bad speaker, but even though I organized the event myself, I had the feeling that I had already seen one too many Java 7 introductions. But there was more ...&lt;br /&gt;&lt;br /&gt;One of the interesting aspects I hadn't been paying much attention to is the merge of the &lt;a href="http://www.oracle.com/technetwork/middleware/jrockit/overview/index.html"&gt;JRockit&lt;/a&gt; and &lt;a href="http://www.oracle.com/technetwork/java/javase/tech/index-jsp-136373.html"&gt;Hotspot&lt;/a&gt; VMs. Hotspot will be the base for further development, and JRockit features will be merged into it. Some of these features will already be available in OpenJDK during the JDK 7 timespan.&lt;br /&gt;&lt;br /&gt;One of the changes has attracted some interest lately: &lt;a href="http://openjdk.java.net/jeps/122"&gt;The PermGen space will be removed.&lt;/a&gt; This sounds like a major change, but once it works it will definitely be a huge benefit.&lt;br /&gt;&lt;br /&gt;JRockit has been highly respected for its monitoring features. Among those is the interesting &lt;a href="http://docs.oracle.com/cd/E15289_01/doc.40/e15070/introduction.htm"&gt;Java Flight Recorder&lt;/a&gt;, which reminds me of the commercial product &lt;a href="http://www.chrononsystems.com/"&gt;Chronon&lt;/a&gt;. It will be an always-on recording of data in the JVM that can be used for diagnostic purposes. Sounds really interesting!&lt;br /&gt;&lt;br /&gt;The overall goal of the convergence is to have a VM that can tune itself. 
Looking forward to it!&lt;br /&gt;&lt;br /&gt;&lt;a href="http://jug-karlsruhe.mixxt.de/networks/files/file.84658"&gt;The (mixed German and English) slides of the talk are available for download.&lt;/a&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog.florian-hopf.de/feeds/5469074833229820472/comments/default" title="Kommentare zum Post" /><link rel="replies" type="text/html" href="http://blog.florian-hopf.de/2011/12/not-another-diamond-operator.html#comment-form" title="0 Kommentare" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/4583666011870993475/posts/default/5469074833229820472?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/4583666011870993475/posts/default/5469074833229820472?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/florian-hopf/UjyC/~3/jQhfUoDI84A/not-another-diamond-operator.html" title="Not another Diamond Operator Introduction" /><author><name>Florian Hopf</name><uri>http://www.blogger.com/profile/00629881090876630907</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>0</thr:total><feedburner:origLink>http://blog.florian-hopf.de/2011/12/not-another-diamond-operator.html</feedburner:origLink></entry><entry gd:etag="W/&quot;D0UCRH4yeCp7ImA9WhdaFk8.&quot;"><id>tag:blogger.com,1999:blog-4583666011870993475.post-8311420640525110415</id><published>2011-10-26T01:58:00.000-07:00</published><updated>2011-10-26T03:34:25.090-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-10-26T03:34:25.090-07:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="Maven" /><category scheme="http://www.blogger.com/atom/ns#" term="Gradle" /><title>Getting started with Gradle</title><content type="html">&lt;a href="http://maven.apache.org/"&gt;Maven&lt;/a&gt; 
has been my build tool of choice for some years now. Coming from Ant, the declarative approach, the useful conventions and the dependency management were a huge benefit. But as with most technologies, the more you use it, the more minor and major flaws appear. A big problem is that Maven builds are sometimes not reproducible: the outcome of the build is influenced by the state of your local repository.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://gradle.org/"&gt;Gradle&lt;/a&gt; is a &lt;a href="http://groovy.codehaus.org/"&gt;Groovy&lt;/a&gt;-based build system that is often recommended as a more advanced alternative. The features that make it appealing to me are probably the simpler syntax and the &lt;a href="http://forums.gradle.org/gradle/topics/welcome_to_our_new_dependency_cache"&gt;advanced dependency cache&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;For a &lt;a href="https://github.com/fhopf/opencms-rfs-driver"&gt;recent project that I just uploaded for someone else&lt;/a&gt; I needed to add a simple way to build the jar. Time to do it with Gradle and see what it feels like.&lt;br /&gt;&lt;h4&gt;The build script&lt;/h4&gt;&lt;br /&gt;The purpose of the build is simple: compile some classes with some dependencies and package them into a jar file. Like Maven and Ant, Gradle needs at least one file that describes the build. 
This is what &lt;code&gt;build.gradle&lt;/code&gt; looks like:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;apply plugin: 'java'&lt;br /&gt;&lt;br /&gt;repositories {&lt;br /&gt;    mavenCentral()&lt;br /&gt;    mavenRepo url: "http://bp-cms-commons.sourceforge.net/m2repo"&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;dependencies {&lt;br /&gt;    compile group: 'org.opencms', name: 'opencms-core', version: '7.5.4'&lt;br /&gt;    compile group: 'javax.servlet', name: 'servlet-api', version: '2.5'&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;sourceSets {&lt;br /&gt;    main {&lt;br /&gt;        java {&lt;br /&gt;            srcDir 'src'&lt;br /&gt;        }&lt;br /&gt;    }&lt;br /&gt;}&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;(The script uses the Gradle syntax of its time. Newer Gradle releases declare an additional repository with a block like &lt;code&gt;maven { url "http://bp-cms-commons.sourceforge.net/m2repo" }&lt;/code&gt; instead of &lt;code&gt;mavenRepo url:&lt;/code&gt;, and they no longer provide the &lt;code&gt;compile&lt;/code&gt; configuration; &lt;code&gt;implementation&lt;/code&gt; takes its place.)&lt;br /&gt;&lt;br /&gt;Let's step through the file block by block. The first line tells Gradle to use the Java plugin, which ships with tasks for compiling and packaging Java classes.&lt;br /&gt;&lt;br /&gt;The next block declares the dependency repositories. Luckily Gradle supports Maven repositories, so existing repositories like Maven Central can be used. I doubt Gradle would gain much adoption without this feature. Two repositories are declared: Maven Central, where most of the common dependencies are stored, and a custom repository that provides the OpenCms dependencies.&lt;br /&gt;&lt;br /&gt;The next block declares which dependencies the build needs. Gradle also supports scopes (in Gradle: configurations), so you can, for example, declare that some jars are only needed when running tests. Here the dependency declarations mirror the Maven coordinates (group, name and version), but Gradle also supports more advanced features like &lt;a href="http://gradle.org/current/docs/userguide/artifact_dependencies_tutorial.html"&gt;version ranges&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;The last block is only needed because my Java sources are located in src instead of the default src/main/java. 
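&lt;br /&gt;&lt;br /&gt;Gradle can reuse Maven repositories because artifacts there are stored under a fixed directory layout derived from the coordinates. As an illustration, here is a small Java sketch of that mapping for the two dependencies above. The class and method names are hypothetical, and the sketch ignores classifiers, packaging other than jar, and snapshot versions:

```java
public class MavenLayout {

    // Standard Maven repository layout (simplified):
    // group with dots turned into slashes / name / version / name-version.extension
    static String artifactPath(String group, String name, String version, String extension) {
        return group.replace('.', '/') + "/" + name + "/" + version + "/"
                + name + "-" + version + "." + extension;
    }

    // Joins a repository root URL and the jar path, adding a slash if needed.
    static String artifactUrl(String repositoryRoot, String group, String name, String version) {
        String root = repositoryRoot.endsWith("/") ? repositoryRoot : repositoryRoot + "/";
        return root + artifactPath(group, name, version, "jar");
    }

    public static void main(String[] args) {
        // prints javax/servlet/servlet-api/2.5/servlet-api-2.5.jar
        System.out.println(artifactPath("javax.servlet", "servlet-api", "2.5", "jar"));
        System.out.println(artifactUrl("http://bp-cms-commons.sourceforge.net/m2repo",
                "org.opencms", "opencms-core", "7.5.4"));
    }
}
```

Resolving a dependency then roughly means looking up that path, plus the accompanying .pom file that lists the transitive dependencies, in the declared repositories.&lt;br /&gt;&lt;br /&gt;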
Gradle follows many of the Maven conventions, which makes migrating builds easy.&lt;br /&gt;&lt;h4&gt;Building&lt;/h4&gt;&lt;br /&gt;To build the project you need Gradle installed. You can download a single distribution that already bundles Groovy and everything else that is needed; you only have to add its bin folder to your path.&lt;br /&gt;&lt;br /&gt;Packaging the jar is easy: just run the jar task that the Java plugin adds, &lt;code&gt;gradle :jar&lt;/code&gt;. Gradle then downloads all direct and transitive dependencies. The fun part: it uses a nice command line library that can display bold text, rewrite lines and the like, so it's fun to watch.&lt;br /&gt;&lt;br /&gt;I like the simplicity and readability of the build script. You don't have to declare anything you don't actually need: no project coordinates, no schema declaration, nothing. I hope I will find time to use it in a larger project so I can see what it really feels like in daily project work.</content><link rel="replies" type="application/atom+xml" href="http://blog.florian-hopf.de/feeds/8311420640525110415/comments/default" title="Kommentare zum Post" /><link rel="replies" type="text/html" href="http://blog.florian-hopf.de/2011/10/getting-started-with-gradle.html#comment-form" title="0 Kommentare" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/4583666011870993475/posts/default/8311420640525110415?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/4583666011870993475/posts/default/8311420640525110415?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/florian-hopf/UjyC/~3/tsgrYASLUBQ/getting-started-with-gradle.html" title="Getting started with Gradle" /><author><name>Florian Hopf</name><uri>http://www.blogger.com/profile/00629881090876630907</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" 
/></author><thr:total>0</thr:total><feedburner:origLink>http://blog.florian-hopf.de/2011/10/getting-started-with-gradle.html</feedburner:origLink></entry><entry gd:etag="W/&quot;Ck8ERX0_eSp7ImA9Wx9bGU8.&quot;"><id>tag:blogger.com,1999:blog-4583666011870993475.post-5468042992969890140</id><published>2011-02-28T11:08:00.000-08:00</published><updated>2011-02-28T11:20:04.341-08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-02-28T11:20:04.341-08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="Solr" /><category scheme="http://www.blogger.com/atom/ns#" term="Book" /><title>Book Review: Solr 1.4 Enterprise Search Server</title><content type="html">I've been interested in &lt;a href="http://lucene.apache.org/solr/"&gt;Solr&lt;/a&gt; since I first read about it. That must have been some time in 2008, while I was doing research for a search-centric website that was supposed to run on OpenCms but unfortunately was never built. At that time I wouldn't have used it, as it was completely new to me, but I liked the idea a lot. After attending the &lt;a href="http://parleys.com/#sl=1&amp;st=5&amp;id=1546"&gt;Devoxx university session by Erik Hatcher&lt;/a&gt; on Solr in 2009, I was completely sure that the next search system I implemented would be based on Solr. That project is nearly finished now, so it's time to recap what I took away from the book I used to learn Solr.&lt;br /&gt;&lt;br /&gt;First of all, when learning a new technology I prefer paper books over internet research. Though there are other books available, &lt;a href="https://www.packtpub.com/solr-1-4-enterprise-search-server/book"&gt;Solr 1.4 Enterprise Search Server by David Smiley and Eric Pugh&lt;/a&gt; seems to be the one that is most often recommended.&lt;br /&gt;&lt;br /&gt;The book starts off with a high-level introduction to what Solr and Lucene are, some first examples and, interestingly, how to build Solr from source. 
Since the book was released before Solr 1.4, the authors apparently foresaw that some features might still be missing and would have to be added manually. In fact, I've never seen another open source project where applying patches seems to be as common as it is for Solr.&lt;br /&gt;&lt;br /&gt;Schema configuration and text analysis are the topics of the second chapter. It begins with an introduction to &lt;a href="http://musicbrainz.org/"&gt;MusicBrainz&lt;/a&gt;, a freely available set of music data that is used as an example throughout the book. This chapter is crucial to understanding Solr, as it introduces many Lucene concepts that not every reader will be familiar with.&lt;br /&gt;&lt;br /&gt;After all this theory, chapter 3 turns to practice and covers the indexing process. curl, the command line HTTP client, is used to send data to Solr and retrieve it. Another option, the data import handler, which imports data directly from a database, is also introduced.&lt;br /&gt;&lt;br /&gt;Chapters 4 to 6 walk the reader through the search process and several useful components that enhance the user's search experience, like faceting and the dismax request handler. This is where Solr really shines: you can see how easy it is to add features to your application that would probably have taken a long time to develop with plain Lucene.&lt;br /&gt;&lt;br /&gt;Chapter 7 covers deploying Solr, with plenty of useful information on configuring and monitoring a Solr instance. Chapter 8 looks at client APIs for different programming languages, SolrJ being the most important one to me. The book ends with an in-depth look at how Solr can be tuned and scaled.&lt;br /&gt;&lt;br /&gt;This is a really excellent book, both as an introduction to Solr and as a reference while developing your application. 
The most common use cases are covered, and the examples make it easy to apply the concepts in your own application. There is a lot of hands-on information that proves useful during development and deployment.&lt;br /&gt;&lt;br /&gt;There are some slight drawbacks I don't want to keep to myself: as Solr's usual message format is a custom XML dialect, there is a lot of XML in the book to digest. Since that format is so widely used, this isn't necessarily a bad thing, but all those angle brackets can make you quite dizzy. From a reader's perspective some variety would have been nice, e.g. by mixing XML with the Ruby or JSON formats, or by introducing the client APIs earlier. Also, while it's a good idea to use a freely available data set, MusicBrainz probably isn't the best fit for demonstrating some features: there are no large text sections or documents, which are often what a search application is built on. And finally, an issue not with the authors but with the publisher, PacktPub: when skimming through the book it's quite hard to see where a new section begins. 
The headings aren't numbered and are all of a very similar size.&lt;br /&gt;&lt;br /&gt;Nevertheless, if you have to develop an application using Solr, you should by all means buy this book. You won't regret it.</content><link rel="replies" type="application/atom+xml" href="http://blog.florian-hopf.de/feeds/5468042992969890140/comments/default" title="Kommentare zum Post" /><link rel="replies" type="text/html" href="http://blog.florian-hopf.de/2011/02/book-review-solr-14-enterprise-search.html#comment-form" title="1 Kommentare" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/4583666011870993475/posts/default/5468042992969890140?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/4583666011870993475/posts/default/5468042992969890140?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/florian-hopf/UjyC/~3/1mLde0ExQBA/book-review-solr-14-enterprise-search.html" title="Book Review: Solr 1.4 Enterprise Search Server" /><author><name>Florian Hopf</name><uri>http://www.blogger.com/profile/00629881090876630907</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>1</thr:total><feedburner:origLink>http://blog.florian-hopf.de/2011/02/book-review-solr-14-enterprise-search.html</feedburner:origLink></entry><entry gd:etag="W/&quot;CU4ARnw4fCp7ImA9Wx9WF0o.&quot;"><id>tag:blogger.com,1999:blog-4583666011870993475.post-7749486079180236047</id><published>2011-01-22T17:55:00.000-08:00</published><updated>2011-01-23T01:12:27.234-08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-01-23T01:12:27.234-08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="Test" /><category scheme="http://www.blogger.com/atom/ns#" term="Ruby on Rails" /><category scheme="http://www.blogger.com/atom/ns#" 
term="Netbeans" /><title>Running Ruby on Rails Tests in Netbeans</title><content type="html">I don't get it. Netbeans is often recommended as an excellent IDE for Ruby on Rails development, not only when targeting the JVM. Nevertheless, even some basic features don't seem to work with the default setup: you can't even run the tests, which is fundamental when developing in a dynamic language.&lt;br /&gt;&lt;br /&gt;What's happening? Suppose you have a simple app and you want to run some tests against the test database. I'm not sure whether it is mandatory when using the built-in JRuby, but it &lt;a href="http://netbeans.org/kb/docs/ruby/rapid-ruby-weblog.html"&gt;seems to be normal&lt;/a&gt; to use the jdbcmysql adapter. When you try to run the tests you will see something like this:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;1) Error:&lt;br /&gt;test_index_is_ok(ContactsControllerTest):&lt;br /&gt;ActiveRecord::StatementInvalid: ActiveRecord::JDBCError: Table 'kontakt_test.contacts' doesn't exist: DELETE FROM `contacts`&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;followed by a stack trace that isn't really helpful, as it doesn't point to the root cause. Rails somehow doesn't create the tables in the test database. 
You'll see more helpful output when running the rake task "db:test:prepare" directly in debug mode:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;** Invoke db:test:prepare (first_time)&lt;br /&gt;** Invoke db:abort_if_pending_migrations (first_time)&lt;br /&gt;** Invoke environment (first_time)&lt;br /&gt;** Execute environment&lt;br /&gt;** Execute db:abort_if_pending_migrations&lt;br /&gt;rake aborted!&lt;br /&gt;Task not supported by 'jdbcmysql'&lt;br /&gt;/path/to/netbeans-6.9.1/ruby/jruby-1.5.1/lib/ruby/gems/1.8/gems/rails-2.3.8/lib/tasks/databases.rake:380&lt;br /&gt;/path/to/netbeans/ruby/jruby-1.5.1/lib/ruby/gems/1.8/gems/rake-0.8.7/lib/rake.rb:636:in `call'&lt;br /&gt;/path/to/netbeans/ruby/jruby-1.5.1/lib/ruby/gems/1.8/gems/rake-0.8.7/lib/rake.rb:636:in `execute'&lt;br /&gt;/path/to/netbeans/ruby/jruby-1.5.1/lib/ruby/gems/1.8/gems/rake-0.8.7/lib/rake.rb:631:in `each'&lt;br /&gt;[...]&lt;br /&gt;** Execute db:test:prepare&lt;br /&gt;** Invoke db:test:load (first_time)&lt;br /&gt;** Invoke db:test:purge (first_time)&lt;br /&gt;** Invoke environment &lt;br /&gt;** Execute db:test:purge&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;The task fails in the database tasks file of the Rails library. You can open its source code via the node &lt;code&gt;Libraries/Built-in-JRuby/rails-2.3.8/lib/tasks/databases.rake&lt;/code&gt; in Netbeans.&lt;br /&gt;&lt;br /&gt;At line 357 you can see the problem: Rails only expects a few hardcoded adapters, and jdbcmysql is not one of them. For unknown adapters the task refuses to run. There are two ways to fix it. Either use a regular expression that matches both adapter names:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;when /mysql/ # instead of when "mysql"&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;or add the jdbcmysql adapter as a second option:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;when "mysql", "jdbcmysql"&lt;br /&gt;&lt;/pre&gt; &lt;br /&gt;Now the tests are running and hopefully passing. 
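&lt;br /&gt;&lt;br /&gt;The two fixes differ in a subtle way. Ruby's &lt;code&gt;case&lt;/code&gt;/&lt;code&gt;when&lt;/code&gt; compares with &lt;code&gt;===&lt;/code&gt;. For a string that means plain equality, and for a regular expression it means an unanchored match anywhere in the string. To illustrate, here is a small Java transliteration of the three &lt;code&gt;when&lt;/code&gt; variants. It is a sketch, not the Rails code, and the class and method names are made up:

```java
import java.util.regex.Pattern;

public class AdapterMatch {

    // Used by regexMatch; corresponds to the Ruby literal /mysql/.
    private static final Pattern MYSQL = Pattern.compile("mysql");

    // when "mysql": String#=== in Ruby is plain equality.
    static boolean exactMatch(String adapter) {
        return "mysql".equals(adapter);
    }

    // when "mysql", "jdbcmysql": matches if any of the listed values is equal.
    static boolean listMatch(String adapter) {
        return "mysql".equals(adapter) || "jdbcmysql".equals(adapter);
    }

    // when /mysql/: Regexp#=== succeeds if the pattern occurs anywhere,
    // which corresponds to Matcher.find() rather than Matcher.matches().
    static boolean regexMatch(String adapter) {
        return MYSQL.matcher(adapter).find();
    }

    public static void main(String[] args) {
        for (String adapter : new String[] {"mysql", "jdbcmysql", "sqlite3"}) {
            System.out.printf("%s: exact=%b, list=%b, regex=%b%n",
                    adapter, exactMatch(adapter), listMatch(adapter), regexMatch(adapter));
        }
    }
}
```

The regular expression variant is the most permissive one. It accepts any adapter name that contains "mysql", which is convenient here but could also match adapters you didn't have in mind. The explicit list only accepts the two names.&lt;br /&gt;&lt;br /&gt;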
The same kind of error might occur for other tasks as well, as there are more checks for the mysql adapter in this file. You should be able to fix them the same way.&lt;br /&gt;&lt;br /&gt;I wouldn't have expected to have to patch the Rails code to use it in Netbeans, but this doesn't seem to be &lt;a href="http://markmail.org/thread/w6iwdrtwq32oktir#query:+page:1+mid:wjysi5gab76dtdas+state:results"&gt;uncommon&lt;/a&gt;. &lt;a href="http://blog.nicksieger.com/articles/2009/10/12/fresh-0-9-2-activerecord-jdbc-adapter-release"&gt;Using a recent activerecord-jdbc-adapter version&lt;/a&gt; is supposed to fix the problem, as you can then use mysql as the adapter name, but I didn't find a way to run the jdbc generator from Netbeans. It isn't available in the list of generators, and I didn't find a generator gem to download.&lt;br /&gt;&lt;br /&gt;What did I learn from this? I now have a better understanding of how the build process works with rake. But more importantly: even technologies that have been hyped for a long time might not be as flawless as you would expect.</content><link rel="replies" type="application/atom+xml" href="http://blog.florian-hopf.de/feeds/7749486079180236047/comments/default" title="Kommentare zum Post" /><link rel="replies" type="text/html" href="http://blog.florian-hopf.de/2011/01/running-ruby-on-rails-tests-in-netbeans.html#comment-form" title="0 Kommentare" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/4583666011870993475/posts/default/7749486079180236047?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/4583666011870993475/posts/default/7749486079180236047?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/florian-hopf/UjyC/~3/Z24Ueo3O9yY/running-ruby-on-rails-tests-in-netbeans.html" title="Running Ruby on Rails Tests in Netbeans" /><author><name>Florian 
Hopf</name><uri>http://www.blogger.com/profile/00629881090876630907</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>0</thr:total><feedburner:origLink>http://blog.florian-hopf.de/2011/01/running-ruby-on-rails-tests-in-netbeans.html</feedburner:origLink></entry><entry gd:etag="W/&quot;D0QBQn0_fyp7ImA9WhVUEUo.&quot;"><id>tag:blogger.com,1999:blog-4583666011870993475.post-8103899511419123298</id><published>2011-01-09T04:49:00.000-08:00</published><updated>2012-05-16T06:35:53.347-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2012-05-16T06:35:53.347-07:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="Git" /><title>Refactoring in Git</title><content type="html">&lt;p&gt;To me, when using SVN, the most important reason for using an IDE plugin was the refactoring support: SVN doesn't notice when you rename a file; you have to call &lt;code&gt;svn mv&lt;/code&gt; explicitly.&lt;/p&gt;

&lt;p&gt;I expected this to be a major problem with Git, as a Java rename refactoring changes both the content and the file name in one go. Since the content changes, its SHA-1 checksum changes too, so I feared Git would lose track of the file. Fortunately, that's not the case.&lt;/p&gt;
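
&lt;p&gt;It helps to know how Git names file contents to see why the checksum changes. A blob's id is the SHA-1 of a short header (the word "blob", a space, the content length in bytes and a NUL byte) followed by the content itself. The Java sketch below reproduces that computation for illustration only. The class and method names are made up, and it assumes the content is stored without any line-ending conversion. You can compare its output with &lt;code&gt;git hash-object&lt;/code&gt; run on the same content.&lt;/p&gt;

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class GitBlobId {

    // Id Git assigns to a blob: SHA-1 over the header
    // "blob " + size in bytes + a NUL byte, followed by the raw content.
    static String blobId(String content) {
        byte[] body = content.getBytes(StandardCharsets.UTF_8);
        byte[] header = ("blob " + body.length + "\0").getBytes(StandardCharsets.US_ASCII);
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            sha1.update(header);
            sha1.update(body);
            StringBuilder hex = new StringBuilder();
            for (byte b : sha1.digest()) {
                hex.append(String.format("%02x", Byte.toUnsignedInt(b)));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-1, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Only the class name differs, yet the two ids have nothing in common.
        System.out.println(blobId("public class TestClass {\n"));
        System.out.println(blobId("public class TestClassWithNewName {\n"));
    }
}
```

&lt;p&gt;A single changed line is enough to produce a completely different id. Git therefore cannot link the old and the new file through the checksum alone. It has to compare their contents, and that is what its rename detection does.&lt;/p&gt;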

&lt;p&gt;With Git you don't need a special operation: it detects renames automatically, even when the content has changed slightly.&lt;/p&gt;

&lt;p&gt;Time for a test. Suppose you have a simple Java class like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;public class TestClass {&lt;br /&gt;&lt;br /&gt;    public static void main(String [] args) {&lt;br /&gt;        System.out.println("Hello Git");&lt;br /&gt;    }&lt;br /&gt;&lt;br /&gt;}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Commit it to the Git repository: &lt;/p&gt;

&lt;pre&gt;&lt;code&gt;flo@hank:~/git-netbeans$ git add src/TestClass.java&lt;br /&gt;flo@hank:~/git-netbeans$ git commit -m "added test class"&lt;br /&gt;[master 9269c2f] added test class&lt;br /&gt; 1 files changed, 7 insertions(+), 0 deletions(-)&lt;br /&gt; create mode 100644 src/TestClass.java&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Rename the class, either with an IDE or manually by changing both the file name and the class name:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;public class TestClassWithNewName {&lt;br /&gt;&lt;br /&gt;    public static void main(String [] args) {&lt;br /&gt;        System.out.println("Hello Git");&lt;br /&gt;    }&lt;br /&gt;&lt;br /&gt;}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;&lt;code&gt;git status&lt;/code&gt; will tell you something like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;flo@hank:~/git-netbeans$ git status&lt;br /&gt;# On branch master&lt;br /&gt;# Changed but not updated:&lt;br /&gt;#   (use "git add/rm &amp;lt;file&amp;gt;..." to update what will be committed)&lt;br /&gt;#   (use "git checkout -- &amp;lt;file&amp;gt;..." to discard changes in working directory)&lt;br /&gt;#&lt;br /&gt;#       deleted:    src/TestClass.java&lt;br /&gt;#&lt;br /&gt;# Untracked files:&lt;br /&gt;#   (use "git add &amp;lt;file&amp;gt;..." to include in what will be committed)&lt;br /&gt;#&lt;br /&gt;#       src/TestClassWithNewName.java&lt;br /&gt;no changes added to commit (use "git add" and/or "git commit -a")&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;That doesn't look good yet: Git sees one deleted file and one untracked file. Next, stage the changes and have another look at the status:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;flo@hank:~/git-netbeans$ git rm src/TestClass.java&lt;br /&gt;rm 'src/TestClass.java'&lt;br /&gt;flo@hank:~/git-netbeans$ git add src/TestClassWithNewName.java&lt;br /&gt;flo@hank:~/git-netbeans$ git status&lt;br /&gt;# On branch master&lt;br /&gt;# Changes to be committed:&lt;br /&gt;#   (use "git reset HEAD &amp;lt;file&amp;gt;..." to unstage)&lt;br /&gt;#&lt;br /&gt;#       renamed:    src/TestClass.java -&gt; src/TestClassWithNewName.java&lt;br /&gt;#&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Neat, Git detected a rename. Let's commit and see the log:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;flo@hank:~/git-netbeans$ git commit -m "refactored class"&lt;br /&gt;[master 4acd7f1] refactored class&lt;br /&gt; 1 files changed, 1 insertions(+), 1 deletions(-)&lt;br /&gt; rename src/{TestClass.java =&gt; TestClassWithNewName.java} (72%)&lt;br /&gt;flo@hank:~/git-netbeans$ git log src/TestClassWithNewName.java&lt;br /&gt;commit 4acd7f19ccd6cc02816ee7f1293ea5a69d7a4ca7&lt;br /&gt;Author: Florian Hopf &amp;lt;fhopf@web.de&amp;gt;&lt;br /&gt;Date:   Sun Jan 9 14:27:59 2011 +0100&lt;br /&gt;&lt;br /&gt;    refactored class&lt;/code&gt;&lt;/pre&gt;
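
&lt;p&gt;The 72% in the commit summary is Git's similarity index for the rename. Roughly speaking, Git's rename detection splits both versions into lines, counts how many bytes of the old file reappear in the new one, and divides that by the size of the larger file. By default, a pair scoring at least 50% is reported as a rename. The Java sketch below is a simplified model of that heuristic, not Git's implementation: Git hashes the lines and also cuts long lines into 64-byte chunks, which the sketch skips. Its names are made up for illustration. Applied to the two versions of the class above, it also arrives at 72%, because 97 of the 133 bytes of the renamed file are shared with the original.&lt;/p&gt;

```java
import java.nio.charset.StandardCharsets;

public class RenameSimilarity {

    static final String BEFORE = "public class TestClass {\n"
            + "\n"
            + "    public static void main(String [] args) {\n"
            + "        System.out.println(\"Hello Git\");\n"
            + "    }\n"
            + "\n"
            + "}\n";

    static final String AFTER = BEFORE.replace("TestClass", "TestClassWithNewName");

    // Splits text into lines, each keeping its trailing newline.
    static String[] chunks(String text) {
        return text.split("(?m)^");
    }

    static int byteLength(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Sums the bytes of every destination line that can be paired with an
    // identical, not yet paired line of the source.
    static int copiedBytes(String src, String dst) {
        String[] srcLines = chunks(src);
        boolean[] paired = new boolean[srcLines.length];
        int copied = 0;
        for (String line : chunks(dst)) {
            for (int i = 0; i != srcLines.length; i++) {
                if (paired[i]) {
                    continue;
                }
                if (srcLines[i].equals(line)) {
                    paired[i] = true;
                    copied += byteLength(line);
                    break;
                }
            }
        }
        return copied;
    }

    // Copied bytes relative to the larger file, truncated to a whole percent.
    static int similarityPercent(String src, String dst) {
        int maxSize = Math.max(byteLength(src), byteLength(dst));
        if (maxSize == 0) {
            return 100; // two empty files: treated as identical in this sketch
        }
        return copiedBytes(src, dst) * 100 / maxSize;
    }

    public static void main(String[] args) {
        // prints "similarity: 72%"
        System.out.println("similarity: " + similarityPercent(BEFORE, AFTER) + "%");
    }
}
```

&lt;p&gt;Because the score is relative to the file size, renaming a class in a large file barely lowers it. Heavy edits made together with a rename, on the other hand, can push the pair below the threshold, and Git then reports a deletion and an addition instead.&lt;/p&gt;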

&lt;p&gt;Hmm, only the last commit? It looks like we have to tell Git explicitly that we want to follow renames:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;flo@hank:~/git-netbeans$ git log --follow src/TestClassWithNewName.java&lt;br /&gt;commit 4acd7f19ccd6cc02816ee7f1293ea5a69d7a4ca7&lt;br /&gt;Author: Florian Hopf &amp;lt;fhopf@web.de&amp;gt;&lt;br /&gt;Date:   Sun Jan 9 14:27:59 2011 +0100&lt;br /&gt;&lt;br /&gt;    refactored class&lt;br /&gt;&lt;br /&gt;commit 9269c2fd194b2bd2b93a18ab88f21fb2180c5870&lt;br /&gt;Author: Florian Hopf &amp;lt;fhopf@web.de&amp;gt;&lt;br /&gt;Date:   Sun Jan 9 13:48:35 2011 +0100&lt;br /&gt;&lt;br /&gt;    added test class&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;What do I take away from this experiment? Since Git handles renames without any special command, I won't use the &lt;a href="http://nbgit.org/"&gt;Netbeans Git plugin&lt;/a&gt; for now. I still have to get acquainted with the command line, and it's better to learn the basics first.&lt;/p&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog.florian-hopf.de/feeds/8103899511419123298/comments/default" title="Kommentare zum Post" /><link rel="replies" type="text/html" href="http://blog.florian-hopf.de/2011/01/refactoring-in-git.html#comment-form" title="1 Kommentare" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/4583666011870993475/posts/default/8103899511419123298?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/4583666011870993475/posts/default/8103899511419123298?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/florian-hopf/UjyC/~3/KBvmw0Mj-QM/refactoring-in-git.html" title="Refactoring in Git" /><author><name>Florian Hopf</name><uri>http://www.blogger.com/profile/00629881090876630907</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>1</thr:total><feedburner:origLink>http://blog.florian-hopf.de/2011/01/refactoring-in-git.html</feedburner:origLink></entry></feed>
