<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:series="http://unfoldingneurons.com/" version="2.0">

<channel>
	<title>mgm technology blog</title>
	
	<link>http://blog.mgm-tp.com</link>
	<description>We discuss software innovation</description>
	<lastBuildDate>Fri, 03 Feb 2012 14:42:03 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.1</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<xhtml:meta xmlns:xhtml="http://www.w3.org/1999/xhtml" name="robots" content="noindex" />
		<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.feedburner.com/MgmTechBlog" /><feedburner:info xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" uri="mgmtechblog" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><feedburner:emailServiceId xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0">MgmTechBlog</feedburner:emailServiceId><feedburner:feedburnerHostname xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0">http://feedburner.google.com</feedburner:feedburnerHostname><item>
		<title>Ultra-Performant Dynamic Websites with Varnish</title>
		<link>http://blog.mgm-tp.com/2012/01/varnish-web-cache/</link>
		<comments>http://blog.mgm-tp.com/2012/01/varnish-web-cache/#comments</comments>
		<pubDate>Mon, 30 Jan 2012 11:41:25 +0000</pubDate>
		<dc:creator>Dr. Christian Winkler</dc:creator>
				<category><![CDATA[Projects]]></category>
		<category><![CDATA[Tips]]></category>
		<category><![CDATA[Cache]]></category>
		<category><![CDATA[ECommerce]]></category>
		<category><![CDATA[Performance]]></category>
		<category><![CDATA[Scalability]]></category>
		<category><![CDATA[Web]]></category>

		<guid isPermaLink="false">http://blog.mgm-tp.com/?p=1059</guid>
		<description><![CDATA[This article describes how we configured and used the Varnish web cache for the popular German online shop www.lidl.de. Varnish gave us a tremendous performance boost. With this new caching setup, we easily achieve request rates of several thousand pages per second, which are quite common during marketing campaigns like special offers.

In a typical non-caching [...]]]></description>
			<content:encoded><![CDATA[<p>This article describes how we configured and used the <a href="https://www.varnish-cache.org/">Varnish</a> web cache for the popular German online shop <a href="http://www.lidl.de/">www.lidl.de</a>. Varnish gave us a tremendous performance boost. With this new caching setup, we easily achieve request rates of several thousand pages per second, which are quite common during marketing campaigns like special offers.</p>
<p><span id="more-1059"></span></p>
<p>In a typical <em>non-caching</em> setup of a web application as illustrated in the figure below, Apache handles static requests for images, scripts, etc. and forwards requests for the HTML pages to an application server like Tomcat or Glassfish. There the dynamic content is generated and then sent back to Apache and finally to the user. In this scenario, the database access is the most critical bottleneck. Even worse, each page request can cause multiple database requests, i.e. SQL statements.</p>
<div id="attachment_1068" class="wp-caption alignnone" style="width: 490px"><a rel="lightbox-varnish" href="http://blog.mgm-tp.com/wp-content/uploads/2012/01/Setup-without-Web-Cache.png"><img class="size-full wp-image-1068" title="Setup without Web Cache" src="http://blog.mgm-tp.com/wp-content/uploads/2012/01/Setup-without-Web-Cache.png" alt="" width="480" height="255" /></a><p class="wp-caption-text">Initial setup without our caching solution: the slow components are shown in red. (Load balancing, which could be done by Apache, is not considered here.)</p></div>
<p>Let&#8217;s assume that without caching, an application server can serve up to <span id="statefullperformance">100 dynamic pages per second</span>. Through a bit of horizontal scaling, i.e. using two server instances (nodes) and load balancing, this can be increased to about 200 dynamic requests per second. However, this scalability is not perfect: with three or more nodes it already starts to get worse, as the sessions have to be distributed among the nodes in the cluster.</p>
<p>The system can of course handle many more simultaneous users than the number 200 suggests, as users do not permanently access links. So the number of users did not really pose a problem during normal operation. However, the situation immediately got critical when newsletters with special offers were sent, as the application server instances were now under &#8220;siege&#8221;. An overload of the instances led to slower and slower responses and decreasing customer satisfaction. Another reason a shop wants to be responsive is that search engines take the response times measured during crawling into account when ranking search results.</p>
<p>So the question we had to solve was: <em>How can we keep the system responsive (ideally with a response time of 1-2 seconds) during high load and peak situations?</em> Please be aware that in the case of online shops, the highest turnover occurs in these situations.</p>
<p>When we analyzed the server log of the <a href="http://www.lidl.de/">www.lidl.de</a> online shop, we noticed an interesting fact, which we used to our advantage later on: the behavior of users is different in these situations. Most users are just browsing and reading. Consider e.g. a newsletter sent to a few million users: most of the readers will just click a few links (which can still easily amount to several million page impressions). Taking a deeper look we found out that most users are viewing absolutely and exactly identical content which has nevertheless been produced exclusively for them. Only a small percentage used the interactive services of the website like shopping carts, ordering etc.</p>
<h2>Introducing Varnish</h2>
<p>The peak situation described above implies that most content (even though dynamically generated by the web application) is identical for all users. So the obvious idea for a cache is to store the most frequently requested pages. The <a href="http://www.mediawiki.org/wiki/Manual:Varnish_caching">Varnish manual</a> describes Varnish as a lightweight, efficient <a href="https://www.varnish-cache.org/docs/trunk/tutorial/advanced_backend_servers.html">reverse proxy</a> server, meaning it is working in front of the web servers (Apache). It acts as a so-called <em>HTTP accelerator</em> which stores (caches) copies of the pages served by the web server (thus the synonym &#8220;web cache&#8221;). The next time the same page is requested by a user, Varnish will serve the copy instead of requesting the page from the Apache server. Varnish is blazingly fast, since it stores its cached data in memory.</p>
<p>The new architecture with Varnish as a web cache now looks like this:</p>
<div id="attachment_1069" class="wp-caption alignnone" style="width: 490px"><a rel="lightbox-varnish" href="http://blog.mgm-tp.com/wp-content/uploads/2012/01/Setup-with-Varnish-as-Web-Cache.png"><img class="size-full wp-image-1069" title="Setup with Varnish as Web Cache" src="http://blog.mgm-tp.com/wp-content/uploads/2012/01/Setup-with-Varnish-as-Web-Cache.png" alt="" width="480" height="367" /></a><p class="wp-caption-text">Varnish in front of Apache acting as a Web cache. It is configured to cache only stateless page requests. Stateful page requests (session) and static resources are forwarded to Apache.</p></div>
<h2>Performance Improvements</h2>
<p>Caching with Varnish removes the need for the web application to regenerate the same page over and over again, resulting in a tremendous performance boost. Varnish can easily handle 10,000 requests/s on a single node. Especially in high load situations the hit rate is easily above 90% (and almost 100% for the most frequently requested page, the homepage) so that the setup described above can now handle 50 times the original volume of 200 requests/s. However, this high performance will only hold for <em>stateless</em> users. Any user with a session will fall back to the 100 requests/s class.</p>
<p>As most of the load is now taken by the Varnish cache servers, the load on the application servers has dropped considerably. Even in high load situations where the Varnish servers handle several thousand requests per second, most of the content comes from the cache and the application servers can concentrate on re-creating expired content (which is then kept in the cache for s-maxage seconds) and handling users with a session (who are hopefully going to order).</p>
<p>Our setup leads to a significantly improved end-to-end performance of the system &ndash; even during normal operation. This is interesting as it creates an advantage for users during normal operation and saves money for the website owner at the same time.</p>
<p>Using less hardware means investing less money initially. Even more important, however, is that the operating costs will also be much lower. These operating costs are caused by permanent maintenance of the system, like powering servers around the clock, updating, applying patches etc. Since these costs are the main drivers for the total cost of ownership (TCO), the potential savings are also largest here.</p>
<p>Using fewer servers also means consuming less power. By reducing the energy bill this &#8220;green IT&#8221; approach therefore leads to lower operating costs. Compared to extending the existing system without a cache, an enormous amount of money was saved both in hardware and operating costs, while introducing a &#8220;performance buffer&#8221; for situations with even higher loads at the same time.</p>
<p>Another effect is that the shop&#8217;s marketing division can now act freely without having to keep technical constraints in mind: new campaigns can be planned to increase the turnover significantly, like sending more frequent newsletters, using special offers etc.</p>
<h2>Challenges</h2>
<p>Before we dive into the details of our Varnish configuration, let&#8217;s first discuss three problems we had to solve, specifically handling stateful users, keeping users stateless w.r.t. caching as long as possible, and caching pages with changing content.</p>
<h3>Problem: Websites are Stateful</h3>
<p>Most websites nowadays are stateful, e.g. a server-side session is created when a user logs in. In case of an online shop, the session might contain the shopping cart, login information etc.</p>
<p>The problem is that as soon as the session contains personalized information, caching must immediately stop. But, as long as state information does not have an effect on the content of generated pages, it can be ignored. This is what we call a <em>stateless or browsing user</em>, and our first objective should be to cache pages suitable for this user class.</p>
<p>Thus, our <strong>solution is to classify users</strong>, i.e. to carefully distinguish between stateless and stateful users. As the web application did not originally take care of that, it had to be changed in two fundamental ways:</p>
<ol>
<li>The application must only generate and send cookies if it has created some internal state for a user.</li>
<li>This state transition can happen at any time. So a user who has not even touched the application server and is completely unknown to the application must be able to become a stateful user at any time.</li>
</ol>
<div id="attachment_1070" class="wp-caption alignnone" style="width: 490px"><a rel="lightbox-varnish" href="http://blog.mgm-tp.com/wp-content/uploads/2012/01/Stateless-vs-Stateful-Users.png"><img class="size-full wp-image-1070" title="Stateless vs Stateful Users" src="http://blog.mgm-tp.com/wp-content/uploads/2012/01/Stateless-vs-Stateful-Users.png" alt="" width="480" height="337" /></a><p class="wp-caption-text">Two classes of users are distinguished by certain attributes. A user should stay stateless as long as possible. Stateful (red) users will need contact to the application server and experience slower performance.</p></div>
<p>Fortunately, the web application was already obeying the <a href="http://en.wikipedia.org/wiki/Representational_state_transfer">REST</a> paradigm. HTTP GET requests were used for all content that was just shown to the users. In contrast to this, all user actions which were actually creating some state on the server side were already modeled in HTTP POST requests. This proved to be extremely helpful when we started to configure the cache software.</p>
<h3>Keeping Users Stateless</h3>
<p>The general goal must be to keep users stateless for as long as possible, because in this first, simple approach only the requests of stateless users can be served from the cache.</p>
<p>Keeping users stateless means that the server should never send a session cookie unless really necessary. On the other hand, a lot of web applications require some basic personalization. This dilemma can be solved by using cookies which are evaluated on the client side only. For example, let&#8217;s assume that users can change the background color of the website as a very simple form of personalization. This can be performed by JavaScript and, for the sake of caching (and achieving a high hit rate), this should be the preferred way of doing simple personalization. Of course, a cookie evaluated on the server side could be used for the personalized background color to get the same result. But the cache hit rate would then suffer considerably, since one copy of each page would have to be cached per background color.</p>
<p>So one <strong>recipe for staying stateless is to keep simple state on the client side</strong> and never send it to the server. This state does not necessarily have to reside in a cookie &ndash; you can also use the browser local storage for that, as described in Smashing Magazine&#8217;s <a href="http://coding.smashingmagazine.com/2010/10/11/local-storage-and-how-to-use-it/">&#8220;Using Local Storage In HTML5-Capable Browsers&#8221;</a> article.</p>
<h3>Dealing with Content that is Changing</h3>
<p>Even if now all stateless users can see the same cached content, this content is changing over time. In an online shop, for example, some products might run out of stock and become unavailable or need to be replaced by other products. Unfortunately, this does not only affect the product pages themselves but sometimes also pages that reference them; e.g. links and thumbnail images will have to be changed or removed. Similar situations often occur in online publishing and in nearly all websites which change over time.</p>
<p>Thus, another requirement for the cache is its ability to <strong>partially expire content</strong>. And of course, the bookkeeping must be performed externally so that the affected pages can be removed individually.</p>
<p>For the cache to work properly and perform <strong>automatic expiration</strong> of content, it needs to know how long the currently cached content should be kept (i.e. its maximal age). The web application therefore has to generate this so-called time-to-live (TTL) information.</p>
<p>The HTTP specification defined HTTP response header fields such as <a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9"><code>Cache-Control</code></a> for exactly this purpose a long time ago. These are set by the web application itself, since it knows best how long the content will be considered &#8220;current&#8221;/&#8220;valid&#8221;. This setting could even be dynamic, e.g. giving a shorter time-to-live to a product page if stock is low. The <code>Cache-Control</code> directive most suitable for this purpose is <code>s-maxage</code>, as it specifies the maximum time in seconds that the response may be kept in a shared cache such as the web cache.</p>
<h3>Determining Cacheable Candidates</h3>
<p>Not all content can or even should be cached. Caching completely static websites is easier by far. However, such sites tend to be rather unattractive, and their pages could just as well be pre-generated and moved to the web server. As the cache will sit in front of the web server, all requests will go to the cache first. It does not make much sense to store pages in the cache which are kept statically in the web server&#8217;s file system anyway.</p>
<p>On the other hand, only GET URLs can be candidates for caching. As a POST request transmits information from the browser to the server, it cannot be cached and must always be handled by the application server. This might sound like a big constraint at first but is actually a feature that can be nicely utilized: all URLs which are candidates for performing the state transition of a user from stateless to stateful will be POST requests. And consequently, the application itself can decide whether the POST requests actually qualify for making a user stateful or whether s/he can remain stateless, for example when a wrong login/password combination is entered.</p>
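<p>Expressed in Varnish&#8217;s configuration language (VCL, introduced in the sections below), this rule could look roughly like the following minimal sketch. It assumes Varnish 3.x syntax and is not taken from our production configuration; Varnish&#8217;s built-in default logic already behaves in a similar way.</p>
<pre class="brush: plain; wrap-lines: false;">
sub vcl_recv {
  # Only GET (and HEAD) requests are candidates for caching.
  if (req.request != &quot;GET&quot; &amp;&amp; req.request != &quot;HEAD&quot;) {
    # POST and all other methods always go to the backend.
    return (pass);
  }
}
</pre>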
<h2>Anatomy of Varnish&#8217;s Request Processing</h2>
<p>Varnish distinguishes three stages when processing a request:</p>
<ul>
<li>The request is received from the browser (<code>vcl_recv</code>).<br />
At this stage, Varnish calls the subroutine <code>vcl_recv</code> in the configuration file (VCL). Here, the request header can be manipulated e.g. by removing cookies. It can be decided whether the content should be looked up in the cache or be propagated to the backend server.</li>
<li>The response is received from the backend (<code>vcl_fetch</code>).<br />
This function is only executed when the content is not delivered from the cache. In this phase, response headers from the backend can be modified (either for delivery or for saving in the cache). The request attributes are also still available and can be used for manipulating several settings.</li>
<li>The response is sent to the browser (<code>vcl_deliver</code>).<br />
This stage is passed by all requests and can be used to add headers (like TTL), change cookies etc. The request parameters are available for reading.</li>
</ul>
<div id="attachment_1063" class="wp-caption alignnone" style="width: 490px"><a rel="lightbox-varnish" href="http://blog.mgm-tp.com/wp-content/uploads/2012/01/Varnish-Request-Processing.png"><img class="size-large wp-image-1063" title="Varnish Request Processing" src="http://blog.mgm-tp.com/wp-content/uploads/2012/01/Varnish-Request-Processing-480x310.png" alt="" width="480" height="310" /></a><p class="wp-caption-text">Different stages of Varnish&#39;s request processing. Everything related to the cache is in red, i.e. all cacheable content is looked up in the cache and possibly delivered; if it&#39;s not in the cache, the web server will be asked via vcl_fetch.</p></div>
<p>Varnish defines additional subroutines which also hook into the Varnish workflow, but they are not as important. See also the <a href="https://www.varnish-cache.org/docs/trunk/tutorial/vcl.html">VCL tutorial</a> and the <a href="https://www.varnish-cache.org/docs/trunk/reference/vcl.html">VCL reference</a>.</p>
<h3 id="vcl">A Sample Varnish Configuration (VCL)</h3>
<p>This section contains a simple Varnish configuration that provides caching as required. The challenge is to keep the user stateless as long as possible. In order to achieve this, a simple trick is used: if a request does not contain a <code>JSESSIONID</code> cookie, it is a stateless request and even if the (uneducated) backend wants to set a cookie, it will be removed. Only responses to POST requests may set the necessary cookies. Manipulating the TTL complements the configuration. A lot of logging is used in the example; this is not just for illustrative purposes but also practical for debugging and optimizing the configuration.</p>
<pre class="brush: plain; wrap-lines: false;">
import std;

backend default {
    .host = &quot;localhost&quot;;  # Varnish is running on same server as Apache
    .port = &quot;80&quot;;
}

sub vcl_recv {
  # remove unnecessary cookies
  if (req.http.cookie ~ &quot;JSESSIONID&quot;) {
    std.log(&quot;found jsessionid in request, passing to backend server&quot;);
    return (pass);
  } else {
    unset req.http.cookie;
  }
}

sub vcl_fetch {
  if (req.http.cookie ~ &quot;JSESSIONID&quot; || req.request == &quot;POST&quot;) {
    std.log(&quot;not removing cookie/passing POST, url &quot; + req.url);
    return (hit_for_pass);  # Varnish 3.x: vcl_fetch uses hit_for_pass instead of pass
  } else {
    # remove all other cookies and prevent backend from setting any
    std.log(&quot;removing cookie in url &quot; + req.url);
    unset beresp.http.set-cookie;
    set beresp.ttl = 600s;
  }
}

sub vcl_deliver {
  # send some handy statistics back, useful for checking cache
  if (obj.hits &gt; 0) {
    set resp.http.X-Cache-Action = &quot;HIT&quot;;
    set resp.http.X-Cache-Hits = obj.hits;
  } else {
    set resp.http.X-Cache-Action = &quot;MISS&quot;;
  }
}
</pre>
<p>Notice the C-like syntax in the Varnish configuration. This is no accident; in fact, the whole configuration code is compiled to a binary shared object at startup and when reloading the script to optimize for performance. As the subroutines in this configuration are called for each request, this helps immensely in creating a fast cache server. Moreover, it is possible to add C code directly to the configuration.</p>
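<p>As an illustration of embedded C code, the following sketch writes a message to syslog from within <code>vcl_recv</code>. It assumes Varnish 3.x, where inline C is enclosed in <code>C{ ... }C</code>; the log message itself is only a placeholder.</p>
<pre class="brush: plain; wrap-lines: false;">
C{
  #include &lt;syslog.h&gt;
}C

sub vcl_recv {
  C{
    /* plain C, compiled together with the generated configuration code */
    syslog(LOG_INFO, &quot;request received in vcl_recv&quot;);
  }C
}
</pre>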
<p>It might seem strange at first to define the configuration in a procedural language, but it proved to be extremely valuable as it enables us to be flexible and to formulate how exactly to handle the requests. Overall, this leads to a much more readable configuration than a declarative approach.</p>
<p>Notice the different &#8220;top level&#8221; objects in the configuration file:</p>
<ul>
<li><code>req</code> is the request (i.e. the URL including all headers) coming from the browser.</li>
<li><code>resp</code> is the response before it is sent to the client, i.e. when it can still be manipulated.</li>
<li><code>beresp</code> is the response which Varnish gets from the backend (if the object is not cacheable or not cached); it can be evaluated and modified.</li>
</ul>
<p>On a side note, Varnish can use ACLs to restrict the access to certain resources. The same ACLs can also be used to (declaratively) tell Varnish what to cache and what not. Another technique that is sometimes used is <a href="https://www.varnish-cache.org/docs/trunk/tutorial/purging.html#bans">&#8220;banning&#8221;</a>, which invalidates all cached objects that match a given expression. Varnish can also (atomically) delete certain elements from the cache. This is accomplished via a <a href="https://www.varnish-cache.org/docs/trunk/tutorial/purging.html#http-purges">&#8220;purge&#8221; command</a> through the HTTP interface, and access to it should be restricted to trusted IP addresses (which is the standard configuration together with a secret).</p>
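<p>A purge restricted by an ACL could be sketched as follows. This follows the pattern shown in the Varnish 3.x documentation; the ACL name <code>purgers</code> and the listed addresses are assumptions for illustration and have to be adapted to the actual network.</p>
<pre class="brush: plain; wrap-lines: false;">
# hosts that are allowed to purge (assumed addresses)
acl purgers {
  &quot;localhost&quot;;
  &quot;192.168.0.0&quot;/24;
}

sub vcl_recv {
  if (req.request == &quot;PURGE&quot;) {
    if (!client.ip ~ purgers) {
      error 405 &quot;Not allowed.&quot;;
    }
    # look the object up so that vcl_hit or vcl_miss can remove it
    return (lookup);
  }
}

sub vcl_hit {
  if (req.request == &quot;PURGE&quot;) {
    purge;
    error 200 &quot;Purged.&quot;;
  }
}

sub vcl_miss {
  if (req.request == &quot;PURGE&quot;) {
    purge;
    error 404 &quot;Not in cache.&quot;;
  }
}
</pre>
<p>With such a configuration, the external bookkeeping only has to send an HTTP request with the method <code>PURGE</code> for each URL that should be removed.</p>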
<h2>Configuration Details and Tips</h2>
<p>Now that we have seen the basic VCL file and understood how a request is usually processed, let&#8217;s dive in even further and discuss the details and lessons learned.</p>
<h3>Improving the Hit rate with Header Normalization</h3>
<p>Varnish has to be told which HTTP request header fields it should use as a cache index. The index is organized as a hash, thus these selected header fields are often referred to as the <em>hash key</em>.</p>
<p>On a side note, you can select the header fields to be used as a hash by implementing the subroutine <code>vcl_hash</code>. If you don&#8217;t implement it, Varnish uses the full URL plus the <code>Host</code> request header field by default. In addition to the hash key computed in <code>vcl_hash</code>, the request header fields named in the response&#8217;s <a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html"><code>Vary</code></a> header field are always taken into account automatically when an object is looked up. For further information on the hash key, see <a href="https://www.varnish-cache.org/docs/trunk/phk/varnish_does_not_hash.html">&#8220;What Varnish Does&#8221;</a> and <a href="http://stackoverflow.com/questions/6098914/varnish-and-http-header">&#8220;Varnish and http header&#8221; on Stackoverflow</a>.</p>
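<p>For reference, an explicit <code>vcl_hash</code> that mirrors this default behavior could look like the sketch below (Varnish 3.x syntax). Further <code>hash_data()</code> calls could be added for other normalized header fields or cookie values that should distinguish cached copies.</p>
<pre class="brush: plain; wrap-lines: false;">
sub vcl_hash {
  # the URL is always part of the hash key
  hash_data(req.url);
  if (req.http.host) {
    hash_data(req.http.host);
  } else {
    # no Host header: fall back to the IP address the request arrived on
    hash_data(server.ip);
  }
  return (hash);
}
</pre>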
<p>To improve the cache hit rate, it is crucial that you clean up the request header fields used for the hash key. Cleaning up means to change them to a common denominator (so-called <em>header normalization</em>). A very good candidate is of course the <a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.23"><code>Host</code> header field</a>, where a normalized version (like &#8220;www.sitename.com&#8221;) should be used even if &#8220;sitename.com&#8221; is sent in the request header. In addition to that, removing unnecessary headers is always a good idea.</p>
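<p>A minimal sketch of such a <code>Host</code> normalization, using the placeholder domain from the paragraph above (Varnish 3.x syntax):</p>
<pre class="brush: plain; wrap-lines: false;">
sub vcl_recv {
  # treat sitename.com and www.sitename.com as the same site
  if (req.http.host ~ &quot;(?i)^sitename\.com$&quot;) {
    set req.http.host = &quot;www.sitename.com&quot;;
  }
}
</pre>
<p>Since the <code>Host</code> field is part of the default hash key, both spellings then share one cached copy instead of two.</p>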
<p>Be careful that the application server does not send a <a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.44"><code>Vary</code> header field</a> for the user agent as this effectively means that there has to be a distinct copy for each user agent. There are so many different user agent strings (see <a href="http://panopticlick.eff.org/">http://panopticlick.eff.org/</a>) that this will basically make caching useless. See also <a href="http://mark.koli.ch/2010/09/understanding-the-http-vary-header-and-caching-proxies-squid-etc.html">&#8220;Understanding the HTTP Vary Header and Caching Proxies (Squid, etc.)&#8221;</a> and the <a href="https://www.varnish-cache.org/docs/trunk/tutorial/vary.html#tutorial-vary">Varnish Documentation on Vary</a>.</p>
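<p>If the application server cannot be changed, one possible workaround is to strip <code>User-Agent</code> from the <code>Vary</code> response header in Varnish before the object is stored. The following is only a sketch under the assumption that the generated pages really do not depend on the user agent (Varnish 3.x syntax):</p>
<pre class="brush: plain; wrap-lines: false;">
sub vcl_fetch {
  if (beresp.http.Vary ~ &quot;User-Agent&quot;) {
    # remove the User-Agent entry and a possibly leading comma
    set beresp.http.Vary = regsub(beresp.http.Vary, &quot;,? *User-Agent *&quot;, &quot;&quot;);
    set beresp.http.Vary = regsub(beresp.http.Vary, &quot;^, *&quot;, &quot;&quot;);
    if (beresp.http.Vary == &quot;&quot;) {
      # nothing left to vary on
      unset beresp.http.Vary;
    }
  }
}
</pre>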
<h3>Compression</h3>
<p>The <a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.3"><code>Accept-Encoding</code></a> request header field plays an important role: it can have different values like &#8220;identity&#8221; (no encoding), &#8220;gzip&#8221; or &#8220;deflate&#8221;. Unfortunately, Internet Explorer prefers the deflate encoding while all other browsers favor gzip. Without intervention, this leads to different copies of the same content in the cache, one in deflate format, the other in gzip format.</p>
<p>Since the request header can be modified on the fly in the <code>vcl_recv</code> subroutine, we can effectively control that only one variant of the content is cached. In your VCL you can modify the request header field and use gzip exclusively if it is available (which is true for both Internet Explorer and others). This technique is presented in detail in the article <a href="https://www.varnish-cache.org/trac/wiki/VCLExampleNormalizeAcceptEncoding">&#8220;Normalize Accept-Encoding header&#8221;</a>. Since both browser families have a market share of roughly 50%, this simple change roughly halves the number of copies kept per page and improves the hit rate accordingly.</p>
<p>Please note that beginning with Varnish 3.0, Varnish <a href="https://www.varnish-cache.org/docs/trunk/tutorial/compression.html">supports gzip natively</a> and can modify the <code>Accept-Encoding</code> field by itself, so the measures discussed in the previous paragraph can be skipped.</p>
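<p>For Varnish versions before 3.0, the normalization described above can be sketched as follows. The example is modeled on the wiki article mentioned above; the list of file extensions is an assumption and should be adapted to the site.</p>
<pre class="brush: plain; wrap-lines: false;">
sub vcl_recv {
  if (req.http.Accept-Encoding) {
    if (req.url ~ &quot;\.(jpg|png|gif|gz|tgz|bz2|tbz|mp3|ogg)$&quot;) {
      # already compressed formats: no point in compressing again
      unset req.http.Accept-Encoding;
    } elsif (req.http.Accept-Encoding ~ &quot;gzip&quot;) {
      # prefer gzip whenever the browser supports it
      set req.http.Accept-Encoding = &quot;gzip&quot;;
    } elsif (req.http.Accept-Encoding ~ &quot;deflate&quot;) {
      set req.http.Accept-Encoding = &quot;deflate&quot;;
    } else {
      # unknown algorithm: request the uncompressed version
      unset req.http.Accept-Encoding;
    }
  }
}
</pre>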
<h3>Handling Cookies</h3>
<p>Cookies basically fall into different categories:</p>
<ul>
<li>Cookies <em>relevant</em> for caching: These should be kept and their values can be used as part of the hash key for the cache index.</li>
<li>Cookies <em>irrelevant</em> for caching: These should be discarded and not considered by the cache.</li>
<li>Cookies <em>partially relevant</em> for caching: These should be modified and the irrelevant parts should be removed. The remaining cookie should then be used as part of the hash key for the cache index.</li>
<li>Session cookies: These cookies must be treated differently as they basically make caching impossible. If such cookies are detected, Varnish should not cache anything but work as a proxy only sending data from the backend server directly to the client.</li>
</ul>
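<p>Two of these categories can be sketched in VCL as follows (Varnish 3.x syntax). The example assumes that the session cookie is called <code>JSESSIONID</code>, as in the sample configuration, and that cookies starting with <code>__utm</code> (as set by Google Analytics) are irrelevant for the backend; both are illustrations rather than a complete cookie policy. Cookies that are relevant for caching would additionally have to be added to the hash key in <code>vcl_hash</code>.</p>
<pre class="brush: plain; wrap-lines: false;">
sub vcl_recv {
  # session cookie: do not cache, act as a plain proxy
  if (req.http.Cookie ~ &quot;JSESSIONID&quot;) {
    return (pass);
  }
  if (req.http.Cookie) {
    # irrelevant cookies: strip them from the request
    set req.http.Cookie = regsuball(req.http.Cookie, &quot;(^|; ) *__utm.=[^;]+;? *&quot;, &quot;\1&quot;);
    if (req.http.Cookie == &quot;&quot;) {
      # nothing left, so the request can be treated as cookie-less
      unset req.http.Cookie;
    }
  }
}
</pre>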
<h3>Consistent Values for TTL and the Expires Field</h3>
<p>Varnish has to decide whether and how long to keep elements in the cache. As we have already learned, the <a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9"><code>Cache-Control</code></a> header field is utilized here. More specifically, the <code>s-maxage</code> directive (or <code>max-age</code> as a fallback if <code>s-maxage</code> is not present) is examined to determine the specified maximum lifetime of a cacheable object. Of course, this only works as long as the cache is not full; once it is full, objects are evicted according to the <a href="http://en.wikipedia.org/wiki/Cache_algorithms#Least_Recently_Used">LRU algorithm</a>.</p>
<p>If the web application was not designed with a web cache in mind, it might have conflicting values in <code>s-maxage</code> and the <a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.21"><code>Expires</code></a> response header field. (See the HTTP specification for a <a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9.3">discussion of the Expires field versus the max-age directive</a>.) This might lead to the bizarre situation that the cached content is sent by Varnish with an <code>Expires</code> header field value that lies in the past if <code>s-maxage</code> has a larger value than <code>Expires</code>.</p>
<p>This weird behavior can be fixed in several ways, e.g. by statically setting the <code>Expires</code> header in Varnish for each request to <code>s-maxage</code> seconds into the future during <code>vcl_fetch</code>. This will increase the cache efficiency <em>on the browser side</em> and lead to a more responsive website.</p>
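<p>A simpler variant, shown here only as a sketch and not as the fix we deployed, is to drop the conflicting <code>Expires</code> header whenever the backend has sent an <code>s-maxage</code> directive, so that no date in the past can be delivered. Browsers then rely on the remaining <code>Cache-Control</code> directives (Varnish 3.x syntax):</p>
<pre class="brush: plain; wrap-lines: false;">
sub vcl_fetch {
  if (beresp.http.Cache-Control ~ &quot;s-maxage&quot; &amp;&amp; beresp.http.Expires) {
    # avoid delivering cached objects with an outdated Expires value
    unset beresp.http.Expires;
  }
}
</pre>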
<h3>File Descriptors</h3>
<p>In our first tests the solution performed well, but not excellently. Even more critical were the many dropped connections, i.e. requests from browsers that did not even reach Varnish.</p>
<p>The reason and the fix were easy &ndash; the number of file descriptors had to be increased. This is even more important in real-life situations where connections tend to be slow, as each TCP connection consumes one file descriptor. It does not hurt to allow 32768 descriptors for Varnish.</p>
<h3>Monitoring</h3>
<p>If you have set up a web cache solution with Varnish, it is important to measure its performance and especially monitor the hit rate of the cache. This turns out to be a bit complicated since the Varnish log files are not written to disk for performance reasons; instead, Varnish logs to a circular buffer residing in a shared memory segment. The circular buffer can be read at any time but past values will vanish forever. Since we wanted a monitoring solution that would also allow us to perform a post-mortem analysis in case of a problem, we configured the logging to write the <a href="https://www.varnish-cache.org/docs/trunk/reference/varnishd.html#storage-types">circular buffer to a persistent file</a>.</p>
<p>The most relevant tools for monitoring Varnish are:</p>
<ul>
<li><a href="https://www.varnish-cache.org/docs/trunk/reference/varnishlog.html">varnishlog</a>: This shows current requests from the logging ring buffer. Usually request phases will be shown in chronological order which mixes up the requests themselves. This can be fixed by using appropriate options though.</li>
<li><a href="https://www.varnish-cache.org/docs/trunk/reference/varnishtop.html">varnishtop</a>: This shows a continuously updated ranking of the most frequently occurring log entries (e.g. the most requested URLs) and can be used to spot hot spots and optimize the configuration accordingly.</li>
<li><a href="https://www.varnish-cache.org/docs/trunk/reference/varnishhist.html">varnishhist</a>: This is easily the most intuitive and graphical tool for analyzing Varnish. It shows a (text) histogram of the response time distribution and thus gives a good overview of how the whole system is performing.</li>
<li><a href="https://www.varnish-cache.org/docs/trunk/reference/varnishstat.html">varnishstat</a>: This shows important statistical information about hit rates, total cache hits, accepted connections from clients etc.</li>
</ul>
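<p>To track the hit rate outside of the interactive tools, the counters can be evaluated by a small script. The sketch below assumes the one-shot output of <code>varnishstat -1</code> with one counter per line (name, value, rate, description) and assumes that the relevant counters are called <code>cache_hit</code> and <code>cache_miss</code>, possibly prefixed with <code>MAIN.</code> in newer releases. Layout and names depend on the Varnish version, so treat both as assumptions to be checked against your installation; the sample numbers are invented.</p>
<pre><code class="language-python">def parse_counters(varnishstat_output):
    """Parse 'name value ...' lines into a dict of integer counters."""
    counters = {}
    for line in varnishstat_output.splitlines():
        fields = line.split()
        # Skip blank lines and anything whose second column is not a number.
        if len(fields) >= 2 and fields[1].isdigit():
            counters[fields[0]] = int(fields[1])
    return counters


def hit_rate(counters):
    """Fraction of cache lookups answered from the cache, or None if idle."""
    # Assumed counter names; try the prefixed variant first.
    hits = counters.get("MAIN.cache_hit", counters.get("cache_hit", 0))
    misses = counters.get("MAIN.cache_miss", counters.get("cache_miss", 0))
    lookups = hits + misses
    return hits / lookups if lookups else None


# Invented sample in the assumed layout.
sample = """\
client_conn      120000   40.00 Client connections accepted
cache_hit         90000   30.00 Cache hits
cache_miss        10000    3.33 Cache misses
"""

print(hit_rate(parse_counters(sample)))  # prints 0.9
</code></pre>
<p>Run periodically and written to a file, such a value gives a simple history of the hit rate that survives the ring buffer.</p>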
<h2>Why Varnish is the best Caching Solution (for us)</h2>
<p>When we began to investigate ways to speed up the <a href="http://www.lidl.de">www.lidl.de</a> site, our first choice was to add the <a href="http://httpd.apache.org/docs/2.1/mod/mod_cache.html">Apache mod_cache</a> caching module to the Apache web server already in use. The first hurdle was the declarative configuration; it is well-suited for a web server but not ideal for modeling caching behavior. After some fiddling around, it was working smoothly. But more serious problems arose from the fact that certain cookies had to be considered and others had to be ignored. It was impossible to find a viable solution within Apache, so the cookie was filtered out by the load balancer. Cache invalidation is performed lazily in Apache, i.e. an outdated resource is removed from the cache only when it is requested again. Consequently, outdated resources which are not requested stay in the cache forever and can only be expired externally. As all cached components are stored in individual files, this expiry is slow and the whole process complicated. For our situation, Apache was not a good solution (although it was in use for quite some time), and hit rates were also rather disappointing.</p>
<p>So our search continued. Via dedicated proxy servers like <a href="http://www.squid-cache.org/">Squid</a>, which are more suitable for large client-side installations, we finally arrived at Varnish, an HTTP accelerator built specifically for caching on the server side. Varnish is already used by many big websites such as <a href="http://www.facebook.com/">Facebook</a>, <a href="http://search.twitter.com">Twitter (Search)</a> and <a href="http://www.hulu.com/">Hulu</a>.</p>
<p>Varnish is very flexible as it offers procedural configuration of all request stages in a C-like language (which is actually translated to C and compiled at start time to be as efficient as possible). This enables creative cookie handling and all kinds of other tricks which are usually needed in such a scenario. Varnish was specifically designed to run on servers with a virtual memory (VM) subsystem, so all cached objects live in a single memory-mapped file and can be accessed extremely fast. Varnish handles expiry automatically and correctly and is also much faster than Apache. So the decision was made to go with Varnish.</p>
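<p>The kind of cookie handling meant here can be sketched in a few lines. The following is Python, not Varnish's configuration language, and only illustrates the decision logic: tracking cookies are ignored, and only a session cookie prevents a request from being served from the cache. The cookie name <code>JSESSIONID</code> and both function names are assumptions chosen for illustration.</p>
<pre><code class="language-python">def filter_cookies(cookie_header, keep):
    """Return the Cookie header reduced to the cookie names listed in keep."""
    kept = []
    for part in cookie_header.split(";"):
        name, sep, value = part.strip().partition("=")
        if sep and name in keep:
            kept.append(name + "=" + value)
    return "; ".join(kept)


def is_cacheable(method, cookie_header, session_cookies=("JSESSIONID",)):
    """Serve from cache only for GET requests that carry no session cookie."""
    relevant = filter_cookies(cookie_header, set(session_cookies))
    return method == "GET" and relevant == ""


print(is_cacheable("GET", "__utma=1.2.3; lang=de"))         # prints True
print(is_cacheable("GET", "__utma=1.2.3; JSESSIONID=abc"))  # prints False
print(is_cacheable("POST", ""))                             # prints False
</code></pre>
<p>In a real setup the same filtering would also be applied before the request is passed on, so that irrelevant cookies do not fragment the cache.</p>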
<p>Other <em>HTTP accelerators</em> were also considered but proved unsuitable. One of them was <a href="http://www.oracle.com/technetwork/middleware/ias/index-089317.html">Oracle Web Cache</a>, a commercial software package from Oracle; the problem here is that the cache cannot grow easily and that the manipulation of requests and responses is limited. An example of a hardware-based solution is <a href="http://www.f5.com/solutions/acceleration/web-acceleration/">F5&#8217;s BIG-IP WebAccelerator</a>.</p>
<h2>Further Optimizations</h2>
<p>Below is a discussion of measures that build on a Varnish setup and would speed up page delivery even further.</p>
<h3>Using a CDN to increase Scale, Reach &amp; Performance</h3>
<p>CDNs take care of delivering the static content while the dynamic content is served via the usual stack. They work in an inherently distributed way and have clever algorithms to select the topologically nearest server for each user. Static and dynamic content can be separated by using virtual webservers with different hostnames. The PDF article <a href="http://www.akamai.com/dl/technical_publications/GloballyDistributedContentDelivery.pdf">&#8220;Globally distributed content delivery&#8221;</a> from Akamai provides an excellent introduction.</p>
<p>Most CDNs offer an API for invalidating all or part of the content, and they respect the Expires header field sent by the origin servers. The Varnish server can therefore work as a central content repository and act as the upstream server from which the CDN refreshes its content.</p>
<p>Almost all traffic would then be served by the CDN. This saves a lot of bandwidth on the Varnish server, and the Gigabit interface is not overloaded so easily. Moreover, as traffic costs in the CDN are negligible, money can be saved because the hosting company does not have to upgrade its own upstream link. For more information on how to build a CDN see <a href="http://blog.unixy.net/2010/07/how-to-build-your-own-cdn-using-bind-geoip-nginx-and-varnish/">&#8220;How to build your own CDN using BIND, GeoIP, Nginx, and Varnish&#8221;</a>.</p>
<h3>ESI: Caching Page Fragments with diverse TTL</h3>
<p>From a technical point of view, only pages which are requested by the GET method can be cached at all. This is due to the fact that &ndash; by definition &ndash; POST requests change state on the server and therefore necessarily have to reach the application server.</p>
<p>However, the solution described above performs less &#8220;aggressive&#8221; caching since it simply stops caching as soon as a session cookie is present. The effect is that stateful users never get cached pages and therefore might have to wait longer for the page to render completely. On the other hand, it does not make sense to cache whole pages for individual users since it is quite unlikely that the same user will come back to exactly the same page. Even if the user did come back, it would not be safe to assume that the page is still up-to-date (e.g. since the shopping cart might have changed in the meantime).</p>
<p>To speed things up again, a compromise needs to be found between caching invariant fragments of a page and producing personalized content on the fly for stateful users. Fortunately, Varnish offers the right tools to perform exactly this decomposition by leveraging <a href="http://www.w3.org/TR/esi-lang">Edge Side Includes (ESI)</a>.</p>
<p>When <a href="https://www.varnish-cache.org/docs/trunk/tutorial/esi.html">Varnish processes ESI tags</a>, the page assembly (out of fragments) is done by Varnish. As these fragments are separate web resources (requested through GET or POST) they can be assigned their own cache settings and handling information. For example, a cache time-to-live (TTL) of several days could be appropriate for the template, but a fragment containing a frequently-changing story or ad may require a much lower TTL. Some fragments may have to be marked uncacheable.</p>
<p>The decomposition of the page must be analyzed carefully, as getting it right is essential to achieve a high hit rate and a low overhead. In the case of an online shop, the page could e.g. consist of the following (graphical) fragments:</p>
<div id="attachment_1071" class="wp-caption alignnone" style="width: 490px"><a rel="lightbox-varnish" href="http://blog.mgm-tp.com/wp-content/uploads/2012/01/ESI-Page-Template-and-Fragements.png"><img class="size-full wp-image-1071" title="ESI Page Template and Fragements" src="http://blog.mgm-tp.com/wp-content/uploads/2012/01/ESI-Page-Template-and-Fragements.png" alt="" width="480" height="365" /></a><p class="wp-caption-text">Decomposition of a typical page into user-specific, dynamic (red) and static (blue) fragments.</p></div>
<p>The shopping cart and the login details would then be transferred directly from the application server via an appropriate ESI fragment, whereas the rest of the page is identical for all users and can be stored in the cache. To minimize the number of requests from Varnish to the application server, both fragments can be transferred in a single response and placed at different locations on the page on the client side or via CSS.</p>
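<p>The effect of per-fragment TTLs can be illustrated with a small model. The sketch below is Python and does not parse ESI markup; a page is simply given as an ordered list of fragment URLs with their TTLs, where a TTL of 0 stands for an uncacheable fragment. All URLs, TTL values and names are assumptions made up for illustration, and Varnish's real implementation works differently in detail.</p>
<pre><code class="language-python">import time


class FragmentCache:
    """Caches fragment bodies by URL, each with its own time-to-live."""

    def __init__(self, fetch, clock=time.monotonic):
        self._fetch = fetch      # callable url -> body (the backend request)
        self._clock = clock      # injectable so that expiry can be tested
        self._entries = {}       # url -> (expires_at, body)

    def get(self, url, ttl):
        """Return the body for url; a ttl of 0 marks it as uncacheable."""
        now = self._clock()
        entry = self._entries.get(url)
        if entry is not None and entry[0] > now:
            return entry[1]
        body = self._fetch(url)
        if ttl > 0:
            self._entries[url] = (now + ttl, body)
        return body


def assemble(cache, fragments):
    """Concatenate fragments given as (url, ttl_seconds) pairs in page order."""
    return "".join(cache.get(url, ttl) for url, ttl in fragments)


backend_calls = []


def backend(url):
    """Stand-in for the application server; records every request."""
    backend_calls.append(url)
    return "[" + url + "]"


page = [("/template", 86400), ("/cart", 0), ("/teaser", 60)]
cache = FragmentCache(backend)
assemble(cache, page)
assemble(cache, page)
print(backend_calls)  # prints ['/template', '/cart', '/teaser', '/cart']
</code></pre>
<p>On the second page view only the user-specific cart fragment reaches the backend, which is the reason for the gain in stateful performance.</p>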
<p>Compared to the performance numbers <a href="#statefullperformance">above</a>, the stateful performance is much higher when using ESI. Rates of about 500 stateful requests per second are now easily possible.</p>
<h3>Memcached: Caching Session-specific Page-Fragments</h3>
<p>If you examine the page diagram above, you might notice that even though the shopping cart and login details are user-specific elements on the page, they are not very dynamic, i.e. they change infrequently.</p>
<p>This leads to an opportunity for further optimization: the user-specific fragments can also be stored, but must of course be associated with the session of the corresponding user. Since this information does not need to be persistent (it becomes invalid as soon as the session is invalidated), it can be stored in memory. <a href="http://memcached.org/">Memcached</a> is made for exactly this scenario and is therefore a perfect fit, see e.g. the article <a href="http://blog.preinheimer.com/index.php?/archives/334-Storing-Sessions-in-Memcache-how-everything-behaves.html">&#8220;Storing Sessions in Memcache&#8221;</a>.</p>
<p>Any change in the shopping cart or login details will trigger a regeneration of the HTML fragments, which will then be stored in Memcached. (This can be done in the same POST request by the application server.) Varnish will include the fragment from Memcached (either via direct integration, via Apache or via Nginx). A SessionListener within Tomcat can take care of removing stale sessions from Memcached.</p>
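<p>The life cycle of such a fragment can be sketched as follows. The example is Python and uses a plain dictionary as a stand-in for Memcached so that it runs without a server; the key scheme, the class and function names and the trivial renderer are all assumptions for illustration, not the code used in the project.</p>
<pre><code class="language-python">class SessionFragmentStore:
    """In-memory stand-in for Memcached, keyed by session id and fragment."""

    def __init__(self):
        self._data = {}

    def put(self, session_id, fragment, html):
        self._data[(session_id, fragment)] = html

    def get(self, session_id, fragment):
        """Return the stored fragment, or None if it has to be generated."""
        return self._data.get((session_id, fragment))

    def drop_session(self, session_id):
        """Remove all fragments of an ended session (the listener's job)."""
        stale = [key for key in self._data if key[0] == session_id]
        for key in stale:
            del self._data[key]


def render_cart(items):
    """Hypothetical renderer; the real application would produce HTML."""
    return "Cart: " + str(len(items)) + " item(s)"


def handle_cart_post(store, session_id, items):
    """On a cart change, regenerate the fragment and store it right away."""
    store.put(session_id, "cart", render_cart(items))


store = SessionFragmentStore()
handle_cart_post(store, "s1", ["milk", "bread"])
print(store.get("s1", "cart"))  # prints Cart: 2 item(s)
store.drop_session("s1")
print(store.get("s1", "cart"))  # prints None
</code></pre>
<p>Subsequent GET requests for the fragment are then answered from the store and never reach the application server until the next POST changes the cart.</p>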
<p>Memcached is extremely fast. Even for stateful users this leads to a performance of well above 5,000 GET requests/s. POST requests are a different story as they still have to be handled by the application server. As they perform only internal tasks and write both to the database and Memcached, a rate of 500 requests/s is nonetheless realistic.</p>
	<p><em><small>(c) 2012 <a href="http://www.mgm-tp.com">mgm technology partners</a>. This posting "<a href="http://blog.mgm-tp.com/2012/01/varnish-web-cache/">Ultra-Performant Dynamic Websites with Varnish</a>" is part of the <a href="http://blog.mgm-tp.com">mgm technology blog</a>. The author of the posting is
	<a href="http://blog.mgm-tp.com/?author=20" title="View articles by Dr. Christian Winkler">Dr. Christian Winkler</a>.
	</small></em></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.mgm-tp.com/2012/01/varnish-web-cache/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Practical Customer Participation in JIRA Workflows - JIRA beyond Bug Tracking, Part 2</title>
		<link>http://blog.mgm-tp.com/2011/12/jira-beyond-bug-tracking-part2/</link>
		<comments>http://blog.mgm-tp.com/2011/12/jira-beyond-bug-tracking-part2/#comments</comments>
		<pubDate>Thu, 08 Dec 2011 15:43:19 +0000</pubDate>
		<dc:creator>Alexander Weiss</dc:creator>
				<category><![CDATA[Projects]]></category>
		<category><![CDATA[Tips]]></category>
		<category><![CDATA[Agile]]></category>
		<category><![CDATA[JIRA]]></category>
		<category><![CDATA[Project Management]]></category>
		<category><![CDATA[QA]]></category>

		<guid isPermaLink="false">http://blog.mgm-tp.com/?p=1046</guid>
		<description><![CDATA[This second part of our blog series continues with the topic of direct involvement of customers and gives some practical examples of when, where and how to introduce and include your customer into JIRA. We will then discuss two of mgm&#8217;s proven real-world workflows and use them as case studies about appropriate modes for successful [...]]]></description>
			<content:encoded><![CDATA[<p>This second part of our blog series continues with the topic of direct involvement of customers and gives some practical examples of when, where and how to introduce and include your customer into <a href="http://www.atlassian.com/software/jira/overview">JIRA</a>. We will then discuss two of mgm&#8217;s proven real-world workflows and use them as case studies about appropriate modes for successful customer participation. You will also learn about our recommended ways of keeping the complexity of huge JIRA projects from the customer.</p>
<p><span id="more-1046"></span></p>
<p>Let&#8217;s begin with how JIRA can be utilized during the initial project phases. The main steps here are to prepare a more detailed business modeling and to complete the technical and business concepts. These steps are tightly connected with the compilation of the requirements and the requirement management phase. The requirement analysts of the project and the responsible project manager will interview all the necessary stakeholders to get a complete picture of the required solution that the business modelers and architects need for their work.</p>
<h2>Capturing Requirements as JIRA Tickets</h2>
<p>During this requirement management phase, all the collected functional and non-functional requirements are already stored as JIRA tickets in order to control their content and impact and to prioritize them with respect to the solution and its implementation order (planning process). This is exactly the phase that can be used to create a first contact point with JIRA for customers: involvement in the compilation of new requirements and in the detailing of items that have already been filed.</p>
<p>But as the customer is not yet very familiar with JIRA in this very early stage, we typically choose to create all the new requirements tickets ourselves instead of the customer. This is not just to unburden the customer: we also want to avoid the additional work of correcting imprecisely formulated requirements. </p>
<p>The descriptions of requirement tickets should always be <em>unambiguous and complete</em>. Thus, the responsibility to verbalize requirements usually remains with us. But the customer can be involved at any time to contribute details and he can (and should) be an active part during the elaboration phase and deliver his input and expertise through comments to the respective tickets. </p>
<p>Another very important point in requirement management is the <em>terminology</em> used. It is essential to always speak (and write) the customer&#8217;s domain-specific language. Use only terms that can be understood by the customer! We find it very helpful to maintain a glossary of all domain-specific words and terms together with the customer.</p>
<h2>Customer Involvement in the JIRA Requirement Process</h2>
<p>In addition to detailing the content of requirements and ensuring their correctness, the customer can take over two other important tasks in the requirement process:</p>
<ul>
<li><strong>Assignment:</strong> Once a requirement has been elaborated, its effort has been estimated and it is ready for realization, it has to be assigned to the supplier (us) for release planning and implementation.</li>
<li><strong>Approval:</strong> When the requirement is implemented and approved by the development team and our internal quality assurance, the realized requirement is ready for approval by the customer.</li>
</ul>
<p>Both of these steps (assignment and approval) can be realized as workflow steps for the issue type &#8220;Requirement&#8221;. Depending on the character of the project and the customer, mgm runs projects with different levels of integration.</p>
<h2>Proven Workflow Implementations</h2>
<p>Let&#8217;s take a look at two requirement workflows that we designed for our projects, each with a different integration level of the mentioned steps.</p>
<div id="attachment_1047" class="wp-caption alignnone" style="width: 490px"><a rel="lightbox-jira-beyond-part2" href="http://blog.mgm-tp.com/wp-content/uploads/2011/12/blog2-workflow-1.png"><img src="http://blog.mgm-tp.com/wp-content/uploads/2011/12/blog2-workflow-1-480x313.png" alt="" title="blog2-workflow-1" width="480" height="313" class="size-large wp-image-1047" /></a><p class="wp-caption-text">Workflow 'Requirement' with dedicated customer steps for assignment and approval (Variant 1).</p></div>
<p>This first workflow (shown above) contains a dedicated step to alert the customer that the requirement is &#8220;READY TO ASSIGN&#8221;. On his dashboard the customer has a portlet listing all these &#8216;marked&#8217; requirements as a working queue for assignments!</p>
<p>It is not strictly necessary that the customer himself executes the transition &#8220;assign&#8221; in JIRA. We have projects where the requirement assignment is officially sent via mail or e-mail by the customer. In these cases, our requirement manager performs the transition on behalf of the customer. But we also have projects where the customer himself pushes the &#8220;assign&#8221; button in JIRA.</p>
<p>Following the implementation part with the steps &#8220;IN PROGRESS&#8221;, &#8220;RESOLVED&#8221; and &#8220;CODE-REVIEWED&#8221;, the requirement workflow contains the steps &#8220;READY FOR TESTING&#8221; and &#8220;VERIFIED&#8221;. During the software approval stage the customer can use these dedicated steps to manage his testing and approval tasks. Once again, the needed filters are integrated into the customer dashboard. The development team will explicitly hand over all implementations that passed internal quality assurance to the customer. The approval transition &#8220;VERIFY ISSUE&#8221; will then be executed by the customer himself. Usually we convince the customer to do this directly in JIRA. </p>
<p>In this first example, the step &#8220;REVISION&#8221; of the requirement (the elaboration phase) is separated from the step &#8220;ESTIMATE&#8221;, so the customer keeps control over when effort estimates are ordered.</p>
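<p>Conceptually, such a workflow is a small state machine in which each transition belongs to one party. The following Python sketch models one possible linear reading of variant 1. Only the step names and the transitions &#8220;assign&#8221; and &#8220;verify issue&#8221; are taken from the description above; the order of the steps, the other transition names and the role names are assumptions, and the real workflow in the figure contains more paths. It is not JIRA configuration, just an illustration of the idea.</p>
<pre><code class="language-python"># (current step, transition) -> (party allowed to execute it, next step)
# Assumed linear path; rework loops and rejections are left out.
TRANSITIONS = {
    ("REVISION", "order estimate"): ("customer", "ESTIMATE"),
    ("ESTIMATE", "finish estimate"): ("supplier", "READY TO ASSIGN"),
    ("READY TO ASSIGN", "assign"): ("customer", "IN PROGRESS"),
    ("IN PROGRESS", "resolve"): ("supplier", "RESOLVED"),
    ("RESOLVED", "review code"): ("supplier", "CODE-REVIEWED"),
    ("CODE-REVIEWED", "hand over"): ("supplier", "READY FOR TESTING"),
    ("READY FOR TESTING", "verify issue"): ("customer", "VERIFIED"),
}


def fire(step, transition, role):
    """Return the next step, or raise if the move is unknown or not allowed."""
    key = (step, transition)
    if key not in TRANSITIONS:
        raise ValueError("no transition " + repr(transition) + " from " + step)
    owner, target = TRANSITIONS[key]
    if role != owner:
        raise PermissionError(transition + " is reserved for the " + owner)
    return target


def customer_queue(tickets):
    """Keys of tickets waiting for a customer action (dashboard work queue)."""
    waiting = {step for (step, _), (owner, _) in TRANSITIONS.items()
               if owner == "customer"}
    return sorted(key for key, step in tickets.items() if step in waiting)


tickets = {"REQ-1": "READY TO ASSIGN", "REQ-2": "IN PROGRESS",
           "REQ-3": "READY FOR TESTING"}
print(customer_queue(tickets))                        # prints ['REQ-1', 'REQ-3']
print(fire("READY TO ASSIGN", "assign", "customer"))  # prints IN PROGRESS
</code></pre>
<p>In projects where the requirement manager executes &#8220;assign&#8221; on behalf of the customer, the strict role check in this sketch would have to allow such a delegate.</p>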
<p>Now let us consider a second workflow example as depicted below: </p>
<div id="attachment_1048" class="wp-caption alignnone" style="width: 490px"><a rel="lightbox-jira-beyond-part2" href="http://blog.mgm-tp.com/wp-content/uploads/2011/12/blog2-workflow-2.png"><img src="http://blog.mgm-tp.com/wp-content/uploads/2011/12/blog2-workflow-2-480x311.png" alt="" title="blog2-workflow-2" width="480" height="311" class="size-large wp-image-1048" /></a><p class="wp-caption-text">Workflow 'Requirement' with dedicated customer steps for assignment and approval (Variant 2).</p></div>
<p>The interesting parts here are the initial step &#8220;DRAFT&#8221;, the step &#8220;ANALYSIS&#8221; (before &#8220;REVISION&#8221;) and the approval step &#8220;SIGNED OFF&#8221;. The &#8220;DRAFT&#8221; step is especially useful if customers create requirement tickets by themselves. The distinct step &#8220;ANALYSIS&#8221; is an independent elaboration and phrasing phase for the customer (typically when they have a dedicated operations department) whereas the step &#8220;REVISION&#8221; is an elaboration phase for the project team (development). At the end of the whole implementation process the customer can use the step &#8220;SIGNED OFF&#8221; for the approval process.</p>
<h2>Overview: Where and How to Involve the Customer</h2>
<p>Requirement management is an obvious area for direct participation of customers, but more traditional areas like &#8220;bug tracking&#8221; and &#8220;change management&#8221; are also potential candidates for customer involvement.</p>
<p>Below is a collection of areas where we constantly try to convince our customers to participate directly within our established JIRA processes:</p>
<ul>
<li><strong>Requirement management</strong>
<ul>
<li>Input of new requirement tickets </li>
<li>Direct participation in the elaboration phase (optionally with additional workflow steps)</li>
<li>Assignment for realization (workflow steps)</li>
<li>Testing and approval of implemented requirements (workflow steps)</li>
</ul>
</li>
<li><strong>Change management</strong>
<ul>
<li>Input of new change request tickets </li>
<li>Direct participation in the elaboration phase (optionally with own workflow steps)</li>
<li>Assignment for realization (workflow steps)</li>
<li>Testing and approval of implemented change requests (workflow steps)</li>
</ul>
</li>
<li><strong>Bug tracking</strong>
<ul>
<li>Input of new bug tickets </li>
<li>Testing and approval of fixed bugs (workflow steps)</li>
</ul>
</li>
<li><strong>Software approval process</strong>
<ul>
<li>Execution of dedicated testing tickets </li>
<li>Approval of all individual development tickets (requirements, change requests and bugs) (workflow steps)</li>
<li>Issue the final software (or release) acceptance (workflow steps)</li>
</ul>
</li>
</ul>
<p>The &#8220;Change Management&#8221; process has to be aligned with the customer&#8217;s organizational structure and change process. In our experience, change management in particular is in most cases already a well-defined process on the customer side. However, for &#8220;Change Management&#8221; we can typically apply the same workflow as for &#8220;Requirements&#8221;.</p>
<p>&#8220;Bug tracking&#8221; nowadays follows standard workflows. But bug tickets are also development tasks, i.e. changes to the product/source code. Thus, we extended the bug workflows with steps representing the approval and quality assurance parts (&#8220;READY FOR TESTING&#8221;, &#8220;VERIFIED&#8221; and &#8220;SIGNED OFF&#8221;) as well. This applies to all issue types that lead to development activities, e.g. requirements, change requests, bugs and dedicated implementation tasks, even though requirements and change requests represent only the controlling/management part and not the realization part.</p>
<h2>Dedicated JIRA Projects for the Customer and Development</h2>
<p>Sometimes the number of JIRA tickets per period (e.g. those needed for a software release) exceeds the &#8220;pain&#8221; threshold (typically &gt; 300 per release), and the customer begins to lose the overview of the project and feels lost in the overwhelming number of requirements, change requests, bugs, QA tasks and tasks in general. Our recommendation for these cases is to split things up and create two dedicated JIRA projects:</p>
<ul>
<li><strong>Customer facing project:</strong> Used for all operational bugs and incidents (source: customer and end-user), comprises the complete requirement and change management and the software approval process. </li>
<li><strong>Development facing project:</strong> Used for all bugs during the development phase, all implementation tasks derived from requirements and change requests, development internal quality assurance tasks, all tasks that are related to the project in general, etc.</li>
</ul>
<div id="attachment_1049" class="wp-caption alignnone" style="width: 490px"><a rel="lightbox-jira-beyond-part2" href="http://blog.mgm-tp.com/wp-content/uploads/2011/12/blog2-project-splitting.png"><img src="http://blog.mgm-tp.com/wp-content/uploads/2011/12/blog2-project-splitting-480x226.png" alt="" title="blog2-project-splitting" width="480" height="226" class="size-large wp-image-1049" /></a><p class="wp-caption-text">Splitting up a JIRA project into a dedicated JIRA project for the customer and another one for the development.</p></div>
<p>The ticket handling is very easy: Assigned requirements within the customer project are just cloned and moved to the development project. The original and the cloned requirements are then automatically linked by JIRA. In the development project you can then break the requirement down into the implementation tasks that suit your team structure and the components involved. The status update of the customer&#8217;s source tickets (linked tickets) has to be done manually.</p>
<p>The development project is typically only visible to the development team and not to the customer. When we did this in the past, the customer&#8217;s initial suspicion that we just wanted to hide information from him could always be resolved by simply opening the development project to him and showing him the hundreds of (open) tickets. Normally he would lose interest in this project very quickly because he does not get any additional benefit out of it. On the contrary, he is more likely to be confused by the amount of information.</p>
<p>We&#8217;ve had very good experiences with the concept of two dedicated JIRA projects, one for the customer and one for development. But there is a <em>second way</em> to keep redundant information away from an overwhelmed customer: you can use JIRA&#8217;s security level concept. This way you can keep all tickets in one JIRA project. However, you will then have to maintain security settings at ticket level, because security levels have to be set manually for each ticket that needs one. Setting a default security level is counterproductive, because every ticket created by the customer would then be hidden from the customer immediately after creation.</p>
<h2>Conclusion</h2>
<p>In our experience the advantages obtained through direct participation of customers in JIRA outweigh the disadvantages, such as the increased effort necessary for JIRA configuration. A well-informed customer who is directly involved in his project feels much more comfortable even if something goes wrong or the progress of the project gets stuck. <em>Transparency is the magical keyword.</em></p>
<p>But you have to accept that every project has its own characteristics. It will be mainly influenced by the customer&#8217;s character, organization and stakeholders. You have to find the most appropriate and fitting level for a customer&#8217;s direct participation. Try to get the most accurate picture of the stakeholders you have to work with and then decide how they could fit into the process. And keep in mind that the process can always be adapted afterwards in order to achieve the greatest efficiency in project progress and customer satisfaction.</p>
<p>In keeping with agile practice, apply <em>continuous improvement</em> to your project management processes: Change something &ndash; find out how it went &ndash; learn from it &ndash; change something again!</p>
<h2>Summary of the Key Success Factors</h2>
<ul>
<li>Let customers create tickets directly in JIRA: requirements, change requests, support inquiries, bugs.</li>
<li>Incorporate customers&#8217; duties and responsibilities (e.g. assignments, approvals) directly into the issues workflow as dedicated steps.</li>
<li>Prepare specific filters and dashboards for the customer
<ul>
<li>for his duties (detailing, assignments, approval)</li>
<li>for overviews</li>
<li>for status</li>
</ul>
</li>
<li>Split projects with an overwhelming number of implementation tasks into two separate JIRA projects &#8211; one for the customer and one for development.</li>
<li>Give customers a short JIRA training covering all standard actions as well as how to use and adapt dashboards and how to interpret the data and analysis reports.</li>
<li>Tailor every project set-up individually and don&#8217;t try to compress it into the same template.</li>
</ul>
<p>If you keep all this in mind, you have a good chance that JIRA will become your customer&#8217;s &#8216;sweetheart&#8217;!</p>
<p>There are tons of other interesting topics around JIRA. I will continue to provide you with further ideas, suggestions and mgm experiences. And of course if you have additional questions, ideas, and suggestions around JIRA I would really appreciate any comments and input from you.</p>
	<p><em><small>(c) 2012 <a href="http://www.mgm-tp.com">mgm technology partners</a>. This posting "<a href="http://blog.mgm-tp.com/2011/12/jira-beyond-bug-tracking-part2/">Practical Customer Participation in JIRA Workflows - JIRA beyond Bug Tracking, Part 2</a>" is part of the <a href="http://blog.mgm-tp.com">mgm technology blog</a>. The author of the posting is
	<a href="http://blog.mgm-tp.com/?author=21" title="View articles by Alexander Weiss">Alexander Weiss</a>.
	</small></em></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.mgm-tp.com/2011/12/jira-beyond-bug-tracking-part2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<series:name><![CDATA[JIRA beyond Bug Tracking]]></series:name>
	</item>
		<item>
		<title>Successfully Embracing JIRA in the whole Project Lifecycle - JIRA beyond Bug Tracking, Part 1</title>
		<link>http://blog.mgm-tp.com/2011/11/jira-beyond-bug-tracking-part1/</link>
		<comments>http://blog.mgm-tp.com/2011/11/jira-beyond-bug-tracking-part1/#comments</comments>
		<pubDate>Fri, 18 Nov 2011 10:49:22 +0000</pubDate>
		<dc:creator>Alexander Weiss</dc:creator>
				<category><![CDATA[Projects]]></category>
		<category><![CDATA[Tips]]></category>
		<category><![CDATA[Agile]]></category>
		<category><![CDATA[JIRA]]></category>
		<category><![CDATA[Project Management]]></category>
		<category><![CDATA[QA]]></category>

		<guid isPermaLink="false">http://blog.mgm-tp.com/?p=1023</guid>
		<description><![CDATA[This two-part series shows how JIRA can be used for other things than just bug tracking and presents mgm&#8217;s experiences with embracing JIRA in nearly all parts of traditional and agile project lifecycles, resulting in a number of customized, optimized JIRA workflows and processes. In this first part, I will give you an overview of [...]]]></description>
			<content:encoded><![CDATA[<p>This two-part series shows how <a href="http://www.atlassian.com/software/jira/overview">JIRA</a> can be used for other things than just bug tracking and presents mgm&#8217;s experiences with embracing JIRA in nearly all parts of traditional and agile project lifecycles, resulting in a number of customized, optimized JIRA workflows and processes. In this first part, I will give you an overview of challenges we faced over the years and how we adapted JIRA to meet them. You will get a taste of the vast variety of uses we have found for JIRA and understand why we decided to use only one tool instead of many.</p>
<p><span id="more-1023"></span></p>
<p>mgm has been using <a href="http://www.atlassian.com/software/jira/overview">JIRA</a> for project management since 2005 for all its projects, ranging from small (&lt; 5 members) and medium-sized (&lt; 30 members) projects to really large ones with 500 members and more. Part of my job is setting up JIRA and designing customer-tailored, optimized JIRA workflows. We set up JIRA externally for customers, usually as part of software development projects, and also internally, e.g. for marketing projects, front-office and administrative tasks.</p>
<p>During the past 6 years, our initially defined processes – especially the ones concerning project management and requirement management – passed through several iterations of refinement. We learned a lot and incorporated the lessons learned into improved process definitions. We also introduced more and more agile methods into projects that suit this methodology. The latest step in this direction was to extend JIRA with Atlassian&#8217;s agile plug-in <a href="http://www.atlassian.com/software/greenhopper">GreenHopper</a>, which enables a gradual introduction of agile tools.</p>
<div id="attachment_1024" class="wp-caption alignnone" style="width: 490px"><a rel="lightbox-jira-beyond-part1" href="http://blog.mgm-tp.com/wp-content/uploads/2011/11/greenhopper.png"><img class="size-large wp-image-1024" src="http://blog.mgm-tp.com/wp-content/uploads/2011/11/greenhopper-480x324.png" alt="" width="480" height="324" /></a><p class="wp-caption-text">Agile task tracking using the GreenHopper extension.</p></div>
<p>We make sure that every project gets its own tailored project set-up (set of issue types, workflows, custom fields, GreenHopper configuration) optimized for its individual needs. And although we have several standard configuration sets for new software projects derived from our best practices, we use them only for the initial project set-up. Once the basic functionality is implemented, we try to reflect the distinct project requirements (character of project, product and customer) by creating an individual project configuration instead of trying to condense them into a one-size-fits-all template.</p>
<h2>About our largest JIRA Installation with 700+ People</h2>
<p>One of mgm&#8217;s richest sources of JIRA project experience is the set-up and ongoing maintenance of a JIRA instance that supports a huge multi-project environment with more than 700 active members. The set-up for 28 managed software products is split into 60 JIRA projects with approximately 80 workflow definitions and almost 300 custom fields.</p>
<p>The projects form a network of vertical layers (product software development) and horizontal layers (cross-section functions like quality assurance, expertise, strategic product management, support, operation etc). There is also a strong interdependency between the software projects with respect to results and releases that we had to take into account. One of the largest of these software projects generates an average of 2.500 JIRA tickets for every main release (i.e. every 4 months). So we have a lot of different aspects to consider.</p>
<h2>Software Project Process Maturity at mgm</h2>
<p>As agile project management patterns become more and more prevalent within software projects, we have adapted our project lifecycle as follows: the typical project starts with more or less agile iterations in which requirements are compiled and analyzed, followed by the design and modelling phase. After several development cycles with accompanying quality assurance activities and a final software approval process, the software moves into operation. Product maintenance and enhancement projects typically follow release plans and pack all necessary cycle steps into fairly short time boxes.</p>
<p>Today, mgm has reached a process maturity that allows us to control all aspects of the project lifecycle through JIRA processes:</p>
<ul>
<li>Requirement management</li>
<li>Change management</li>
<li>Bug tracking</li>
<li>Task management</li>
<li>Release management</li>
<li>Development cycles and sprint planning</li>
<li>Software approval process</li>
<li>Quality assurance and test management</li>
</ul>
<div id="attachment_1029" class="wp-caption alignnone" style="width: 489px"><a rel="lightbox-jira-beyond-part1" href="http://blog.mgm-tp.com/wp-content/uploads/2011/11/project_lifecycle_with_jira.png"><img class="size-large wp-image-1029" src="http://blog.mgm-tp.com/wp-content/uploads/2011/11/project_lifecycle_with_jira-479x824.png" alt="" width="479" height="824" /></a><p class="wp-caption-text">Software Project lifecycle and its reflection within JIRA.</p></div>
<p>Of these aspects, only the requirement engineering part including business modeling (for example using UML tools etc.) and the design of business and technical concepts is typically done outside of JIRA, but even they can be controlled by JIRA tickets with adequate dedicated workflows.</p>
<p>But why do we spend so much effort to squeeze as many lifecycle steps into JIRA processes as possible? The answer is that we have learned from experience: there are many advantages in using only one tool instead of a bunch of different ones, because it reduces the breaks between systems.</p>
<p>In summary, one tool for the whole project lifecycle has these advantages:</p>
<ul>
<li>Creates more expertise and familiarity.</li>
<li>Avoids disruptions of process and media during transitions to the next lifecycle step.</li>
<li>Enables fast feedback between
<ul>
<li>Development &amp; Project management,</li>
<li>Customer &amp; Project management,</li>
<li>Development &amp; Quality Assurance.</li>
</ul>
</li>
<li>Facilitates a release oriented approach for development and planning.</li>
<li>Allows cross-processing reports and progress control.</li>
<li>Helps when introducing agile tools and methods at the appropriate level of need.</li>
</ul>
<h2>Direct Customer Involvement through JIRA</h2>
<p>In addition to the benefits to the project as mentioned above, there is one other advantage of using JIRA as the main tool for all phases of the project:  it creates a point of access for direct participation of the customer. But why involve the customer at all?</p>
<p>Quite often, project managers and suppliers withhold insight into the detailed project work and progress from the customer and just provide regular status reports instead. But our experience has shown that in software maintenance and enhancement projects, especially large ones, there is less overhead and more benefit for everybody involved if customers aren&#8217;t locked out of the JIRA processes.</p>
<p>For example, if the project lifecycle is well defined and well known and all participants know their parts and responsibilities within JIRA, the process runs very smoothly. And as all communication can be conducted within this one tool (JIRA), there are no more bypassed communications, activities and results that have to be painstakingly translated into the necessary JIRA processes and information snippets (and no information should be kept out of JIRA anyway!). Thus, one more cause of problems is eliminated!</p>
<h2>Avoid Creeping Requirements through Process Transparency enabled by JIRA</h2>
<p>Furthermore, the customer develops a completely different attachment to &#8216;his&#8217; project if he is able to see all ongoing activities and can generate a current project status himself. We have realized that the customer feels much more satisfied and more responsible for his assigned lifecycle parts once he experiences the whole power of process transparency. He also becomes more sensitive to interference factors and usually starts to avoid &#8216;creeping requirements&#8217; once he realizes the consequences of adding just another &#8220;simple&#8221; request (but sometimes the exception proves the rule).</p>
<p>Having only one tool to manage and view all processes has the advantage that the customer can concentrate on acquiring only one set of necessary skills. In addition, we support the customer in getting all the necessary information by preparing <em>views</em> for his tasks, responsibilities and status reports, i.e. by creating all needed JIRA filters and dashboards for him. These are then shared with everyone who needs them on the customer side (or within the project team), so one set of preparations can be used by any number of users to get the same results. <em>Filter subscriptions</em> are another great help, especially for tracking and managing time-critical process steps. These tend to make the customer&#8217;s start in JIRA very smooth and comfortable.</p>
<p>Nevertheless, the customer has to be willing to take an active part in JIRA. If the customer isn&#8217;t very technically skilled and has problems with, or an aversion to, using software tools at all, we can only recommend that you stop trying to make him an active part of the JIRA process. Compulsion to use JIRA will have the exact opposite effect and create feelings of unease and dissatisfaction. In such a case, stop trying to convince him and just claim higher fees for your larger project management efforts.</p>
<h2>To be continued&#8230;</h2>
<p>In the next part of this blog series we will continue with the topic of direct involvement of customers as introduced above and give you some practical examples of when, where and how to introduce and include your customer into JIRA. We will show you some proven real-world workflows that include customer interaction and cover appropriate modes of customer participation.</p>
	<p><em><small>(c) 2012 <a href="http://www.mgm-tp.com">mgm technology partners</a>. This posting "<a href="http://blog.mgm-tp.com/2011/11/jira-beyond-bug-tracking-part1/">Successfully Embracing JIRA in the whole Project Lifecycle - JIRA beyond Bug Tracking, Part 1</a>" is part of the <a href="http://blog.mgm-tp.com">mgm technology blog</a>. The author of the posting is
	<a href="http://blog.mgm-tp.com/?author=21" title="View articles by Alexander Weiss">Alexander Weiss</a>.
	</small></em></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.mgm-tp.com/2011/11/jira-beyond-bug-tracking-part1/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<series:name><![CDATA[JIRA beyond Bug Tracking]]></series:name>
	</item>
		<item>
		<title>Must-Know URL Hash Techniques for AJAX Applications</title>
		<link>http://blog.mgm-tp.com/2011/10/must-know-url-hashtechniques-for-ajax-applications/</link>
		<comments>http://blog.mgm-tp.com/2011/10/must-know-url-hashtechniques-for-ajax-applications/#comments</comments>
		<pubDate>Wed, 26 Oct 2011 07:53:09 +0000</pubDate>
		<dc:creator>Dr. Christian Winkler</dc:creator>
				<category><![CDATA[Projects]]></category>
		<category><![CDATA[Tips]]></category>
		<category><![CDATA[AJAX]]></category>
		<category><![CDATA[HTML]]></category>
		<category><![CDATA[HTML5]]></category>
		<category><![CDATA[Javascript]]></category>
		<category><![CDATA[jQuery]]></category>
		<category><![CDATA[SEO]]></category>
		<category><![CDATA[URL]]></category>

		<guid isPermaLink="false">http://blog.mgm-tp.com/?p=1006</guid>
		<description><![CDATA[By coding the page state into the URL, even single-page web applications can  support deep bookmarks and the browser&#8217;s back button. The most accepted approach is to utilize the location hash, i.e. the local part of the URL. This article explains this technique and what pitfalls you should be aware of, based on my [...]]]></description>
			<content:encoded><![CDATA[<p>By coding the page state into the URL, even single-page web applications can  support deep bookmarks and the browser&#8217;s back button. The most accepted approach is to utilize the location hash, i.e. the local part of the URL. This article explains this technique and what pitfalls you should be aware of, based on my team&#8217;s experience from building an AJAX interface for Solr. You will also learn about the HTML5 History API, which is a second, more modern technique.</p>
<p><span id="more-1006"></span></p>
<p>Modern web applications often load data via AJAX without leaving the original page; some sites even have only one single page left. The idea behind that was <a href="http://www.zdnet.com/blog/btl/will-the-real-ajax-pioneer-please-stand-up/1201">pioneered</a> by Outlook Web Access and, more prominently for Web users, <a href="http://mail.google.com/">Google Mail</a> in order to create a desktop-like experience. Other benefits include less bandwidth used by the client, faster response times and ultimately a more interactive application feel.</p>
<p>After the first enthusiasm about all these new applications, users found the first <a href="http://isolani.co.uk/blog/javascript/FixingTheBackButtonThatAjaxBroke">serious drawbacks hitting them hard</a>. The beloved back button was not working anymore, bookmarks always led to the first and not the current page and links could not be forwarded.</p>
<p>All these problems have been known for a long time. New techniques now offer very elegant solutions for them, and I will discuss these techniques in this article.</p>
<h2>From Stateless Web Sites to Single-Page Web Applications</h2>
<p><em>In the beginning, the web was stateless.</em> This reflects the stateless nature of the <a href="http://tools.ietf.org/html/rfc1945">HTTP protocol</a>. Most web sites were purely informational and content-driven, such as newspapers. New content was requested by navigating to URLs, and a whole new page with a new URL was shown.</p>
<p>When the first applications like online shops were created, a state or session would have proved immensely useful; otherwise all URLs had to be dynamic. The web was still young and dynamic, so <a href="http://www.w3.org/Protocols/rfc2109/rfc2109">cookies</a> were invented for this and have become immensely popular since then. The request-response cycle was unaffected; the HTTP requests were still stateless.</p>
<p>The situation changed again when more and more Javascript was used. Javascript now affected how the page was rendered locally, in ways that were known only to the browser. By leveraging the <a href="http://www.w3.org/TR/XMLHttpRequest/">XMLHttpRequest object</a>, it soon became possible to get information from the server and incorporate that into the document object model (DOM) in the browser (<a href="http://www.adaptivepath.com/ideas/ajax-new-approach-web-applications">AJAX</a>). A client-side action can trigger a request to the server, which brings the displayed URL totally out-of-sync with the state of the web application. This is still the case in most web applications today!</p>
<h2>Coding the Application State into the URL</h2>
<p>Some clever developers have found out that changes in the URL (also via Javascript) will be interpreted as new pages by the browser. This immediately enables the back button. And if done correctly, the URL once again uniquely designates the current page even if that has been initially created by a single request but modified by a multitude of subsequent AJAX requests.</p>
<p>A URL that is in-sync with the web page state does not come for free though. In order to work correctly, some things have to be taken into account:</p>
<ol>
<li>The URL must be changeable without reloading the page.</li>
<li>Each and every action that modifies the current page must trigger a URL change.</li>
<li>When the AJAX-changed page is opened using the URL (from a bookmark), the URL must be interpreted to &#8220;replay&#8221; the changes which are recorded in it.</li>
<li>When the back button is hit, Javascript must detect this.</li>
</ol>
<p>For an example, consider the scenario in the figure below, where a bookmarked page &#8220;/url#2&#8221; is called from a browser. Only the part before the &#8220;#&#8221; (i.e. &#8220;/url&#8221;) is relevant for the server, so this URL is requested. The Javascript code in the browser needs to interpret the local part (i.e. &#8220;2&#8221;) by itself and determines which actions to take. In our case, this triggers a subsequent AJAX request. Note that the URL in the browser then does not have to be changed again.</p>
<div id="attachment_1008" class="wp-caption alignnone" style="width: 377px"><a rel="lightbox-url-hash" href="http://blog.mgm-tp.com/wp-content/uploads/2011/10/Hash-in-URL-Scenario-1.png"><img src="http://blog.mgm-tp.com/wp-content/uploads/2011/10/Hash-in-URL-Scenario-1.png" alt="" title="Hash in URL Scenario 1" width="367" height="294" class="size-full wp-image-1008" /></a><p class="wp-caption-text">Reconstructing the page state from a bookmark.</p></div>
<p>Another scenario is shown in the following figure. Here, a user hits the back button. The <code>hashchange</code> event fires and the Javascript code in the browser must decide how to get the appropriate content, in this case also via an AJAX request like in the previous figure.</p>
<div id="attachment_1009" class="wp-caption alignnone" style="width: 379px"><a rel="lightbox-url-hash" href="http://blog.mgm-tp.com/wp-content/uploads/2011/10/Hash-in-URL-Scenario-2.png"><img src="http://blog.mgm-tp.com/wp-content/uploads/2011/10/Hash-in-URL-Scenario-2.png" alt="" title="Hash in URL Scenario 2" width="369" height="294" class="size-full wp-image-1009" /></a><p class="wp-caption-text">Reconstructing the previous page state when the user presses the back button.</p></div>
<p><a href="http://ajaxpatterns.org/Unique_URLs">Unique URLs (AJAX Patterns)</a> has some great details about this technique.</p>
<h2>Changing the Local Part of the URL</h2>
<p>Although it might sound easy, even the first point in the list above turns out to be complicated. Of course, Javascript can modify the URL in the browser using <code>location.href</code>, but the browser will then load the page from the modified URL. This is obviously not the desired result.</p>
<p>Clever programmers soon found a solution by only modifying the <em>local part</em> of the URL which then does not trigger a page reload. The local part is defined to be everything after the first hash &#8220;#&#8221; in the URL. You can directly modify this local part by using the <code>location.hash</code> property. The article <a href="http://www.elated.com/articles/using-javascripts-location-object-to-work-with-urls/">&#8220;Using Javascript&#8217;s Location Object to Work with URLs&#8221;</a> is a great introduction.</p>
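<p>As a rough illustration of why this works, the sketch below uses the standard <code>URL</code> class to show that changing only the hash leaves the part of the URL that the server sees untouched. The helper name <code>withHash</code> and the example URL are made up for this illustration; in a browser you would simply assign to <code>location.hash</code>.</p>

```javascript
// Hypothetical helper: returns a copy of the URL with a new local part (hash).
function withHash(href, hash) {
  const url = new URL(href);
  url.hash = hash; // comparable to assigning location.hash in a browser
  return url;
}

const before = new URL("http://example.com/search?lang=en#page1");
const after = withHash(before.href, "page2");

// Only the local part differs; path and query string are unchanged,
// so the browser has no reason to request the page again.
console.log(after.href);
console.log(before.pathname === after.pathname);
console.log(before.search === after.search);
```

<p>Everything up to the &#8220;#&#8221; stays identical, which is exactly the part that is sent to the server.</p>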
<p>This technique has become very popular. A lot of sites use it; the most famous one is probably Twitter. If you open the Tweets page of a user, you will see that the URL looks like &#8220;http://twitter.com/#!/myachinghead&#8221;. Effectively, this means that there is only a single web page (apart from about etc.) and everything else is loaded via AJAX.</p>
<p>Links on the page to other &#8220;pages&#8221; are all intercepted via Javascript and trigger changes in the local part of the URL. For example with Twitter, see what happens if you click on other users but also on the tabs on the tweets page. The URL will change in a well-defined way which is suitable for bookmarks and interpreting all actions which have been performed.</p>
<p>By using this technique, bookmarking is easy. The URL structure of Twitter is fortunately quite simple: the local part is interpreted as the username and the corresponding tweets are loaded asynchronously via AJAX. The same is true for subpages like &#8220;Favorites&#8221;, &#8220;Following&#8221; etc.</p>
<h2>Detecting Changes in the Local Part of a URL</h2>
<p>As the technique has become quite popular, browsers now provide a dedicated event for changes to the local part of the URL. It is called <code>hashchange</code> and can be handled via <a href="https://developer.mozilla.org/en/DOM/window.onhashchange">window.onhashchange</a>. You can easily catch this event by using something like:</p>
<pre>window.onhashchange = function () { ...  }</pre>
<p>Unfortunately, older browsers have no easy interface for listening to hash changes. An alternative is to use <code>setInterval</code> in order to check periodically for a changed hash.</p>
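<p>A minimal sketch of such a polling fallback could look like the following. The function name <code>createHashPoller</code> and the simulated location object are assumptions made for this illustration; the interval of 100 ms in the final comment is an arbitrary choice.</p>

```javascript
// Hypothetical polling fallback for browsers without the hashchange event.
// getHash returns the current hash (for example () => location.hash in a
// browser); onChange is called with (newHash, oldHash) whenever it differs.
function createHashPoller(getHash, onChange) {
  let lastHash = getHash();
  return function poll() {
    const currentHash = getHash();
    if (currentHash !== lastHash) {
      const previous = lastHash;
      lastHash = currentHash;
      onChange(currentHash, previous);
    }
  };
}

// Simulated location object so the sketch also runs outside a browser.
const fakeLocation = { hash: "#1" };
const seen = [];
const poll = createHashPoller(() => fakeLocation.hash, (now) => seen.push(now));

poll();                   // hash unchanged, nothing reported
fakeLocation.hash = "#2";
poll();                   // change detected
poll();                   // no further change, nothing reported
console.log(seen);        // only "#2" was reported

// In a browser, the check would run periodically, e.g.:
// setInterval(createHashPoller(() => location.hash, handleChange), 100);
```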
<p>Frameworks like <a href="http://jquery.com/">jQuery</a> have plugins for hash handling, see e.g. <a href="http://benalman.com/projects/jquery-bbq-plugin/">jQuery BBQ</a>. These plugins degrade gracefully in older browsers and thus keep the differences between browser APIs out of your code.</p>
<p>The following figure shows the workflow when changing hashes: clicking an active element must only change the hash (i.e. by serializing the current application state there). Immediately after that, the browser fires a <code>hashchange</code> event, which the application&#8217;s Javascript code handles. The local part of the URL must then be interpreted and in this case generates an AJAX request to the server. Note that if the element is clicked again and the serialized data is unchanged, no event will be triggered (as the hash has not changed), so the application saves a redundant request as a side effect.</p>
<div id="attachment_1010" class="wp-caption alignnone" style="width: 379px"><a rel="lightbox-url-hash" href="http://blog.mgm-tp.com/wp-content/uploads/2011/10/Hash-in-URL-Scenario-3.png"><img src="http://blog.mgm-tp.com/wp-content/uploads/2011/10/Hash-in-URL-Scenario-3.png" alt="" title="Hash in URL Scenario 3" width="369" height="296" class="size-full wp-image-1010" /></a><p class="wp-caption-text">Workflow when changing URL hashes.</p></div>
<p>Twitter is also an excellent example of what problems can arise using this solution:</p>
<ul>
<li>Search for the Twitter page of somebody using Google (or Bing, it doesn&#8217;t matter). Click the result and see how the URL changes (the &#8220;#!/&#8221; is inserted). The reason is that search engines will never index pages which only differ in the local part of the URL. For search engines to work properly, some tricks like <a href="http://en.wikipedia.org/wiki/Sitemaps">sitemap.xml</a>, <a href="http://en.wikipedia.org/wiki/Rewrite_engine">rewrite rules</a> etc. have to be used. See our <a href="http://blog.mgm-tp.com/series/the-art-of-search-engine-optimization/">blog series about SEO</a> to get more information. Also, Google provides an interesting <a href="http://code.google.com/intl/de/web/ajaxcrawling/docs/getting-started.html">document &#8220;Making AJAX Applications Crawlable&#8221;</a>.</li>
<li>Open the Tweets page of a user, click &#8220;about&#8221; (the company link) and then use the back button. The result is not what you expect! The back button works fine when jumping between tweets of users, though.</li>
</ul>
<p>Using a real URL, rather than one that differs only in the local part, offers many exciting possibilities.</p>
<h2>Example 1: AJAX-ifying a conventional page-based website </h2>
<p>For example, if you convert a normal page-based (i.e. request-response based) website into an AJAX-based one, you can keep your current URL structure but set the URLs on the client side by using <code>pushState</code>. The associated AJAX call sends a request to the server. You can (and should!) use the same URL as for the whole page and detect on the server side that this is an AJAX request, which is then rendered as e.g. JSON (which can be pushed as data in <code>pushState</code>), whereas normal requests are rendered as complete HTML.</p>
<p>Following this approach, your server-side logic needs no modification, only the rendering must be changed. Bookmarking still works in the usual way, as the bookmarked page will be requested as a full HTML page even if the application has become a single-page application under the covers. This technique has been used successfully by e.g. <a href="https://www.github.com">GitHub</a> for quite some time. It is so seamless that you might not even have noticed it.</p>
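<p>How the server tells the two kinds of request apart depends on your stack. The sketch below assumes that the client marks AJAX calls with the <code>X-Requested-With: XMLHttpRequest</code> header (jQuery does this by default for same-origin requests) and that header names arrive in lower case, as they do in Node.js. The functions <code>isAjaxRequest</code> and <code>render</code> and the article object are hypothetical.</p>

```javascript
// Assumption: AJAX calls carry the X-Requested-With header.
function isAjaxRequest(headers) {
  const marker = headers["x-requested-with"] || "";
  return marker.toLowerCase() === "xmlhttprequest";
}

// Hypothetical renderer: same URL and same data, two representations.
function render(headers, article) {
  if (isAjaxRequest(headers)) {
    return { contentType: "application/json", body: JSON.stringify(article) };
  }
  // A real application would use its template engine and escape the title.
  const html = "<html><body><h1>" + article.title + "</h1></body></html>";
  return { contentType: "text/html", body: html };
}

const article = { title: "Hello" };
console.log(render({ "x-requested-with": "XMLHttpRequest" }, article).contentType);
console.log(render({}, article).contentType);
```

<p>The business logic that loads the article stays the same for both cases; only the last rendering step differs.</p>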
<h2>Example 2: Creating an HTML search interface</h2>
<p>In a real-world project, we have created a large-scale search application using <a href="http://lucene.apache.org/solr/">Apache Solr</a> as a backend. The API is exposed via Tomcat, which intelligently distributes search requests among the cluster, and we created a <a href="http://en.wikipedia.org/wiki/Representational_state_transfer">REST</a> interface with <a href="http://en.wikipedia.org/wiki/JSON">JSON</a> as data transport.</p>
<div id="attachment_1011" class="wp-caption alignnone" style="width: 490px"><a rel="lightbox-url-hash" href="http://blog.mgm-tp.com/wp-content/uploads/2011/10/Screenshot.png"><img src="http://blog.mgm-tp.com/wp-content/uploads/2011/10/Screenshot-480x275.png" alt="" title="Screenshot" width="480" height="275" class="size-large wp-image-1011" /></a><p class="wp-caption-text">Query Parameters coded into the local part of the URL (example taken from a custom Solr interface).</p></div>
<p>Putting an AJAX-enabled Web application on top of that proved to be quite easy. JSON data can be used directly in Javascript. Each search creates an AJAX request to the server and updates the results table. Trouble hit us as soon as the first users were testing the application. They were used to working with the back button which didn&#8217;t work as expected. A solution had to be found.</p>
<p>As legacy browsers still had to be supported, we chose to implement a URL hash. As <a href="http://jquery.com">jQuery</a> was already in use in the project, the choice was to go with <a href="http://benalman.com/projects/jquery-bbq-plugin/">jQuery BBQ</a>. The application was changed to &#8220;serialize&#8221; the state into the URL. URL changes then raise the <code>hashchange</code> event, which in turn triggers the search. We got some nice add-ons for free:</p>
<ul>
<li>The first search starts automatically.</li>
<li>Modifying the URL directly also creates a search.</li>
<li>Hitting search again without changing the parameters does not change the hash and no event is raised, i.e. no search is performed.</li>
<li>Reload works out of the box.</li>
<li>Users can (again) use bookmarks e.g. for wrong results and check back later.</li>
<li>Users can forward URLs to their colleagues for cross-checks.</li>
</ul>
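<p>To give an idea of what &#8220;serializing&#8221; the state can look like, here is a small sketch. It is not the project&#8217;s code: it uses the standard <code>URLSearchParams</code> class as a stand-in for the plugin&#8217;s helpers (legacy browsers would not have it), and the field names <code>q</code> and <code>page</code> are invented.</p>

```javascript
// Turn a flat state object into a hash string.
function serializeState(state) {
  const params = new URLSearchParams();
  // Sort the keys so that equal states always produce the same hash,
  // which is what keeps a repeated search from raising a new event.
  for (const key of Object.keys(state).sort()) {
    params.set(key, String(state[key]));
  }
  return "#" + params.toString();
}

// Rebuild the state from a hash, e.g. after a bookmark was opened.
function parseState(hash) {
  const params = new URLSearchParams(hash.replace(/^#/, ""));
  const state = {};
  for (const [key, value] of params) {
    state[key] = value; // note: all values come back as strings
  }
  return state;
}

const hash = serializeState({ q: "varnish cache", page: 2 });
console.log(hash);             // #page=2&q=varnish+cache
console.log(parseState(hash)); // { page: '2', q: 'varnish cache' }
```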
<p>After this change had been successfully rolled out, the customers were much happier. The application got a more desktop-like feeling without sacrificing the convenience of the Web&#8217;s ubiquitous back button and bookmarks.</p>
<h2>Solutions for more complex Web Applications</h2>
<p>The discussed solution works fine for small websites or dialog-driven applications. As soon as the application gets more complex, new approaches will prove more efficient. </p>
<p>There are quite a few client-side frameworks which all have their individual strengths. The more popular ones are <a href="http://documentcloud.github.com/backbone/">Backbone.js</a> and <a href="http://knockoutjs.com/">Knockout.js</a>. For an interesting discussion about their differences see the discussion <a href="http://stackoverflow.com/questions/5112899/knockout-js-vs-backbone-js-vs">&#8220;Knockout.js vs Backbone.js (vs ?)&#8221;</a>. Depending on the requirements, other paradigms like <a href="http://www.sproutcore.com/">SproutCore</a>, <a href="http://code.google.com/webtoolkit/">GWT</a> or <a href="http://vaadin.com/home">Vaadin</a> should also be considered.</p>
<h2>Using the HTML5 History API</h2>
<p>Because this functionality is already so important and will become even more relevant in the future, HTML5 provides a different, more elegant solution than juggling URL hashes and <code>hashchange</code> events. This solution is the <a href="http://www.w3.org/TR/html5/history.html">History API</a>, a dedicated interface designed solely for forward/backward navigation in the browser. For this post, two parts of it are of particular interest:</p>
<ol>
<li>If a URL has to change, the method <code>history.pushState(data, title, url)</code> can be called. Besides the new URL (<code>url</code>), which replaces the current URL <em>without reloading</em>, it takes a data object and a title. The title is more or less irrelevant, whereas the data parameter can be used to encapsulate the state of the page. If the back button is pressed, this state can be retrieved and used to render the page correctly. Using this state therefore makes it possible to handle navigation in a more fine-grained way than on a pure URL basis (it can also be viewed as local session storage which is context-sensitive to each navigational step in the page).
<p>Interesting reads on this topic are: <a href="http://badassjs.com/post/840846392/location-hash-is-dead-long-live-html5-pushstate">&#8220;location.hash is dead. Long live HTML5 pushState!&#8221;</a>, <a href="http://blog.new-bamboo.co.uk/2011/2/2/degradable-javascript-applications-using-html5-pushstate">Degradable JavaScript Applications Using HTML5 pushState</a>, and the jQuery Plugins <a href="http://benalman.com/projects/jquery-bbq-plugin/">BBQ</a> and <a href="http://plugins.jquery.com/project/history-js">History.js</a> with their fallback capabilities to the old onhashchange functionality.</p>
</li>
<li>The History API also defines an event handler on the <code>window</code> object which is called <code>onpopstate</code>. You should define a function and assign it to <code>window.onpopstate</code>, which is then responsible for reconstructing the original page. Of course you have access to the URL of the desired page and to the state object. Both can be used for rendering the page, e.g. via performing AJAX requests or using the pushed variables.</li>
</ol>
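<p>The interplay of the two can be sketched as follows. The functions <code>navigate</code> and <code>installPopstateHandler</code> are hypothetical names, and <code>fakeWindow</code> is only a crude stand-in for the browser&#8217;s <code>window</code> object so that the example also runs outside a browser; in real code you would pass <code>window</code> itself.</p>

```javascript
// Forward navigation: record the new state and URL, then render.
// render is the application's own function for drawing a page from a state.
function navigate(win, url, state, render) {
  win.history.pushState(state, "", url); // arguments: data, title, url
  render(state);
}

// Back/forward navigation: the browser fires popstate with the stored state.
function installPopstateHandler(win, render) {
  win.onpopstate = function (event) {
    render(event.state);
  };
}

// Simplified stand-in for window: keeps a stack of entries and calls
// onpopstate on back(). A real browser also keeps forward entries.
const fakeWindow = {
  entries: [{ state: null, url: "/" }],
  history: {
    pushState(state, title, url) {
      fakeWindow.entries.push({ state, url });
    },
    back() {
      fakeWindow.entries.pop();
      const top = fakeWindow.entries[fakeWindow.entries.length - 1];
      fakeWindow.onpopstate({ state: top.state });
    },
  },
};

const rendered = [];
const render = (state) => rendered.push(state);
installPopstateHandler(fakeWindow, render);
navigate(fakeWindow, "/results?page=1", { page: 1 }, render);
navigate(fakeWindow, "/results?page=2", { page: 2 }, render);
fakeWindow.history.back(); // re-renders the state stored for page 1
console.log(rendered);
```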
<div id="attachment_1012" class="wp-caption alignnone" style="width: 380px"><a rel="lightbox-url-hash" href="http://blog.mgm-tp.com/wp-content/uploads/2011/10/Scenario-with-HTML5-History-API.png"><img src="http://blog.mgm-tp.com/wp-content/uploads/2011/10/Scenario-with-HTML5-History-API.png" alt="" title="Scenario with HTML5 History API" width="370" height="442" class="size-full wp-image-1012" /></a><p class="wp-caption-text">Workflow when using the HTML5 History API.</p></div>
<p>The figure above shows the workflow when using the HTML5 History API. The upper part describes forward navigation and includes pushing the new state and the AJAX request to the server. The lower part shows that the URL changes and a <code>popstate</code> event fires when the back button is pressed. The application can either use local state storage or (in our case) use an AJAX call to update the page.</p>
	<p><em><small>(c) 2012 <a href="http://www.mgm-tp.com">mgm technology partners</a>. This posting "<a href="http://blog.mgm-tp.com/2011/10/must-know-url-hashtechniques-for-ajax-applications/">Must-Know URL Hash Techniques for AJAX Applications</a>" is part of the <a href="http://blog.mgm-tp.com">mgm technology blog</a>. The author of the posting is
	<a href="http://blog.mgm-tp.com/?author=20" title="View articles by Dr. Christian Winkler">Dr. Christian Winkler</a>.
	</small></em></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.mgm-tp.com/2011/10/must-know-url-hashtechniques-for-ajax-applications/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Effective and Efficient Techniques for a Rule-Based Test-Data Generator - Test Data Generation for Form-Centric Applications, Part 3</title>
		<link>http://blog.mgm-tp.com/2011/09/test-data-generation3/</link>
		<comments>http://blog.mgm-tp.com/2011/09/test-data-generation3/#comments</comments>
		<pubDate>Mon, 12 Sep 2011 14:49:30 +0000</pubDate>
		<dc:creator>Dr. Hans-Martin Adorf</dc:creator>
				<category><![CDATA[Projects]]></category>
		<category><![CDATA[Research]]></category>
		<category><![CDATA[QA]]></category>
		<category><![CDATA[Testing]]></category>
		<category><![CDATA[Validation]]></category>
		<category><![CDATA[Web Forms]]></category>

		<guid isPermaLink="false">http://blog.mgm-tp.com/?p=982</guid>
		<description><![CDATA[This part explains some of the sophisticated software technology that is working behind the scenes in our rule-based test-data generator for form-centric applications. You will see that a simple enumeration of all possible ways to fill in a form is likely doomed to run longer than the age of the universe. Therefore more efficient techniques [...]]]></description>
			<content:encoded><![CDATA[<p>This part explains some of the sophisticated software technology that is working behind the scenes in our rule-based test-data generator for form-centric applications. You will see that a simple enumeration of all possible ways to fill in a form is likely doomed to run longer than the age of the universe. Therefore more efficient techniques are needed to make the seemingly impossible possible.</p>
<p><span id="more-982"></span></p>
<p>In the <a href="http://blog.mgm-tp.com/2010/12/test-data-generation-part2/">previous part</a> of this blog series, we have seen that it is necessary to generate test data containing <em>extreme or special values</em> (ESVs) which are capable of exerting some pressure onto the software under test. We have also seen that it is desirable to squeeze into a single test-data record as many ESVs as possible, in order to reduce the number of records to be fed into the test system. Running functional tests may take considerable time. Therefore we want to achieve a high compression, i.e. a high density of ESVs per data record.</p>
<p>As we mentioned in <a href="http://blog.mgm-tp.com/2010/10/test-data-generation-part1/">part 1</a> of this blog series, the main obstacle in the way of an automated generation of test data for form-centric applications is the presence of cross-field constraints. Before we get to these, let us look at constraints that can be formulated for individual fields alone.</p>
<h2>Constraints on single fields &mdash; the easy problem</h2>
<p>The kind of constraints that one may formulate for a field depends on the type of the latter. Consider a Boolean field. The field may assume only one of two values, <em>true</em> or <em>false</em>. A simple constraint on such a field is that it may not be empty. Additionally there may be attributes governing the external representation of the field values, such as &#8220;wahr&#8221; and &#8220;falsch&#8221;, or &#8220;1&#8221; and &#8220;0&#8221;.</p>
<p>Next, let&#8217;s look at a field representing an <em>amount of Euro with Cents</em>. Again a simple constraint on such a field is that it may not be empty. Other constraints may be that the amount may not be negative, or that the amount may not be zero. Invariably, there will be a constraint on the length of the field, which entails a lower and an upper limit for the amount the field can hold. Further attributes may involve the external representation such as the presence/absence of a positive sign in front of the amount, or the appearance of the decimal separator (a comma in Germany, a dot in the UK or the USA). Working with these constraints can be separated into a first phase, where a valid value is generated, followed by a second phase where an appropriate external representation is constructed. Our generator typically produces extreme and special values (ESVs), as explained in the <a href="http://blog.mgm-tp.com/2010/12/test-data-generation-part2/">previous part 2</a> of this blog series.</p>
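<p>The two phases can be illustrated with a small sketch. This is not the generator&#8217;s actual code: the function names, the choice of candidate values and the convention of storing amounts as integer cents are assumptions made for this example.</p>

```javascript
// Phase 1: candidate extreme and special values for an amount field.
// Amounts are kept as integer cents to avoid floating-point rounding.
function amountCandidates(maxDigits, allowNegative, allowZero) {
  const maxCents = Math.pow(10, maxDigits) - 1; // upper limit implied by the field length
  const minCents = allowNegative ? -maxCents : 0;
  const candidates = [minCents, -1, 0, 1, maxCents];
  const valid = candidates.filter(
    (c) => c >= minCents && c <= maxCents && (allowZero || c !== 0)
  );
  return Array.from(new Set(valid)); // drop duplicates, e.g. when the minimum is 0
}

// Phase 2: external representation with a configurable decimal separator.
function formatAmount(cents, decimalSeparator) {
  const sign = cents < 0 ? "-" : "";
  const abs = Math.abs(cents);
  const euros = Math.floor(abs / 100);
  const rest = String(abs % 100).padStart(2, "0");
  return sign + euros + decimalSeparator + rest;
}

// A five-digit field that must be positive: smallest and largest valid amount.
const values = amountCandidates(5, false, false);
console.log(values.map((c) => formatAmount(c, ","))); // [ '0,01', '999,99' ]
console.log(values.map((c) => formatAmount(c, "."))); // [ '0.01', '999.99' ]
```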
<p>For a <em>string field</em>, we always have a maximum length constraint, plus a global constraint describing the character set that may be used. All we need is a simple string generator that looks at the string fields one after the other, and for each field generates data, while obeying the constraints that may exist (see e.g. <a href="#ref2">[2]</a>).</p>
<p>In our form-centric applications, we usually encounter a type of constraint on string fields that is a bit harder to cope with than the length and character set constraints mentioned above: the regular expression constraint. For instance, in order to specify a valid German zip code we may use a simple regular expression. Likewise, a regular expression may be devised to constrain valid phone numbers, e.g. &#8220;<code>(\d+[ -])?\d+</code>&#8221;. A regular expression for valid e-mail addresses is considerably harder to come by.</p>
<p>Matching a given string against a regular expression is a common task in software engineering. Generating a valid (or invalid) string from a regular expression is the inverse of matching. While there are numerous packages that perform matching, it is a lot more difficult to locate a good string generator that produces a matching (or non-matching) string from a regular expression while taking into account the side constraints (see e.g. <a href="#ref7">[7]</a>). Fortunately, the theory of finite automata (FA) comes to the rescue: any regular expression can be converted into an FA <a href="#ref2">[2]</a>. From there it is only a small step to an operational string generator that produces matching or non-matching strings with relatively little effort. Not surprisingly, we have such a string generator built into our test-data generator.</p>
<p>Figure 1 below shows the transition graph of a finite-state automaton (FA) for the regular expression &#8220;ab|cd&#8221;. By traversing paths from the start node (marked with an &#8216;S&#8217;) to one of the end nodes (marked with an &#8216;E&#8217;) one can generate the two strings &#8220;ab&#8221; or &#8220;cd&#8221; that match the given regular expression.</p>
<div id="attachment_983" class="wp-caption alignnone" style="width: 490px"><a rel="lightbox-test-data-generation-part3" href="http://blog.mgm-tp.com/wp-content/uploads/2011/09/Figure-1-NFA-for-Regular-Expression.jpg"><img src="http://blog.mgm-tp.com/wp-content/uploads/2011/09/Figure-1-NFA-for-Regular-Expression.jpg" alt="" title="Figure 1 - NFA for Regular Expression" width="480" height="352" class="size-large wp-image-983" /></a><p class="wp-caption-text">Transition graph of the corresponding finite-state automaton (FA) for the regular expression ab|cd.</p></div>
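<p>To make the path-traversal idea concrete, here is a minimal sketch in Python. It is not the generator built into our tool. The transition table is written by hand for the expression ab|cd, the state names <code>q1</code> and <code>q2</code> are made up, and the two end nodes of the figure are collapsed into a single end state <code>E</code> for brevity.</p>

```python
# Hand-written (hypothetical) transition table for the regular expression "ab|cd".
# "S" is the start state, "E" the end state; q1 and q2 are made-up state names.
TRANSITIONS = {
    "S": [("a", "q1"), ("c", "q2")],
    "q1": [("b", "E")],
    "q2": [("d", "E")],
    "E": [],
}
END_STATES = {"E"}


def generate_strings(state="S", prefix="", max_length=10):
    """Yield every string spelled out by a path from `state` to an end state."""
    if state in END_STATES:
        yield prefix
    if len(prefix) >= max_length:
        return  # guard against endless paths in automata that contain loops
    for symbol, next_state in TRANSITIONS[state]:
        yield from generate_strings(next_state, prefix + symbol, max_length)


matching = sorted(generate_strings())
print(matching)  # prints ['ab', 'cd']
```

<p>The <code>max_length</code> guard is irrelevant for this tiny automaton, but an expression with repetition yields loops in the graph, and a maximum field length is then the natural place to cut the traversal off.</p>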
<p>The nice thing is that any valid data item generated for a field not participating in any cross-field constraints can be combined with any valid data item generated for another such field.</p>
<p>Let us now turn to the harder problem, namely to the fields whose values are restricted by cross-field constraints.</p>
<h2>Cross-field constraints &mdash; the hard problem</h2>
<p>Let us begin by looking at a relatively simple problem, namely, how to fill in two fields in an address block of a form. Assume that the fields are the given name and the surname of a person. The single-field constraints that exist for each field individually may be dealt with in the way described above. Let us assume that there is a single cross-field constraint linking the given name to the surname, and that the constraint reads: &#8220;if the given name is present, the surname must also be present&#8221;. Formally we might encode this in the assertion</p>
<pre>absent(givenName) or present(surname)
</pre>
<p>Both fields may be empty, but the given name should not occur without the surname.</p>
<p>The simplest approach to solving this miniature <a href="http://en.wikipedia.org/wiki/Constraint_satisfaction_problem">constraint satisfaction problem</a> (CSP) is &#8220;trial and error&#8221;, or &#8220;generate and test&#8221;. The task of the generator consists in completely filling in the first two columns of the following table:</p>
<table>
<tr>
<th>givenName</th>
<th>surname</th>
<th>valid</th>
</tr>
<tr>
<td>absent</td>
<td>absent</td>
<td>yes</td>
</tr>
<tr>
<td>present</td>
<td>absent</td>
<td>no</td>
</tr>
<tr>
<td>absent</td>
<td>present</td>
<td>yes</td>
</tr>
<tr>
<td>present</td>
<td>present</td>
<td>yes</td>
</tr>
</table>
<p>Then we use a validator to check whether the constraint is fulfilled. </p>
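<p>The generate-and-test strategy for this two-field example can be sketched in a few lines of Python. The names <code>is_valid</code> and <code>generate_and_test</code> are chosen for this illustration only.</p>

```python
from itertools import product

FIELDS = ("givenName", "surname")
STATES = ("absent", "present")


def is_valid(record):
    """Validator for the assertion: absent(givenName) or present(surname)."""
    return record["givenName"] == "absent" or record["surname"] == "present"


def generate_and_test():
    """Generate every combination first, then keep those the validator accepts."""
    for values in product(STATES, repeat=len(FIELDS)):
        record = dict(zip(FIELDS, values))
        if is_valid(record):
            yield record


valid_records = list(generate_and_test())
for record in valid_records:
    print(record)
```

<p>Three of the four combinations survive. The generator knows nothing about the constraint, which is exactly why this approach stops scaling once the number of fields grows.</p>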
<p>While this simple strategy is effective and efficient enough for a small number of fields, we quickly run into problems as soon as that number grows. For 10 fields, filling in a table like the one above might take as little as 1 microsecond, but for 100 fields it might easily take 1000 times the age of the universe. We are witnessing the curse of the combinatorial explosion.</p>
<p>Note that validating a single row of the truth table takes only polynomial time. It is the number of rows, which doubles with every additional field, that makes the approach infeasible.</p>
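<p>The estimate above is easy to check with a little arithmetic. The sketch below assumes that each field has just two states (present or absent), that the machine enumerates rows at a constant rate, and that the universe is roughly 13.8 billion years old. All three are simplifying assumptions.</p>

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_S = 13.8e9 * SECONDS_PER_YEAR  # assumed: roughly 13.8 billion years


def table_fill_time(num_fields, rows_per_second):
    """Seconds needed to enumerate all 2**num_fields present/absent rows."""
    return 2 ** num_fields / rows_per_second


# Calibrate the hypothetical machine so that 10 fields take 1 microsecond.
rows_per_second = 2 ** 10 / 1e-6

seconds_for_100 = table_fill_time(100, rows_per_second)
ratio = seconds_for_100 / AGE_OF_UNIVERSE_S
print(f"{seconds_for_100:.2e} s, about {ratio:.0f} ages of the universe")
```

<p>Under these assumptions the ratio comes out at a few thousand ages of the universe, so the figure quoted in the text is, if anything, on the cautious side.</p>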
<h2>What can be done?</h2>
<p>A very successful strategy to cope with the combinatorial explosion is to interleave the data generation with the checking/validation steps. This mixing of steps allows an &#8220;early pruning&#8221; of the search tree.</p>
<p>Early pruning is a hallmark of practically all CSP-solvers, and, perhaps not surprisingly, CSP-solvers have been used for two decades in the area of automated test data generation (see <a href="#ref3">[3]</a>, <a href="#ref5">[5]</a>, <a href="#ref6">[6]</a>, <a href="#ref8">[8]</a>). Also, the increasing importance of CSPs in mainstream computing is witnessed by the ongoing attempt to define a Java API for CSP-solvers <a href="#ref4">[4]</a>.</p>
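<p>The following Python sketch shows what interleaving generation and validation can look like. It is a plain backtracking search, not the algorithm of any particular CSP-solver, and the third field <code>title</code> together with its constraint is invented for the example.</p>

```python
def solve(fields, domains, constraints, assignment=None):
    """Backtracking search that checks constraints as soon as they become testable."""
    assignment = {} if assignment is None else assignment
    if len(assignment) == len(fields):
        yield dict(assignment)
        return
    field = fields[len(assignment)]
    for value in domains[field]:
        assignment[field] = value
        # Early pruning: test every constraint whose fields all have values already.
        testable = [c for c in constraints if all(f in assignment for f in c[0])]
        if all(check(*(assignment[f] for f in names)) for names, check in testable):
            yield from solve(fields, domains, constraints, assignment)
        del assignment[field]  # undo the choice and try the next value


# True means "present", False means "absent".
fields = ["givenName", "surname", "title"]
domains = {name: (True, False) for name in fields}
constraints = [
    (("givenName", "surname"), lambda given, sur: (not given) or sur),
    (("title", "givenName"), lambda title, given: (not title) or given),  # hypothetical rule
]

solutions = list(solve(fields, domains, constraints))
for solution in solutions:
    print(solution)
```

<p>As soon as <code>givenName</code> is present and <code>surname</code> is absent, the search abandons that branch without ever assigning <code>title</code>. With many fields, each such cut removes an entire subtree of combinations.</p>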
<p>While early pruning is essential, experience shows that it is not sufficient to solve the real-world CSPs that occur in the area of automated test data generation. As I mentioned in <a href="http://blog.mgm-tp.com/2010/10/test-data-generation-part1/">part 1</a> of this blog series, it is necessary to partition the CSP into independent components before submitting it to a CSP-solver.</p>
<p>The following figure illustrates this concept. It shows the 32 largest of more than 200 independent connected components. Each little square depicts a field, and a connection between two fields indicates that the two fields occur together in a constraint. This primal constraint graph has been derived from the validation rule base of a large form-centric application developed at mgm tp.</p>
<div id="attachment_984" class="wp-caption alignnone" style="width: 490px"><a rel="lightbox-test-data-generation-part3" href="http://blog.mgm-tp.com/wp-content/uploads/2011/09/Figure-2-Independent-Connected-Components.png"><img src="http://blog.mgm-tp.com/wp-content/uploads/2011/09/Figure-2-Independent-Connected-Components-480x460.png" alt="" title="Figure 2 - Independent Connected Components" width="480" height="460" class="size-large wp-image-984" /></a><p class="wp-caption-text">This example of partitioning the CSP into independent components shows the 32 largest of more than 200 independent connected components.</p></div>
<p>One might think that such a partitioning algorithm would be an integral part of a CSP-solver, but in our experience this is not always the case.</p>
<p>Here is, in brief, how a decomposition of a CSP into independent components can be accomplished:</p>
<ol>
<li>Initially create a component for each field, and put the field into a set belonging to that component.</li>
<li>Go through the constraints one by one. Identify all fields of the constraint. Identify all components to which these fields belong and merge their sets.</li>
<li>Label the resulting sets.</li>
<li>Go through the constraints one by one again. Identify a field in the constraint. Find its component, and label the constraint with the label of that component.</li>
</ol>
<p>Clearly, each field that does not participate in any cross-field constraint ends up in its own independent component, where test data generation can take place undisturbed by the rest of the problem.</p>
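<p>The four steps above map naturally onto a union-find (disjoint-set) structure. The Python sketch below is one possible realization, not our production code. The field and rule names are invented, and every constraint is assumed to mention at least one field.</p>

```python
def partition(fields, constraints):
    """Split fields and constraints into independent connected components.

    `constraints` maps a constraint name to the list of fields it mentions.
    """
    parent = {field: field for field in fields}  # step 1: one component per field

    def find(field):
        while parent[field] != field:
            parent[field] = parent[parent[field]]  # path halving keeps trees shallow
            field = parent[field]
        return field

    # Step 2: merge the components of all fields that share a constraint.
    for names in constraints.values():
        first = find(names[0])
        for other in names[1:]:
            parent[find(other)] = first

    # Step 3: label each resulting set by its representative field.
    components = {}
    for field in fields:
        label = find(field)
        components.setdefault(label, {"fields": [], "constraints": []})
        components[label]["fields"].append(field)

    # Step 4: attach each constraint to the component of one of its fields.
    for name, names in constraints.items():
        components[find(names[0])]["constraints"].append(name)
    return list(components.values())


fields = ["givenName", "surname", "zip", "city", "phone"]
constraints = {
    "name_rule": ["givenName", "surname"],
    "address_rule": ["zip", "city"],
}
components = partition(fields, constraints)
for component in components:
    print(component)
```

<p>In this toy example, <code>phone</code> takes part in no cross-field constraint and therefore ends up alone in a component without constraints.</p>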
<h2>Implementation and data representation</h2>
<p>The bulk of our test data generator has been implemented in Java. We greatly benefit from representing the CSP&#8217;s variables and constraints in <a href="http://en.wikipedia.org/wiki/Lisp">Lisp</a> data structures. The representation of a constraint in a Lisp-data structure with prefix notation looks like this:</p>
<pre class="brush: plain;">
(constraint (
		2 &quot;Regel_Zahlen_Vergleich&quot;  &quot;4&quot; &quot;Test_Vordruck&quot;
            &quot;Zahlen_Kontext&quot; &quot;Nicht_Negative_Zahl&quot;
		1 1 1 1 1 1)
 (not (and (= 2
              (+ num-def_positive_zahl_$v1_$u1_$z1
                 num-def_nicht_negative_zahl_$v1_$u1_$z1))
           (&gt;= $t_positive_zahl_$v1_$u1_$z1
               $t_nicht_negative_zahl_$v1_$u1_$z1))))
</pre>
<p>As a matter of fact, we have never regretted the decision of choosing a Lisp representation, which was taken early on in the lifetime of the project. There are certain transformations of the original CSP that are easy to carry out, when the variables and constraints are encoded as Lisp data. Also intermediate results can easily be written back to a file for visual inspection.</p>
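<p>To illustrate why prefix notation is so convenient to process, here is a small Python sketch that reads such an expression into nested lists and evaluates it. Our generator itself is written in Java, so this is an illustration only. The variable names are shortened, hypothetical stand-ins for the long identifiers shown above, and only the handful of operators that occur in the sample constraint are supported.</p>

```python
import operator
import re


def parse(text):
    """Parse one s-expression into nested Python lists of atoms."""
    tokens = re.findall(r'\(|\)|"[^"]*"|[^\s()]+', text)
    position = 0

    def read():
        nonlocal position
        token = tokens[position]
        position += 1
        if token == "(":
            items = []
            while tokens[position] != ")":
                items.append(read())
            position += 1  # skip the closing parenthesis
            return items
        try:
            return int(token)
        except ValueError:
            return token  # a variable name or an operator symbol

    return read()


OPERATORS = {
    "not": lambda a: not a,
    "and": lambda *args: all(args),
    "=": operator.eq,
    "+": lambda *args: sum(args),
    ">=": operator.ge,
}


def evaluate(expr, env):
    """Evaluate a parsed prefix expression against the variable bindings in `env`."""
    if isinstance(expr, list):
        args = [evaluate(arg, env) for arg in expr[1:]]
        return OPERATORS[expr[0]](*args)
    if isinstance(expr, str):
        return env[expr]
    return expr


rule = parse("(not (and (= 2 (+ num-pos num-nonneg)) (>= pos nonneg)))")
violated = evaluate(rule, {"num-pos": 1, "num-nonneg": 1, "pos": 5, "nonneg": 3})
satisfied = evaluate(rule, {"num-pos": 1, "num-nonneg": 1, "pos": 2, "nonneg": 3})
print(violated, satisfied)  # prints False True
```

<p>Because the parsed constraint is ordinary nested data, rewriting it, for instance renaming variables or collecting the fields it mentions, amounts to a short recursive walk over the lists.</p>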
<p>A lot of configuration information is necessary in order to control our Rule-Based Test Data Generator, such as the number of forms that should be filled in, the maximum number of rows in the fields, etc., but we will spare you the details.</p>
<p>Our test-data generator produces data records for each component individually. Due to the independence of components, the records of the components can be freely combined.</p>
<p>Finally, we obtain a table with the generated test-data records. The figure below shows a portion of this table with several solutions of a test-data generation process. Each row represents a solution; its value consists of a mixture of extreme or special values (such as -99999999999.99 or 0.01), and of values that are the result of satisfying the validation constraints.</p>
<div id="attachment_985" class="wp-caption alignnone" style="width: 490px"><a rel="lightbox-test-data-generation-part3" href="http://blog.mgm-tp.com/wp-content/uploads/2011/09/Figure-3-Table-with-Solutions.jpg"><img src="http://blog.mgm-tp.com/wp-content/uploads/2011/09/Figure-3-Table-with-Solutions-480x262.jpg" alt="" title="Figure 3 - Table with Solutions" width="480" height="262" class="size-large wp-image-985" /></a><p class="wp-caption-text">A portion of the table with the generated test-data records.</p></div>
<p>Most of the desired extreme and special values (ESVs) are present in at least one of the records (high test coverage). Since most of the time many ESVs are present in a single record, we attain a high compression rate (i.e. many ESVs in few records).</p>
<h2>Conclusion</h2>
<p>We are at the end of this blog series on the Rule-Based Test Data Generator (R-TDG). You must be left with the impression that the R-TDG is a complex piece of machinery, and, undoubtedly, it is. Be assured, however, that it has been in operational use at mgm tp for about two years.</p>
<p>The constraints which the R-TDG needs are served to us on a silver platter by the rule-based system whose main purpose is to generate the validators for form-centric applications. This approach is radically different from earlier attempts in the area of automated test-data generation, where the constraints are extracted from source code. To the best of our knowledge the R-TDG is the only working automated test-data generator world-wide that can properly take into account cross-field constraints.</p>
<h2>References</h2>
<ul class="references">
<li id="ref1">[1] Bergmann, V. (2008). <a href="http://databene.org/databene-benerator">Databene Benerator</a>.</li>
<li id="ref2">[2] Brüggemann-Klein, A. (1992). &#8220;Regular expressions into finite automata.&#8221; Lecture Notes in Computer Science 583: 87-98.</li>
<li id="ref3">[3] DeMillo, R. A. and A. J. Offutt (1991). &#8220;Constraint-Based Automatic Test Data Generation.&#8221; IEEE Transactions on Software Engineering 17(9): 900-910.</li>
<li id="ref4">[4] Feldman, J. (2011). <a href="http://jcp.org/en/jsr/detail?id=331">JSR 331: Constraint Programming API</a>, Java Community Process. 2011</li>
<li id="ref5">[5] Gotlieb, A., B. Botella, et al. (1998). &#8220;Automatic test data generation using constraint solving techniques.&#8221; ACM SIGSOFT Software Engineering Notes 23 (2): 53&#8211;62.</li>
<li id="ref6">[6] Ince, D. C. (1987). &#8220;The Automatic Generation of Test Data.&#8221; Computer Journal 30 (1): 63-69.</li>
<li id="ref7">[7] msdn (2010). <a href="http://msdn.microsoft.com/en-us/library/aa833197.aspx">&#8220;The Regular Expression Generator&#8221;</a>, Visual Studio Team System 2008 Database Developer Center, Microsoft Corporation.</li>
<li id="ref8">[8] Zhan, Y. (2002). Constraint Solving in Test-Data Generation. Lecture Notes in Computer Science 2470. P. V. Hentenryck. Berlin-Heidelberg: 770-771.</li>
</ul>
	<p><em><small>(c) 2012 <a href="http://www.mgm-tp.com">mgm technology partners</a>. This posting "<a href="http://blog.mgm-tp.com/2011/09/test-data-generation3/">Effective and Efficient Techniques for a Rule-Based Test-Data Generator - Test Data Generation for Form-Centric Applications, Part 3</a>" is part of the <a href="http://blog.mgm-tp.com">mgm technology blog</a>. The author of the posting is
	<a href="http://blog.mgm-tp.com/?author=14" title="View articles by Dr. Hans-Martin Adorf">Dr. Hans-Martin Adorf</a>.
	</small></em></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.mgm-tp.com/2011/09/test-data-generation3/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<series:name><![CDATA[Test Data Generation for Form-Centric Applications]]></series:name>
	</item>
		<item>
		<title>How Google’s Geocoding solves Address Validation</title>
		<link>http://blog.mgm-tp.com/2011/08/address-validation-with-geocoding/</link>
		<comments>http://blog.mgm-tp.com/2011/08/address-validation-with-geocoding/#comments</comments>
		<pubDate>Fri, 12 Aug 2011 12:39:21 +0000</pubDate>
		<dc:creator>Petr Skokan</dc:creator>
				<category><![CDATA[Projects]]></category>
		<category><![CDATA[Tips]]></category>
		<category><![CDATA[ECommerce]]></category>
		<category><![CDATA[Validation]]></category>
		<category><![CDATA[Web Forms]]></category>
		<category><![CDATA[Web Service]]></category>
		<category><![CDATA[XML]]></category>

		<guid isPermaLink="false">http://blog.mgm-tp.com/?p=962</guid>
		<description><![CDATA[For the e-commerce business, it is very important to have a valid and correct address base. Sending packages to incorrect addresses causes significant losses, since packages are not delivered and sent back. In the case of the Kickz online shop, we faced exactly this problem. Thus, we thought about how to improve and force users [...]]]></description>
			<content:encoded><![CDATA[<p>For the e-commerce business, it is very important to have a valid and correct address base. Sending packages to incorrect addresses causes significant losses, since the packages cannot be delivered and are sent back. In the case of the <a href="http://www.kickz.com/">Kickz</a> online shop, we faced exactly this problem. Thus, we looked for a way to get users to enter correct addresses. Learn how we used Google&#8217;s Geocoding service to solve this problem.</p>
<p><span id="more-962"></span></p>
<p>There are many commercial providers of address data. However, not all of them provide a world-wide address base, and the Kickz shop delivers world-wide. It turned out that the best value for money would be to use Google&#8217;s <a href="http://code.google.com/apis/maps/documentation/geocoding/">Geocoding web service</a>. Geocoding is the process of converting addresses into geographic coordinates, i.e. latitude and longitude.</p>
<p>However, as the response from Google includes not only the longitude/latitude data but also further address suggestions, this service can also be easily used to present refined/suggested addresses to the user. And that was exactly what we needed. So our basic idea of the service usage was to request exact coordinates and refined addresses (suggestions) for a given address. The coordinates were then used to show the location on the map.</p>
<p>The Geocoding web service is part of the Google Maps API and is offered in both a free and a commercial version. The basic service is free for 2500 requests per day and does not support HTTPS requests. As Kickz requires HTTPS and more requests per day, we decided to go for the commercial service, which allows up to 100,000 requests per day and provides HTTPS access with signed web service requests.</p>
<h2>Using the Geocoding Web Service</h2>
<p>Google&#8217;s <a href="http://code.google.com/apis/maps/documentation/geocoding/">Geocoding web service</a> is quite simple to use. All you need to do is send a simple HTTP request to the specified address and you will receive a response in either XML or JSON format. The response contains zero, one or even several addresses, depending on the precision of the input. It can be empty if Google cannot locate your address &mdash; but based on our experience, this rarely happens.</p>
<p>A <a href="http://de.wikipedia.org/wiki/JavaScript_Object_Notation">JSON</a> response is the obvious choice for JavaScript-based validation on the client side using AJAX. XML is more suitable for classic server-side validation. For the Kickz project, we chose the XML format. We validate with the Google web service wherever the user enters an address in the application, i.e. during customer registration and checkout. In these scenarios, the request from the browser is first sent to our server-side application, which calls the Google web service with a certificate-signed request. The request for the input &#8220;Frankfurstein ring 105a,München, de, 80000&#8221; looks like this:</p>
<pre class="brush: plain; light: true; wrap-lines: true;">
https://maps-api-ssl.google.com/maps/api/geocode/xml?address=Frankfurstein+ring+105a,M%C3%BCnchen,de,+80000,&amp;sensor=false&amp;client=gme-kickzag&amp;signature=VF930KLrbu98sKKLqIjn4adIoTs=
</pre>
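<p>For readers who want to experiment, the sketch below builds and signs such a request URL in Python. It assumes the signing scheme as we understand it from Google&#8217;s documentation for business clients: an HMAC-SHA1 over the path and query string, with the private key and the resulting signature in URL-safe Base64. The function names and the key used here are made up, so please check the current documentation before relying on any of this.</p>

```python
import base64
import hashlib
import hmac
from urllib.parse import urlencode, urlsplit


def sign_url(url, private_key):
    """Append a signature computed over the path and query of `url` (assumed scheme)."""
    parts = urlsplit(url)
    to_sign = f"{parts.path}?{parts.query}".encode("utf-8")
    key = base64.urlsafe_b64decode(private_key)
    digest = hmac.new(key, to_sign, hashlib.sha1).digest()
    signature = base64.urlsafe_b64encode(digest).decode("ascii")
    return f"{url}&signature={signature}"


def build_request(address, client_id, private_key):
    """Build a signed geocoding request for the XML output format."""
    query = urlencode({"address": address, "sensor": "false", "client": client_id})
    url = f"https://maps-api-ssl.google.com/maps/api/geocode/xml?{query}"
    return sign_url(url, private_key)


# Dummy key for demonstration only; a real key is issued together with the client ID.
dummy_key = base64.urlsafe_b64encode(b"not-a-real-key").decode("ascii")
request_url = build_request("Frankfurstein ring 105a,München, de, 80000",
                            "gme-kickzag", dummy_key)
print(request_url)
```

<p>Note that <code>urlencode</code> percent-encodes the commas as well, so the result differs cosmetically from the URL shown above. The signature depends on the exact string that is signed, so the URL must not be re-encoded after signing.</p>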
<p>Google returns the following XML document as a response:</p>
<pre class="brush: xml;">
&lt;GeocodeResponse&gt;
    &lt;status&gt;OK&lt;/status&gt;
    &lt;result&gt;
        &lt;type&gt;street_address&lt;/type&gt;
        &lt;formatted_address&gt;Frankfurter Ring 105, 80807 Munich, Germany&lt;/formatted_address&gt;
        &lt;address_component&gt;
            &lt;long_name&gt;105&lt;/long_name&gt;
            &lt;short_name&gt;105&lt;/short_name&gt;
            &lt;type&gt;street_number&lt;/type&gt;
        &lt;/address_component&gt;
        &lt;address_component&gt;
            &lt;long_name&gt;Frankfurter Ring&lt;/long_name&gt;
            &lt;short_name&gt;Frankfurter Ring&lt;/short_name&gt;
            &lt;type&gt;route&lt;/type&gt;
        &lt;/address_component&gt;
        &lt;address_component&gt;
            &lt;long_name&gt;München&lt;/long_name&gt;
            &lt;short_name&gt;München&lt;/short_name&gt;
            &lt;type&gt;sublocality&lt;/type&gt;
            &lt;type&gt;political&lt;/type&gt;
        &lt;/address_component&gt;
        &lt;address_component&gt;
            &lt;long_name&gt;Munich&lt;/long_name&gt;
            &lt;short_name&gt;Munich&lt;/short_name&gt;
            &lt;type&gt;locality&lt;/type&gt;
            &lt;type&gt;political&lt;/type&gt;
        &lt;/address_component&gt;
        &lt;address_component&gt;
            &lt;long_name&gt;Munich&lt;/long_name&gt;
            &lt;short_name&gt;M&lt;/short_name&gt;
            &lt;type&gt;administrative_area_level_2&lt;/type&gt;
            &lt;type&gt;political&lt;/type&gt;
        &lt;/address_component&gt;
        &lt;address_component&gt;
            &lt;long_name&gt;Bayern&lt;/long_name&gt;
            &lt;short_name&gt;BY&lt;/short_name&gt;
            &lt;type&gt;administrative_area_level_1&lt;/type&gt;
            &lt;type&gt;political&lt;/type&gt;
        &lt;/address_component&gt;
        &lt;address_component&gt;
            &lt;long_name&gt;Germany&lt;/long_name&gt;
            &lt;short_name&gt;DE&lt;/short_name&gt;
            &lt;type&gt;country&lt;/type&gt;
            &lt;type&gt;political&lt;/type&gt;
        &lt;/address_component&gt;
        &lt;address_component&gt;
            &lt;long_name&gt;80807&lt;/long_name&gt;
            &lt;short_name&gt;80807&lt;/short_name&gt;
            &lt;type&gt;postal_code&lt;/type&gt;
        &lt;/address_component&gt;
        &lt;geometry&gt;
            &lt;location&gt;
                &lt;lat&gt;48.1883675&lt;/lat&gt;
                &lt;lng&gt;11.5857053&lt;/lng&gt;
            &lt;/location&gt;
            &lt;location_type&gt;ROOFTOP&lt;/location_type&gt;
            &lt;viewport&gt;
                &lt;southwest&gt;
                    &lt;lat&gt;48.1852199&lt;/lat&gt;
                    &lt;lng&gt;11.5825577&lt;/lng&gt;
                &lt;/southwest&gt;
                &lt;northeast&gt;
                    &lt;lat&gt;48.1915151&lt;/lat&gt;
                    &lt;lng&gt;11.5888529&lt;/lng&gt;
                &lt;/northeast&gt;
            &lt;/viewport&gt;
        &lt;/geometry&gt;
        &lt;partial_match&gt;true&lt;/partial_match&gt;
    &lt;/result&gt;
&lt;/GeocodeResponse&gt;
</pre>
<p>The XML response is deserialized to Java objects using <a href="http://jaxb.java.net/">JAXB 2</a>. After the response is processed, the user is given some options to refine the original address entered in the browser.</p>
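<p>In our application JAXB does this work. Purely as an illustration, the following Python sketch pulls the fields that matter for address validation out of such a response using only the standard library. The sample document is an abbreviated copy of the response shown above, and the function name <code>extract_suggestions</code> is our own invention.</p>

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the response shown above.
SAMPLE_RESPONSE = """<GeocodeResponse>
    <status>OK</status>
    <result>
        <type>street_address</type>
        <formatted_address>Frankfurter Ring 105, 80807 Munich, Germany</formatted_address>
        <geometry>
            <location>
                <lat>48.1883675</lat>
                <lng>11.5857053</lng>
            </location>
            <location_type>ROOFTOP</location_type>
        </geometry>
        <partial_match>true</partial_match>
    </result>
</GeocodeResponse>"""


def extract_suggestions(xml_text):
    """Return one dictionary per <result> element, or an empty list on failure."""
    root = ET.fromstring(xml_text)
    if root.findtext("status") != "OK":
        return []
    suggestions = []
    for result in root.findall("result"):
        suggestions.append({
            "address": result.findtext("formatted_address"),
            "lat": float(result.findtext("geometry/location/lat")),
            "lng": float(result.findtext("geometry/location/lng")),
            # The element is missing when the input matched exactly.
            "partial_match": result.findtext("partial_match") == "true",
        })
    return suggestions


suggestions = extract_suggestions(SAMPLE_RESPONSE)
print(suggestions)
```

<p>The <code>partial_match</code> flag is a useful signal here: when it is set, the address Google found differs from what the user typed, which is a good moment to offer the formatted address as a suggestion.</p>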
<p>It&#8217;s convenient that all responses from the Google web service are localized. In the screenshots below, you see the same validation done in the German shop (left) and in the international shop (right). The only difference is the language parameter sent with the web service request and set in the HTTP request header &#8220;Accept-Language&#8221;. To make validation more visual, we also use a static Google Maps URL to show the address to the user on a map.</p>
<div id="attachment_965" class="wp-caption alignnone" style="width: 490px"><a rel="lightbox-geocoding" href="http://blog.mgm-tp.com/wp-content/uploads/2011/08/LocalizedResult.png"><img src="http://blog.mgm-tp.com/wp-content/uploads/2011/08/LocalizedResult-480x294.png" alt="" title="LocalizedResult" width="480" height="294" class="size-large wp-image-965" /></a><p class="wp-caption-text">The Geocoding web service is localized: Address data returned for the German shop (left) and for the international shop (right). Click to enlarge. </p></div>
<h2>Conclusion</h2>
<p>Google&#8217;s Geocoding web service turned out to be an effective tool for the validation of the user addresses in our Kickz online shop. Furthermore, this service can be used free of charge for small projects or sites with lower traffic. In Kickz though, we are using the commercial version of the web service, since the traffic is higher and we need the HTTPS service URL.</p>
	<p><em><small>(c) 2012 <a href="http://www.mgm-tp.com">mgm technology partners</a>. This posting "<a href="http://blog.mgm-tp.com/2011/08/address-validation-with-geocoding/">How Google&#8217;s Geocoding solves Address Validation</a>" is part of the <a href="http://blog.mgm-tp.com">mgm technology blog</a>. The author of the posting is
	<a href="http://blog.mgm-tp.com/?author=19" title="View articles by Petr Skokan">Petr Skokan</a>.
	</small></em></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.mgm-tp.com/2011/08/address-validation-with-geocoding/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>GR8Conf: A gr8t Conference on Groovy based Technologies</title>
		<link>http://blog.mgm-tp.com/2011/06/gr8conf/</link>
		<comments>http://blog.mgm-tp.com/2011/06/gr8conf/#comments</comments>
		<pubDate>Wed, 22 Jun 2011 23:37:39 +0000</pubDate>
		<dc:creator>Marcus Olk</dc:creator>
				<category><![CDATA[Talks]]></category>
		<category><![CDATA[DSL]]></category>
		<category><![CDATA[Groovy]]></category>
		<category><![CDATA[Testing]]></category>

		<guid isPermaLink="false">http://blog.mgm-tp.com/?p=941</guid>
		<description><![CDATA[On May 17th-19th, I took the opportunity to escape the daily &#8220;Java business as usual&#8221; on the GR8Conf conference in Copenhagen, Denmark. The three days were packed with the latest information on Groovy related technologies such as Grails, Griffon, Gradle, GPars, Spock etc.

The whole conference seemed to have a mantra: A-S-T for Abstract Syntax Tree [...]]]></description>
			<content:encoded><![CDATA[<p>On May 17th-19th, I took the opportunity to escape the daily &#8220;Java business as usual&#8221; at the GR8Conf conference in Copenhagen, Denmark. The three days were packed with the latest information on <a href="http://groovy.codehaus.org/">Groovy</a>-related technologies such as <a href="http://www.grails.org/">Grails</a>, <a href="http://griffon.codehaus.org/">Griffon</a>, <a href="http://www.gradle.org/">Gradle</a>, <a href="http://gpars.codehaus.org/">GPars</a>, <a href="http://spockframework.org/">Spock</a> etc.</p>
<p><span id="more-941"></span></p>
<p>The whole conference seemed to have a mantra: A-S-T for <a href="http://en.wikipedia.org/wiki/Abstract_syntax_tree">Abstract Syntax Tree</a> transformations. <em>A language should have access to its own abstract syntax</em>, as McCarthy claimed back in the days of LISP &#8211; Groovy has its access via AST transformations.</p>
<h2>AST Transformations</h2>
<p><a href="http://twitter.com/HamletDRC">Hamlet D&#8217;Arcy</a>, an experienced engineer with <a href="http://www.canoo.com">Canoo</a>, held an in-depth workshop on transformations of the abstract syntax tree of Groovy code.</p>
<p>In a nutshell, AST transformations are Java annotations that allow you to <a href="http://java.dzone.com/articles/groovy-ast-transformations">manipulate Groovy code</a>. Practical use cases are best-practice patterns, such as those from Joshua Bloch&#8217;s &#8216;Effective Java&#8217;, which are implemented once in terms of an AST annotation and then applied to everyday Groovy code.</p>
<p>Examples are <a href="http://groovy.codehaus.org/gapi/groovy/transform/Canonical.html">@Canonical</a>, which, applied to a Groovy class, generates correct implementations of Object.hashCode, Object.equals, and Object.toString, or <a href="http://groovy.codehaus.org/gapi/groovy/transform/Synchronized.html">@Synchronized</a>, <a href="http://groovy.codehaus.org/gapi/groovy/transform/WithReadLock.html">@WithReadLock</a> or <a href="http://groovy.codehaus.org/gapi/groovy/transform/WithWriteLock.html">@WithWriteLock</a>, which generate correct versions of the error-prone boilerplate code for synchronizing access to shared mutable state.</p>
<p>Another handy AST annotation is <a href="http://groovy.codehaus.org/gapi/groovy/util/logging/Log.html">@Log</a> which creates the standard logger and guarding log level check code for log statements.</p>
<p><a rel="lightbox-gr8conf" href="http://blog.mgm-tp.com/wp-content/uploads/2011/06/log_ast1.png"><img src="http://blog.mgm-tp.com/wp-content/uploads/2011/06/log_ast1-480x383.png" alt="" title="log_ast1" width="480" height="383" class="alignnone size-large wp-image-942" /></a></p>
<p><a rel="lightbox-gr8conf" href="http://blog.mgm-tp.com/wp-content/uploads/2011/06/log_ast2.png"><img src="http://blog.mgm-tp.com/wp-content/uploads/2011/06/log_ast2.png" alt="" title="log_ast2" width="279" height="69" class="alignnone size-full wp-image-943" /></a></p>
<p>AST transformations are very powerful, but one must not forget that &#8211; as with AOP &#8211; your code will look different in a debugger.</p>
<h2>Design by Contract with GContracts</h2>
<p>The <a href="https://github.com/andresteingress/gcontracts/">GContracts project</a> uses AST transformations to implement a feature the <a href="http://www.eiffel.com">Eiffel</a> language has had for years: <a href="http://www.eiffel.com/developers/design_by_contract.html">Design by Contract</a>, i.e. defining invariants as well as pre- and postconditions. The infamous integer overflow that caused the spectacular <a href="http://archive.eiffel.com/doc/manuals/technology/contract/ariane/">Ariane 5 crash</a> served as the obvious example.</p>
<p><a rel="lightbox-gr8conf" href="http://blog.mgm-tp.com/wp-content/uploads/2011/06/ariane5.png"><img src="http://blog.mgm-tp.com/wp-content/uploads/2011/06/ariane5-480x432.png" alt="" title="ariane5" width="480" height="432" class="alignnone size-large wp-image-944" /></a><br />
<a rel="lightbox-gr8conf" href="http://blog.mgm-tp.com/wp-content/uploads/2011/06/ariane5_assert11.png"><img src="http://blog.mgm-tp.com/wp-content/uploads/2011/06/ariane5_assert11-480x146.png" alt="" title="ariane5_assert1" width="480" height="146" class="alignnone size-large wp-image-946" /></a><br />
<a rel="lightbox-gr8conf" href="http://blog.mgm-tp.com/wp-content/uploads/2011/06/ariane5_assert2.png"><img src="http://blog.mgm-tp.com/wp-content/uploads/2011/06/ariane5_assert2-480x162.png" alt="" title="ariane5_assert2" width="480" height="162" class="alignnone size-large wp-image-947" /></a></p>
<p>Applying these contract annotations obviously seems like a good idea.</p>
<h2>Static Code analysis for Groovy</h2>
<p>Java has some mature static code analysis tools such as <a href="http://findbugs.sourceforge.net/">Findbugs</a>, <a href="http://pmd.sourceforge.net/">PMD</a> or the inspections built into the <a href="http://www.jetbrains.com/idea/documentation/inspections.jsp">IntelliJ IDE</a>, which bug a developer about violations of best coding practices. <a href="http://codenarc.sourceforge.net">CodeNarc</a> is a tool that offers similar functionality for Groovy. It doesn&#8217;t seem to be as mature as its senior Java counterparts, but the number of rules is increasing and writing CodeNarc rules is quite easy. Integrating CodeNarc into a Groovy build process and enabling the corresponding <a href="http://plugins.intellij.net/plugin/?idea&#038;id=5925">IntelliJ plugin</a> should become as mandatory as using Findbugs and friends.</p>
<h2>Happy spec&#8217;ing with Spock</h2>
<p><a href="http://spockframework.org/">Spock</a> is a <a href="http://junit.org">JUnit</a>-compliant testing framework that allows you to write very expressive test cases. Peter Niederwieser, the creator of Spock, uses Groovy&#8217;s internal DSL capabilities to provide some very appealing features for writing test cases, which are called <em>specifications</em> in the Spock terminology.</p>
<p>A specification follows the <em>given-when-then</em> pattern, also known as AAA, <em>arrange-act-assert</em>. JUnit provides the @Before and @Test annotations in combination with a whole bunch of assert methods. In Spock, asserts are implicit in the &#8220;then&#8221; section of a spec, and the &#8220;given&#8221; and &#8220;when&#8221; sections are the <em>arrange</em> and <em>act</em> realizations.</p>
<p>Apart from this basic functionality, Spock comes with many more powerful features such as the <a href="http://hamletdarcy.blogspot.com/2009/05/new-power-assertions-in-groovy.html">Power Assert</a>, which has been extracted into the standard Groovy GDK, by the way. Another of these features is the &#8220;where&#8221; section, which allows test data to be provided in tabular form or read from a database in a very convenient way.</p>
<p><a rel="lightbox-gr8conf" href="http://blog.mgm-tp.com/wp-content/uploads/2011/06/spock_demo.png"><img src="http://blog.mgm-tp.com/wp-content/uploads/2011/06/spock_demo.png" alt="" title="spock_demo" width="410" height="373" class="alignnone size-full wp-image-948" /></a></p>
<p>A deeper look into Spock is highly recommended. Spock can serve as a replacement for <a href="http://junit.org">JUnit</a> or <a href="http://testng.org/">TestNG</a> because, thanks to Groovy&#8217;s intrinsic Java integration, it can be used to test code written in any JVM language.</p>
<h2>Languages are either concurrent or obsolete</h2>
<p>A very good argument to persuade Java purists to consider Groovy as a language is <a href="http://gpars.codehaus.org/">GPars</a>. <a href="https://twitter.com/#!/vaclav_pech">Vaclav Pech</a>, the lead of the GPars project, gave a very good introduction to GPars, demonstrating its power by coding concrete examples instead of flipping slides.</p>
<p>GPars can be considered an abstraction layer on top of the Java concurrency packages. It is a library that enhances Groovy in a way that a programmer can safely use all prominent concurrency patterns such as <a href="http://gpars.codehaus.org/ForkJoin">thread pools</a>, <a href="http://gpars.codehaus.org/Actor">actors</a> or <a href="http://gpars.codehaus.org/Dataflow">data flow variables</a>. Using a simple piece of code that sequentially processed a number of images, Vaclav demonstrated that the eight cores of his machine were not being used as they should be. After simply wrapping the same code in a &#8220;withPool&#8221; clause, the process monitor showed all cores at work. The interesting fact was that no threads had to be created manually &#8211; the GPars construct figured out how to distribute the workload.</p>
<p><a rel="lightbox-gr8conf" href="http://blog.mgm-tp.com/wp-content/uploads/2011/06/gpars.png"><img src="http://blog.mgm-tp.com/wp-content/uploads/2011/06/gpars-480x348.png" alt="" title="gpars" width="480" height="348" class="alignnone size-large wp-image-949" /></a></p>
<p>Data flow variables take care of the synchronization of concurrently executed tasks. There are no explicit threads and locks required any longer. Anyone who has to implement concurrent computations should consider using GPars instead of having to code the corresponding concurrency patterns using the lower level <a href="http://download.oracle.com/javase/6/docs/api/java/util/concurrent/package-summary.html">Java concurrency packages</a>.</p>
<h2>Summary</h2>
<p>Gr8Conf was worth a visit. Compared to the heavyweight JCP that the evolution of the Java language is tied to, the growth of the Groovy ecosystem is breathtaking. Some of the projects presented at this conference should be evaluated for existing and upcoming mgm projects.</p>
	<p><em><small>(c) 2012 <a href="http://www.mgm-tp.com">mgm technology partners</a>. This posting "<a href="http://blog.mgm-tp.com/2011/06/gr8conf/">GR8Conf: A gr8t Conference on Groovy based Technologies</a>" is part of the <a href="http://blog.mgm-tp.com">mgm technology blog</a>. The author of the posting is
	<a href="http://blog.mgm-tp.com/?author=6" title="View articles by Marcus Olk">Marcus Olk</a>.
	</small></em></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.mgm-tp.com/2011/06/gr8conf/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Rapid Prototyping with Spring Roo</title>
		<link>http://blog.mgm-tp.com/2011/06/rapid-prototyping-with-spring-roo/</link>
		<comments>http://blog.mgm-tp.com/2011/06/rapid-prototyping-with-spring-roo/#comments</comments>
		<pubDate>Thu, 16 Jun 2011 19:01:07 +0000</pubDate>
		<dc:creator>Christian Belka</dc:creator>
				<category><![CDATA[Projects]]></category>
		<category><![CDATA[Reviews]]></category>
		<category><![CDATA[CRUD]]></category>
		<category><![CDATA[Database]]></category>
		<category><![CDATA[Hibernate]]></category>
		<category><![CDATA[Java]]></category>
		<category><![CDATA[JPA]]></category>
		<category><![CDATA[Spring]]></category>
		<category><![CDATA[Web Frameworks]]></category>

		<guid isPermaLink="false">http://blog.mgm-tp.com/?p=908</guid>
		<description><![CDATA[Spring Roo, a new contender to Grails and Ruby on Rails, provides flexible scaffolding for Spring based applications. We used Spring Roo in a web project to quickly provide working software in order to discuss features and gather feedback. This article shows how we generated an early prototype and transitioned to early development and then [...]]]></description>
			<content:encoded><![CDATA[<p>Spring Roo, a new contender to Grails and Ruby on Rails, provides flexible scaffolding for Spring based applications. We used Spring Roo in a web project to quickly provide working software in order to discuss features and gather feedback. This article shows how we generated an early prototype and transitioned to early development and then to production code.</p>
<p><span id="more-908"></span></p>
<p><a href="http://www.springsource.org/roo">Spring Roo</a>&#8217;s mission is to &#8220;<em>fundamentally and sustainably improve Java developer productivity without compromising engineering integrity or flexibility</em>&#8220;. It follows the <a href="http://en.wikipedia.org/wiki/Convention_over_configuration">Convention over Configuration</a> principle, and provides rapid application development for Java EE applications.</p>
<p>Spring Roo itself isn&#8217;t a framework; it&#8217;s more like a software tool for generating Java EE web applications with <a href="http://en.wikipedia.org/wiki/Create,_read,_update_and_delete">CRUD</a> functionality. This means the user can create, read/find/list, update and delete entities. Spring Roo offers several technology options, like <a href="http://www.springsource.org/roo/guide?w=base-persistence">various providers</a> for the JPA-based persistence (e.g. Hibernate and EclipseLink), Spring MVC controllers and a frontend based on either JSPs, JSF 2.0 (coming in Spring Roo 1.2), Google Web Toolkit (<a href="http://www.springsource.org/roo/guide?w=base-gwt">GWT</a>) or <a href="http://blog.springsource.com/2010/07/14/introducing-the-flex-addon-for-spring-roo">Flex</a>. If you want to change your persistence provider, you can easily do so; if you want to add JSON output (or whatever else is needed) to the controllers, you can add it with little effort. And everything integrated into the project has a reasonable &#8220;default&#8221; configuration, so that very little configuration overhead is needed.</p>
<p>The toolset of Spring Roo can generate code and integrate a lot of standard modules into a Java EE project, with very little configuration overhead, leaving you always the choice to modify everything you need. So with some easy commands you can add technologies like <a href="http://www.springsource.org/roo/guide?w=command-index#command-index-jms-commands">JMS</a> for messaging, <a href="http://www.springsource.org/roo/guide?w=base-json">JSON</a> for REST, <a href="http://www.springsource.org/roo/guide?w=base-solr">Solr</a> for full-text search etc. </p>
<h2>How does Spring Roo work?</h2>
<p>The heart of Spring Roo is the <strong>Roo Shell</strong>, which has a really great help system that is self-explanatory and easy to use. From here you can set up your JPA provider, create entity objects etc.</p>
<p>There are lots of <a href="http://www.springsource.org/roo/guide?w=intro#intro-exploring-sample">examples</a> with which you can quickly try out Spring Roo. In those, Spring Roo creates a complete project for you, including a Jetty server and an in-memory database if needed.</p>
<p>We will use our own short example to explain a few features over the course of this article.</p>
<h3>A Short Example</h3>
<p>We are going to build a simple address database, where a person can have multiple addresses. Setting this up is a really easy process:</p>
<ol>
<li>First install Roo from the <a href="http://www.springsource.org/roo/">Spring Roo homepage</a> and add the bin folder to your path.</li>
<li>Then create an empty folder and start Roo with roo.sh (or roo.bat) from within this folder.</li>
<li>Next, enter the following commands to set up the project. You can type &#8220;hint&#8221; at any time to bring up a smart help system that suggests the next steps.</li>
</ol>
<pre class="brush: plain;">
project --topLevelPackage de.mgm.roo.sample
</pre>
<p>This command creates an initial project.</p>
<p>Here we choose Hibernate for persistence and the in-memory Hypersonic database:</p>
<pre class="brush: plain; wrap-lines: true;">
persistence setup --provider HIBERNATE --database HYPERSONIC_IN_MEMORY
</pre>
<div id="attachment_909" class="wp-caption alignnone" style="width: 490px"><a rel="lightbox-spring-roo" href="http://blog.mgm-tp.com/wp-content/uploads/2011/06/console-setup.png"><img src="http://blog.mgm-tp.com/wp-content/uploads/2011/06/console-setup-480x365.png" alt="" title="Console Setup" width="480" height="365" class="size-large wp-image-909" /></a><p class="wp-caption-text">A Spring Roo project is usually created and modified in the Roo Shell.</p></div>
<p>There are lots of ORM technology options and databases to choose from, nearly all you would find in a normal environment. </p>
<div id="attachment_910" class="wp-caption alignnone" style="width: 490px"><a rel="lightbox-spring-roo" href="http://blog.mgm-tp.com/wp-content/uploads/2011/06/projectstruture.png"><img src="http://blog.mgm-tp.com/wp-content/uploads/2011/06/projectstruture-480x308.png" alt="" title="Project Struture" width="480" height="308" class="size-large wp-image-910" /></a><p class="wp-caption-text">Project structure generated by Roo.</p></div>
<p>This is our first entity, the Address:</p>
<pre class="brush: plain;">
entity --class ~.domain.Address --testAutomatically
field string --fieldName street --notNull --sizeMin 2
field string --fieldName city --notNull --sizeMin 2
field number --type int --fieldName postcode --notNull
</pre>
<p>This is our second entity, the Person, which can have several addresses:</p>
<pre class="brush: plain;">
entity --class ~.domain.Person --testAutomatically
field string --fieldName name --notNull --sizeMin 2
field string --fieldName lastname --notNull --sizeMin 2
field set --fieldName addresses --type ~.domain.Address
</pre>
<p>Pressing Tab during the commands shows you all available options.</p>
<div id="attachment_911" class="wp-caption alignnone" style="width: 490px"><a rel="lightbox-spring-roo" href="http://blog.mgm-tp.com/wp-content/uploads/2011/06/entity-classes-in-eclipse.png"><img src="http://blog.mgm-tp.com/wp-content/uploads/2011/06/entity-classes-in-eclipse-480x367.png" alt="" title="Entity Classes in Eclipse" width="480" height="367" class="size-large wp-image-911" /></a><p class="wp-caption-text">The new entity classes are reflected in the Eclipse IDE.</p></div>
<p>After the following commands, we have a fully working web application that can be imported into Eclipse (other IDEs are supported as well):</p>
<pre class="brush: plain;">
perform eclipse
controller all --package ~.web
</pre>
<div id="attachment_912" class="wp-caption alignnone" style="width: 490px"><a rel="lightbox-spring-roo" href="http://blog.mgm-tp.com/wp-content/uploads/2011/06/web-classes-in-eclipse.png"><img src="http://blog.mgm-tp.com/wp-content/uploads/2011/06/web-classes-in-eclipse-480x337.png" alt="" title="Web Classes in Eclipse" width="480" height="337" class="size-large wp-image-912" /></a><p class="wp-caption-text">The Web tier and resources are created through the Roo Shell.</p></div>
<p>This application can be started easily with the command &#8220;mvn jetty:run&#8221;, which gives you the following web application:</p>
<div id="attachment_913" class="wp-caption alignnone" style="width: 490px"><a rel="lightbox-spring-roo" href="http://blog.mgm-tp.com/wp-content/uploads/2011/06/roo-web-application.png"><img src="http://blog.mgm-tp.com/wp-content/uploads/2011/06/roo-web-application-480x328.png" alt="" title="Roo Web Application" width="480" height="328" class="size-large wp-image-913" /></a><p class="wp-caption-text">The out-of-box generated Web application.</p></div>
<p>The core modules are included and managed via the OSGi implementation Felix, so the Roo Shell also supports all OSGi commands, e.g. to load and unload modules. You should take a look at the <a href="http://en.wikipedia.org/wiki/Spring_Roo#Base_Add-Ons">module list</a>.</p>
<p>But how does Spring Roo manage the generation and even modifications?<br />
Try to add a field to your entity object and Spring Roo will update everything for you. </p>
<p>How does Spring Roo work technically? Spring Roo uses <a href="http://static.springsource.org/spring-roo/reference/html/architecture.html">AspectJ mixins</a>, called inter-type declarations, to achieve separation of concerns: the code generated by Spring Roo lives in a different compilation unit than the code the user writes. Spring Roo can therefore modify its files as needed without coming into conflict with the user.</p>
<h2>Spring Roo as a Rapid Prototyping Tool</h2>
<p>As the headline suggests, we used Spring Roo with a customer for rapid prototyping. Because the project had a very short timeline, there was not much time for a long requirements phase, so we tried rapid prototyping instead. The customer got a working base very early, to which we could add and from which we could remove requirements in iterative cycles. Another problem was that the requirements kept changing.</p>
<p>The general concept of the application was very clear, but details like fields and entities changed. This is where rapid prototyping really helped. It allowed us to show a lot of changes to the customer early in our development cycle, so that we began getting the feedback that was crucial to the success of the project. Not only did the demonstrations help; hands-on &#8220;usage&#8221; by the customer did as well.</p>
<p>The requirements phase was followed by an early development stage:</p>
<ul>
<li>At first we developed the <strong>data model/entity model</strong> with the customer and transferred it to Spring Roo, then added</li>
<li>the <strong>web frontend</strong> generated and maintained by Spring Roo, and</li>
<li>the <strong>layout and CSS styles</strong> for the customer to see.</li>
<li>We then included the <strong>existing databases</strong> with <a href="http://www.springsource.org/roo/guide?w=base-dbre">DBRE</a> (Spring Roo Database Reverse Engineering), </li>
<li>showed the system to the customer and <strong>added/removed entities</strong> and attributes as needed, and</li>
<li>finally added <strong>security</strong>.</li>
</ul>
<h3>Adding and Removing Attributes and Entities</h3>
<p>There are different ways to add and remove entities in the existing project. The first is to use the Roo shell to add and remove fields of an object. This is the most obvious way. But another, much better way is to change the classes directly, i.e. in the Java code.</p>
<p>Let&#8217;s take a look at a domain object generated by Spring Roo. The Address entity looks pretty clean and straightforward. The getters and setters are managed by Roo, but you can easily override and modify them as you wish.</p>
<p>So let&#8217;s add a field to our address. We noticed that the country is missing, so let&#8217;s add it. I just added the following code:</p>
<pre class="brush: java;">
    @NotNull
    @Size(min = 2)
    private String country;
</pre>
<p>As soon as I save the domain object, Roo updates the project and the new field appears in the web application. Here&#8217;s an excerpt from the Roo shell output:</p>
<pre class="brush: plain; wrap-lines: false;">
Updated SRC_MAIN_WEBAPP/WEB-INF/views/addresses/list.jspx
Updated SRC_MAIN_WEBAPP/WEB-INF/views/addresses/show.jspx
Updated SRC_MAIN_WEBAPP/WEB-INF/views/addresses/create.jspx
Updated SRC_MAIN_WEBAPP/WEB-INF/views/addresses/update.jspx
Updated SRC_MAIN_WEBAPP/WEB-INF/i18n/application.properties
Updated SRC_MAIN_JAVA/de/mgm/roo/sample/domain/Address_Roo_ToString.aj
Updated SRC_TEST_JAVA/de/mgm/roo/sample/domain/AddressDataOnDemand_Roo_DataOnDemand.aj
Updated SRC_MAIN_JAVA/de/mgm/roo/sample/domain/Address_Roo_JavaBean.aj
Updated SRC_MAIN_JAVA/de/mgm/roo/sample/web/ApplicationConversionServiceFactoryBean_Roo_ConversionService.aj
</pre>
<p>You can keep Roo running while you modify a class outside of Roo &ndash; it should detect the changes; otherwise it will pick them up on the next restart.</p>
<a rel="lightbox-spring-roo" href="http://blog.mgm-tp.com/wp-content/uploads/2011/06/fields-in-web-application.png"><img src="http://blog.mgm-tp.com/wp-content/uploads/2011/06/fields-in-web-application-480x328.png" alt="" title="Fields in Web Application" width="480" height="328" class="size-large wp-image-921" /></a>
<p>You can create and delete entities the same way &ndash; just create or delete the classes as needed. <strong>Tip: </strong>A good practice is to change entities and attributes directly in the code, but to make all other changes, like adding and removing security, through the Roo shell.</p>
<h3>Removing Spring Roo from Project</h3>
<p>After this early development phase, we decided to remove Spring Roo, resulting in a regular Spring-based web application, and continued with &#8220;normal&#8221; Java development. This path is intended by the Spring Roo authors; the documentation has a whole chapter on <a href="http://www.springsource.org/roo/guide?w=removing">removing Spring Roo</a> from a project.</p>
<p>From this point on, we refactored the entity layer (just to get more structure into the packaging etc.) and ended up with quite a nice Java EE app as a working base.</p>
<h2>Spring Roo vs. Grails vs. Ruby on Rails</h2>
<p>Here I want to explain briefly why we chose Roo over Grails and Ruby on Rails. The quickest answer would be that there was a requirement to develop in Java. But I also want to say a little about the advantages and disadvantages of these three comparable systems.</p>
<p><a href="http://rubyonrails.org/">Ruby on Rails</a> is a very powerful platform with a large community, but you would have to learn Ruby and you would have to understand Rails as a platform. You couldn&#8217;t use existing modules (in Java) easily.</p>
<p><a href="http://www.grails.org/">Grails</a> also has a very good community, uses standard Java technologies like Hibernate and runs on the JVM. Here, the only effort would be learning the Groovy language. On the downside, Groovy is always a little bit slower than native Java. The main argument against Grails was again the language.</p>
<p>Spring Roo has a very small community, but help is available. The advantage here was that Spring Roo leaves you with a &#8220;regular&#8221; Spring project as soon as you remove Spring Roo itself.</p>
<p>Another interesting option is <a href="http://appfuse.org">AppFuse</a>, which focuses more on the <em>initial</em> scaffold of a new project (a bit like Maven&#8217;s archetypes). We had no chance to investigate it, but consider the <a href="http://stackoverflow.com/questions/1536268/appfuse-vs-roo-what-would-you-use">discussion &#8220;Appfuse vs Roo &#8211; what would you use&#8221;</a>.</p>
<h2>Conclusion</h2>
<p>Spring Roo is definitely usable as a tool for rapid prototyping, i.e. to quickly and easily create a first usable application. But you should face the fact that the current version 1.1.2 still lacks a lot of features (like composite primary keys) and has bugs, at which point you suddenly need a solid understanding of AspectJ and databases. This was the main reason why we decided, after the requirements phase, not to continue working with Spring Roo.</p>
<p>The really great upside was that we got a complete application up and running in a few days, with a clean codebase from which we could easily start developing all the details needed in the application. The CRUD web pages created by Spring Roo and the RESTful architecture are a great help in extending the application cleanly and quickly.</p>
	<p><em><small>(c) 2012 <a href="http://www.mgm-tp.com">mgm technology partners</a>. This posting "<a href="http://blog.mgm-tp.com/2011/06/rapid-prototyping-with-spring-roo/">Rapid Prototyping with Spring Roo</a>" is part of the <a href="http://blog.mgm-tp.com">mgm technology blog</a>. The author of the posting is
	<a href="http://blog.mgm-tp.com/?author=17" title="View articles by Christian Belka">Christian Belka</a>.
	</small></em></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.mgm-tp.com/2011/06/rapid-prototyping-with-spring-roo/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Getting the most out of LiquiBase - Tool-supported Data Modeling and Change Management, Part 3</title>
		<link>http://blog.mgm-tp.com/2011/04/data-modeling-part3/</link>
		<comments>http://blog.mgm-tp.com/2011/04/data-modeling-part3/#comments</comments>
		<pubDate>Thu, 07 Apr 2011 14:19:39 +0000</pubDate>
		<dc:creator>Ursula Bauer</dc:creator>
				<category><![CDATA[Tips]]></category>
		<category><![CDATA[Database]]></category>
		<category><![CDATA[Evolution]]></category>
		<category><![CDATA[LiquiBase]]></category>
		<category><![CDATA[Modeling]]></category>
		<category><![CDATA[SQL]]></category>

		<guid isPermaLink="false">http://blog.mgm-tp.com/?p=887</guid>
		<description><![CDATA[In this last part of the series, we focus on successful database schema evolution with LiquiBase. You will learn about LiquiBase features that we find most useful in our projects. This article also gives a roundup of our best practices and tips, like how to include custom SQL and stored procedures and how to use LiquiBase [...]]]></description>
			<content:encoded><![CDATA[<p>In this last part of the series, we focus on successful database schema evolution with LiquiBase. You will learn about LiquiBase features that we find most useful in our projects. This article also gives a roundup of our best practices and tips, like how to include custom SQL and stored procedures and how to use LiquiBase with different DBMS products.</p>
<p><span id="more-887"></span></p>
<p>Remember that one of the benefits of using LiquiBase is that it allows you to keep all the Changes to the database(s) in one place, the Changelog. But what happens if we are dealing with a very large and complex physical data model with frequent changes? Won&#8217;t the Changelog grow to such a size that it is completely unmanageable? This is a valid question to which the simple answer is that although LiquiBase will always have <strong>only one Master-Changelog</strong> per database, this does not mean that <em>all</em> Changesets have to be in that one Changelog file.</p>
<h2>Partitioning the Changelog</h2>
<p>In fact, it is very advisable to create smaller <strong>Sub-Changelogs</strong> and use the <code><strong>include</strong></code> script statement to link these into the Master-Changelog:</p>
<pre class="brush: xml;">
&lt;databaseChangeLog ...&gt;
   &lt;include file=&quot;r1.0/r1.0_changelog.xml&quot;/&gt;

   &lt;include file=&quot;r2.0/r2.0_changelog.xml&quot;/&gt;
   ...
&lt;/databaseChangeLog&gt;
</pre>
<p>It is then quite easy and comfortable to create a hierarchically ordered directory of Sub-Changelogs, which allows you to bundle all changes that concern a certain release or a concept like triggers into separate Sub-Changelogs. Once the Sub-Changelogs are referenced in the Master-Changelog, you only have to add new Changesets to the smaller Sub-Changelogs, while you always just run the Master-Changelog.</p>
<p>This mechanism allows us to organize the changes for different releases into separate folders, creating an easy-to-understand structure that makes it quite simple to add any new Changesets in the correct place.</p>
<h2>Stored Procedures</h2>
<p>We have found it particularly helpful to keep stored procedures in separate Sub-Changelog files, since they tend to get rather big. We use a special approach to achieve this (which differs from the way other Changes are handled): for these stored procedures, we only keep a single version of the procedure in the Sub-Changelog file – instead of appending a new Changeset each time the stored procedure changes. To ensure that a changed Changeset is run again, each procedure is wrapped into a single &lt;changeSet&gt; to which we add the attribute <code><strong>runOnChange="true"</strong></code>. This forces LiquiBase to check whether the Changeset has changed, instead of just checking whether it has been run before and can be skipped. This exception to the rule makes it a lot easier to manage and spot changes to stored procedures, and as all our Changelogs are stored and versioned in Subversion, there is no danger of losing data.</p>
<div id="attachment_906" class="wp-caption alignnone" style="width: 490px"><a rel="lightbox-datamodel-part3" href="http://blog.mgm-tp.com/wp-content/uploads/2011/04/Hierarchy-of-sub-changelogs.jpg"><img class="size-large wp-image-906" title="Hierarchy of sub-changelogs" src="http://blog.mgm-tp.com/wp-content/uploads/2011/04/Hierarchy-of-sub-changelogs-480x558.jpg" alt="" width="480" height="558" /></a><p class="wp-caption-text">An example how we organize our Changes into a hierarchy of Sub-Changelogs.</p></div>
<h2>Using Relative Paths</h2>
<p>We consider it a good practice to use relative paths to ensure consistency and to avoid problems caused by mistakenly reapplied Changesets. This is especially important if a structure of Master and Sub-Changelogs and/or a lot of <code>include</code> statements are used.</p>
<p>A short recap: The &#8220;DATABASECHANGELOG&#8221; table records each Changeset that has been successfully applied by its author and id attributes and by the filename of its Changelog, and uses this information to check if a given Changeset has already been applied.</p>
<p>If absolute paths are used, they will be recorded as part of the filename. It is quite likely that the next run of the Master Changelog will be triggered from a different path (e.g. on a different developer&#8217;s machine or as part of a build process). The absolute path will then no longer be the same, and LiquiBase will not recognize the Changeset as already applied.</p>
<p>To avoid running into this kind of problem, some of my colleagues force LiquiBase to record relative path names in the &#8220;DATABASECHANGELOG&#8221; table by adding the Master Changelog directory to the CLASSPATH, so that LiquiBase is launched with the Master Changelog given by filename only (without path). All the <code>include</code> statements are likewise configured to use only paths relative to the Master Changelog, to ensure that no absolute paths are written to the &#8220;DATABASECHANGELOG&#8221; table. This has so far allowed us to avoid these problems.</p>
<p>For another discussion covering quite broadly why and how to use relative paths with LiquiBase, see the posting <a href="http://forum.liquibase.org/topic/why-does-the-change-log-contain-the-file-name">&#8220;Why does the change log contain the file name?&#8221;</a> in the LiquiBase forum.</p>
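<p>As a complementary sketch (this is not the CLASSPATH approach described above, and not necessarily something we use): LiquiBase changelogs also accept a <code>logicalFilePath</code> attribute on the <code>&lt;databaseChangeLog&gt;</code> root element, which overrides the file name recorded for the Changesets in that file. The path value below is a made-up example, and you should check the documentation of your LiquiBase version before relying on it:</p>
<pre class="brush: xml;">
&lt;!-- hypothetical Sub-Changelog r1.0/r1.0_changelog.xml --&gt;
&lt;!-- logicalFilePath pins the recorded file name, wherever the file is loaded from --&gt;
&lt;databaseChangeLog logicalFilePath=&quot;r1.0/r1.0_changelog.xml&quot; ...&gt;
   ...
&lt;/databaseChangeLog&gt;
</pre>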
<h2>Custom SQL — Blessing and curse</h2>
<p>In the previous installments we mentioned that PowerArchitect cannot reverse or forward engineer concepts like check constraints, triggers or stored procedures, but you may have noticed that we just explained how we use separate Changelogs for stored procedures. So clearly there is another way to create Changesets for these. The concept is called Custom SQL, and it enables us to add any SQL statement as a valid Changeset to LiquiBase. We just enclose the respective statement in the following tags:</p>
<pre class="brush: xml;">
&lt;changeSet author='xxxx' id='x'&gt;
  &lt;sql &gt;
    ADD YOUR SQL STATEMENT HERE
  &lt;/sql&gt;
&lt;/changeSet&gt;
</pre>
<p>For a stored procedure, the tags will be:</p>
<pre class="brush: xml;">
&lt;changeSet author='xxxx' id='x' runOnChange=&quot;true&quot;&gt;
  &lt;sql &gt;
    ADD YOUR SQL STATEMENT HERE
  &lt;/sql&gt;
&lt;/changeSet&gt;
</pre>
<p>Sometimes it is helpful to add rollback information after the <code>&lt;/sql&gt;</code> tag using the <code>&lt;rollback&gt;</code> tag, to avoid problems should the custom SQL somehow fail.</p>
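<p>A minimal sketch of such a Changeset with rollback information follows. The index name, the table and both SQL statements are invented for illustration, and the <code>DROP INDEX</code> syntax shown is the Oracle/PostgreSQL flavor:</p>
<pre class="brush: xml;">
&lt;changeSet author='xxxx' id='x'&gt;
  &lt;sql&gt;
    CREATE INDEX idx_person_lastname ON person (lastname)
  &lt;/sql&gt;
  &lt;!-- LiquiBase cannot derive a rollback for raw SQL, so we state it ourselves --&gt;
  &lt;rollback&gt;
    DROP INDEX idx_person_lastname
  &lt;/rollback&gt;
&lt;/changeSet&gt;
</pre>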
<p>Some advanced concepts depend on the DBMS (as they use SQL dialects that vary slightly). If it is clear that the Changelog will have to be applied to different flavors of SQL, we use the include mechanism and create separate folders to store all the Custom SQL Changesets for a given DBMS. We might, for example, have one subdirectory for Oracle Custom SQL and one for PostgreSQL.</p>
<p>Using Preconditions in the Master-Changelog we can then control which set of Sub-Changelogs will be run for a given DBMS. Once this is implemented correctly in the Master-Changelog you can safely update your database specific Changelogs in the respective folders.</p>
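<p>To illustrate the idea, here is a sketch at Changeset level (our actual Master-Changelog is not reproduced here; the ids and the SQL placeholders are hypothetical). A <code>&lt;dbms&gt;</code> precondition lets a Changeset run only on the matching DBMS, and <code>onFail=&quot;MARK_RAN&quot;</code> marks it as handled on all others:</p>
<pre class="brush: xml;">
&lt;changeSet author=&quot;xxx&quot; id=&quot;oracle-1&quot;&gt;
  &lt;preConditions onFail=&quot;MARK_RAN&quot;&gt;
    &lt;dbms type=&quot;oracle&quot; /&gt;
  &lt;/preConditions&gt;
  &lt;sql&gt;
    ADD YOUR ORACLE SPECIFIC SQL STATEMENT HERE
  &lt;/sql&gt;
&lt;/changeSet&gt;

&lt;changeSet author=&quot;xxx&quot; id=&quot;postgresql-1&quot;&gt;
  &lt;preConditions onFail=&quot;MARK_RAN&quot;&gt;
    &lt;dbms type=&quot;postgresql&quot; /&gt;
  &lt;/preConditions&gt;
  &lt;sql&gt;
    ADD YOUR POSTGRESQL SPECIFIC SQL STATEMENT HERE
  &lt;/sql&gt;
&lt;/changeSet&gt;
</pre>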
<h2>Applying Changes</h2>
<p>One of the major selling points of LiquiBase for us is that it makes applying data model changes very quick, easy and safe. LiquiBase can be run from Ant, Maven, Spring etc. or from the command line. As we currently use Maven, we can integrate LiquiBase into our build management to deal with database changes. So applying changes with LiquiBase takes just a click or two.</p>
<p>LiquiBase keeps track of which changes have already been applied (though you can force it to reapply individual Changesets by using the attributes &#8220;runAlways&#8221; or &#8220;runOnChange&#8221;). It does this by means of the internal &#8220;DATABASECHANGELOG&#8221; table, which registers every applied Changeset by filename, id and author. When applying a Changelog to a database, LiquiBase scans the &#8220;DATABASECHANGELOG&#8221; table and skips all Changesets that have already been applied. Once a Changeset has been applied, its filename, author and id are logged in the table. This is a major time saver in the long run.</p>
<p>But what if a customer insists on getting a SQL file that his DBA can apply? Though we do not recommend it, because it is potentially less safe, more time-consuming and gives less control, we can use LiquiBase to create traditional SQL scripts as well. And we can easily switch to LiquiBase should the customer decide at a later stage that he wants it.</p>
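<p>The Maven integration mentioned above might be configured roughly as in the following sketch. It is not our project configuration: the changelog file name, the JDBC URL, the driver and all <code>${...}</code> properties are assumptions you would replace, and the artifact name is the one used by LiquiBase 2.x and later. The JDBC driver also has to be available to the plugin, e.g. as a plugin dependency.</p>
<pre class="brush: xml;">
&lt;build&gt;
  &lt;plugins&gt;
    &lt;plugin&gt;
      &lt;groupId&gt;org.liquibase&lt;/groupId&gt;
      &lt;artifactId&gt;liquibase-maven-plugin&lt;/artifactId&gt;
      &lt;!-- assumed property, define it in your POM --&gt;
      &lt;version&gt;${liquibase.version}&lt;/version&gt;
      &lt;configuration&gt;
        &lt;!-- Master-Changelog given by a relative name only --&gt;
        &lt;changeLogFile&gt;master_changelog.xml&lt;/changeLogFile&gt;
        &lt;driver&gt;org.postgresql.Driver&lt;/driver&gt;
        &lt;url&gt;jdbc:postgresql://localhost:5432/shop&lt;/url&gt;
        &lt;username&gt;${db.username}&lt;/username&gt;
        &lt;password&gt;${db.password}&lt;/password&gt;
      &lt;/configuration&gt;
    &lt;/plugin&gt;
  &lt;/plugins&gt;
&lt;/build&gt;
</pre>
<p>With such a setup, <code>mvn liquibase:update</code> applies all pending Changesets, while <code>mvn liquibase:updateSQL</code> only writes the SQL that would be executed to a file, which is one way to produce a script for a DBA.</p>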
<h2>DBMS-specific tips and warnings</h2>
<p>Let me finish with a few experiences we have had using LiquiBase with various database management systems.</p>
<p>I will start with <strong>MySQL</strong>, the popular free DBMS, as it has some quirks that set it apart from most other DBMS.</p>
<p>In MySQL it is necessary to select a storage engine like <strong>InnoDB</strong> when creating new tables. This poses a problem in LiquiBase that we currently solve by a workaround suggested by LiquiBase. We put the create table statement into a separate Changeset which we then adapt using the tag <code><strong>&lt;modifySql&gt;</strong></code>.</p>
<p>For example:</p>
<pre class="brush: xml;">
&lt;changeSet id=&quot;55&quot; author=&quot;xxx&quot;&gt;
  &lt;createTable tableName=&quot;person&quot;&gt;
    &lt;column name=&quot;id&quot; type=&quot;bigint&quot;/&gt;
    &lt;column name=&quot;firstname&quot; type=&quot;varchar(255)&quot;/&gt;
    &lt;column name=&quot;lastname&quot; type=&quot;varchar(255)&quot;/&gt;
  &lt;/createTable&gt;
  &lt;modifySql&gt;
     &lt;replace replace=&quot;bigint&quot; with=&quot;long&quot;/&gt;
  &lt;/modifySql&gt;
  &lt;modifySql dbms=&quot;mysql&quot;&gt;
     &lt;append value=&quot; engine innodb&quot;/&gt;
  &lt;/modifySql&gt;
&lt;/changeSet&gt;
</pre>
<p>This is not perfect but it works.</p>
<p>Another MySQL-specific quirk is that, as opposed to most other DBMS, <strong>MySQL</strong> expects the definition of <strong>constraints to be part of the column definition</strong>, like:</p>
<pre class="brush: xml;">
&lt;column name=&quot;ID&quot; type=&quot;BIGINT&quot;
        remarks=&quot;Primary key&quot; autoIncrement=&quot;true&quot;&gt;
  &lt;constraints nullable=&quot;false&quot; primaryKey=&quot;true&quot;
               primaryKeyName=&quot;KATEGORIE_pk&quot; /&gt;
&lt;/column&gt;
</pre>
<p>Instead of adding the key afterwards in a separate statement, as generated by the PowerArchitect export:</p>
<pre class="brush: xml;">
&lt;addPrimaryKey tableName=&quot;KATEGORIE&quot;
               constraintName=&quot;KATEGORIE_pk&quot;
               columnNames=&quot;ID&quot;/&gt;
</pre>
<p>Correcting this manually is quite tedious, but we have found that an <strong>XSLT</strong> transformation can deal with this problem, which takes much of the pain out of it.</p>
<p>Another inconvenience we have encountered with MySQL is that it limits remarks on tables or columns to a maximum of 64 characters. Currently there is no workaround for this. Although this problem is introduced by neither Power Architect nor LiquiBase, my colleague Thomas Kellerer has delivered a check that can be run with the &#8220;Criticize Model&#8221; context menu item in Power Architect to verify that a comment is compatible with MySQL (see image below). Maybe not a big deal for some, but still a handy little solution.</p>
<div id="attachment_895" class="wp-caption alignnone" style="width: 490px"><a rel="lightbox-datamodel-part3" href="http://blog.mgm-tp.com/wp-content/uploads/2011/04/Criticize-Model-Option-in-Power-Architect.png"><img class="size-large wp-image-895" title="Criticize Model Option in Power Architect" src="http://blog.mgm-tp.com/wp-content/uploads/2011/04/Criticize-Model-Option-in-Power-Architect-480x325.png" alt="" width="480" height="325" /></a><p class="wp-caption-text">Use the &#39;Criticize Model&#39; context menu item in Power Architect to check if the comment is compatible with MySQL.</p></div>
<p>On <strong>Oracle</strong> we have discovered that we can run into a situation where we have to use preconditions to ensure that the correct username, and thus the correct database, is used when dealing with multiple databases, because in Oracle a DB user and a schema are the same thing. We use Changeset modifications like the following to make sure we do not run into problems:</p>
<pre class="brush: xml;">
&lt;changeSet author=&quot;xxx&quot; id=&quot;5&quot;&gt;
  &lt;preConditions onFail=&quot;MARK_RAN&quot;&gt;
    &lt;or&gt;
      &lt;runningAs username=&quot;SHOP&quot; /&gt;              &lt;!-- PROD   --&gt;
      &lt;runningAs username=&quot;SHOPBUGFIXONLINE&quot; /&gt;  &lt;!-- BUGFIX --&gt;
      &lt;runningAs username=&quot;SHOPTESTONLINE&quot; /&gt;    &lt;!-- TEST   --&gt;
    &lt;/or&gt;
  &lt;/preConditions&gt;
  &lt;!-- the actual changes of this changeSet follow here --&gt;
&lt;/changeSet&gt;
</pre>
<p>We also use LiquiBase with <strong>PostgreSQL</strong> but have not yet run into any major problems there.</p>
<h2>Managing Data with LiquiBase?</h2>
<p>A question that keeps coming up is: <em>&#8220;Can I manage my (test) data with LiquiBase as well?&#8221;</em></p>
<p>The answer is yes and no. While it is usually no problem to use LiquiBase for small amounts of relatively static test data (e.g. to fill a smaller test database for the first time), using it to maintain large amounts of quickly changing data is often not so convenient. We are still investigating the best ways to deal with this problem and will keep you informed about the outcome in another blog post.</p>
<h2>Conclusion</h2>
<p>Now that we have shared our experiences with you, let us sum up by answering the question: Why do we do it? Why do we use tools instead of just writing SQL scripts from logical data models?</p>
<p>One answer is that we believe, and our experience supports this, that well-documented data models save us time and money. Many applications evolve over time; our tool chain supports the adaptation of the physical data model and simplifies applying the changes to databases. In the test and maintenance phases it also helps us to create databases in different stages and to selectively roll back changes with great precision and little effort. And last but not least, it is much easier to test a given data model on different DBMS should a customer either want to change their DBMS or insist on an unsuitable one – nothing is more convincing than a demonstration.</p>
	<p><em><small>(c) 2012 <a href="http://www.mgm-tp.com">mgm technology partners</a>. This posting "<a href="http://blog.mgm-tp.com/2011/04/data-modeling-part3/">Getting the most out of LiquiBase - Tool-supported Data Modeling and Change Management, Part 3</a>" is part of the <a href="http://blog.mgm-tp.com">mgm technology blog</a>. The author of the posting is
	<a href="http://blog.mgm-tp.com/?author=13" title="View articles by Ursula Bauer">Ursula Bauer</a>.
	</small></em></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.mgm-tp.com/2011/04/data-modeling-part3/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
	
		<series:name><![CDATA[Tool-supported Data Modeling and Change Management]]></series:name>
	</item>
		<item>
		<title>Non-Standard Ways of using Lucene</title>
		<link>http://blog.mgm-tp.com/2011/03/non-standard-ways-of-using-lucene/</link>
		<comments>http://blog.mgm-tp.com/2011/03/non-standard-ways-of-using-lucene/#comments</comments>
		<pubDate>Tue, 01 Mar 2011 13:14:41 +0000</pubDate>
		<dc:creator>Michal Perlik</dc:creator>
				<category><![CDATA[Projects]]></category>
		<category><![CDATA[Tips]]></category>
		<category><![CDATA[Database]]></category>
		<category><![CDATA[Hibernate]]></category>
		<category><![CDATA[Lucene]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[Search]]></category>

		<guid isPermaLink="false">http://blog.mgm-tp.com/?p=865</guid>
		<description><![CDATA[For our recent online shop project, we required a full-text, multi-criteria product search. Lucene, the popular Java search engine, is an ideal candidate for this functionality. But in order to meet the high performance requirement, we had to extend its usage beyond standard full-text search. This posting describes our solution including index switching and using [...]]]></description>
			<content:encoded><![CDATA[<p>For our recent online shop project, we required a full-text, multi-criteria product search. Lucene, the popular Java search engine, is an ideal candidate for this functionality. But in order to meet the high performance requirement, we had to extend its usage beyond standard full-text search. This posting describes our solution including index switching and using Lucene as a simple NoSQL database.</p>
<p><span id="more-865"></span></p>
<p>Searching is a frequent activity on the web and one of the most important features of every online shop. With a powerful search facility, customers can easily specify what they want and then find the product they are looking for. This includes all kinds of standard catalog search (e.g. by category) and full-text search.</p>
<p>Traditional web shop solutions implement a catalog search with relational database queries. Full-text search tasks can then be implemented either by native features of the databases, or by using external indexing engines. <a href="http://www.lucene.net">Lucene</a> is an open-source Java indexing engine library. It is used by a huge number of web sites and applications. Internally, Lucene stores data in a flat storage structure, where each record consists of several fields as key/value pairs. In Lucene terms, such a record is called a <em>Document</em>.</p>
<p>We have had very good experience with Lucene in previous projects and decided to extend its usage beyond standard full-text search. Here are a couple of non-standard usages of Lucene that we&#8217;ve used in our projects and that you might find interesting too.</p>
<h2>Criteria search with Lucene </h2>
<p>Our recent shop project <a href="http://www.kickz.com">www.kickz.com</a> had high performance requirements for a multi-criteria search. Specifically, we required &lt;100 ms for &gt;200 concurrent queries on average. The application itself is written in Java and can be clustered easily.</p>
<p>We are using <a href="http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/store/RAMDirectory.html">RAMDirectory</a> to load an index entirely into RAM (see also <a href="#RAMDirectory">later on in the blog</a>). This is especially convenient since we have a pre-built index living on disk and can easily slurp the whole thing into RAM for faster searching. In such a setup, a search index is only limited by the amount of heap memory available to the JVM process. A mid-sized shop contains up to hundreds of thousands of products, which should not reach the usual memory limits. In our case, the index size is about 15 MByte per language, summing up to 60 MByte in total. However, if the number of indexed items is very high (resulting in big indexes) or a distributed search is needed, then <a href="http://lucene.apache.org/solr">Solr</a> could be considered as an alternative. See also our <a href="/series/scalable-log-data-management-with-hadoop/">blog series about using Solr/Lucene with Hadoop</a>.</p>
<h2>Mapping Relational Data to the Lucene Index</h2>
<p>Our first challenge was to represent the relations in the Lucene index, i.e. searchable data that comes from embedded or associated entities (*-to-many associations). For example, when searching products, one might restrict the search to specific categories, such as shoes. In relational databases, a SQL query on an entity and its associations can easily join several tables by primary/foreign keys. Such a join cannot be done with Lucene&#8217;s document model in a straightforward way.</p>
<p>However, there is a workaround at the cost of storing <strong>redundant data</strong>: a document may contain several fields with the same key, e.g. key/value = &#8220;category/shoes&#8221; and &#8220;category/basketball-shoes&#8221;. This can be used for *-to-many associations, where the join is represented by the associated entity&#8217;s business key (e.g. the unique category key from the ERP system) or its primary key from the database. It is then necessary to collect all required data during index creation and build the index documents from it. In our case, this means the transformation of the domain model (left side) into Lucene&#8217;s flat document structure (right side):</p>
<div id="attachment_867" class="wp-caption alignnone" style="width: 490px"><a rel="lightbox-lucene-in-shop" href="http://blog.mgm-tp.com/wp-content/uploads/2011/02/Mapping.png"><img src="http://blog.mgm-tp.com/wp-content/uploads/2011/02/Mapping-480x181.png" alt="" title="Mapping" width="480" height="181" class="size-large wp-image-867" /></a><p class="wp-caption-text">Extracting information from a domain entity and its associated entities into a single Lucene document.</p></div>
<p>When searching for products with a given relation, e.g. for products of a specific category, the search is performed for all documents that contain the relation&#8217;s key with the given relation value, e.g. documents containing the key &#8220;category&#8221; with the value &#8220;basketball-shoes&#8221;.</p>
<p>Please note that this approach is not suitable when the associated entity in a &#8220;*-to-many&#8221; relation changes frequently, because then either the complete index or at least all documents containing the changed data would have to be updated, which might be a time-consuming operation. However, for our shop engine this concern is not relevant, because data is changed only within the underlying ERP system, and each publication triggers a complete rebuild of the index (see <a href="#index-switching">below</a>). Thus, the approach can be used without any limitations.</p>
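<p>The flattening and the key/value lookup described above can be sketched in plain Java. This is only a minimal model of the idea and deliberately does not use the Lucene API: the classes <code>FlatDocument</code> and <code>ProductFlattener</code>, as well as the product name, are hypothetical and exist only for illustration.</p>
<pre class="brush: java;">
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of a flat document: a list of key/value fields.
// It only mimics the idea; it is NOT the Lucene Document API.
class FlatDocument {
    private final List&lt;String[]&gt; fields = new ArrayList&lt;String[]&gt;();

    // Adds a field; the same key may be added several times.
    void add(String key, String value) {
        fields.add(new String[] { key, value });
    }

    // Returns true if any field with the given key carries the given value.
    boolean matches(String key, String value) {
        for (String[] field : fields) {
            if (field[0].equals(key)) {
                if (field[1].equals(value)) {
                    return true;
                }
            }
        }
        return false;
    }
}

class ProductFlattener {
    // Copies the product name and one &quot;category&quot; field per associated
    // category key into a single flat document (redundant data by design).
    static FlatDocument flatten(String productName, String... categoryKeys) {
        FlatDocument doc = new FlatDocument();
        doc.add(&quot;name&quot;, productName);
        for (String categoryKey : categoryKeys) {
            doc.add(&quot;category&quot;, categoryKey);
        }
        return doc;
    }
}
</pre>
<p>A product flattened with <code>ProductFlattener.flatten(&quot;Example Shoe&quot;, &quot;shoes&quot;, &quot;basketball-shoes&quot;)</code> is then found by a lookup for &#8220;category&#8221; = &#8220;basketball-shoes&#8221; as well as for &#8220;category&#8221; = &#8220;shoes&#8221;, without any join.</p>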
<p><em>A sidenote</em>: If you&#8217;re using JPA with Hibernate, you might want to have a look at <a href="http://www.hibernate.org/subprojects/search.html">Hibernate Search</a>. It uses special annotations and automates the indexing and extracting of data from the entities, and even supports the indexing of <a href="http://docs.jboss.org/hibernate/stable/search/reference/en-US/html/search-mapping.html#search-mapping-associated">embedded and associated JPA entities</a>. For one-to-many and many-to-many relations, it uses the same &#8220;trick&#8221; as mentioned above.</p>
<h2>Lucene as a NoSQL Database</h2>
<p>Our first versions of the search returned only the IDs of database records that met the given condition. The records were then loaded from the database. This approach, combined with a properly configured Hibernate second-level query cache, fits most use cases well.</p>
<p>However, the performance can still be improved. For example, the search result page does not display all product information: typically only brand, product name and price are displayed, but not the detailed product description (see <a href="http://www.kickz.com/en/shoes/basketball-function">example</a>). Lucene allows you to store such &#8220;result values&#8221; in the index. They are not processed, i.e. they are kept untokenized, and can be retrieved for displaying the result. For example like this:</p>
<pre class="brush: java;">
Field field = new Field(&quot;brand&quot;, brand.getName(),
                        Field.Store.YES,
                        Field.Index.UN_TOKENIZED);
document.add(field);
</pre>
<p>And after performing the search, the value can be easily retrieved from the query result:</p>
<pre class="brush: java;">
String brand = document.get(&quot;brand&quot;);
</pre>
<p>Like brand, we also store product name and price in the index, so that all necessary data to build the search result page can be retrieved from Lucene alone. Specifically, there is no need afterwards to make an extra database query to load them. A similar effect can be achieved by a second level cache containing all products.</p>
<h2 id="index-switching">Concurrent Rebuilding and Searching with Index Switching</h2>
<p>The index creation runs in an external process that is triggered after data from the <a href="http://en.wikipedia.org/wiki/Enterprise_resource_planning">ERP system</a> has been imported into the SQL database. This is typically done once a day, or at most several times a day. The imported data is read-only in the SQL database. Once the import is finished, all available products are indexed, and thus the <strong>index is always completely rebuilt</strong>.</p>
<p>Unfortunately, Lucene has a restriction in that an index cannot be updated while it is open for reading by another process. But this issue can be easily solved by <strong>working with two indexes: one for searching, the other for updating</strong>. While one index is used by the shop for searching, the other can be recreated in the background. Once the new index is ready, the clustered shop application is notified via JMS and opens the index that has just been recreated. The previous search index stays untouched and is thus available for the next import and index re-creation.</p>
<div id="attachment_868" class="wp-caption alignnone" style="width: 490px"><a rel="lightbox-lucene-in-shop" href="http://blog.mgm-tp.com/wp-content/uploads/2011/02/index-switch.png"><img src="http://blog.mgm-tp.com/wp-content/uploads/2011/02/index-switch-480x233.png" alt="" title="index-switch" width="480" height="233" class="size-large wp-image-868" /></a><p class="wp-caption-text">The principle of Index Switching allows concurrent rebuilding and searching by different JVM processes.</p></div>
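<p>The bookkeeping behind such a switch is small. The following sketch shows one way to track which of the two index locations is currently used for searching and which one may be rebuilt. The class <code>IndexSwitcher</code>, its method names and the location names are assumptions made for this illustration; receiving the JMS message and reopening the actual searcher are left out.</p>
<pre class="brush: java;">
// Hypothetical helper: remembers which of two index locations is live.
class IndexSwitcher {
    private final String[] locations;
    private int active = 0; // guarded by &quot;this&quot;

    IndexSwitcher(String first, String second) {
        locations = new String[] { first, second };
    }

    // Location the shop currently searches in.
    synchronized String searchLocation() {
        return locations[active];
    }

    // Location the indexing process may overwrite; never the live one.
    synchronized String rebuildLocation() {
        return locations[1 - active];
    }

    // To be called when the &quot;index ready&quot; notification arrives:
    // the freshly rebuilt index becomes the live one.
    synchronized void onIndexRebuilt() {
        active = 1 - active;
    }
}
</pre>
<p>After each call of <code>onIndexRebuilt()</code> the two locations swap roles, so the next import always rebuilds the index that is not being searched.</p>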
<h2 id="RAMDirectory">Searching in RAM</h2>
<p>The index is normally stored in the file system, and all operations are performed there. If there is enough JVM heap space, the index can be loaded into memory using <a href="http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/store/RAMDirectory.html">RAMDirectory</a>, so that search operations do not require any disk access. This will <a href="http://stackoverflow.com/questions/1582377/need-to-know-pros-and-cons-of-using-ramdirectory">increase the performance</a> if the file system does not keep recently accessed files in shared memory:</p>
<pre class="brush: java;">
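if (ramSearch) {
    // copy the on-disk index into the JVM heap and search it there
    searcher = new IndexSearcher(new RAMDirectory(directory));
} else {
    // search directly on the file system
    searcher = new IndexSearcher(directory);
}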
</pre>
<p>Other <a href="http://my.safaribooksonline.com/book/web-development/seo/9781933988177/further-lucene-extensions/ch09lev1sec4">high-performance options are discussed</a> in chapter 9.4 of the latest edition of the <em>Lucene in Action</em> book.</p>
	<p><em><small>(c) 2012 <a href="http://www.mgm-tp.com">mgm technology partners</a>. This posting "<a href="http://blog.mgm-tp.com/2011/03/non-standard-ways-of-using-lucene/">Non-Standard Ways of using Lucene</a>" is part of the <a href="http://blog.mgm-tp.com">mgm technology blog</a>. The author of the posting is
	<a href="http://blog.mgm-tp.com/?author=18" title="View articles by Michal Perlik">Michal Perlik</a>.
	</small></em></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.mgm-tp.com/2011/03/non-standard-ways-of-using-lucene/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Why Functional Tests don’t belong in a Build Environment - A Practitioner's Guide to Successful Software Testing, Part 2</title>
		<link>http://blog.mgm-tp.com/2011/02/successful-software-testing-part2/</link>
		<comments>http://blog.mgm-tp.com/2011/02/successful-software-testing-part2/#comments</comments>
		<pubDate>Tue, 08 Feb 2011 10:15:55 +0000</pubDate>
		<dc:creator>Dr. Martin Varendorff</dc:creator>
				<category><![CDATA[Projects]]></category>
		<category><![CDATA[Tips]]></category>
		<category><![CDATA[Build]]></category>
		<category><![CDATA[Hudson]]></category>
		<category><![CDATA[Jenkins]]></category>
		<category><![CDATA[Maven]]></category>
		<category><![CDATA[QA]]></category>
		<category><![CDATA[Testing]]></category>

		<guid isPermaLink="false">http://blog.mgm-tp.com/?p=857</guid>
		<description><![CDATA[The previous part discussed why a unit test for a class should be written by the developer of that class, and why a functional test should be created by an independent tester. This posting argues that functional tests should not be part of the build process of the product, but instead should be developed and [...]]]></description>
			<content:encoded><![CDATA[<p>The <a href="http://blog.mgm-tp.com/2010/11/successful-software-testing-part1/">previous part</a> discussed why a unit test for a class should be written by the developer of that class, and why a functional test should be created by an independent tester. This posting argues that functional tests should not be part of the build process of the product, but instead should be developed and executed separately. For this, I give guidelines for setting up an independent validation system.</p>
<p><span id="more-857"></span></p>
<p>Unit tests are written by the developer simultaneously with the production code. If the API or the <em>functionality of a class</em> in the production code changes, then the corresponding unit test class has to be adapted accordingly, otherwise the build executing the unit test will fail. This is not a problem in principle, since both should be changed by the same developer simultaneously and should be committed only after they are consistent.</p>
<p>What about functional tests? A functional test with its test data is coupled to the business requirement and the part of the product fulfilling this business requirement. So if the <em>product</em> changes, then the functional tests must change as well.</p>
<p>If the functional tests run in the same build as the product, every change to the product will cause the build process to fail, unless the developer and tester change their code simultaneously. This would require a strong synchronization between developer and tester, which is usually not feasible.</p>
<p>The result is that functional tests almost always fail if they are integrated into the build environment. This calls at least for separate builds for the product (code plus unit tests) and the functional tests. But I will go further: <em>I recommend not running the functional tests in the build environment at all</em>! The build environment does not make functional testing any easier; on the contrary, it adds a lot of extra complexity to the test environment.</p>
<p>The following diagram compares the two approaches of running functional tests, i.e. in a build environment and in a separate test environment against an independent validation system (or staging system). </p>
<div id="attachment_858" class="wp-caption alignnone" style="width: 490px"><a rel="lightbox-software-testing-2" href="http://blog.mgm-tp.com/wp-content/uploads/2011/02/Comparison-of-Approaches.png"><img src="http://blog.mgm-tp.com/wp-content/uploads/2011/02/Comparison-of-Approaches-480x344.png" alt="" title="Comparison of Approaches" width="480" height="344" class="size-large wp-image-858" /></a><p class="wp-caption-text">System setup for unit tests within a build system and functional tests against a validation system. <em>On the left side</em>: Unit tests within a build system are triggered by the build system (Jenkins/Hudson, Luntbuild, Ant, Maven etc.), and executed locally. The application runs on the build server, ideally set up by the build system and unit test framework (JUnit, TestNG) before the execution of the unit tests. <em>On the right side</em>: Functional tests are executed by test drivers for functional tests on dedicated test clients against a validation system, which is set up and configured like the live system.</p></div>
<h2>How to run GUI-/Browser-based Tests</h2>
<p>If you run functional tests in the build environment, then they must be executable in a batch-run, driven by the build framework. Specifically, on the build system, the build framework, e.g. <a href="http://jenkins-ci.org/">Jenkins</a> (formerly known as Hudson), <a href="http://luntbuild.javaforge.com/">Luntbuild</a> etc., starts a build script (Maven, Ant), which starts a test framework (JUnit, TestNG), which executes the test case. Of course, this is technically possible: there are functional test tools and build environments with such capabilities. But they bring in extra complexity, entail extra development and significant configuration effort (often in the range of 1-2 weeks), and impose severe restrictions.</p>
<p>It is very hard to develop GUI-related or browser-based functional tests in an environment without visible screen-output. How do you debug test failures, if they occur only within the build environment? How do you create keyboard and mouse events (i.e. entering text in an applet within a browser)?</p>
<p>If the functional tests are executed on <em>dedicated test clients</em> (often the tester&#8217;s PC), then any test and scripting software available for the desktop OS of the tester&#8217;s PC can be used. What&#8217;s more, the test execution can be easily observed (GUI tests) and debugged by the tester, so the test development has fewer obstacles in its path.</p>
<h2>Merits of an Independent Validation System</h2>
<p>Another important point is the test of the configuration of the system. If the system under test runs within the build framework, then usually the configuration of the system deviates strongly from the configuration of the live system. An <em>independent validation system</em> can be set up and configured much closer to the live system including hardware, installed software and components and configuration of all parts. Thus, a functional test against the validation system checks a large part of the system configuration as well and finds system specific errors and failures in the deployment of the system.</p>
<p>A test on the validation system almost satisfies the well-known software development paradigm: <em>The work is only done, if the requirement is fulfilled on the live system</em>. Manual reproduction and repetition of failed tests is also easier, if the test is executed from the same client against the same system under test (validation system) with the same product version, the same configuration and the same test data.</p>
<p>If the functionality of the product is accessed by an API instead of a GUI and if the product or parts of the product are already very stable, then it can be useful to put all stable functional tests as regression tests into the build environment.</p>
<h2>Stable Builds, or when to Deploy on the Validation System</h2>
<p>If a lot of effort is invested in builds with high quality standards, like high coverage of unit tests or code reviews before a commit, then the builds are usually stable enough to be deployed on the validation system as a basis for the execution of functional tests. If however the builds are not very stable, then there should be an additional pre-validation system, which is used to <em>stabilize a build</em>. As soon as a build is deployed on the pre-validation or validation system, functional tests are executed against it.</p>
<p>In order to adapt existing functional tests and to prepare new ones, the testers must be aware of all functional changes and additions that are to be expected in the new build. Depending on the team size, this information exchange can be done in small teams in ad-hoc meetings with the developers or &mdash; in larger teams &mdash; with the support of detailed feature lists maintained in task management tools like <a href="http://www.atlassian.com/software/jira/">JIRA</a>.</p>
<p>For all failing functional tests, the requirements must be checked (often in collaboration with the developer), to find out whether the test or the product is working incorrectly. After the errors in the functional tests are fixed, the functional tests are repeated. The remaining failing tests should then all be caused by flaws of the product.</p>
<p>A build is considered <em>stable</em> if and only if the functional tests checking the main paths of the business functionality have passed, so that the execution of further tests is feasible and makes sense. In projects with longer development cycles (several months), stable builds should be achieved every one or two weeks, and towards the end at even shorter intervals. In projects with short development cycles (several weeks), stable builds should be achieved every one or two days. Overall, 10 to 15 stable builds should be achieved per development cycle. Each stable build shows that the development and test groups are in sync and the product is stable. A close collaboration between the functional testers and the developers is essential in such a setup.</p>
<p>Summarizing the argumentation, I emphasize the following points:</p>
<ul>
<li>Functional tests are executed on dedicated test clients.</li>
<li>Functional tests are executed against a dedicated validation system.</li>
<li>Only stable builds are deployed on the validation system.</li>
<li>Stable builds should be achieved regularly and in adequate frequency throughout the development cycle.</li>
</ul>
<p>In a later blog post, I will discuss test coverage, when to write and rely on unit tests and when on functional tests, and how to derive test coverage. A rough guide for selecting the right tools will be given as well.</p>
	<p><em><small>(c) 2012 <a href="http://www.mgm-tp.com">mgm technology partners</a>. This posting "<a href="http://blog.mgm-tp.com/2011/02/successful-software-testing-part2/">Why Functional Tests don&#8217;t belong in a Build Environment - A Practitioner's Guide to Successful Software Testing, Part 2</a>" is part of the <a href="http://blog.mgm-tp.com">mgm technology blog</a>. The author of the posting is
	<a href="http://blog.mgm-tp.com/?author=15" title="View articles by Dr. Martin Varendorff">Dr. Martin Varendorff</a>.
	</small></em></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.mgm-tp.com/2011/02/successful-software-testing-part2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<series:name><![CDATA[A Practitioner's Guide to Successful Software Testing]]></series:name>
	</item>
		<item>
		<title>Portlets with XForms in Liferay - Rethinking Web Forms with XForms, Part 3</title>
		<link>http://blog.mgm-tp.com/2011/01/rethinking-web-forms-xforms-part3/</link>
		<comments>http://blog.mgm-tp.com/2011/01/rethinking-web-forms-xforms-part3/#comments</comments>
		<pubDate>Mon, 31 Jan 2011 14:01:58 +0000</pubDate>
		<dc:creator>Christian Belka</dc:creator>
				<category><![CDATA[Projects]]></category>
		<category><![CDATA[Tips]]></category>
		<category><![CDATA[Liferay]]></category>
		<category><![CDATA[Orbeon]]></category>
		<category><![CDATA[Portlet]]></category>
		<category><![CDATA[Spring]]></category>
		<category><![CDATA[Web Forms]]></category>
		<category><![CDATA[XForms]]></category>

		<guid isPermaLink="false">http://blog.mgm-tp.com/?p=824</guid>
		<description><![CDATA[This posting discusses how we integrated a web application with XForms as one of the main view technologies into a Liferay portal. We were using the Orbeon 3.8 XForms engine, which was introduced in the previous part of this series, and Spring Portlet MVC. Since we were facing quite a few problems and issues regarding [...]]]></description>
			<content:encoded><![CDATA[<p>This posting discusses how we integrated a web application with XForms as one of the main view technologies into a Liferay portal. We were using the Orbeon 3.8 XForms engine, which was introduced in the <a href="http://blog.mgm-tp.com/2010/09/rethinking-web-forms-xforms-part2/">previous part</a> of this series, and Spring Portlet MVC. Since we were facing quite a few problems and issues regarding Orbeon and Liferay, I describe our working solution at some length.</p>
<p><span id="more-824"></span></p>
<p>First of all, it is important to understand that our project uses the &#8220;separate deployment&#8221; approach of Orbeon integration. Here, Orbeon is only used as an XForms renderer; neither the Orbeon Portlet nor the Form Builder / Form Runner system is used.</p>
<p>Orbeon XForms supports <em>two</em> modes for a <a href="http://jcp.org/en/jsr/detail?id=286">JSR-286</a> portal integration:</p>
<ul>
<li>The <strong>Orbeon Portlet</strong> uses the <a href="http://www.orbeon.com/forms/orbeon-form-builder">Orbeon Form Builder</a> and <a href="http://www.orbeon.com/forms/orbeon-form-runner">Orbeon Form Runner</a>, which are Orbeon&#8217;s own XForms editing and management toolkits. They allow you to easily develop your own XForms and XForms screenflows with Orbeon. The catch is that you have to use the Orbeon framework and these toolkits for your whole XForms application.</li>
<li>With the <strong>&#8220;separate deployment&#8221; mode</strong>, Orbeon is only used as an XForms rendering engine; you have to create and manage the application yourself. This gives you the ability to create 100% custom applications. As mentioned, this is the mode we chose.</li>
</ul>
<h2>Overview of our Integration Approach</h2>
<p>Besides XForms, our existing web application was developed with <strong>Spring MVC</strong>, therefore the <a href="http://static.springsource.org/spring/docs/3.0.x/reference/portlet.html">Spring Portlet MVC</a> technology was used for the portlet implementation. The Spring application loads the XForms from a repository corresponding to the required business case. The specific architecture as well as Spring and Spring Portlet MVC were project technology decisions. They are <em>not needed</em> to integrate Orbeon with a portal server.</p>
<p>Orbeon is deployed as a normal web application alongside Liferay, but is stripped down (Form Runner/Builder and samples removed). The JavaEE server is <a href="http://jboss.org/">JBoss</a> 5.1 with <a href="http://www.liferay.com/">Liferay</a> and Orbeon 3.8 installed on it, and configured with 1.5 GB RAM for the JVM.</p>
<p>Please note that Orbeon, as of October 2010, states that the 3.8 stable version has <a href="http://wiki.orbeon.com/forms/doc/developer-guide/admin/deployment-portlet#TOC-Status-of-Portlet-support-in-Orbeon">various issues with the Portlet mode</a>, so an <strong>Orbeon 3.8 nightly build</strong> from the end of October 2010 was used. We also chose the stable release of Liferay 6, since the nightly build of Orbeon 3.8 is currently tested with this version only.</p>
<div id="attachment_830" class="wp-caption alignnone" style="width: 490px"><a rel="lightbox-xforms-part3" href="http://blog.mgm-tp.com/wp-content/uploads/2011/01/Integration-and-Architecture.png"><img src="http://blog.mgm-tp.com/wp-content/uploads/2011/01/Integration-and-Architecture-480x265.png" alt="" title="Integration and Architecture" width="480" height="265" class="size-large wp-image-830" /></a><p class="wp-caption-text">Integration of the different components and the overall architecture.</p></div>
<p>The Spring Portlet loads the XForms through some services invoked from the controller and uses a custom XForms view to send the XForms directly to Orbeon through the portlet filter for further rendering. Other Spring MVC views like the JSTL view (a Spring view to display normal JSPs) are not affected and can be used normally. The edit mode of this portlet, for example, is written as a normal JSP page.</p>
<h2>The XForms XHTML Portlet Page</h2>
<p>Spring MVC uses views to render models which are created and filled by controllers. (Ok, you might have guessed that;). Spring ships with many different views for displaying Velocity templates, XML and JSPs. The &#8220;view resolvers&#8221; decide which view should be used to render the requested page. So we wrote a custom Spring ViewResolver which reacts to our convention that all XForms view names start with &#8220;XForms::&#8221;. All other view requests are ignored by this resolver, so that the other resolvers can handle them. As mentioned before, Spring is not needed in order to use XForms in Portlets; it was a general project requirement in our case. </p>
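<p>The prefix convention can be sketched in plain Java as follows. This is only an illustration and deliberately avoids the Spring API: the class <code>XFormsViewNames</code>, its methods and the view names <code>getOffer</code> and <code>editMode</code> are hypothetical. In a real Spring application the check would sit inside a <code>ViewResolver</code> implementation, which returns <code>null</code> for view names it does not handle so that the next resolver in the chain is asked.</p>

```java
/** Hypothetical helper for illustration only; not part of Spring or Orbeon. */
final class XFormsViewNames {

    /** Prefix from the convention described in the text. */
    static final String PREFIX = "XForms::";

    private XFormsViewNames() { }

    /** True if the view name follows the XForms convention. */
    static boolean isXFormsView(String viewName) {
        if (viewName == null) {
            return false;
        }
        return viewName.startsWith(PREFIX);
    }

    /**
     * Returns the form name behind the prefix, or null for any other view
     * name, which signals that another resolver should handle the request.
     */
    static String formName(String viewName) {
        if (!isXFormsView(viewName)) {
            return null;
        }
        return viewName.substring(PREFIX.length());
    }

    public static void main(String[] args) {
        System.out.println(formName("XForms::getOffer")); // prints getOffer
        System.out.println(formName("editMode"));         // prints null
    }
}
```

<p>Returning <code>null</code> instead of throwing an exception is what keeps the JSTL view and the other resolvers usable next to the XForms view.</p>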
<p>Contrary to what you might expect, and unlike the page snippets that portlets usually produce, the XForms portlet page must be a <a href="http://orbeon-forms-ops-users.24843.n4.nabble.com/Missing-portlet-filter-class-Inaccurate-Wiki-page-td2720997.html#a3027420">complete XHTML page</a> including all HTML tags, as shown below:</p>
<pre class="brush: xml; wrap-lines: false;">
&lt;html xmlns=&quot;http://www.w3.org/1999/xhtml&quot;
  xmlns:ev=&quot;http://www.w3.org/2001/xml-events&quot;
  xmlns:xs=&quot;http://www.w3.org/2001/XMLSchema&quot;
  xmlns:xhtml=&quot;http://www.w3.org/1999/xhtml&quot;
  xmlns:xf=&quot;http://www.w3.org/2002/xforms&quot;
  xmlns:xsi=&quot;http://www.w3.org/2001/XMLSchema-instance&quot;
  xmlns:n1=&quot;http://www.w3.org/2004/11/xmlmime&quot;
  xmlns:xxf=&quot;http://orbeon.org/oxf/xml/xforms&quot;
  xmlns:f=&quot;http://orbeon.org/oxf/xml/formatting&quot;
  xmlns:c=&quot;http://java.sun.com/jsp/jstl/core&quot;&gt;
&lt;head&gt;
  &lt;xf:model&gt;
    &lt;xf:instance id=&quot;vu&quot;&gt;
      &lt;repo:vu xmlns:repo=&quot;repos-ns&quot;
               xmlns:partner=&quot;partner-ns&quot; xmlns:allg=&quot;allg-bs&quot;&gt;
        &lt;repo:Versicherungsunternehmen&gt;
          &lt;partner:Name&gt;Versicherungsunternehmen&lt;/partner:Name&gt;
          &lt;partner:Anschrift&gt;
            &lt;allg:Strasse&gt;&lt;/allg:Strasse&gt;
            &lt;allg:Hausnummer&gt;543&lt;/allg:Hausnummer&gt;
            &lt;allg:Postleitzahl&gt;32430&lt;/allg:Postleitzahl&gt;
            &lt;allg:Ort&gt;Stadt&lt;/allg:Ort&gt;
            &lt;allg:Land&gt;D&lt;/allg:Land&gt;
        [...]
      &lt;/repo:vu&gt;
    &lt;/xf:instance&gt;
  &lt;/xf:model&gt;
&lt;/head&gt;
&lt;body&gt;
  &lt;h2&gt;Adressdaten&lt;/h2&gt;
  &lt;xf:group ref=&quot;instance('vu')/repo:Versicherungsunternehmen&quot;&gt;
    &lt;xf:label&gt;Firma&lt;/xf:label&gt;
    &lt;xf:input ref=&quot;partner:Name&quot;/&gt; &lt;br/&gt;

    &lt;xf:group ref=&quot;partner:Anschrift&quot;&gt;
    &lt;xf:input ref=&quot;allg:Strasse&quot;/&gt;
    [...]
  &lt;/xf:group&gt;
  [...]
&lt;/body&gt;
&lt;/html&gt;
</pre>
<div id="attachment_826" class="wp-caption alignnone" style="width: 490px"><a rel="lightbox-xforms-part3" href="http://blog.mgm-tp.com/wp-content/uploads/2011/01/Screenshot-of-XForms-in-Portal.png"><img src="http://blog.mgm-tp.com/wp-content/uploads/2011/01/Screenshot-of-XForms-in-Portal-480x323.png" alt="" title="Screenshot of XForms in Portal" width="480" height="323" class="size-large wp-image-826" /></a><p class="wp-caption-text">Screenshot of the XForms web application running in a Liferay portal. The form on the right side is rendered by the XForms portlet. It contains an XForms page with different validation rules (indicated by the red exclamation marks).</p></div>
<h2>A Liferay Bug and a Workaround for rendering XForms</h2>
<p>Usually, Spring views act like a normal &#8220;view&#8221; Servlet or JSP, i.e. you can write to the output stream or forward to a JSP to render your model. However, there seems to be a <a href="http://www.liferay.com/community/forums/-/message_boards/message/5682233">problem with the portlet filtering mode in Liferay 6</a>, so JSPs cannot be used as view technology. The filter only works for direct writing to the response stream, not when a redirection to a JSP happens.</p>
<p>The first work-around was to create the complete XForms page in the Spring view and to send it to the response via an output writer. But while debugging the bug described above, we discovered the request attribute &#8220;oxf.xforms.renderer.document&#8221;. Orbeon first checks this attribute in its filter initialization and, if it&#8217;s set, uses the content of the attribute to render the page. </p>
<p>In our case, the complete page was passed as a String in this attribute, which seems to be quite a bit faster than the normal response stream filter:</p>
<pre class="brush: java;">
request.setAttribute(&quot;oxf.xforms.renderer.document&quot;, result);
</pre>
<h2>Receiving POST Data from the Browser</h2>
<p>For receiving POST Data, no further configuration needed to be done. The binding URLs get rewritten automatically by the portlet filter. The following binding is sufficient to address the portlet correctly:</p>
<pre class="brush: xml;">
&lt;xforms:submission id=&quot;gui-submission&quot; method=&quot;post&quot; ref=&quot;instance('getOffer')&quot; resource=&quot;/&quot; replace=&quot;all&quot; /&gt;
</pre>
<p>With this binding, the submission resource URL (&#8220;/&#8221;) gets rewritten to the correct portlet action URL and the XForms model gets converted through the Orbeon filter. This filter is specified in the portlet configuration file &#8220;portlet.xml&#8221; as shown below:</p>
<pre class="brush: xml;">
&lt;filter&gt;
  &lt;filter-name&gt;orbeon-forms-portlet-xforms-filter&lt;/filter-name&gt;
  &lt;filter-class&gt;org.orbeon.oxf.portlet.OrbeonPortletXFormsFilter&lt;/filter-class&gt;
  &lt;lifecycle&gt;RENDER_PHASE&lt;/lifecycle&gt;
  &lt;lifecycle&gt;ACTION_PHASE&lt;/lifecycle&gt;
  &lt;lifecycle&gt;RESOURCE_PHASE&lt;/lifecycle&gt;
  &lt;init-param&gt;
    &lt;name&gt;oxf.xforms.renderer.context&lt;/name&gt;
    &lt;value&gt;/orbeon&lt;/value&gt;
  &lt;/init-param&gt;
&lt;/filter&gt;

&lt;filter-mapping&gt;
  &lt;filter-name&gt;orbeon-forms-portlet-xforms-filter&lt;/filter-name&gt;
  &lt;portlet-name&gt;xFormPortlet&lt;/portlet-name&gt;
&lt;/filter-mapping&gt;
</pre>
<p>For this to work, the portlet filter JAR &#8220;Orbeon-xforms-filter.jar&#8221; must be included within the Portlet WAR archive. For greater detail, see the Orbeon Wiki on <a href="http://wiki.orbeon.com/forms/doc/developer-guide/xforms-separate-deployment-with-portlets">XForms &#8211; Separate Deployment with Portlets</a>.</p>
<h2>Conclusion</h2>
<p>We have discussed our quick prototype to test whether portal integration could be achieved easily or not. The general impression was that this solution based on the &#8220;separate deployment&#8221; mode works quite nicely. It also seemed faster than the operating mode &#8220;Orbeon Portlet&#8221;. This might be because, in the portal environment, the portal takes up most of the loading time. A quick <a href="http://jakarta.apache.org/jmeter/" title="JMeter">JMeter</a> test revealed an increase of 50% in load time. </p>
<p>The rewriting of URLs to the portlet addresses works quite well and transparently. The downsides of using Orbeon with portlets using our approach are the following:</p>
<ul>
<li>limitations in your application design, through the portlet filter bug (no redirection and JSPs),</li>
<li>bugs in the 3.8 release of Orbeon XForms,</li>
<li>the current release of Orbeon Portlet Integration has been tested only with Liferay 6.x.</li>
</ul>
<p>Another interesting approach using <em>remote portlets</em> (we had the same idea) can be <a href="http://wiki.orbeon.com/forms/doc/developer-guide/xforms-remote-portlet-deployment">found in the Orbeon Developer Wiki</a>.</p>
<p>In conclusion, if you are planning to use Orbeon XForms with your portal server, you should use Liferay and also think about getting professional support from the Orbeon Team (which requires a commercial license), because further insight into the Orbeon system is of great advantage here. Other great sources are the <a href="http://orbeon-forms-ops-users.24843.n4.nabble.com/" title="Orbeon Mailing List Archive">Orbeon mailing list</a>, the <a href="http://wiki.orbeon.com/forms/projects/xforms-improved-portlet-support">Development Overview</a>, and the <a href="http://www.liferay.com/community/forums/-/message_boards/category/4470265">Liferay Forum</a>.</p>
	<p><em><small>(c) 2012 <a href="http://www.mgm-tp.com">mgm technology partners</a>. This posting "<a href="http://blog.mgm-tp.com/2011/01/rethinking-web-forms-xforms-part3/">Portlets with XForms in Liferay - Rethinking Web Forms with XForms, Part 3</a>" is part of the <a href="http://blog.mgm-tp.com">mgm technology blog</a>. The author of the posting is
	<a href="http://blog.mgm-tp.com/?author=17" title="View articles by Christian Belka">Christian Belka</a>.
	</small></em></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.mgm-tp.com/2011/01/rethinking-web-forms-xforms-part3/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<series:name><![CDATA[Rethinking Web Forms with XForms]]></series:name>
	</item>
		<item>
		<title>Producing High-Quality Test Data - Test Data Generation for Form-Centric Applications, Part 2</title>
		<link>http://blog.mgm-tp.com/2010/12/test-data-generation-part2/</link>
		<comments>http://blog.mgm-tp.com/2010/12/test-data-generation-part2/#comments</comments>
		<pubDate>Thu, 16 Dec 2010 19:48:27 +0000</pubDate>
		<dc:creator>Dr. Hans-Martin Adorf</dc:creator>
				<category><![CDATA[Projects]]></category>
		<category><![CDATA[Research]]></category>
		<category><![CDATA[QA]]></category>
		<category><![CDATA[Testing]]></category>
		<category><![CDATA[Validation]]></category>
		<category><![CDATA[Web Forms]]></category>

		<guid isPermaLink="false">http://blog.mgm-tp.com/?p=801</guid>
		<description><![CDATA[This part addresses the question what makes test data valuable for functional tests. You will understand the important concept of extreme and special values, and how to obtain test data that is highly compressed and also attains a high test coverage. The article also explains our novel idea for constructing a generator for such high-quality [...]]]></description>
			<content:encoded><![CDATA[<p>This part addresses the question what makes test data valuable for functional tests. You will understand the important concept of extreme and special values, and how to obtain test data that is highly compressed and also attains a high test coverage. The article also explains our novel idea for constructing a generator for such high-quality test data.</p>
<p><span id="more-801"></span></p>
<p>As explained in <a href="/2010/10/test-data-generation-part1/">part 1 of this blog series</a>, the overall challenge consists in generating test data that comply with complex constraints imposed by cross-field validation rules. </p>
<h2>Positive Functional Tests</h2>
<p>Software applications, in order to assure and possibly improve their quality, are usually subject to dynamic functional tests, and form-centric applications are no exception to the rule. A <em>functional test</em>, as we understand it here, is an end-to-end test of an unmodified software application. The test has to reduce the risk of software errors, and to assure that the application&#8217;s functionality conforms to the specification.</p>
<p>In the simplest situation, a <em>test driver</em> feeds valid test data to the application, which is requested to work as specified. No error should occur. The aim of this kind of positive test is to demonstrate that the application is working correctly, as illustrated in the figure below.</p>
<div id="attachment_809" class="wp-caption alignnone" style="width: 490px"><a rel="lightbox-test-data-generation-part2" href="http://blog.mgm-tp.com/wp-content/uploads/2010/12/Fig-1-Functional-Test-Architecture.png"><img src="http://blog.mgm-tp.com/wp-content/uploads/2010/12/Fig-1-Functional-Test-Architecture-480x161.png" alt="" title="Fig 1 - Functional Test Architecture" width="480" height="161" class="size-large wp-image-809" /></a><p class="wp-caption-text">Positive functional test of a form-centric application. The test driver feeds valid test data to the application. They pass the internal validator mechanism, and the application provides positive feedback to the test driver.</p></div>
<p>The question is: what kind of data is of high quality, i.e. is especially <em>valuable</em> for positive functional tests? This is what we are going to explore next. </p>
<h3>Obtaining a High Test Coverage</h3>
<p>When carrying out positive functional tests, we aim at putting the application under some sort of &#8216;pressure&#8217;. We want to make sure as much as possible that the application works flawlessly, and that the normal execution paths in the code are all executed. And if there are errors in the code, we want to maximize the likelihood that our tests uncover them. </p>
<p>Experience shows that it pays off to look out for challenging values for the fields in a form-based application, such as <strong>extreme or otherwise special values (ESVs)</strong>. For a numeric field, the most important special value is 0 &mdash; in view of the trouble this number causes when used as denominator in a numerical division. It also pays off to request very small values for some fields and very large ones for others, particularly when such numbers participate in additions or other numerical operations.</p>
<p>Let&#8217;s look at an example, namely a field representing an integer money amount of <strong>Euro without Cents</strong>. The first two (pseudo-) values that an ESV-generator will request for any field are a <em>generic</em> <code>#filled</code> and an <code>#empty</code>, without actually specifying any value. We want each field to be filled in one data record and to be empty in another. Our field usually has a maximum length, say 5, thus the maximum and minimum integer values the ESV-generator will produce are 99999 and -9999 (the minus sign takes up one of the five characters). The most important special value is 0, and finally the generator will add the smallest positive and negative values, i.e. +1 and -1, to our set. Thus our seven-element &#8216;<strong>wish-list</strong>&#8217; for the Euro-amount becomes</p>
<pre>#filled, #empty, 99999, -9999, 0, 1, -1
</pre>
<p>Of course, should the meta-information forbid the value 0, it will be dropped from our wish-list. Likewise, when negative values are not allowed, they will be dropped.</p>
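<p>The construction of this wish-list can be sketched in a few lines of Java. This is a minimal sketch and not the ESV module of the R-TDG: the class <code>EuroEsvWishList</code>, its method names and the two boolean flags standing in for the meta-information are assumptions made for illustration, and the maximum length is assumed to be at least 2.</p>

```java
/** Illustrative sketch only; names and flags are assumptions, not the R-TDG API. */
final class EuroEsvWishList {

    private EuroEsvWishList() { }

    /** Largest number that fits into the given number of characters, e.g. 5 gives 99999. */
    static long largest(int chars) {
        long value = 0;
        for (int i = 0; i != chars; i++) {
            value = value * 10 + 9;
        }
        return value;
    }

    /**
     * Builds the wish-list for an integer Euro field.
     * maxLength is the field length in characters (assumed to be at least 2).
     */
    static String build(int maxLength, boolean allowZero, boolean allowNegative) {
        StringBuilder list = new StringBuilder("#filled, #empty");
        list.append(", ").append(largest(maxLength));
        if (allowNegative) {
            // The minus sign uses up one character of the field.
            list.append(", ").append(-largest(maxLength - 1));
        }
        if (allowZero) {
            list.append(", 0");
        }
        list.append(", 1");
        if (allowNegative) {
            list.append(", -1");
        }
        return list.toString();
    }

    public static void main(String[] args) {
        // prints: #filled, #empty, 99999, -9999, 0, 1, -1
        System.out.println(build(5, true, true));
    }
}
```

<p>Calling <code>build(5, true, false)</code> drops both negative entries, and <code>build(5, false, true)</code> drops the 0, which mirrors the two cases described above.</p>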
<p>For a field representing an amount of <strong>Euro with Cents</strong>, small amounts such as 0.01 and -0.01 should be added to the wish-list.</p>
<p>When the field represents a simple <strong>yes-no-decision</strong>, we can even specify a complete wish-list of desirable values:</p>
<pre>#filled, #empty, true, false
</pre>
<p>For a field representing a <strong>calendar date</strong>, we may want to put as many as 10 or 15 different special values onto our wish-list. And, in addition to the fixed values, we may want to add some random values to most of the wish-lists.</p>
<p>Obviously we need an ESV-generator which considers a field&#8217;s data type and other meta-information, and generates a list of ESVs for that field. Not surprisingly, our Rule-Based Test Data Generator (R-TDG), briefly introduced in <a href="/2010/10/test-data-generation-part1/">part 1 of this blog series</a>, comprises such a module. </p>
<p>Clearly, if each value in our wish-lists can be placed in at least one valid test data record, we shall achieve high test coverage, when the data are used in functional tests. However, as we shall see in a moment, a good test coverage is not the only need we have.</p>
<h3>The Quest for High Compression</h3>
<p>Suppose for the moment that we are able to generate test data in such a way that each test data record contains exactly one ESV, i.e. one value from our wish-lists. Assume further that the forms together contain 1000 fields, with an average of 5 ESVs per field. Then the number of data records required to assure the desired test coverage is 1000 &times; 5 = 5000. In reality, for reasons not explained here, the number of ESVs and thus the corresponding number of data records may be much larger. That many records simply cannot be used in a functional test of a form-centric web application, because of the substantial turn-around time of a single test case.</p>
<p>We therefore arrive at the conclusion that we need some form of &#8220;compression&#8221;, i.e. more than one value from our wish-list shall somehow be squeezed into a single test data record. Actually, we want as many values as possible from our wish-lists to appear in each data record in order to minimize the time-to-completion of the whole functional test.</p>
<p>At this point at the latest, the <em>cross-field constraints</em> start to play a pivotal role. It is simply not possible to loop through all the fields and select one of the ESVs from our wish-list independently of the values selected for the other fields. </p>
<p>Let us look at an example: suppose we have three fields A, B, and C of type Euro without Cents, and suppose that our wish-list of ESVs for each field comprises the three values -999, 0, 9999. A <em>naive</em> test data generator would come up with 9 test data records. In each of these records, exactly one field would be required to hold one of its ESVs:</p>
<table>
<tr>
<th>A</th>
<th>B</th>
<th>C</th>
</tr>
<tr>
<td>-999</td>
<td>*</td>
<td>*</td>
</tr>
<tr>
<td>0</td>
<td>*</td>
<td>*</td>
</tr>
<tr>
<td>9999</td>
<td>*</td>
<td>*</td>
</tr>
<tr>
<td>*</td>
<td>-999</td>
<td>*</td>
</tr>
<tr>
<td>*</td>
<td>0</td>
<td>*</td>
</tr>
<tr>
<td>*</td>
<td>9999</td>
<td>*</td>
</tr>
<tr>
<td>*</td>
<td>*</td>
<td>-999</td>
</tr>
<tr>
<td>*</td>
<td>*</td>
<td>0</td>
</tr>
<tr>
<td>*</td>
<td>*</td>
<td>9999</td>
</tr>
</table>
<p>The star indicates that we don&#8217;t care about the value of the field. It may or may not be one of our ESVs.</p>
<p>The maximum compression we might be able to achieve is 3 ESVs per record:</p>
<table>
<tr>
<th>A</th>
<th>B</th>
<th>C</th>
</tr>
<tr>
<td>-999</td>
<td>-999</td>
<td>-999</td>
</tr>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>9999</td>
<td>9999</td>
<td>9999</td>
</tr>
</table>
<p>However, in the presence of constraints, such a compression might not be achievable. Suppose there is a sum constraint C = A + B, relating the three fields. Then the values for the fields cannot be selected independently of each other: only two of the three values can be chosen freely, and the third follows from the constraint, so it is an ESV only when the sum happens to hit one. A compressed set of records might look like this:</p>
<table>
<tr>
<th>A</th>
<th>B</th>
<th>C</th>
</tr>
<tr>
<td>-999</td>
<td>0</td>
<td>-999</td>
</tr>
<tr>
<td>9999</td>
<td>-9999</td>
<td>0</td>
</tr>
<tr>
<td>0</td>
<td>9999</td>
<td>9999</td>
</tr>
<tr>
<td>0</td>
<td>-999</td>
<td>-999</td>
</tr>
</table>
<p>The full constraint-handling machinery of the R-TDG is capable of achieving the desired high compression, i.e. of delivering test data records that are both <em>valid and densely</em> populated with ESVs.</p>
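<p>The properties of the last table can be checked mechanically. The following sketch takes the four records and the three ESVs from the example and verifies that every record satisfies C = A + B and that all nine (field, ESV) pairs are covered. The class <code>CompressionCheck</code> and its method names are made up for illustration; the code only checks a given set of records and does not show how the R-TDG finds such a set.</p>

```java
/** Illustrative check of the compressed record set; not part of the R-TDG. */
final class CompressionCheck {

    /** Wish-list shared by the fields A, B and C in the example. */
    static final int[] ESVS = { -999, 0, 9999 };

    /** The four records of the compressed table, in the order A, B, C. */
    static final int[][] RECORDS = {
        { -999, 0, -999 },
        { 9999, -9999, 0 },
        { 0, 9999, 9999 },
        { 0, -999, -999 },
    };

    private CompressionCheck() { }

    /** The sum constraint C = A + B. */
    static boolean isValid(int[] record) {
        return record[2] == record[0] + record[1];
    }

    static boolean isEsv(int value) {
        for (int esv : ESVS) {
            if (esv == value) {
                return true;
            }
        }
        return false;
    }

    /** Number of fields of a record that hold an ESV. */
    static int esvCount(int[] record) {
        int count = 0;
        for (int value : record) {
            if (isEsv(value)) {
                count++;
            }
        }
        return count;
    }

    /** Number of distinct (field, ESV) pairs that appear in at least one record. */
    static int coveredPairs(int[][] records) {
        int covered = 0;
        for (int field = 0; field != 3; field++) {
            for (int esv : ESVS) {
                for (int[] record : records) {
                    if (record[field] == esv) {
                        covered++;
                        break; // this pair is covered, continue with the next ESV
                    }
                }
            }
        }
        return covered;
    }

    public static void main(String[] args) {
        for (int[] record : RECORDS) {
            System.out.println(isValid(record) + " " + esvCount(record));
        }
        System.out.println(coveredPairs(RECORDS)); // prints 9
    }
}
```

<p>Four valid records thus cover the same nine wishes for which the naive generator needs nine records.</p>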
<h2>Automated Test Data Generation</h2>
<p>Let us briefly summarize what we have learnt so far about the problem of generating test data. The test data sets should be of <em>high quality</em>. By this, we mean:</p>
<ul>
<li><strong>high coverage</strong> of test cases, and also</li>
<li><strong>high compression</strong>, as measured by the average density of extreme and special values (ESVs) in the data records.</li>
</ul>
<p>How can the R-TDG fulfill these requirements and produce valuable test data for functional tests?</p>
<h3>Constraint-based Test Data Generation</h3>
<p>The key idea behind the R-TDG is a really simple one: produce the test data directly from the set of validation constraints. This idea is not a particularly new one. However, after decades of research, there are only very few solutions working in practice, and these have a very limited range of applicability, see the figure below.</p>
<p>At mgm technology partners, we greatly benefit from the parallel development of the rule-based validation framework. This framework captures the validation logic of a form-centric application in a central rule-base, where individual constraints are associated with the validation rules (see right side of the figure below).</p>
<div id="attachment_811" class="wp-caption alignnone" style="width: 490px"><a rel="lightbox-test-data-generation-part2" href="http://blog.mgm-tp.com/wp-content/uploads/2010/12/Fig-2-Constraints.png"><img src="http://blog.mgm-tp.com/wp-content/uploads/2010/12/Fig-2-Constraints-480x208.png" alt="" title="Fig 2 - Constraints" width="480" height="208" class="size-large wp-image-811" /></a><p class="wp-caption-text">The traditional way of arriving at constraints (left side). The source code of the application is analyzed, and the conditional expressions are examined. A set of constraints is extracted, which forms the basis for the constraint solver inside the test data generator. Right side: The novel way of arriving at constraints. The constraints are centrally collected in a rule base, from which a code generator generates the validator code for the application. The same validation rules are ingested by the constraint solver, which uses them for producing test data records.</p></div>
<p>The process of generating a test data record from constraints and the process of validating a data record are intimately related.  Actually these processes are inverses of each other. What do I mean by that? First, consider the <strong>validation process</strong> as illustrated in the figure below.</p>
<div id="attachment_812" class="wp-caption alignnone" style="width: 490px"><a rel="lightbox-test-data-generation-part2" href="http://blog.mgm-tp.com/wp-content/uploads/2010/12/Fig-3-Data-Record-Validation.png"><img src="http://blog.mgm-tp.com/wp-content/uploads/2010/12/Fig-3-Data-Record-Validation-480x182.png" alt="" title="Fig 3 - Data Record Validation" width="480" height="182" class="size-large wp-image-812" /></a><p class="wp-caption-text">Validation of a data record: The input (left) to the validator consists of the data record and the validation rules (bottom). The output (right) is a Boolean value: 'valid'/true for a valid data set, and 'invalid'/false (plus one or more error messages) for an invalid one.</p></div>
<p>In the validation framework, the rule base is compiled into executable validator code. When, in a positive test, a valid data record is being fed to the application, its validator will respond with &#8220;valid&#8221;. Conversely, when, in a negative test, an invalid data record is being fed in, the validator will respond with &#8220;not valid&#8221;. In addition, the application will usually return one or more error messages identifying the cause of the problem.</p>
<p>Next, turn to the generation of test data (see the figure below). The same validation rules that are compiled into the validator are being interpreted by the Rule-Based Test Data Generator (R-TDG).</p>
<div id="attachment_813" class="wp-caption alignnone" style="width: 490px"><a rel="lightbox-test-data-generation-part2" href="http://blog.mgm-tp.com/wp-content/uploads/2010/12/Fig-4-Test-Data-Generation.png"><img src="http://blog.mgm-tp.com/wp-content/uploads/2010/12/Fig-4-Test-Data-Generation-480x187.png" alt="" title="Fig 4 - Test Data Generation" width="480" height="187" class="size-large wp-image-813" /></a><p class="wp-caption-text">High-level view of the constraint-based generation of test data: The main input to the data generator consists of the same validation rules (top) that form the input to the validator. In addition there is auxiliary input (right) such as a Boolean value indicating whether the data set shall be valid or invalid, and if invalid, which error conditions are supposed to be violated. The output (left) of the data generator is a corresponding valid or invalid test data record.</p></div>
<p>When both valid and invalid records may be produced, we need a Boolean value stating whether a valid or an invalid data record is requested (see bottom of the figure above). When a valid record is requested, the R-TDG will produce one, if possible. When an invalid record is requested, the user will normally also specify a constraint to be violated. Again, if possible, the R-TDG will generate an invalid test data record that violates the specified constraint, and no other.</p>
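<p>The inverse relationship between validation and generation can be illustrated with a toy example. In the sketch below the same array of rules drives both a validator (record in, Boolean out) and a generator (request in, record out). Everything here is an assumption made for illustration: the class <code>InverseSketch</code>, the two rules and the tiny value domain are invented, and the brute-force search merely stands in for the constraint solver of the R-TDG, whose workings are not described in this article.</p>

```java
/** Toy illustration of validation and generation as inverse processes. */
final class InverseSketch {

    /** A validation rule over a record with the three integer fields A, B, C. */
    interface Rule {
        boolean holds(int[] record);
    }

    // Two made-up rules forming the shared "rule base".
    static final Rule SUM = r -> r[2] == r[0] + r[1];
    static final Rule A_NOT_NEGATIVE = r -> r[0] >= 0;
    static final Rule[] RULES = { SUM, A_NOT_NEGATIVE };

    private InverseSketch() { }

    /** Validation: a record goes in, a Boolean comes out. */
    static boolean validate(int[] record) {
        for (Rule rule : RULES) {
            if (!rule.holds(record)) {
                return false;
            }
        }
        return true;
    }

    /** True if the record breaks exactly the given rule (or none, if violated is null). */
    static boolean matches(int[] record, Rule violated) {
        for (Rule rule : RULES) {
            boolean shouldHold = rule != violated;
            if (rule.holds(record) != shouldHold) {
                return false;
            }
        }
        return true;
    }

    /**
     * Generation: a request goes in, a record comes out. Pass null to request
     * a valid record, or a rule to request a record violating that rule and
     * no other. Returns null if the small search domain contains no match.
     */
    static int[] generate(Rule violated) {
        int[] domain = { -1, 0, 1, 2 };
        for (int a : domain) {
            for (int b : domain) {
                for (int c : domain) {
                    int[] candidate = { a, b, c };
                    if (matches(candidate, violated)) {
                        return candidate;
                    }
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(validate(generate(null))); // prints true
        System.out.println(validate(generate(SUM)));  // prints false
    }
}
```

<p>Because both methods read the same rules, a record produced for a positive test passes the validator by construction, and a record produced for a negative test fails it for exactly the requested reason.</p>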
<h2>Summary and Outlook</h2>
<p>In this article I have discussed in some detail positive functional tests of form-centric applications. We have seen again that, in view of the complexity of the problem, severe time constraints, and particularly economic factors, it is necessary to automatically generate the data sets for such tests. In order to put the application under pressure, the test data records have to contain extreme or special values (ESVs) for all the fields on the forms. Due to time-limitations for functional tests, the test data sets have to be highly compressed, i.e. they should contain these ESVs in as few data records as possible. We have briefly discussed the general constraint-based approach used in the R-TDG for automatic generation of test data. As we have seen this approach actually inverts the process of data validation. How this inversion process is going to be accomplished will be the subject of another article in this blog series.</p>
	<p><em><small>(c) 2012 <a href="http://www.mgm-tp.com">mgm technology partners</a>. This posting "<a href="http://blog.mgm-tp.com/2010/12/test-data-generation-part2/">Producing High-Quality Test Data - Test Data Generation for Form-Centric Applications, Part 2</a>" is part of the <a href="http://blog.mgm-tp.com">mgm technology blog</a>. The author of the posting is
	<a href="http://blog.mgm-tp.com/?author=14" title="View articles by Dr. Hans-Martin Adorf">Dr. Hans-Martin Adorf</a>.
	</small></em></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.mgm-tp.com/2010/12/test-data-generation-part2/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<series:name><![CDATA[Test Data Generation for Form-Centric Applications]]></series:name>
	</item>
		<item>
		<title>Developers, Don’t Write Functional Tests! - A Practitioner's Guide to Successful Software Testing, Part 1</title>
		<link>http://blog.mgm-tp.com/2010/11/successful-software-testing-part1/</link>
		<comments>http://blog.mgm-tp.com/2010/11/successful-software-testing-part1/#comments</comments>
		<pubDate>Mon, 22 Nov 2010 15:26:23 +0000</pubDate>
		<dc:creator>Dr. Martin Varendorff</dc:creator>
				<category><![CDATA[Projects]]></category>
		<category><![CDATA[Tips]]></category>
		<category><![CDATA[JUnit]]></category>
		<category><![CDATA[QA]]></category>
		<category><![CDATA[Testing]]></category>

		<guid isPermaLink="false">http://blog.mgm-tp.com/?p=777</guid>
		<description><![CDATA[Over the past few years I have noticed that the distinction between functional tests and unit tests has blurred in a lot of projects. I think that using the features of modern testing frameworks like JUnit and TestNG to push functional tests into unit tests is the wrong approach, because it shifts the focus of [...]]]></description>
			<content:encoded><![CDATA[<p>Over the past few years I have noticed that the distinction between functional tests and unit tests has blurred in a lot of projects. I think that using the features of modern testing frameworks like JUnit and TestNG to push functional tests into unit tests is the wrong approach, because it shifts the focus of the test from the test perspective to the development perspective. In this blog post, I explain in detail how I have come to this conclusion.</p>
<p><span id="more-777"></span></p>
<p>In order to facilitate the discussion I want to elaborate the arguments using a simplified example, the development of an online shop. Let&#8217;s call the artefact of the development effort a <em>product</em>. And please note that we are talking about <em>automated</em> functional tests only.</p>
<p>Let&#8217;s focus first on the information flow of the development process. The stakeholder of the shop to be built has a mental picture of the shop in his mind and creates a story board and some business requirements (see figure 1). The analyst studies the story board and the business requirements, analyzes the business processes, and creates a specification with detailed functional and non-functional requirements. The developer reads the detailed requirements and to some degree all other documents created so far and analyzes them, too. The developer creates additional documents (architecture, design, technical requirements), produces the source code (e.g. Java, HTML, SQL &#8230;) and the build files.</p>
<h2>Imprecise Information Flow</h2>
<p>On the way from the stakeholder to the developer, there is a lot of loss or change or even creation of information. The written story board and the business requirements deviate already from the original business model in the mind of the <strong>stakeholder</strong>.</p>
<p>The <strong>analyst</strong> reads and (mis-)interprets the written documents, builds a model of his own, and creates further documents, the content of which deviates even further from the mental picture of the stakeholder.</p>
<p>Eventually, the <strong>developer</strong> reads some or all of the documents, builds a development model in his mind and creates architecture and design documents, and finally also code and configuration, which in their functionality will be quite distant from the intention of the stakeholder.</p>
<p>Lots of meetings are organized to try and minimize the discrepancies in understanding caused by imprecise information flow; numerous development iterations are planned to narrow the gap between the intended shop and the created shop.</p>
<p>An additional effort to reach the goal is invested in the area of quality assurance. Functional tests are created to check whether the shop works as specified. Information loss occurs here as well: The <strong>tester</strong> reads some or all of the created documents (perhaps including the development documents) and builds a model of the shop and how to test it in his mind. Based on this model, the tester creates test descriptions (specifications, test cases &#8230;), and finally functional test code, test data and a test configuration.</p>
<h2>The Functional Tester as an Independent Institution</h2>
<p>The tester builds functional tests from the perspective of the end user of the system, and his scope is the entire system including neighbouring systems, i.e. end-to-end. This is usually much closer to the perspective of the stakeholder than the perspective of the developer, who wants to test what he has built and not what should have been built. The functional tests execute paths through the application (use cases), simulating expected behaviour. First the main paths as described in the story board are tested, then less important paths.</p>
<p>The tester goes through the same steps as the analyst and the developer, but (more or less) independently of them. So due to the imprecise information flow and the bugs in the production code and the test code, the resulting functional tests will not match the implementation of the shop. The tester finds those mismatches, which show up as failing test cases. He then discusses those mismatches with the developer and sometimes with the analyst or even the stakeholder. Together, they clarify whether the implementation of the shop or of the test case is wrong (or both), resulting either in fixes to the test case implementation or fixes to the shop implementation. The number of defects found in the product depends (among other things) on the test coverage of the functional tests, the completeness of the specification, and how independently the tester and the developer are working and thinking.</p>
<div id="attachment_779" class="wp-caption alignnone" style="width: 490px"><a rel="lightbox-software-testing-1" href="http://blog.mgm-tp.com/wp-content/uploads/2010/11/Information-Flow-for-Functional-Tests.png"><img src="http://blog.mgm-tp.com/wp-content/uploads/2010/11/Information-Flow-for-Functional-Tests-480x428.png" alt="" title="Information Flow for Functional Tests" width="480" height="428" class="size-large wp-image-779" /></a><p class="wp-caption-text">The information flow between Business, Development and Quality Assurance regarding functional tests. The stakeholder on the business side describes the business story and defines high-level requirements. An analyst creates a more detailed description of the artefacts to be created by development. The developer uses all or part of the descriptions for developing the application; the tester also uses all or part of the descriptions for creating functional tests.</p></div>
<h2>The Four-Eye Principle for Functional Testing</h2>
<p>This approach ensures that the documents of the stakeholder and the analyst, and to some degree also those of the developer, are studied <strong>twice and independently</strong> and from <strong>different perspectives</strong> (four-eye principle). This raises the chance of detecting flaws in the documents, which is especially important considering that up to 50% of the defects in a product result from wrong requirements. See e.g. the article <a href="http://www.isixsigma.com/index.php?option=com_k2&#038;view=item&#038;id=1376:software-defect-prevention-in-a-nutshell&#038;Itemid=121">Software Defect Prevention &#8211; In a Nutshell</a>.</p>
<p>In addition, a misinterpretation of requirements is less likely to lead to a defect in the product, because the developer and the tester interpret the documents independently of each other. In particular, the different perspective and focus of the tester (&#8220;How can I test this?&#8221;) adds a new dimension in thinking, which often leads to finding gaps in the requirements. There is a crude (and, to the tester, rude) argument for it: the developer has a constructive mind-set and the tester a destructive one. A tester of course has to be creative as well in order to create good functional tests, but the intention to focus on checking the details and thereby bring down the application is very helpful.</p>
<h2>Contrasting Unit and Functional Tests</h2>
<p>A unit test has a different focus than a functional test. It is created in order to check that the functionality promised in the API of the class is fulfilled. I know that a unit test checks a function and can be called a functional test, but I do not want to discuss terminology here. A unit test is a test on a very small scale. It tests a small box with well defined input and well defined output. It is created to stabilize the code and to help the developer change or reorganize existing code. Most classes are not created to fulfil exactly one requirement but are part of a component, which fulfils a set of requirements.</p>
<p>While constructing the class, the developer does not have the business requirements in mind but his own set of requirements for this part of the product, so that it fulfils its tasks within the component (see figure 2). Let&#8217;s have a look at a class implementing a caching algorithm or a special sorting algorithm: often there is no direct business requirement for this, but the component under construction has concrete needs (development requirements) for such a class. So the developer knows very well what to expect from the class and is therefore the right person to develop the test for it. Here we have an ideal case regarding the flow of information. Everything happens within the mind of one person only: the developer. So there is no misinterpretation of documents or misunderstanding in discussions.</p>
<div id="attachment_789" class="wp-caption alignnone" style="width: 490px"><a rel="lightbox-software-testing-1" href="http://blog.mgm-tp.com/wp-content/uploads/2010/11/Information-Flow-for-Unit-Tests1.png"><img src="http://blog.mgm-tp.com/wp-content/uploads/2010/11/Information-Flow-for-Unit-Tests1-480x251.png" alt="" title="Information Flow for Unit Tests" width="480" height="251" class="size-large wp-image-789" /></a><p class="wp-caption-text">Information flow for the development of unit tests. The developer builds up a mental model of the API of a class and how to test this class. Then the developer creates the unit tests, test data and the code in any sequence suitable to the problem and development model.</p></div>
<p>To summarize the argument, I want to emphasize the following points:</p>
<ul>
<li>A strong distinction between functional tests and unit tests is important.</li>
<li>Unit tests are written by the developer.</li>
<li>Functional tests are created by the tester.</li>
</ul>
<p>In the next post, I will discuss why I do not want functional tests to run in the build environment.</p>
	<p><em><small>(c) 2012 <a href="http://www.mgm-tp.com">mgm technology partners</a>. This posting "<a href="http://blog.mgm-tp.com/2010/11/successful-software-testing-part1/">Developers, Don&#8217;t Write Functional Tests! - A Practitioner's Guide to Successful Software Testing, Part 1</a>" is part of the <a href="http://blog.mgm-tp.com">mgm technology blog</a>. The author of the posting is
	<a href="http://blog.mgm-tp.com/?author=15" title="View articles by Dr. Martin Varendorff">Dr. Martin Varendorff</a>.
	</small></em></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.mgm-tp.com/2010/11/successful-software-testing-part1/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<series:name><![CDATA[A Practitioner's Guide to Successful Software Testing]]></series:name>
	</item>
		<item>
		<title>Using Power Architect and Liquibase Efficiently - Tool-supported Data Modeling and Change Management, Part 2</title>
		<link>http://blog.mgm-tp.com/2010/11/data-modeling-part2/</link>
		<comments>http://blog.mgm-tp.com/2010/11/data-modeling-part2/#comments</comments>
		<pubDate>Wed, 17 Nov 2010 18:45:40 +0000</pubDate>
		<dc:creator>Ursula Bauer</dc:creator>
				<category><![CDATA[Tips]]></category>
		<category><![CDATA[Database]]></category>
		<category><![CDATA[LiquiBase]]></category>
		<category><![CDATA[Modeling]]></category>
		<category><![CDATA[Power Architect]]></category>
		<category><![CDATA[SQL]]></category>

		<guid isPermaLink="false">http://blog.mgm-tp.com/?p=757</guid>
		<description><![CDATA[In the first part of this blog series, I motivated the combined use of SQL Power Architect and LiquiBase. This part takes a closer look at how to actually work with our tool chain. My colleague Thomas Kellerer, who is also involved in the Power Architect development, provides some great tips and tricks.

Before we start, [...]]]></description>
			<content:encoded><![CDATA[<p>In the <a href="http://blog.mgm-tp.com/2010/10/data-modeling-part1/">first part</a> of this blog series, I motivated the combined use of <a href="http://code.google.com/p/power-architect/">SQL Power Architect</a> and <a href="http://www.liquibase.org/">LiquiBase</a>. This part takes a closer look at how to actually work with our tool chain. My colleague Thomas Kellerer, who is also involved in the Power Architect development, provides some great tips and tricks.</p>
<p><span id="more-757"></span></p>
<p>Before we start, let me lay down a few preliminaries of the example. At mgm, we start with a blank slate only rarely when it comes to data models. Instead, in the majority of cases, we start with some kind of existing data model or database and adapt it. So in our example we assume that we already have either an existing database or a data model (Power Architect file) to start from.</p>
<h2>Getting the Physical Data Model into Power Architect</h2>
<p>When starting with an existing database, i.e. reverse engineering, we simply add the connection information to Power Architect, which then automatically retrieves the metadata about the tables, relations and key constraints for us. This approach will get you a working model. However, be aware that Power Architect is not yet able to automatically reverse engineer more sophisticated concepts such as check constraints, non-standard indices (e.g. Oracle&#8217;s bitmap indexes), triggers, views or stored procedures. You have to extract this information manually from your existing database and add it to LiquiBase separately as special &#8220;Custom SQL&#8221; statements. We will deal with this issue in more depth later.</p>
<p><strong>Tip</strong>: Before you start editing a file, save the original state as a copy. This way you can later diff the original and modified versions, for example to check whether all the intended changes have been made. Even if you use a versioning tool like Subversion, where you can check out any historical state, keeping a &#8220;diff&#8221; copy at hand while editing can save time and effort.</p>
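<p>As a rough sketch of what such a manual entry might look like, the following Changeset wraps a hand-written statement in LiquiBase&#8217;s sql tag and restricts it to Oracle via the dbms attribute. The table, column and index names as well as the author and id values are hypothetical and only serve as an illustration:</p>
<pre class="brush: xml;">
&lt;!-- Hypothetical example: a bitmap index that Power Architect does not
     reverse engineer, entered by hand as custom SQL --&gt;
&lt;changeSet author=&quot;1.0&quot; id=&quot;7&quot; dbms=&quot;oracle&quot;&gt;
  &lt;sql&gt;
    CREATE BITMAP INDEX IDX_FILM_RATING
      ON FILM (RATING)
  &lt;/sql&gt;
&lt;/changeSet&gt;
</pre>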
<h2>Export Options in Power Architect</h2>
<p>Now we edit the model; for example, we add new tables and relations. Let&#8217;s take the following film database and add awards information:</p>
<div id="attachment_758" class="wp-caption alignnone" style="width: 490px"><a rel="lightbox-datamodel-part2" href="http://blog.mgm-tp.com/wp-content/uploads/2010/11/Existing-Data-Model.png"><img src="http://blog.mgm-tp.com/wp-content/uploads/2010/11/Existing-Data-Model-480x237.png" alt="" title="Existing Data Model" width="480" height="237" class="size-large wp-image-758" /></a><p class="wp-caption-text">The existing Data Model we use as a starting point in our project.</p></div>
<div id="attachment_759" class="wp-caption alignnone" style="width: 490px"><a rel="lightbox-datamodel-part2" href="http://blog.mgm-tp.com/wp-content/uploads/2010/11/Modified-Data-Model.png"><img src="http://blog.mgm-tp.com/wp-content/uploads/2010/11/Modified-Data-Model-480x216.png" alt="" title="Modified Data Model" width="480" height="216" class="size-large wp-image-759" /></a><p class="wp-caption-text">Modified data model (new tables film_awards and awards to the right).</p></div>
<p>This is only a very small example. In a real project there are usually many more tables and we can add or delete tables, columns, relationships or rename them, add or delete key constraints or indices or even drag and drop tables or columns from another model that we opened in parallel. </p>
<p>Once we have adapted the existing data model (or created a new one), we add the new data model version as a .architect file to Subversion. The next step is to export the physical data model, or the changes to the model, so that we can put it into a Liquibase Changeset and/or create documentation.</p>
<p><strong>We usually export the complete model to a Liquibase Changelog only once</strong> and later on export only the changes as Changesets. (The Changeset export feature was added to Power Architect by my colleague Thomas Kellerer.) The Changesets are then manually appended to the project&#8217;s Liquibase Changelog. This once again saves time and effort.</p>
<p>To export the whole model as a Changelog, we choose &#8220;Tools &#8211; Forward Engineer&#8230;&#8221; in the Power Architect menu. This opens a dialog that offers not only to create SQL scripts directly but also to create a Liquibase Changelog, with various options to choose from. We will deal later with the question of which option to choose when and which values to enter.</p>
<div id="attachment_760" class="wp-caption alignnone" style="width: 457px"><a rel="lightbox-datamodel-part2" href="http://blog.mgm-tp.com/wp-content/uploads/2010/11/Exporting-complete-Liquibase-Changelog.png"><img src="http://blog.mgm-tp.com/wp-content/uploads/2010/11/Exporting-complete-Liquibase-Changelog.png" alt="" title="Exporting complete Liquibase Changelog" width="447" height="255" class="size-full wp-image-760" /></a><p class="wp-caption-text">Export the complete model to a Liquibase XML Changelog using Tools - Forward Engineer...</p></div>
<p>To create a working initial Changelog this full export has to be supplemented by the Liquibase start and end tags, which we usually accomplish by pasting the export into a <strong>previously prepared .xml</strong> file between the respective tags:</p>
<pre class="brush: xml;">
&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt;
&lt;databaseChangeLog
  xmlns=&quot;http://www.liquibase.org/xml/ns/dbchangelog/1.9&quot;
  xmlns:xsi=&quot;http://www.w3.org/2001/XMLSchema-instance&quot;
  xsi:schemaLocation=&quot;http://www.liquibase.org/xml/ns/dbchangelog/1.9
    http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-1.9.xsd&quot;&gt;

  &lt;!-- PASTE THE CHANGELOG INFORMATION EXPORTED FROM POWER ARCHITECT HERE --&gt;

&lt;/databaseChangeLog&gt;
</pre>
<p>Once the initial Changelog has been set up, any future changes are exported as deltas, i.e. only the changes made since the last export, and appended to the existing Changelog. To identify the changes, we choose &#8220;Tools &#8211; Compare data models&#8221; in the Power Architect menu and compare against the initial unmodified .architect file that we saved as a working copy as mentioned before (or, if we forgot this step, we check out the initial version once more from Subversion).</p>
<p>The pop-up dialog allows us to choose what we want to compare and how the output of the comparison shall be presented.  </p>
<div id="attachment_761" class="wp-caption alignnone" style="width: 489px"><a rel="lightbox-datamodel-part2" href="http://blog.mgm-tp.com/wp-content/uploads/2010/11/Compare-Data-Models.png"><img src="http://blog.mgm-tp.com/wp-content/uploads/2010/11/Compare-Data-Models-479x401.png" alt="" title="Compare Data Models" width="479" height="401" class="size-large wp-image-761" /></a><p class="wp-caption-text">Use Compare Data Models to produce changes as a Liquibase Changeset.</p></div>
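<p>To give an idea of what such a delta might look like once it is appended to the Changelog, here is a hand-written sketch for the two new tables film_awards and awards from our example. It is not actual Power Architect output: the column names, constraint name, author and id values are assumptions made for illustration only.</p>
<pre class="brush: xml;">
&lt;!-- Changesets from the initial full export stay untouched above this point --&gt;

&lt;changeSet author=&quot;1.1&quot; id=&quot;12&quot;&gt;
  &lt;createTable tableName=&quot;AWARDS&quot;&gt;
    &lt;!-- hypothetical columns --&gt;
    &lt;column name=&quot;AWARD_ID&quot; type=&quot;INTEGER&quot;&gt;
      &lt;constraints primaryKey=&quot;true&quot; nullable=&quot;false&quot;/&gt;
    &lt;/column&gt;
    &lt;column name=&quot;NAME&quot; type=&quot;VARCHAR(100)&quot;/&gt;
  &lt;/createTable&gt;
&lt;/changeSet&gt;

&lt;changeSet author=&quot;1.1&quot; id=&quot;13&quot;&gt;
  &lt;createTable tableName=&quot;FILM_AWARDS&quot;&gt;
    &lt;column name=&quot;FILM_ID&quot; type=&quot;INTEGER&quot;&gt;
      &lt;constraints nullable=&quot;false&quot;/&gt;
    &lt;/column&gt;
    &lt;column name=&quot;AWARD_ID&quot; type=&quot;INTEGER&quot;&gt;
      &lt;constraints nullable=&quot;false&quot;/&gt;
    &lt;/column&gt;
  &lt;/createTable&gt;
&lt;/changeSet&gt;

&lt;changeSet author=&quot;1.1&quot; id=&quot;14&quot;&gt;
  &lt;addForeignKeyConstraint constraintName=&quot;FK_FILM_AWARDS_AWARDS&quot;
                           baseTableName=&quot;FILM_AWARDS&quot;
                           baseColumnNames=&quot;AWARD_ID&quot;
                           referencedTableName=&quot;AWARDS&quot;
                           referencedColumnNames=&quot;AWARD_ID&quot;/&gt;
&lt;/changeSet&gt;
</pre>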
<p>As you can see in the screenshots above, the Export as well as the Compare Screen offers a variety of formatting options for the resulting Liquibase Changesets. Let us now examine which choices we have found helpful and which choice applies in which situation. </p>
<h2>Choosing the Optimal Liquibase Changeset Size</h2>
<p>A LiquiBase Changelog is divided into separate Changesets that identify sets of changes. As indicated in the screenshots above, Power Architect has offered an option to &#8220;Generate separate changesets&#8221; since version 1.0.0. But why not keep all changes of an edit session in one big Changeset? Wouldn&#8217;t that be easier? The answer is that bigger is not always better: the best size of your Changesets depends on the kind of DBMS you are using.</p>
<p><strong>When using a DBMS that does not support transactional DDL</strong> (e.g. Oracle, MySQL), each DDL statement should be put into its own Changeset. The reason is that if a Changeset fails, this makes fixing the cause of the failure a lot easier.</p>
<p>Consider the following Changeset:</p>
<pre class="brush: xml;">
&lt;changeSet author=&quot;1.0&quot; id=&quot;42&quot;&gt;
  &lt;dropForeignKeyConstraint baseTableName=&quot;TABLE_1&quot;
                            constraintName=&quot;FK_TABLE_1_2&quot;/&gt;
  &lt;dropForeignKeyConstraint baseTableName=&quot;TABLE_2&quot;
                            constraintName=&quot;FK_TABLE_2_3&quot;/&gt;

  &lt;dropTable tableName=&quot;TABLE_1&quot;/&gt;
  &lt;dropTable tableName=&quot;TABLE_2&quot;/&gt;

  &lt;addColumn tableName=&quot;OLD_TABLE&quot;&gt;
    &lt;column name=&quot;NEW_COLUMN&quot; type=&quot;INTEGER&quot;
             defaultValueNumeric=&quot;0&quot;&gt;
      &lt;constraints nullable=&quot;false&quot;/&gt;
    &lt;/column&gt;
  &lt;/addColumn&gt;
  &lt;sql&gt;
    ALTER TABLE OLD_TABLE
      ADD CONSTRAINT CHECK_NEW_COLUMN
      CHECK (NEW_COLUMN IN (0,1))
  &lt;/sql&gt;

  &lt;dropPrimaryKey tableName=&quot;TABLE_4&quot;
                  constraintName=&quot;PK_TABLE_4Y&quot;/&gt;
  &lt;dropColumn tableName=&quot;TABLE_4&quot; columnName=&quot;OLD_COL_4&quot;/&gt;
  &lt;addPrimaryKey tableName=&quot;TABLE_4&quot;
                 constraintName=&quot;PK_TABLE_4&quot;
                 columnNames=&quot;COL_1,COL_2,COL_3&quot;/&gt;
&lt;/changeSet&gt;
</pre>
<p>The change to the primary key might fail if there is data in the database that prevents it. If this step fails, the preceding DDL statements have already been applied and, without transactional DDL, cannot be rolled back. It does not help to fix the data and then re-run the Changelog, as the Changeset will now fail at the very first statement (the foreign key it tries to drop no longer exists).</p>
<p>If each of the DDL statements is put into its own Changeset, LiquiBase knows that the previous Changesets have already been applied and will not run them again. Thus, fixing our problem is only a matter of fixing the data in TABLE_4 and re-running the whole Changelog.</p>
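<p>As a sketch, the last three statements of the Changeset above could be split as follows. The id values are an assumption: they simply continue the numbering from 42, with the six preceding statements taking the ids 42 to 47.</p>
<pre class="brush: xml;">
&lt;!-- ids 42 to 47: one Changeset each for the two dropForeignKeyConstraint,
     the two dropTable, the addColumn and the sql statement (omitted here) --&gt;

&lt;changeSet author=&quot;1.0&quot; id=&quot;48&quot;&gt;
  &lt;dropPrimaryKey tableName=&quot;TABLE_4&quot;
                  constraintName=&quot;PK_TABLE_4Y&quot;/&gt;
&lt;/changeSet&gt;

&lt;changeSet author=&quot;1.0&quot; id=&quot;49&quot;&gt;
  &lt;dropColumn tableName=&quot;TABLE_4&quot; columnName=&quot;OLD_COL_4&quot;/&gt;
&lt;/changeSet&gt;

&lt;changeSet author=&quot;1.0&quot; id=&quot;50&quot;&gt;
  &lt;addPrimaryKey tableName=&quot;TABLE_4&quot;
                 constraintName=&quot;PK_TABLE_4&quot;
                 columnNames=&quot;COL_1,COL_2,COL_3&quot;/&gt;
&lt;/changeSet&gt;
</pre>
<p>If one of these Changesets fails, all Changesets before it are already recorded as executed in LiquiBase&#8217;s DATABASECHANGELOG table, so the next run resumes with the one that failed.</p>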
<p>On the other hand, <strong>if you are using a DBMS that supports transactional DDL</strong> (e.g. PostgreSQL, DB2), DDL statements that belong together should be put into one single Changeset, so they will either succeed or fail together.</p>
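<p>For the primary key change from the example above, such a grouping might look like the following sketch. The id value is hypothetical, and the sketch assumes that the Changeset is executed within a single transaction:</p>
<pre class="brush: xml;">
&lt;changeSet author=&quot;1.0&quot; id=&quot;48&quot;&gt;
  &lt;!-- With transactional DDL, these three steps are committed together
       or rolled back together --&gt;
  &lt;dropPrimaryKey tableName=&quot;TABLE_4&quot;
                  constraintName=&quot;PK_TABLE_4Y&quot;/&gt;
  &lt;dropColumn tableName=&quot;TABLE_4&quot; columnName=&quot;OLD_COL_4&quot;/&gt;
  &lt;addPrimaryKey tableName=&quot;TABLE_4&quot;
                 constraintName=&quot;PK_TABLE_4&quot;
                 columnNames=&quot;COL_1,COL_2,COL_3&quot;/&gt;
&lt;/changeSet&gt;
</pre>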
<h2>Identifying Liquibase Changesets with ID/author tags</h2>
<p>LiquiBase uses the attributes id and author (in connection with the file name) to uniquely identify every Changeset. The Export and Compare dialogs offer the option to add author and ID automatically, but we have often found that it is not initially clear what should be entered here. There are no hard rules, but we have found it highly advisable to agree on some kind of convention to make everyone&#8217;s life easier.</p>
<p>We find that the simplest way of using the ID is to use incremental numbers to keep a sequence inside the Changelog. This is also what the Power Architect export supports. Just set &#8220;Start at&#8221; to the last ID number+1 and tick the &#8220;Generate ID attribute&#8221; option to let the Export handle the ID assignment for you. Quite a time-saving option, isn&#8217;t it? (Liquibase also allows non-numeric IDs, but there is currently no way to generate them automatically.)</p>
<p>As different people will add adaptations to the data model, it is good practice to use an author tag that helps to understand when a change was added and by which person or for which version. There are various practical conventions: for example, some projects use the Maven source code version (pom) as author tag, while others use the developer&#8217;s username. The most important thing is not what is chosen but that the usage is consistent.</p>
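<p>A small sketch of one such convention, assuming the project version is used as author and the last Changeset in the Changelog carries the ID 57 (all values, table and column names are made up for illustration):</p>
<pre class="brush: xml;">
&lt;!-- Last Changeset already present in the Changelog --&gt;
&lt;changeSet author=&quot;1.2.0&quot; id=&quot;57&quot;&gt;
  &lt;dropColumn tableName=&quot;FILM&quot; columnName=&quot;LEGACY_CODE&quot;/&gt;
&lt;/changeSet&gt;

&lt;!-- Next export: &quot;Start at&quot; set to 58, author set to the new project version --&gt;
&lt;changeSet author=&quot;1.3.0&quot; id=&quot;58&quot;&gt;
  &lt;addColumn tableName=&quot;FILM&quot;&gt;
    &lt;column name=&quot;RELEASE_YEAR&quot; type=&quot;INTEGER&quot;/&gt;
  &lt;/addColumn&gt;
&lt;/changeSet&gt;
</pre>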
<h2>Generating Documentation</h2>
<p>A last word on exporting from Power Architect: by using a custom XSLT template, we use the HTML Report generator to create Wiki documentation on the fly from the physical data model in Power Architect. With a bit of knowledge of XSLT templates, many different forms of documentation can be created quickly and with little additional effort.</p>
<p>The next part will focus on LiquiBase and how to get the most out of it, e.g. how to partition the Changelog, using custom SQL statements for triggers, stored procedures, etc. and we will include a few tips concerning different DBMS.</p>
	<p><em><small>(c) 2012 <a href="http://www.mgm-tp.com">mgm technology partners</a>. This posting "<a href="http://blog.mgm-tp.com/2010/11/data-modeling-part2/">Using Power Architect and Liquibase Efficiently - Tool-supported Data Modeling and Change Management, Part 2</a>" is part of the <a href="http://blog.mgm-tp.com">mgm technology blog</a>. The author of the posting is
	<a href="http://blog.mgm-tp.com/?author=13" title="View articles by Ursula Bauer">Ursula Bauer</a>.
	</small></em></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.mgm-tp.com/2010/11/data-modeling-part2/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
	
		<series:name><![CDATA[Tool-supported Data Modeling and Change Management]]></series:name>
	</item>
	</channel>
</rss>

