<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:series="http://unfoldingneurons.com/" version="2.0">

<channel>
	<title>mgm technology blog</title>
	
	<link>http://blog.mgm-tp.com</link>
	<description>We discuss software innovation</description>
	<lastBuildDate>Tue, 08 May 2012 14:53:14 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.1</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<xhtml:meta xmlns:xhtml="http://www.w3.org/1999/xhtml" name="robots" content="noindex" />
		<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.feedburner.com/MgmTechBlog" /><feedburner:info xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" uri="mgmtechblog" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><feedburner:emailServiceId xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0">MgmTechBlog</feedburner:emailServiceId><feedburner:feedburnerHostname xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0">http://feedburner.google.com</feedburner:feedburnerHostname><item>
		<title>Building a scalable Web-based Call Center CTI Solution</title>
		<link>http://blog.mgm-tp.com/2012/05/scalable-web-cti-solution/</link>
		<comments>http://blog.mgm-tp.com/2012/05/scalable-web-cti-solution/#comments</comments>
		<pubDate>Tue, 08 May 2012 13:26:51 +0000</pubDate>
		<dc:creator>Lars Immisch</dc:creator>
				<category><![CDATA[Projects]]></category>
		<category><![CDATA[COMET]]></category>
		<category><![CDATA[HTML5]]></category>
		<category><![CDATA[Javascript]]></category>
		<category><![CDATA[Telephony]]></category>
		<category><![CDATA[Wicket]]></category>

		<guid isPermaLink="false">http://blog.mgm-tp.com/?p=1206</guid>
		<description><![CDATA[My project was part of our customer&#8217;s effort to replace all of the enterprise applications with web applications based on a standardized technology stack. In this strategic move, the call center integration was a crucial step. As it turned out, the technical design of the new call center telephony solution was quite challenging. We did [...]]]></description>
			<content:encoded><![CDATA[<p>My project was part of our customer&#8217;s effort to replace all of the enterprise applications with web applications based on a standardized technology stack. In this strategic move, the call center integration was a crucial step. As it turned out, the technical design of the new call center telephony solution was quite challenging. Not only did we learn a lot about <a href="http://en.wikipedia.org/wiki/Computer_telephony_integration">CTI</a>; we also had to make the system scalable enough to handle more than 1000 call center agents.</p>
<p><span id="more-1206"></span></p>
<p>The call center agents should mostly use the standard web applications, but with an additional telephony control that allows them to accept incoming calls, to disconnect calls, or to make consultation calls to other agents or supervisors.</p>
<h2>An incoming call</h2>
<p>Let&#8217;s have a look at the most important use case first: an incoming call.</p>
<p>The following diagram gives an overview of the flow of events before the agent&#8217;s telephone rings:</p>
<div id="attachment_1222" class="wp-caption alignnone" style="width: 490px"><a rel="lightbox-web-cti" href="http://blog.mgm-tp.com/wp-content/uploads/2012/05/Incoming-Flow.png"><img class="size-large wp-image-1222" src="http://blog.mgm-tp.com/wp-content/uploads/2012/05/Incoming-Flow-480x366.png" alt="" width="480" height="366" /></a><p class="wp-caption-text">The simplified flow of events before an agent receives a call.</p></div>
<p>The incoming call of the customer is handled by the <a href="http://en.wikipedia.org/wiki/Private_branch_exchange#Private_branch_exchange">PBX (Private Branch Exchange)</a>. When the agent finally takes the call, a lot of information about the customer has already been collected. In most cases, the caller will already have gone through an interactive voice response system that has collected his account number and verified his PIN (omitted from the picture above).</p>
<p>This is what the agent screen might look like after the agent has taken the call:</p>
<div id="attachment_1209" class="wp-caption alignnone" style="width: 490px"><a rel="lightbox-web-cti" href="http://blog.mgm-tp.com/wp-content/uploads/2012/05/Agentscreen.png"><img class="size-large wp-image-1209" src="http://blog.mgm-tp.com/wp-content/uploads/2012/05/Agentscreen-480x185.png" alt="" width="480" height="185" /></a><p class="wp-caption-text">Schematic screen of the call center agent UI.</p></div>
<p>The box on the left contains the telephony controls. They are embedded in an iframe and allow the agent to disconnect the call or place consultation calls to other agents. The telephony controls send commands to the gateway (and by extension to the PBX) and receive asynchronous events.</p>
<h2>CSTA as a Model for our Protocol</h2>
<p>Before it was even determined whether the telephone system should be integrated directly through the PBX or via an integration layer (Genesys), we decided to use the <a href="http://www.ecma-international.org/publications/standards/Ecma-269.htm">CSTA Phase III</a> communication protocol as a model for the protocol between gateway and browser. CSTA (Computer Supported Telecommunications Applications) is an ECMA standard (like JavaScript) and describes third-party call control using services and events. Third-party call control roughly means that the standard looks at an entire switch and all connected devices (telephones), and not just a single telephone. This point of view is reflected in the naming of the services and events. For example, when a call arrives at a terminal, the event is called <em>Delivered</em>. A sample event flow is given in the following diagram.</p>
<div id="attachment_1223" class="wp-caption alignnone" style="width: 489px"><a rel="lightbox-web-cti" href="http://blog.mgm-tp.com/wp-content/uploads/2012/05/CSTA-Message-exchange.png"><img class="size-full wp-image-1223" src="http://blog.mgm-tp.com/wp-content/uploads/2012/05/CSTA-Message-exchange.png" alt="" width="479" height="438" /></a><p class="wp-caption-text">Exemplary flow of CSTA services and events in our system.</p></div>
<p><em>Services</em> are commands to the telephone system. An outgoing call (from any device within the domain of the switch) is initiated by a <em>Make Call</em> Service. But there are also services like <em>Set Agent State</em>.</p>
<p>CSTA is extremely comprehensive; we used only a small selection of its services and events. It is also easy to extend &mdash; the transfer of non-standardized key/value pairs within the data part of services and events is explicitly provided for.</p>
<p>CSTA provides an <a href="http://en.wikipedia.org/wiki/Abstract_Syntax_Notation_One">ASN.1</a> and an XML encoding. Writing an ASN.1 parser in JavaScript was obviously not a good idea, and even the XML mapping is quite heavy-weight. We therefore designed our own transport encoding on top of <a href="http://www.json.org/">JSON</a> and built a <a href="http://en.wikipedia.org/wiki/Representational_state_transfer">REST</a>-inspired web service as a gateway to the PBX.</p>
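<p>To give an idea of what a JSON encoding of a CSTA-style event could look like, here is a minimal sketch in Java. It is not the encoding used in the project: the field names <code>event</code>, <code>device</code> and <code>userData</code> and the class <code>CstaJsonSketch</code> are assumptions made for illustration only.</p>

```java
import java.util.Map;

/** Hypothetical JSON encoding of a CSTA-style event; all field names are assumptions. */
final class CstaJsonSketch {

    /** Wraps a string in quotes and escapes the characters JSON does not allow unescaped. */
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters become numeric escapes.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    /** Encodes the event name, the affected device and non-standardized key/value data. */
    static String encodeEvent(String event, String device, Map<String, String> userData) {
        StringBuilder sb = new StringBuilder("{");
        sb.append("\"event\":").append(quote(event));
        sb.append(",\"device\":").append(quote(device));
        sb.append(",\"userData\":{");
        boolean first = true;
        for (Map.Entry<String, String> entry : userData.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            sb.append(quote(entry.getKey())).append(':').append(quote(entry.getValue()));
            first = false;
        }
        return sb.append("}}").toString();
    }
}
```

<p>The <code>userData</code> object corresponds to the non-standardized key/value pairs that CSTA allows in the data part of services and events. A real gateway would more likely use a JSON library than hand-written escaping.</p>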
<h2>Requirements</h2>
<p>The customer&#8217;s technology framework requirements were:</p>
<ul>
<li>Internet Explorer 8 as the browser for the call center agents</li>
<li><a href="http://wicket.apache.org/">Wicket</a> as web application framework for the call center application</li>
<li>Tomcat 7 as the web application server for both the call center web app and the gateways.</li>
</ul>
<p>The technical requirements were: minimal latency, high throughput, and high availability. An average delay below 150 ms was required for latency, i.e. a value slightly below the attention threshold. For the call center callers, very low latency is not crucial: most callers will have waited in the queue for minutes rather than seconds to reach a free agent anyway. But the new web application should, if at all possible, not worsen the ergonomics for the call center agents. In the end, this wasn&#8217;t a problem: during tests under moderate load, we achieved latencies below 80 ms.</p>
<p>High availability is an obvious requirement: if a call center with about 1000 agents fails, there will be many unhappy customers. On an unlucky day the failure will even be reported in the news. We solved the problem by designing for redundant server components and a low latency failover protocol. The actual web application uses Tomcat&#8217;s built-in clustering mechanism. We couldn&#8217;t reuse this for the telephony gateway, because the relevant state is distributed across the switches anyway.</p>
<p>The gateway has two essential reliability requirements:</p>
<ul>
<li>Commands to the telephone system have to be retried quickly if a gateway fails.</li>
<li>Telephony events must not be lost.</li>
</ul>
<p>The functional requirements were straightforward:</p>
<ul>
<li>Incoming and outgoing calls (simple call control)</li>
<li>Call forwarding (single-step/two-step transfer)</li>
<li>Forwarding to the IVR (Interactive Voice Response) — including customer dependent data — as well as routing back to the same agent that originally took the call</li>
<li>Setting and displaying the agent status</li>
</ul>
<h2>Architecture</h2>
<p>The architecture consists of several interconnected systems as shown in the diagram below:</p>
<ul>
<li>The call center agents&#8217; browser with the JavaScript/HTML,</li>
<li>Telephony-related systems (left): the <em>gateway</em> (a server-side web application running in Tomcat) and the <em>PBX</em>,</li>
<li>Call center web application (right): <em>Wicket-based web application</em> and its <em>database(s)</em>.</li>
</ul>
<p>The integration of telephony and web application happens in the browser. The web application includes our JavaScript library and a telephony control panel in an <em>iframe</em>.</p>
<div id="attachment_1225" class="wp-caption alignnone" style="width: 490px"><a rel="lightbox-web-cti" href="http://blog.mgm-tp.com/wp-content/uploads/2012/05/Layers.png"><img class="size-large wp-image-1225" src="http://blog.mgm-tp.com/wp-content/uploads/2012/05/Layers-480x316.png" alt="" width="480" height="316" /></a><p class="wp-caption-text">The architecture of our CTI solution based on web technology.</p></div>
<h2>Sending Server Events to the Browsers</h2>
<p>For redundancy, every client connects to both gateways and keeps the TCP connections open. This means that every application server (Tomcat) of the gateways has to hold nearly 1000 open connections. We use the <a href="http://tomcat.apache.org/tomcat-7.0-doc/aio.html">AIO interface of Tomcat 7</a>, so all these connections can be processed by a single thread. This greatly reduces memory requirements and scheduling overhead.</p>
<div id="attachment_1226" class="wp-caption alignnone" style="width: 490px"><a rel="lightbox-web-cti" href="http://blog.mgm-tp.com/wp-content/uploads/2012/05/Processes.png"><img class="size-large wp-image-1226" src="http://blog.mgm-tp.com/wp-content/uploads/2012/05/Processes-480x304.png" alt="" width="480" height="304" /></a><p class="wp-caption-text">Overview of our system architecture with redundant gateways and PBX systems.</p></div>
<p><a href="http://html5doctor.com/server-sent-events/">Server-sent events</a> (a.k.a. server push) were recently standardized as part of HTML5 in the <a href="http://dev.w3.org/html5/eventsource/#the-eventsource-interface">EventSource interface</a>. Another convenient way to implement bidirectional communication is <a href="http://dev.w3.org/html5/websockets/">WebSockets</a>. But we couldn&#8217;t use either of these because we had to support legacy browsers. We were glad that we could rely on at least IE8 and didn&#8217;t have to support IE6. So we implemented a <a href="http://en.wikipedia.org/wiki/Comet_(programming)">COMET</a> variant, which essentially consists of long-running <em>XMLHttpRequests</em> through which events are sent as chunked responses.</p>
<p>The asynchronous events from the gateways are decoded by our JavaScript library, which updates the telephony control and forwards the events to the interface part of the browser, which in turn may trigger a server interaction.</p>
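<p>With a long-running request, chunks do not necessarily arrive aligned with event boundaries, so the receiver has to reassemble events from a growing buffer. The following sketch shows the idea in Java (the actual client is a JavaScript library). It assumes that every event is a single line terminated by a newline; this framing and the class name <code>EventFramer</code> are assumptions, not details from the project.</p>

```java
import java.util.ArrayList;
import java.util.List;

/** Splits a growing response stream into complete events; newline framing is an assumption. */
final class EventFramer {
    private final StringBuilder buffer = new StringBuilder();

    /** Appends a received chunk and returns every event that the chunk completed. */
    List<String> feed(String chunk) {
        buffer.append(chunk);
        List<String> events = new ArrayList<>();
        int newline;
        while ((newline = buffer.indexOf("\n")) >= 0) {
            String event = buffer.substring(0, newline);
            buffer.delete(0, newline + 1);
            // Blank or whitespace-only lines carry no event and are skipped.
            if (!event.trim().isEmpty()) {
                events.add(event);
            }
        }
        return events;
    }
}
```

<p>An incomplete event stays in the buffer until the chunk that contains its terminating newline arrives.</p>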
<h3>Cross-Domain COMET with IE8</h3>
<p>The customer wanted to be able to run the web application and the gateways on different application servers. This means that the JavaScript <em>XMLHttpRequests</em> are <em>cross domain</em>, which turned out to be a small challenge on IE8.</p>
<p>Mozilla Firefox, Safari and Google Chrome all support the <a href="http://www.w3.org/TR/cors/">CORS (Cross-Origin Resource Sharing)</a> specification of the W3C. IE8 supports it as well; however, with IE8 one must use <em>XDomainRequest</em> instead of <em>XMLHttpRequest</em>, and the API is slightly different. There is also a <a href="http://blogs.msdn.com/b/ieinternals/archive/2010/04/06/comet-streaming-in-internet-explorer-with-xmlhttprequest-and-xdomainrequest.aspx">subtle buffering bug</a> in IE8 that makes it necessary to send 2 KB of fill characters at the start of every new COMET connection to ensure that the next event is received by the application immediately.</p>
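<p>On the server side, the workaround amounts to writing a block of padding before the first event. A minimal sketch, assuming spaces as fill characters and a trailing newline so that a line-based client can discard the padding (both are assumptions, as is the class name <code>CometPrelude</code>):</p>

```java
import java.util.Arrays;

/** Builds the padding sent at the start of a streaming response; the fill character is an assumption. */
final class CometPrelude {
    /** 2 KB, the amount of padding named in the text. */
    static final int PADDING_LENGTH = 2048;

    /** Returns the padding followed by a newline so that a line-based client can skip it. */
    static String padding() {
        char[] fill = new char[PADDING_LENGTH];
        Arrays.fill(fill, ' ');
        return new String(fill) + "\n";
    }
}
```

<p>The gateway would write this string once per connection and flush it before sending any event.</p>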
<h2>Redundant Gateways and PBX</h2>
<p>Each browser keeps two connections to two different gateways. One is active, and the other is a hot standby. When the connection to the active gateway is broken, the hot standby gateway is immediately activated. If necessary, the last failed command is retried. As the hot standby gateway has been sending events the whole time as well, it is guaranteed that no event is lost. After this failover, the browser tries to reconnect to the failed gateway. Once that connection is re-established, the previously failed gateway becomes the hot standby gateway.</p>
<p>Loss of a gateway does not lead to the loss of essential state, because the gateways hold as little state as possible. All relevant state was either pushed up into the JavaScript library or down into the PBX integration layer. The gateways are also independent of each other and interchangeable. This makes the solution inherently scalable: more gateways can be added at any time.</p>
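<p>One way to picture the client-side bookkeeping is the following Java sketch. It is not the project&#8217;s protocol: it assumes that both gateways tag each event with the same increasing sequence number, which lets the client drop the duplicate copy, and the class <code>FailoverSketch</code> and its methods are hypothetical.</p>

```java
import java.util.ArrayList;
import java.util.List;

/** Client-side view of two gateway connections; shared sequence numbers are an assumption. */
final class FailoverSketch {
    private int active = 0;            // index (0 or 1) of the gateway that receives commands
    private long lastDelivered = -1;   // highest event sequence number handed to the UI
    private final List<String> delivered = new ArrayList<>();

    /** Called for every event from either gateway; the second copy of an event is dropped. */
    void onEvent(long sequence, String payload) {
        if (sequence > lastDelivered) {
            lastDelivered = sequence;
            delivered.add(payload);
        }
    }

    /** Called when a connection breaks; the standby takes over if the active gateway failed. */
    void onConnectionLost(int gateway) {
        if (gateway == active) {
            active = 1 - active;
        }
    }

    int activeGateway() {
        return active;
    }

    List<String> deliveredEvents() {
        return delivered;
    }
}
```

<p>Because every event arrives on both connections, an event that the failed gateway never managed to send is still delivered by the standby.</p>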
<p>The PBX (a Genesys installation) itself is also redundant. The fallback on this level is hidden by the Genesys API and the gateway doesn’t have to handle it.</p>
<h2>Testing</h2>
<p>Our solution was tested with three different methods:</p>
<ul>
<li>JavaScript unit tests with <a href="http://docs.jquery.com/QUnit">QUnit</a>,</li>
<li>A simulator that implements the gateway&#8217;s HTTP services and simulates a single agent telephone (with a Swing GUI), and</li>
<li>Load tests.</li>
</ul>
<p>Writing the simulator was a substantial effort, but it helped in two ways:</p>
<ul>
<li>It made development without the telephony hardware possible.</li>
<li>It made it easy to test scenarios that were not reliably testable with real hardware (like deliberate race conditions).</li>
</ul>
<p>In an ideal world, the load tests would have been performed with an external load test tool. We didn&#8217;t have one available, so we wrote our own load test generator using the CSTA API to generate and receive calls.</p>
<h2>Conclusion</h2>
<p>Our solution is light-weight, conceptually simple and scalable. The simplicity is the result of two development iterations and rather long design phases.</p>
<p>The decision to use CSTA as the blueprint for the communication protocol worked well, too. It was helpful that we did not have to re-invent two-step transfer for the umpteenth time. Also, the CSTA vocabulary (which goes down to the text in the log messages) can be understood by personnel that are familiar with <a href="http://en.wikipedia.org/wiki/Computer_telephony_integration">CTI</a>.</p>
<h3>In the Footprints of Arnold Schwarzenegger</h3>
<p>Call center applications always remind me of a <a href="http://www.imdb.com/title/tt0111503/">slightly silly movie</a> starring Arnold Schwarzenegger as an undercover agent and Jamie Lee Curtis as his unsuspecting wife. His cover story for her is that he is doing something with IT and in one scene she inquires about his day at work. He starts telling her enthusiastically and quite elaborately about a call center integration &mdash; and she nearly falls asleep.</p>
<p>I, however, think the combination of a call center and a web application is technically quite fascinating.</p>
	<p><em><small>(c) 2012 <a href="http://www.mgm-tp.com">mgm technology partners</a>. This posting "<a href="http://blog.mgm-tp.com/2012/05/scalable-web-cti-solution/">Building a scalable Web-based Call Center CTI Solution</a>" is part of the <a href="http://blog.mgm-tp.com">mgm technology blog</a>. The author of the posting is
	<a href="http://blog.mgm-tp.com/?author=25" title="View articles by Lars Immisch">Lars Immisch</a>.
	</small></em></p>

	<p><em><small>
	We are hiring! mgm technology partners is looking for good software engineers for all our offices. Check out <a rel="external" href="http://www.mgm-tp.com/karriere">www.mgm-tp.com/karriere</a>.
	</small></em></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.mgm-tp.com/2012/05/scalable-web-cti-solution/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Consequences when using Mutable Fields in hashCode() - Hacking Java Puzzlers for Fun and Profit, Part 1</title>
		<link>http://blog.mgm-tp.com/2012/03/hashset-java-puzzler/</link>
		<comments>http://blog.mgm-tp.com/2012/03/hashset-java-puzzler/#comments</comments>
		<pubDate>Thu, 08 Mar 2012 13:16:08 +0000</pubDate>
		<dc:creator>Ulrich Schrempp</dc:creator>
				<category><![CDATA[Puzzler]]></category>
		<category><![CDATA[Code Quality]]></category>
		<category><![CDATA[IDE]]></category>
		<category><![CDATA[Java]]></category>

		<guid isPermaLink="false">http://blog.mgm-tp.com/?p=1123</guid>
		<description><![CDATA[We start our new series with an informative HashSet puzzler. It&#8217;s about a bug that gave us quite a headache since its root cause was hard to identify. This subtle bug has without doubt crept into many code bases, so a detailed discussion should interest every Java coder. I will also discuss code inspection tools [...]]]></description>
			<content:encoded><![CDATA[<p>We start our new series with an informative HashSet puzzler. It&#8217;s about a bug that gave us quite a headache since its root cause was hard to identify. This subtle bug has without doubt crept into many code bases, so a detailed discussion should interest every Java coder. I will also discuss code inspection tools that detect this violation (sadly, only a few do). And by the way, what we learned about HashSet also makes a good topic in our job interviews.</p>
<p><span id="more-1123"></span></p>
<p>The starting point was that we had implemented a class with some fields and needed quick access to its instances. This use case was pretty performance-critical, so we used a HashSet to hold the class instances. As the instance count was high, we needed a broad distribution of possible hash values. We used the IDE to generate code for the <code>hashCode()</code> and <code>equals()</code> methods. Everything should have been fine with this generated code, or so we thought. However, our application showed some unexpected results and it took us quite a long time to nail down the root cause.</p>
<p>Here&#8217;s a <em>simplified</em> example of code using a HashSet of PhoneNumber instances:</p>
<pre class="brush: java;">
// Class PhoneNumber implements hashCode() and equals()
PhoneNumber obj = new PhoneNumber(&quot;mgm&quot;, &quot;089/358680&quot;);
System.out.println(&quot;Hashcode: &quot; +
	obj.hashCode());  //prints &quot;1476725853&quot;

// Add PhoneNumber object to HashSet
Set&lt;PhoneNumber&gt; set = new HashSet&lt;PhoneNumber&gt;();
set.add(obj);

// Modify object after it has been inserted
obj.setNumber(&quot;089/358680-0&quot;);

// Modification causes a different hash value
System.out.println(&quot;New hashcode: &quot; +
	obj.hashCode()); //prints &quot;7130851&quot;

// ... Later or in another class, code such as the following
// is operating on the Set:

// Unexpected Result!
// Output: obj is set member: false
System.out.println(&quot;obj is set member: &quot; +
	set.contains(obj));

// Even stranger unexpected Result!
// Output: obj is set member: false
for (PhoneNumber p : set) {
	if (p.equals(obj)) {
		System.out.println(&quot;obj is set member: &quot; +
			set.contains(p));
	}
}
</pre>
<p>Executing the code above surprisingly produces the following output:</p>
<pre>
obj is set member: false
obj is set member: false
</pre>
<p>Obviously, what we would expect is &#8220;true&#8221;, since obj has been inserted into the HashSet.</p>
<h2>What just happened?</h2>
<p>The unexpected result from the code above is caused by a trap in the JDK Collections framework into which many developers have fallen: <em>If an implementation of <code>hashCode()</code> uses mutable fields to calculate the value and such a field changes after the object has been added, <code>HashSet.contains()</code> produces unexpected results, i.e. your object no longer seems to be a member of the set.</em></p>
<p>For an illustration, let&#8217;s look at the class <code>PhoneNumber</code> and its mutable field &#8220;number&#8221;:</p>
<pre class="brush: java;">
public class PhoneNumber {

    private final String name;
    private String number;

    public PhoneNumber(String name, String number) {
        this.name = name;
        this.number = number;
    }

	// Setter makes &quot;number&quot; mutable!
    public void setNumber(String number) {
        this.number = number;
    }

    @Override
    public int hashCode() {
        int result = name != null ? name.hashCode() : 0;
        result = 31 * result +
			(number != null ? number.hashCode() : 0);
        return result;
    }

	// equals() left out here ...
}
</pre>
<p>What&#8217;s wrong with this class? Well, it&#8217;s a bad idea to use <a href="http://www.javaranch.com/journal/2003/04/immutable.htm">mutable</a> fields in <code>hashCode()</code> when its instances are put into a <code>HashSet</code> (or as keys into a <code>HashMap</code>). The same problem affects any hash-based collection. See also <a href="http://stackoverflow.com/questions/5110376/hashset-contains-problem-with-custom-objects">&#8220;HashSet contains problem with custom objects&#8221; (Stackoverflow)</a> and <a href="http://javaadventure.blogspot.com/2007/02/hashcode-pitfalls-with-hashset-and.html">&#8220;hashCode() pitfalls with HashSet and HashMap&#8221;</a>.</p>
<h2><code>HashSet.contains()</code> surprisingly uses <code>hashCode()</code></h2>
<p>Part of what made our bug hard to spot is that the <code>HashSet.contains()</code> method relies on hash values staying constant. Unfortunately, this is not stated explicitly in the <a href="http://docs.oracle.com/javase/6/docs/api/java/util/HashSet.html#contains(java.lang.Object)">HashSet JavaDoc</a>, which only mentions <em>&#8220;&#8230;returns true if and only if this set contains an element e such that <code>(o==null ? e==null : o.equals(e))</code>&#8221;</em>. Actually, this is the same description as the <a href="http://docs.oracle.com/javase/6/docs/api/java/util/Collection.html#contains(java.lang.Object)">JavaDoc of <code>Set.contains()</code></a>.</p>
<p>A conscientious reader may also find the following note in the <a href="http://docs.oracle.com/javase/6/docs/api/java/util/Set.html">JavaDoc of the <code>Set</code></a> interface, which only mentions <code>equals()</code>:</p>
<blockquote><p>&#8220;Great care must be exercised if mutable objects are used as set elements. The behavior of a set is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is an element in the set.&#8221;</p></blockquote>
<p>By the way, the following Sun JDK bug was reported quite some time ago: <a href="http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6579200">&#8220;(coll) HashSet.contains method violates Set.contains() contract&#8221;</a>. The bug is approved (but not fixed) and the last comment was made in late 2007.</p>
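<p>Why the lookup fails becomes clearer with a toy model of how a hash table picks a bucket. The sketch below is a simplification and not the JDK source; the spreading step <code>h ^ (h &gt;&gt;&gt; 16)</code> mirrors what OpenJDK 8&#8217;s <code>HashMap</code> does, while other JDK versions mix the bits differently. The class name <code>BucketSketch</code> is made up for this illustration.</p>

```java
/** Toy model of hash-table bucket selection; a simplification, not the JDK implementation. */
final class BucketSketch {

    /** Maps a hash code to one of {@code buckets} slots; {@code buckets} must be a power of two. */
    static int bucketIndex(int hash, int buckets) {
        int spread = hash ^ (hash >>> 16);   // mix the high bits into the low bits
        return spread & (buckets - 1);       // keep only as many bits as the table needs
    }
}
```

<p>With 16 buckets, the default initial capacity of <code>HashMap</code>, the two hash values from the example above (1476725853 and 7130851) map to buckets 8 and 15 in this model. The object is stored in the bucket chosen at insertion time, but <code>contains()</code> searches the bucket that belongs to the new hash value. Even if both values happened to share a bucket, the lookup would still fail, because <code>HashMap</code> compares the hash stored at insertion time before it calls <code>equals()</code>.</p>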
<h2>Properly coding the <code>hashCode()</code> method</h2>
<p>The <em>contract</em> of <code>hashCode()</code> is explained in the <a href="http://docs.oracle.com/javase/6/docs/api/java/lang/Object.html#hashCode()">JavaDoc of <code>Object</code></a>. You will also find hints on proper implementations in the very interesting book <a href="http://www.amazon.com/Effective-Java-2nd-Joshua-Bloch/dp/0321356683">&#8220;Effective Java&#8221;</a> by Joshua Bloch. The book covers many interesting topics, and items 8 and 9 in particular (items 7 and 8 in the first edition) shed light on good <code>equals()</code> and <code>hashCode()</code> implementation practices. Another book by the same author, <a href="http://www.javapuzzlers.com">&#8220;Java Puzzlers&#8221;</a>, also contains two puzzlers on this problematic area: specifically, puzzlers 57 and 58 show how these two methods depend on each other.</p>
<p>There are many more discussions about the problems that can occur with <code>hashCode()</code>. For example, in his Google TechTalks presentation <a href="http://www.youtube.com/watch?v=aAb7hSCtvGw">&#8220;How To Design A Good API and Why it Matters&#8221;</a>, Joshua Bloch says that <code>hashCode()</code> is an implementation detail that should not have leaked into the Java API at all (at about 27:30 min).</p>
<p>And don&#8217;t forget the lesson learned here: <em>using mutable fields in hashCode() is a recipe for disaster. And disaster strikes when instances of this class are put in a hash-based collection like <code>HashSet</code> or <code>HashMap</code> (as map keys)</em>.</p>
<p>Please note that since code usually uses only the respective collection interfaces, e.g. Set and Map, you might not even know about it (as in our case). Or you use a module or library that stores your objects in a collection internally as an implementation detail that&#8217;s hidden from you.</p>
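<p>One way to follow this advice is to make the class immutable, so that a &#8220;change&#8221; creates a new object that is explicitly swapped into the set. The following sketch is a variation of the <code>PhoneNumber</code> class above; the names <code>ImmutablePhoneNumber</code>, <code>withNumber</code> and <code>replaceInSet</code> are illustrative additions, and <code>java.util.Objects</code> requires Java 7 or later.</p>

```java
import java.util.Objects;
import java.util.Set;

/** Immutable variant of PhoneNumber: both fields are final, so the hash code never changes. */
final class ImmutablePhoneNumber {
    private final String name;
    private final String number;

    ImmutablePhoneNumber(String name, String number) {
        this.name = name;
        this.number = number;
    }

    /** Returns a new instance with a different number instead of modifying this one. */
    ImmutablePhoneNumber withNumber(String newNumber) {
        return new ImmutablePhoneNumber(name, newNumber);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof ImmutablePhoneNumber)) {
            return false;
        }
        ImmutablePhoneNumber other = (ImmutablePhoneNumber) o;
        return Objects.equals(name, other.name) && Objects.equals(number, other.number);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, number);
    }

    /** Removes the old value and adds the new one; true if the old value was present and the new one was not. */
    static boolean replaceInSet(Set<ImmutablePhoneNumber> set,
                                ImmutablePhoneNumber oldValue,
                                ImmutablePhoneNumber newValue) {
        return set.remove(oldValue) && set.add(newValue);
    }
}
```

<p>If a class has to stay mutable, the same remove-then-add sequence still works, as long as the object is removed <em>before</em> the field is changed and added again afterwards.</p>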
<h2>Don&#8217;t rely on automatic hashCode() Generation</h2>
<p>Even with coding rules in mind, a <code>hashCode()</code> implementation that uses mutable fields creeps into a code base faster than you can spell &#8220;bug&#8221;. This is because developers are reluctant to write the long-winded calculations in <code>hashCode()</code> methods manually and often generate them with the help of the IDE, as shown in the screenshot below. But it&#8217;s just too easy for a developer to press &#8220;Generate&#8221; without first reviewing the fields to be included and leaving the mutable ones out. Of all the IDEs I tested, only <a href="http://netbeans.org/">NetBeans</a> leaves all fields unchecked, which forces the developer to select them on purpose.</p>
<div id="attachment_1124" class="wp-caption alignnone" style="width: 490px"><a rel="lightbox-hashset-java-puzzler" href="http://blog.mgm-tp.com/wp-content/uploads/2012/03/Generating-hashCode-method.png"><img class="size-large wp-image-1124" title="Generating hashCode method" src="http://blog.mgm-tp.com/wp-content/uploads/2012/03/Generating-hashCode-method-480x468.png" alt="" width="480" height="468" /></a><p class="wp-caption-text">Modern IDEs provide automatic generation of hashCode(). Eclipse and IntelliJ IDEA by default include all mutable fields.</p></div>
<h2>Code Inspection Tools to the Rescue?</h2>
<p>You might wonder if classes of your code base contain a <code>hashCode()</code> implementation that uses mutable fields. One option (besides a manual code review) is using a code inspection tool. Unfortunately, the prominent open source tools like <a href="http://findbugs.sourceforge.net/bugDescriptions.html/">FindBugs</a>, <a href="http://pmd.sourceforge.net/snapshot/rules/basic.html">PMD</a> and <a href="http://checkstyle.sourceforge.net/availablechecks.html">CheckStyle</a> do not offer such a built-in inspection.</p>
<div id="attachment_1125" class="wp-caption alignnone" style="width: 490px"><a rel="lightbox-hashset-java-puzzler" href="http://blog.mgm-tp.com/wp-content/uploads/2012/03/IntelliJ-IDEA-Code-Inspection.png"><img class="size-large wp-image-1125" title="IntelliJ IDEA Code Inspection" src="http://blog.mgm-tp.com/wp-content/uploads/2012/03/IntelliJ-IDEA-Code-Inspection-480x113.png" alt="" width="480" height="113" /></a><p class="wp-caption-text">Only IntelliJ IDEA has a built-in code inspection that detects the use of mutable (actually non-final) fields in hashCode().</p></div>
<p>The only tool support I found was <a href="http://www.jetbrains.com/idea/">IntelliJ IDEA</a>. This IDE provides a <a href="http://www.jetbrains.com/idea/documentation/inspections.jsp">code inspection</a> named <em>&#8220;Non-final field referenced in &#8216;hashCode()&#8217;&#8221;</em>. Any violation is highlighted as shown in the screenshot above.</p>
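<p>Without such an inspection, a rough check can be written with reflection, for example as part of a unit test. The sketch below is only a heuristic and not a replacement for the IDE inspection: it cannot see which fields <code>hashCode()</code> actually reads, so it reports every non-final instance field of a class that declares its own <code>hashCode()</code>. The class and method names are made up for this illustration.</p>

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

/** Reflection-based heuristic; it cannot tell which fields hashCode() actually reads. */
final class MutableHashFieldCheck {

    /** Returns true if the class itself (not a superclass) declares hashCode(). */
    static boolean declaresHashCode(Class<?> type) {
        try {
            type.getDeclaredMethod("hashCode");
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    /** Lists the names of non-final instance fields of a class that declares hashCode(). */
    static List<String> suspiciousFields(Class<?> type) {
        List<String> result = new ArrayList<>();
        if (!declaresHashCode(type)) {
            return result;
        }
        for (Field field : type.getDeclaredFields()) {
            int modifiers = field.getModifiers();
            boolean instanceField = !Modifier.isStatic(modifiers) && !field.isSynthetic();
            if (instanceField && !Modifier.isFinal(modifiers)) {
                result.add(field.getName());
            }
        }
        return result;
    }
}
```

<p>Applied to the <code>PhoneNumber</code> class from above, this check would report the field <code>number</code>. Fields that are non-final but deliberately excluded from <code>hashCode()</code> show up as false positives and have to be reviewed by hand.</p>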
	<p><em><small>(c) 2012 <a href="http://www.mgm-tp.com">mgm technology partners</a>. This posting "<a href="http://blog.mgm-tp.com/2012/03/hashset-java-puzzler/">Consequences when using Mutable Fields in hashCode() - Hacking Java Puzzlers for Fun and Profit, Part 1</a>" is part of the <a href="http://blog.mgm-tp.com">mgm technology blog</a>. The author of the posting is
	<a href="http://blog.mgm-tp.com/?author=24" title="View articles by Ulrich Schrempp">Ulrich Schrempp</a>.
	</small></em></p>

]]></content:encoded>
			<wfw:commentRss>http://blog.mgm-tp.com/2012/03/hashset-java-puzzler/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
	
		<series:name><![CDATA[Hacking Java Puzzlers for Fun and Profit]]></series:name>
	</item>
		<item>
		<title>Using Domain Specific Languages to Implement Interactive Frontends - Software Quality driven by Formal DSLs, Part 1</title>
		<link>http://blog.mgm-tp.com/2012/02/formal-dsl-part1/</link>
		<comments>http://blog.mgm-tp.com/2012/02/formal-dsl-part1/#comments</comments>
		<pubDate>Thu, 16 Feb 2012 12:06:17 +0000</pubDate>
		<dc:creator>Dr. Jürgen Knopp</dc:creator>
				<category><![CDATA[Research]]></category>
		<category><![CDATA[DSL]]></category>
		<category><![CDATA[QA]]></category>
		<category><![CDATA[Testing]]></category>
		<category><![CDATA[Validation]]></category>
		<category><![CDATA[Web Forms]]></category>

		<guid isPermaLink="false">http://blog.mgm-tp.com/?p=1082</guid>
		<description><![CDATA[For many years we have dealt with the challenges that frontends with interactive forms pose w.r.t. validation, test data and quality. Describing the requirements in formal Domain Specific Languages (DSL) became the way of choice to create a specification that gives a twofold benefit: first, the customer understands it better, and secondly, the software engineers [...]]]></description>
			<content:encoded><![CDATA[<p>For many years we have dealt with the challenges that frontends with interactive forms pose with respect to validation, test data and quality. Describing the requirements in formal <a href="http://en.wikipedia.org/wiki/Domain-specific_language">Domain Specific Languages</a> (DSL) became our method of choice for creating a specification that gives a twofold benefit: first, the customer understands it better, and second, the software engineers use the specification not only to implement more resilient software, but also to improve quality assurance. This new series will explain how we do it and why we think it&#8217;s the best approach.</p>
<p><span id="more-1082"></span></p>
<p>We develop many applications for e-government, finance and insurance. These usually rely heavily on the presentation and evaluation of data processed within interactive frontends. Legal and business reasons demand that the resulting applications meet very high standards of reliability and safety. For us, this meant that we had to find a way to work closely with the customer to understand the domain while simultaneously improving coverage and efficiency of development and QA.</p>
<p>We met this challenge by using Domain Specific Languages to involve the customer in the creation of a specification basis for the domain, which our engineers could use not only as a requirements description but also to implement tools for automatic validation or even code generation. The work spent on the definition, development and maintenance of specification language tools proved to be worth it as quality improved and effort decreased. For example, while we were analyzing and reporting test coverage in terms of project-specific formalized test data, we were glad that we could rely on a resilient specification basis.</p>
<div id="attachment_1111" class="wp-caption alignnone" style="width: 490px"><a rel="lightbox-forms-specification-part1" href="http://blog.mgm-tp.com/wp-content/uploads/2012/02/DSL-Specification-as-a-Bridge.png"><img src="http://blog.mgm-tp.com/wp-content/uploads/2012/02/DSL-Specification-as-a-Bridge.png" alt="" title="DSL Specification as a Bridge" width="480" height="201" class="size-full wp-image-1111" /></a><p class="wp-caption-text">Specifications based on a Domain Specific Formal Language bridge the gap between Customers and Software Developers.</p></div>
<p>The proven benefits of formal specifications that customers can comprehend encouraged us to bridge the gap in more mgm projects: we encourage more customers to introduce a formal requirements process, and more engineers to adopt a domain-specific yet formal view of requirements, so that both can benefit from the improvements.</p>
<p>This first blog article introduces our domain specific formal language approach. The upcoming second part will focus on quality benefits derived from formal specifications.</p>
<h2>Setting the Scene</h2>
<p>The quality of complex interactive applications always includes a high level of conformity to the respective customer&#8217;s requirements. It is not enough to have a well-functioning system; the system also has to do what the customer expects! This holds for both explicit and implicit (sometimes never formulated) requirements. We believe that the only way to accomplish this is to learn from the customer and to strive for exact and comprehensive requirements consistent with the customer&#8217;s domain.</p>
<h4>Requirements modeled by Domain-Specific Formal Languages</h4>
<p>We believe that the need for exact requirements can be met by using formal languages to describe requirements. To model and implement what the customer needs (and not what we believe that he or she needs), the specification must be formulated using the terms of the respective domain and be comprehensible for the customer. We achieve this through the adaptation and extension of our existing language family to support the customer&#8217;s particular domain.</p>
<h4>Requirements modeled for Frontends &ndash; at least</h4>
<p>It is common sense to separate front-end aspects (such as input and output handling for forms) from the full business logic modeled somewhere in the back-end. The main purpose of front-ends is to deliver to the back-end information which is guaranteed to be sound and consistent. The back-end implementers should not be bothered by interactive aspects and user level data consistency.</p>
<h3>Our Approach</h3>
<p>mgm technology partners&#8217; approach to form-centric applications was to implement a framework supporting specification languages along the full software development chain: editors, code generators, documentation and test data generators for the supported language family.</p>
<p>Wherever possible:</p>
<ul>
<li>We use and promote formal specification languages for front-ends and parts of the business logic.</li>
<li>We involve customers in the requirements process, especially by including formal specifications for their domains.</li>
<li>We tailor the specification framework to each customer&#8217;s specific application domain.</li>
</ul>
<p>This yields a high return (for customers and mgm) in terms of delivery time and quality and, in addition, reduces the implementation and quality assurance effort.</p>
<h3>Technical Benefits</h3>
<p>The benefits of using formal front-end specification techniques are manifold:</p>
<ol>
<li><strong>Consistency of the customer requirements</strong>: It becomes very likely that the system accepts and rejects exactly what the customer expects (including implicit requirements which would never have been formulated in the absence of formal specifications).</li>
<li><strong>Reduction of programming effort and risk</strong>: Since formal specifications allow for automatic code generation, a great deal of programming effort and risk simply vanishes.</li>
<li><strong>Increasing functional quality standards for the front-end</strong>: Generated code is consistent with the specifications (once the code generator is mature and tested). Moreover, because legal inputs are described formally, test suites describing valid inputs are generated automatically.</li>
<li><strong>Driving the tests of the back-end</strong>: Since the front-end specifications define the set of correct inputs, they serve as a basis for back-end tests. Automatically generated test suites can be run to demonstrate the quality of the back-end in a well-defined setting. Test coverage goals for the back-end can be formulated in terms of the existing formal front-end specifications. Test coverage becomes measurable in terms of front-end data and front-end use cases.</li>
</ol>
<p>In this blog post we focus on topics 1) and 2) described above, and we do so by giving a flavor of the specification languages we use. Aspects 3) and 4) will be discussed in a second blog post.</p>
<h2>Specifications supporting customer-related formal Requirements</h2>
<p>In the following we will demonstrate how to derive a formal specification using mgm&#8217;s specification language.</p>
<h3>A User Interface Example</h3>
<p>Imagine a simple bill writing system with a simple (e.g. web-based graphical) user interface in which we need to consider both calculations and consistency constraints.</p>
<p><em>Let us assume that a number of bill positions with a net unit price can be typed in, each of them with its own multiplicity (quantity) of at least 1. The system calculates the net price for each position as well as the bottom line, including the default or an alternative VAT (value added tax).</em></p>
<p>The input form might look as follows (we ignore GUI details in this example):</p>
<div id="attachment_1083" class="wp-caption alignnone" style="width: 490px"><a rel="lightbox-forms-specification-part1" href="http://blog.mgm-tp.com/wp-content/uploads/2012/02/Empty-Form.png"><img class="size-full wp-image-1083" src="http://blog.mgm-tp.com/wp-content/uploads/2012/02/Empty-Form.png" alt="" width="480" height="199" /></a><p class="wp-caption-text">Sample form with input fields (yellow) and calculated fields (blue).</p></div>
<p>Input fields are marked yellow and fields which are to be calculated by the system (and thus are &#8220;read-only&#8221; for the user) are marked blue.</p>
<p>A completed form for two positions with reduced VAT would look as follows:</p>
<div id="attachment_1084" class="wp-caption alignnone" style="width: 490px"><a rel="lightbox-forms-specification-part1" href="http://blog.mgm-tp.com/wp-content/uploads/2012/02/Filled-Form.png"><img class="size-full wp-image-1084" src="http://blog.mgm-tp.com/wp-content/uploads/2012/02/Filled-Form.png" alt="" width="480" height="236" /></a><p class="wp-caption-text">Filled form with automatically calculated net and gross amounts.</p></div>
<h2>Formal Specification for the Example</h2>
<p>The specification language is now used to describe the properties of the fields, i.e. their types, the relations between their values and other constraints regarding consistency and completeness in a formal way. This approach is the core of the formalization of requirements.</p>
<p>Let us look at some formal specifications for the example above.</p>
<h3>Types</h3>
<p>Depending on the domain, fields have domain-specific types at user level. These types can be specified as numbers, currencies, date data, strings, enumerations and truth values. Most of them also include a specification of the allowed field length. For instance <code>AlternativeVat</code> is a positive numeric value with 2 digits which leads to the specification:</p>
<pre class="brush: plain; wrap-lines: false;">
AlternativeVat: PositiveNumberDigits(2)
</pre>
<p>Some fields (e.g. all position related fields in the example) might have multiple instances. This is also specified along with the type definition. Multiplicity is specified as follows by using <code>multi</code>:</p>
<pre class="brush: plain; wrap-lines: false;">
multi Position:     String(25)
multi UnitPrice:    EurosAndCentsDigits(8)
multi Quantity:     PositiveInteger(3)
multi PosFullPrice: EurosAndCentsDigits(10)
</pre>
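<p>To make the role of such type definitions more tangible, here is a minimal Python sketch of how two of the types above could be checked at runtime. The class names and the exact semantics (at most <code>n</code> characters for <code>String(n)</code>, a whole number of at least 1 with at most <code>n</code> digits for <code>PositiveInteger(n)</code>) are assumptions made for illustration; they are not taken from mgm&#8217;s framework.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StringType:
    """Hypothetical counterpart of String(n): text of at most n characters."""
    max_length: int

    def accepts(self, raw: str) -> bool:
        return self.max_length >= len(raw)


@dataclass(frozen=True)
class PositiveIntegerType:
    """Hypothetical counterpart of PositiveInteger(n): 1 or more, at most n digits."""
    max_digits: int

    def accepts(self, raw: str) -> bool:
        # Only plain ASCII digits are allowed: no sign, blank or decimal point.
        if not (raw.isascii() and raw.isdigit()):
            return False
        # Leading zeros do not count towards the digit limit.
        return int(raw) >= 1 and self.max_digits >= len(raw.lstrip("0"))


# Two of the fields from the example, keyed by field name.
FIELD_TYPES = {
    "Position": StringType(25),
    "Quantity": PositiveIntegerType(3),
}

if __name__ == "__main__":
    for value in ("12", "0", "1000"):
        print(value, FIELD_TYPES["Quantity"].accepts(value))
```

<p>In a generator-based setup, checks of this kind would be produced from the specification rather than written by hand.</p>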
<p>Field values that are either calculated by the system or given by the specification are marked as <code>calc</code> or <code>constant</code>, respectively. Neither kind is editable by the user (they are &#8220;read-only&#8221;).</p>
<pre class="brush: plain; wrap-lines: false;">
calc       NetAmount:    EurosAndCentsDigits(10)
constant   NormalVat:    PositiveNumberDigits(2) = 19
calc       AllVat:       EurosAndCentsDigits(10)
calc multi PosFullPrice: EurosAndCentsDigits(10)
</pre>
<h3>Functional Rules</h3>
<p>Functional rules specify values in a functional setting.</p>
<p>For the fields <code>AllVat</code> and <code>GrossAmount</code> one gets straightforward specification formulas:</p>
<pre class="brush: plain; wrap-lines: false;">
AllVat = If FieldValueSpecified(AlternativeVat)
            then  AlternativeVat/100*NetAmount
            else  NormalVat/100*NetAmount
GrossAmount = AllVat + NetAmount
</pre>
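<p>As an illustration of what code derived from these two rules might look like, here is a hand-written Python sketch. The function names are hypothetical, an unspecified <code>AlternativeVat</code> is modeled as <code>None</code>, and rounding to whole cents is deliberately left out; the output of a real code generator may look quite different.</p>

```python
from decimal import Decimal
from typing import Optional

NORMAL_VAT = Decimal("19")  # the constant NormalVat from the specification


def all_vat(net_amount: Decimal, alternative_vat: Optional[Decimal] = None) -> Decimal:
    """AllVat: use AlternativeVat if it is specified, otherwise NormalVat."""
    rate = NORMAL_VAT if alternative_vat is None else alternative_vat
    return rate / Decimal("100") * net_amount


def gross_amount(net_amount: Decimal, alternative_vat: Optional[Decimal] = None) -> Decimal:
    """GrossAmount = AllVat + NetAmount."""
    return all_vat(net_amount, alternative_vat) + net_amount


if __name__ == "__main__":
    print(gross_amount(Decimal("200.00")))
    print(gross_amount(Decimal("200.00"), Decimal("9.5")))
```

<p>Both functions depend only on their arguments and the constant, which mirrors the spreadsheet-like character of functional rules.</p>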
<p>The semantics of these rules is intended to be functional rather than imperative, i.e. a field value is not influenced by any state other than the values of the fields it refers to (similar to formulas in spreadsheets or functional languages). Note that functional rules can also be used to specify mappings within the user interface, e.g. from drop-downs to textual representations.</p>
<h3>Constraint Rules</h3>
<p>Constraint rules describe constraints which have to be satisfied to ensure consistent inputs. They consist of</p>
<ul>
<li>a conditional clause specifying the constraint, which the field values have to satisfy, and of</li>
<li>a fail-clause including a message which shall be issued if the constraints are not satisfied.</li>
</ul>
<p><em>In this example, only specific VATs can be considered: normal VAT, half normal VAT or no VAT at all.</em></p>
<pre class="brush: plain; wrap-lines: false;">
constraint
AlternativeVat == 0
   or AlternativeVat == NormalVat
   or AlternativeVat == NormalVat/2
   =&gt; failed: &quot;VAT can only be normal, half normal or zero&quot;
</pre>
<p>Note that the semantics is similar to assert statements: the failed-clause expresses feedback in case the conditional clause is not fulfilled.</p>
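<p>The assert-like reading of the VAT constraint can be sketched in a few lines of Python. This is only an illustration under the assumption that the check runs when a value for <code>AlternativeVat</code> has actually been entered; the function name is hypothetical.</p>

```python
from decimal import Decimal
from typing import Optional

NORMAL_VAT = Decimal("19")  # the constant NormalVat from the specification
VAT_MESSAGE = "VAT can only be normal, half normal or zero"


def check_alternative_vat(alternative_vat: Decimal) -> Optional[str]:
    """Return None if the constraint holds, otherwise the failed-clause message."""
    allowed = (Decimal("0"), NORMAL_VAT, NORMAL_VAT / 2)
    if alternative_vat in allowed:
        return None
    return VAT_MESSAGE


if __name__ == "__main__":
    for value in ("0", "7", "9.5", "19"):
        print(value, check_alternative_vat(Decimal(value)))
```

<p>Unlike a plain <code>assert</code>, the function returns the message instead of aborting, so a front-end can collect several violations and show them together.</p>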
<p>Note further the differences between functional and constraint rules:</p>
<ul>
<li>Unlike functional rules, constraint rules specify constraints (operationally expressed: checks) rather than computations. For syntactic differentiation, we use &#8220;<code>==</code>&#8221; here rather than &#8220;<code>=</code>&#8221;.</li>
<li>Functional rules do not have fail-clauses since they enforce values for fields rather than checking their relationship.</li>
</ul>
<h3>Rules for Multiple Instances</h3>
<p>The language supports multiple instances of fields (and therefore of rules) along two multiplicity dimensions &ndash; &#8220;All&#8221; and &#8220;Each&#8221;.</p>
<h4>All-multiplicity for functional rules</h4>
<p>For computations such as the one for the field <code>NetAmount</code>, multiple instances have to be considered:</p>
<pre class="brush: plain; wrap-lines: false;">
NetAmount = Sum(PosFullPrice.all)
</pre>
<p>The &#8220;<code>.all</code>&#8221; next to <code>PosFullPrice</code> stands for an &#8220;all instances&#8221; semantics. We call this <em>&#8220;all&#8221; multiplicity</em>.</p>
<h4>Each-multiplicity for functional rules</h4>
<p>Each instance of <code>PosFullPrice</code> is calculated in a homogeneous way by multiplying the unit price by the quantity.</p>
<pre class="brush: plain; wrap-lines: false;">
PosFullPrice.each = UnitPrice.each * Quantity.each
</pre>
<p>This functional rule is applied to each row (i.e. instance) separately, which is denoted by the postfix &#8220;<code>.each</code>&#8221;. We call this <em>&#8220;each&#8221; multiplicity</em>.</p>
<p>This means that the rule holds for every instance of the fields involved. Intuitively, one can view this as one copy of the rule per instance.</p>
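<p>The difference between the two multiplicities can be illustrated with a small Python sketch in which the rows of the form are modeled as parallel lists. The function names are hypothetical; the point is only that an &#8220;each&#8221; rule produces one result per row, whereas an &#8220;all&#8221; rule folds all rows into a single value.</p>

```python
from decimal import Decimal
from typing import List


def pos_full_prices(unit_prices: List[Decimal], quantities: List[int]) -> List[Decimal]:
    """PosFullPrice.each = UnitPrice.each * Quantity.each (one result per row)."""
    return [price * quantity for price, quantity in zip(unit_prices, quantities)]


def net_amount(full_prices: List[Decimal]) -> Decimal:
    """NetAmount = Sum(PosFullPrice.all) (a single result for all rows)."""
    return sum(full_prices, Decimal("0"))


if __name__ == "__main__":
    rows = pos_full_prices([Decimal("9.99"), Decimal("2.50")], [2, 4])
    print(rows)
    print(net_amount(rows))
```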
<h4>Each-multiplicity for constraint rules</h4>
<p>Consistency constraints for multiple instances of fields are expressed here as well.</p>
<p><em>In the example, it must be guaranteed that each position (row) is fully specified.</em> This is expressed with the specific predicate <code>FieldsCommonlyDefined</code> (using &#8220;each&#8221; multiplicity).</p>
<pre class="brush: plain; wrap-lines: false;">
constraint
FieldsCommonlyDefined(Position.each, UnitPrice.each, Quantity.each)
    =&gt; failed: &quot;All fields (Position, UnitPrice, Quantity) must be specified if one is specified&quot;
</pre>
<h4>All-multiplicity for constraint rules</h4>
<p>Naturally, &#8220;all&#8221; multiplicity is usable in constraint rules as well. In the example, since the system should not print empty bills, one has to require a multiplicity greater than zero for at least one of the fields referring to positions.</p>
<pre class="brush: plain; wrap-lines: false;">
constraint
AtLeastOneInstanceExists(Position.all)
    =&gt; failed: &quot;Please specify at least one position&quot;
</pre>
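<p>The two multiplicity constraints above could be evaluated roughly as in the following Python sketch. The semantics assumed for <code>FieldsCommonlyDefined</code> (either all fields of a row are filled in or none is) and for <code>AtLeastOneInstanceExists</code> (at least one row has a value for the field) are inferred from the failure messages, and the Python function names are hypothetical.</p>

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]
ROW_FIELDS = ("Position", "UnitPrice", "Quantity")

COMMON_MESSAGE = "All fields (Position, UnitPrice, Quantity) must be specified if one is specified"
MISSING_MESSAGE = "Please specify at least one position"


def fields_commonly_defined(row: Row) -> bool:
    """Assumed semantics: either every field of the row is filled in, or none is."""
    filled = [bool(row.get(name)) for name in ROW_FIELDS]
    return all(filled) or not any(filled)


def at_least_one_instance_exists(rows: List[Row], field: str) -> bool:
    """Assumed semantics: at least one row carries a value for the field."""
    return any(bool(row.get(field)) for row in rows)


def validate(rows: List[Row]) -> List[str]:
    """Collect the failed-clause messages of all violated constraints."""
    messages = []
    for row in rows:  # "each" multiplicity: the rule is checked once per row
        if not fields_commonly_defined(row):
            messages.append(COMMON_MESSAGE)
    if not at_least_one_instance_exists(rows, "Position"):  # "all" multiplicity
        messages.append(MISSING_MESSAGE)
    return messages


if __name__ == "__main__":
    print(validate([{"Position": "Pen", "UnitPrice": "1.20", "Quantity": "3"}]))
    print(validate([{"Position": "Pen"}]))
```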
<p>This completes the example, which shows some of the means of specification. The full specification is given below. Since it is complete, it can be used for automatic code generation for computations and constraint checking.</p>
<p>The aim of this short example is neither syntactic accuracy nor completeness. It is merely an illustration of important aspects. Please remember that each language of the specification language family is tailored to its specific domain anyway.</p>
<h3>Summary: Characterization of the Specification Language Family</h3>
<p>In a nutshell, the specification language family described above defines valid inputs and front-end computations for web applications or other form based systems. The formalism can be characterized as a subset of typed predicate calculus defined in terms of the customer domain. For the sake of simplicity and comprehensibility, we do not extend to full predicate calculus or to higher order logic. The pragmatic expressiveness of the language family is more important than the theoretical power of predicate calculus.</p>
<p>Here is a summary of the aspects which can be described.</p>
<ul>
<li>Typed fields: valid and invalid field values according to the field description.</li>
<li>Field-value constraints: valid and invalid values for related fields.</li>
<li>Existence constraints: validity and invalidity of existing and missing input values, depending on the values or the existence of input values for other fields.</li>
<li>Both kinds of constraints can be interwoven.</li>
<li>Computation of field values based on other field values. This is expressed by rules including:
<ul>
<li>mapping of external presentations to internal ones (such as mapping drop down selections to values),</li>
<li>constant values which show up in the user interface,</li>
<li>conditional values.</li>
</ul>
</li>
<li>Multiplicity: fields may be defined to occur in several instances, controlled by the specification. In terms of constraint control, this allows iterative aspects to be expressed at a higher level than for single fields. The specification language family can express further multiplicity aspects that are not needed in this example; these are beyond the scope of this article.</li>
</ul>
<p>And here&#8217;s the full specification example:</p>
<pre class="brush: plain; wrap-lines: false;">
# Fields and Types:
multi      Position: String(25)
multi      UnitPrice: EurosAndCentsDigits(8)
multi      Quantity: PositiveInteger(3)
           AlternativeVat: PositiveNumberDigits(2)
constant   NormalVat: PositiveNumberDigits(2) = 19
calc       NetAmount: EurosAndCentsDigits(10)
calc       AllVat:  EurosAndCentsDigits(10)
calc multi PosFullPrice: EurosAndCentsDigits(10)

# Functional Rules:
AllVat = If FieldValueSpecified(AlternativeVat)
            then AlternativeVat/100*NetAmount
            else NormalVat/100*NetAmount
GrossAmount       = AllVat + NetAmount
NetAmount         = Sum(PosFullPrice.all)
PosFullPrice.each = UnitPrice.each * Quantity.each

# Constraints:
AlternativeVat == 0  or  AlternativeVat == NormalVat
      or  AlternativeVat == NormalVat/2
      =&gt; failed: &quot;VAT can only be normal, half normal or zero&quot;
FieldsCommonlyDefined(Position.each, UnitPrice.each, Quantity.each)
      =&gt; failed: &quot;All fields (Position, UnitPrice, Quantity) must be specified if one is specified&quot;
AtLeastOneInstanceExists(Position.all)
      =&gt; failed: &quot;Please specify at least one position&quot;
</pre>
<h2>Development and Testing issues</h2>
<p>Thanks to formal specifications, both software development and quality assurance become less cumbersome, more comprehensible, easier to track, more scalable and safer. We conclude by briefly sketching these aspects.</p>
<h3>Avoiding Implementation Efforts and Risks through Code Generation</h3>
<p>Similar to programming languages, specification languages substantially simplify software development. Code generated from specifications eliminates a great deal of complexity and leads to less error-prone systems. Both functional rules and constraint rules can be used as input for generators to generate fully operational code. This is a highly efficient way to automatically produce components for the validation of data in a complex solution environment.</p>
<p>Once the code generators are well tested, there is little need to repeat these tests for each front-end application. mgm technology partners developed code generators that generate code for Java, C++ and JavaScript, each of them running in a variety of environments.</p>
<h3>Test Data Generation and Test Coverage</h3>
<p>Since the specification languages we use describe well-defined inputs, they are an ideal prerequisite for test data generation: consistent test cases are generated directly from the specification due to</p>
<ul>
<li>the existence of type definitions for fields,</li>
<li>the existence of constraint rules describing the relationship between fields,</li>
<li>and the existence of calculation formulas for calculated field values.</li>
</ul>
<p>With few exceptions, test data generation is fully automatic. For specific test cases it is also possible to automatically generate inconsistent data by deliberately enforcing wrong data. This is done by negating constraints and re-running the test data generator.</p>
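<p>The idea of negating a constraint to obtain inconsistent test data can be shown with a deliberately naive Python sketch. It simply enumerates combinations of hand-picked candidate values and keeps those that satisfy (or, when negated, violate) the VAT constraint from the example. The candidate values and function names are assumptions for illustration; this is not how mgm&#8217;s test data generator works internally.</p>

```python
from decimal import Decimal
from itertools import product
from typing import Callable, Dict, Iterator, List

NORMAL_VAT = Decimal("19")  # the constant NormalVat from the specification
Record = Dict[str, object]


def vat_constraint(record: Record) -> bool:
    """The VAT constraint from the example, expressed as a predicate."""
    return record["AlternativeVat"] in (Decimal("0"), NORMAL_VAT, NORMAL_VAT / 2)


def generate(candidates: Dict[str, List[object]],
             constraint: Callable[[Record], bool],
             negate: bool = False) -> Iterator[Record]:
    """Yield every candidate combination that satisfies the constraint,
    or every combination that violates it when negate is True."""
    names = sorted(candidates)
    for values in product(*(candidates[name] for name in names)):
        record = dict(zip(names, values))
        if constraint(record) != negate:
            yield record


# Hand-picked candidate values per field (hypothetical).
CANDIDATES = {
    "AlternativeVat": [Decimal("0"), Decimal("7"), Decimal("9.5"), Decimal("19")],
    "Quantity": [1, 999],
}

valid = list(generate(CANDIDATES, vat_constraint))
invalid = list(generate(CANDIDATES, vat_constraint, negate=True))
```

<p>Here <code>valid</code> holds the consistent records and <code>invalid</code> the records that a correct front-end must reject.</p>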
<p>mgm technology partners has developed a test data generator suite based on the formal languages described here. The tool is used for the generation of big and complex test cases allowing for extensive and well-defined test coverage. Some insights can be found in our blog articles &#8220;<a href="http://blog.mgm-tp.com/2010/10/test-data-generation-part1/">Form Validation with Rule Bases</a>&#8221; and &#8220;<a href="http://blog.mgm-tp.com/2010/12/test-data-generation-part2/">Producing High-Quality Test Data</a>&#8221;. Test coverage and other quality assurance topics will be highlighted in the upcoming second blog article.</p>
	<p><em><small>(c) 2012 <a href="http://www.mgm-tp.com">mgm technology partners</a>. This posting "<a href="http://blog.mgm-tp.com/2012/02/formal-dsl-part1/">Using Domain Specific Languages to Implement Interactive Frontends - Software Quality driven by Formal DSLs, Part 1</a>" is part of the <a href="http://blog.mgm-tp.com">mgm technology blog</a>. The author of the posting is
	<a href="http://blog.mgm-tp.com/?author=23" title="View articles by Dr. Jürgen Knopp">Dr. Jürgen Knopp</a>.
	</small></em></p>

]]></content:encoded>
			<wfw:commentRss>http://blog.mgm-tp.com/2012/02/formal-dsl-part1/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<series:name><![CDATA[Software Quality driven by Formal DSLs]]></series:name>
	</item>
		<item>
		<title>Ultra-Performant Dynamic Websites with Varnish</title>
		<link>http://blog.mgm-tp.com/2012/01/varnish-web-cache/</link>
		<comments>http://blog.mgm-tp.com/2012/01/varnish-web-cache/#comments</comments>
		<pubDate>Mon, 30 Jan 2012 11:41:25 +0000</pubDate>
		<dc:creator>Dr. Christian Winkler</dc:creator>
				<category><![CDATA[Projects]]></category>
		<category><![CDATA[Tips]]></category>
		<category><![CDATA[Cache]]></category>
		<category><![CDATA[ECommerce]]></category>
		<category><![CDATA[Performance]]></category>
		<category><![CDATA[Scalability]]></category>
		<category><![CDATA[Web]]></category>

		<guid isPermaLink="false">http://blog.mgm-tp.com/?p=1059</guid>
		<description><![CDATA[This article describes how we configured and used the Varnish web cache for the popular German online shop www.lidl.de. Varnish gave us a tremendous performance boost. With this new caching setup, we easily achieve request rates of several thousand pages per second, which are quite common during marketing campaigns like special offers.

In a typical non-caching [...]]]></description>
			<content:encoded><![CDATA[<p>This article describes how we configured and used the <a href="https://www.varnish-cache.org/">Varnish</a> web cache for the popular German online shop <a href="http://www.lidl.de/">www.lidl.de</a>. Varnish gave us a tremendous performance boost. With this new caching setup, we easily achieve request rates of several thousand pages per second, which are quite common during marketing campaigns like special offers.</p>
<p><span id="more-1059"></span></p>
<p>In a typical <em>non-caching</em> setup of a web application as illustrated in the figure below, Apache handles static requests for images, scripts, etc. and forwards requests for the HTML pages to an application server like Tomcat or Glassfish. There the dynamic content is generated and then sent back to Apache and finally to the user. In this scenario, the database access is the most critical bottleneck. Even worse, each page request can cause multiple database requests, i.e. SQL statements.</p>
<div id="attachment_1068" class="wp-caption alignnone" style="width: 490px"><a rel="lightbox-varnish" href="http://blog.mgm-tp.com/wp-content/uploads/2012/01/Setup-without-Web-Cache.png"><img class="size-full wp-image-1068" title="Setup without Web Cache" src="http://blog.mgm-tp.com/wp-content/uploads/2012/01/Setup-without-Web-Cache.png" alt="" width="480" height="255" /></a><p class="wp-caption-text">Initial setup without our caching solution: the slow components are shown in red. (Load balancing, which could be done by Apache, is not considered here.)</p></div>
<p>Let&#8217;s assume that without caching, an application server can serve up to <span id="statefullperformance">100 dynamic pages per second</span>. Through a bit of horizontal scaling, i.e. using two server instances (nodes) and load balancing, this can be increased to about 200 dynamic requests per second. However, this scalability is not perfect: with three or more nodes it already starts to get worse, as the sessions have to be distributed among the nodes in the cluster.</p>
<p>The system can of course handle many more simultaneous users than the number 200 suggests, as users do not click links continuously. So the number of users did not really pose a problem during normal operation. However, the situation immediately got critical when newsletters with special offers were sent, as the application server instances were now under &#8220;siege&#8221;. An overload of the instances led to slower and slower responses and decreasing customer satisfaction. Another reason a shop would want to be responsive is that search engines consider measured response times during crawling for ranking search results.</p>
<p>So the question we had to solve was: <em>How can we keep the system responsive (ideally with a response time of 1-2 seconds) during high load and peak situations?</em> Please be aware that in the case of online shops, the highest turnover occurs in these situations.</p>
<p>When we analyzed the server log of the <a href="http://www.lidl.de/">www.lidl.de</a> online shop, we noticed an interesting fact, which we used to our advantage later on: the behavior of users is different in these situations. Most users are just browsing and reading. Consider e.g. a newsletter sent to a few million users: most of the readers will just click a few links (which can still easily amount to several million page impressions). Taking a closer look, we found that most users were viewing exactly identical content which had nevertheless been produced exclusively for them. Only a small percentage used the interactive services of the website like shopping carts, ordering etc.</p>
<h2>Introducing Varnish</h2>
<p>The peak situation described above implies that most content (even though dynamically generated by the web application) is identical for all users. So the obvious idea for a cache is to store the most frequently requested pages. The <a href="http://www.mediawiki.org/wiki/Manual:Varnish_caching">Varnish manual</a> describes Varnish as a lightweight, efficient <a href="https://www.varnish-cache.org/docs/trunk/tutorial/advanced_backend_servers.html">reverse proxy</a> server, meaning it works in front of the web servers (Apache). It acts as a so-called <em>HTTP accelerator</em> which stores (caches) copies of the pages served by the web server (thus the synonym &#8220;web cache&#8221;). The next time the same page is requested by a user, Varnish will serve the copy instead of requesting the page from the Apache server. Varnish is blazingly fast, since it stores its cached data in memory.</p>
<p>The new architecture with Varnish as a web cache now looks like this:</p>
<div id="attachment_1069" class="wp-caption alignnone" style="width: 490px"><a rel="lightbox-varnish" href="http://blog.mgm-tp.com/wp-content/uploads/2012/01/Setup-with-Varnish-as-Web-Cache.png"><img class="size-full wp-image-1069" title="Setup with Varnish as Web Cache" src="http://blog.mgm-tp.com/wp-content/uploads/2012/01/Setup-with-Varnish-as-Web-Cache.png" alt="" width="480" height="367" /></a><p class="wp-caption-text">Varnish in front of Apache acting as a Web cache. It is configured to cache only stateless page requests. Stateful page requests (session) and static resources are forwarded to Apache.</p></div>
<h2>Performance Improvements</h2>
<p>Caching with Varnish removes the need for the web application to regenerate the same page over and over again, resulting in a tremendous performance boost. Varnish can easily handle 10,000 requests/s on a single node. Especially in high load situations the hit rate is easily above 90% (and almost 100% for the most frequently clicked homepage) so that the setup described above can now handle 50 times the original volume. However, this high performance will only hold for <em>stateless</em> users. Any user with a session will fall back to the 100 requests/s class.</p>
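<p>The relation between hit rate and total capacity can be checked with a bit of arithmetic. The following Python sketch assumes a very simple model that is not taken from the measurements: every cache miss costs exactly one request to the application servers, stateful users are ignored, and the cache itself is never the bottleneck. Under these assumptions, a backend that can serve <code>B</code> requests/s supports a total of <code>B / (1 - h)</code> requests/s at hit rate <code>h</code>.</p>

```python
def backend_load(total_rps: float, hit_rate: float) -> float:
    """Requests per second that miss the cache and reach the application servers."""
    return total_rps * (1.0 - hit_rate)


def max_total_rps(backend_capacity_rps: float, hit_rate: float) -> float:
    """Total request rate at which the application servers are just saturated."""
    if hit_rate >= 1.0:
        raise ValueError("a hit rate of 100% puts no load on the backend")
    return backend_capacity_rps / (1.0 - hit_rate)


if __name__ == "__main__":
    # 200 requests/s is the two-node figure used earlier in the article.
    for hit_rate in (0.90, 0.95, 0.98):
        print(hit_rate, round(max_total_rps(200, hit_rate)))
```

<p>In this simplified model, a 90% hit rate corresponds to ten times the backend&#8217;s volume, while fifty times the volume requires a hit rate of about 98%.</p>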
<p>As most of the load is now taken by the Varnish cache servers, the load on the application servers has dropped considerably. Even in high load situations where the Varnish servers handle several thousand requests per second, most of the content comes from the cache and the application servers can concentrate on re-creating expired content (which is then kept in the cache for s-maxage seconds) and handling users with a session (who are hopefully going to order).</p>
<p>Our setup leads to a significantly improved end-to-end performance of the system &ndash; even during normal operation. This is interesting as it creates an advantage for users during normal operation and saves money for the website owner at the same time.</p>
<p>Using less hardware means investing less money initially. However, an even more important fact is that the operating costs will also be much lower. These operating costs are caused by permanent maintenance of the system, like powering servers around the clock, updating, applying patches etc. Since these costs are the main drivers for the total cost of ownership (TCO), the potential savings are also largest in this area.</p>
<p>Using fewer servers also means consuming less power. By reducing the energy bill this &#8220;green IT&#8221; approach therefore leads to lower operating costs. Compared to extending the existing system without a cache, an enormous amount of money was saved both in hardware and operating costs, while introducing a &#8220;performance buffer&#8221; for situations with even higher loads at the same time.</p>
<p>Another effect is that the shop&#8217;s marketing division can now act freely without having to keep technical constraints in mind: new campaigns can be planned to increase the turnover significantly, like sending more frequent newsletters, using special offers etc.</p>
<h2>Challenges</h2>
<p>Before we dive into the details of our Varnish configuration, let&#8217;s first discuss three problems we had to solve, specifically handling stateful users, keeping users stateless w.r.t. caching as long as possible, and caching pages with changing content.</p>
<h3>Problem: Websites are Stateful</h3>
<p>Most websites nowadays are stateful, e.g. a server-side session is created when a user logs in. In case of an online shop, the session might contain the shopping cart, login information etc.</p>
<p>The problem is that as soon as the session contains personalized information, caching must immediately stop. But, as long as state information does not have an effect on the content of generated pages, it can be ignored. This is what we call a <em>stateless or browsing user</em>, and our first objective should be to cache pages suitable for this user class.</p>
<p>Thus, our <strong>solution is to classify users</strong>, i.e. to carefully distinguish between stateless and stateful users. As the web application did not originally take care of that, it had to be changed in two fundamental ways:</p>
<ol>
<li>The application must only generate and send cookies if it has created some internal state for a user.</li>
<li>This state transition can happen at any time. So a user who has not even touched the application server and is completely unknown to the application must be able to become a stateful user at any time.</li>
</ol>
<div id="attachment_1070" class="wp-caption alignnone" style="width: 490px"><a rel="lightbox-varnish" href="http://blog.mgm-tp.com/wp-content/uploads/2012/01/Stateless-vs-Stateful-Users.png"><img class="size-full wp-image-1070" title="Stateless vs Stateful Users" src="http://blog.mgm-tp.com/wp-content/uploads/2012/01/Stateless-vs-Stateful-Users.png" alt="" width="480" height="337" /></a><p class="wp-caption-text">Two classes of users are distinguished by certain attributes. A user should stay stateless as long as possible. Stateful (red) users will need contact to the application server and experience slower performance.</p></div>
<p>Fortunately, the web application was already obeying the <a href="http://en.wikipedia.org/wiki/Representational_state_transfer">REST</a> paradigm. HTTP GET requests were used for all content that was just shown to the users. In contrast to this, all user actions which were actually creating some state on the server side were already modeled in HTTP POST requests. This proved to be extremely helpful when we started to configure the cache software.</p>
<h3>Keeping Users Stateless</h3>
<p>The general goal must be to keep users stateless for as long as possible. In a first, naive approach, this is the only thing that makes caching possible.</p>
<p>Keeping users stateless means that the server should never send a session cookie unless really necessary. On the other hand, a lot of web applications require some basic personalization. This dilemma can be solved by using cookies which are evaluated on the client side only. For example, let&#8217;s assume that users can change the background color of the website as a very simple form of personalization. This can be performed by JavaScript and, for the sake of caching (and achieving a high hit rate), this should be the preferred way of doing simple personalization. Of course, a cookie evaluated on the server side could be used to get the same personalized background color. But the cache hit rate would then suffer considerably (to be exact, by a factor identical to the number of background colors, since exactly that number of cached copies has to be kept).</p>
<p>So one <strong>recipe for staying stateless is to keep simple state on the client-side</strong> and never send it to the server. This state does not necessarily have to reside in a cookie &ndash; you can also use the browser local storage for that, as described in Smashing Magazine&#8217;s <a href="http://coding.smashingmagazine.com/2010/10/11/local-storage-and-how-to-use-it/">&#8220;Using Local Storage In HTML5-Capable Browsers&#8221;</a> article.</p>
<h3>Dealing with Content that is Changing</h3>
<p>Even if all stateless users now see the same cached content, this content changes over time. In an online shop, for example, some products might run out of stock and become unavailable or need to be replaced by other products. Unfortunately, this does not only affect the product pages themselves but sometimes also pages that reference them; e.g. links and thumbnail images will have to be changed or removed. Similar situations often occur in online publishing and in nearly all websites which change over time.</p>
<p>Thus, another requirement for the cache is its ability to <strong>partially expire content</strong>. And of course, the bookkeeping must be performed externally so that the affected pages can be removed individually.</p>
<p>For the cache to work properly and perform <strong>automatic expiration</strong> of content, it needs to know how long the currently cached content should be kept (i.e. its maximal age). The web application therefore has to generate this so-called time-to-live (TTL) information.</p>
<p>The HTTP specification defined HTTP response header fields such as <a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9"><code>Cache-Control</code></a> for exactly this purpose a long time ago. These are set by the web application itself, since it knows best how long the content will be considered &#8220;current&#8221;/&#8220;valid&#8221;. This setting could even be dynamic, e.g. giving a shorter time-to-live to a product page if stock is low. The Cache-Control directive most suitable for this purpose is <code>s-maxage</code> as it specifies the maximum age in seconds for which the response may be kept in a shared cache such as the web cache.</p>
<h3>Determining Cacheable Candidates</h3>
<p>Not all content can be or even should be cached. Caching completely static websites is far easier; however, such sites tend to be very unattractive, and their pages could simply be pre-generated and placed on the web server. As the cache will sit in front of the web server, all requests will go to the cache first. It does not make much sense to store pages in the cache which are kept statically in the web server&#8217;s file system anyway.</p>
<p>On the other hand, only GET URLs can be candidates for caching. As a POST request transmits information from the browser to the server, it cannot be cached and must always be handled by the application server. This might sound like a big constraint at first but is actually a feature that can be nicely utilized: all URLs which are candidates for performing the state transition of a user from stateless to stateful will be POST requests. And consequently, the application itself can decide whether the POST requests actually qualify for making a user stateful or whether s/he can remain stateless, for example when a wrong login/password combination is entered.</p>
<h2>Anatomy of Varnish&#8217;s Request Processing</h2>
<p>Varnish distinguishes three stages when processing a request:</p>
<ul>
<li>The request is received from the browser (<code>vcl_recv</code>).<br />
At this stage, Varnish calls the subroutine <code>vcl_recv</code> in the configuration file (VCL). Here, the request header can be manipulated e.g. by removing cookies. It can be decided whether the content should be looked up in the cache or be propagated to the backend server.</li>
<li>The response is received from the backend (<code>vcl_fetch</code>).<br />
This function is only executed when the content is not delivered from the cache. In this phase, response headers from the backend can be modified (either for delivery or for saving in the cache). The request attributes are also still available and can be used for manipulating several settings.</li>
<li>The response is sent to the browser (<code>vcl_deliver</code>).<br />
This stage is passed by all requests and can be used to add headers (like TTL), change cookies etc. The request parameters are available for reading.</li>
</ul>
<div id="attachment_1063" class="wp-caption alignnone" style="width: 490px"><a rel="lightbox-varnish" href="http://blog.mgm-tp.com/wp-content/uploads/2012/01/Varnish-Request-Processing.png"><img class="size-large wp-image-1063" title="Varnish Request Processing" src="http://blog.mgm-tp.com/wp-content/uploads/2012/01/Varnish-Request-Processing-480x310.png" alt="" width="480" height="310" /></a><p class="wp-caption-text">Different stages of Varnish&#39;s request processing. Everything related to the cache is in red, i.e. all cacheable content is looked up in the cache and possibly delivered; if it&#39;s not in the cache, the web server will be asked via vcl_fetch.</p></div>
<p>Varnish defines additional subroutines which also hook into the Varnish workflow, but they are not as important. See also the <a href="https://www.varnish-cache.org/docs/trunk/tutorial/vcl.html">VCL tutorial</a> and the <a href="https://www.varnish-cache.org/docs/trunk/reference/vcl.html">VCL reference</a>.</p>
<h3 id="vcl">A Sample Varnish Configuration (VCL)</h3>
<p>This section contains a simple Varnish configuration that provides caching as required. The challenge is to keep the user stateless as long as possible. In order to achieve this, a simple trick is used: if a request does not contain a <code>JSESSIONID</code> cookie, it is a stateless request and even if the (uneducated) backend wants to set a cookie, it will be removed. Only responses to POST requests may set the necessary cookies. Manipulating the TTL complements the configuration. A lot of logging is used in the example; this is not just for illustrative purposes but also practical for debugging and optimizing the configuration.</p>
<pre class="brush: plain; wrap-lines: false;">
import std;

backend default {
    .host = &quot;localhost&quot;;  # Varnish is running on same server as Apache
    .port = &quot;80&quot;;
}

sub vcl_recv {
  # remove unnecessary cookies
  if (req.http.cookie ~ &quot;JSESSIONID&quot;) {
    std.log(&quot;found jsessionid in request, passing to backend server&quot;);
    return (pass);
  } else {
    unset req.http.cookie;
  }
}

sub vcl_fetch {
  if (req.http.cookie ~ &quot;JSESSIONID&quot; || req.request == &quot;POST&quot;) {
    std.log(&quot;not removing cookie/passing POST, url &quot; + req.url);
    # Varnish 3.0 renamed the vcl_fetch variant of pass to hit_for_pass
    return (hit_for_pass);
  } else {
    # remove all other cookies and prevent backend from setting any
    std.log(&quot;removing cookie in url &quot; + req.url);
    unset beresp.http.set-cookie;
    set beresp.ttl = 600s;
  }
}

sub vcl_deliver {
  # send some handy statistics back, useful for checking cache
  if (obj.hits &gt; 0) {
    set resp.http.X-Cache-Action = &quot;HIT&quot;;
    set resp.http.X-Cache-Hits = obj.hits;
  } else {
    set resp.http.X-Cache-Action = &quot;MISS&quot;;
  }
}
</pre>
<p>Notice the C-like syntax in the Varnish configuration. This is no accident; in fact, the whole configuration code is compiled to a binary shared object at startup and when reloading the script to optimize for performance. As the subroutines in this configuration are called for each request, this helps immensely in creating a fast cache server. Moreover, it is possible to add C code directly to the configuration.</p>
<p>It might seem strange at first to define the configuration in a procedural language, but it proved to be extremely valuable as it enables us to be flexible and to formulate how exactly to handle the requests. Overall, this leads to a much more readable configuration than a declarative approach.</p>
<p>Notice the different &#8220;top level&#8221; objects in the configuration file:</p>
<ul>
<li><code>req</code> is the request (i.e. the URL including all headers) coming from the browser.</li>
<li><code>resp</code> is the response before it is sent to the client, i.e. while it can still be manipulated.</li>
<li><code>beresp</code> is the response which Varnish gets from the backend (if the object is not cacheable or not cached); it can be evaluated as well.</li>
</ul>
<p>On a side note, Varnish can use ACLs to restrict access to certain resources. Cached content can also be invalidated by expression: a <a href="https://www.varnish-cache.org/docs/trunk/tutorial/purging.html#bans">&#8220;ban&#8221;</a> tells Varnish to stop serving all cached objects that match a given condition. Varnish can also (atomically) delete individual elements from the cache. This is accomplished via a <a href="https://www.varnish-cache.org/docs/trunk/tutorial/purging.html#http-purges">&#8220;purge&#8221; command</a> through the HTTP interface, and access to it should be restricted to trusted IP addresses with an ACL.</p>
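<p>A minimal sketch of such a restricted HTTP purge could look as follows. It follows the pattern from the Varnish 3.0 documentation; the ACL name <code>purge_allowed</code> and the address range are placeholders chosen for this illustration, not values from our setup.</p>
<pre class="brush: plain; wrap-lines: false;">
# hypothetical ACL: only these clients may purge
acl purge_allowed {
  &quot;localhost&quot;;
  &quot;192.168.55.0&quot;/24;
}

sub vcl_recv {
  if (req.request == &quot;PURGE&quot;) {
    if (!client.ip ~ purge_allowed) {
      error 405 &quot;Not allowed.&quot;;
    }
    # look the object up so that vcl_hit or vcl_miss can remove it
    return (lookup);
  }
}

sub vcl_hit {
  if (req.request == &quot;PURGE&quot;) {
    purge;
    error 200 &quot;Purged.&quot;;
  }
}

sub vcl_miss {
  if (req.request == &quot;PURGE&quot;) {
    # purge here as well to remove variants of the object
    purge;
    error 200 &quot;Purged.&quot;;
  }
}
</pre>
<p>When combined with the sample configuration, the <code>PURGE</code> check would have to come before the cookie handling in <code>vcl_recv</code>.</p>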
<h2>Configuration Details and Tips</h2>
<p>Now that we have seen the basic VCL file and understood how a request is usually processed, let&#8217;s dive in even further and discuss the details and lessons learned.</p>
<h3>Improving the Hit rate with Header Normalization</h3>
<p>Varnish has to be told which HTTP request header fields it should use as a cache index. The index is organized as a hash, thus these selected header fields are often referred to as the <em>hash key</em>.</p>
<p>On a side note, you can select the header fields to be used as a hash by implementing the subroutine <code>vcl_hash</code>. If you don&#8217;t implement it, Varnish uses the full URL plus the <code>Host</code> request header field by default. In addition to the hash key computed in <code>vcl_hash</code>, the <a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html"><code>Vary</code></a> header field is always automatically added to the hash key. For further information on the hash key, see <a href="https://www.varnish-cache.org/docs/trunk/phk/varnish_does_not_hash.html">&#8220;What Varnish Does&#8221;</a> and <a href="http://stackoverflow.com/questions/6098914/varnish-and-http-header">&#8220;Varnish and http header&#8221; on Stackoverflow</a>.</p>
<p>To improve the cache hit-rate, it is crucial that you clean up the request header fields used for the hash key. Cleaning up means to change them to a common denominator (so-called <em>header normalization</em>). A very good candidate is of course the <a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.23"><code>Host</code> header field</a>, where a normalized version (like &#8220;www.sitename.com&#8221;) should be used even if &#8220;sitename.com&#8221; is sent in the request header. In addition to that, removing unnecessary headers is always a good idea.</p>
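<p>As an illustration, normalizing the <code>Host</code> header could be sketched like this in <code>vcl_recv</code>. The host names are the placeholders used in the text above; a real setup would use its own domain.</p>
<pre class="brush: plain; wrap-lines: false;">
sub vcl_recv {
  # fold the bare domain into the canonical name so that both
  # spellings share a single cache entry (match is case-insensitive)
  if (req.http.host ~ &quot;(?i)^sitename\.com$&quot;) {
    set req.http.host = &quot;www.sitename.com&quot;;
  }
}
</pre>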
<p>Be careful that the application server does not send a <a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.44"><code>Vary</code> header field</a> for the user agent as this effectively means that there has to be a distinct copy for each user agent. There are so many different browsers (see <a href="http://panopticlick.eff.org/">http://panopticlick.eff.org/</a>) that this will basically make caching useless. See also <a href="http://mark.koli.ch/2010/09/understanding-the-http-vary-header-and-caching-proxies-squid-etc.html">&#8220;Understanding the HTTP Vary Header and Caching Proxies (Squid, etc.)&#8221;</a> and the <a href="https://www.varnish-cache.org/docs/trunk/tutorial/vary.html#tutorial-vary">Varnish Documentation on Vary</a>.</p>
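<p>If the application cannot be changed, one possible workaround is to strip <code>User-Agent</code> from the <code>Vary</code> response header in <code>vcl_fetch</code>. The following is only a sketch and assumes that the pages really are identical for all browsers; if the backend does serve browser-specific markup, this would deliver the wrong variant.</p>
<pre class="brush: plain; wrap-lines: false;">
sub vcl_fetch {
  if (beresp.http.Vary ~ &quot;(?i)user-agent&quot;) {
    # remove the User-Agent entry but keep other entries such as Accept-Encoding
    set beresp.http.Vary = regsuball(beresp.http.Vary, &quot;(?i),? *user-agent *&quot;, &quot;&quot;);
    # tidy up a leading comma left behind if User-Agent was listed first
    set beresp.http.Vary = regsub(beresp.http.Vary, &quot;^, *&quot;, &quot;&quot;);
    if (beresp.http.Vary == &quot;&quot;) {
      unset beresp.http.Vary;
    }
  }
}
</pre>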
<h3>Compression</h3>
<p>The <a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.3"><code>Accept-Encoding</code></a> request header field plays an important role: it can have different values like &#8220;identity&#8221; (i.e. no encoding), &#8220;gzip&#8221; or &#8220;deflate&#8221;. Unfortunately, Internet Explorer prefers the deflate encoding while all other browsers favor gzip. Without intervention, this leads to different copies of the same content in the cache, one in deflate format, the other in gzip format.</p>
<p>Since the request header can be modified on the fly in the <code>vcl_recv</code> subroutine, we can effectively control that only one variant of the content is cached. In your VCL you can modify the request header field and use gzip exclusively if it is available (which is true for both Internet Explorer and others). This technique is presented in detail in the article <a href="https://www.varnish-cache.org/trac/wiki/VCLExampleNormalizeAcceptEncoding">&#8220;Normalize Accept-Encoding header&#8221;</a>. Since both browser families have a market share of roughly 50%, this simple change effectively doubles the hit rate.</p>
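<p>A sketch of this normalization, closely following the well-known example from the Varnish wiki, is shown below. The list of file extensions is an assumption and should be adapted to the content actually served.</p>
<pre class="brush: plain; wrap-lines: false;">
sub vcl_recv {
  if (req.http.Accept-Encoding) {
    if (req.url ~ &quot;\.(jpg|png|gif|gz|tgz|bz2|tbz|mp3|ogg)$&quot;) {
      # already compressed formats gain nothing from another encoding
      unset req.http.Accept-Encoding;
    } elsif (req.http.Accept-Encoding ~ &quot;gzip&quot;) {
      # prefer gzip whenever the browser offers it
      set req.http.Accept-Encoding = &quot;gzip&quot;;
    } elsif (req.http.Accept-Encoding ~ &quot;deflate&quot;) {
      set req.http.Accept-Encoding = &quot;deflate&quot;;
    } else {
      # unknown encoding, request the plain variant
      unset req.http.Accept-Encoding;
    }
  }
}
</pre>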
<p>Please note that beginning with Varnish 3.0, Varnish <a href="https://www.varnish-cache.org/docs/trunk/tutorial/compression.html">supports gzip natively</a> and can modify the <code>Accept-Encoding</code> field by itself, so the measures discussed in the previous paragraph can be skipped.</p>
<h3>Handling Cookies</h3>
<p>Cookies basically fall into different categories:</p>
<ul>
<li>Cookies <em>relevant</em> for caching: These should be kept and their values can be used as part of the hash key for the cache index.</li>
<li>Cookies <em>irrelevant</em> for caching: These should be discarded and not considered by the cache.</li>
<li>Cookies <em>partially relevant</em> for caching: These should be modified and the irrelevant parts should be removed. The remaining cookie should then be used as part of the hash key for the cache index.</li>
<li>Session cookies: These cookies must be treated differently as they basically make caching impossible. If such cookies are detected, Varnish should not cache anything but work as a proxy only sending data from the backend server directly to the client.</li>
</ul>
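<p>Two of these categories can be sketched in a few lines of VCL. The example assumes that the irrelevant cookies are web analytics cookies whose names start with <code>__utm</code>; that name is an assumption for illustration only. The regular expression is the one commonly shown in the Varnish documentation.</p>
<pre class="brush: plain; wrap-lines: false;">
sub vcl_recv {
  if (req.http.Cookie) {
    # irrelevant cookies: strip them before the cache sees the request
    set req.http.Cookie = regsuball(req.http.Cookie, &quot;(^|; ) *__utm.=[^;]+;? *&quot;, &quot;\1&quot;);
    if (req.http.Cookie == &quot;&quot;) {
      # nothing left, so treat the request as stateless
      unset req.http.Cookie;
    }
  }
  if (req.http.Cookie ~ &quot;JSESSIONID&quot;) {
    # session cookie: act as a plain proxy for this request
    return (pass);
  }
}
</pre>
<p>Cookies that are relevant for caching would additionally have to be added to the hash key in <code>vcl_hash</code>, which is not shown here.</p>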
<h3>Consistent Values for TTL and the Expires Field</h3>
<p>Varnish has to decide whether and how long to keep elements in the cache. As we have already learned, the <a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9"><code>Cache-Control</code></a> header field is utilized here. More specifically, the <code>s-maxage</code> directive (or <code>max-age</code> as a fallback if <code>s-maxage</code> is not present) is examined to determine the specified maximum lifetime of a cacheable object. Of course, this only works as long as the cache is not full; once it is full, objects are evicted using the <a href="http://en.wikipedia.org/wiki/Cache_algorithms#Least_Recently_Used">LRU algorithm</a>.</p>
<p>If the web application was not designed with a web cache in mind, it might have conflicting values in <code>s-maxage</code> and the <a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.21"><code>Expires</code></a> response header field. (See the HTTP specification for a <a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9.3">discussion of the Expires versus the max-age field</a>.) This might lead to the bizarre situation that the cached content is sent by Varnish with an <code>Expires</code> header field value that lies in the past if <code>s-maxage</code> allows a longer lifetime than <code>Expires</code>.</p>
<p>This weird behavior can be fixed in several ways, e.g. by statically setting the Expires header in Varnish for each request to <code>s-maxage</code> seconds into the future during <code>vcl_fetch</code>. This will increase the cache efficiency <em>on the browser side</em> and lead to a more responsive website.</p>
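<p>Because Varnish derives <code>beresp.ttl</code> from these header fields before <code>vcl_fetch</code> runs, the configuration only needs to step in when the backend provides no lifetime at all. The following sketch is a variant of the sample configuration, not the setup we used: it keeps the value supplied by the application and only falls back to the fixed 600 seconds if neither <code>Cache-Control</code> nor <code>Expires</code> is present.</p>
<pre class="brush: plain; wrap-lines: false;">
sub vcl_fetch {
  # beresp.ttl has already been computed from s-maxage, max-age or Expires
  if (!beresp.http.Cache-Control &amp;&amp; !beresp.http.Expires) {
    # the backend gave no hint, so use a fixed default lifetime
    set beresp.ttl = 600s;
  }
}
</pre>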
<h3>File Descriptors</h3>
<p>In our first tests the solution performed well, but not excellently. But even more critical were the many dropped connections, i.e. requests from browsers that did not even reach Varnish.</p>
<p>The reason and the fix were easy &ndash; the number of file descriptors had to be increased. This is even more important in real-life situations where connections tend to be slow, as each TCP connection consumes one file descriptor. It does not hurt to allow 32768 descriptors for Varnish.</p>
<h3>Monitoring</h3>
<p>If you have set up a web cache solution with Varnish, it is important to measure its performance and especially monitor the hit rate of the cache. This turns out to be a bit complicated since the Varnish log files are not written to disk for performance reasons; instead, Varnish logs to a circular buffer residing in a shared memory segment. The circular buffer can be read at any time but past values will vanish forever. Since we wanted a monitoring solution that would also allow us to perform a post-mortem analysis in case of a problem, we configured the logging to write the <a href="https://www.varnish-cache.org/docs/trunk/reference/varnishd.html#storage-types">circular buffer to a persistent file</a>.</p>
<p>The most relevant tools for monitoring Varnish are:</p>
<ul>
<li><a href="https://www.varnish-cache.org/docs/trunk/reference/varnishlog.html">varnishlog</a>: This shows current requests from the logging ring buffer. Usually request phases will be shown in chronological order which mixes up the requests themselves. This can be fixed by using appropriate options though.</li>
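<li><a href="https://www.varnish-cache.org/docs/trunk/reference/varnishtop.html">varnishtop</a>: This reads the shared memory log and shows a continuously updated ranking of the most frequently occurring log entries (e.g. the most requested URLs or the most common user agents). It can be used to optimize the configuration, for example by spotting requests that reach the backend unexpectedly often.</li>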
<li><a href="https://www.varnish-cache.org/docs/trunk/reference/varnishhist.html">varnishhist</a>: This is easily the most intuitive and graphical tool for analyzing Varnish. It shows a (text) histogram of the response time distribution and thus gives a good overview how the whole system is performing.</li>
<li><a href="https://www.varnish-cache.org/docs/trunk/reference/varnishstat.html">varnishstat</a>: This shows important statistical information about hit rates, total cache hits, accepted connections from clients etc.</li>
</ul>
<h2>Why Varnish is the best Caching Solution (for us)</h2>
<p>When we began to investigate ways to speed up the <a href="http://www.lidl.de">www.lidl.de</a> site, our first choice was to add the <a href="http://httpd.apache.org/docs/2.1/mod/mod_cache.html">Apache mod_cache</a> caching module to the Apache web server already in use. The first hurdle was the declarative configuration; it is well-suited for a web server but not perfect for modeling a caching behavior. After some fiddling around, it was working smoothly. But more serious problems arose from the fact that certain cookies had to be considered and others had to be ignored. It was impossible to find a viable solution, so the cookie was filtered out by the load balancer. Cache invalidation is performed lazily in Apache, i.e. an outdated resource is removed from the cache only after it is requested. Consequently, outdated resources which are not requested will stay in the cache forever and can only be expired externally. As all cached components are distributed in single files, this expiry is slow and the whole process complicated. For our situation, Apache was not a good solution (although it was in use for quite some time) and hit rates were also rather disappointing.</p>
<p>So our search continued. Via dedicated proxy servers like <a href="http://www.squid-cache.org/">Squid</a>, which are more suitable for large client-side installations, we finally encountered Varnish, an HTTP accelerator specially built for caching purposes on the server-side. Varnish is already used by many big websites like <a href="http://www.facebook.com/">Facebook</a>, <a href="http://search.twitter.com">Twitter (Search)</a> and <a href="http://www.hulu.com/">Hulu</a>.</p>
<p>Varnish is very flexible as it offers procedural configuration of all request stages in a C-like language (which is actually translated to C and compiled at start time to be as efficient as possible). This enables creative cookie handling and all kinds of other tricks which are usually needed in such a scenario. Varnish was specially designed to run on servers with a VM subsystem, so all cached objects live in a single memory-mapped file and can be accessed extremely fast. Varnish handles expiry automatically and correctly and is also much faster than Apache. So the decision was made to go with Varnish.</p>
<p>Other <em>HTTP accelerators</em> were also considered, but proved infeasible, like <a href="http://www.oracle.com/technetwork/middleware/ias/index-089317.html">Oracle Web Cache</a>, a commercial software package from Oracle; the problem here is that the cache cannot grow easily, and that the manipulation of requests and responses is limited. A hardware-based solution is e.g. <a href="http://www.f5.com/solutions/acceleration/web-acceleration/">F5&#8217;s BIG-IP WebAccelerator</a>.</p>
<h2>Further Optimizations</h2>
<p>Below is a discussion of measures that build on a Varnish setup and would speed up the page delivery even further.</p>
<h3>Using a CDN to increase Scale, Reach &amp; Performance</h3>
<p>CDNs take care of delivering the static content while the dynamic content is served via the usual stack. They work in an inherently distributed way and have clever algorithms to select the topologically nearest server for each user. Static and dynamic content can be separated by using virtual webservers with different hostnames. The PDF article <a href="http://www.akamai.com/dl/technical_publications/GloballyDistributedContentDelivery.pdf">&#8220;Globally distributed content delivery&#8221;</a> from Akamai provides an excellent introduction.</p>
<p>Most CDNs offer an API for invalidating all or partial content and respect the expires header field sent from the originating servers. So the Varnish server can work as a central content repository and will be the upstream server for refreshing the CDN.</p>
<p>Almost all traffic would then be served by the CDN. This saves a lot of bandwidth on the Varnish server and the Gigabit interface will not so easily be overloaded. Moreover, as traffic costs in the CDN are negligible, money can be saved as the hosting company does not have to increase its own upstream link. For more information on how to build a CDN see <a href="http://blog.unixy.net/2010/07/how-to-build-your-own-cdn-using-bind-geoip-nginx-and-varnish/">&#8220;How to build your own CDN using BIND, GeoIP, Nginx, and Varnish&#8221;</a>.</p>
<h3>ESI: Caching Page Fragments with diverse TTL</h3>
<p>From a technical point of view, only pages which are requested by the GET method can be cached at all. This is due to the fact that &ndash; by definition &ndash; POST requests change state on the server and therefore necessarily need to reach the application server.</p>
<p>However, the solution described above performs less &#8220;aggressive&#8221; caching since it just stops caching as soon as a session cookie is present. The effect is that stateful users never get cached pages and therefore might have to wait longer for the page to render completely. On the other hand, it does not make sense to cache pages for individual users since it is quite unlikely that the same user will come back to exactly the same page. Even if the user did come back, it would not be safe to assume that the page is still up-to-date (e.g. since the shopping cart might have changed in the meantime).</p>
<p>To speed things up again, a compromise needs to be found between caching invariant fragments of a page and producing personalized content on the fly for stateful users. Fortunately, Varnish offers the right tool to perform exactly this decomposition by leveraging <a href="http://www.w3.org/TR/esi-lang">Edge Side Includes (ESI)</a>.</p>
<p>When <a href="https://www.varnish-cache.org/docs/trunk/tutorial/esi.html">Varnish processes ESI tags</a>, the page assembly (out of fragments) is done by Varnish. As these fragments are separate web resources (requested through GET or POST) they can be assigned their own cache settings and handling information. For example, a cache time-to-live (TTL) of several days could be appropriate for the template, but a fragment containing a frequently-changing story or ad may require a much lower TTL. Some fragments may have to be marked uncacheable.</p>
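<p>To make this more concrete: the page template would contain a tag such as <code>&lt;esi:include src=&quot;/fragments/cart&quot; /&gt;</code>, and Varnish would be told per URL how to treat the template and each fragment. The sketch below uses Varnish 3.0 syntax; the fragment URLs and TTL values are hypothetical and only serve as an illustration.</p>
<pre class="brush: plain; wrap-lines: false;">
sub vcl_fetch {
  if (req.url ~ &quot;^/fragments/cart&quot;) {
    # user-specific fragment: always fetched from the application server
    return (hit_for_pass);
  } elsif (req.url ~ &quot;^/fragments/teaser&quot;) {
    # frequently changing fragment: keep it only briefly
    set beresp.ttl = 1m;
  } elsif (beresp.http.Content-Type ~ &quot;text/html&quot;) {
    # page template: parse ESI tags and cache the result for a long time
    set beresp.do_esi = true;
    set beresp.ttl = 24h;
  }
}
</pre>
<p>Note that with such a setup the cookie handling in <code>vcl_recv</code> has to be adapted as well, since requests from stateful users must no longer bypass the cache completely.</p>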
<p>It must be carefully analyzed what the decomposition of the page might look like, as getting it right is essential to achieve a high hit-rate and a low overhead. In case of an online shop, the page could e.g. consist of different (graphical) fragments:</p>
<div id="attachment_1071" class="wp-caption alignnone" style="width: 490px"><a rel="lightbox-varnish" href="http://blog.mgm-tp.com/wp-content/uploads/2012/01/ESI-Page-Template-and-Fragements.png"><img class="size-full wp-image-1071" title="ESI Page Template and Fragements" src="http://blog.mgm-tp.com/wp-content/uploads/2012/01/ESI-Page-Template-and-Fragements.png" alt="" width="480" height="365" /></a><p class="wp-caption-text">Decomposition of a typical page into user-specific, dynamic (red) and static (blue) fragments.</p></div>
<p>The shopping cart and the login details would then be transferred directly from the application server via an appropriate ESI fragment, whereas the rest of the page is identical for all users and can be stored in the cache. To minimize the number of requests from Varnish to the application server, both fragments can be transferred in one part and integrated in different locations on the page on the client side or via CSS.</p>
<p>Compared to the performance numbers <a href="#statefullperformance">above</a>, the stateful performance is much higher when using ESI. Rates of about 500 stateful requests per second are now easily possible.</p>
<h3>Memcached: Caching Session-specific Page-Fragments</h3>
<p>If you examine the page diagram above, you might notice that even though the shopping cart and login details are user-specific elements on the page, they are not very dynamic, i.e. they change infrequently.</p>
<p>This leads to an opportunity for further optimization: the user-specific fragments can also be stored, but must of course be associated with the session of the corresponding user. As the information is not persistent (as it becomes invalid with an invalidated session) it can be stored in memory. <a href="http://memcached.org/">Memcached</a> is just made for this scenario and therefore a perfect fit, see e.g. the article <a href="http://blog.preinheimer.com/index.php?/archives/334-Storing-Sessions-in-Memcache-how-everything-behaves.html">&#8220;Storing Sessions in Memcache&#8221;</a>.</p>
<p>Any change in the shopping cart or login details will trigger a regeneration of the HTML fragments which will then be stored in memcached. (This can be done in the same POST request by the application server.) Varnish will include the fragment from Memcached (either via direct integration, via Apache or via Nginx). A SessionListener within Tomcat can take care of removing stale sessions from Memcached.</p>
<p>Memcached is extremely fast. Even for stateful users this leads to a performance of well above 5,000 GET requests/s. POST requests are a different story as they still have to be handled by the application server. As they perform only internal tasks and write both to the database and Memcached, a rate of 500 requests/s is nonetheless realistic.</p>
	<p><em><small>(c) 2012 <a href="http://www.mgm-tp.com">mgm technology partners</a>. This posting "<a href="http://blog.mgm-tp.com/2012/01/varnish-web-cache/">Ultra-Performant Dynamic Websites with Varnish</a>" is part of the <a href="http://blog.mgm-tp.com">mgm technology blog</a>. The author of the posting is
	<a href="http://blog.mgm-tp.com/?author=20" title="View articles by Dr. Christian Winkler">Dr. Christian Winkler</a>.
	</small></em></p>

]]></content:encoded>
			<wfw:commentRss>http://blog.mgm-tp.com/2012/01/varnish-web-cache/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Practical Customer Participation in JIRA Workflows - JIRA beyond Bug Tracking, Part 2</title>
		<link>http://blog.mgm-tp.com/2011/12/jira-beyond-bug-tracking-part2/</link>
		<comments>http://blog.mgm-tp.com/2011/12/jira-beyond-bug-tracking-part2/#comments</comments>
		<pubDate>Thu, 08 Dec 2011 15:43:19 +0000</pubDate>
		<dc:creator>Alexander Weiss</dc:creator>
				<category><![CDATA[Projects]]></category>
		<category><![CDATA[Tips]]></category>
		<category><![CDATA[Agile]]></category>
		<category><![CDATA[JIRA]]></category>
		<category><![CDATA[Project Management]]></category>
		<category><![CDATA[QA]]></category>

		<guid isPermaLink="false">http://blog.mgm-tp.com/?p=1046</guid>
		<description><![CDATA[This second part of our blog series continues with the topic of direct involvement of customers and gives some practical examples of when, where and how to introduce and include your customer into JIRA. We will then discuss two of mgm&#8217;s proven real-world workflows and use them as case studies about appropriate modes for successful [...]]]></description>
			<content:encoded><![CDATA[<p>This second part of our blog series continues with the topic of direct involvement of customers and gives some practical examples of when, where and how to introduce and include your customer into <a href="http://www.atlassian.com/software/jira/overview">JIRA</a>. We will then discuss two of mgm&#8217;s proven real-world workflows and use them as case studies about appropriate modes for successful customer participation. You will also learn about our recommended ways of keeping the complexity of huge JIRA projects from the customer.</p>
<p><span id="more-1046"></span></p>
<p>Let&#8217;s begin with how JIRA can be utilized during the initial project phases. The main steps here are to prepare a more detailed business modeling and to complete the technical and business concepts. These steps are tightly connected with the compilation of the requirements and the requirement management phase. The requirement analysts of the project and the responsible project manager will interview all the necessary stakeholders to get a complete picture of the required solution that the business modelers and architects need for their work.</p>
<h2>Capturing Requirements as JIRA Tickets</h2>
<p>During this requirement management phase all the collected functional and non-functional requirements will already be stored as JIRA tickets to control their content and impact and to prioritize them with respect to the solution and its implementation order (planning process). This phase is exactly where customers can have their first point of contact with JIRA: they can be involved in compiling new requirements and in detailing items that have already been filed.</p>
<p>But as the customer is not yet very familiar with JIRA in this very early stage, we typically choose to create all the new requirements tickets ourselves instead of the customer. This is not just to unburden the customer: we also want to avoid the additional work of correcting imprecisely formulated requirements. </p>
<p>The descriptions of requirement tickets should always be <em>unambiguous and complete</em>. Thus, the responsibility to verbalize requirements usually remains with us. But the customer can be involved at any time to contribute details and he can (and should) be an active part during the elaboration phase and deliver his input and expertise through comments to the respective tickets. </p>
<p>Another very important point in requirement management is the <em>terminology</em> used. Always speak (and write) the customer&#8217;s domain-specific language, and use only terms that the customer understands! We find it very helpful to maintain a glossary of all domain-specific words and terms together with the customer. </p>
<h2>Customer Involvement in the JIRA Requirement Process</h2>
<p>In addition to detailing the content of requirements and ensuring their correctness, the customer can take over two other important tasks in the requirement process:</p>
<ul>
<li><strong>Assignment:</strong> Once a requirement is elaborated, the effort estimated and it is ready for realization, it has to be assigned to the supplier (us) for release planning and implementation.</li>
<li><strong>Approval:</strong> When the requirement is implemented and approved by the development team and our internal quality assurance, the realized requirement is ready for approval by the customer.</li>
</ul>
<p>Both of these steps (assignment and approval) can be realized as workflow steps for the issue type &#8220;Requirement&#8221;. Depending on the character of the project and customer, mgm runs projects with different levels of integration. </p>
<h2>Proven Workflow Implementations</h2>
<p>Let&#8217;s take a look at two requirement workflows that we designed for our projects, each with a different integration level of the mentioned steps.</p>
<div id="attachment_1047" class="wp-caption alignnone" style="width: 490px"><a rel="lightbox-jira-beyond-part2" href="http://blog.mgm-tp.com/wp-content/uploads/2011/12/blog2-workflow-1.png"><img src="http://blog.mgm-tp.com/wp-content/uploads/2011/12/blog2-workflow-1-480x313.png" alt="" title="blog2-workflow-1" width="480" height="313" class="size-large wp-image-1047" /></a><p class="wp-caption-text">Workflow 'Requirement' with dedicated customer steps for assignment and approval (Variant 1).</p></div>
<p>This first workflow (shown above) contains a dedicated step to alert the customer that the requirement is &#8220;READY TO ASSIGN&#8221;. On his dashboard the customer has a portlet listing all these &#8216;marked&#8217; requirements as a working queue for assignments!</p>
<p>It is not strictly necessary that the customer himself executes the transition &#8220;assign&#8221; in JIRA. We have projects where the requirement assignment is officially sent via mail or e-mail by the customer. In these cases, our requirement manager performs the transition on behalf of the customer. But we also have projects where the customer himself pushes the &#8220;assign&#8221; button in JIRA.</p>
<p>Following the implementation part with the steps &#8220;IN PROGRESS&#8221;, &#8220;RESOLVED&#8221; and &#8220;CODE-REVIEWED&#8221;, the requirement workflow contains the steps &#8220;READY FOR TESTING&#8221; and &#8220;VERIFIED&#8221;. During the software approval stage the customer can use these dedicated steps to manage his testing and approval tasks. Once again, the needed filters are integrated into the customer dashboard. The development team will explicitly hand over all implementations that passed internal quality assurance to the customer. The approval transition &#8220;VERIFY ISSUE&#8221; will then be executed by the customer himself. Usually we convince the customer to do this directly in JIRA. </p>
<p>In this first example, the step &#8220;REVISION&#8221; of the requirement (the elaboration phase) is separated from the step &#8220;ESTIMATE&#8221;, so the customer keeps control over when effort estimations are ordered. </p>
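<p>To make the idea of dedicated customer steps more tangible, here is a minimal Javascript sketch of a workflow like variant 1. It is not a JIRA configuration and not the exact workflow from the figure: the step names are taken from the text above, but the strictly linear order, all transition names except &#8220;assign&#8221; and &#8220;verify issue&#8221;, and the <code>actor</code> values are assumptions made for illustration.</p>

```javascript
// Hypothetical model of workflow variant 1. The step names are taken from the
// text; the linear order, every transition name except "assign" and
// "verify issue", and the "actor" values are assumptions for illustration.
const TRANSITIONS = [
  { from: 'REVISION', name: 'order estimate', to: 'ESTIMATE', actor: 'customer' },
  { from: 'ESTIMATE', name: 'finish estimate', to: 'READY TO ASSIGN', actor: 'supplier' },
  { from: 'READY TO ASSIGN', name: 'assign', to: 'IN PROGRESS', actor: 'customer' },
  { from: 'IN PROGRESS', name: 'resolve', to: 'RESOLVED', actor: 'supplier' },
  { from: 'RESOLVED', name: 'review code', to: 'CODE-REVIEWED', actor: 'supplier' },
  { from: 'CODE-REVIEWED', name: 'hand over', to: 'READY FOR TESTING', actor: 'supplier' },
  { from: 'READY FOR TESTING', name: 'verify issue', to: 'VERIFIED', actor: 'customer' },
];

// Returns a copy of the ticket in the target step, or throws if the
// transition is not allowed from the ticket's current step.
function applyTransition(ticket, name) {
  const match = TRANSITIONS.find((t) => t.from === ticket.status && t.name === name);
  if (!match) {
    throw new Error('Transition "' + name + '" is not allowed in step ' + ticket.status);
  }
  return { ...ticket, status: match.to };
}

// Mimics the customer's working queue on the dashboard: all tickets that
// currently sit in a step whose next transition belongs to the customer.
function customerQueue(tickets) {
  const waiting = new Set(
    TRANSITIONS.filter((t) => t.actor === 'customer').map((t) => t.from)
  );
  return tickets.filter((ticket) => waiting.has(ticket.status));
}
```

<p>In this sketch, calling <code>applyTransition({ key: 'REQ-1', status: 'READY TO ASSIGN' }, 'assign')</code> moves the (made-up) ticket to &#8220;IN PROGRESS&#8221;, while <code>customerQueue</code> plays the role of the dashboard filters that list everything waiting for the customer.</p>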
<p>Now let us consider a second workflow example as depicted below: </p>
<div id="attachment_1048" class="wp-caption alignnone" style="width: 490px"><a rel="lightbox-jira-beyond-part2" href="http://blog.mgm-tp.com/wp-content/uploads/2011/12/blog2-workflow-2.png"><img src="http://blog.mgm-tp.com/wp-content/uploads/2011/12/blog2-workflow-2-480x311.png" alt="" title="blog2-workflow-2" width="480" height="311" class="size-large wp-image-1048" /></a><p class="wp-caption-text">Workflow 'Requirement' with dedicated customer steps for assignment and approval (Variant 2).</p></div>
<p>The interesting parts here are the initial step &#8220;DRAFT&#8221;, the step &#8220;ANALYSIS&#8221; (before &#8220;REVISION&#8221;) and the approval step &#8220;SIGNED OFF&#8221;. The &#8220;DRAFT&#8221; step is especially useful if customers create requirement tickets by themselves. The distinct step &#8220;ANALYSIS&#8221; is an independent elaboration and phrasing phase for the customer (typically when they have a dedicated operations department) whereas the step &#8220;REVISION&#8221; is an elaboration phase for the project team (development). At the end of the whole implementation process the customer can use the step &#8220;SIGNED OFF&#8221; for the approval process.</p>
<h2>Overview: Where and How to Involve the Customer</h2>
<p>Requirement management is an obvious area for direct participation of customers, but more traditional areas like &#8220;bug tracking&#8221; and &#8220;change management&#8221; are also potential candidates for customer involvement.</p>
<p>Below is a collection of areas where we constantly try to convince our customers to participate directly within our established JIRA processes:</p>
<ul>
<li><strong>Requirement management</strong>
<ul>
<li>Input of new requirement tickets </li>
<li>Direct participation in the elaboration phase (optionally with additional workflow steps)</li>
<li>Assignment for realization (workflow steps)</li>
<li>Testing and approval of implemented requirements (workflow steps)</li>
</ul>
</li>
<li><strong>Change management</strong>
<ul>
<li>Input of new change request tickets </li>
<li>Direct participation in the elaboration phase (optionally with own workflow steps)</li>
<li>Assignment for realization (workflow steps)</li>
<li>Testing and approval of implemented change requests (workflow steps)</li>
</ul>
</li>
<li><strong>Bug tracking</strong>
<ul>
<li>Input of new bug tickets </li>
<li>Testing and approval of fixed bugs (workflow steps)</li>
</ul>
</li>
<li><strong>Software approval process</strong>
<ul>
<li>Execution of dedicated testing tickets </li>
<li>Approval of all individual development tickets (requirements, change requests and bugs) (workflow steps)</li>
<li>Issue the final software (or release) acceptance (workflow steps)</li>
</ul>
</li>
</ul>
<p>The &#8220;Change Management&#8221; process has to be aligned with the customer&#8217;s organizational structure and change process. In our experience, change management in particular is in most cases already a well-defined process on the customer side. However, for &#8220;Change Management&#8221; we can typically apply the same workflow as for &#8220;Requirements&#8221;.</p>
<p>&#8220;Bug tracking&#8221; nowadays follows standard workflows. But bug tickets are also development tasks, i.e. changes to the product/source code. Thus, we extended the bug workflows with steps representing the approval and quality assurance parts (&#8220;READY FOR TESTING&#8221;, &#8220;VERIFIED&#8221; and &#8220;SIGNED OFF&#8221;) as well. This applies to all issue types leading to development activities where requirements and change requests just represent the controlling/management part and not the realization part, e.g. Requirements, Change Requests, Bugs and dedicated implementation tasks.</p>
<h2>Dedicated JIRA Projects for the Customer and Development</h2>
<p>Sometimes the number of JIRA tickets per period (e.g. those needed for a software release) exceeds the &#8220;pain&#8221; threshold (typically &gt; 300 per release), and the customer begins to lose the project overview and feels lost in the overwhelming number of requirements, change requests, bugs, QA tasks and tasks in general. Our recommendation for these cases is to split the project up and create two dedicated JIRA projects:</p>
<ul>
<li><strong>Customer facing project:</strong> Used for all operational bugs and incidents (source: customer and end-user), comprises the complete requirement and change management and the software approval process. </li>
<li><strong>Development facing project:</strong> Used for all bugs during the development phase, all implementation tasks derived from requirements and change requests, development internal quality assurance tasks, all tasks that are related to the project in general, etc.</li>
</ul>
<div id="attachment_1049" class="wp-caption alignnone" style="width: 490px"><a rel="lightbox-jira-beyond-part2" href="http://blog.mgm-tp.com/wp-content/uploads/2011/12/blog2-project-splitting.png"><img src="http://blog.mgm-tp.com/wp-content/uploads/2011/12/blog2-project-splitting-480x226.png" alt="" title="blog2-project-splitting" width="480" height="226" class="size-large wp-image-1049" /></a><p class="wp-caption-text">Splitting up a JIRA project into a dedicated JIRA project for the customer and another one for the development.</p></div>
<p>The ticket handling is very easy: Assigned requirements within the customer project are just cloned and moved to the development project. The original and the cloned requirements are then automatically linked by JIRA. In the development project you can then create the appropriate division into implementation tasks needed for your team and component diversity. The status update of the customer&#8217;s source tickets (linked tickets) has to be done manually.</p>
<p>The development project is typically visible only to the development team and not to the customer. When we did this in the past, the customer&#8217;s initial suspicion that we just wanted to hide information from him could always be resolved by simply opening the development project to him and showing him the hundreds of (open) tickets. Normally he would lose interest in this project very quickly because he gets no additional benefit out of it. On the contrary, the amount of information tends to confuse him.</p>
<p>We have had very good experiences with the concept of two dedicated JIRA projects, one for the customer and one for development. But there is a <em>second way</em> to keep redundant information away from an overwhelmed customer: JIRA&#8217;s security level concept. This way you can keep all tickets in one JIRA project. But you will then have to cope with maintaining security settings at ticket level, because security levels have to be set manually for each ticket that needs one. Setting a default security level is counterproductive, because every ticket created by the customer would then be hidden from him immediately after creation.</p>
<h2>Conclusion</h2>
<p>In our experience the advantages obtained through direct participation of customers in JIRA outweigh the disadvantages, for example the increased effort necessary for JIRA configuration. A well-informed customer who is directly involved in his project feels much more comfortable, even if something goes wrong or the progress of the project stalls. <em>Transparency is the magical keyword.</em></p>
<p>But you have to accept that every project has its own characteristics. It will be mainly influenced by the customer&#8217;s character, organization and stakeholders. You have to find the most appropriate and fitting level for a customer&#8217;s direct participation. Try to get the most accurate picture of the stakeholders you have to work with and then decide how they could fit into the process. And keep in mind that the process can always be adapted afterwards in order to achieve the greatest efficiency in project progress and customer satisfaction.</p>
<p>In keeping with agile practice, apply <em>continuous improvement</em> to your project management processes: Change something &ndash; find out how it went &ndash; learn from it &ndash; change something again!</p>
<h2>Summary of the Key Success Factors</h2>
<ul>
<li>Let customers create tickets directly in JIRA: requirements, change requests, support inquiries, bugs.</li>
<li>Incorporate customers&#8217; duties and responsibilities (e.g. assignments, approvals) directly into the issues workflow as dedicated steps.</li>
<li>Prepare specific filters and dashboards for the customer
<ul>
<li>for his duties (detailing, assignments, approval)</li>
<li>for overviews</li>
<li>for status</li>
</ul>
</li>
<li>Split projects with an overwhelming amount of implementation tasks into two separate instances &#8211; one for the customer and one for development.</li>
<li>Give customers a short JIRA training covering all standard actions as well as how to use and adapt dashboards and how to interpret the data and analysis reports.</li>
<li>Tailor every project set-up individually and don&#8217;t try to compress it into the same template.</li>
</ul>
<p>If you keep all this in mind, you have a good chance that JIRA will become your customer&#8217;s &#8216;sweetheart&#8217;!</p>
<p>There are tons of other interesting topics around JIRA. I will continue to provide you with further ideas, suggestions and mgm experiences. And of course if you have additional questions, ideas, and suggestions around JIRA I would really appreciate any comments and input from you.</p>
	<p><em><small>(c) 2012 <a href="http://www.mgm-tp.com">mgm technology partners</a>. This posting "<a href="http://blog.mgm-tp.com/2011/12/jira-beyond-bug-tracking-part2/">Practical Customer Participation in JIRA Workflows - JIRA beyond Bug Tracking, Part 2</a>" is part of the <a href="http://blog.mgm-tp.com">mgm technology blog</a>. The author of the posting is
	<a href="http://blog.mgm-tp.com/?author=21" title="View articles by Alexander Weiss">Alexander Weiss</a>.
	</small></em></p>

	<p><em><small>
	We are hiring! mgm technology partners is looking for good software engineers for all our offices. Check out <a rel="external" href="http://www.mgm-tp.com/karriere">www.mgm-tp.com/karriere</a>.
	</small></em></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.mgm-tp.com/2011/12/jira-beyond-bug-tracking-part2/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<series:name><![CDATA[JIRA beyond Bug Tracking]]></series:name>
	</item>
		<item>
		<title>Successfully Embracing JIRA in the whole Project Lifecycle - JIRA beyond Bug Tracking, Part 1</title>
		<link>http://blog.mgm-tp.com/2011/11/jira-beyond-bug-tracking-part1/</link>
		<comments>http://blog.mgm-tp.com/2011/11/jira-beyond-bug-tracking-part1/#comments</comments>
		<pubDate>Fri, 18 Nov 2011 10:49:22 +0000</pubDate>
		<dc:creator>Alexander Weiss</dc:creator>
				<category><![CDATA[Projects]]></category>
		<category><![CDATA[Tips]]></category>
		<category><![CDATA[Agile]]></category>
		<category><![CDATA[JIRA]]></category>
		<category><![CDATA[Project Management]]></category>
		<category><![CDATA[QA]]></category>

		<guid isPermaLink="false">http://blog.mgm-tp.com/?p=1023</guid>
		<description><![CDATA[This two-part series shows how JIRA can be used for other things than just bug tracking and presents mgm&#8217;s experiences with embracing JIRA in nearly all parts of traditional and agile project lifecycles, resulting in a number of customized, optimized JIRA workflows and processes. In this first part, I will give you an overview of [...]]]></description>
			<content:encoded><![CDATA[<p>This two-part series shows how <a href="http://www.atlassian.com/software/jira/overview">JIRA</a> can be used for other things than just bug tracking and presents mgm&#8217;s experiences with embracing JIRA in nearly all parts of traditional and agile project lifecycles, resulting in a number of customized, optimized JIRA workflows and processes. In this first part, I will give you an overview of challenges we faced over the years and how we adapted JIRA to meet them. You will get a taste of the vast variety of uses we have found for JIRA and understand why we decided to use only one tool instead of many.</p>
<p><span id="more-1023"></span></p>
<p>mgm has been using <a href="http://www.atlassian.com/software/jira/overview">JIRA</a> for project management since 2005 for all its projects, ranging from small projects (&lt; 5 members), medium (&lt; 30 members) ones up to really large projects with up to 500 members and more. Part of my job is setting up JIRA and designing customer-tailored, optimized JIRA workflows. We&#8217;re setting up JIRA externally for customers, usually as part of software development projects, and also internally, e.g. for marketing projects, front-office and administrative tasks.</p>
<p>During the past 6 years, our initially defined processes – especially the ones concerning project management and requirement management – passed through several iterations of refinement. We learned a lot and incorporated our lessons learned into improved process definitions. And we introduced more and more agile methods into projects that suit this methodology. The latest step in this direction was to extend JIRA with Atlassian&#8217;s agile plug-in <a href="http://www.atlassian.com/software/greenhopper">GreenHopper</a>, which enables a gradual introduction of agile tools.</p>
<div id="attachment_1024" class="wp-caption alignnone" style="width: 490px"><a rel="lightbox-jira-beyond-part1" href="http://blog.mgm-tp.com/wp-content/uploads/2011/11/greenhopper.png"><img class="size-large wp-image-1024" src="http://blog.mgm-tp.com/wp-content/uploads/2011/11/greenhopper-480x324.png" alt="" width="480" height="324" /></a><p class="wp-caption-text">Agile task tracking using the GreenHopper extension.</p></div>
<p>We make sure that every project gets its own tailored project set-up (set of issue types, workflows, custom fields, GreenHopper configuration) optimized for its individual needs. And although we have several standard configuration sets for new software projects derived from our best practices, we use them only for the initial project set-up. Once the basic functionality is implemented we try to reflect the distinct project requirements (character of project, product and customer) by creating an individual project configuration instead of trying to condense them into a one-size-fits-all template.</p>
<h2>About our largest JIRA Installation with 700+ People</h2>
<p>One of mgm&#8217;s richest sources of JIRA project experience is the set-up and ongoing maintenance of a JIRA instance that supports a huge multi-project environment with more than 700 active members. The set-up for 28 managed software products is split into 60 JIRA projects with approximately 80 workflow definitions and almost 300 custom fields.</p>
<p>The projects form a network of vertical layers (product software development) and horizontal layers (cross-section functions like quality assurance, expertise, strategic product management, support, operation etc). There is also a strong interdependency between the software projects with respect to results and releases that we had to take into account. One of the largest of these software projects generates an average of 2,500 JIRA tickets for every main release (i.e. every 4 months). So we have a lot of different aspects to consider.</p>
<h2>Software Project Process Maturity at mgm</h2>
<p>As agile project management patterns become more and more prevalent within software projects, we adapted our projects&#8217; lifecycle with the following steps: The typical project starts with more or less agile iterations in which requirements are compiled and analyzed, followed by the design and modelling phase. After several development cycles with accompanying quality assurance activities and a final software approval process, the software moves into operation. Product maintenance and enhancement projects typically follow release plans and put all necessary cycle steps into more or less short time boxes.</p>
<p>Today, mgm has reached a process maturity that allows us to control all aspects of the project lifecycle through JIRA processes:</p>
<ul>
<li>Requirement management</li>
<li>Change management</li>
<li>Bug tracking</li>
<li>Task management</li>
<li>Release management</li>
<li>Development cycles and sprint planning</li>
<li>Software approval process</li>
<li>Quality assurance and test management</li>
</ul>
<div id="attachment_1029" class="wp-caption alignnone" style="width: 489px"><a rel="lightbox-jira-beyond-part1" href="http://blog.mgm-tp.com/wp-content/uploads/2011/11/project_lifecycle_with_jira.png"><img class="size-large wp-image-1029" src="http://blog.mgm-tp.com/wp-content/uploads/2011/11/project_lifecycle_with_jira-479x824.png" alt="" width="479" height="824" /></a><p class="wp-caption-text">Software Project lifecycle and its reflection within JIRA.</p></div>
<p>Of these aspects, only the requirement engineering part including business modeling (for example using UML tools etc.) and the design of business and technical concepts is typically done outside of JIRA, but even these activities can be controlled by JIRA tickets with adequate dedicated workflows.</p>
<p>But why do we spend so much effort to squeeze as many lifecycle steps into JIRA processes as possible? The answer is that we have learned from experience: there are many advantages in using only one tool instead of a bunch of different ones, above all fewer discontinuities between systems.</p>
<p>In summary, one tool for the whole project lifecycle has these advantages:</p>
<ul>
<li>Creates more expertise and familiarity.</li>
<li>Avoids disruptions of process and media during transitions to the next lifecycle step.</li>
<li>Enables fast feedback between
<ul>
<li>Development &amp; Project management,</li>
<li>Customer &amp; Project management,</li>
<li>Development &amp; Quality Assurance.</li>
</ul>
</li>
<li>Facilitates a release oriented approach for development and planning.</li>
<li>Allows cross-processing reports and progress control.</li>
<li>Helps when introducing agile tools and methods at the appropriate level of need.</li>
</ul>
<h2>Direct Customer Involvement through JIRA</h2>
<p>In addition to the benefits to the project as mentioned above, there is one other advantage of using JIRA as the main tool for all phases of the project:  it creates a point of access for direct participation of the customer. But why involve the customer at all?</p>
<p>Quite often project managers and suppliers withhold insight into the detailed project work and progress from the customer and just provide him with regular status reports instead. But our experience has shown that in software maintenance and enhancement projects – especially in large projects – there is less overhead and more benefit for everybody involved if customers aren&#8217;t locked out of the JIRA processes.</p>
<p>For example, if the project lifecycle is well defined and well known and all participants know their parts and responsibilities within JIRA, the process runs very smoothly. And as all communication can be conducted within this one tool (JIRA), there will be no more bypassed communications, activities and results that have to be painstakingly translated into the necessary JIRA processes and information snippets (and no information should be kept out of JIRA anyway!). Thus, one more cause of problems is eliminated!</p>
<h2>Avoid Creeping Requirements through Process Transparency enabled by JIRA</h2>
<p>Furthermore, the customer gets a completely different attachment to &#8216;his&#8217; project if he is able to see all ongoing activities and if he can generate the current project state by himself. We have found that the customer feels much more satisfied and more responsible for his assigned lifecycle parts once he experiences the whole power of process transparency. He also becomes more sensitive to interference factors and usually starts to avoid &#8216;creeping requirements&#8217; once he realizes the consequences of adding just another &#8220;simple&#8221; request (but sometimes the exception proves the rule).</p>
<p>Having only one tool to manage and view all processes has the advantage that the customer can concentrate on acquiring only one set of necessary skills. In addition, we help the customer get all the necessary information by preparing <em>views</em> for his tasks, responsibilities and status reports, i.e. by creating all the JIRA filters and dashboards he needs. These are then shared with everyone who needs them on the customer side (or within the project team) – so one set of preparations can be used by any number of users to get the same results. Another great help are <em>filter subscriptions</em>, especially for tracking and managing time-critical process steps. These tend to make the customer&#8217;s start in JIRA very smooth and comfortable.</p>
<p>Nevertheless, the customer has to be willing to take an active part in JIRA. If the customer isn&#8217;t very technically skilled and has problems with, or an aversion to, using software tools at all, we recommend that you stop trying to make him an active part of the JIRA process. Forcing him to use JIRA will have the exact opposite effect and create feelings of unease and dissatisfaction. In such a case, just claim higher fees for your larger project management efforts.</p>
<h2>To be continued&#8230;</h2>
<p>In the next part of this blog series we will continue with the topic of direct involvement of customers as introduced above and give you some practical examples of when, where and how to introduce and include your customer into JIRA. We will show you some proven real-world workflows that include customer interaction and cover appropriate modes of customer participation.</p>
	<p><em><small>(c) 2012 <a href="http://www.mgm-tp.com">mgm technology partners</a>. This posting "<a href="http://blog.mgm-tp.com/2011/11/jira-beyond-bug-tracking-part1/">Successfully Embracing JIRA in the whole Project Lifecycle - JIRA beyond Bug Tracking, Part 1</a>" is part of the <a href="http://blog.mgm-tp.com">mgm technology blog</a>. The author of the posting is
	<a href="http://blog.mgm-tp.com/?author=21" title="View articles by Alexander Weiss">Alexander Weiss</a>.
	</small></em></p>

]]></content:encoded>
			<wfw:commentRss>http://blog.mgm-tp.com/2011/11/jira-beyond-bug-tracking-part1/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<series:name><![CDATA[JIRA beyond Bug Tracking]]></series:name>
	</item>
		<item>
		<title>Must-Know URL Hash Techniques for AJAX Applications</title>
		<link>http://blog.mgm-tp.com/2011/10/must-know-url-hashtechniques-for-ajax-applications/</link>
		<comments>http://blog.mgm-tp.com/2011/10/must-know-url-hashtechniques-for-ajax-applications/#comments</comments>
		<pubDate>Wed, 26 Oct 2011 07:53:09 +0000</pubDate>
		<dc:creator>Dr. Christian Winkler</dc:creator>
				<category><![CDATA[Projects]]></category>
		<category><![CDATA[Tips]]></category>
		<category><![CDATA[AJAX]]></category>
		<category><![CDATA[HTML]]></category>
		<category><![CDATA[HTML5]]></category>
		<category><![CDATA[Javascript]]></category>
		<category><![CDATA[jQuery]]></category>
		<category><![CDATA[SEO]]></category>
		<category><![CDATA[URL]]></category>

		<guid isPermaLink="false">http://blog.mgm-tp.com/?p=1006</guid>
		<description><![CDATA[By coding the page state into the URL, even single-page web applications can  support deep bookmarks and the browser&#8217;s back button. The most accepted approach is to utilize the location hash, i.e. the local part of the URL. This article explains this technique and what pitfalls you should be aware of, based on my [...]]]></description>
			<content:encoded><![CDATA[<p>By encoding the page state into the URL, even single-page web applications can support deep bookmarks and the browser&#8217;s back button. The most widely accepted approach is to utilize the location hash, i.e. the local part of the URL (everything after the &#8220;#&#8221;). This article explains this technique and what pitfalls you should be aware of, based on my team&#8217;s experience from building an AJAX interface for Solr. You will also learn about the HTML5 History API, which is a second, more modern technique.</p>
<p><span id="more-1006"></span></p>
<p>Modern web applications often load data via AJAX without leaving the original page; some sites even consist of only a single page. The idea behind that was <a href="http://www.zdnet.com/blog/btl/will-the-real-ajax-pioneer-please-stand-up/1201">pioneered</a> by Outlook Web Access and, more prominently for Web users, <a href="http://mail.google.com/">Google Mail</a> in order to create a desktop-like experience. Other benefits include less bandwidth used by the client, faster response times and ultimately a more interactive application feel. </p>
<p>After the first enthusiasm about all these new applications, users found the first <a href="http://isolani.co.uk/blog/javascript/FixingTheBackButtonThatAjaxBroke">serious drawbacks hitting them hard</a>. The beloved back button was not working anymore, bookmarks always led to the first and not the current page and links could not be forwarded.</p>
<p>All these problems have been known for a long time. New techniques now offer very elegant solutions for them, and I will discuss these techniques in this article.</p>
<h2>From Stateless Web Sites to Single-Page Web Applications</h2>
<p><em>In the beginning, the web was stateless.</em> This reflects the stateless nature of the <a href="http://tools.ietf.org/html/rfc1945">HTTP protocol</a>. Most web sites were purely informational and content-driven, such as newspapers. New content was requested by navigating to URLs, and a whole new page with a new URL was shown. </p>
<p>When the first applications like online shops were created, a state or session would have proved immensely useful; otherwise all URLs had to be dynamic. The web was still young and dynamic, so <a href="http://www.w3.org/Protocols/rfc2109/rfc2109">cookies</a> were invented for this and have become immensely popular since then. The request-response cycle was unaffected; the HTTP requests were still stateless. </p>
<p>The situation changed again when more and more Javascript was used. Javascript now influenced the rendering of the local page in ways that were known only to the browser. By leveraging the <a href="http://www.w3.org/TR/XMLHttpRequest/">XMLHttpRequest object</a>, it soon became possible to get information from the server and incorporate it into the document object model (DOM) in the browser (<a href="http://www.adaptivepath.com/ideas/ajax-new-approach-web-applications">AJAX</a>). A client-side action can trigger a request to the server, which brings the displayed URL totally out of sync with the state of the web application. This is still the case in most web applications today!</p>
<h2>Coding the Application State into the URL</h2>
<p>Some clever developers have found out that changes in the URL (also via Javascript) will be interpreted as new pages by the browser. This immediately enables the back button. And if done correctly, the URL once again uniquely designates the current page even if that has been initially created by a single request but modified by a multitude of subsequent AJAX requests.</p>
<p>A URL that is in sync with the web page state does not come for free though. In order to work correctly, some things have to be taken into account:</p>
<ol>
<li>The URL must be changeable without reloading the page.</li>
<li>Each and every action that modifies the current page must trigger a URL change.</li>
<li>When the AJAX-changed page is opened using the URL (from a bookmark), the URL must be interpreted to &#8220;replay&#8221; the changes which are recorded in it.</li>
<li>When the back button is hit, Javascript must detect this.</li>
</ol>
<p>As an example, consider the scenario in the figure below, where a bookmarked page &#8220;/url#2&#8221; is called from a browser. Only the part before the &#8220;#&#8221; (i.e. &#8220;/url&#8221;) is relevant for the server, so this URL is requested. The Javascript code in the browser needs to interpret the local part (i.e. &#8220;2&#8221;) by itself and determine which actions to take. In our case, this triggers a subsequent AJAX request. Note that the URL in the browser then does not have to be changed again.</p>
<div id="attachment_1008" class="wp-caption alignnone" style="width: 377px"><a rel="lightbox-url-hash" href="http://blog.mgm-tp.com/wp-content/uploads/2011/10/Hash-in-URL-Scenario-1.png"><img src="http://blog.mgm-tp.com/wp-content/uploads/2011/10/Hash-in-URL-Scenario-1.png" alt="" title="Hash in URL Scenario 1" width="367" height="294" class="size-full wp-image-1008" /></a><p class="wp-caption-text">Reconstructing the page state from a bookmark.</p></div>
<p>Another scenario is shown in the following figure. Here, a user hits the back button. The <code>hashchange</code> event fires and the Javascript code in the browser must decide how to get the appropriate content, in this case also via an AJAX request like in the previous figure.</p>
<div id="attachment_1009" class="wp-caption alignnone" style="width: 379px"><a rel="lightbox-url-hash" href="http://blog.mgm-tp.com/wp-content/uploads/2011/10/Hash-in-URL-Scenario-2.png"><img src="http://blog.mgm-tp.com/wp-content/uploads/2011/10/Hash-in-URL-Scenario-2.png" alt="" title="Hash in URL Scenario 2" width="369" height="294" class="size-full wp-image-1009" /></a><p class="wp-caption-text">Reconstructing the previous page state when the user presses the back button.</p></div>
<p><a href="http://ajaxpatterns.org/Unique_URLs">Unique URLs (AJAX Patterns)</a> has some great details about this technique.</p>
<h2>Changing the Local Part of the URL</h2>
<p>Although it might sound easy, even the first point in the list above turns out to be complicated. Of course, Javascript can modify the URL in the browser using <code>location.href</code>, but the browser will then load the page from the modified URL. This is obviously not the desired result.</p>
<p>Clever programmers soon found a solution by only modifying the <em>local part</em> of the URL which then does not trigger a page reload. The local part is defined to be everything after the first hash &#8220;#&#8221; in the URL. You can directly modify this local part by using the <code>location.hash</code> property. The article <a href="http://www.elated.com/articles/using-javascripts-location-object-to-work-with-urls/">&#8220;Using Javascript&#8217;s Location Object to Work with URLs&#8221;</a> is a great introduction.</p>
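<p>Writing and reading the local part can be sketched as follows. This is a minimal sketch, not code from a real application: the helper names <code>stateToHash</code> and <code>hashToState</code> are made up for illustration, and the state is assumed to be a flat object of string values. It relies on <code>URLSearchParams</code>, which is available in current browsers and in Node.js.</p>

```javascript
// Sketch: serialize application state into the local part of the URL and back.
// stateToHash and hashToState are hypothetical helper names.

function stateToHash(state) {
  // URLSearchParams takes care of percent-encoding keys and values.
  return "#" + new URLSearchParams(state).toString();
}

function hashToState(hash) {
  // location.hash includes the leading "#"; strip it before parsing.
  const query = hash.startsWith("#") ? hash.slice(1) : hash;
  return Object.fromEntries(new URLSearchParams(query));
}

// In a browser, assigning the hash changes the URL without reloading the page:
//   location.hash = stateToHash({ q: "solr", page: "2" });
//   const state = hashToState(location.hash);
```

<p>Note that all values come back as strings after a round trip, so numbers have to be converted explicitly when the state is replayed.</p>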
<p>This technique has become very popular. A lot of sites are using it; the most famous one is probably Twitter. If you open the Tweets page of somebody, you will see that the URL looks like &#8220;http://twitter.com/#!/myachinghead&#8221;. Effectively, this means that there is only a single web page (apart from about etc.) and everything else is loaded via AJAX.</p>
<p>Links on the page to other &#8220;pages&#8221; are all intercepted via Javascript and trigger changes in the local part of the URL. For example with Twitter, see what happens if you click on other users but also on the tabs on the tweets page. The URL will change in a well-defined way which is suitable for bookmarks and interpreting all actions which have been performed.</p>
<p>By using this technique, bookmarking is easy. The URL structure of Twitter is fortunately quite simple: the local part is interpreted as the username and the corresponding tweets are loaded asynchronously via AJAX. The same is true for subpages like &#8220;Favorites&#8221;, &#8220;Following&#8221; etc.</p>
<h2>Detecting Changes in the Local Part of a URL</h2>
<p>As the technique has become quite popular, an API for changes on the local part of the URL has been designed and is called <a href="https://developer.mozilla.org/en/DOM/window.onhashchange">onHashChange</a>. You can easily catch this event by using something like:</p>
<pre>window.onhashchange = function () { /* interpret location.hash and update the page */ };</pre>
<p>Unfortunately, older browsers have no easy interface for listening to hash changes. An alternative is to use <code>setInterval</code> in order to check periodically for a changed hash.</p>
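<p>Such a polling fallback might look roughly like the sketch below. The helper <code>createHashWatcher</code> and its two callbacks are hypothetical names chosen for this example; the function that reads the hash is passed in so that the logic does not depend on a browser.</p>

```javascript
// Sketch of a polling fallback for browsers without the hashchange event.
// createHashWatcher is a hypothetical helper; readHash and onChange are
// supplied by the caller.

function createHashWatcher(readHash, onChange) {
  let lastHash = readHash();
  // The returned function compares the current hash with the last one seen.
  return function check() {
    const currentHash = readHash();
    if (currentHash !== lastHash) {
      const previousHash = lastHash;
      lastHash = currentHash;
      onChange(currentHash, previousHash);
    }
  };
}

// In a browser without native support:
//   const check = createHashWatcher(
//     function () { return location.hash; },
//     function (hash) { /* load the content for this hash */ }
//   );
//   setInterval(check, 100); // the 100 ms interval is an arbitrary choice
```

<p>A shorter interval makes the back button feel more responsive but costs more CPU time, which is one reason to prefer the native event where it exists.</p>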
<p>Frameworks like <a href="http://jquery.com/">jQuery</a> have plugins for hash handling, see e.g. <a href="http://benalman.com/projects/jquery-bbq-plugin/">jQuery BBQ</a>. These plugins degrade gracefully in older browsers and thus shield your code from the differences between browser APIs.</p>
<p>The following figure shows the workflow when changing hashes: clicking an active element must only change the hash (i.e. by serializing the current application state there). Immediately after that, a <code>hashchange</code> event fires and is handled by the Javascript code in the browser. The local part of the URL must then be interpreted and in this case generates an AJAX request to the server. Note that if the element is clicked again and the serialized data is unchanged, no event will be triggered (as the hash has not changed) and the application has been accidentally optimized.</p>
<div id="attachment_1010" class="wp-caption alignnone" style="width: 379px"><a rel="lightbox-url-hash" href="http://blog.mgm-tp.com/wp-content/uploads/2011/10/Hash-in-URL-Scenario-3.png"><img src="http://blog.mgm-tp.com/wp-content/uploads/2011/10/Hash-in-URL-Scenario-3.png" alt="" title="Hash in URL Scenario 3" width="369" height="296" class="size-full wp-image-1010" /></a><p class="wp-caption-text">Workflow when changing URL hashes.</p></div>
<p>Twitter is also an excellent example of what problems can arise using this solution:</p>
<ul>
<li>Search for the Twitter page of somebody using Google (or Bing, it doesn&#8217;t matter). Click the result and see how the URL changes (the &#8220;#!/&#8221; is inserted). The reason is that search engines will never index pages which only differ in the local part of the URL. For search engines to work properly, some tricks like <a href="http://en.wikipedia.org/wiki/Sitemaps">sitemap.xml</a>, <a href="http://en.wikipedia.org/wiki/Rewrite_engine">rewrite rules</a> etc. have to be used. See our <a href="http://blog.mgm-tp.com/series/the-art-of-search-engine-optimization/">blog series about SEO</a> to get more information. Also, Google provides an interesting <a href="http://code.google.com/intl/de/web/ajaxcrawling/docs/getting-started.html">document &#8220;Making AJAX Applications Crawlable&#8221;</a>.</li>
<li>Open the Tweets page of a user, click &#8220;about&#8221; (the company link) and then use the back button. The result is not what you expect! The back button works fine when jumping between tweets of users, though.</li>
</ul>
<p>Using a real URL, and not just one that differs in the local part, offers many exciting possibilities.</p>
<h2>Example 1: AJAX-ifying a conventional page-based website </h2>
<p>For example, if you convert a normal page-based (i.e. request-response based) website into an AJAX-based one, you can keep your current URL structure but generate the URLs on the client side by using <code>pushState</code>. The associated AJAX call will create a request to the server. You can (and should!) use the same URL as for the whole page and detect on the server side that it is an AJAX request, which should then be rendered as e.g. JSON (which can be pushed as data in <code>pushState</code>), whereas normal requests will be rendered as complete HTML. </p>
<p>Following this approach, your server-side logic needs no modification; only the rendering must be changed. Bookmarking still works in the usual way, as the bookmarked page will be requested as a full HTML page even if the application has become a single-page application under the covers. This technique has been used successfully by e.g. <a href="https://www.github.com">GitHub</a> for quite some time. It is so seamless that you might not even have noticed it.</p>
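<p>One common way to tell the two kinds of requests apart on the server is the <code>X-Requested-With: XMLHttpRequest</code> header, which jQuery adds to same-origin AJAX calls by default. The following sketch assumes that convention; the function names, the shape of the <code>headers</code> object (lower-cased header names) and the <code>renderPage</code> callback are illustrative assumptions, not part of any particular framework.</p>

```javascript
// Sketch: choose the representation for a request based on a header.
// Assumes AJAX calls carry "X-Requested-With: XMLHttpRequest" and that
// header names have been lower-cased. All names here are hypothetical.

function chooseRepresentation(headers) {
  const requestedWith = headers["x-requested-with"] || "";
  return requestedWith.toLowerCase() === "xmlhttprequest" ? "json" : "html";
}

function respond(headers, model, renderPage) {
  if (chooseRepresentation(headers) === "json") {
    // AJAX request: send only the data; the client updates the page.
    return { contentType: "application/json", body: JSON.stringify(model) };
  }
  // Normal request (e.g. from a bookmark): send the complete page.
  return { contentType: "text/html", body: renderPage(model) };
}
```

<p>The server-side logic that builds <code>model</code> is shared by both branches, which is the point of the approach: only the last rendering step differs.</p>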
<h2>Example 2: Creating an HTML search interface</h2>
<p>In a real-world project, we have created a large-scale search application using <a href="http://lucene.apache.org/solr/">Apache Solr</a> as a backend. The API is exposed via Tomcat, which intelligently distributes search requests among the cluster, and we created a <a href="http://en.wikipedia.org/wiki/Representational_state_transfer">REST</a> interface with <a href="http://en.wikipedia.org/wiki/JSON">JSON</a> as data transport.</p>
<div id="attachment_1011" class="wp-caption alignnone" style="width: 490px"><a rel="lightbox-url-hash" href="http://blog.mgm-tp.com/wp-content/uploads/2011/10/Screenshot.png"><img src="http://blog.mgm-tp.com/wp-content/uploads/2011/10/Screenshot-480x275.png" alt="" title="Screenshot" width="480" height="275" class="size-large wp-image-1011" /></a><p class="wp-caption-text">Query Parameters coded into the local part of the URL (example taken from a custom Solr interface).</p></div>
<p>Putting an AJAX-enabled Web application on top of that proved to be quite easy. JSON data can be used directly in Javascript. Each search creates an AJAX request to the server and updates the results table. Trouble hit us as soon as the first users were testing the application. They were used to working with the back button which didn&#8217;t work as expected. A solution had to be found.</p>
<p>As legacy browsers still had to be supported, we chose to implement a URL hash. As <a href="http://jquery.com">jQuery</a> was already in use in the project, we went with <a href="http://benalman.com/projects/jquery-bbq-plugin/">jQuery BBQ</a>. The application was changed to &#8220;serialize&#8221; the state into the URL. URL changes then raise the <code>hashchange</code> event, which in turn triggers the search. We got some nice add-ons for free:</p>
<ul>
<li>The first search starts automatically.</li>
<li>Modifying the URL directly also creates a search.</li>
<li>Hitting search again without changing the parameters does not change the hash and no event is raised, i.e. no search is performed.</li>
<li>Reload works out of the box.</li>
<li>Users can (again) use bookmarks e.g. for wrong results and check back later.</li>
<li>Users can forward URLs to their colleagues for cross-checks.</li>
</ul>
<p>After this change had been successfully rolled out, the customers were much happier. The application got a more desktop-like feeling without sacrificing the convenience of the Web&#8217;s ubiquitous back button and bookmarks.</p>
<h2>Solutions for more complex Web Applications</h2>
<p>The discussed solution works fine for small websites or dialog-driven applications. As soon as the application gets more complex, new approaches will prove more efficient. </p>
<p>There are quite a few client-side frameworks which all have their individual strengths. The more popular ones are <a href="http://documentcloud.github.com/backbone/">Backbone.js</a> and <a href="http://knockoutjs.com/">Knockout.js</a>. For an interesting comparison, see the discussion <a href="http://stackoverflow.com/questions/5112899/knockout-js-vs-backbone-js-vs">&#8220;Knockout.js vs Backbone.js (vs ?)&#8221;</a>. Depending on the requirements, other approaches like <a href="http://www.sproutcore.com/">SproutCore</a>, <a href="http://code.google.com/webtoolkit/">GWT</a> or <a href="http://vaadin.com/home">Vaadin</a> should also be considered.</p>
<h2>Using the HTML5 History API</h2>
<p>Because this functionality is already so important and will gain even more relevance in the future, HTML5 provides a more elegant solution than messing with URLs and <code>onhashchange</code> events. This solution is the <a href="http://www.w3.org/TR/html5/history.html">History API</a>, a dedicated interface designed solely for forward/backward navigation in the browser. For this post, two parts of it are of particular interest:</p>
<ol>
<li>If a URL has to change, the method <code>history.pushState(data, title, url)</code> can be called. Besides the new URL (<code>url</code>), which replaces the current URL <em>without reloading</em>, the method takes a data object and a title. The title is more or less irrelevant, whereas the data parameter can be used to encapsulate the state of the page. If the back button is pressed, this state can be retrieved and used to correctly render the page. Using this state therefore allows navigation to be handled at a finer granularity than on a pure URL basis (it can also be viewed as a local session storage which is context-sensitive to each navigational step in the page).
<p>Interesting reads on this topic are: <a href="http://badassjs.com/post/840846392/location-hash-is-dead-long-live-html5-pushstate">&#8220;location.hash is dead. Long live HTML5 pushState!&#8221;</a>, <a href="http://blog.new-bamboo.co.uk/2011/2/2/degradable-javascript-applications-using-html5-pushstate">Degradable JavaScript Applications Using HTML5 pushState</a>, and the jQuery Plugins <a href="http://benalman.com/projects/jquery-bbq-plugin/">BBQ</a> and <a href="http://plugins.jquery.com/project/history-js">History.js</a> with their fallback capabilities to the old onhashchange functionality.</p>
</li>
<li>The History API also defines an event handler on the <code>window</code> object which is called <code>onpopstate</code>. You should define a function and assign it to <code>window.onpopstate</code>; it is then responsible for reconstructing the original page. You have access to the URL of the desired page and to the state object (passed as <code>event.state</code>). Both can be used for rendering the page, e.g. by performing AJAX requests or by using the pushed variables.</li>
</ol>
<div id="attachment_1012" class="wp-caption alignnone" style="width: 380px"><a rel="lightbox-url-hash" href="http://blog.mgm-tp.com/wp-content/uploads/2011/10/Scenario-with-HTML5-History-API.png"><img src="http://blog.mgm-tp.com/wp-content/uploads/2011/10/Scenario-with-HTML5-History-API.png" alt="" title="Scenario with HTML5 History API" width="370" height="442" class="size-full wp-image-1012" /></a><p class="wp-caption-text">Workflow when using the HTML5 History API.</p></div>
<p>The figure above shows the workflow when using the HTML5 History API. The upper part describes forward navigation and includes pushing the new state and the AJAX request to the server. The lower part shows that the URL changes and a <code>popstate</code> event fires when the back button is pressed. The application can either use local state storage or (in our case) use an AJAX call to update the page.</p>
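<p>The interplay of <code>pushState</code> and <code>onpopstate</code> can be sketched as follows. The names <code>createNavigator</code> and <code>render</code> are hypothetical; the <code>window</code> object is passed in as a parameter only so that the logic can be tried out without a browser.</p>

```javascript
// Sketch of forward and backward navigation with the History API.
// createNavigator and render are hypothetical names; win is the browser's
// window object.

function createNavigator(win, render) {
  // Back/forward button: the browser has already restored the URL and hands
  // over the state object that was pushed for that history entry. The state
  // is null for entries that were not created via pushState.
  win.onpopstate = function (event) {
    render(event.state, win.location.pathname);
  };

  // Forward navigation: record the new state and URL, then update the page.
  return function navigate(state, url) {
    win.history.pushState(state, "", url);
    render(state, url);
  };
}

// Browser usage:
//   const navigate = createNavigator(window, function (state, url) {
//     /* update the page from state, or request url via AJAX */
//   });
//   navigate({ page: 2 }, "/articles/2");
```

<p>Whether <code>render</code> uses the stored state directly or issues an AJAX request for the URL is an application decision; the sketch leaves both options open.</p>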
	<p><em><small>(c) 2012 <a href="http://www.mgm-tp.com">mgm technology partners</a>. This posting "<a href="http://blog.mgm-tp.com/2011/10/must-know-url-hashtechniques-for-ajax-applications/">Must-Know URL Hash Techniques for AJAX Applications</a>" is part of the <a href="http://blog.mgm-tp.com">mgm technology blog</a>. The author of the posting is
	<a href="http://blog.mgm-tp.com/?author=20" title="View articles by Dr. Christian Winkler">Dr. Christian Winkler</a>.
	</small></em></p>

	<p><em><small>
	We are hiring! mgm technology partners is looking for good software engineers for all our offices. Check out <a rel="external" href="http://www.mgm-tp.com/karriere">www.mgm-tp.com/karriere</a>.
	</small></em></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.mgm-tp.com/2011/10/must-know-url-hashtechniques-for-ajax-applications/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Effective and Efficient Techniques for a Rule-Based Test-Data Generator - Test Data Generation for Form-Centric Applications, Part 3</title>
		<link>http://blog.mgm-tp.com/2011/09/test-data-generation3/</link>
		<comments>http://blog.mgm-tp.com/2011/09/test-data-generation3/#comments</comments>
		<pubDate>Mon, 12 Sep 2011 14:49:30 +0000</pubDate>
		<dc:creator>Dr. Hans-Martin Adorf</dc:creator>
				<category><![CDATA[Projects]]></category>
		<category><![CDATA[Research]]></category>
		<category><![CDATA[QA]]></category>
		<category><![CDATA[Testing]]></category>
		<category><![CDATA[Validation]]></category>
		<category><![CDATA[Web Forms]]></category>

		<guid isPermaLink="false">http://blog.mgm-tp.com/?p=982</guid>
		<description><![CDATA[This part explains some of the sophisticated software technology that is working behind the scenes in our rule-based test-data generator for form-centric applications. You will see that a simple enumeration of all possible ways to fill in a form is likely doomed to run longer than the age of the universe. Therefore more efficient techniques [...]]]></description>
			<content:encoded><![CDATA[<p>This part explains some of the sophisticated software technology that is working behind the scenes in our rule-based test-data generator for form-centric applications. You will see that a simple enumeration of all possible ways to fill in a form is likely doomed to run longer than the age of the universe. Therefore more efficient techniques are needed to make the seemingly impossible possible.</p>
<p><span id="more-982"></span></p>
<p>In the <a href="http://blog.mgm-tp.com/2010/12/test-data-generation-part2/">previous part</a> of this blog series, we have seen that it is necessary to generate test data containing <em>extreme or special values</em> (ESVs) which are capable of exerting some pressure onto the software under test. We have also seen that it is desirable to squeeze into a single test-data record as many ESVs as possible, in order to reduce the number of records to be fed into the test system. Running functional tests may take considerable time. Therefore we want to achieve a high compression, i.e. a high density of ESVs per data record.</p>
<p>As we mentioned in <a href="http://blog.mgm-tp.com/2010/10/test-data-generation-part1/">part 1</a> of this blog series, the main obstacle in the way of an automated generation of test data for form-centric applications is the presence of cross-field constraints. Before we get to these, let us look at constraints that can be formulated for individual fields alone.</p>
<h2>Constraints on single fields &mdash; the easy problem</h2>
<p>The kind of constraints that one may formulate for a field depends on the type of the latter. Consider a Boolean field. The field may assume only one of two values, <em>true</em> or <em>false</em>. A simple constraint on such a field is that it may not be empty. Additionally there may be attributes governing the external representation of the field values, such as &#8220;wahr&#8221; and &#8220;falsch&#8221;, or &#8220;1&#8221; and &#8220;0&#8221;.</p>
<p>Next, let&#8217;s look at a field representing an <em>amount of Euro with Cents</em>. Again a simple constraint on such a field is that it may not be empty. Other constraints may be that the amount may not be negative, or that the amount may not be zero. Invariably, there will be a constraint on the length of the field, which entails a lower and an upper limit for the amount the field can hold. Further attributes may involve the external representation such as the presence/absence of a positive sign in front of the amount, or the appearance of the decimal separator (a comma in Germany, a dot in the UK or the USA). Working with these constraints can be separated into a first phase, where a valid value is generated, followed by a second phase where an appropriate external representation is constructed. Our generator typically produces extreme and special values (ESVs), as explained in the <a href="http://blog.mgm-tp.com/2010/12/test-data-generation-part2/">previous part 2</a> of this blog series.</p>
<p>For a <em>string field</em>, we always have a maximum length constraint, plus a global constraint describing the character set that may be used. All we need is a simple string generator that looks at the string fields one after the other, and for each field generates data, while obeying the constraints that may exist (see e.g. <a href="#ref1">[1]</a>).</p>
<p>In our form-centric applications, we usually encounter a type of constraint on string fields that is a bit harder to cope with than the length and character set constraints mentioned above: it is the regular expression constraint. For instance, in order to specify a valid German zip code we may use a simple regular expression. Likewise, a regular expression may be devised constraining valid phone numbers, e.g. &#8220;<code>(\d+[ -])?\d+</code>&#8221;. A regular expression for valid e-mail addresses is considerably harder to come by. </p>
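<p>As a small illustration, such constraints can be written down as validators. The phone pattern below is the one quoted above; the zip code pattern (exactly five digits) is our assumption for this example and not taken from a real rule base. The names <code>PHONE</code>, <code>GERMAN_ZIP</code> and <code>satisfies</code> are made up.</p>

```javascript
// Sketch: regular-expression constraints used as validators.
// The ^ and $ anchors force the whole field value to match.

const PHONE = /^(\d+[ -])?\d+$/;   // pattern quoted in the text
const GERMAN_ZIP = /^\d{5}$/;      // assumption: exactly five digits

function satisfies(pattern, value) {
  return pattern.test(value);
}

// Example calls:
//   satisfies(PHONE, "089 1234567")
//   satisfies(GERMAN_ZIP, "80333")
```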
<p>Matching a given string using a regular expression is a common task in software engineering. The process of generating a valid (or invalid) string from a regular expression is the inverse of matching. While there are numerous packages that perform matching, it is a lot more difficult to locate a good string generator that produces a matching (or non-matching) string from a regular expression, while taking into account the side constraints (see e.g. <a href="#ref7">[7]</a>). Fortunately, the theory of finite automata (FA) comes to the rescue, which allows the conversion of any regular expression into an FA <a href="#ref2">[2]</a>. From there it is only a small step to an operational string generator that allows the production of matching or non-matching strings with relatively little effort. Not surprisingly, we have such a string generator built into our test-data generator.</p>
<p>Figure 1 below shows the transition graph of a finite-state automaton (FA) for the regular expression &#8220;ab|cd&#8221;. By traversing paths from the start node (marked with an &#8216;S&#8217;) to one of the end nodes (marked with an &#8216;E&#8217;) one can generate the two strings &#8220;ab&#8221; or &#8220;cd&#8221; that match the given regular expression.</p>
<div id="attachment_983" class="wp-caption alignnone" style="width: 490px"><a rel="lightbox-test-data-generation-part3" href="http://blog.mgm-tp.com/wp-content/uploads/2011/09/Figure-1-NFA-for-Regular-Expression.jpg"><img src="http://blog.mgm-tp.com/wp-content/uploads/2011/09/Figure-1-NFA-for-Regular-Expression.jpg" alt="" title="Figure 1 - NFA for Regular Expression" width="480" height="352" class="size-large wp-image-983" /></a><p class="wp-caption-text">Transition graph of the corresponding finite-state automaton (FA) for the regular expression ab|cd.</p></div>
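<p>The traversal described above can be sketched in a few lines. The transition table is a hand-written automaton for &#8220;ab|cd&#8221;; its state names are illustrative and need not match the figure. The sketch assumes an acyclic automaton; for expressions with repetition, the path length would have to be bounded. It is not the generator built into our tool.</p>

```javascript
// Sketch: generate matching strings by walking a finite automaton.
// State names and the table layout are assumptions for this example.

const automaton = {
  start: "S",
  accepting: new Set(["E1", "E2"]),
  transitions: {
    S: [["a", "A"], ["c", "C"]],
    A: [["b", "E1"]],
    C: [["d", "E2"]],
  },
};

function generateStrings(fa) {
  const results = [];
  // Depth-first traversal; prefix holds the symbols read so far.
  function walk(state, prefix) {
    if (fa.accepting.has(state)) {
      results.push(prefix);
    }
    for (const [symbol, next] of fa.transitions[state] || []) {
      walk(next, prefix + symbol);
    }
  }
  walk(fa.start, "");
  return results;
}

// generateStrings(automaton) yields the two strings "ab" and "cd".
```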
<p>The nice thing is that any valid data item generated for a field not participating in any cross-field constraints can be combined with any valid data item generated for another such field.</p>
<p>Let us now turn to the harder problem, namely to the fields whose values are restricted by cross-field constraints.</p>
<h2>Cross-field constraints &mdash; the hard problem</h2>
<p>Let us begin by looking at a relatively simple problem, namely, how to fill in two fields in an address block of a form. Assume that the fields are the given name and the surname of a person. The single-field constraints that will exist for each field individually may be dealt with in the way described above. Let us assume that there is a single cross-field constraint linking the given name to the surname, and assume that the constraint reads: &#8220;if the given name is present, the surname has also to be present&#8221;. Formally we might encode this in the assertion</p>
<pre>absent(givenName) or present(surname)
</pre>
<p>Both fields may be empty, but the given name should not occur without the surname.</p>
<p>The simplest approach to solving this miniature <a href="http://en.wikipedia.org/wiki/Constraint_satisfaction_problem">constraint satisfaction problem</a> (CSP) is &#8220;trial and error&#8221;, or &#8220;generate and test&#8221;. The task of the generator consists in completely filling in the first two columns of the following table:</p>
<table>
<tr>
<th>givenName</th>
<th>surname</th>
<th>valid</th>
</tr>
<tr>
<td>absent</td>
<td>absent</td>
<td>yes</td>
</tr>
<tr>
<td>present</td>
<td>absent</td>
<td>no</td>
</tr>
<tr>
<td>absent</td>
<td>present</td>
<td>yes</td>
</tr>
<tr>
<td>present</td>
<td>present</td>
<td>yes</td>
</tr>
</table>
<p>Then we use a validator to check whether the constraint is fulfilled. </p>
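<p>For the two-field example, &#8220;generate and test&#8221; can be sketched as follows. Each field is reduced to a boolean (present or absent); the function names are hypothetical and the code only illustrates the strategy.</p>

```javascript
// Sketch of "generate and test" for the given-name/surname example.
// true = present, false = absent.

// The cross-field constraint: absent(givenName) or present(surname).
function isValid(record) {
  return !record.givenName || record.surname;
}

// Enumerate every combination of present/absent for the given field names.
function enumerateRecords(fieldNames) {
  let records = [{}];
  for (const name of fieldNames) {
    const extended = [];
    for (const record of records) {
      extended.push({ ...record, [name]: false });
      extended.push({ ...record, [name]: true });
    }
    records = extended;
  }
  return records;
}

// Generate all rows first, then let the validator judge each one.
const table = enumerateRecords(["givenName", "surname"]).map(function (record) {
  return { ...record, valid: isValid(record) };
});
```

<p>The list of rows doubles with every additional field, which is exactly the weakness discussed next.</p>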
<p>While this simple strategy is effective and efficient enough for a small number of fields, we quickly run into problems as soon as that number grows. For 10 fields, filling in a table like the one above might take as little as 1 microsecond, but for 100 fields it might easily take 1000 times the age of the universe. We are witnessing the curse of the combinatorial explosion.</p>
<p>Note that validating a single row of the truth table takes only polynomial time; it is the number of rows, which doubles with every additional field, that makes the enumeration infeasible.</p>
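<p>The orders of magnitude can be checked with a back-of-the-envelope calculation. The sketch assumes that every field is binary (present or absent), that the time per row is constant, and that the age of the universe is about 13.8 billion years; all names are made up for the example.</p>

```javascript
// Rough estimate of the enumeration time for n binary fields.

const SECONDS_PER_YEAR = 365.25 * 24 * 3600;
const AGE_OF_UNIVERSE_SECONDS = 13.8e9 * SECONDS_PER_YEAR; // assumption

// 2^10 rows take 1 microsecond, so one row takes 1e-6 / 2^10 seconds.
const secondsPerRow = 1e-6 / Math.pow(2, 10);

function enumerationSeconds(fieldCount) {
  return Math.pow(2, fieldCount) * secondsPerRow;
}

// Enumeration time for 100 fields, expressed in ages of the universe.
const agesOfUniverse = enumerationSeconds(100) / AGE_OF_UNIVERSE_SECONDS;
```

<p>Under these assumptions the result is a few thousand ages of the universe, which agrees with the rough figure quoted above.</p>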
<h2>What can be done?</h2>
<p>A very successful strategy to cope with the combinatorial explosion is to interleave the data generation with the checking/validation steps. This mixing of steps allows an &#8220;early pruning&#8221; of the search tree.</p>
<p>Early pruning is a hallmark of practically all CSP-solvers, and, perhaps not surprisingly, CSP-solvers have been used for two decades in the area of automated test data generation (see <a href="#ref3">[3]</a>, <a href="#ref5">[5]</a>, <a href="#ref6">[6]</a>, <a href="#ref8">[8]</a>). Also, the increasing importance of CSPs in mainstream computing is witnessed by the ongoing attempt to define a Java API for CSP-solvers <a href="#ref4">[4]</a>. </p>
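<p>The idea of early pruning can be illustrated with a small backtracking search. This is a sketch only: the function <code>solve</code> and the constraint format (an object with <code>fields</code> and <code>check</code>) are assumptions made for this example and do not show the interface of any particular CSP-solver.</p>

```javascript
// Sketch of early pruning: assign fields one at a time and check every
// constraint as soon as all of its fields have a value.

function solve(fieldNames, domains, constraints) {
  const solutions = [];
  let visited = 0; // number of partial assignments tried

  function consistent(assignment) {
    return constraints.every(function (constraint) {
      const ready = constraint.fields.every(function (field) {
        return field in assignment;
      });
      // Constraints with unassigned fields cannot be judged yet.
      return !ready || constraint.check(assignment);
    });
  }

  function extend(index, assignment) {
    if (index === fieldNames.length) {
      solutions.push(assignment);
      return;
    }
    const name = fieldNames[index];
    for (const value of domains[name]) {
      const candidate = { ...assignment, [name]: value };
      visited += 1;
      // Prune: abandon the whole subtree below an inconsistent assignment.
      if (consistent(candidate)) {
        extend(index + 1, candidate);
      }
    }
  }

  extend(0, {});
  return { solutions, visited };
}
```

<p>The earlier in the field order a constraint becomes checkable, the larger the subtree that a violation cuts off, so the order in which fields are assigned matters a great deal in practice.</p>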
<p>While early pruning is essential, experience shows that it is not sufficient to solve real-world CSPs that occur in the area of automated test data generation. As I mentioned in <a href="http://blog.mgm-tp.com/2010/10/test-data-generation-part1/">part 1</a> of this blog series, it is necessary to partition the CSP into independent components before submitting it to a CSP-solver.</p>
<p>The following figure illustrates this concept. It shows the 32 largest of more than 200 independent connected components. Each little square depicts a field, and a connection between two fields indicates that the two fields occur together in a constraint. This primal constraint graph has been derived from the validation rule base of a large form-centric application developed at mgm tp.</p>
<div id="attachment_984" class="wp-caption alignnone" style="width: 490px"><a rel="lightbox-test-data-generation-part3" href="http://blog.mgm-tp.com/wp-content/uploads/2011/09/Figure-2-Independent-Connected-Components.png"><img src="http://blog.mgm-tp.com/wp-content/uploads/2011/09/Figure-2-Independent-Connected-Components-480x460.png" alt="" title="Figure 2 - Independent Connected Components" width="480" height="460" class="size-large wp-image-984" /></a><p class="wp-caption-text">Partitioning the CSP into independent components: the 32 largest of more than 200 independent connected components.</p></div>
<p>One might think that such a partitioning algorithm would be an integral part of a CSP-solver, but in our experience this is not always the case.</p>
<p>Here is, in brief, how a decomposition of a CSP into independent components can be accomplished:</p>
<ol>
<li>Initially create a component for each field, and put the field into a set belonging to that component.</li>
<li>Go through the constraints one by one. Identify all fields of the constraint. Identify all components to which these fields belong and merge their sets.</li>
<li>Label the resulting sets.</li>
<li>Go through the constraints one by one again. Identify a field in the constraint. Find its component, and label the constraint with the label of that component.</li>
</ol>
<p>Clearly, each field that does not participate in any cross-field constraint ends up in its own independent component, where test data generation can take place undisturbed by the rest of the problem.</p>
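<p>The four steps above can be sketched with a simple union-find structure. The function <code>partition</code> and the constraint format (an object with <code>name</code> and a non-empty <code>fields</code> list) are hypothetical; this is an illustration of the steps, not the code of our generator.</p>

```javascript
// Sketch of the decomposition into independent components.

function partition(fieldNames, constraints) {
  // Step 1: every field starts in its own component.
  const parent = new Map(fieldNames.map(function (field) {
    return [field, field];
  }));

  // Follow parent links up to the representative of a component.
  function find(field) {
    let root = field;
    while (parent.get(root) !== root) {
      root = parent.get(root);
    }
    return root;
  }

  // Step 2: merge the components of all fields that share a constraint.
  for (const constraint of constraints) {
    const first = find(constraint.fields[0]);
    for (const field of constraint.fields.slice(1)) {
      parent.set(find(field), first);
    }
  }

  // Step 3: label the resulting sets by their representative field.
  const components = new Map();
  for (const field of fieldNames) {
    const label = find(field);
    if (!components.has(label)) {
      components.set(label, { fields: [], constraints: [] });
    }
    components.get(label).fields.push(field);
  }

  // Step 4: attach each constraint to the component of one of its fields.
  for (const constraint of constraints) {
    components.get(find(constraint.fields[0])).constraints.push(constraint.name);
  }
  return Array.from(components.values());
}
```

<p>A field that appears in no constraint is never merged and therefore comes out as a component of its own, with an empty constraint list.</p>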
<h2>Implementation and data representation</h2>
<p>The bulk of our test data generator has been implemented in Java. We greatly benefit from representing the CSP&#8217;s variables and constraints in <a href="http://en.wikipedia.org/wiki/Lisp">Lisp</a> data structures. The representation of a constraint in a Lisp-data structure with prefix notation looks like this:</p>
<pre class="brush: plain;">
(constraint (2 &quot;Regel_Zahlen_Vergleich&quot; &quot;4&quot; &quot;Test_Vordruck&quot;
             &quot;Zahlen_Kontext&quot; &quot;Nicht_Negative_Zahl&quot;
             1 1 1 1 1 1)
 (not (and (= 2
              (+ num-def_positive_zahl_$v1_$u1_$z1
                 num-def_nicht_negative_zahl_$v1_$u1_$z1))
           (&gt;= $t_positive_zahl_$v1_$u1_$z1
               $t_nicht_negative_zahl_$v1_$u1_$z1))))
</pre>
<p>As a matter of fact, we have never regretted the decision of choosing a Lisp representation, which was taken early on in the lifetime of the project. There are certain transformations of the original CSP that are easy to carry out, when the variables and constraints are encoded as Lisp data. Also intermediate results can easily be written back to a file for visual inspection.</p>
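<p>To give an impression of why prefix notation is convenient to process, here is a sketch that writes a constraint of the same shape as nested arrays and evaluates it recursively. The operator set, the shortened variable names and the function <code>evaluate</code> are illustrative assumptions and not the actual rule language of our generator.</p>

```javascript
// Sketch: a prefix-notation constraint as nested arrays plus an evaluator.

const operators = {
  "not": function (args) { return !args[0]; },
  "and": function (args) { return args.every(Boolean); },
  "or": function (args) { return args.some(Boolean); },
  "=": function (args) { return args[0] === args[1]; },
  ">=": function (args) { return args[0] >= args[1]; },
  "+": function (args) {
    return args.reduce(function (sum, x) { return sum + x; }, 0);
  },
};

function evaluate(expression, bindings) {
  if (Array.isArray(expression)) {
    // The first element is the operator, the rest are its operands.
    const [operator, ...operands] = expression;
    const values = operands.map(function (operand) {
      return evaluate(operand, bindings);
    });
    return operators[operator](values);
  }
  // Strings are variable names; anything else is a literal.
  return typeof expression === "string" ? bindings[expression] : expression;
}

// Same shape as the constraint shown above, with hypothetical short names.
const constraint =
  ["not", ["and", ["=", 2, ["+", "defA", "defB"]],
                  [">=", "a", "b"]]];

// Example call: evaluate(constraint, { defA: 1, defB: 1, a: 1, b: 3 })
```

<p>Because the expression is plain data, the same recursive pattern can be reused for transformations such as collecting the variables of a constraint or rewriting sub-expressions.</p>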
<p>A lot of configuration information is necessary in order to control our Rule-Based Test Data Generator, such as the number of forms that should be filled in, the maximum number of rows in the fields, etc., but we spare you the details.</p>
<p>Our test-data generator produces data records for each component individually. Due to the independence of components, the records of the components can be freely combined.</p>
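<p>One simple way to combine the records of independent components is sketched below. The helper <code>combineRecords</code> is hypothetical: it produces as many full records as the largest component has and cycles through the shorter lists. Whether our generator combines records in exactly this way is not stated here; the sketch only shows that independence makes any pairing valid.</p>

```javascript
// Sketch: merge per-component records into full test-data records.
// componentRecords is a list with one array of records per component.

function combineRecords(componentRecords) {
  const count = Math.max(0, ...componentRecords.map(function (records) {
    return records.length;
  }));
  return Array.from({ length: count }, function (_, i) {
    // Take record i (cycling through shorter lists) from every component.
    return componentRecords.reduce(function (record, records) {
      return { ...record, ...records[i % records.length] };
    }, {});
  });
}

// Example call:
//   combineRecords([[{ amount: "0.01" }, { amount: "-0.01" }], [{ surname: "X" }]])
```

<p>Cycling instead of forming all combinations keeps the number of records small, which fits the goal of packing many ESVs into few records.</p>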
<p>Finally, we obtain a table with the generated test-data records. The figure below shows a portion of this table with several solutions of a test-data generation process. Each row represents a solution; its value consists of a mixture of extreme or special values (such as -99999999999.99 or 0.01), and of values that are the result of satisfying the validation constraints.</p>
<div id="attachment_985" class="wp-caption alignnone" style="width: 490px"><a rel="lightbox-test-data-generation-part3" href="http://blog.mgm-tp.com/wp-content/uploads/2011/09/Figure-3-Table-with-Solutions.jpg"><img src="http://blog.mgm-tp.com/wp-content/uploads/2011/09/Figure-3-Table-with-Solutions-480x262.jpg" alt="" title="Figure 3 - Table with Solutions" width="480" height="262" class="size-large wp-image-985" /></a><p class="wp-caption-text">A portion of the table with the generated test-data records.</p></div>
<p>Most of the desired extreme and special values (ESVs) are present in at least one of the records (high test coverage). Since most of the time many ESVs are present in a single record, we attain a high compression rate (i.e. many ESVs in few records).</p>
<h2>Conclusion</h2>
<p>We are at the end of this blog series on the Rule-Based Test Data Generator (R-TDG). You must be left with the impression that the R-TDG is a complex piece of machinery, and, undoubtedly, it is. Be assured, however, that it has been in operational use at mgm tp for about two years.</p>
<p>The constraints which the R-TDG needs are served to us on a silver platter by the rule-based system whose main purpose consists in generating the validators for form-centric applications. This approach is radically different from earlier attempts in the area of automated test-data generation, where the constraints are extracted from source code. To the best of our knowledge the R-TDG is the only working automated test-data generator world-wide that can properly take into account cross-field constraints.</p>
<h2>References</h2>
<ul class="references">
<li id="ref1">[1] Bergmann, V. (2008). <a href="http://databene.org/databene-benerator">Databene Benerator</a>.</li>
<li id="ref2">[2] Brüggemann-Klein, A. (1992). &#8220;Regular expressions into finite automata.&#8221; Lecture Notes in Computer Science 583: 87-98.</li>
<li id="ref3">[3] DeMillo, R. A. and A. J. Offutt (1991). &#8220;Constraint-Based Automatic Test Data Generation.&#8221; IEEE Transactions on Software Engineering 17(9): 900-910.</li>
<li id="ref4">[4] Feldman, J. (2011). <a href="http://jcp.org/en/jsr/detail?id=331">JSR 331: Constraint Programming API</a>, Java Community Process.</li>
<li id="ref5">[5] Gotlieb, A., B. Botella, et al. (1998). &#8220;Automatic test data generation using constraint solving techniques.&#8221; ACM SIGSOFT Software Engineering Notes 23 (2): 53&#8211;62.</li>
<li id="ref6">[6] Ince, D. C. (1987). &#8220;The Automatic Generation of Test Data.&#8221; Computer Journal 30 (1): 63-69.</li>
<li id="ref7">[7] msdn (2010). <a href="http://msdn.microsoft.com/en-us/library/aa833197.aspx">&#8220;The Regular Expression Generator&#8221;</a>, Visual Studio Team System 2008 Database Developer Center, Microsoft Corporation.</li>
<li id="ref8">[8] Zhan, Y. (2002). &#8220;Constraint Solving in Test-Data Generation.&#8221; Lecture Notes in Computer Science 2470. P. V. Hentenryck. Berlin-Heidelberg: 770-771.</li>
</ul>
	<p><em><small>(c) 2012 <a href="http://www.mgm-tp.com">mgm technology partners</a>. This posting "<a href="http://blog.mgm-tp.com/2011/09/test-data-generation3/">Effective and Efficient Techniques for a Rule-Based Test-Data Generator - Test Data Generation for Form-Centric Applications, Part 3</a>" is part of the <a href="http://blog.mgm-tp.com">mgm technology blog</a>. The author of the posting is
	<a href="http://blog.mgm-tp.com/?author=14" title="View articles by Dr. Hans-Martin Adorf">Dr. Hans-Martin Adorf</a>.
	</small></em></p>

	<p><em><small>
	We are hiring! mgm technology partners is looking for good software engineers for all our offices. Check out <a rel="external" href="http://www.mgm-tp.com/karriere">www.mgm-tp.com/karriere</a>.
	</small></em></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.mgm-tp.com/2011/09/test-data-generation3/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<series:name><![CDATA[Test Data Generation for Form-Centric Applications]]></series:name>
	</item>
		<item>
		<title>How Google’s Geocoding solves Address Validation</title>
		<link>http://blog.mgm-tp.com/2011/08/address-validation-with-geocoding/</link>
		<comments>http://blog.mgm-tp.com/2011/08/address-validation-with-geocoding/#comments</comments>
		<pubDate>Fri, 12 Aug 2011 12:39:21 +0000</pubDate>
		<dc:creator>Petr Skokan</dc:creator>
				<category><![CDATA[Projects]]></category>
		<category><![CDATA[Tips]]></category>
		<category><![CDATA[ECommerce]]></category>
		<category><![CDATA[Validation]]></category>
		<category><![CDATA[Web Forms]]></category>
		<category><![CDATA[Web Service]]></category>
		<category><![CDATA[XML]]></category>

		<guid isPermaLink="false">http://blog.mgm-tp.com/?p=962</guid>
		<description><![CDATA[For the e-commerce business, it is very important to have a valid and correct address base. Sending packages to incorrect addresses causes significant losses, since packages are not delivered and sent back. In the case of the Kickz online shop, we faced exactly this problem. Thus, we thought about how to improve and force users [...]]]></description>
			<content:encoded><![CDATA[<p>For the e-commerce business, it is very important to have a valid and correct address base. Sending packages to incorrect addresses causes significant losses, since the packages cannot be delivered and are sent back. In the case of the <a href="http://www.kickz.com/">Kickz</a> online shop, we faced exactly this problem. So we looked for a way to get users to enter correct addresses. Learn how we used Google&#8217;s Geocoding service to solve this problem.</p>
<p><span id="more-962"></span></p>
<p>There are many commercial address providers. However, not all of them offer a world-wide address base, and the Kickz shop delivers world-wide. It turned out that the best value for money would be to use Google&#8217;s <a href="http://code.google.com/apis/maps/documentation/geocoding/">Geocoding web service</a>. Geocoding is the process of converting addresses into geographic coordinates, i.e. latitude and longitude.</p>
<p>However, as the response from Google includes not only the longitude/latitude data but also further address suggestions, this service can also easily be used to present refined/suggested addresses to the user. And that was exactly what we needed. Our basic idea was to request exact coordinates and refined addresses (suggestions) for a given address. The coordinates were then used to show the location on the map.</p>
<p>The Geocoding web service is part of the Google Maps API and is offered in a free and a commercial version. The basic service is free for 2500 requests per day and does not support HTTPS requests. As Kickz requires HTTPS and more requests per day, we decided to go for the commercial service, which allows up to 100,000 requests per day and provides HTTPS access with signed web service requests.</p>
<h2>Using the Geocoding Web Service</h2>
<p>Google&#8217;s <a href="http://code.google.com/apis/maps/documentation/geocoding/">Geocoding web service</a> is quite simple to use. All you need to do is send a simple HTTP request to the specified address and you will receive a response in either XML or JSON format. The response contains zero, one or even several addresses, depending on the precision of the input. It can be empty if Google cannot locate your address &mdash; but based on our experience, this rarely happens.</p>
<p>A <a href="http://de.wikipedia.org/wiki/JavaScript_Object_Notation">JSON</a> response is the obvious choice for JavaScript-based validation on the client side using AJAX. XML is more suitable for classic server-side validation. For the Kickz project, we chose the XML format. We validate with the Google web service wherever the user enters an address in the application, i.e. during customer registration and checkout. In these scenarios, the request from the browser is first sent to our server-side application, which calls the Google web service with a certificate-signed request. The request for the input &#8220;Frankfurstein ring 105a,München, de, 80000&#8221; looks like this:</p>
<pre class="brush: plain; light: true; wrap-lines: true;">
https://maps-api-ssl.google.com/maps/api/geocode/xml?address=Frankfurstein+ring+105a,M%C3%BCnchen,de,+80000,&amp;sensor=false&amp;client=gme-kickzag&amp;signature=VF930KLrbu98sKKLqIjn4adIoTs=
</pre>
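<p>The following Python sketch shows one way such a request URL could be assembled and signed. It is not the code used in the Kickz shop. It assumes the signing scheme as we understand it from Google&#8217;s documentation for business clients (HMAC-SHA1 over path and query, with a URL-safe Base64 key and signature); the function name, the client ID and the key are hypothetical.</p>

```python
import base64
import hashlib
import hmac
from urllib.parse import urlencode

# Host and path taken from the request shown above.
HOST = "https://maps-api-ssl.google.com"
PATH = "/maps/api/geocode/xml"


def build_signed_url(address, client_id, url_safe_key, language=None):
    """Return a geocoding request URL with an HMAC-SHA1 signature appended."""
    params = [("address", address), ("sensor", "false"), ("client", client_id)]
    if language:
        # Optional: ask for results localized to this language code.
        params.append(("language", language))
    # urlencode percent-encodes non-ASCII characters as UTF-8 and turns spaces into '+'.
    to_sign = PATH + "?" + urlencode(params)
    # Assumption: the private key is delivered in URL-safe Base64 form.
    key = base64.urlsafe_b64decode(url_safe_key)
    digest = hmac.new(key, to_sign.encode("utf-8"), hashlib.sha1).digest()
    signature = base64.urlsafe_b64encode(digest).decode("ascii")
    return HOST + to_sign + "&signature=" + signature
```

<p>Only the path and query string are signed, not the host name, so the private key never leaves the server and the browser never sees it.</p>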
<p>Google returns the following XML document as a response:</p>
<pre class="brush: xml;">
&lt;GeocodeResponse&gt;
    &lt;status&gt;OK&lt;/status&gt;
    &lt;result&gt;
        &lt;type&gt;street_address&lt;/type&gt;
        &lt;formatted_address&gt;Frankfurter Ring 105, 80807 Munich, Germany&lt;/formatted_address&gt;
        &lt;address_component&gt;
            &lt;long_name&gt;105&lt;/long_name&gt;
            &lt;short_name&gt;105&lt;/short_name&gt;
            &lt;type&gt;street_number&lt;/type&gt;
        &lt;/address_component&gt;
        &lt;address_component&gt;
            &lt;long_name&gt;Frankfurter Ring&lt;/long_name&gt;
            &lt;short_name&gt;Frankfurter Ring&lt;/short_name&gt;
            &lt;type&gt;route&lt;/type&gt;
        &lt;/address_component&gt;
        &lt;address_component&gt;
            &lt;long_name&gt;München&lt;/long_name&gt;
            &lt;short_name&gt;München&lt;/short_name&gt;
            &lt;type&gt;sublocality&lt;/type&gt;
            &lt;type&gt;political&lt;/type&gt;
        &lt;/address_component&gt;
        &lt;address_component&gt;
            &lt;long_name&gt;Munich&lt;/long_name&gt;
            &lt;short_name&gt;Munich&lt;/short_name&gt;
            &lt;type&gt;locality&lt;/type&gt;
            &lt;type&gt;political&lt;/type&gt;
        &lt;/address_component&gt;
        &lt;address_component&gt;
            &lt;long_name&gt;Munich&lt;/long_name&gt;
            &lt;short_name&gt;M&lt;/short_name&gt;
            &lt;type&gt;administrative_area_level_2&lt;/type&gt;
            &lt;type&gt;political&lt;/type&gt;
        &lt;/address_component&gt;
        &lt;address_component&gt;
            &lt;long_name&gt;Bayern&lt;/long_name&gt;
            &lt;short_name&gt;BY&lt;/short_name&gt;
            &lt;type&gt;administrative_area_level_1&lt;/type&gt;
            &lt;type&gt;political&lt;/type&gt;
        &lt;/address_component&gt;
        &lt;address_component&gt;
            &lt;long_name&gt;Germany&lt;/long_name&gt;
            &lt;short_name&gt;DE&lt;/short_name&gt;
            &lt;type&gt;country&lt;/type&gt;
            &lt;type&gt;political&lt;/type&gt;
        &lt;/address_component&gt;
        &lt;address_component&gt;
            &lt;long_name&gt;80807&lt;/long_name&gt;
            &lt;short_name&gt;80807&lt;/short_name&gt;
            &lt;type&gt;postal_code&lt;/type&gt;
        &lt;/address_component&gt;
        &lt;geometry&gt;
            &lt;location&gt;
                &lt;lat&gt;48.1883675&lt;/lat&gt;
                &lt;lng&gt;11.5857053&lt;/lng&gt;
            &lt;/location&gt;
            &lt;location_type&gt;ROOFTOP&lt;/location_type&gt;
            &lt;viewport&gt;
                &lt;southwest&gt;
                    &lt;lat&gt;48.1852199&lt;/lat&gt;
                    &lt;lng&gt;11.5825577&lt;/lng&gt;
                &lt;/southwest&gt;
                &lt;northeast&gt;
                    &lt;lat&gt;48.1915151&lt;/lat&gt;
                    &lt;lng&gt;11.5888529&lt;/lng&gt;
                &lt;/northeast&gt;
            &lt;/viewport&gt;
        &lt;/geometry&gt;
        &lt;partial_match&gt;true&lt;/partial_match&gt;
    &lt;/result&gt;
&lt;/GeocodeResponse&gt;
</pre>
<p>The XML response is deserialized to Java objects using <a href="http://jaxb.java.net/">JAXB 2</a>. After the response is processed, the user is given some options to refine the original address entered in the browser.</p>
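<p>To illustrate which parts of the response matter for address validation, here is a small Python sketch that extracts the suggestions. It uses the standard library&#8217;s ElementTree instead of JAXB, so it is a stand-in for our Java code and not a port of it; the function name and the shape of the returned dictionaries are assumptions made for this example.</p>

```python
import xml.etree.ElementTree as ET


def parse_geocode_response(xml_text):
    """Return a list of suggestion dicts, or an empty list if the status is not OK."""
    root = ET.fromstring(xml_text)
    if root.findtext("status") != "OK":
        return []
    suggestions = []
    for result in root.findall("result"):
        suggestions.append({
            "formatted_address": result.findtext("formatted_address"),
            "lat": float(result.findtext("geometry/location/lat")),
            "lng": float(result.findtext("geometry/location/lng")),
            "location_type": result.findtext("geometry/location_type"),
            # partial_match signals that Google did not find an exact match for the input.
            "partial_match": result.findtext("partial_match") == "true",
        })
    return suggestions
```

<p>Applied to the response shown above, this yields a single suggestion with the formatted address &#8220;Frankfurter Ring 105, 80807 Munich, Germany&#8221; and the partial-match flag set, which could serve as the trigger for asking the user to confirm the corrected address.</p>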
<p>It&#8217;s convenient that all responses from the Google web service are localized. In the screenshots below, you see the same validation done in the German shop (left) and in the international shop (right). The only difference is the language parameter sent with the web service request and set in the HTTP request header &#8220;Accept-Language&#8221;. To make the validation more visual, we also use a Google Static Maps URL to show the address to the user on a map.</p>
<div id="attachment_965" class="wp-caption alignnone" style="width: 490px"><a rel="lightbox-geocoding" href="http://blog.mgm-tp.com/wp-content/uploads/2011/08/LocalizedResult.png"><img src="http://blog.mgm-tp.com/wp-content/uploads/2011/08/LocalizedResult-480x294.png" alt="" title="LocalizedResult" width="480" height="294" class="size-large wp-image-965" /></a><p class="wp-caption-text">The Geocoding web service is localized: Address data returned for the German shop (left) and for the international shop (right). Click to enlarge. </p></div>
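<p>A static map image for the validated address can be requested with a plain URL built from the returned coordinates. The Python sketch below assumes the parameter names of the Google Static Maps API as we understand them (center, zoom, size, markers, sensor); the function name, zoom level and image size are arbitrary choices for this example, and request signing is left out.</p>

```python
from urllib.parse import urlencode

# Endpoint of the Google Static Maps API as we understand it; check the current documentation.
STATIC_MAP_URL = "https://maps.googleapis.com/maps/api/staticmap"


def static_map_url(lat, lng, zoom=15, size="480x300"):
    """Build an image URL that shows a marker at the given coordinates."""
    position = "{:.7f},{:.7f}".format(lat, lng)
    params = [
        ("center", position),
        ("zoom", str(zoom)),
        ("size", size),
        ("markers", position),
        ("sensor", "false"),
    ]
    return STATIC_MAP_URL + "?" + urlencode(params)
```

<p>The resulting URL can be used directly as the source of an image element, so no JavaScript map widget is needed on the validation page.</p>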
<h2>Conclusion</h2>
<p>Google&#8217;s Geocoding web service turned out to be an effective tool for the validation of the user addresses in our Kickz online shop. Furthermore, this service can be used free of charge for small projects or sites with lower traffic. In Kickz though, we are using the commercial version of the web service, since the traffic is higher and we need the HTTPS service URL.</p>
	<p><em><small>(c) 2012 <a href="http://www.mgm-tp.com">mgm technology partners</a>. This posting "<a href="http://blog.mgm-tp.com/2011/08/address-validation-with-geocoding/">How Google&#8217;s Geocoding solves Address Validation</a>" is part of the <a href="http://blog.mgm-tp.com">mgm technology blog</a>. The author of the posting is
	<a href="http://blog.mgm-tp.com/?author=19" title="View articles by Petr Skokan">Petr Skokan</a>.
	</small></em></p>

]]></content:encoded>
			<wfw:commentRss>http://blog.mgm-tp.com/2011/08/address-validation-with-geocoding/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>GR8Conf: A gr8t Conference on Groovy based Technologies</title>
		<link>http://blog.mgm-tp.com/2011/06/gr8conf/</link>
		<comments>http://blog.mgm-tp.com/2011/06/gr8conf/#comments</comments>
		<pubDate>Wed, 22 Jun 2011 23:37:39 +0000</pubDate>
		<dc:creator>Marcus Olk</dc:creator>
				<category><![CDATA[Talks]]></category>
		<category><![CDATA[DSL]]></category>
		<category><![CDATA[Groovy]]></category>
		<category><![CDATA[Testing]]></category>

		<guid isPermaLink="false">http://blog.mgm-tp.com/?p=941</guid>
		<description><![CDATA[On May 17th-19th, I took the opportunity to escape the daily &#8220;Java business as usual&#8221; on the GR8Conf conference in Copenhagen, Denmark. The three days were packed with the latest information on Groovy related technologies such as Grails, Griffon, Gradle, GPars, Spock etc.

The whole conference seemed to have a mantra: A-S-T for Abstract Syntax Tree [...]]]></description>
			<content:encoded><![CDATA[<p>On May 17th-19th, I took the opportunity to escape the daily &#8220;Java business as usual&#8221; at the GR8Conf conference in Copenhagen, Denmark. The three days were packed with the latest information on <a href="http://groovy.codehaus.org/">Groovy</a>-related technologies such as <a href="http://www.grails.org/">Grails</a>, <a href="http://griffon.codehaus.org/">Griffon</a>, <a href="http://www.gradle.org/">Gradle</a>, <a href="http://gpars.codehaus.org/">GPars</a>, <a href="http://spockframework.org/">Spock</a> etc.</p>
<p><span id="more-941"></span></p>
<p>The whole conference seemed to have a mantra: A-S-T for <a href="http://en.wikipedia.org/wiki/Abstract_syntax_tree">Abstract Syntax Tree</a> transformations. <em>A language should have access to its own abstract syntax</em>, as McCarthy claimed back in the days of LISP &#8211; Groovy has its access via AST transformations.</p>
<h2>AST Transformations</h2>
<p><a href="http://twitter.com/HamletDRC">Hamlet D&#8217;Arcy</a>, an experienced engineer with <a href="http://www.canoo.com">Canoo</a>, held an in-depth workshop on transformations of the abstract syntax tree of Groovy code.</p>
<p>In a nutshell, AST transformations are triggered by Java annotations and allow you to <a href="http://java.dzone.com/articles/groovy-ast-transformations">manipulate Groovy code</a>. Practical use cases are best-practice patterns, such as those from Joshua Bloch&#8217;s &#8216;Effective Java&#8217;, implemented once as an AST annotation and then applied to everyday Groovy code.</p>
<p>Examples are <a href="http://groovy.codehaus.org/gapi/groovy/transform/Canonical.html">@Canonical</a>, which, applied to a Groovy class, generates correct implementations of Object.hashCode, Object.equals, and Object.toString, and <a href="http://groovy.codehaus.org/gapi/groovy/transform/Synchronized.html">@Synchronized</a>, <a href="http://groovy.codehaus.org/gapi/groovy/transform/WithReadLock.html">@WithReadLock</a> or <a href="http://groovy.codehaus.org/gapi/groovy/transform/WithWriteLock.html">@WithWriteLock</a>, which generate the otherwise error-prone boilerplate code for synchronizing access to shared mutable state.</p>
<p>Another handy AST annotation is <a href="http://groovy.codehaus.org/gapi/groovy/util/logging/Log.html">@Log</a>, which creates the standard logger and wraps log statements in the guarding log-level check.</p>
<p><a rel="lightbox-gr8conf" href="http://blog.mgm-tp.com/wp-content/uploads/2011/06/log_ast1.png"><img src="http://blog.mgm-tp.com/wp-content/uploads/2011/06/log_ast1-480x383.png" alt="" title="log_ast1" width="480" height="383" class="alignnone size-large wp-image-942" /></a></p>
<p><a rel="lightbox-gr8conf" href="http://blog.mgm-tp.com/wp-content/uploads/2011/06/log_ast2.png"><img src="http://blog.mgm-tp.com/wp-content/uploads/2011/06/log_ast2.png" alt="" title="log_ast2" width="279" height="69" class="alignnone size-full wp-image-943" /></a></p>
<p>AST transformations are very powerful, but one must not forget that &#8211; as with AOP &#8211; your code will look different in a debugger.</p>
<h2>Design by Contract with GContracts</h2>
<p>The <a href="https://github.com/andresteingress/gcontracts/">GContracts project</a> uses AST transformations to implement a feature the <a href="http://www.eiffel.com">Eiffel</a> language has had for years: <a href="http://www.eiffel.com/developers/design_by_contract.html">Design by Contract</a>, i.e. defining invariants, preconditions and postconditions. The infamous integer overflow that caused the spectacular <a href="http://archive.eiffel.com/doc/manuals/technology/contract/ariane/">Ariane 5 crash</a> served as the obvious example.</p>
<p><a rel="lightbox-gr8conf" href="http://blog.mgm-tp.com/wp-content/uploads/2011/06/ariane5.png"><img src="http://blog.mgm-tp.com/wp-content/uploads/2011/06/ariane5-480x432.png" alt="" title="ariane5" width="480" height="432" class="alignnone size-large wp-image-944" /></a><br />
<a rel="lightbox-gr8conf" href="http://blog.mgm-tp.com/wp-content/uploads/2011/06/ariane5_assert11.png"><img src="http://blog.mgm-tp.com/wp-content/uploads/2011/06/ariane5_assert11-480x146.png" alt="" title="ariane5_assert1" width="480" height="146" class="alignnone size-large wp-image-946" /></a><br />
<a rel="lightbox-gr8conf" href="http://blog.mgm-tp.com/wp-content/uploads/2011/06/ariane5_assert2.png"><img src="http://blog.mgm-tp.com/wp-content/uploads/2011/06/ariane5_assert2-480x162.png" alt="" title="ariane5_assert2" width="480" height="162" class="alignnone size-large wp-image-947" /></a></p>
<p>Applying these contract annotations obviously seems like a good idea.</p>
<h2>Static Code analysis for Groovy</h2>
<p>Java has some mature static code analysis tools such as <a href="http://findbugs.sourceforge.net/">Findbugs</a>, <a href="http://pmd.sourceforge.net/">PMD</a> or the inspections built into the <a href="http://www.jetbrains.com/idea/documentation/inspections.jsp">IntelliJ IDE</a>, which alert a developer to violations of coding best practices. <a href="http://codenarc.sourceforge.net">CodeNarc</a> is a tool that offers similar functionality for Groovy. It doesn&#8217;t seem to be as mature as its senior Java counterparts, but the number of rules is increasing and writing CodeNarc rules is quite easy. Integrating CodeNarc into a Groovy build process and enabling the corresponding <a href="http://plugins.intellij.net/plugin/?idea&#038;id=5925">IntelliJ plugin</a> should become as mandatory as using Findbugs and friends.</p>
<h2>Happy spec&#8217;ing with Spock</h2>
<p><a href="http://spockframework.org/">Spock</a> is a <a href="http://junit.org">JUnit</a>-compatible testing framework that allows you to write very expressive test cases. Peter Niederwieser, the creator of Spock, uses Groovy&#8217;s internal DSL capabilities to provide some very appealing features for writing test cases, which are called <em>specifications</em> in Spock terminology.</p>
<p>A specification follows the <em>given-when-then</em> pattern, also known as AAA, <em>arrange-act-assert</em>. JUnit provides the @Before and @Test annotations in combination with a whole bunch of assert methods. In Spock, asserts are implicit in the &#8220;then&#8221; section of a spec, and the &#8220;given&#8221; and &#8220;when&#8221; sections are the realizations of <em>arrange</em> and <em>act</em>.</p>
<p>Apart from this basic functionality, Spock comes with many more powerful features such as the <a href="http://hamletdarcy.blogspot.com/2009/05/new-power-assertions-in-groovy.html">Power Assert</a>, which has been extracted into the standard Groovy GDK, by the way. Another of these features is the &#8220;where&#8221; section, which lets you provide test data in tabular form or read it from a database in a very convenient way.</p>
<p><a rel="lightbox-gr8conf" href="http://blog.mgm-tp.com/wp-content/uploads/2011/06/spock_demo.png"><img src="http://blog.mgm-tp.com/wp-content/uploads/2011/06/spock_demo.png" alt="" title="spock_demo" width="410" height="373" class="alignnone size-full wp-image-948" /></a></p>
<p>A deeper look into Spock is highly recommended. Spock can serve as a replacement for <a href="http://junit.org">JUnit</a> or <a href="http://testng.org/">TestNG</a> because, thanks to Groovy&#8217;s intrinsic Java integration, it can be used to test code written in any JVM language.</p>
<h2>Languages are either concurrent or obsolete</h2>
<p>A very good argument to persuade Java purists to consider Groovy as a language is <a href="http://gpars.codehaus.org/">GPars</a>. <a href="https://twitter.com/#!/vaclav_pech">Vaclav Pech</a>, the project lead of the GPars project, gave a very good introduction to GPars, demonstrating its power by coding concrete examples instead of flipping slides.</p>
<p>GPars can be considered an abstraction layer on top of the Java concurrency packages. It is a library that enhances Groovy in such a way that a programmer can safely use all prominent concurrency patterns such as <a href="http://gpars.codehaus.org/ForkJoin">thread pools</a>, <a href="http://gpars.codehaus.org/Actor">actors</a> or <a href="http://gpars.codehaus.org/Dataflow">data flow variables</a>. With a simple piece of code that processed a number of images sequentially, Vaclav demonstrated that the eight cores of his machine were not being used as they should be. After he simply wrapped the same code in a &#8220;withPool&#8221; clause, the process monitor showed all cores at work. The interesting fact was that no threads had to be created manually &#8211; the GPars construct figured out how to distribute the workload.</p>
<p><a rel="lightbox-gr8conf" href="http://blog.mgm-tp.com/wp-content/uploads/2011/06/gpars.png"><img src="http://blog.mgm-tp.com/wp-content/uploads/2011/06/gpars-480x348.png" alt="" title="gpars" width="480" height="348" class="alignnone size-large wp-image-949" /></a></p>
<p>Data flow variables take care of the synchronization of concurrently executed tasks. There are no explicit threads and locks required any longer. Anyone who has to implement concurrent computations should consider using GPars instead of having to code the corresponding concurrency patterns using the lower level <a href="http://download.oracle.com/javase/6/docs/api/java/util/concurrent/package-summary.html">Java concurrency packages</a>.</p>
<h2>Summary</h2>
<p>GR8Conf was worth a visit. Compared to the heavyweight JCP, to which the evolution of the Java language is tied, the growth of the Groovy ecosystem is breathtaking. Some of the projects presented at this conference should be evaluated for existing and upcoming mgm projects.</p>
	<p><em><small>(c) 2012 <a href="http://www.mgm-tp.com">mgm technology partners</a>. This posting "<a href="http://blog.mgm-tp.com/2011/06/gr8conf/">GR8Conf: A gr8t Conference on Groovy based Technologies</a>" is part of the <a href="http://blog.mgm-tp.com">mgm technology blog</a>. The author of the posting is
	<a href="http://blog.mgm-tp.com/?author=6" title="View articles by Marcus Olk">Marcus Olk</a>.
	</small></em></p>

]]></content:encoded>
			<wfw:commentRss>http://blog.mgm-tp.com/2011/06/gr8conf/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Rapid Prototyping with Spring Roo</title>
		<link>http://blog.mgm-tp.com/2011/06/rapid-prototyping-with-spring-roo/</link>
		<comments>http://blog.mgm-tp.com/2011/06/rapid-prototyping-with-spring-roo/#comments</comments>
		<pubDate>Thu, 16 Jun 2011 19:01:07 +0000</pubDate>
		<dc:creator>Christian Belka</dc:creator>
				<category><![CDATA[Projects]]></category>
		<category><![CDATA[Reviews]]></category>
		<category><![CDATA[CRUD]]></category>
		<category><![CDATA[Database]]></category>
		<category><![CDATA[Hibernate]]></category>
		<category><![CDATA[Java]]></category>
		<category><![CDATA[JPA]]></category>
		<category><![CDATA[Spring]]></category>
		<category><![CDATA[Web Frameworks]]></category>

		<guid isPermaLink="false">http://blog.mgm-tp.com/?p=908</guid>
		<description><![CDATA[Spring Roo, a new contender to Grails and Ruby on Rails, provides flexible scaffolding for Spring based applications. We used Spring Roo in a web project to quickly provide working software in order to discuss features and gather feedback. This article shows how we generated an early prototype and transitioned to early development and then [...]]]></description>
			<content:encoded><![CDATA[<p>Spring Roo, a new contender to Grails and Ruby on Rails, provides flexible scaffolding for Spring based applications. We used Spring Roo in a web project to quickly provide working software in order to discuss features and gather feedback. This article shows how we generated an early prototype and transitioned to early development and then to production code.</p>
<p><span id="more-908"></span></p>
<p><a href="http://www.springsource.org/roo">Spring Roo</a>&#8217;s mission is to &#8220;<em>fundamentally and sustainably improve Java developer productivity without compromising engineering integrity or flexibility</em>&#8221;. It follows the <a href="http://en.wikipedia.org/wiki/Convention_over_configuration">Convention over Configuration</a> principle, and provides rapid application development for Java EE applications.</p>
<p>Spring Roo itself isn&#8217;t a framework; it&#8217;s more like a software tool for generating Java EE web applications with <a href="http://en.wikipedia.org/wiki/Create,_read,_update_and_delete">CRUD</a> functionality. This means the user can create, read/find/list, update and delete entities. Spring Roo offers several technology options, like <a href="http://www.springsource.org/roo/guide?w=base-persistence">various providers</a> for the JPA-based persistence (e.g. Hibernate and EclipseLink), Spring MVC controllers and a frontend based on either JSPs, JSF 2.0 (coming in Spring Roo 1.2), Google Web Toolkit (<a href="http://www.springsource.org/roo/guide?w=base-gwt">GWT</a>) or <a href="http://blog.springsource.com/2010/07/14/introducing-the-flex-addon-for-spring-roo">Flex</a>. If you want to change your persistence provider, you can easily do so; if you want to add JSON output (or whatever else is needed) to the controllers, you can add it with little effort. And everything integrated into the project has a reasonable &#8220;default&#8221; configuration, so that very little configuration overhead is needed.</p>
<p>The toolset of Spring Roo can generate code and integrate a lot of standard modules into a Java EE project, with very little configuration overhead, leaving you always the choice to modify everything you need. So with some easy commands you can add technologies like <a href="http://www.springsource.org/roo/guide?w=command-index#command-index-jms-commands">JMS</a> for messaging, <a href="http://www.springsource.org/roo/guide?w=base-json">JSON</a> for REST, <a href="http://www.springsource.org/roo/guide?w=base-solr">Solr</a> for full-text search etc. </p>
<h2>How does Spring Roo work?</h2>
<p>The heart of Spring Roo is the <strong>Roo Shell</strong> with a really great help system, which is self-explanatory and easy to use. From here you can set up your JPA provider, create entity objects etc.</p>
<p>There are lots of <a href="http://www.springsource.org/roo/guide?w=intro#intro-exploring-sample">examples</a> with which you can quickly try out Spring Roo. In those, Spring Roo creates a complete project for you, including a Jetty server and an in-memory database if needed.</p>
<p>We will use our own short example to explain a few features in the course of this article.</p>
<h3>A Short Example</h3>
<p>We are going to build a simple address database, where a person can have multiple addresses. Setting this up is a really easy process:</p>
<ol>
<li>First install Roo from the <a href="http://www.springsource.org/roo/">Spring Roo homepage</a> and add the bin folder to your path.</li>
<li>Then create an empty folder and start Roo with roo.sh (or roo.bat) from within this folder.</li>
<li>Next, use the following commands to set up the project. You can type &#8220;hint&#8221; at any time to bring up a smart help system which suggests the next steps.</li>
</ol>
<pre class="brush: plain;">
project --topLevelPackage de.mgm.roo.sample
</pre>
<p>After this command an initial project will be created.</p>
<p>Here we choose Hibernate as the persistence provider and the Hypersonic in-memory database as the database:</p>
<pre class="brush: plain; wrap-lines: true;">
persistence setup --provider HIBERNATE --database HYPERSONIC_IN_MEMORY
</pre>
<div id="attachment_909" class="wp-caption alignnone" style="width: 490px"><a rel="lightbox-spring-roo" href="http://blog.mgm-tp.com/wp-content/uploads/2011/06/console-setup.png"><img src="http://blog.mgm-tp.com/wp-content/uploads/2011/06/console-setup-480x365.png" alt="" title="Console Setup" width="480" height="365" class="size-large wp-image-909" /></a><p class="wp-caption-text">A Spring Roo project is usually created and modified in the Roo Shell.</p></div>
<p>There are lots of ORM technology options and databases to choose from, nearly all you would find in a normal environment. </p>
<div id="attachment_910" class="wp-caption alignnone" style="width: 490px"><a rel="lightbox-spring-roo" href="http://blog.mgm-tp.com/wp-content/uploads/2011/06/projectstruture.png"><img src="http://blog.mgm-tp.com/wp-content/uploads/2011/06/projectstruture-480x308.png" alt="" title="Project Struture" width="480" height="308" class="size-large wp-image-910" /></a><p class="wp-caption-text">Project structure generated by Roo.</p></div>
<p>This is our first entity, the Address:</p>
<pre class="brush: plain;">
entity --class ~.domain.Address --testAutomatically
field string --fieldName street --notNull --sizeMin 2
field string --fieldName city --notNull --sizeMin 2
field number --type int --fieldName postcode --notNull
</pre>
<p>This is our second entity, the Person, which can have several addresses:</p>
<pre class="brush: plain;">
entity --class ~.domain.Person --testAutomatically
field string --fieldName name --notNull --sizeMin 2
field string --fieldName lastname --notNull --sizeMin 2
field set --fieldName addresses --type ~.domain.Address
</pre>
<p>Pressing Tab during the commands shows you all available options.</p>
<div id="attachment_911" class="wp-caption alignnone" style="width: 490px"><a rel="lightbox-spring-roo" href="http://blog.mgm-tp.com/wp-content/uploads/2011/06/entity-classes-in-eclipse.png"><img src="http://blog.mgm-tp.com/wp-content/uploads/2011/06/entity-classes-in-eclipse-480x367.png" alt="" title="Entity Classes in Eclipse" width="480" height="367" class="size-large wp-image-911" /></a><p class="wp-caption-text">The new entity classes are reflected in the Eclipse IDE.</p></div>
<p>After the following commands, we have a fully working web application that can be imported into Eclipse (other IDEs are supported as well):</p>
<pre class="brush: plain;">
perform eclipse
controller all --package ~.web
</pre>
<div id="attachment_912" class="wp-caption alignnone" style="width: 490px"><a rel="lightbox-spring-roo" href="http://blog.mgm-tp.com/wp-content/uploads/2011/06/web-classes-in-eclipse.png"><img src="http://blog.mgm-tp.com/wp-content/uploads/2011/06/web-classes-in-eclipse-480x337.png" alt="" title="Web Classes in Eclipse" width="480" height="337" class="size-large wp-image-912" /></a><p class="wp-caption-text">The Web tier and resources are created through the Roo Shell.</p></div>
<p>This application can be started easily with the command &#8220;mvn jetty:run&#8221; and you have the following web application: </p>
<div id="attachment_913" class="wp-caption alignnone" style="width: 490px"><a rel="lightbox-spring-roo" href="http://blog.mgm-tp.com/wp-content/uploads/2011/06/roo-web-application.png"><img src="http://blog.mgm-tp.com/wp-content/uploads/2011/06/roo-web-application-480x328.png" alt="" title="Roo Web Application" width="480" height="328" class="size-large wp-image-913" /></a><p class="wp-caption-text">The out-of-box generated Web application.</p></div>
<p>The core modules are included and managed via the OSGi implementation Felix, so the Roo shell also supports all OSGi commands, e.g. to load and unload modules. You should take a look at the <a href="http://en.wikipedia.org/wiki/Spring_Roo#Base_Add-Ons">module list</a>.</p>
<p>But how does Spring Roo manage the generation and even modifications?<br />
Try to add a field to your entity object and Spring Roo will update everything for you. </p>
<p>How does Spring Roo work technically? Spring Roo uses <a href="http://static.springsource.org/spring-roo/reference/html/architecture.html">AspectJ mixins</a>, called inter-type declarations, to achieve separation of concerns: the code generated by Spring Roo lives in a different compilation unit than the code the user writes. Spring Roo can therefore modify its own files as needed without conflicting with the user&#8217;s code.</p>
<h2>Spring Roo as a Rapid Prototyping Tool</h2>
<p>As the headline suggests, we used Spring Roo with a customer for rapid prototyping. The project had a very short timeline, which left little time for a long requirements phase, so we tried rapid prototyping instead. The customer got a working base very early, from which we could add and remove requirements in iterative cycles. Another problem was that the requirements kept changing.</p>
<p>The general concept of the application was very clear, but details like fields and entities changed. This is where rapid prototyping really helped. It allowed us to show a lot of changes to the customer early in the development cycle, so that we got the feedback that was crucial to the success of the project. Not only the demonstrations were useful; the hands-on &#8220;usage&#8221; by the customer helped as well.</p>
<p>The requirement phase was followed by an early development stage:</p>
<ul>
<li>At first we developed the <strong>data model/entity model</strong> with the customer and transferred it to Spring Roo, then added</li>
<li>the <strong>web frontend</strong> generated and maintained by Spring Roo, and</li>
<li>the <strong>layout and CSS styles</strong> for the customer to see.</li>
<li>We then included the <strong>existing databases</strong> with <a href="http://www.springsource.org/roo/guide?w=base-dbre">DBRE</a> (Spring Roo Database Reverse Engineering), </li>
<li>showed the system to the customer and <strong>added/removed entities</strong> and attributes as needed, and</li>
<li>finally added <strong>security</strong>.</li>
</ul>
<h3>Adding and Removing Attributes and Entities</h3>
<p>There are different ways to add and remove entities and attributes in an existing project. The most obvious way is to use the Roo shell to add and remove the fields of an object. But another, much more convenient way is to change the classes directly, i.e. in the Java code.</p>
<p>Let&#8217;s take a look at a domain object generated by Spring Roo. The Address entity looks pretty clean and straightforward. The getters and setters are managed by Roo, but you can easily override and modify them as you wish.</p>
<p>So let&#8217;s add a field to our address. We noticed that the country is missing, so let&#8217;s add it! I just added the following code: </p>
<pre class="brush: java;">
    @NotNull
    @Size(min = 2)
    private String country;
</pre>
<p>As soon as I save the domain object, Roo updates the project and the new field appears in the web application. Here&#8217;s an excerpt from the Roo shell output:</p>
<pre class="brush: plain; wrap-lines: false;">
Updated SRC_MAIN_WEBAPP/WEB-INF/views/addresses/list.jspx
Updated SRC_MAIN_WEBAPP/WEB-INF/views/addresses/show.jspx
Updated SRC_MAIN_WEBAPP/WEB-INF/views/addresses/create.jspx
Updated SRC_MAIN_WEBAPP/WEB-INF/views/addresses/update.jspx
Updated SRC_MAIN_WEBAPP/WEB-INF/i18n/application.properties
Updated SRC_MAIN_JAVA/de/mgm/roo/sample/domain/Address_Roo_ToString.aj
Updated SRC_TEST_JAVA/de/mgm/roo/sample/domain/AddressDataOnDemand_Roo_DataOnDemand.aj
Updated SRC_MAIN_JAVA/de/mgm/roo/sample/domain/Address_Roo_JavaBean.aj
Updated SRC_MAIN_JAVA/de/mgm/roo/sample/web/ApplicationConversionServiceFactoryBean_Roo_ConversionService.aj
</pre>
<p>You can keep Roo running while you modify a class outside of Roo &ndash; it should detect the changes immediately; otherwise it will pick them up on the next restart.</p>
<a rel="lightbox-spring-roo" href="http://blog.mgm-tp.com/wp-content/uploads/2011/06/fields-in-web-application.png"><img src="http://blog.mgm-tp.com/wp-content/uploads/2011/06/fields-in-web-application-480x328.png" alt="" title="Fields in Web Application" width="480" height="328" class="size-large wp-image-921" /></a>
<p>You can create and delete entities the same way &ndash; just create or delete the classes as needed. <strong>Tip: </strong>It is good practice to change entities and attributes directly in the code, but to make all other changes, like adding and removing security, through the Roo shell.</p>
<h3>Removing Spring Roo from Project</h3>
<p>After this early development phase, we decided to remove Spring Roo, resulting in a regular Spring based web application, and continued with &#8220;normal&#8221; Java development. This path is intended by the Spring Roo authors; there is a whole chapter on <a href="http://www.springsource.org/roo/guide?w=removing">removing Spring Roo</a> from a project.</p>
<p>From this point on, we refactored the entity layer (just to get more structure in the packaging etc) and got a quite nice Java EE App as a working base.</p>
<h2>Spring Roo vs. Grails vs. Ruby on Rails</h2>
<p>Here I want to explain briefly why we chose Roo over Grails and Ruby on Rails. The quickest answer would be that there was a requirement to develop in Java. But I want to show a little bit of the advantages and disadvantages of these three comparable systems.</p>
<p><a href="http://rubyonrails.org/">Ruby on Rails</a> is a very powerful platform with a large community, but you would have to learn Ruby and you would have to understand Rails as a platform. You couldn&#8217;t use existing modules (in Java) easily.</p>
<p><a href="http://www.grails.org/">Grails</a> also has a very good community, uses standard Java technologies like Hibernate and runs on the JVM. Here, the only effort would be learning the Groovy language. On the downside, Groovy code tends to be a little slower than plain Java. The main argument against Grails was, again, the language.</p>
<p>Spring Roo has a very small community, but help is available. The main advantage here was that you are left with a &#8220;regular&#8221; Spring project as soon as you remove Spring Roo itself.</p>
<p>Another interesting option is <a href="http://appfuse.org">AppFuse</a>, which focusses more on the <em>initial</em> scaffold of a new project (a bit like Maven&#8217;s archetypes). We had no chance to investigate it, but consider the <a href="http://stackoverflow.com/questions/1536268/appfuse-vs-roo-what-would-you-use">discussion &#8220;Appfuse vs Roo &#8211; what would you use&#8221;</a>.</p>
<h2>Conclusion</h2>
<p>Spring Roo is definitely usable as a tool for rapid prototyping, i.e. to quickly and easily create a first usable application. But you should face the fact that the current version 1.1.2 still has a lot of missing features (like composite primary keys) and bugs, at which point you suddenly need a solid understanding of AspectJ and databases. This was the main reason why we decided, after the requirements phase, not to continue working with Spring Roo.</p>
<p>The really great upside was that we got a complete application up and running in a few days, with a clean codebase from which we could easily start developing all the details needed in the application. The CRUD web pages created by Spring Roo and the RESTful architecture are a great help in extending the application cleanly and quickly.</p>
	<p><em><small>(c) 2012 <a href="http://www.mgm-tp.com">mgm technology partners</a>. This posting "<a href="http://blog.mgm-tp.com/2011/06/rapid-prototyping-with-spring-roo/">Rapid Prototyping with Spring Roo</a>" is part of the <a href="http://blog.mgm-tp.com">mgm technology blog</a>. The author of the posting is
	<a href="http://blog.mgm-tp.com/?author=17" title="View articles by Christian Belka">Christian Belka</a>.
	</small></em></p>

	<p><em><small>
	We are hiring! mgm technology partners is looking for good software engineers for all our offices. Check out <a rel="external" href="http://www.mgm-tp.com/karriere">www.mgm-tp.com/karriere</a>.
	</small></em></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.mgm-tp.com/2011/06/rapid-prototyping-with-spring-roo/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Getting the most out of LiquiBase - Tool-supported Data Modeling and Change Management, Part 3</title>
		<link>http://blog.mgm-tp.com/2011/04/data-modeling-part3/</link>
		<comments>http://blog.mgm-tp.com/2011/04/data-modeling-part3/#comments</comments>
		<pubDate>Thu, 07 Apr 2011 14:19:39 +0000</pubDate>
		<dc:creator>Ursula Bauer</dc:creator>
				<category><![CDATA[Tips]]></category>
		<category><![CDATA[Database]]></category>
		<category><![CDATA[Evolution]]></category>
		<category><![CDATA[LiquiBase]]></category>
		<category><![CDATA[Modeling]]></category>
		<category><![CDATA[SQL]]></category>

		<guid isPermaLink="false">http://blog.mgm-tp.com/?p=887</guid>
		<description><![CDATA[In this last part of the series, we focus on successful database schema evolution with LiquiBase. You will learn about LiquiBase features that we find most useful in our projects. This article also gives a roundup of our best practices and tips, like how to include custom SQL and stored procedures and how to use LiquiBase [...]]]></description>
			<content:encoded><![CDATA[<p>In this last part of the series, we focus on successful database schema evolution with LiquiBase. You will learn about LiquiBase features that we find most useful in our projects. This article also gives a roundup of our best practices and tips, like how to include custom SQL and stored procedures and how to use LiquiBase with different DBMS products.</p>
<p><span id="more-887"></span></p>
<p>Remember that one of the benefits of using LiquiBase is that it allows you to keep all the Changes to the database(s) in one place, the Changelog. But what happens if we are dealing with a very large and complex physical data model with frequent changes? Won&#8217;t the Changelog grow to such a size that it is completely unmanageable? This is a valid question to which the simple answer is that although LiquiBase will always have <strong>only one Master-Changelog</strong> per database, this does not mean that <em>all</em> Changesets have to be in that one Changelog file.</p>
<h2>Partitioning the Changelog</h2>
<p>In fact, it is very advisable to create smaller <strong>Sub-Changelogs</strong> and use the <code><strong>include</strong></code> script statement to link these into the Master-Changelog:</p>
<pre class="brush: xml;">
&lt;databaseChangeLog ...&gt;
   &lt;include file=&quot;r1.0/r1.0_changelog.xml&quot;/&gt;

   &lt;include file=&quot;r2.0/r2.0_changelog.xml&quot;/&gt;
   ...
&lt;/databaseChangeLog&gt;
</pre>
<p>It is then quite easy and convenient to create a hierarchically ordered directory of Sub-Changelogs, which lets you bundle all changes that concern a certain release, or a concept like triggers, into separate Sub-Changelogs. Once the Sub-Changelogs are referenced in the Master-Changelog, you only have to add new Changesets to the smaller Sub-Changelogs, while you always just run the Master-Changelog.</p>
<p>This mechanism allows us to organize the changes for different releases into separate folders. The result is an easy-to-understand structure that makes it quite simple to add new Changesets at the correct place.</p>
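<p>As a rough sketch of such a hierarchy, a release-level Sub-Changelog can itself include further Sub-Changelogs. The file names below are purely hypothetical and only illustrate the idea:</p>
<pre class="brush: xml;">
&lt;!-- r2.0/r2.0_changelog.xml; all included file names are hypothetical --&gt;
&lt;databaseChangeLog ...&gt;
   &lt;include file=&quot;r2.0/tables_changelog.xml&quot;/&gt;
   &lt;include file=&quot;r2.0/triggers_changelog.xml&quot;/&gt;
   &lt;include file=&quot;r2.0/procedures_changelog.xml&quot;/&gt;
&lt;/databaseChangeLog&gt;
</pre>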
<h2>Stored Procedures</h2>
<p>We have found it particularly helpful to keep stored procedures in separate Sub-Changelog files, since they tend to get rather big. Here we use a special approach that differs from the way other Changes are handled: instead of appending a new Changeset each time a stored procedure changes, we keep only a single version of the procedure in the Sub-Changelog file. To ensure that a changed Changeset is run again, each procedure is wrapped into a single &lt;changeSet&gt; with the attribute <code><strong>runOnChange="true"</strong></code>. This makes LiquiBase check whether the Changeset has changed, instead of only checking whether it has been run before and can be skipped. This exception to the rule makes it a lot easier to manage and spot changes to stored procedures, and since all our Changelogs are versioned in Subversion, there is no danger of losing data.</p>
<div id="attachment_906" class="wp-caption alignnone" style="width: 490px"><a rel="lightbox-datamodel-part3" href="http://blog.mgm-tp.com/wp-content/uploads/2011/04/Hierarchy-of-sub-changelogs.jpg"><img class="size-large wp-image-906" title="Hierarchy of sub-changelogs" src="http://blog.mgm-tp.com/wp-content/uploads/2011/04/Hierarchy-of-sub-changelogs-480x558.jpg" alt="" width="480" height="558" /></a><p class="wp-caption-text">An example how we organize our Changes into a hierarchy of Sub-Changelogs.</p></div>
<h2>Using Relative Paths</h2>
<p>We consider it a good practice to use relative paths to ensure consistency and to avoid problems caused by mistakenly reapplied Changesets. This is especially important if a structure of Master and Sub-Changelogs and/or a lot of <code>include</code> statements are used.</p>
<p>A short recap: The &#8220;DATABASECHANGELOG&#8221; table records each Changeset that has been successfully applied by its author and id tags and by filename, and uses this information to check whether a given Changeset has already been applied.</p>
<p>If absolute paths are used, they are recorded as part of the filename. The next application of the Master-Changelog is quite likely to be triggered from a different path (e.g. on a different developer&#8217;s machine or as part of a build process). The absolute path will then no longer be the same, and LiquiBase will not recognize the Changeset as already applied.</p>
<p>To avoid this kind of problem, some of my colleagues force LiquiBase to record relative path names in the &#8220;DATABASECHANGELOG&#8221; table: they add the Master-Changelog directory to the CLASSPATH, so that LiquiBase can be launched with only the filename of the Master-Changelog (without a path). All <code>include</code> statements are likewise configured to use only paths relative to the Master-Changelog, which ensures that no absolute paths are written to the &#8220;DATABASECHANGELOG&#8221; table. This has so far allowed us to avoid these problems.</p>
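<p>A possible alternative, which is not the setup described above, is the <code>logicalFilePath</code> attribute of the <code>databaseChangeLog</code> element. It overrides the file name that LiquiBase uses to identify the Changesets of that file. A minimal sketch, in which the path value is only an assumed example:</p>
<pre class="brush: xml;">
&lt;!-- the logicalFilePath value is an assumed example --&gt;
&lt;databaseChangeLog logicalFilePath=&quot;r1.0/r1.0_changelog.xml&quot; ...&gt;
   ...
&lt;/databaseChangeLog&gt;
</pre>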
<p>For another discussion covering quite broadly why and how to use relative paths with LiquiBase, see the posting <a href="http://forum.liquibase.org/topic/why-does-the-change-log-contain-the-file-name">&#8220;Why does the change log contain the file name?&#8221;</a> in the LiquiBase forum.</p>
<h2>Custom SQL — Blessing and curse</h2>
<p>In the previous installments we mentioned that PowerArchitect does not support reverse or forward engineering of concepts like check constraints, triggers or stored procedures. Yet you may have noticed that we just explained how we use separate Changelogs for stored procedures, so clearly there is another way to create Changesets for these. The concept is called Custom SQL, and it enables us to add any SQL statement as a valid Changeset to LiquiBase. We just enclose the respective statement in the following tags:</p>
<pre class="brush: xml;">
&lt;changeSet author='xxxx' id='x'&gt;
  &lt;sql &gt;
    ADD YOUR SQL STATEMENT HERE
  &lt;/sql&gt;
&lt;/changeSet&gt;
</pre>
<p>If it is a stored procedure, the tags will be:</p>
<pre class="brush: xml;">
&lt;changeSet author='xxxx' id='x' runOnChange=&quot;true&quot;&gt;
  &lt;sql &gt;
    ADD YOUR SQL STATEMENT HERE
  &lt;/sql&gt;
&lt;/changeSet&gt;
</pre>
<p>Sometimes it is helpful to add rollback information after the <code>&lt;/sql&gt;</code> tag using the <code>&lt;rollback&gt;</code> tag, so that the change can be undone should the custom SQL cause problems. LiquiBase cannot derive a rollback for custom SQL on its own.</p>
<p>Some advanced concepts depend on the DBMS, because SQL dialects vary slightly. If it is obvious that the Changelog will have to be applied to different flavors of SQL, we use the include mechanism and create separate folders that hold all custom SQL Changesets for a given DBMS. We might, for example, have one subdirectory for Oracle custom SQL and one for PostgreSQL.</p>
<p>Using preconditions in the Master-Changelog, we can then control which set of Sub-Changelogs will be run for a given DBMS. Once this is implemented correctly in the Master-Changelog, you can safely update your database-specific Changelogs in the respective folders.</p>
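<p>A minimal sketch of a custom SQL Changeset with rollback information could look as follows. The view name <code>person_names</code> and the underlying <code>person</code> table are assumptions chosen for illustration only:</p>
<pre class="brush: xml;">
&lt;changeSet author='xxxx' id='x'&gt;
  &lt;sql&gt;
    CREATE VIEW person_names AS SELECT firstname, lastname FROM person
  &lt;/sql&gt;
  &lt;!-- hypothetical rollback: undoes the statement above --&gt;
  &lt;rollback&gt;
    DROP VIEW person_names
  &lt;/rollback&gt;
&lt;/changeSet&gt;
</pre>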
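<p>As a hedged sketch of what such a DBMS check can look like, the following Changeset carries a <code>dbms</code> precondition, so that it is skipped and marked as run on any DBMS other than Oracle. The file name and the placement of the precondition on the Changeset itself are assumptions; an actual Master-Changelog setup may differ:</p>
<pre class="brush: xml;">
&lt;!-- oracle/custom_sql_changelog.xml (hypothetical file name) --&gt;
&lt;changeSet author='xxxx' id='x'&gt;
  &lt;preConditions onFail=&quot;MARK_RAN&quot;&gt;
    &lt;dbms type=&quot;oracle&quot; /&gt;
  &lt;/preConditions&gt;
  &lt;sql&gt;
    ADD YOUR ORACLE-SPECIFIC SQL STATEMENT HERE
  &lt;/sql&gt;
&lt;/changeSet&gt;
</pre>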
<h2>Applying Changes</h2>
<p>One of the major selling points of LiquiBase for us is that it makes applying data model changes very quick, easy and safe. LiquiBase can be run from Ant, Maven, Spring etc. or from the command line. As we currently use Maven, we can integrate LiquiBase into our build management to deal with database changes. Applying changes is then just a click or two.</p>
<p>LiquiBase keeps track of which changes have already been applied (though you can force individual Changesets to be reapplied with the attributes &#8220;runAlways&#8221; or &#8220;runOnChange&#8221;). It does this by means of the internal table &#8220;DATABASECHANGELOG&#8221;, which registers every applied Changeset by filename, id and author tags. When applying a Changelog to a database, LiquiBase scans this table and skips all Changesets that have already been applied. Once a Changeset has been applied, its filename, author and id are logged in the table. This is a major time saver in the long run.</p>
<p>But what if a customer insists on getting an SQL file that his DBA can apply? We do not recommend this, because it is potentially less safe, more time-consuming and gives less control, but we can use LiquiBase to create traditional SQL scripts as well. And we can easily switch to LiquiBase should the customer decide at a later stage that he wants it.</p>
<h2>Special DBMS specific tips and warnings</h2>
<p>Let me finish with a few experiences we have had using LiquiBase with various database management systems.</p>
<p>I will start with <strong>MySQL</strong>, the popular free DBMS, as it has some quirks that set it apart from most other DBMS.</p>
<p>In MySQL it is necessary to select a storage engine like <strong>InnoDB</strong> when creating new tables. This poses a problem in LiquiBase that we currently solve by a workaround suggested by LiquiBase. We put the create table statement into a separate Changeset which we then adapt using the tag <code><strong>&lt;modifySql&gt;</strong></code>.</p>
<p>For example:</p>
<pre class="brush: xml;">
&lt;changeSet id=&quot;55&quot; author=&quot;xxx&quot;&gt;
  &lt;createTable tableName=&quot;person&quot;&gt;
    &lt;column name=&quot;id&quot; type=&quot;bigint&quot;/&gt;
    &lt;column name=&quot;firstname&quot; type=&quot;varchar(255)&quot;/&gt;
    &lt;column name=&quot;lastname&quot; type=&quot;varchar(255)&quot;/&gt;
  &lt;/createTable&gt;
  &lt;modifySql&gt;
     &lt;replace replace=&quot;bigint&quot; with=&quot;long&quot;/&gt;
  &lt;/modifySql&gt;
  &lt;modifySql dbms=&quot;mysql&quot;&gt;
     &lt;append value=&quot; engine innodb&quot;/&gt;
  &lt;/modifySql&gt;
&lt;/changeSet&gt;
</pre>
<p>This is not perfect but it works.</p>
<p>Another MySQL-specific quirk is that, unlike most other DBMS, <strong>MySQL</strong> expects the definition of <strong>constraints to be part of the column definition</strong>, like this:</p>
<pre class="brush: xml;">
&lt;column name=&quot;ID&quot; type=&quot;BIGINT&quot;
        remarks=&quot;Primary key&quot; autoIncrement=&quot;true&quot;&gt;
  &lt;constraints nullable=&quot;false&quot; primaryKey=&quot;true&quot;
               primaryKeyName=&quot;KATEGORIE_pk&quot; /&gt;
&lt;/column&gt;
</pre>
<p>Instead of adding the key afterwards in a statement as it is generated by the PowerArchitect export:</p>
<pre class="brush: xml;">
&lt;addPrimaryKey tableName=&quot;KATEGORIE&quot;
               constraintName=&quot;KATEGORIE_pk&quot;
               columnNames=&quot;ID&quot;/&gt;
</pre>
<p>Correcting this manually is quite tedious, but we have found that an <strong>XSLT</strong> transformation can deal with this problem, which takes much of the pain out of it.</p>
<p>Another inconvenience we have encountered with MySQL is that it limits remarks on tables or columns to a maximum of 64 characters. Currently there is no workaround for this. While this problem is not introduced by either Power Architect or LiquiBase, my colleague Thomas Kellerer has delivered a solution that can be run with the &#8220;Criticize Model&#8221; context menu item in Power Architect to check whether a comment is compatible with MySQL (see image below). Maybe not a big deal for some, but still a handy little solution.</p>
<div id="attachment_895" class="wp-caption alignnone" style="width: 490px"><a rel="lightbox-datamodel-part3" href="http://blog.mgm-tp.com/wp-content/uploads/2011/04/Criticize-Model-Option-in-Power-Architect.png"><img class="size-large wp-image-895" title="Criticize Model Option in Power Architect" src="http://blog.mgm-tp.com/wp-content/uploads/2011/04/Criticize-Model-Option-in-Power-Architect-480x325.png" alt="" width="480" height="325" /></a><p class="wp-caption-text">Use the &#39;Criticize Model&#39; context menu item in Power Architect to check if the comment is compatible with MySQL.</p></div>
<p>On <strong>Oracle</strong> we have discovered that we can run into situations where we have to use preconditions to ensure that the correct username, and thus the correct database, is used when dealing with multiple databases, because in Oracle a DB user and a schema are the same thing. We use Changeset modifications like the following to ensure that we do not run into problems:</p>
<pre class="brush: xml;">
&lt;changeSet author=&quot;xxx&quot; id=&quot;5&quot;&gt;
  &lt;preConditions onFail=&quot;MARK_RAN&quot;&gt;
    &lt;or&gt;
      &lt;runningAs username=&quot;SHOP&quot; /&gt;              &lt;!-- PROD   --&gt;
      &lt;runningAs username=&quot;SHOPBUGFIXONLINE&quot; /&gt;  &lt;!-- BUGFIX --&gt;
      &lt;runningAs username=&quot;SHOPTESTONLINE&quot; /&gt;    &lt;!-- TEST   --&gt;
    &lt;/or&gt;
  &lt;/preConditions&gt;
  ...
&lt;/changeSet&gt;
</pre>
<p>We also use LiquiBase with <strong>PostgreSQL</strong> but have not yet run into any major problems there.</p>
<h2>Managing Data with LiquiBase?</h2>
<p>A question that keeps coming up is: <em>&#8220;Can I manage my (test) data with LiquiBase as well?&#8221;</em></p>
<p>The answer is yes and no. It is usually no problem to use LiquiBase for small amounts of relatively static test data (e.g. to fill a smaller test database for the first time). Using LiquiBase to maintain large amounts of quickly changing data, however, is often not so convenient. We are currently still investigating the best ways to deal with this problem and we will keep you informed about the outcome in another blog post.</p>
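<p>For the small, static case, a Changeset with <code>insert</code> changes is usually enough. The following is a minimal sketch that reuses the <code>person</code> table from the MySQL example above; the id, the values and the context name <code>test</code> are assumptions for illustration. The <code>context</code> attribute tags the Changeset so that it can be left out when LiquiBase is run with a different context, e.g. for production:</p>
<pre class="brush: xml;">
&lt;!-- id, values and context name are assumed examples --&gt;
&lt;changeSet author=&quot;xxx&quot; id=&quot;test-data-1&quot; context=&quot;test&quot;&gt;
  &lt;insert tableName=&quot;person&quot;&gt;
    &lt;column name=&quot;id&quot; valueNumeric=&quot;1&quot;/&gt;
    &lt;column name=&quot;firstname&quot; value=&quot;Jane&quot;/&gt;
    &lt;column name=&quot;lastname&quot; value=&quot;Doe&quot;/&gt;
  &lt;/insert&gt;
&lt;/changeSet&gt;
</pre>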
<h2>Conclusion</h2>
<p>Now that we have shared our experiences with you, let us sum up by answering the question: Why do we do it? Why do we use tools instead of just writing SQL scripts from logical data models?</p>
<p>One answer is that we believe, and our experience supports this, that well-documented data models save us time and money. Many applications evolve over time, and our tool chain supports the adaptation of the physical data model and simplifies applying the changes to databases. In the test and maintenance phases it also helps us to create databases in different stages and to selectively roll back many changes with great precision and little effort. And last but not least, it is much easier to test a given data model on different DBMS should a customer either want to change his DBMS or insist on an unsuitable one: nothing is more convincing than a demonstration.</p>
	<p><em><small>(c) 2012 <a href="http://www.mgm-tp.com">mgm technology partners</a>. This posting "<a href="http://blog.mgm-tp.com/2011/04/data-modeling-part3/">Getting the most out of LiquiBase - Tool-supported Data Modeling and Change Management, Part 3</a>" is part of the <a href="http://blog.mgm-tp.com">mgm technology blog</a>. The author of the posting is
	<a href="http://blog.mgm-tp.com/?author=13" title="View articles by Ursula Bauer">Ursula Bauer</a>.
	</small></em></p>

]]></content:encoded>
			<wfw:commentRss>http://blog.mgm-tp.com/2011/04/data-modeling-part3/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
	
		<series:name><![CDATA[Tool-supported Data Modeling and Change Management]]></series:name>
	</item>
		<item>
		<title>Non-Standard Ways of using Lucene</title>
		<link>http://blog.mgm-tp.com/2011/03/non-standard-ways-of-using-lucene/</link>
		<comments>http://blog.mgm-tp.com/2011/03/non-standard-ways-of-using-lucene/#comments</comments>
		<pubDate>Tue, 01 Mar 2011 13:14:41 +0000</pubDate>
		<dc:creator>Michal Perlik</dc:creator>
				<category><![CDATA[Projects]]></category>
		<category><![CDATA[Tips]]></category>
		<category><![CDATA[Database]]></category>
		<category><![CDATA[Hibernate]]></category>
		<category><![CDATA[Lucene]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[Search]]></category>

		<guid isPermaLink="false">http://blog.mgm-tp.com/?p=865</guid>
		<description><![CDATA[For our recent online shop project, we required a full-text, multi-criteria product search. Lucene, the popular Java search engine, is an ideal candidate for this functionality. But in order to meet the high performance requirement, we had to extend its usage beyond standard full-text search. This posting describes our solution including index switching and using [...]]]></description>
			<content:encoded><![CDATA[<p>For our recent online shop project, we required a full-text, multi-criteria product search. Lucene, the popular Java search engine, is an ideal candidate for this functionality. But in order to meet the high performance requirement, we had to extend its usage beyond standard full-text search. This posting describes our solution including index switching and using Lucene as a simple NoSQL database.</p>
<p><span id="more-865"></span></p>
<p>Searching is a frequent activity on the web and one of the most important features of every online shop. With a powerful searching facility, customers can easily specify what they want and find the product they are looking for. This includes all kinds of standard catalog search (e.g. by category) and full-text search.</p>
<p>Traditional web shop solutions implement a catalog search with relational database queries. Full-text search tasks can then be implemented either by native features of the databases, or by using external indexing engines. <a href="http://www.lucene.net">Lucene</a> is an open-source Java indexing engine library. It is used by a huge number of web sites and applications. Internally, Lucene stores data in a flat storage structure, where each record consists of several fields as key/value pairs. In Lucene terms, such a record is called a <em>Document</em>.</p>
<p>We have had very good experience with Lucene in previous projects and decided to extend its usage beyond standard full-text search. Here are a couple of non-standard usages of Lucene that we&#8217;ve used in our projects and that you might find interesting too.</p>
<h2>Criteria search with Lucene </h2>
<p>Our recent shop project <a href="http://www.kickz.com">www.kickz.com</a> had high performance requirements for a multi-criteria search. Specifically, we required &lt;100 ms for &gt;200 concurrent queries on average. The application itself is written in Java and can be clustered easily.</p>
<p>We are using <a href="http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/store/RAMDirectory.html">RAMDirectory</a> to load an index entirely into RAM (see also <a href="#RAMDirectory">later on in the blog</a>). This is especially convenient since we have a pre-built index living on disk and can easily slurp the whole thing into RAM for faster searching. In such a setup, a search index is only limited by the amount of heap memory available to the JVM process. A mid-sized shop contains up to hundreds of thousands of products, which should not hit the usual memory limits. In our case, the index size is about 15 MByte per language, summing up to 60 MByte in total. However, if the number of indexed items is very high (resulting in big indexes) or a distributed search is needed, then <a href="http://lucene.apache.org/solr">Solr</a> could be considered as an alternative. See also our <a href="/series/scalable-log-data-management-with-hadoop/">blog series about using Solr/Lucene with Hadoop</a>.</p>
<h2>Mapping Relational Data to the Lucene Index</h2>
<p>Our first challenge was to represent the relations in the Lucene index, i.e. searchable data that comes from embedded or associated entities (*-to-many associations). For example, when searching products, one might restrict the search on specific categories, such as shoes. In relational databases, a SQL query on an entity and its associations can easily join several tables by primary/foreign keys. Such a join cannot be done with Lucene&#8217;s document model in a straight-forward way.</p>
<p>However, there is a workaround at the cost of storing <strong>redundant data</strong>: a document may contain several fields with the same key, e.g. key/value = &#8220;category/shoes&#8221; and &#8220;category/basketball-shoes&#8221;. This can be used for *-to-many associations, where each associated entity is represented by its business key (e.g. the unique category key from the ERP system) or its primary key from the database. It is then necessary to collect all required data during index creation and build the index documents from it. In our case, this means transforming the domain model (left side) into Lucene&#8217;s flat document structure (right side):</p>
<div id="attachment_867" class="wp-caption alignnone" style="width: 490px"><a rel="lightbox-lucene-in-shop" href="http://blog.mgm-tp.com/wp-content/uploads/2011/02/Mapping.png"><img src="http://blog.mgm-tp.com/wp-content/uploads/2011/02/Mapping-480x181.png" alt="" title="Mapping" width="480" height="181" class="size-large wp-image-867" /></a><p class="wp-caption-text">Extracting information from a domain entity and its associated entities into a single Lucene document.</p></div>
<p>When searching for a product with a given relation, i.e. for products of a specific category, then the search will be performed for all documents that contain the relation&#8217;s key with a given relation value, i.e. documents containing the key &#8220;category&#8221; with the value &#8220;basketball-shoes&#8221;.</p>
<p>Please note that this approach is not suitable if the associated entity in a &#8220;*-to-many&#8221; relation changes frequently, because then either the complete index or at least all documents containing the changed data have to be updated, which can be time-consuming. For our shop engine, however, this concern is not relevant, because data is changed only within the underlying ERP system, and each publication triggers a complete rebuild of the index (see <a href="#index-switching">below</a>). Thus, the approach can be used without any limitations.</p>
<p><em>A sidenote</em>: If you&#8217;re using JPA with Hibernate, you might want to have a look at <a href="http://www.hibernate.org/subprojects/search.html">Hibernate Search</a>. It uses special annotations and automates the indexing and extracting of data from the entities, and even supports the indexing of <a href="http://docs.jboss.org/hibernate/stable/search/reference/en-US/html/search-mapping.html#search-mapping-associated">embedded and associated JPA entities</a>. For one-to-many and many-to-many relations, it uses the same &#8220;trick&#8221; as mentioned above.</p>
<h2>Lucene as a NoSQL Database</h2>
<p>Our first versions of the search returned only the IDs of the database records that met the given condition. The records were then loaded from the database. This approach, combined with a properly configured Hibernate second-level query cache, fits most use cases well.</p>
<p>However, the performance can still be improved. For example, the search result page does not display all product information: typically only brand, product name and price are displayed, but not the detailed product description, see <a href="http://www.kickz.com/en/shoes/basketball-function">example</a>. Lucene allows storing &#8220;result values&#8221; in the index. These values are kept unprocessed, i.e. untokenized, and can be retrieved for displaying the result. For example:</p>
<pre class="brush: java;">
Field field = new Field(&quot;brand&quot;, brand.getName(),
                        Field.Store.YES,
                        Field.Index.UN_TOKENIZED);
document.add(field);
</pre>
<p>And after performing the search, the value can be easily retrieved from the query result:</p>
<pre class="brush: java;">
String brand = document.get(&quot;brand&quot;);
</pre>
<p>Like the brand, we also store the product name and price in the index, so that all data needed to build the search result page can be retrieved from Lucene alone. No extra database query is needed afterwards to load it. A similar effect can be achieved with a second-level cache containing all products.</p>
<h2 id="index-switching">Concurrent Rebuilding and Searching with Index Switching</h2>
<p>The index creation runs in an external process that is triggered after data from the <a href="http://en.wikipedia.org/wiki/Enterprise_resource_planning">ERP system</a> has been imported into the SQL database. This typically happens once a day, or at most several times a day. The imported data is read-only in the SQL database. Once the import is finished, all available products are indexed, and thus the <strong>index is always completely rebuilt</strong>.</p>
<p>Unfortunately, Lucene has a restriction in that an index cannot be updated while it is open for reading by another process. But this issue can easily be solved by <strong>working with two indexes: one for searching, the other for updating</strong>. While one index is used by the shop for searching, the other can be recreated in the background. Once the new index is ready, the clustered shop application is notified via JMS and opens the index that has just been recreated. The original search index stays untouched and is thus available for the next import and index re-creation.</p>
<div id="attachment_868" class="wp-caption alignnone" style="width: 490px"><a rel="lightbox-lucene-in-shop" href="http://blog.mgm-tp.com/wp-content/uploads/2011/02/index-switch.png"><img src="http://blog.mgm-tp.com/wp-content/uploads/2011/02/index-switch-480x233.png" alt="" title="index-switch" width="480" height="233" class="size-large wp-image-868" /></a><p class="wp-caption-text">The principle of index switching allows concurrent rebuilding and searching by different JVM processes.</p></div>
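<p>The switching logic itself is small. The following plain Java sketch shows the bookkeeping only, under assumptions: the directory names &#8220;index-a&#8221; and &#8220;index-b&#8221; and the class IndexSwitchSketch are hypothetical, and the JMS listener and the actual opening of the Lucene index are left out:</p>
<pre class="brush: java;">
public class IndexSwitchSketch {

    // hypothetical directory names of the two index copies
    private final String[] indexDirs = { &quot;index-a&quot;, &quot;index-b&quot; };

    // position of the index that is currently used for searching
    private int active = 0;

    /** Directory the shop currently searches in. */
    public synchronized String searchDir() {
        return indexDirs[active];
    }

    /** Directory the external indexing process may rebuild. */
    public synchronized String rebuildDir() {
        return indexDirs[1 - active];
    }

    /** Swaps the two roles and returns the directory to open for searching. */
    public synchronized String switchIndex() {
        active = 1 - active;
        return indexDirs[active];
    }

    public static void main(String[] args) {
        IndexSwitchSketch indexes = new IndexSwitchSketch();
        System.out.println(indexes.searchDir() + &quot; / &quot; + indexes.rebuildDir()); // index-a / index-b
        indexes.switchIndex();
        System.out.println(indexes.searchDir() + &quot; / &quot; + indexes.rebuildDir()); // index-b / index-a
    }
}
</pre>
<p>In the shop application, switchIndex() would be called when the &#8220;index ready&#8221; notification arrives, and a new searcher would then be opened on the returned directory.</p>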
<h2 id="RAMDirectory">Searching in RAM</h2>
<p>The index is normally stored in the file system, and all operations are performed there. If there is enough JVM heap space, the index files can be loaded into memory using <a href="http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/store/RAMDirectory.html">RAMDirectory</a>, so that search operations no longer require any disk access. This will <a href="http://stackoverflow.com/questions/1582377/need-to-know-pros-and-cons-of-using-ramdirectory">increase the performance</a> if the file system does not keep recently accessed files in shared memory:</p>
<pre class="brush: java;">
if (ramSearch) {
    // copy the on-disk index into the JVM heap
    searcher = new IndexSearcher(new RAMDirectory(directory));
} else {
    // search directly on the file system index
    searcher = new IndexSearcher(directory);
}
</pre>
<p>Other <a href="http://my.safaribooksonline.com/book/web-development/seo/9781933988177/further-lucene-extensions/ch09lev1sec4">high-performance options are discussed</a> in chapter 9.4 of the latest edition of the <em>Lucene in Action</em> book.</p>
	<p><em><small>(c) 2012 <a href="http://www.mgm-tp.com">mgm technology partners</a>. This posting "<a href="http://blog.mgm-tp.com/2011/03/non-standard-ways-of-using-lucene/">Non-Standard Ways of using Lucene</a>" is part of the <a href="http://blog.mgm-tp.com">mgm technology blog</a>. The author of the posting is
	<a href="http://blog.mgm-tp.com/?author=18" title="View articles by Michal Perlik">Michal Perlik</a>.
	</small></em></p>

	<p><em><small>
	We are hiring! mgm technology partners is looking for good software engineers for all our offices. Check out <a rel="external" href="http://www.mgm-tp.com/karriere">www.mgm-tp.com/karriere</a>.
	</small></em></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.mgm-tp.com/2011/03/non-standard-ways-of-using-lucene/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Why Functional Tests don’t belong in a Build Environment - A Practitioner's Guide to Successful Software Testing, Part 2</title>
		<link>http://blog.mgm-tp.com/2011/02/successful-software-testing-part2/</link>
		<comments>http://blog.mgm-tp.com/2011/02/successful-software-testing-part2/#comments</comments>
		<pubDate>Tue, 08 Feb 2011 10:15:55 +0000</pubDate>
		<dc:creator>Dr. Martin Varendorff</dc:creator>
				<category><![CDATA[Projects]]></category>
		<category><![CDATA[Tips]]></category>
		<category><![CDATA[Build]]></category>
		<category><![CDATA[Hudson]]></category>
		<category><![CDATA[Jenkins]]></category>
		<category><![CDATA[Maven]]></category>
		<category><![CDATA[QA]]></category>
		<category><![CDATA[Testing]]></category>

		<guid isPermaLink="false">http://blog.mgm-tp.com/?p=857</guid>
		<description><![CDATA[The previous part discussed why a unit test for a class should be written by the developer of that class, and why a functional test should be created by an independent tester. This posting argues that functional tests should not be part of the build process of the product, but instead should be developed and [...]]]></description>
			<content:encoded><![CDATA[<p>The <a href="http://blog.mgm-tp.com/2010/11/successful-software-testing-part1/">previous part</a> discussed why a unit test for a class should be written by the developer of that class, and why a functional test should be created by an independent tester. This posting argues that functional tests should not be part of the build process of the product, but instead should be developed and executed separately. For this, I give guidelines for setting up an independent validation system.</p>
<p><span id="more-857"></span></p>
<p>Unit tests are written by the developer simultaneously with the production code. If the API or the <em>functionality of a class</em> in the production code changes, then the corresponding unit test class has to be adapted accordingly, otherwise the build executing the unit test will fail. This is not a problem in principle, since both should be changed by the same developer simultaneously and should be committed only after they are consistent.</p>
<p>What about functional tests? A functional test with its test data is coupled to the business requirement and the part of the product fulfilling this business requirement. So if the <em>product</em> changes, then the functional tests must change as well.</p>
<p>If the functional tests run in the same build as the product, every change to the product will cause the build process to fail, unless the developer and tester change their code simultaneously. This would require a strong synchronization between developer and tester, which is usually not feasible.</p>
<p>The result is that functional tests almost always fail if they are integrated into the build environment. This calls at least for separate builds for the product (code plus unit tests) and for the functional tests. But I will go further: <em>I recommend not running the functional tests in the build environment at all</em>! The build environment does not make anything easier for them; on the contrary, it adds a lot of extra complexity to the test environment.</p>
<p>The following diagram compares the two approaches of running functional tests, i.e. in a build environment and in a separate test environment against an independent validation system (or staging system). </p>
<div id="attachment_858" class="wp-caption alignnone" style="width: 490px"><a rel="lightbox-software-testing-2" href="http://blog.mgm-tp.com/wp-content/uploads/2011/02/Comparison-of-Approaches.png"><img src="http://blog.mgm-tp.com/wp-content/uploads/2011/02/Comparison-of-Approaches-480x344.png" alt="" title="Comparison of Approaches" width="480" height="344" class="size-large wp-image-858" /></a><p class="wp-caption-text">System setup for unit tests within a build system and functional tests against a validation system. <em>On the left side</em>: Unit tests within a build system are triggered by the build system (Jenkins/Hudson, Luntbuild, Ant, Maven etc.), and executed locally. The application runs on the build server, ideally set up by the build system and unit test framework (JUnit, TestNG) before the execution of the unit tests. <em>On the right side</em>: Functional tests are executed by test drivers for functional tests on dedicated test clients against a validation system, which is set up and configured like the live system.</p></div>
<h2>How to run GUI-/Browser-based Tests</h2>
<p>If you run functional tests in the build environment, then they must be executable in a batch run, driven by the build framework. Specifically, on the build system, the build framework, e.g. <a href="http://jenkins-ci.org/">Jenkins</a> (formerly known as Hudson), <a href="http://luntbuild.javaforge.com/">Luntbuild</a> etc., starts a build script (Maven, Ant), which starts a test framework (JUnit, TestNG), which executes the test case. Of course, this is technically possible. There are functional test tools and build environments that have such capabilities, but they bring in extra complexity and entail extra development and significant configuration effort (often in the range of 1-2 weeks). And the restrictions are severe.</p>
<p>It is very hard to develop GUI-related or browser-based functional tests in an environment without visible screen output. How do you debug test failures if they occur only within the build environment? How do you create keyboard and mouse events (e.g. entering text in an applet within a browser)?</p>
<p>If the functional tests are executed on <em>dedicated test clients</em> (often the tester&#8217;s PC), then any test and scripting software available for the desktop OS of the tester&#8217;s PC can be used. What&#8217;s more, the test execution can be easily observed (GUI tests) and debugged by the tester, so the test development has fewer obstacles in its path.</p>
<h2>Merits of an Independent Validation System</h2>
<p>Another important point is testing the configuration of the system. If the system under test runs within the build framework, then the configuration of the system usually deviates strongly from the configuration of the live system. An <em>independent validation system</em> can be set up and configured much closer to the live system, including hardware, installed software and components, and the configuration of all parts. Thus, a functional test against the validation system checks a large part of the system configuration as well and finds system-specific errors and failures in the deployment of the system.</p>
<p>A test on the validation system almost satisfies the well-known software development paradigm: <em>The work is only done if the requirement is fulfilled on the live system</em>. Manual reproduction and repetition of failed tests is also easier if the test is executed from the same client against the same system under test (validation system) with the same product version, the same configuration and the same test data.</p>
<p>If the functionality of the product is accessed by an API instead of a GUI and if the product or parts of the product are already very stable, then it can be useful to put all stable functional tests as regression tests into the build environment.</p>
<h2>Stable Builds, or when to Deploy on the Validation System</h2>
<p>If a lot of effort is invested in builds with high quality standards, like high coverage of unit tests or code reviews before a commit, then the builds are usually stable enough to be deployed on the validation system as a basis for the execution of functional tests. If however the builds are not very stable, then there should be an additional pre-validation system, which is used to <em>stabilize a build</em>. As soon as a build is deployed on the pre-validation or validation system, functional tests are executed against it.</p>
<p>In order to adapt existing functional tests and to prepare new ones, the testers must be aware of all functional changes and additions that are to be expected in the new build. Depending on the team size, this information exchange can happen in ad-hoc meetings with the developers (in small teams) or with the support of detailed feature lists maintained in task management tools like <a href="http://www.atlassian.com/software/jira/">JIRA</a> (in larger teams).</p>
<p>For all failing functional tests, the requirements must be checked (often in collaboration with the developer), to find out whether the test or the product is working incorrectly. After the errors in the functional tests are fixed, the functional tests are repeated. The remaining failing tests should then all be caused by flaws of the product.</p>
<p>A build is considered <em>stable</em> if and only if the functional tests checking the main paths of the business functionality have passed, so that the execution of further tests is feasible and makes sense. In projects with longer development cycles (several months), stable builds should be achieved every one or two weeks and towards the end in even shorter intervals. In projects with short development cycles (several weeks), stable builds should be achieved every one or two days. 10 to 15 stable builds should be achieved per development cycle. Each stable build shows that the development and test groups are in sync and the product is stable. A close collaboration between the functional testers and the developers is essential in such a setup.</p>
<p>To summarize, I emphasize the following points:</p>
<ul>
<li>Functional tests are executed on dedicated test clients.</li>
<li>Functional tests are executed against a dedicated validation system.</li>
<li>Only stable builds are deployed on the validation system.</li>
<li>Stable builds should be achieved regularly and in adequate frequency throughout the development cycle.</li>
</ul>
<p>In a later post, I will discuss test coverage, when to write and rely on unit tests and when on functional tests, and how to derive test coverage. A rough guide for selecting the right tools will be given as well.</p>
	<p><em><small>(c) 2012 <a href="http://www.mgm-tp.com">mgm technology partners</a>. This posting "<a href="http://blog.mgm-tp.com/2011/02/successful-software-testing-part2/">Why Functional Tests don&#8217;t belong in a Build Environment - A Practitioner's Guide to Successful Software Testing, Part 2</a>" is part of the <a href="http://blog.mgm-tp.com">mgm technology blog</a>. The author of the posting is
	<a href="http://blog.mgm-tp.com/?author=15" title="View articles by Dr. Martin Varendorff">Dr. Martin Varendorff</a>.
	</small></em></p>

]]></content:encoded>
			<wfw:commentRss>http://blog.mgm-tp.com/2011/02/successful-software-testing-part2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<series:name><![CDATA[A Practitioner's Guide to Successful Software Testing]]></series:name>
	</item>
		<item>
		<title>Portlets with XForms in Liferay - Rethinking Web Forms with XForms, Part 3</title>
		<link>http://blog.mgm-tp.com/2011/01/rethinking-web-forms-xforms-part3/</link>
		<comments>http://blog.mgm-tp.com/2011/01/rethinking-web-forms-xforms-part3/#comments</comments>
		<pubDate>Mon, 31 Jan 2011 14:01:58 +0000</pubDate>
		<dc:creator>Christian Belka</dc:creator>
				<category><![CDATA[Projects]]></category>
		<category><![CDATA[Tips]]></category>
		<category><![CDATA[Liferay]]></category>
		<category><![CDATA[Orbeon]]></category>
		<category><![CDATA[Portlet]]></category>
		<category><![CDATA[Spring]]></category>
		<category><![CDATA[Web Forms]]></category>
		<category><![CDATA[XForms]]></category>

		<guid isPermaLink="false">http://blog.mgm-tp.com/?p=824</guid>
		<description><![CDATA[This posting discusses how we integrated a web application with XForms as one of the main view technologies into a Liferay portal. We were using the Orbeon 3.8 XForms engine, which was introduced in the previous part of this series, and Spring Portlet MVC. Since we were facing quite a few problems and issues regarding [...]]]></description>
			<content:encoded><![CDATA[<p>This posting discusses how we integrated a web application with XForms as one of the main view technologies into a Liferay portal. We were using the Orbeon 3.8 XForms engine, which was introduced in the <a href="http://blog.mgm-tp.com/2010/09/rethinking-web-forms-xforms-part2/">previous part</a> of this series, and Spring Portlet MVC. Since we were facing quite a few problems and issues regarding Orbeon and Liferay, I describe our working solution in some detail.</p>
<p><span id="more-824"></span></p>
<p>First of all, it is important to understand that our project uses the &#8220;separate deployment&#8221; approach of Orbeon integration. Here, Orbeon is only used as XForms renderer &mdash; neither the Orbeon Portlet nor the Formbuilder / Formrunner system was used.</p>
<p>Orbeon XForms supports <em>two</em> modes for a <a href="http://jcp.org/en/jsr/detail?id=286">JSR-286</a> portal integration:</p>
<ul>
<li>The <strong>Orbeon Portlet</strong> uses the <a href="http://www.orbeon.com/forms/orbeon-form-builder">Orbeon Form Builder</a> and <a href="http://www.orbeon.com/forms/orbeon-form-runner">Orbeon Form Runner</a>, which are Orbeon&#8217;s own XForms editing and management toolkits. They allow you to easily develop your own XForms and XForms screenflows with Orbeon. The catch is that you have to use the Orbeon framework and these toolkits for your whole XForms application.</li>
<li>With the <strong>&#8220;separate deployment&#8221; mode</strong>, Orbeon is only used as an XForms rendering engine; you have to create and manage the application yourself. This gives you the ability to create 100% custom applications. As mentioned, this is the mode we chose.</li>
</ul>
<h2>Overview of our Integration Approach</h2>
<p>Besides XForms, our existing web application was developed with <strong>Spring MVC</strong>, therefore the <a href="http://static.springsource.org/spring/docs/3.0.x/reference/portlet.html">Spring Portlet MVC</a> technology was used for the portlet implementation. The Spring application loads the XForms from a repository corresponding to the required business case. The specific architecture as well as Spring and Spring Portlet MVC were project technology decisions. They are <em>not needed</em> to integrate Orbeon with a portal server.</p>
<p>Orbeon is deployed as a normal web application alongside Liferay, but is stripped down (Form Runner/Builder and samples removed). The JavaEE server is <a href="http://jboss.org/">JBoss</a> 5.1 with <a href="http://www.liferay.com/">Liferay</a> and Orbeon 3.8 installed on it, and configured with 1.5 GB RAM for the JVM.</p>
<p>Please note that Orbeon, as of October 2010, states that the 3.8 stable version has <a href="http://wiki.orbeon.com/forms/doc/developer-guide/admin/deployment-portlet#TOC-Status-of-Portlet-support-in-Orbeon">various issues with the Portlet mode</a>, so an <strong>Orbeon 3.8 nightly build</strong> from the end of October 2010 was used. We also chose the stable release of Liferay 6, since the nightly build of Orbeon 3.8 is currently tested with this version only.</p>
<div id="attachment_830" class="wp-caption alignnone" style="width: 490px"><a rel="lightbox-xforms-part3" href="http://blog.mgm-tp.com/wp-content/uploads/2011/01/Integration-and-Architecture.png"><img src="http://blog.mgm-tp.com/wp-content/uploads/2011/01/Integration-and-Architecture-480x265.png" alt="" title="Integration and Architecture" width="480" height="265" class="size-large wp-image-830" /></a><p class="wp-caption-text">Integration of the different components and the overall architecture.</p></div>
<p>The Spring portlet loads the XForms through services invoked from the controller and uses a custom XForms view to send the XForms directly to Orbeon through the portlet filter for further rendering. Other Spring MVC views like the JSTL view (a Spring view to display normal JSPs) are not affected and can be used normally. The edit mode of this portlet, for example, is written as a normal JSP page.</p>
<h2>The XForms XHTML Portlet Page</h2>
<p>Spring MVC uses views to render models which are filled and created by controllers. (Ok, you might have guessed that;). Spring includes many different views for displaying Velocity templates, XML and JSPs. The &#8220;view resolvers&#8221; decide which view should be used to render the requested page. So we wrote a custom Spring ViewResolver, which reacts to our convention that view names for XForms start with &#8220;XForms::&#8221;. All other view requests are ignored by this resolver, so that the other resolvers can handle them. As mentioned before, Spring is not needed in order to use XForms in portlets; it was a general project requirement in our case.</p>
<p>Contrary to what you might expect, and unlike the usual portlet page snippets, the XForms portlet page must be a <a href="http://orbeon-forms-ops-users.24843.n4.nabble.com/Missing-portlet-filter-class-Inaccurate-Wiki-page-td2720997.html#a3027420">complete XHTML page</a> including all HTML tags, as shown below:</p>
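<p>The prefix convention can be illustrated without Spring. In the following plain Java sketch, the Resolver interface and both resolver classes are hypothetical stand-ins, not Spring&#8217;s ViewResolver API. It only shows how a resolver that answers solely for the &#8220;XForms::&#8221; prefix leaves all other view names to the next resolver in the chain:</p>
<pre class="brush: java;">
import java.util.ArrayList;
import java.util.List;

public class ViewPrefixSketch {

    static final String XFORMS_PREFIX = &quot;XForms::&quot;;

    /** Hypothetical stand-in for a view resolver; returns null if it is not responsible. */
    interface Resolver {
        String resolve(String viewName);
    }

    /** Handles only view names that start with the XForms prefix. */
    static final class XFormsResolver implements Resolver {
        public String resolve(String viewName) {
            if (!viewName.startsWith(XFORMS_PREFIX)) {
                return null; // not ours, let the next resolver try
            }
            return &quot;xforms-view:&quot; + viewName.substring(XFORMS_PREFIX.length());
        }
    }

    /** Fallback at the end of the chain that accepts every view name. */
    static final class JspResolver implements Resolver {
        public String resolve(String viewName) {
            return &quot;jsp-view:&quot; + viewName;
        }
    }

    /** Asks each resolver in order and returns the first non-null result. */
    static String resolve(List&lt;Resolver&gt; chain, String viewName) {
        for (Resolver resolver : chain) {
            String view = resolver.resolve(viewName);
            if (view != null) {
                return view;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List&lt;Resolver&gt; chain = new ArrayList&lt;Resolver&gt;();
        chain.add(new XFormsResolver());
        chain.add(new JspResolver());
        System.out.println(resolve(chain, &quot;XForms::offer&quot;)); // xforms-view:offer
        System.out.println(resolve(chain, &quot;edit&quot;));          // jsp-view:edit
    }
}
</pre>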
<pre class="brush: xml; wrap-lines: false;">
&lt;html xmlns=&quot;http://www.w3.org/1999/xhtml&quot;
  xmlns:ev=&quot;http://www.w3.org/2001/xml-events&quot;
  xmlns:xs=&quot;http://www.w3.org/2001/XMLSchema&quot;
  xmlns:xhtml=&quot;http://www.w3.org/1999/xhtml&quot;
  xmlns:xf=&quot;http://www.w3.org/2002/xforms&quot;
  xmlns:xsi=&quot;http://www.w3.org/2001/XMLSchema-instance&quot;
  xmlns:n1=&quot;http://www.w3.org/2004/11/xmlmime&quot;
  xmlns:xxf=&quot;http://orbeon.org/oxf/xml/xforms&quot;
  xmlns:f=&quot;http://orbeon.org/oxf/xml/formatting&quot;
  xmlns:c=&quot;http://java.sun.com/jsp/jstl/core&quot;&gt;
&lt;head&gt;
  &lt;xf:model&gt;
    &lt;xf:instance id=&quot;vu&quot;&gt;
      &lt;repo:vu xmlns:repo=&quot;repos-ns&quot;
               xmlns:partner=&quot;partner-ns&quot; xmlns:allg=&quot;allg-bs&quot;&gt;
        &lt;repo:Versicherungsunternehmen&gt;
          &lt;partner:Name&gt;Versicherungsunternehmen&lt;/partner:Name&gt;
          &lt;partner:Anschrift&gt;
            &lt;allg:Strasse&gt;&lt;/allg:Strasse&gt;
            &lt;allg:Hausnummer&gt;543&lt;/allg:Hausnummer&gt;
            &lt;allg:Postleitzahl&gt;32430&lt;/allg:Postleitzahl&gt;
            &lt;allg:Ort&gt;Stadt&lt;/allg:Ort&gt;
            &lt;allg:Land&gt;D&lt;/allg:Land&gt;
        [...]
      &lt;/repo:vu&gt;
    &lt;/xf:instance&gt;
  &lt;/xf:model&gt;
&lt;/head&gt;
&lt;body&gt;
  &lt;h2&gt;Adressdaten&lt;/h2&gt;
  &lt;xf:group ref=&quot;instance('vu')/repo:Versicherungsunternehmen&quot;&gt;
    &lt;xf:label&gt;Firma&lt;/xf:label&gt;
    &lt;xf:input ref=&quot;partner:Name&quot;/&gt; &lt;br/&gt;

    &lt;xf:group ref=&quot;partner:Anschrift&quot;&gt;
    &lt;xf:input ref=&quot;allg:Strasse&quot;/&gt;
    [...]
  &lt;/xf:group&gt;
  [...]
&lt;/body&gt;
&lt;/html&gt;
</pre>
<div id="attachment_826" class="wp-caption alignnone" style="width: 490px"><a rel="lightbox-xforms-part3" href="http://blog.mgm-tp.com/wp-content/uploads/2011/01/Screenshot-of-XForms-in-Portal.png"><img src="http://blog.mgm-tp.com/wp-content/uploads/2011/01/Screenshot-of-XForms-in-Portal-480x323.png" alt="" title="Screenshot of XForms in Portal" width="480" height="323" class="size-large wp-image-826" /></a><p class="wp-caption-text">Screenshot of the XForms web application running in a Liferay portal. The form on the right side is rendered by the XForms portlet. It contains an XForms page with different validation rules (seen by the red exclamation marks).</p></div>
<h2>A Liferay Bug and a Workaround for rendering XForms</h2>
<p>Usually, Spring views act like a normal &#8220;view&#8221; Servlet or JSP, i.e. you can write to the output stream or forward to a JSP to render your model. However, there seems to be a <a href="http://www.liferay.com/community/forums/-/message_boards/message/5682233">problem with the portlet filtering mode in Liferay 6</a>, so JSPs cannot be used as view technology. The filter only works for direct writing to the response stream, not when a redirection to a JSP happens.</p>
<p>The first workaround was to create the complete XForms page in the Spring view and to send it to the response via an output writer. But while debugging the problem described above, we discovered the request attribute &#8220;oxf.xforms.renderer.document&#8221;. Orbeon first checks this attribute in its filter initialization and, if it&#8217;s set, uses the content of the attribute to render the page.</p>
<p>In our case, the complete page was put into this attribute as a String, which seems to be quite a bit faster than the normal response stream filter:</p>
<pre class="brush: java;">
request.setAttribute(&quot;oxf.xforms.renderer.document&quot;, result);
</pre>
<h2>Receiving POST Data from the Browser</h2>
<p>For receiving POST Data, no further configuration needed to be done. The binding URLs get rewritten automatically by the portlet filter. The following binding is sufficient to address the portlet correctly:</p>
<pre class="brush: xml;">
&lt;xforms:submission id=&quot;gui-submission&quot; method=&quot;post&quot; ref=&quot;instance('getOffer')&quot; resource=&quot;/&quot; replace=&quot;all&quot; /&gt;
</pre>
<p>With this binding, the submission resource URL (&#8220;/&#8221;) gets rewritten to the correct portlet action URL and the XForms model gets converted through the Orbeon filter. This filter is specified in the portlet configuration file &#8220;portlet.xml&#8221; as shown below:</p>
<pre class="brush: xml;">
&lt;filter&gt;
  &lt;filter-name&gt;orbeon-forms-portlet-xforms-filter&lt;/filter-name&gt;
  &lt;filter-class&gt;org.orbeon.oxf.portlet.OrbeonPortletXFormsFilter&lt;/filter-class&gt;
  &lt;lifecycle&gt;RENDER_PHASE&lt;/lifecycle&gt;
  &lt;lifecycle&gt;ACTION_PHASE&lt;/lifecycle&gt;
  &lt;lifecycle&gt;RESOURCE_PHASE&lt;/lifecycle&gt;
  &lt;init-param&gt;
    &lt;name&gt;oxf.xforms.renderer.context&lt;/name&gt;
    &lt;value&gt;/orbeon&lt;/value&gt;
  &lt;/init-param&gt;
&lt;/filter&gt;

&lt;filter-mapping&gt;
  &lt;filter-name&gt;orbeon-forms-portlet-xforms-filter&lt;/filter-name&gt;
  &lt;portlet-name&gt;xFormPortlet&lt;/portlet-name&gt;
&lt;/filter-mapping&gt;
</pre>
<p>For this to work, the portlet filter JAR &#8220;Orbeon-xforms-filter.jar&#8221; must be included within the Portlet WAR archive. For greater detail, see the Orbeon Wiki on <a href="http://wiki.orbeon.com/forms/doc/developer-guide/xforms-separate-deployment-with-portlets">XForms &#8211; Separate Deployment with Portlets</a>.</p>
<h2>Conclusion</h2>
<p>We have discussed our quick prototype to test whether portal integration could be achieved easily or not. The general impression was that this solution based on the &#8220;separate deployment&#8221; mode works quite nicely. It also seemed faster than the &#8220;Orbeon Portlet&#8221; operating mode. This might be because, in the portal environment, the portal takes up most of the loading time. A quick <a href="http://jakarta.apache.org/jmeter/" title="JMeter">JMeter</a> test revealed an increase of 50% in load time.</p>
<p>The rewriting of URLs to the portlet addresses works quite well and transparently. The downsides of using Orbeon with portlets using our approach are the following:</p>
<ul>
<li>limitations in your application design, through the portlet filter bug (no redirection and JSPs),</li>
<li>bugs in the 3.8 release of Orbeon XForms,</li>
<li>the current release of Orbeon Portlet Integration has been tested only with Liferay 6.x.</li>
</ul>
<p>Another interesting approach, using <em>remote portlets</em> (an idea we also had), can be <a href="http://wiki.orbeon.com/forms/doc/developer-guide/xforms-remote-portlet-deployment">found in the Orbeon Developer Wiki</a>.</p>
<p>In conclusion, if you are planning to use Orbeon XForms with your portal server, you should use Liferay and also think about getting professional support from the Orbeon Team (which requires a commercial license), because further insight into the Orbeon system is of great advantage here. Other great sources are the <a href="http://orbeon-forms-ops-users.24843.n4.nabble.com/" title="Orbeon Mailing List Archive">Orbeon mailing list</a>, the <a href="http://wiki.orbeon.com/forms/projects/xforms-improved-portlet-support">Development Overview</a>, and the <a href="http://www.liferay.com/community/forums/-/message_boards/category/4470265">Liferay Forum</a>.</p>
	<p><em><small>(c) 2012 <a href="http://www.mgm-tp.com">mgm technology partners</a>. This posting "<a href="http://blog.mgm-tp.com/2011/01/rethinking-web-forms-xforms-part3/">Portlets with XForms in Liferay - Rethinking Web Forms with XForms, Part 3</a>" is part of the <a href="http://blog.mgm-tp.com">mgm technology blog</a>. The author of the posting is
	<a href="http://blog.mgm-tp.com/?author=17" title="View articles by Christian Belka">Christian Belka</a>.
	</small></em></p>

]]></content:encoded>
			<wfw:commentRss>http://blog.mgm-tp.com/2011/01/rethinking-web-forms-xforms-part3/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<series:name><![CDATA[Rethinking Web Forms with XForms]]></series:name>
	</item>
	</channel>
</rss>

