<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/atom10full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><feed xmlns="http://www.w3.org/2005/Atom" xmlns:openSearch="http://a9.com/-/spec/opensearch/1.1/" xmlns:blogger="http://schemas.google.com/blogger/2008" xmlns:georss="http://www.georss.org/georss" xmlns:gd="http://schemas.google.com/g/2005" xmlns:thr="http://purl.org/syndication/thread/1.0" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" gd:etag="W/&quot;A0IMR3w-cSp7ImA9WhBbGE4.&quot;"><id>tag:blogger.com,1999:blog-5008017311510568944</id><updated>2013-05-17T18:13:06.259-07:00</updated><category term="JAVA Spring Transactions" /><category term="Spring security web" /><category term="cloud computing" /><category term="java" /><category term="web" /><category term="REST" /><category term="webservices" /><category term="JAVA concurrency scalability" /><category term="patterns" /><category term="security" /><category term="java performance" /><category term="java Spring" /><category term="NoSql BigData HBase architecture" /><category term="concurrency" /><category term="java security shiro" /><category term="low latency" /><category term="JAVA Hadoop MapReduce BigData" /><category term="scalability architecture database web" /><category term="Spring JAVA distributed programming" /><category term="Scala" /><category term="MapReduce Hadoop &quot;distributed programming&quot;" /><category term="Spring" /><category term="security architecture web" /><category term="architecture" /><category term="BigData NoSql HBase" /><category term="scalability cloud database" /><category term="architecture web HA" /><category term="database" /><category term="Hadoop MapReduce BigData" /><title>The Khangaonkar Report</title><subtitle type="html" /><link rel="http://schemas.google.com/g/2005#feed" type="application/atom+xml" href="http://khangaonkar.blogspot.com/feeds/posts/default" 
/><link rel="alternate" type="text/html" href="http://khangaonkar.blogspot.com/" /><link rel="next" type="application/atom+xml" href="http://www.blogger.com/feeds/5008017311510568944/posts/default?start-index=26&amp;max-results=25&amp;redirect=false&amp;v=2" /><author><name>Manoj Khangaonkar</name><uri>https://plus.google.com/105019772402521748386</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><generator version="7.00" uri="http://www.blogger.com">Blogger</generator><openSearch:totalResults>41</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>25</openSearch:itemsPerPage><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/atom+xml" href="http://feeds.feedburner.com/TheKhangaonkarReport" /><feedburner:info uri="thekhangaonkarreport" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><entry gd:etag="W/&quot;DEcARno4eCp7ImA9WhBVGU4.&quot;"><id>tag:blogger.com,1999:blog-5008017311510568944.post-7049154525426869243</id><published>2013-04-25T17:27:00.000-07:00</published><updated>2013-04-25T17:27:27.430-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2013-04-25T17:27:27.430-07:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="Scala" /><title>10 reasons for considering the Scala programming language</title><content type="html">&lt;div dir="ltr" style="text-align: left;" trbidi="on"&gt;
The Scala programming language has been around for a few years now and its popularity is increasing. Having programmed in Java for many years, I was initially skeptical about whether we needed another programming language on the JVM. But after trying out Scala and reading about the language, I have had a change of heart. Whether your background is Java, C/C++, Ruby, Python, C# or any other language, Scala has some very useful features that make it worth considering if you are looking for a programming language. This blog just lists the useful features. Programming examples will follow in subsequent blogs. &lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;1. Object oriented programming language (OOP)&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
Scala is an object oriented programming language. The benefits of OOP are well documented. A majority of programs today are written in some OO language. If you come from a Java, C++ or C# background, then you already know the benefits. If you are currently using a language that is not OO, then this might be one of the reasons for you to consider Scala. In Scala every value is an object, unlike Java, where primitives are not objects and static methods let you bypass the OO paradigm. OO programming enables you to write programs whose structure models the problem domain that the program is written for. This helps produce programs that are easier to read and maintain.&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;2. Functional programming&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
In contrast to OO programming, functional programming encourages the use of functions that do their work without changing state or the data they operate on. Data is immutable. Functions take data as input and may produce new data as output. Additionally, a function is a value with a type, just like an Integer, a String or an instance of any class. The advantage of functional programming is that pure functions have no side effects: a function takes input and produces output, and that is all. This makes it easier to write correct programs that can scale or be executed in parallel. Scala has very good support for functional programming.&amp;nbsp; &lt;br /&gt;
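&lt;br /&gt;
As a minimal sketch of these ideas (the names FunctionalSketch and scale are made up for this illustration), the following Scala code stores a function in a value and applies it to an immutable list without changing the list.&lt;br /&gt;
&lt;pre&gt;
```scala
object FunctionalSketch {
  // A pure function: the result depends only on the arguments
  def scale(factor: Int)(x: Int): Int = x * factor

  def main(args: Array[String]): Unit = {
    val numbers = List(1, 2, 3)          // immutable list
    val triple: Int => Int = scale(3)    // a function is a value with a type
    val tripled = numbers.map(triple)    // builds a new list
    println(numbers)                     // List(1, 2, 3), unchanged
    println(tripled)                     // List(3, 6, 9)
  }
}
```
&lt;/pre&gt;
Because scale never touches shared state, calls to it can be reordered or run in parallel without changing the result.&lt;br /&gt;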
&lt;br /&gt;
&lt;b&gt;3. Static Types&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
In statically typed languages like C++, Java and Scala, every variable has a type and the type determines what the program can do with the variable. If you try to multiply 2 Strings, the compiler will flag that as an error. Statically typed languages protect programmers from shooting themselves in the foot by detecting such errors at compile time. If you think static typing is annoying and leads to verbose code, then you will be pleased to know that unlike Java, Scala supports type inference (the ability of the compiler to work out a type that is not written explicitly), which reduces verbosity.&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;4. Brevity&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
Scala has features that enable the programmer to write compact code as opposed to verbose code. Less code means fewer bugs and less time spent on maintenance. &lt;br /&gt;
&lt;br /&gt;
&lt;span style="color: blue;"&gt;//Java&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: blue;"&gt;public class Person {&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: blue;"&gt;&amp;nbsp; private String fname ;&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: blue;"&gt;&amp;nbsp; private String lname ;&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: blue;"&gt;&lt;br /&gt;&lt;/span&gt;
&lt;span style="color: blue;"&gt;&amp;nbsp; public Person(String first, String last) {&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: blue;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; fname = first ;&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: blue;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; lname = last ;&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: blue;"&gt;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: blue;"&gt;&lt;br /&gt;&lt;/span&gt;
&lt;span style="color: blue;"&gt;}&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
In Scala the same class is written as&lt;br /&gt;
&lt;br /&gt;
&lt;span style="color: blue;"&gt;class Person(fname: String,lname: String)&amp;nbsp;&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;span style="color: blue;"&gt;&lt;span style="color: black;"&gt;Scala supports type inference that helps avoid verbose code.&amp;nbsp;&lt;/span&gt;&amp;nbsp;&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;span style="color: blue;"&gt;// Java String is in the statement twice &lt;/span&gt;&lt;br /&gt;
&lt;span style="color: blue;"&gt;public String[] stringarray = new String[5] ;&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;span style="color: blue;"&gt;// Scala type is inferred as Array of Strings&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: blue;"&gt;val stringarray = new Array[String](5) &lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;5. JVM language&lt;/b&gt; &lt;br /&gt;
&lt;br /&gt;
Scala is compiled to bytecode that runs on the Java virtual machine. Since a JVM is available for every major platform, your Scala code will run on Windows, Linux, Mac OS and any other platform for which a JVM is available. &lt;br /&gt;
&lt;br /&gt;
Another advantage is the integration with Java. Java has a very rich class library. There are several open source projects that provide additional libraries for very useful functions. Java code can be called from Scala programs very easily, which means all those function rich libraries are available for your use in Scala.&lt;br /&gt;
&lt;br /&gt;
&lt;span style="color: blue;"&gt;val cal = new java.util.GregorianCalendar() &lt;/span&gt;&lt;br /&gt;
&lt;span style="color: blue;"&gt;print(java.lang.String.format("%1$ty%1$tm%1$td", cal))&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
This prints today's date in the format YYMMDD.&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;6.&amp;nbsp; Better support for concurrency&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
To write concurrent programs in Java, you have to deal with threads, the Java memory model, locking, synchronization, deadlocks etc. Writing error free concurrent programs is difficult. Scala has an actor based programming model that shields the programmer from many of the issues faced in Java or C/C++. To write concurrent programs, you implement actors that send, receive and handle messages. The actor model lets the programmer avoid sharing data between threads and the issues related to locking shared data.&lt;br /&gt;
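&lt;br /&gt;
The idea can be sketched without any actor library. The toy below is an assumption made for illustration only: it is not Scala's actor API, and the names CounterActor, Add and Stop are hypothetical. One thread owns the state and reads immutable messages from a mailbox, so callers never need a lock.&lt;br /&gt;
&lt;pre&gt;
```scala
import java.util.concurrent.LinkedBlockingQueue

// Hypothetical immutable message types for this sketch
sealed trait Message
case class Add(amount: Int) extends Message
case object Stop extends Message

// A toy actor: only the worker thread ever touches total
class CounterActor {
  private val mailbox = new LinkedBlockingQueue[Message]()
  private var total = 0

  private val worker = new Thread(() => {
    var running = true
    while (running) {
      mailbox.take() match {
        case Add(n) => total += n
        case Stop   => running = false
      }
    }
  })
  worker.start()

  // Senders only enqueue messages; they share no mutable data
  def send(msg: Message): Unit = mailbox.put(msg)

  // Wait for the worker to finish, then read the result
  def awaitTotal(): Int = { worker.join(); total }
}

object ActorSketch {
  def main(args: Array[String]): Unit = {
    val counter = new CounterActor
    (1 to 100).foreach(i => counter.send(Add(i)))
    counter.send(Stop)
    println(counter.awaitTotal()) // 5050
  }
}
```
&lt;/pre&gt;
A real actor library adds scheduling, supervision and remote messaging on top of this basic mailbox pattern.&lt;br /&gt;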
&lt;br /&gt;
&lt;b&gt;7. Scalable programs&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
By avoiding locking in concurrent programs, Scala is able to exploit parallelism in ways that are hard to achieve in Java. In Java, a recommended best practice for writing scalable code is to use immutable objects. With the actor model in Scala, you use immutable objects as messages and have unsynchronized methods. Immutable objects are also at the heart of functional programming (2), which Scala promotes.&lt;br /&gt;
&lt;br /&gt;
How many times have we heard of a Ruby or Python application that has been rewritten in Java or C++ because it could not scale to the increased demands of users? With Scala, this is much less likely to be an issue. &lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;8. Fast&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
Benchmarks have shown that Scala can be at least as fast as Java. &lt;br /&gt;
See http://research.google.com/pubs/pub37122.html&lt;br /&gt;
&lt;b&gt;&lt;br /&gt;&lt;/b&gt;
&lt;b&gt;9. General purpose/multi-purpose&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
The brevity and compactness of Scala mean that it can be used for scripting or rapid application development a la Ruby or Python. But the fact that it runs on the JVM, together with its scalability features, means that it can also be used for complex applications.&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;10. It is getting more popular&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
This is a non-technical reason. Scala is getting more popular. More startups are moving to Scala. Many are skipping Java and going directly to Scala. If you are a Java programmer, learning Scala makes you more marketable. Even if you are not a Java programmer, learning Scala will open up a number of opportunities in the programming world.&amp;nbsp; &lt;/div&gt;
&lt;img src="http://feeds.feedburner.com/~r/TheKhangaonkarReport/~4/FhSRIFkX8-Q" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://khangaonkar.blogspot.com/feeds/7049154525426869243/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://khangaonkar.blogspot.com/2013/04/10-reasons-for-considering-scala.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/5008017311510568944/posts/default/7049154525426869243?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/5008017311510568944/posts/default/7049154525426869243?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/TheKhangaonkarReport/~3/FhSRIFkX8-Q/10-reasons-for-considering-scala.html" title="10 reasons for considering the Scala programming language" /><author><name>Manoj Khangaonkar</name><uri>https://plus.google.com/105019772402521748386</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>0</thr:total><gd:extendedProperty name="commentSource" value="1" /><gd:extendedProperty name="commentModerationMode" value="FILTERED_POSTMOD" /><feedburner:origLink>http://khangaonkar.blogspot.com/2013/04/10-reasons-for-considering-scala.html</feedburner:origLink></entry><entry gd:etag="W/&quot;A0EAQHs-cSp7ImA9WhBWEUw.&quot;"><id>tag:blogger.com,1999:blog-5008017311510568944.post-6352984540613332969</id><published>2013-04-04T16:54:00.000-07:00</published><updated>2013-04-04T16:54:01.559-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2013-04-04T16:54:01.559-07:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="NoSql BigData HBase architecture" /><title>Using HBase Part 2: Architecture</title><content type="html">&lt;div dir="ltr" 
style="text-align: left;" trbidi="on"&gt;
&lt;br /&gt;
In this blog, let us take a quick look at some architectural details of HBase.&lt;br /&gt;
&lt;br /&gt;
For an introduction to NoSql and HBase, read the following blogs.&lt;br /&gt;
&lt;a href="http://khangaonkar.blogspot.com/2011/11/what-is-nosql.html"&gt;What is NoSql ? &lt;/a&gt;&lt;br /&gt;
&lt;a href="http://khangaonkar.blogspot.com/2013/03/using-hbase.html"&gt;Using HBase&lt;/a&gt;&lt;br /&gt;
&lt;br /&gt;
Internally, HBase is a sparse, distributed, persistent, multidimensional sorted Map. While that sentence seems complicated, reading each word individually gives clarity. &lt;br /&gt;
sparse - some cells can be empty&lt;br /&gt;
distributed - data is partitioned across many hosts&lt;br /&gt;
persistent - stored to disk&lt;br /&gt;
multidimensional - a value is addressed by more than 1 dimension (row key, column, version)&lt;br /&gt;
Map - key and value&lt;br /&gt;
sorted - maps are generally not sorted but this one is &lt;br /&gt;
&lt;br /&gt;
HBase uses HDFS to store the data.&lt;br /&gt;
&lt;br /&gt;
An HBase table has rows and columns. Columns are grouped into column families. There is a version for each value. So the table, row key, column family, column name and version are used to get to a value. Both row keys and values are byte[]s.&lt;br /&gt;
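&lt;br /&gt;
To make the sorted, multidimensional map idea concrete, here is a small in-memory sketch. It is only a model under stated assumptions: it uses Strings where HBase uses byte[]s, it joins family and column into one "family:column" key, and the names SortedMapSketch, put and get are hypothetical.&lt;br /&gt;
&lt;pre&gt;
```scala
import scala.collection.immutable.TreeMap

object SortedMapSketch {
  // row key -> "family:column" -> version -> value
  type Versions = TreeMap[Long, String]
  type Row = TreeMap[String, Versions]
  type Table = TreeMap[String, Row]

  // Returns a new table with one more versioned cell
  def put(table: Table, row: String, column: String, version: Long, value: String): Table = {
    val oldRow = table.getOrElse(row, TreeMap.empty[String, Versions])
    val oldVersions = oldRow.getOrElse(column, TreeMap.empty[Long, String])
    table.updated(row, oldRow.updated(column, oldVersions.updated(version, value)))
  }

  // Latest version of a cell, if the cell exists at all (the map is sparse)
  def get(table: Table, row: String, column: String): Option[String] =
    table.get(row).flatMap(_.get(column)).flatMap(_.lastOption).map(_._2)

  def main(args: Array[String]): Unit = {
    val empty = TreeMap.empty[String, Row]
    val t1 = put(empty, "row2", "cf1:colA", 1L, "x")
    val t2 = put(t1, "row1", "cf1:colA", 1L, "old")
    val t3 = put(t2, "row1", "cf1:colA", 2L, "new")
    println(t3.keys.mkString(","))        // row1,row2
    println(get(t3, "row1", "cf1:colA"))  // Some(new)
    println(get(t3, "row1", "cf2:colC"))  // None
  }
}
```
&lt;/pre&gt;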
&lt;br /&gt;
The table is sorted by row key. Within a column family, the columns are sorted. Storage is per column family, so logically related columns should be in the same column family.&lt;br /&gt;
&lt;br /&gt;
A Table is made of regions. A region has a subset of the rows in a table. A region can be described using tablename, start key, end key. A region is made up of one or more HDFS files.&lt;br /&gt;
&lt;br /&gt;
The regions are managed by servers known as the region servers. There is a master server that assigns regions to region servers.&lt;br /&gt;
&lt;br /&gt;
HBase has 2 catalog tables, named -ROOT- and .META. (the trailing dot is part of the name). The .META. table has information on all regions in the system. The -ROOT- table has information on the .META. table. When a client wants to access data, these 2 tables are consulted to determine which region server has the region that should be used for this request. The client issues read/write requests to the region server directly.&lt;br /&gt;
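&lt;br /&gt;
The lookup itself can be sketched as a search over region start keys. The catalog contents and server names below are hypothetical, and Strings stand in for the byte[] keys that HBase actually compares.&lt;br /&gt;
&lt;pre&gt;
```scala
import java.util.TreeMap

object RegionLookupSketch {
  // Hypothetical catalog: region start key -> region server name.
  // The empty start key marks the first region of the table.
  val catalog = new TreeMap[String, String]()
  catalog.put("", "regionserver1")
  catalog.put("g", "regionserver2")
  catalog.put("p", "regionserver3")

  // The region holding a row is the one with the greatest start key
  // that is not greater than the row key.
  def locate(rowKey: String): String = catalog.floorEntry(rowKey).getValue

  def main(args: Array[String]): Unit = {
    println(locate("alice"))  // regionserver1
    println(locate("manoj"))  // regionserver2
    println(locate("zoe"))    // regionserver3
  }
}
```
&lt;/pre&gt;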
&lt;br /&gt;
HBase uses ZooKeeper to maintain cluster state. The simple diagram below shows the components of an HBase cluster.&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://4.bp.blogspot.com/-aO1Py3KDAp4/UUzPvUzd34I/AAAAAAAAC0Q/9I7AOvC5dx4/s1600/HBase+cluster.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="300" src="http://4.bp.blogspot.com/-aO1Py3KDAp4/UUzPvUzd34I/AAAAAAAAC0Q/9I7AOvC5dx4/s400/HBase+cluster.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;u&gt;&lt;b&gt;Logical view of a table:&lt;/b&gt;&lt;/u&gt;&lt;br /&gt;
The table in figure 2 has 2 column families: cf1 with columns ColA and ColB, and cf2 with columns ColC and ColD. The value in each cell is uniquely identified by row key, column family, column name and a timestamp or version. &lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://3.bp.blogspot.com/-lpfI86VoSSQ/UVohjH2MslI/AAAAAAAAC1A/mLmoHt7nvVc/s1600/HBase+table+_+Logical+view.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="300" src="http://3.bp.blogspot.com/-lpfI86VoSSQ/UVohjH2MslI/AAAAAAAAC1A/mLmoHt7nvVc/s400/HBase+table+_+Logical+view.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;u&gt;&lt;b&gt;Logical view of RegionServer:&lt;/b&gt;&lt;/u&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;/div&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://2.bp.blogspot.com/-PHqvSB1CAFk/UVTopCFu5wI/AAAAAAAAC0o/OfMzV3PVOcY/s1600/Logical+view+RegionServers.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="300" src="http://2.bp.blogspot.com/-PHqvSB1CAFk/UVTopCFu5wI/AAAAAAAAC0o/OfMzV3PVOcY/s400/Logical+view+RegionServers.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The rows of a table are divided into Regions. A Region is the unit of allocation and is identified by a start key and an end key. The regions are distributed across the region servers in the cluster.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;u&gt;&lt;b&gt;&lt;br /&gt;&lt;/b&gt;&lt;/u&gt;
&lt;u&gt;&lt;b&gt;Physical view of Region Server:&lt;/b&gt;&lt;/u&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://4.bp.blogspot.com/-yx3yAcQo4Ds/UVTo-rzi3II/AAAAAAAAC0w/qaHqa9SmCg8/s1600/Physical+View.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="300" src="http://4.bp.blogspot.com/-yx3yAcQo4Ds/UVTo-rzi3II/AAAAAAAAC0w/qaHqa9SmCg8/s400/Physical+View.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;
Each Region has one or more Stores, one per column family. The memStore is where changes are held in memory before being written to disk. The file store is the persistent store and is a file written to HDFS. The HFile is described in the blog &lt;a href="http://th30z.blogspot.com/2011/02/hbase-io-hfile.html?spref=tw"&gt;HFile&lt;/a&gt;.&lt;br /&gt;
&lt;br /&gt;
Each RegionServer has a write ahead log (WAL). Writes are first written to the WAL. If the region server crashes before memory is flushed to disk, the WAL is used to recover. This implies data is stored in memory and flushed to disk periodically. Changes are sorted while in memory.&lt;br /&gt;
&lt;br /&gt;
Reads look for data in the memStore first and then go to disk if necessary. Data is flushed to disk in 64 MB chunks. This size is configurable. HFiles are merged into larger files. Sorting in memory and then merging files makes this similar to a merge sort.&lt;br /&gt;
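&lt;br /&gt;
The write and read paths described above can be modelled in a few lines. This is a toy sketch, not HBase code: the class StoreSketch, its fields and the entry-count flush threshold are all assumptions made for illustration, and plain in-memory collections stand in for the WAL and for files on HDFS.&lt;br /&gt;
&lt;pre&gt;
```scala
import scala.collection.immutable.TreeMap
import scala.collection.mutable.ArrayBuffer

// Toy model of one Store; every name here is hypothetical.
class StoreSketch(flushThreshold: Int) {
  val wal = ArrayBuffer.empty[(String, String)]    // stands in for the write ahead log
  var memStore = TreeMap.empty[String, String]     // changes held in memory, sorted by key
  var files = List.empty[TreeMap[String, String]]  // flushed files, newest first

  def put(key: String, value: String): Unit = {
    wal += ((key, value))                          // 1. record the write in the log
    memStore = memStore.updated(key, value)        // 2. apply it in memory
    if (memStore.size >= flushThreshold) flush()   // 3. flush when memory is full
  }

  // Write the sorted in-memory data out as a new immutable file.
  def flush(): Unit = {
    files = memStore :: files
    memStore = TreeMap.empty[String, String]
    wal.clear()                                    // flushed data no longer needs the log
  }

  // Look in memory first, then in files from newest to oldest.
  def get(key: String): Option[String] =
    memStore.get(key).orElse(files.flatMap(_.get(key)).headOption)

  // Merge all files into one sorted file; newer values win.
  def compact(): Unit =
    files = List(files.foldRight(TreeMap.empty[String, String])((file, merged) => merged ++ file))
}

object StoreSketchDemo {
  def main(args: Array[String]): Unit = {
    val store = new StoreSketch(2)
    store.put("row3", "c")
    store.put("row1", "a")      // second entry triggers a flush
    store.put("row1", "a2")     // newer value stays in memory
    println(store.get("row1"))  // Some(a2)
    println(store.get("row3"))  // Some(c)
    store.flush()
    store.compact()
    println(store.files.head.toList) // List((row1,a2), (row3,c))
  }
}
```
&lt;/pre&gt;
After a crash, replaying the entries left in wal would rebuild the memStore contents that had not yet been flushed.&lt;br /&gt;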
&lt;br /&gt;
For a delete, the row is marked as deleted (as opposed to being physically removed).&lt;br /&gt;
&lt;br /&gt;
HBase provides ACID semantics at the row level. HBase uses multiversion concurrency control, which means updates happen by creating a new version as opposed to overwriting the existing row. Writers need to acquire a lock to write. Readers do not acquire a lock. To ensure consistent reads without locking, HBase assigns a write number to each write. A read returns data from the highest write number that is durable. Locks are stored in memory in the region server. This is sufficient because all values for a row are in one region server. Transactions are committed in serial order. &lt;br /&gt;
&lt;br /&gt;
Sharding is automatic. Regions split when files reach a certain size.&lt;br /&gt;
&lt;br /&gt;
A compaction step, which runs in the background, combines files and removes deleted data. &lt;br /&gt;
&lt;br /&gt;
This concludes the introduction to HBase architecture.&lt;/div&gt;
&lt;img src="http://feeds.feedburner.com/~r/TheKhangaonkarReport/~4/tHncNY7xwes" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://khangaonkar.blogspot.com/feeds/6352984540613332969/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://khangaonkar.blogspot.com/2013/04/using-hbase-part-2-architecture.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/5008017311510568944/posts/default/6352984540613332969?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/5008017311510568944/posts/default/6352984540613332969?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/TheKhangaonkarReport/~3/tHncNY7xwes/using-hbase-part-2-architecture.html" title="Using HBase Part 2: Architecture" /><author><name>Manoj Khangaonkar</name><uri>https://plus.google.com/105019772402521748386</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://4.bp.blogspot.com/-aO1Py3KDAp4/UUzPvUzd34I/AAAAAAAAC0Q/9I7AOvC5dx4/s72-c/HBase+cluster.png" height="72" width="72" /><thr:total>0</thr:total><gd:extendedProperty name="commentSource" value="1" /><gd:extendedProperty name="commentModerationMode" value="FILTERED_POSTMOD" /><feedburner:origLink>http://khangaonkar.blogspot.com/2013/04/using-hbase-part-2-architecture.html</feedburner:origLink></entry><entry gd:etag="W/&quot;DU4FRXkyeSp7ImA9WhBQE0Q.&quot;"><id>tag:blogger.com,1999:blog-5008017311510568944.post-972667510537007384</id><published>2013-03-15T18:38:00.000-07:00</published><updated>2013-03-15T18:38:34.791-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2013-03-15T18:38:34.791-07:00</app:edited><category 
scheme="http://www.blogger.com/atom/ns#" term="BigData NoSql HBase" /><title>Using HBase</title><content type="html">&lt;div dir="ltr" style="text-align: left;" trbidi="on"&gt;
HBase is a NoSQL database from the Hadoop family. The NoSql concept is discussed in my blog at
&lt;a href="http://khangaonkar.blogspot.com/2011/11/what-is-nosql.html"&gt; What is NoSql ?&lt;/a&gt; HBase
is a column oriented key value store based on Google's Bigtable.&lt;br /&gt;
&lt;br /&gt;
To recap,&amp;nbsp;you would consider a NoSql database when your RDBMS
is not able to meet your requirements for one or more of the following reasons: &lt;br /&gt;
&lt;ul style="text-align: left;"&gt;
&lt;li&gt;Your application deals with billions of rows of data &lt;/li&gt;
&lt;/ul&gt;
&lt;ul style="text-align: left;"&gt;
&lt;li&gt;Application does a lot of writes&lt;/li&gt;
&lt;/ul&gt;
&lt;ul style="text-align: left;"&gt;
&lt;li&gt;Reads require low latency &lt;/li&gt;
&lt;/ul&gt;
&lt;ul style="text-align: left;"&gt;
&lt;li&gt;Linear scalability with commodity hardware is required &lt;/li&gt;
&lt;/ul&gt;
&lt;ul style="text-align: left;"&gt;
&lt;li&gt;You frequently need to add more columns or remove columns &lt;/li&gt;
&lt;/ul&gt;
There are several NoSql databases that can address one or more of these issues. In this article I
provide an introduction to HBase. The goal is to help you get started evaluating whether HBase would be appropriate
for your problem. This is introductory material. More details in subsequent blogs.&lt;br /&gt;
&lt;br /&gt;
Main features of HBase are :&lt;br /&gt;
&lt;br /&gt;
&lt;ul style="text-align: left;"&gt;
&lt;li&gt;Built on Hadoop and HDFS. If you are already using Hadoop, then HBase can be viewed as an extension
to your Hadoop infrastructure that provides random reads and writes.&lt;/li&gt;
&lt;/ul&gt;
&lt;ul style="text-align: left;"&gt;
&lt;li&gt;&amp;nbsp;A Simple data model based on keys , values and columns. More on this later.&amp;nbsp;&lt;/li&gt;
&lt;/ul&gt;
&lt;ul style="text-align: left;"&gt;
&lt;li&gt;Scales linearly by adding commodity hardware&amp;nbsp;&lt;/li&gt;
&lt;/ul&gt;
&lt;ul&gt;
&lt;li&gt;Automatic partitioning of tables as they grow larger&amp;nbsp;&lt;/li&gt;
&lt;/ul&gt;
&lt;ul style="text-align: left;"&gt;
&lt;li&gt;Classes available for integration with MapReduce&amp;nbsp;&lt;/li&gt;
&lt;/ul&gt;
&lt;ul style="text-align: left;"&gt;
&lt;li&gt;Automatic failover support&amp;nbsp;&lt;/li&gt;
&lt;/ul&gt;
&lt;ul style="text-align: left;"&gt;
&lt;li&gt;Supports row key range scans&amp;nbsp; &lt;/li&gt;
&lt;/ul&gt;
&lt;u&gt;&lt;b&gt;Data Model&lt;/b&gt;&lt;/u&gt; &lt;br /&gt;
&amp;nbsp; &lt;br /&gt;
The main constructs of the model are&amp;nbsp; Table, rows, column family and columns.&lt;br /&gt;
&lt;br /&gt;
Data is written and read from a Table. 

A Table has rows and column families.  

Each row has a key.&lt;br /&gt;
&lt;br /&gt;
Each Column family has one or more columns. Columns in a column family are logically related. Each column has a name and value.

When a Table is created, the column families have to be declared. But the columns in each family do
not need to be defined and can be added on demand.

Each column is referred to using the syntax columnFamily:column. For example, an age column in a userprofile
column family is referred to as userprofile:age. For each row, storage space is taken up only for the columns written in that row.&lt;br /&gt;
&lt;br /&gt;
Let us design an HBase table to store user web browsing information. Each user has a unique id called userid.
For each user we need to store&lt;br /&gt;
&lt;br /&gt;
(1) some profile information like sex, age, geolocation, membership.&lt;br /&gt;
(2) For each partner website he visits, store the page types viewed, products viewed.&lt;br /&gt;
(3) For each partner website he visits, store the products purchased and the products put in the shopping cart but not purchased.&lt;br /&gt;
&lt;br /&gt;
Our structure might look like&lt;br /&gt;
&lt;br /&gt;
&lt;pre&gt;&lt;span style="color: blue;"&gt;{
userid1: { // rowkey
    profile: { // column family
          sex: male, // column , value
          age: 25,
          member: Y
    },
    browsehistory: { // column family
          partner1.hp: 23,    // visited partner1 homepage 23 times
          partner2.product.pr1: 4 // viewed product pr1 4 times
    },
    shophistory: { // column family
          partner3.pr3: 25.5 // purchased pr3 from partner3 for $25.5
    }
  }
}&lt;/span&gt;&lt;/pre&gt;
&lt;br /&gt;
&amp;nbsp;Let us design an HBase table for the above structure.&lt;br /&gt;
&lt;br /&gt;
Tablename : UserShoppingData.

Since we will look up data based on user, the key can be userid.&lt;br /&gt;
&lt;br /&gt;
(1) ColumnFamily profile for profile information. Columns would be sex, age, member etc.&lt;br /&gt;
(2) ColumnFamily browsehistory for browsing data. Columns are dynamic, such as websitename.page or website.productid.&lt;br /&gt;
(3) ColumnFamily shophistory for shopping data. Columns are dynamic, such as partnerid.productid.&lt;br /&gt;
&lt;br /&gt;
The beauty is that you can dynamically
add columns. If visualizing this as columns is difficult, just think of it as dynamically
adding key value pairs.&amp;nbsp;

This kind of data is required in a typical internet shopper analytics application.&amp;nbsp;&lt;br /&gt;
&lt;br /&gt;
HBase is an
appropriate choice because you have several hundred million internet shoppers. That is several hundred million rows. If you wanted to store data by date, you might make the key userid+date, in which case you might have even more rows, in the order of billions. Data is written
as the user visits various internet shopping websites. Later the data might need to be read with low latency to be able to show
the user a promotion or advertisement based on his past history. A company I worked for in the past used a very popular RDBMS for such high volume writes and whenever the RDBMS was flooded with such write requests, it would grind to a halt.&lt;br /&gt;
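&lt;br /&gt;
The userid+date key idea can be sketched with a sorted map standing in for the table. The separator, the yyyyMMdd date format and the names RowKeySketch, rowKey and scanUser are assumptions made for this illustration, not something HBase prescribes.&lt;br /&gt;
&lt;pre&gt;
```scala
import scala.collection.immutable.TreeMap

object RowKeySketch {
  // Hypothetical key layout: userid, a separator, then the date as yyyyMMdd
  def rowKey(userId: String, date: String): String = userId + "|" + date

  // Rows are kept sorted by key, so all rows of one user are adjacent
  // and can be read with a single range scan.
  def scanUser(table: TreeMap[String, String], userId: String): List[(String, String)] = {
    val prefix = userId + "|"
    table.iteratorFrom(prefix).takeWhile(_._1.startsWith(prefix)).toList
  }

  def main(args: Array[String]): Unit = {
    val table = TreeMap(
      rowKey("user2", "20130301") -> "partner1.hp",
      rowKey("user1", "20130315") -> "partner2.product.pr1",
      rowKey("user1", "20130302") -> "partner1.hp"
    )
    scanUser(table, "user1").foreach(println)
    // (user1|20130302,partner1.hp)
    // (user1|20130315,partner2.product.pr1)
  }
}
```
&lt;/pre&gt;
Putting the userid first keeps each user's rows together, which is what makes low latency reads of one user's history a short range scan.&lt;br /&gt;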
&lt;br /&gt;
&lt;partnerid&gt;&lt;partnerid&gt;&lt;partnerid&gt;&lt;partnerid&gt;&lt;pid&gt;&lt;partnerid&gt;&lt;pid&gt;Let us use HBase shell to create the above table, insert some data into it and query it.&amp;nbsp;&lt;/pid&gt;&lt;/partnerid&gt;&lt;/pid&gt;&lt;/partnerid&gt;&lt;/partnerid&gt;&lt;/partnerid&gt;&lt;/partnerid&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;partnerid&gt;&lt;partnerid&gt;&lt;partnerid&gt;&lt;partnerid&gt;&lt;pid&gt;&lt;partnerid&gt;&lt;pid&gt;&lt;b&gt;Step 1:&lt;/b&gt; Download and install HBase from&amp;nbsp;http://hbase.apache.org&lt;/pid&gt;&lt;/partnerid&gt;&lt;/pid&gt;&lt;/partnerid&gt;&lt;/partnerid&gt;&lt;/partnerid&gt;&lt;/partnerid&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;partnerid&gt;&lt;partnerid&gt;&lt;partnerid&gt;&lt;partnerid&gt;&lt;pid&gt;&lt;partnerid&gt;&lt;pid&gt;&lt;b&gt;Step 2:&lt;/b&gt; Start hbase&lt;/pid&gt;&lt;/partnerid&gt;&lt;/pid&gt;&lt;/partnerid&gt;&lt;/partnerid&gt;&lt;/partnerid&gt;&lt;/partnerid&gt;&lt;br /&gt;
&lt;partnerid&gt;&lt;partnerid&gt;&lt;partnerid&gt;&lt;/partnerid&gt;&lt;/partnerid&gt;&lt;/partnerid&gt;&lt;partnerid&gt;&lt;partnerid&gt;&lt;partnerid&gt;&lt;partnerid&gt;&lt;pid&gt;&lt;partnerid&gt;&lt;pid&gt;$ ./start-hbase.sh&lt;br /&gt;starting master, logging to /Users/jk/hbase-0.94.5/bin/../logs/hbase-jk-master-jk.local.out&lt;/pid&gt;&lt;/partnerid&gt;&lt;/pid&gt;&lt;/partnerid&gt;&lt;/partnerid&gt;&lt;/partnerid&gt;&lt;/partnerid&gt;
&lt;partnerid&gt;&lt;partnerid&gt;&lt;partnerid&gt;&lt;partnerid&gt;&lt;pid&gt;&lt;partnerid&gt;&lt;pid&gt;&amp;nbsp;&lt;/pid&gt;&lt;/partnerid&gt;&lt;/pid&gt;&lt;/partnerid&gt;&lt;/partnerid&gt;&lt;/partnerid&gt;&lt;/partnerid&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;partnerid&gt;&lt;partnerid&gt;&lt;partnerid&gt;&lt;partnerid&gt;&lt;pid&gt;&lt;partnerid&gt;&lt;pid&gt;&lt;b&gt;Step 3: &lt;/b&gt;Start the HBase shell&lt;/pid&gt;&lt;/partnerid&gt;&lt;/pid&gt;&lt;/partnerid&gt;&lt;/partnerid&gt;&lt;/partnerid&gt;&lt;/partnerid&gt;&lt;br /&gt;
&lt;partnerid&gt;&lt;partnerid&gt;&lt;partnerid&gt;&lt;partnerid&gt;&lt;pid&gt;&lt;partnerid&gt;&lt;pid&gt;$ ./hbase shell&lt;br /&gt;&lt;span style="color: #38761d;"&gt;HBase Shell; enter 'help&amp;lt;RETURN&amp;gt;' for list of supported commands.&lt;br /&gt;Type "exit&amp;lt;RETURN&amp;gt;" to leave the HBase Shell&lt;br /&gt;Version 0.94.5, r1443843, Fri Feb&amp;nbsp; 8 05:51:25 UTC 2013&lt;br /&gt;hbase(main):001:0&amp;gt; &lt;/span&gt;&lt;br /&gt;&amp;nbsp;&lt;/pid&gt;&lt;/partnerid&gt;&lt;/pid&gt;&lt;/partnerid&gt;&lt;/partnerid&gt;&lt;/partnerid&gt;&lt;/partnerid&gt;&lt;br /&gt;
&lt;partnerid&gt;&lt;partnerid&gt;&lt;partnerid&gt;&lt;partnerid&gt;&lt;pid&gt;&lt;partnerid&gt;&lt;pid&gt;&lt;b&gt;Step 4:&lt;/b&gt; Create the table&lt;/pid&gt;&lt;/partnerid&gt;&lt;/pid&gt;&lt;/partnerid&gt;&lt;/partnerid&gt;&lt;/partnerid&gt;&lt;/partnerid&gt;&lt;br /&gt;
&lt;span style="color: #38761d;"&gt;&lt;partnerid&gt;&lt;partnerid&gt;&lt;partnerid&gt;&lt;partnerid&gt;&lt;pid&gt;&lt;partnerid&gt;&lt;pid&gt;hbase(main):004:0&amp;gt; create 'usershoppingdata','profile','browsehistory','shophistory'&lt;br /&gt;0 row(s) in 3.9940 seconds&lt;/pid&gt;&lt;/partnerid&gt;&lt;/pid&gt;&lt;/partnerid&gt;&lt;/partnerid&gt;&lt;/partnerid&gt;&lt;/partnerid&gt;&lt;/span&gt;&lt;br /&gt;
&lt;partnerid&gt;&lt;partnerid&gt;&lt;partnerid&gt;&lt;partnerid&gt;&lt;pid&gt;&lt;partnerid&gt;&lt;pid&gt;&lt;/pid&gt;&lt;/partnerid&gt;&lt;/pid&gt;&lt;/partnerid&gt;&lt;/partnerid&gt;&lt;/partnerid&gt;&lt;/partnerid&gt;&lt;br /&gt;
&lt;partnerid&gt;&lt;partnerid&gt;&lt;partnerid&gt;&lt;partnerid&gt;&lt;pid&gt;&lt;partnerid&gt;&lt;pid&gt;&lt;b&gt;Step 5:&lt;/b&gt; Insert some data&lt;/pid&gt;&lt;/partnerid&gt;&lt;/pid&gt;&lt;/partnerid&gt;&lt;/partnerid&gt;&lt;/partnerid&gt;&lt;/partnerid&gt;&lt;br /&gt;
&lt;partnerid&gt;&lt;partnerid&gt;&lt;partnerid&gt;&lt;partnerid&gt;&lt;pid&gt;&lt;partnerid&gt;&lt;pid&gt;&lt;span style="color: #38761d;"&gt;hbase(main):003:0&amp;gt; put 'usershoppingdata', 'userid1','profile:sex','male'&lt;br /&gt;0 row(s) in 0.1990 seconds&lt;br /&gt;&lt;br /&gt;hbase(main):004:0&amp;gt; put 'usershoppingdata', 'userid1','profile:age','25'&lt;br /&gt;0 row(s) in 0.0090 seconds&lt;br /&gt;&lt;br /&gt;hbase(main):005:0&amp;gt; put 'usershoppingdata', 'userid1','browsehistory:amazon.hp','11'&lt;br /&gt;0 row(s) in 0.0100 seconds&lt;br /&gt;&lt;br /&gt;hbase(main):006:0&amp;gt; put 'usershoppingdata', 'userid1','browsehistory:amazon.isbn123456','3'&lt;br /&gt;0 row(s) in 0.0070 seconds&lt;br /&gt;&lt;br /&gt;hbase(main):007:0&amp;gt; put 'usershoppingdata', 'userid1','shophistory:amazon.isbn123456','19.99'&lt;br /&gt;0 row(s) in 0.0140 seconds&lt;/span&gt;&lt;br /&gt;&amp;nbsp;&lt;/pid&gt;&lt;/partnerid&gt;&lt;/pid&gt;&lt;/partnerid&gt;&lt;/partnerid&gt;&lt;/partnerid&gt;&lt;/partnerid&gt;&lt;br /&gt;
&lt;partnerid&gt;&lt;partnerid&gt;&lt;partnerid&gt;&lt;partnerid&gt;&lt;pid&gt;&lt;partnerid&gt;&lt;pid&gt;&lt;b&gt;Step 6:&lt;/b&gt; Read the data&lt;/pid&gt;&lt;/partnerid&gt;&lt;/pid&gt;&lt;/partnerid&gt;&lt;/partnerid&gt;&lt;/partnerid&gt;&lt;/partnerid&gt;&lt;br /&gt;
&lt;span style="color: #38761d;"&gt;&lt;partnerid&gt;&lt;partnerid&gt;&lt;partnerid&gt;&lt;partnerid&gt;&lt;pid&gt;&lt;partnerid&gt;&lt;pid&gt;hbase(main):008:0&amp;gt; scan 'usershoppingdata'&lt;br /&gt;ROW&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; COLUMN+CELL&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;br /&gt;&amp;nbsp;userid1&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; column=browsehistory:amazon.hp, timestamp=1362784343421, value=11&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;br /&gt;&amp;nbsp;userid1&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; column=browsehistory:amazon.isbn123456, timestamp=1362786676092, value=3&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;br /&gt;&amp;nbsp;userid1&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 
column=profile:age, timestamp=1362784243334, value=25&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;br /&gt;&amp;nbsp;userid1&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; column=profile:sex, timestamp=1362784225141, value=male&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;br /&gt;&amp;nbsp;userid1&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; column=shophistory:amazon.isbn123456, timestamp=1362786706557, value=19.99&amp;nbsp; &lt;br /&gt;1 row(s) in 0.1450 seconds&lt;br /&gt;&amp;nbsp;&lt;/pid&gt;&lt;/partnerid&gt;&lt;/pid&gt;&lt;/partnerid&gt;&lt;/partnerid&gt;&lt;/partnerid&gt;&lt;/partnerid&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: #38761d;"&gt;&lt;partnerid&gt;&lt;partnerid&gt;&lt;partnerid&gt;&lt;partnerid&gt;&lt;pid&gt;&lt;partnerid&gt;&lt;pid&gt;hbase(main):010:0&amp;gt; get 'usershoppingdata', 'userid1'&lt;br /&gt;COLUMN&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; CELL&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;br /&gt;&amp;nbsp;browsehistory:amazon.hp&amp;nbsp;&amp;nbsp; timestamp=1362784343421, value=11&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;br /&gt;&amp;nbsp;browsehistory:amazon.isbn timestamp=1362786676092, 
value=3&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;br /&gt;&amp;nbsp;123456&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;br /&gt;&amp;nbsp;profile:age&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; timestamp=1362784243334, 
value=25&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;br /&gt;&amp;nbsp;profile:sex&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; timestamp=1362784225141, value=male&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;br /&gt;&amp;nbsp;shophistory:amazon.isbn12 timestamp=1362786706557, value=19.99&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;br 
/&gt;&amp;nbsp;3456&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;br /&gt;5 row(s) in 0.0520 seconds&lt;br /&gt;&amp;nbsp;&lt;/pid&gt;&lt;/partnerid&gt;&lt;/pid&gt;&lt;/partnerid&gt;&lt;/partnerid&gt;&lt;/partnerid&gt;&lt;/partnerid&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: #38761d;"&gt;&lt;partnerid&gt;&lt;partnerid&gt;&lt;partnerid&gt;&lt;partnerid&gt;&lt;pid&gt;&lt;partnerid&gt;&lt;pid&gt;hbase(main):011:0&amp;gt; get 'usershoppingdata', 'userid1', 'browsehistory:amazon.hp'&lt;br /&gt;COLUMN&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; CELL&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;br /&gt;&amp;nbsp;browsehistory:amazon.hp&amp;nbsp;&amp;nbsp; timestamp=1362784343421, value=11&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;br /&gt;1 row(s) in 0.0360 seconds&lt;/pid&gt;&lt;/partnerid&gt;&lt;/pid&gt;&lt;/partnerid&gt;&lt;/partnerid&gt;&lt;/partnerid&gt;&lt;/partnerid&gt;&lt;/span&gt;&lt;br /&gt;
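To make the data model behind the put and get commands above concrete, here is a minimal Java sketch that models a table as a sorted map of row keys, each holding a sorted map of family:qualifier columns. It is only an in-memory illustration and not the HBase client API; the class name ShoppingTableModel and its methods are made up for this example, and timestamps and versions are left out.&lt;br /&gt;

```java
import java.util.Collections;
import java.util.NavigableMap;
import java.util.TreeMap;

/**
 * In-memory model of the logical layout used above:
 * row key -> ("family:qualifier" -> value), with both levels kept sorted.
 * Hypothetical class for illustration; it does not talk to HBase.
 */
final class ShoppingTableModel {
    private final NavigableMap<String, NavigableMap<String, String>> rows = new TreeMap<>();

    /** Mirrors: put 'usershoppingdata', rowKey, column, value */
    void put(String rowKey, String column, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(column, value);
    }

    /** Mirrors: get 'usershoppingdata', rowKey (all columns, sorted by column name). */
    NavigableMap<String, String> get(String rowKey) {
        NavigableMap<String, String> columns = rows.get(rowKey);
        if (columns == null) {
            return Collections.emptyNavigableMap();
        }
        return Collections.unmodifiableNavigableMap(columns);
    }

    /** Mirrors: get 'usershoppingdata', rowKey, column. Returns null if the cell is absent. */
    String get(String rowKey, String column) {
        return get(rowKey).get(column);
    }

    public static void main(String[] args) {
        ShoppingTableModel table = new ShoppingTableModel();
        table.put("userid1", "profile:sex", "male");
        table.put("userid1", "profile:age", "25");
        table.put("userid1", "browsehistory:amazon.hp", "11");
        table.put("userid1", "shophistory:amazon.isbn123456", "19.99");
        System.out.println(table.get("userid1"));
        System.out.println(table.get("userid1", "browsehistory:amazon.hp"));
    }
}
```

Because both levels are sorted maps, the columns of userid1 come back in the same order the shell printed them: browsehistory first, then profile, then shophistory.&lt;br /&gt;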
&lt;partnerid&gt;&lt;partnerid&gt;&lt;partnerid&gt;&lt;partnerid&gt;&lt;pid&gt;&lt;partnerid&gt;&lt;pid&gt;&lt;br /&gt;&lt;/pid&gt;&lt;/partnerid&gt;&lt;/pid&gt;&lt;/partnerid&gt;&lt;/partnerid&gt;&lt;/partnerid&gt;&lt;/partnerid&gt;
&lt;partnerid&gt;&lt;partnerid&gt;&lt;partnerid&gt;&lt;partnerid&gt;&lt;pid&gt;&lt;partnerid&gt;&lt;pid&gt;&lt;b&gt;Step 7:&lt;/b&gt; Add a few more rows&lt;/pid&gt;&lt;/partnerid&gt;&lt;/pid&gt;&lt;/partnerid&gt;&lt;/partnerid&gt;&lt;/partnerid&gt;&lt;/partnerid&gt;&lt;br /&gt;
&lt;partnerid&gt;&lt;partnerid&gt;&lt;partnerid&gt;&lt;partnerid&gt;&lt;pid&gt;&lt;partnerid&gt;&lt;pid&gt;&lt;br /&gt;&lt;/pid&gt;&lt;/partnerid&gt;&lt;/pid&gt;&lt;/partnerid&gt;&lt;/partnerid&gt;&lt;/partnerid&gt;&lt;/partnerid&gt;
&lt;span style="color: #38761d;"&gt;&lt;partnerid&gt;&lt;partnerid&gt;&lt;partnerid&gt;&lt;partnerid&gt;&lt;pid&gt;&lt;partnerid&gt;&lt;pid&gt;hbase(main):015:0&amp;gt; put 'usershoppingdata', 'userid2','profile:sex','male'&lt;br /&gt;0 row(s) in 0.0070 seconds&lt;br /&gt;&lt;br /&gt;hbase(main):016:0&amp;gt; put 'usershoppingdata', 'userid3','profile:sex','male'&lt;br /&gt;0 row(s) in 0.0060 seconds&lt;br /&gt;&lt;br /&gt;hbase(main):017:0&amp;gt; put 'usershoppingdata', 'userid4','profile:sex','male'&lt;br /&gt;0 row(s) in 0.0330 seconds&lt;br /&gt;&lt;br /&gt;hbase(main):018:0&amp;gt; put 'usershoppingdata', 'userid5','profile:sex','male'&lt;br /&gt;0 row(s) in 0.0050 seconds&lt;/pid&gt;&lt;/partnerid&gt;&lt;/pid&gt;&lt;/partnerid&gt;&lt;/partnerid&gt;&lt;/partnerid&gt;&lt;/partnerid&gt;&lt;/span&gt;&lt;br /&gt;
&lt;partnerid&gt;&lt;partnerid&gt;&lt;partnerid&gt;&lt;partnerid&gt;&lt;pid&gt;&lt;partnerid&gt;&lt;pid&gt;&lt;br /&gt;&lt;/pid&gt;&lt;/partnerid&gt;&lt;/pid&gt;&lt;/partnerid&gt;&lt;/partnerid&gt;&lt;/partnerid&gt;&lt;/partnerid&gt;
&lt;partnerid&gt;&lt;partnerid&gt;&lt;partnerid&gt;&lt;partnerid&gt;&lt;pid&gt;&lt;partnerid&gt;&lt;pid&gt;&lt;b&gt;Step 8:&lt;/b&gt; Let us do some range scans on the row key&lt;/pid&gt;&lt;/partnerid&gt;&lt;/pid&gt;&lt;/partnerid&gt;&lt;/partnerid&gt;&lt;/partnerid&gt;&lt;/partnerid&gt;&lt;br /&gt;
&lt;span style="color: #38761d;"&gt;&lt;partnerid&gt;&lt;partnerid&gt;&lt;partnerid&gt;&lt;partnerid&gt;&lt;pid&gt;&lt;partnerid&gt;&lt;pid&gt;hbase(main):024:0&amp;gt; scan 'usershoppingdata', {STARTROW =&amp;gt; 'u'}&lt;br /&gt;ROW&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; COLUMN+CELL&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;br /&gt;&amp;nbsp;userid1&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; column=browsehistory:amazon.hp, timestamp=1362784343421, value=11&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;br /&gt;&amp;nbsp;userid1&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; column=browsehistory:amazon.isbn123456, timestamp=1362786676092, value=3&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;br 
/&gt;&amp;nbsp;userid1&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; column=profile:age, timestamp=1362784243334, value=25&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;br /&gt;&amp;nbsp;userid1&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; column=profile:sex, timestamp=1362784225141, value=male&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;br /&gt;&amp;nbsp;userid1&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; column=shophistory:amazon.isbn123456, timestamp=1362786706557, value=19.99&amp;nbsp; &lt;br /&gt;&amp;nbsp;userid2&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; column=profile:sex, timestamp=1362788377896, value=male&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;br /&gt;&amp;nbsp;userid3&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; column=profile:sex, timestamp=1362788385501, 
value=male&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;br /&gt;&amp;nbsp;userid4&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; column=profile:sex, timestamp=1362788392575, value=male&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;br /&gt;&amp;nbsp;userid5&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; column=profile:sex, timestamp=1362788398087, value=male&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;br /&gt;5 row(s) in 0.0780 seconds&lt;/pid&gt;&lt;/partnerid&gt;&lt;/pid&gt;&lt;/partnerid&gt;&lt;/partnerid&gt;&lt;/partnerid&gt;&lt;/partnerid&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: #38761d;"&gt;&lt;partnerid&gt;&lt;partnerid&gt;&lt;partnerid&gt;&lt;partnerid&gt;&lt;pid&gt;&lt;partnerid&gt;&lt;pid&gt;&lt;br /&gt;&lt;/pid&gt;&lt;/partnerid&gt;&lt;/pid&gt;&lt;/partnerid&gt;&lt;/partnerid&gt;&lt;/partnerid&gt;&lt;/partnerid&gt;&lt;/span&gt;
&lt;span style="color: #38761d;"&gt;&lt;partnerid&gt;&lt;partnerid&gt;&lt;partnerid&gt;&lt;partnerid&gt;&lt;pid&gt;&lt;partnerid&gt;&lt;pid&gt;hbase(main):019:0&amp;gt; scan 'usershoppingdata', {STARTROW =&amp;gt; 'userid3'}&lt;br /&gt;ROW&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; COLUMN+CELL&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;br /&gt;&amp;nbsp;userid3&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; column=profile:sex, timestamp=1362788385501, value=male&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;br /&gt;&amp;nbsp;userid4&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; column=profile:sex, timestamp=1362788392575, 
value=male&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;br /&gt;&amp;nbsp;userid5&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; column=profile:sex, timestamp=1362788398087, value=male&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;br /&gt;3 row(s) in 0.0250 seconds&lt;br /&gt;&lt;br /&gt;hbase(main):023:0&amp;gt; scan 'usershoppingdata', {STARTROW =&amp;gt; 'userid3', STOPROW =&amp;gt; 'userid5'}&lt;br /&gt;ROW&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; COLUMN+CELL&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;br /&gt;&amp;nbsp;userid3&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; column=profile:sex, 
timestamp=1362788385501, value=male&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;br /&gt;&amp;nbsp;userid4&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; column=profile:sex, timestamp=1362788392575, value=male&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;br /&gt;2 row(s) in 0.0160 seconds&lt;/pid&gt;&lt;/partnerid&gt;&lt;/pid&gt;&lt;/partnerid&gt;&lt;/partnerid&gt;&lt;/partnerid&gt;&lt;/partnerid&gt;&lt;/span&gt;&lt;br /&gt;
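The range scans above work because rows are kept sorted by row key: STARTROW is included in the result and STOPROW is excluded, which is why the last scan returned userid3 and userid4 but not userid5. A minimal Java sketch of the same behaviour, using java.util.TreeMap as a stand-in for the table, might look like this. The class RowKeyRangeScan is hypothetical, and HBase compares raw bytes rather than Java strings, so treat it as an illustration and not as the HBase client API.&lt;br /&gt;

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/**
 * Models a row-key range scan over sorted rows with a TreeMap.
 * Illustration only; this is not the HBase client API.
 */
final class RowKeyRangeScan {
    /**
     * Returns the row keys from startRow (inclusive) up to stopRow (exclusive).
     * A null stopRow means "scan to the end of the table".
     */
    static List<String> scan(NavigableMap<String, String> rows, String startRow, String stopRow) {
        NavigableMap<String, String> range;
        if (stopRow == null) {
            range = rows.tailMap(startRow, true);
        } else {
            range = rows.subMap(startRow, true, stopRow, false);
        }
        return new ArrayList<>(range.keySet());
    }

    public static void main(String[] args) {
        NavigableMap<String, String> rows = new TreeMap<>();
        for (int i = 1; i <= 5; i++) {
            rows.put("userid" + i, "male"); // the value stands in for profile:sex
        }
        System.out.println(scan(rows, "u", null));             // all five rows
        System.out.println(scan(rows, "userid3", null));       // userid3 to userid5
        System.out.println(scan(rows, "userid3", "userid5"));  // userid3 and userid4
    }
}
```

The start key does not have to exist in the table: 'u' sorts before every userid row, so the first scan returns everything.&lt;br /&gt;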
&lt;partnerid&gt;&lt;partnerid&gt;&lt;partnerid&gt;&lt;partnerid&gt;&lt;pid&gt;&lt;partnerid&gt;&lt;pid&gt;&lt;br /&gt;&lt;/pid&gt;&lt;/partnerid&gt;&lt;/pid&gt;&lt;/partnerid&gt;&lt;/partnerid&gt;&lt;/partnerid&gt;&lt;/partnerid&gt;
&lt;partnerid&gt;&lt;partnerid&gt;&lt;partnerid&gt;&lt;partnerid&gt;&lt;pid&gt;&lt;partnerid&gt;&lt;pid&gt;The shell is very useful to play around with the data model and get familiar with HBase. In a real-world application, you might write code in a language like Java. There is more to HBase than this simple introduction. I will get into internals and architecture in future blogs.&lt;/pid&gt;&lt;/partnerid&gt;&lt;/pid&gt;&lt;/partnerid&gt;&lt;/partnerid&gt;&lt;/partnerid&gt;&lt;/partnerid&gt;&lt;br /&gt;
&lt;partnerid&gt;&lt;partnerid&gt;&lt;partnerid&gt;&lt;partnerid&gt;&lt;pid&gt;&lt;partnerid&gt;&lt;pid&gt;&lt;/pid&gt;&lt;/partnerid&gt;&lt;/pid&gt;&lt;/partnerid&gt;&lt;/partnerid&gt;&lt;/partnerid&gt;&lt;/partnerid&gt;&lt;br /&gt;&lt;/div&gt;
&lt;img src="http://feeds.feedburner.com/~r/TheKhangaonkarReport/~4/U-VOlVs3CrE" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://khangaonkar.blogspot.com/feeds/972667510537007384/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://khangaonkar.blogspot.com/2013/03/using-hbase.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/5008017311510568944/posts/default/972667510537007384?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/5008017311510568944/posts/default/972667510537007384?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/TheKhangaonkarReport/~3/U-VOlVs3CrE/using-hbase.html" title="Using HBase" /><author><name>Manoj Khangaonkar</name><uri>https://plus.google.com/105019772402521748386</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>0</thr:total><gd:extendedProperty name="commentSource" value="1" /><gd:extendedProperty name="commentModerationMode" value="FILTERED_POSTMOD" /><feedburner:origLink>http://khangaonkar.blogspot.com/2013/03/using-hbase.html</feedburner:origLink></entry><entry gd:etag="W/&quot;CkEDRno4eyp7ImA9WhBTGUs.&quot;"><id>tag:blogger.com,1999:blog-5008017311510568944.post-8595988639382761190</id><published>2013-02-15T12:37:00.001-08:00</published><updated>2013-02-15T12:37:57.433-08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2013-02-15T12:37:57.433-08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="Hadoop MapReduce BigData" /><title>Hadoop Secondary Sort: Sorting values</title><content type="html">&lt;div dir="ltr" style="text-align: left;" trbidi="on"&gt;
Sorting is a core strength of the Hadoop MapReduce framework, but by default it sorts only the keys. The values for each key are not sorted. There are many use cases where the values for a key are required in sorted order. For example, let us say a web application logs user interactions: user id, time and other log data. The log data is distributed across log files from different servers. The requirement is to get each user's log records in chronological order.&lt;br /&gt;
&lt;br /&gt;
The input to the map is the set of log records, say (userid, time) pairs in no particular order.&lt;br /&gt;
&lt;br /&gt;
The output from the reducer needs to be sorted by user and, for each user, by time.&lt;br /&gt;
&lt;br /&gt;
user1, t1&lt;br /&gt;
user1, t2&lt;br /&gt;
user1, tn&lt;br /&gt;
.&lt;br /&gt;
.&lt;br /&gt;
usern, t1&lt;br /&gt;
usern, tn&lt;br /&gt;
&lt;br /&gt;
where for each user, t1 &amp;lt; t2 &amp;lt; ... &amp;lt; tn.&lt;br /&gt;
&lt;br /&gt;
We know Hadoop sorts the keys. We could make the map output key a combination of user and time, in other words a composite key (userid, time). We need to write a comparator that Hadoop can use to sort on the composite key. However, Hadoop partitions on the whole key by default, so a side effect is that records for the same user can be sent to different reducers. This means you will not be able to reduce all of a user's records as a group.&lt;br /&gt;
&lt;br /&gt;
Remember, the Hadoop framework partitions the output of a Map and sorts it by key within each partition. Each partition is intended for a Reducer.&amp;nbsp; All values for a key from a Map are in the same partition. We want all the records for a user to go to the same Reducer. This implies they need to be in the same partition. Fortunately there is a way to influence partitioning. We tell Hadoop to put all records for a user in the same partition by implementing a partitioner that partitions based only on the userid and not on the composite key.&lt;br /&gt;
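As a rough illustration of partitioning on the natural key, the sketch below picks a partition from the userid alone and ignores the time part of the composite key. It is plain Java with no Hadoop classes; the class UserIdPartitioning and the method partitionFor are names assumed for this example. In a real job the same logic would sit in the getPartition method of a class extending Hadoop's Partitioner.&lt;br /&gt;

```java
/**
 * Sketch of partitioning on the natural key only.
 * Hypothetical stand-in for a Hadoop Partitioner; no Hadoop classes are used.
 */
final class UserIdPartitioning {
    /**
     * Chooses a partition in the range [0, numPartitions) from the userid alone.
     * The time parameter is deliberately unused: every record of a user must
     * land in the same partition no matter when it was logged.
     */
    static int partitionFor(String userId, long time, int numPartitions) {
        // floorMod keeps the result non-negative even for negative hash codes
        return Math.floorMod(userId.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        int reducers = 4;
        int first = partitionFor("user1", 100L, reducers);
        int second = partitionFor("user1", 200L, reducers);
        System.out.println(first == second); // same user, same partition
    }
}
```

If the partition were computed from both userid and time, the two calls in main could return different partitions, which is exactly the problem described above.&lt;br /&gt;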
&lt;br /&gt;
A Reducer receives partitions from several Mappers. Remember that the reducer is called with a key and a list of values for that key. Before the framework can call the reducer, it has to group the values for that key from all the partitions.&amp;nbsp; We want the grouping to happen based on userid (not userid + time).&lt;br /&gt;
So we need to implement a grouping comparator that Hadoop uses for grouping, and this comparator compares only userids. Since the records in each partition are already sorted by userid and time, the grouping, which is a merge process, preserves the sort order, much like the merge step of a merge sort. &lt;br /&gt;
&lt;br /&gt;
In summary you need to&lt;br /&gt;
&lt;br /&gt;
&lt;span style="color: blue;"&gt;&lt;span style="background-color: white;"&gt;&lt;b&gt;Step 1: Make the map output key a composite of the natural key and value&lt;/b&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&amp;nbsp;Make the Map output key a composite of the natural key (userid) and the value (time). The composite key class implements WritableComparable. You need to override the compareTo method to compare on both userid and time; this method is used to order the keys. You also need to override the write and readFields methods, which are called for serialization and deserialization.&lt;br /&gt;
You tell Hadoop to use this composite key by calling the method&lt;br /&gt;
job.setOutputKeyClass(UserTime.class) ;&lt;br /&gt;
If the reducer emits a different key type, call job.setMapOutputKeyClass(UserTime.class) for the map output instead.&lt;br /&gt;
&lt;br /&gt;
Map output would be like&lt;br /&gt;
&lt;br /&gt;
&lt;span style="color: purple;"&gt;(user1,t1) , t1&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: purple;"&gt;(user1,t2),t2&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: purple;"&gt;.&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: purple;"&gt;(user2,t1),t1&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: purple;"&gt;(user2,t2),t2 &lt;/span&gt;&lt;br /&gt;
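&lt;br /&gt;
The ordering that compareTo has to implement can be sketched in plain Java, without Hadoop on the classpath. This is only an illustration: the class name UserTimeKey and its fields are made up here, and a real composite key would implement WritableComparable together with write and readFields.&lt;br /&gt;
&lt;br /&gt;
```java
// Hypothetical stand-in for the composite key; not the class from the sample jar.
final class UserTimeKey {
    final String userId;   // natural key
    final long time;       // the value folded into the key for sorting

    UserTimeKey(String userId, long time) {
        this.userId = userId;
        this.time = time;
    }

    // Same ordering a compareTo on the composite key would apply:
    // user id first, then time as the tie-breaker.
    static int compare(UserTimeKey a, UserTimeKey b) {
        int byUser = a.userId.compareTo(b.userId);
        if (byUser != 0) {
            return byUser;
        }
        return Long.compare(a.time, b.time);
    }
}
```
&lt;br /&gt;
With this ordering, two records for the same user compare by time, and records for different users compare by user id no matter what their times are.&lt;br /&gt;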
&amp;nbsp;&lt;b&gt;&lt;br /&gt;&lt;/b&gt;
&lt;span style="color: blue;"&gt;&lt;b&gt;Step 2: Partition Map output using only the natural key&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;
To ensure all values for the key go to the same reducer, you need to implement a partitioner. This is a class that extends org.apache.hadoop.mapreduce.Partitioner. Override the getPartition method to return a partition based on the natural key, the user id. You tell Hadoop to partition using this partitioner with the call&lt;br /&gt;
job.setPartitionerClass(NaturalKeyPartitioner.class) ; &lt;br /&gt;
&lt;br /&gt;
&lt;span style="color: purple;"&gt;&lt;span style="background-color: white;"&gt;Partition from Map 1:&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: purple;"&gt;&lt;span style="background-color: white;"&gt;(user1,t1),t1&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: purple;"&gt;&lt;span style="background-color: white;"&gt;(user1,t2),t2&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: purple;"&gt;&lt;span style="background-color: white;"&gt;(user2,t1),t1&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: purple;"&gt;&lt;span style="background-color: white;"&gt;(user2,t2),t2&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: purple;"&gt;&lt;span style="background-color: white;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span style="color: purple;"&gt;&lt;span style="background-color: white;"&gt;Partition from Map 2:&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: purple;"&gt;&lt;span style="background-color: white;"&gt;(user1,t3),t3&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: purple;"&gt;&lt;span style="background-color: white;"&gt;(user1,t4),t4&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: purple;"&gt;&lt;span style="background-color: white;"&gt;(user2,t3),t3&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: purple;"&gt;&lt;span style="background-color: white;"&gt;(user2,t4),t4&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
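The decision made inside getPartition can be sketched in plain Java as follows. The class and method names are hypothetical and nothing here depends on Hadoop; in a real job the same calculation would sit in the getPartition override and take the user id from the composite key.&lt;br /&gt;
&lt;br /&gt;
```java
// Hypothetical helper showing partitioning on the natural key only.
final class NaturalKeyPartitionSketch {
    // Chooses a partition from the user id alone; the time part of the
    // composite key is deliberately ignored, so every record of a user
    // lands in the same partition.
    static int partitionFor(String userId, int numPartitions) {
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(userId.hashCode(), numPartitions);
    }
}
```
&lt;br /&gt;
Because the time is not part of the calculation, (user1,t1) and (user1,t4) always map to the same partition number.&lt;br /&gt;
&lt;br /&gt;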
&lt;br /&gt;
&lt;span style="color: blue;"&gt;&lt;b&gt;Step 3: Group values using only the natural key &lt;/b&gt;&lt;/span&gt;&lt;br /&gt;
To ensure the reducer gets called with all the values for the key, you need to implement a grouping comparator, a class that extends WritableComparator, based on the natural key, the user id. Override the compare method to compare only the userid. You tell Hadoop to use this comparator for grouping with the call&lt;br /&gt;
job.setGroupingComparatorClass(NaturalKeyGroupComparator.class) ;&lt;br /&gt;
&lt;br /&gt;
&lt;span style="color: purple;"&gt;Input to Reducer 1:&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: purple;"&gt;&lt;span style="background-color: white;"&gt;key = (user1,t1), values = t1,t2,t3,t4&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;span style="color: purple;"&gt;Input to Reducer 2:&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: purple;"&gt;&lt;span style="background-color: white;"&gt;key = (user2,t1), values = t1,t2,t3,t4&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;&lt;br /&gt;
&lt;span style="color: purple;"&gt;&lt;span style="background-color: white;"&gt;Output from Reducer1:&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: purple;"&gt;&lt;span style="background-color: white;"&gt;user1, t1&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: purple;"&gt;&lt;span style="background-color: white;"&gt;user1, t2&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: purple;"&gt;&lt;span style="background-color: white;"&gt;user1, t3&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: purple;"&gt;&lt;span style="background-color: white;"&gt;user1, t4 &lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
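The effect of grouping on the natural key can be imitated in plain Java. The sketch below is an assumption-laden illustration, not the framework's code: GroupingSketch is a made-up name, and the input arrays stand for records that have already been sorted by (userid, time).&lt;br /&gt;
&lt;br /&gt;
```java
// Hypothetical illustration of grouping sorted records by user id only.
final class GroupingSketch {
    // Walks records already sorted by (user id, time) and starts a new group
    // whenever the user id changes, the way a grouping comparator that looks
    // only at the natural key would. Times keep their sorted order.
    static String group(String[] users, long[] times) {
        StringBuilder out = new StringBuilder();
        String current = null;
        for (int i = 0; i != users.length; i++) {
            if (!users[i].equals(current)) {
                if (current != null) {
                    out.append('\n');
                }
                current = users[i];
                out.append(current).append(':');
            }
            out.append(' ').append(times[i]);
        }
        return out.toString();
    }
}
```
&lt;br /&gt;
Each line of the result corresponds to one call of the reducer: one user, followed by that user's times in ascending order.&lt;br /&gt;
&lt;br /&gt;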
The complete sample source code is at &lt;a href="https://sites.google.com/site/khangaonkar/home/hadoop-mapreduce-samples"&gt;SecondarySort.jar&lt;/a&gt;.&lt;br /&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;img src="http://feeds.feedburner.com/~r/TheKhangaonkarReport/~4/LQjh0rNb8SI" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://khangaonkar.blogspot.com/feeds/8595988639382761190/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://khangaonkar.blogspot.com/2013/02/hadoop-secondary-sort-sorting-values.html#comment-form" title="1 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/5008017311510568944/posts/default/8595988639382761190?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/5008017311510568944/posts/default/8595988639382761190?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/TheKhangaonkarReport/~3/LQjh0rNb8SI/hadoop-secondary-sort-sorting-values.html" title="Hadoop Secondary Sort: Sorting values" /><author><name>Manoj Khangaonkar</name><uri>https://plus.google.com/105019772402521748386</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>1</thr:total><gd:extendedProperty name="commentSource" value="1" /><gd:extendedProperty name="commentModerationMode" value="FILTERED_POSTMOD" /><feedburner:origLink>http://khangaonkar.blogspot.com/2013/02/hadoop-secondary-sort-sorting-values.html</feedburner:origLink></entry><entry gd:etag="W/&quot;D0MMSXg5cSp7ImA9WhNbE0U.&quot;"><id>tag:blogger.com,1999:blog-5008017311510568944.post-5720188671664089448</id><published>2013-01-16T17:18:00.000-08:00</published><updated>2013-01-16T17:18:08.629-08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2013-01-16T17:18:08.629-08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="java" /><title>Generics : Array Creation</title><content type="html">
How do you write a method to convert a generic collection to an array?

A naive implementation would be:
&lt;pre&gt; &lt;font color="red"&gt;
public static &amp;lt;T&amp;gt; T[] convertToArray(Collection&amp;lt;T&amp;gt; c) {

   T[] a = new T[c.size()] ; // compilation error: generic array creation
   int i = 0 ;
   for (T x : c) {
      a[i++] = x ;
   }
   return a ;
} &lt;/font&gt;
&lt;/pre&gt;
The code does not compile because the element type must be known at runtime to create an array. An array is what is known as a reifiable type: a type is reifiable if its type information is fully available at runtime. Any non-generic Java class or primitive type is reifiable. 
&lt;br&gt;

Generics, on the other hand, are implemented by erasure: the type information is erased at compile time, and the compiler inserts casts to get the appropriate behavior. 

So while 
&lt;pre&gt; &lt;font color="blue"&gt;
List&amp;lt;T&amp;gt; a = new ArrayList&amp;lt;T&amp;gt; () ; &lt;/font&gt;&lt;/pre&gt;

works because T is erased. Under the hood, a plain ArrayList is created and casts are added wherever a T is read from it. However, 
&lt;pre&gt; &lt;font color="red"&gt;
T[] a = new T[size] ; // compile error
&lt;/font&gt; &lt;/pre&gt; will not work because creating an array requires the element type at runtime. &lt;br&gt;


The solution is to use reflection, which is what you would do if you wanted to dynamically create an instance of any reifiable type, such as a plain Java class.

The method signature in our example changes a little to take an array of the required type as an additional parameter, since that is the easiest way to pass in the element type:
&lt;pre&gt; &lt;font color="blue"&gt;
public static &amp;lt;T&amp;gt; T[] convertToArray(Collection&amp;lt;T&amp;gt; c, T[] arry) {
   if (arry.length &amp;lt; c.size()) {
      // The array passed in is too small: create a bigger one of the same element type
      arry = (T[]) java.lang.reflect.Array.newInstance(
                             arry.getClass().getComponentType(),c.size()) ;
   }
   // Copy the elements into the array and return it
   int i = 0 ;
   for (T x : c) {
       arry[i++] = x ;
   }
   return arry ;
} &lt;/font&gt;
&lt;/pre&gt;
The newInstance method on Array creates an array of the required type. getComponentType returns the type of the elements of the array. This is analogous
to using reflection to create an instance of a class K. You would do
&lt;pre&gt; &lt;font color="blue"&gt;
K.class.newInstance() ;
&lt;/font&gt; &lt;/pre&gt;
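The reflective call can also be tried on its own, outside a generic method. The following is a minimal sketch; the class name ReflectiveArraySketch and the method newArrayLike are made up for this illustration, while Array.newInstance and getComponentType are the standard reflection calls described above.

```java
import java.lang.reflect.Array;

// Hypothetical helper, for illustration only.
final class ReflectiveArraySketch {
    // Creates a new array with the same element type as the sample array.
    // The static return type is Object because this helper is not generic;
    // the runtime class is the real array type (for example String[]).
    static Object newArrayLike(Object[] sample, int length) {
        return Array.newInstance(sample.getClass().getComponentType(), length);
    }
}
```

Calling newArrayLike(new String[0], 3) yields an object whose runtime class is String[], which is why the cast to T[] in convertToArray is safe as long as the array passed in really is a T[].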
In summary, in a generic method you can use the new operator to create a parameterized type (e.g. List&amp;lt;T&amp;gt;) because its type argument is erased during compilation (a plain List is created). But to create an array of T you need reflection, because an array requires its element type at runtime and T is not available then.


&lt;img src="http://feeds.feedburner.com/~r/TheKhangaonkarReport/~4/owbn-gAY6UQ" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://khangaonkar.blogspot.com/feeds/5720188671664089448/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://khangaonkar.blogspot.com/2013/01/generics-array-creation.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/5008017311510568944/posts/default/5720188671664089448?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/5008017311510568944/posts/default/5720188671664089448?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/TheKhangaonkarReport/~3/owbn-gAY6UQ/generics-array-creation.html" title="Generics : Array Creation" /><author><name>Manoj Khangaonkar</name><uri>https://plus.google.com/105019772402521748386</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>0</thr:total><gd:extendedProperty name="commentSource" value="1" /><gd:extendedProperty name="commentModerationMode" value="FILTERED_POSTMOD" /><feedburner:origLink>http://khangaonkar.blogspot.com/2013/01/generics-array-creation.html</feedburner:origLink></entry><entry gd:etag="W/&quot;C0MER3g4eSp7ImA9WhNWGUg.&quot;"><id>tag:blogger.com,1999:blog-5008017311510568944.post-1363030776396363748</id><published>2012-12-19T13:10:00.000-08:00</published><updated>2012-12-19T13:10:06.631-08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2012-12-19T13:10:06.631-08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="java performance" /><title>JAVA Garbage Collection</title><content type="html">&lt;div dir="ltr" style="text-align: left;" trbidi="on"&gt;
Garbage collection (GC) is the process by which the Java virtual machine frees up memory by reclaiming the memory taken up by objects that are no longer reachable from the running application. Garbage collection is automatic. For simple applications, the developer does not even need to be aware of garbage collection. But for applications that have a large memory footprint, are long running or have low latency requirements, some understanding is necessary to ensure that garbage collection does not interfere with the application. A common symptom of such interference is that the application seems to stop responding or its response time goes up at random. This article lists a few important points every Java developer needs to know about garbage collection.&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;1.0&amp;nbsp; Generational GC&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
In JDK 5 and later, the garbage collectors are what are called generational collectors. The heap is divided into regions based on the age of the objects. The young generation holds objects that are short lived. The tenured generation holds objects that are long lived. Objects are first created in the young generation and, if they are still alive after surviving a few collections, they are moved to the tenured generation. Garbage collection of the young generation happens frequently and is generally fast. GC of the tenured generation happens less frequently. Since most objects are short lived, this makes the GC more efficient.&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;2.0&amp;nbsp; Types of collectors&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Serial Collector&lt;/i&gt; : Garbage collection of both the young and tenured regions is done serially, on a single thread, and while this happens your application is paused. This is the default collector on single CPU machines and on machines with little memory (less than 2G). This is fine if your application does not care about pauses.&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Parallel Collector&lt;/i&gt;: This is the default collector on server class machines (two or more CPUs and 2G or more of physical memory). Multiple threads/CPUs are used to do garbage collection of the young region in parallel. This makes collection faster, but the application is still paused while GC happens. For the tenured region, the GC is serial, as in the serial collector.&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Parallel Compacting Collector&lt;/i&gt;: GC for the young region is the same as parallel collector and uses multiple threads. However GC for tenured region happens in parallel using multiple CPUs. Application is paused when GC happens.&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Concurrent Mark Sweep Collector (CMS)&lt;/i&gt;: For the young region, it is the same as in the parallel collectors. But for the tenured region,&amp;nbsp; GC runs concurrently with the application most of the time. The application pauses during GC are expected to be much shorter than with the other collectors. This is a good choice for applications that cannot tolerate long pauses.&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;3.0 Understanding GC in your application&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
Before you try to tune your application's GC, it is important to understand when GC is happening, how much time it takes and how much memory it is reclaiming. The JVM provides the following options to log GC activity.&lt;br /&gt;
&lt;br /&gt;
The &lt;span style="color: red;"&gt;-XX:+PrintGCDetails&lt;/span&gt; option prints the GC details described below. The &lt;span style="color: red;"&gt;-XX:+PrintGCTimeStamps&lt;/span&gt; option prints the time from the start of the JVM to when each GC happened. The &lt;span style="color: red;"&gt;-Xloggc:gcfilename.log&lt;/span&gt; option writes the log to gcfilename.log.&lt;br /&gt;
&lt;br /&gt;
In the gc log, you will see a number of lines like&lt;br /&gt;
&lt;br /&gt;
&lt;span style="color: blue;"&gt;11.561: [GC [PSYoungGen: 868524K-&amp;gt;294158K(1198848K)] 1303221K-&amp;gt;728855K(4694144K), 0.3640750 secs] [Times: user=1.44 sys=0.02, real=0.37 secs]&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
This indicates that a GC of the young region occurred 11.561 secs after the JVM started. The young region was reduced from 868524K to 294158K (66%).&amp;nbsp; The number (1198848K) is the memory allocated to the young region. The total heap in use was reduced from 1303221K to 728855K, or 44%. The number (4694144K) is the total heap. This GC took 0.37 secs.&lt;br /&gt;
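&lt;br /&gt;
As a quick check of the percentages quoted above, the arithmetic can be sketched in a few lines of Java. The class and method names are made up for this illustration; only the numbers come from the log line.&lt;br /&gt;
&lt;br /&gt;
```java
// Hypothetical helper for reading the before/after figures in a GC log line.
final class GcLogMath {
    // Percentage of the occupied space that a collection freed,
    // given the occupancy before and after in kilobytes.
    static double percentReclaimed(long beforeKb, long afterKb) {
        return 100.0 * (beforeKb - afterKb) / beforeKb;
    }
}
```
&lt;br /&gt;
percentReclaimed(868524, 294158) comes to about 66 and percentReclaimed(1303221, 728855) to about 44. Both subtractions give 574366K, so everything reclaimed in this collection came from the young region.&lt;br /&gt;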
&lt;br /&gt;
You will see a few lines like&lt;br /&gt;
&lt;br /&gt;
&lt;span style="color: blue;"&gt;3602.170: [Full GC (System) [PSYoungGen: 16250K-&amp;gt;0K(1662080K)] [PSOldGen: 1594630K-&amp;gt;1578665K(3495296K)] 1610881K-&amp;gt;1578665K(5157376K) [PSPermGen: 22314K-&amp;gt;22314K(35904K)], 3.4836190 secs] [Times: user=3.45 sys=0.03, real=3.48 secs]&lt;/span&gt; &lt;br /&gt;
&lt;br /&gt;
This indicates that a full GC occurred at 3602.17 secs from the start. The young region was reduced from 16250K to 0K. The old or tenured region was reduced from&amp;nbsp; 1594630K to 1578665K. The total heap was reduced from 1610881K to 1578665K. The GC took 3.48 sec.&lt;br /&gt;
&lt;br /&gt;
&lt;a href="http://www.tagtraum.com/gcviewer.html"&gt;GCViewer&lt;/a&gt; is a free tool to view GC logs graphically. &lt;br /&gt;
&lt;table cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"&gt;&lt;tbody&gt;
&lt;tr&gt;&lt;td style="text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/-6kdWBfilfMA/UM-58BkxnzI/AAAAAAAACz0/qH6z1z71CjQ/s1600/Screen+shot+2012-12-17+at+4.30.14+PM.png" imageanchor="1" style="clear: left; margin-bottom: 1em; margin-left: auto; margin-right: auto;"&gt;&lt;img border="0" height="488" src="http://4.bp.blogspot.com/-6kdWBfilfMA/UM-58BkxnzI/AAAAAAAACz0/qH6z1z71CjQ/s640/Screen+shot+2012-12-17+at+4.30.14+PM.png" width="640" /&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class="tr-caption" style="text-align: center;"&gt;GC log viewed in GCViewer&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;br /&gt;
The very small black lines at the bottom indicate the small GCs. The tall black lines at the hourly mark are the Full GCs. The blue peaks are lines indicating how the used heap goes up and goes down after a GC. The ruby red line just below the blue spikes shows the growth of the tenured region. You can see that the tenured region drops after a full GC. Full GCs take a lot of time and you want to reduce the frequency with which they occur.&lt;br /&gt;
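&lt;br /&gt;
Besides the log file, the same kind of counters can be read from inside a running JVM through the standard java.lang.management API. The sketch below is only an illustration and the class name GcCounters is made up; the collector names it reports depend on which collector the JVM is using.&lt;br /&gt;
&lt;br /&gt;
```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper that summarises GC activity seen so far by this JVM.
final class GcCounters {
    // Builds one line per collector: its name, the number of collections
    // so far, and the accumulated collection time in milliseconds.
    static String snapshot() {
        StringBuilder sb = new StringBuilder();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(gc.getName())
              .append(": count=").append(gc.getCollectionCount())
              .append(", timeMs=").append(gc.getCollectionTime())
              .append('\n');
        }
        return sb.toString();
    }
}
```
&lt;br /&gt;
Printing GcCounters.snapshot() periodically gives a rough view of how often each collector runs and how much time it has taken, without having to parse the log.&lt;br /&gt;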
&lt;br /&gt;
&lt;b&gt;4.0 Tuning options&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
&amp;nbsp;The JVM offers a few knobs that one can turn to tune the GC in a way most suitable to your machine and your application.&lt;br /&gt;
&lt;br /&gt;
&lt;span style="color: red;"&gt;-Xms -Xmx&lt;/span&gt; options are used to set the initial and maximum size of the heap.&amp;nbsp; The maximum heap size should be less than the physical memory on the machine to avoid paging, and one should also leave memory aside for the operating system and other applications running on the same machine. A bigger heap is good because the GC has to collect less often; but when it does have to collect, it has more work to do and the GC pauses could be longer.&lt;br /&gt;
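&lt;br /&gt;
To confirm what the JVM actually ended up with for these settings, the heap limits can be read through java.lang.Runtime. This is a minimal sketch; HeapSizes is a hypothetical name, and maxMemory() reports roughly, not necessarily exactly, the -Xmx value.&lt;br /&gt;
&lt;br /&gt;
```java
// Hypothetical helper that reports the heap limits of the running JVM.
final class HeapSizes {
    // Upper bound the heap may grow to (approximately the -Xmx setting).
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    // Heap currently reserved by the JVM (starts near the -Xms setting).
    static long committedHeapBytes() {
        return Runtime.getRuntime().totalMemory();
    }

    // One-line summary in megabytes.
    static String describe() {
        long mb = 1024L * 1024L;
        return "committed=" + (committedHeapBytes() / mb) + "M, max=" + (maxHeapBytes() / mb) + "M";
    }
}
```
&lt;br /&gt;
Logging HeapSizes.describe() at startup is a cheap way to catch a launch script that did not pass the intended options.&lt;br /&gt;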
&lt;br /&gt;
&lt;span style="color: red;"&gt;-XX:+UseSerialGC&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: red;"&gt;-XX:+UseParallelGC&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: red;"&gt;-XX:+UseParallelOldGC&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: red;"&gt;-XX:+UseConcMarkSweepGC&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
These options are used to select the GC. SerialGC and ParallelGC are selected by default depending on machine type as described earlier.&amp;nbsp; Applications that have low latency requirements and cannot tolerate long GC pauses should consider switching to the Concurrent Mark Sweep GC.&lt;br /&gt;
&lt;br /&gt;
&lt;span style="color: red;"&gt;-XX:NewSize=n&lt;/span&gt; is used to set the initial size of the young generation. Most applications have many short lived objects and few long lived objects. NewSize should be large enough that short lived objects fit into the young generation and are garbage collected in the small GCs. If the young generation is too small, short lived objects get moved to the tenured region, which leads to longer Full GCs.&lt;br /&gt;
&lt;br /&gt;
&lt;span style="color: red;"&gt;-XX:MaxGCPauseMillis=n&lt;/span&gt; is a hint to the GC as to the desired maximum pause time in milliseconds. This is just a hint and may or may not be honoured.&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;5.0 References&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
There are many other tuning options and the following documents from Oracle are good references on tuning options as well as garbage collection in general:&lt;br /&gt;
&lt;br /&gt;
1. &lt;a href="http://www.oracle.com/technetwork/java/javase/gc-tuning-6-140523.html"&gt;http://www.oracle.com/technetwork/java/javase/gc-tuning-6-140523.html&lt;/a&gt;&lt;br /&gt;
2. &lt;a href="http://www.oracle.com/technetwork/java/javase/tech/memorymanagement-whitepaper-1-150020.pdf"&gt;http://www.oracle.com/technetwork/java/javase/tech/memorymanagement-whitepaper-1-150020.pdf &lt;/a&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;img src="http://feeds.feedburner.com/~r/TheKhangaonkarReport/~4/i8lt3xQgZnY" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://khangaonkar.blogspot.com/feeds/1363030776396363748/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://khangaonkar.blogspot.com/2012/12/java-garbage-collection.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/5008017311510568944/posts/default/1363030776396363748?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/5008017311510568944/posts/default/1363030776396363748?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/TheKhangaonkarReport/~3/i8lt3xQgZnY/java-garbage-collection.html" title="JAVA Garbage Collection" /><author><name>Manoj Khangaonkar</name><uri>https://plus.google.com/105019772402521748386</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://4.bp.blogspot.com/-6kdWBfilfMA/UM-58BkxnzI/AAAAAAAACz0/qH6z1z71CjQ/s72-c/Screen+shot+2012-12-17+at+4.30.14+PM.png" height="72" width="72" /><thr:total>0</thr:total><gd:extendedProperty name="commentSource" value="1" /><gd:extendedProperty name="commentModerationMode" value="FILTERED_POSTMOD" /><feedburner:origLink>http://khangaonkar.blogspot.com/2012/12/java-garbage-collection.html</feedburner:origLink></entry><entry gd:etag="W/&quot;AkcMRns-eSp7ImA9WhNQEkQ.&quot;"><id>tag:blogger.com,1999:blog-5008017311510568944.post-4610672916933147097</id><published>2012-11-18T20:08:00.000-08:00</published><updated>2012-11-18T20:08:07.551-08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2012-11-18T20:08:07.551-08:00</app:edited><category 
scheme="http://www.blogger.com/atom/ns#" term="java Spring" /><title>Spring JAVA config tutorial</title><content type="html">The classic way of configuring beans in Spring is using XML. But many programmers find switching between XML and java code annoying. Having to go into XML to debug dependencies and track down implementation classes has turned many programmers away from Spring.

Since version 3.0, Spring has supported the ability to do configuration using classes and annotations without the need to use XML.
In XML, to define a bean, you added the following to application.xml: &lt;br&gt;

&lt;span style="color: blue;"&gt;&amp;lt;bean id="Hello" class="com.mj.Hello"/&amp;gt;&lt;/span&gt; &lt;br&gt;

To use the bean you wrote code like

&lt;br&gt;
&lt;pre&gt;&lt;span style="color: blue;"&gt;ApplicationContext ac = new ClassPathXmlApplicationContext("application.xml") ; 
BeanFactory bf = ac ; // an ApplicationContext is a BeanFactory
Hello h = (Hello) bf.getBean("Hello") ;
h.someMethod() ;&lt;/span&gt;&lt;/pre&gt;

Let us write a new spring application using no XML. &lt;br&gt;&lt;br&gt;

&lt;span style="color: purple;"&gt;&lt;b&gt;Step 1: Define the bean interface and implementation&lt;/b&gt;&lt;/span&gt;
&lt;pre&gt;
&lt;span style="color: blue;"&gt;public interface Greeting {
    public String getMessage() ;
}

public class NewYearGreeting implements Greeting {
    public String getMessage() {
        return "Happy New Year" ;
    }
}
public class BirthDayGreeting  implements Greeting {
    public String getMessage() {
        return "Happy Birthday" ;
    }
}
&lt;/span&gt;&lt;/pre&gt;
&lt;span style="color: purple;"&gt;&lt;b&gt;Step 2: Define the bean configuration in JAVA&lt;/b&gt;&lt;/span&gt;&lt;br&gt;
The bean definitions are created by writing a class and annotating it with&amp;nbsp;@Configuration. The individual beans are defined by annotating the method that creates the bean with @Bean.

&lt;pre&gt;&lt;span style="color: blue;"&gt;@Configuration
public class GreetingSpringConfig {
    @Bean(name="newyear")
    public Greeting newyearGreeting() {
        return new NewYearGreeting() ;
    }
    @Bean(name="birthday")
    public Greeting birthdayGreeting() {
        return new BirthDayGreeting() ;
    }
 }&lt;/span&gt; &lt;/pre&gt;

&lt;span style="color: purple;"&gt;&lt;b&gt;Step 3: Use the beans from a client&lt;/b&gt;&lt;/span&gt;

&lt;pre&gt; &lt;span style="color: blue;"&gt;public class GreetingSample {
    public static void main(String[] args) {
        ApplicationContext ac =
            new AnnotationConfigApplicationContext(GreetingSpringConfig.class) ;
        Greeting g = (Greeting) ac.getBean("newyear") ;
        System.out.println(g.getMessage()) ; 
        g = (Greeting) ac.getBean("birthday") ;
        System.out.println(g.getMessage()) ; 
    }
}&lt;/span&gt; &lt;/pre&gt;


Note that instead of using ClassPathXmlApplicationContext, we used AnnotationConfigApplicationContext. AnnotationConfigApplicationContext can process not just @Configuration annotated classes, but also JSR 330 annotated classes. If you don't like switching between Java &amp;amp; XML, then Java config is a simple way of&amp;nbsp;wiring your Spring beans.
&lt;img src="http://feeds.feedburner.com/~r/TheKhangaonkarReport/~4/2RrDKfbAVG4" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://khangaonkar.blogspot.com/feeds/4610672916933147097/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://khangaonkar.blogspot.com/2012/11/spring-java-config-tutorial.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/5008017311510568944/posts/default/4610672916933147097?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/5008017311510568944/posts/default/4610672916933147097?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/TheKhangaonkarReport/~3/2RrDKfbAVG4/spring-java-config-tutorial.html" title="Spring JAVA config tutorial" /><author><name>Manoj Khangaonkar</name><uri>https://plus.google.com/105019772402521748386</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>0</thr:total><gd:extendedProperty name="commentSource" value="1" /><gd:extendedProperty name="commentModerationMode" value="FILTERED_POSTMOD" /><feedburner:origLink>http://khangaonkar.blogspot.com/2012/11/spring-java-config-tutorial.html</feedburner:origLink></entry><entry gd:etag="W/&quot;DE8BR388eCp7ImA9WhNTFkQ.&quot;"><id>tag:blogger.com,1999:blog-5008017311510568944.post-3774391093584281127</id><published>2012-10-19T18:34:00.000-07:00</published><updated>2012-10-19T18:34:16.170-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2012-10-19T18:34:16.170-07:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="JAVA concurrency scalability" /><title>JAVA Synchronized HashMap vs ConcurrentHashMap</title><content type="html">&lt;div dir="ltr" style="text-align: left;" trbidi="on"&gt;
A synchronized HashMap is a Map returned by calling the synchronizedMap method of the java.util.Collections class.&lt;br /&gt;
&lt;br /&gt;
&lt;span style="background-color: blue;"&gt;&lt;span style="color: blue;"&gt;&lt;span style="background-color: white;"&gt;Map syncMap = Collections.synchronizedMap(new HashMap()) ;&lt;/span&gt;&lt;/span&gt;&lt;span style="background-color: white;"&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
The characteristics of synchronized collections are:&lt;br /&gt;
&lt;br /&gt;
1. Each method is synchronized using an object level lock. So the get and put methods on syncMap acquire a lock on syncMap.&lt;br /&gt;
&lt;br /&gt;
2. Compound operations such as check-then-update or iterating over the collection require the client to explicitly acquire a lock on the collection object.&lt;br /&gt;
&lt;br /&gt;
&lt;span style="color: blue;"&gt;synchronized(syncMap) {&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: blue;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Integer val = (Integer) syncMap.get(key) ;&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: blue;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; if ( val == null) {&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: blue;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; syncMap.put(key,&amp;nbsp; newvalue) ;&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: blue;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: blue;"&gt;}&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
Without synchronization, multiple threads calling the code can lead to inconsistent values.&lt;br /&gt;
&lt;br /&gt;
3. Locking the entire collection is a performance overhead. While one thread holds on to the lock, no other thread can use the collection.&lt;br /&gt;
&lt;br /&gt;
4. The iterators of HashMap and other collections from java.util throw ConcurrentModificationException if a thread modifies a collection while another thread is iterating over it. The recommended approach is to acquire a lock on the map before iterating over it.&lt;br /&gt;
&lt;br /&gt;
ConcurrentHashMap was introduced in JDK 5. &lt;br /&gt;
&lt;br /&gt;
The characteristics of ConcurrentHashMap are:&lt;br /&gt;
&lt;br /&gt;
1. There is no locking at the object level. The locking is at a much finer granularity. For a ConcurrentHashMap, the locks may be at the hash bucket level.&lt;br /&gt;
&lt;br /&gt;
2. The effect of finer-grained locking is that you can have concurrent readers and writers, which is not possible for synchronized collections. This leads to much better scalability.&lt;br /&gt;
&lt;br /&gt;
3. Since there is no locking at the object level,&amp;nbsp; additional atomic methods are provided for some compound operations. ConcurrentHashMap has the methods putIfAbsent, remove and replace, each of which checks a key or value and then performs a put or remove as a single atomic step.&lt;br /&gt;
&lt;br /&gt;
The code above can be replaced by&lt;br /&gt;
&lt;br /&gt;
&lt;span style="color: blue;"&gt;ConcurrentHashMap concMap ;&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: blue;"&gt;.&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: blue;"&gt;.&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: blue;"&gt;. &lt;/span&gt;&lt;br /&gt;
&lt;span style="color: blue;"&gt;&amp;nbsp;concMap.putIfAbsent(key,newvalue) ; &lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
4. ConcurrentHashMap does not throw a ConcurrentModificationException if one thread modifies it while another is iterating over it. The iterator returned by ConcurrentHashMap is weakly consistent: it reflects the state of the map at some point at or after the creation of the iterator, and it may or may not reflect changes made by other threads after the iterator was created.&lt;br /&gt;
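&lt;br /&gt;
The difference in iterator behavior can be sketched as follows. This is an illustration only: the class and method names are made up for this example, and the modification happens in the iterating thread itself, which keeps the example deterministic but triggers the same check as a modification from another thread.&lt;br /&gt;

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class WeaklyConsistentIteration {

    // Adds a new key to the map while iterating over its key set and
    // returns how many keys the iteration visited.
    static int iterateWhileModifying(Map<String, Integer> map) {
        int visited = 0;
        for (String key : map.keySet()) {
            map.put("added-during-iteration", 0); // structural change on first pass
            visited++;
        }
        return visited;
    }

    // Returns true if the same pattern fails on a plain HashMap.
    static boolean failsOnHashMap() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        try {
            iterateWhileModifying(map);
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> concMap = new ConcurrentHashMap<>();
        concMap.put("a", 1);
        concMap.put("b", 2);
        concMap.put("c", 3);
        // Completes without an exception; the new key may or may not be visited.
        System.out.println("ConcurrentHashMap visited " + iterateWhileModifying(concMap) + " keys");
        System.out.println("HashMap threw ConcurrentModificationException: " + failsOnHashMap());
    }
}
```

With the ConcurrentHashMap the loop visits the three original keys and possibly the key added during the iteration, which is exactly the "may or may not" behavior described in point 4.&lt;br /&gt;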
&lt;br /&gt;
In general, using ConcurrentHashMap instead of synchronized Map gives you much better scalability and you do not have to explicitly synchronize on the map object.&lt;br /&gt;
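&lt;br /&gt;
As a rough sketch of how the atomic methods replace explicit locking, the following counter uses only putIfAbsent and replace. The names AtomicCounterSketch, increment and countWithThreads are hypothetical, and the lambda syntax assumes Java 8 or later; on older JDKs an anonymous Runnable would do the same job.&lt;br /&gt;

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class AtomicCounterSketch {

    // Increments the counter for key without any external lock.
    static void increment(ConcurrentMap<String, Integer> counts, String key) {
        while (true) {
            Integer current = counts.putIfAbsent(key, 1);
            if (current == null) {
                return; // this call created the entry with value 1
            }
            if (counts.replace(key, current, current + 1)) {
                return; // value was still 'current', so the update was applied
            }
            // Another thread changed the value in between; re-read and retry.
        }
    }

    // Runs several threads that all increment the same key and returns the final count.
    static int countWithThreads(int threads, int incrementsPerThread) {
        ConcurrentMap<String, Integer> counts = new ConcurrentHashMap<>();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    increment(counts, "user01");
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return counts.get("user01");
    }

    public static void main(String[] args) {
        System.out.println(countWithThreads(4, 1000));
    }
}
```

No increment is lost even though the map is never locked as a whole, because each replace succeeds only if the value is still the one that was read.&lt;br /&gt;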
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;img src="http://feeds.feedburner.com/~r/TheKhangaonkarReport/~4/cZoyQSexGoQ" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://khangaonkar.blogspot.com/feeds/3774391093584281127/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://khangaonkar.blogspot.com/2012/10/java-synchronized-hashmap-vs.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/5008017311510568944/posts/default/3774391093584281127?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/5008017311510568944/posts/default/3774391093584281127?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/TheKhangaonkarReport/~3/cZoyQSexGoQ/java-synchronized-hashmap-vs.html" title="JAVA Synchronized HashMap vs ConcurrentHashMap" /><author><name>Manoj Khangaonkar</name><uri>https://plus.google.com/105019772402521748386</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>0</thr:total><gd:extendedProperty name="commentSource" value="1" /><gd:extendedProperty name="commentModerationMode" value="FILTERED_POSTMOD" /><feedburner:origLink>http://khangaonkar.blogspot.com/2012/10/java-synchronized-hashmap-vs.html</feedburner:origLink></entry><entry gd:etag="W/&quot;DUYCQ3c9fyp7ImA9WhJUGEk.&quot;"><id>tag:blogger.com,1999:blog-5008017311510568944.post-1380384492156079798</id><published>2012-09-16T18:59:00.000-07:00</published><updated>2012-09-16T18:59:22.967-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2012-09-16T18:59:22.967-07:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="JAVA Hadoop MapReduce BigData" /><title>Hadoop 2.x Tutorial</title><content type="html">&lt;div dir="ltr" style="text-align: left;" trbidi="on"&gt;
The Hadoop 2.x release involves many changes to Hadoop and MapReduce. The centralized JobTracker service is replaced with a ResourceManager that manages the resources in the cluster and a per-application ApplicationMaster that manages the application lifecycle. These architectural changes enable Hadoop to scale to much larger clusters. The new release also has minor changes to the scripts, directories and environment variables necessary to get started. This is a getting started tutorial for 2.x. The intended audience is someone who is completely new to Hadoop and needs a jumpstart, or someone who has played a little bit with a previous version and wants to start using 2.x.&amp;nbsp; The emphasis is on getting Hadoop running and not necessarily on explaining concepts, which are covered in many other blogs.&lt;br /&gt;
&lt;br /&gt;
In this tutorial we will&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;(1) Set up Hadoop in a single node environment&lt;/i&gt;&lt;br /&gt;
&lt;i&gt;(2) Create and move files to HDFS&lt;/i&gt;&lt;br /&gt;
&lt;i&gt;(3) Write and execute a simple MapReduce Application&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;Step 1: Download Hadoop and install&lt;/b&gt; &lt;br /&gt;
Download the current 2.x.x release from &lt;a href="http://hadoop.apache.org/releases.html"&gt;http://hadoop.apache.org/releases.html&lt;/a&gt;.&lt;br /&gt;
I downloaded hadoop-2.0.1-alpha.tar.gz.&lt;br /&gt;
Untar the file to a directory say ~/hadoop-2.0.1-alpha.&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;Step 2: Set the following environment variables&lt;/b&gt;&lt;br /&gt;
&lt;span style="color: purple;"&gt;hadoop-2.0.1-alpha$ export HADOOP_HOME=~/hadoop-2.0.1-alpha&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: purple;"&gt;hadoop-2.0.1-alpha$ export HADOOP_MAPRED_HOME=~/hadoop-2.0.1-alpha&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: purple;"&gt;hadoop-2.0.1-alpha$ export HADOOP_COMMON_HOME=~/hadoop-2.0.1-alpha&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: purple;"&gt;hadoop-2.0.1-alpha$ export HADOOP_HDFS_HOME=~/hadoop-2.0.1-alpha&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: purple;"&gt;hadoop-2.0.1-alpha$ export YARN_HOME=~/hadoop-2.0.1-alpha&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: purple;"&gt;hadoop-2.0.1-alpha$ export HADOOP_CONF_DIR=~/hadoop-2.0.1-alpha/etc/hadoop&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;span style="color: purple;"&gt;&lt;span style="color: black;"&gt;If these environment variables are not set up correctly, Step 4 might fail. &lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;Step 3: Update the configuration files&lt;/b&gt;&lt;br /&gt;
&amp;nbsp;&lt;u&gt;&lt;i&gt;hdfs-site.xml &lt;/i&gt;&lt;/u&gt;&lt;br /&gt;
Add the following configuration to etc/hadoop/hdfs-site.xml. If you do not set dfs.namenode.name.dir and dfs.datanode.data.dir explicitly, Hadoop will default to a temp directory that the OS may clean up on restart, and you will lose data. This is a common omission for newcomers. &lt;br /&gt;
&lt;span style="color: white;"&gt;&amp;nbsp;&lt;span style="color: #38761d;"&gt;&lt;span style="background-color: white;"&gt;&lt;span style="font-size: small;"&gt;&amp;lt;?xml version="1.0" encoding="UTF-8"?&amp;gt;&lt;br /&gt;&amp;lt;configuration&amp;gt;&lt;br /&gt;&amp;lt;property&amp;gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;name&amp;gt;dfs.replication&amp;lt;/name&amp;gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;value&amp;gt;1&amp;lt;/value&amp;gt;&lt;br /&gt;&amp;nbsp; &amp;lt;/property&amp;gt;&lt;br /&gt;&amp;nbsp; &amp;lt;property&amp;gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;name&amp;gt;dfs.namenode.name.dir&amp;lt;/name&amp;gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;value&amp;gt;file:/Users/joe/hadoop-hdfs201/data/hdfs/namenode&amp;lt;/value&amp;gt;&lt;br /&gt;&amp;nbsp; &amp;lt;/property&amp;gt;&lt;br /&gt;&amp;nbsp; &amp;lt;property&amp;gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;name&amp;gt;dfs.datanode.data.dir&amp;lt;/name&amp;gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;value&amp;gt;file:/Users/joe/hadoop-hdfs201/data/hdfs/datanode&amp;lt;/value&amp;gt;&lt;br /&gt;&amp;nbsp; &amp;lt;/property&amp;gt;&lt;br /&gt;&amp;lt;/configuration&amp;gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;u&gt;&lt;i&gt;core-site.xml&lt;/i&gt;&lt;/u&gt;&lt;br /&gt;
Add the following to etc/hadoop/core-site.xml.&lt;br /&gt;
&lt;span style="color: #38761d;"&gt;&lt;span style="font-size: small;"&gt;&amp;lt;?xml version="1.0" encoding="UTF-8"?&amp;gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: #38761d;"&gt;&lt;span style="font-size: small;"&gt;&amp;lt;configuration&amp;gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: #38761d;"&gt;&lt;span style="font-size: small;"&gt;&amp;nbsp; &amp;lt;property&amp;gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: #38761d;"&gt;&lt;span style="font-size: small;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;name&amp;gt;fs.default.name&amp;lt;/name&amp;gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: #38761d;"&gt;&lt;span style="font-size: small;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;value&amp;gt;hdfs://localhost:9000&amp;lt;/value&amp;gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: #38761d;"&gt;&lt;span style="font-size: small;"&gt;&amp;nbsp; &amp;lt;/property&amp;gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: #38761d;"&gt;&lt;span style="font-size: small;"&gt;&amp;lt;/configuration&amp;gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;u&gt;&lt;i&gt;yarn-site.xml&lt;/i&gt;&lt;/u&gt;&lt;br /&gt;
Add the following to etc/hadoop/yarn-site.xml. &lt;br /&gt;
&lt;span style="color: #38761d;"&gt;&lt;span style="font-size: small;"&gt;&amp;lt;?xml version="1.0"?&amp;gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: #38761d;"&gt;&lt;span style="font-size: small;"&gt;&amp;lt;configuration&amp;gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: #38761d;"&gt;&lt;span style="font-size: small;"&gt;&amp;nbsp; &amp;lt;property&amp;gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: #38761d;"&gt;&lt;span style="font-size: small;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;name&amp;gt;yarn.nodemanager.aux-services&amp;lt;/name&amp;gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: #38761d;"&gt;&lt;span style="font-size: small;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;value&amp;gt;mapreduce.shuffle&amp;lt;/value&amp;gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: #38761d;"&gt;&lt;span style="font-size: small;"&gt;&amp;nbsp; &amp;lt;/property&amp;gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: #38761d;"&gt;&lt;span style="font-size: small;"&gt;&amp;nbsp; &amp;lt;property&amp;gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: #38761d;"&gt;&lt;span style="font-size: small;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;name&amp;gt;yarn.nodemanager.aux-services.mapreduce.shuffle.class&amp;lt;/name&amp;gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: #38761d;"&gt;&lt;span style="font-size: small;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;value&amp;gt;org.apache.hadoop.mapred.ShuffleHandler&amp;lt;/value&amp;gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: #38761d;"&gt;&lt;span style="font-size: small;"&gt;&amp;nbsp; &amp;lt;/property&amp;gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: #38761d;"&gt;&lt;span style="font-size: small;"&gt;&amp;lt;/configuration&amp;gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;u&gt;&lt;i&gt;mapred-site.xml&lt;/i&gt;&lt;/u&gt;&lt;br /&gt;
Add the following to etc/hadoop/mapred-site.xml.&lt;br /&gt;
&lt;span style="color: #38761d;"&gt;&lt;span style="font-size: small;"&gt;&amp;lt;?xml version="1.0"?&amp;gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: #38761d;"&gt;&lt;span style="font-size: small;"&gt;&amp;lt;configuration&amp;gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: #38761d;"&gt;&lt;span style="font-size: small;"&gt;&amp;nbsp; &amp;lt;property&amp;gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: #38761d;"&gt;&lt;span style="font-size: small;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;name&amp;gt;mapreduce.framework.name&amp;lt;/name&amp;gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: #38761d;"&gt;&lt;span style="font-size: small;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;value&amp;gt;yarn&amp;lt;/value&amp;gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: #38761d;"&gt;&lt;span style="font-size: small;"&gt;&amp;nbsp; &amp;lt;/property&amp;gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: #38761d;"&gt;&lt;span style="font-size: small;"&gt;&amp;lt;/configuration&amp;gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;Step 4: Start the processes.&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
Change to the directory where hadoop is installed.&lt;br /&gt;
cd ~/hadoop-2.0.1-alpha &lt;br /&gt;
&lt;br /&gt;
If you are running Hadoop for the first time, the following command will format HDFS. Do not run this every time, as it formats HDFS and thus deletes any existing data.&lt;br /&gt;
&lt;br /&gt;
&lt;span style="color: purple;"&gt;hadoop-2.0.1-alpha$ bin/hadoop namenode -format&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
Start the namenode.&lt;br /&gt;
&lt;span style="color: purple;"&gt;hadoop-2.0.1-alpha$ sbin/hadoop-daemon.sh start namenode&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
Start the datanode.&lt;br /&gt;
&lt;span style="color: purple;"&gt;&lt;span style="background-color: white;"&gt;hadoop-2.0.1-alpha$ sbin/hadoop-daemon.sh start datanode&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
In Hadoop 2.x, there is no JobTracker. Instead there is a ResourceManager and a NodeManager.&lt;br /&gt;
Start the resourcemanager.&lt;br /&gt;
&lt;span style="color: purple;"&gt;hadoop-2.0.1-alpha$ sbin/yarn-daemon.sh start resourcemanager&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
Start the nodemanager.&lt;br /&gt;
&lt;span style="color: purple;"&gt;hadoop-2.0.1-alpha$ sbin/yarn-daemon.sh start nodemanager&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
Start the history server.&lt;br /&gt;
&lt;span style="color: purple;"&gt;hadoop-2.0.1-alpha$ sbin/mr-jobhistory-daemon.sh start historyserver&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
Type jps. It lists the running Java processes. Check that all the processes have started.&lt;br /&gt;
&lt;br /&gt;
&lt;span style="color: purple;"&gt;hadoop-2.0.1-alpha$ jps&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: purple;"&gt;&lt;br /&gt;&lt;/span&gt;
&lt;span style="color: purple;"&gt;1380 DataNode&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: purple;"&gt;1558 Jps&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: purple;"&gt;1433 ResourceManager&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: purple;"&gt;1536 JobHistoryServer&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: purple;"&gt;1335 NameNode&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: purple;"&gt;1849 NodeManager&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
Once I ran into a problem where the mapreduce job was being accepted but never executed. In looking at the logs, I found that the NodeManager had not started.&amp;nbsp; The jps command is a good check to ensure all necessary processes are started.&lt;br /&gt;
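&lt;br /&gt;
The jps check can also be automated. The following is only a sketch: the class JpsCheck and its method are hypothetical, and it assumes the jps output has already been captured as text in the two-column "pid name" form shown above.&lt;br /&gt;

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class JpsCheck {

    // The five daemons started in Step 4.
    static final List<String> EXPECTED = Arrays.asList(
            "NameNode", "DataNode", "ResourceManager", "NodeManager", "JobHistoryServer");

    // Returns the expected daemon names that do not appear in the given jps output.
    static List<String> missingDaemons(String jpsOutput) {
        List<String> running = new ArrayList<>();
        for (String line : jpsOutput.split("\\r?\\n")) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length == 2) {
                running.add(parts[1]); // second column is the process name
            }
        }
        List<String> missing = new ArrayList<>(EXPECTED);
        missing.removeAll(running);
        return missing;
    }

    public static void main(String[] args) {
        // Sample output in which the NodeManager did not start.
        String sample = "1380 DataNode\n1558 Jps\n1433 ResourceManager\n"
                + "1536 JobHistoryServer\n1335 NameNode\n";
        System.out.println("Missing: " + missingDaemons(sample));
    }
}
```

An empty result means every daemon from Step 4 is running. A non-empty result names the daemon whose log to look at first.&lt;br /&gt;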
&lt;br /&gt;
&lt;b&gt;Step 5: Get familiar with HDFS&lt;/b&gt;&lt;br /&gt;
The HDFS commands are documented in the older releases of hadoop&lt;br /&gt;
&lt;a href="http://hadoop.apache.org/docs/r1.0.3/file_system_shell.html"&gt;http://hadoop.apache.org/docs/r1.0.3/file_system_shell.html&lt;/a&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;span style="color: purple;"&gt;hadoop-2.0.1-alpha$ bin/hadoop fs -ls&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
will list the home directory. If you are user joe, your HDFS home directory is /user/joe. Any files or directories you create with a relative path will be created here.&lt;br /&gt;
&lt;span style="color: purple;"&gt;&lt;br /&gt;&lt;/span&gt;
&lt;span style="color: purple;"&gt;hadoop-2.0.1-alpha$ bin/hadoop fs -mkdir /user/joe/input&lt;/span&gt;&lt;br /&gt;
creates a directory input&lt;br /&gt;
&lt;br /&gt;
In the local filesystem create a file app.log with the data&lt;br /&gt;
&lt;br /&gt;
&lt;span style="color: orange;"&gt;user01|1|2|3|4|5&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: orange;"&gt;user02|1|2|3|4|5&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: orange;"&gt;user03|1|2|3|4|5&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: orange;"&gt;user01|1|2|3|4|5&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: orange;"&gt;user02|1|2|3|4|5&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: orange;"&gt;user01|1|2|3|4|5&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: orange;"&gt;user03|1|2|3|4|5&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: orange;"&gt;user01|1|2|3|4|5&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: orange;"&gt;user04|1|2|3|4|5&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: orange;"&gt;user01|1|2|3|4|5&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
Let us pretend this is a log file from a web application where for each request we have logged the userid and some additional data. We will later use this as input for a MapReduce program.&lt;br /&gt;
You can move it to hdfs using the command&lt;br /&gt;
&lt;br /&gt;
&lt;span style="color: purple;"&gt;hadoop-2.0.1-alpha$ bin/hadoop fs -moveFromLocal ~/projects/app.log /user/joe/input/&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
To print the file just moved to HDFS:&lt;br /&gt;
&lt;span style="color: purple;"&gt;hadoop-2.0.1-alpha$ bin/hadoop fs -cat /user/joe/input/app.log&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;Step 6: Create a MapReduce program&lt;/b&gt;&lt;br /&gt;
The MapReduce programming model is explained in the blog &lt;a href="http://khangaonkar.blogspot.com/2011/05/what-is-mapreduce.html"&gt;What is MapReduce ?&lt;/a&gt;. Let us write a simple MapReduce program that uses the app.log file we created in Step 5 as input and outputs the number of times each user visited the site. UserCountMap reads a line and outputs (username, 1). UserCountReducer takes as input (username, list of 1s) and outputs (username, sum of 1s).&lt;br /&gt;
&lt;br /&gt;
&lt;span style="color: blue;"&gt;import java.io.IOException;&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: blue;"&gt;import org.apache.hadoop.conf.Configuration;&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: blue;"&gt;import org.apache.hadoop.fs.Path;&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: blue;"&gt;import org.apache.hadoop.io.IntWritable;&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: blue;"&gt;import org.apache.hadoop.io.LongWritable;&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: blue;"&gt;import org.apache.hadoop.io.Text;&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: blue;"&gt;import org.apache.hadoop.mapreduce.Job;&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: blue;"&gt;import org.apache.hadoop.mapreduce.Mapper;&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: blue;"&gt;import org.apache.hadoop.mapreduce.Reducer;&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: blue;"&gt;import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: blue;"&gt;import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;span style="color: blue;"&gt;public class UserCount {&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: blue;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; // Emits (username, 1) for every line of the log&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: blue;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; public static class UserCountMap extends Mapper&amp;lt;LongWritable, Text, Text, IntWritable&amp;gt; {&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: blue;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; public void map(LongWritable key, Text value, Context context)&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: blue;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; throws IOException, InterruptedException {&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: blue;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; String line = value.toString() ;&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: blue;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; String tokens[] = line.split("\\|") ;&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: blue;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; if (tokens.length &amp;gt; 0) {&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: blue;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; context.write(new Text(tokens[0]),new IntWritable(1)) ;&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: blue;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: blue;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: blue;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: blue;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; // Sums the 1s emitted for each username&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: blue;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; public static class UserCountReducer extends Reducer&amp;lt;Text, IntWritable, Text, IntWritable&amp;gt; {&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: blue;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; public void reduce(Text key, Iterable&amp;lt;IntWritable&amp;gt; values, Context context)&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: blue;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; throws IOException, InterruptedException {&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: blue;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; int count = 0 ;&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: blue;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; for (IntWritable value : values) {&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: blue;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; count = count + value.get() ;&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: blue;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: blue;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; context.write(key, new IntWritable(count)) ;&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: blue;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: blue;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;span style="color: blue;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; public static void main(String[] args) throws Exception {&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: blue;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; Configuration conf = new Configuration();&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: blue;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; Job job = new Job(conf);&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: blue;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; job.setJarByClass(UserCount.class) ;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/span&gt;&lt;br /&gt;
&lt;span style="color: blue;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; FileInputFormat.addInputPath(job, new Path(args[0])) ;&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: blue;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; FileOutputFormat.setOutputPath(job, new Path(args[1])) ;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/span&gt;&lt;br /&gt;
&lt;span style="color: blue;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; job.setMapperClass(UserCountMap.class) ;&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: blue;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; job.setReducerClass(UserCountReducer.class) ;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/span&gt;&lt;br /&gt;
&lt;span style="color: blue;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; job.setOutputKeyClass(Text.class) ;&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: blue;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; job.setOutputValueClass(IntWritable.class) ;&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: blue;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; System.exit(job.waitForCompletion(true) ? 0 : 1) ;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/span&gt;&lt;br /&gt;
&lt;span style="color: blue;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; } &amp;nbsp;&amp;nbsp; &lt;/span&gt;&lt;br /&gt;
&lt;span style="color: blue;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp; &lt;/span&gt;&lt;br /&gt;
&lt;span style="color: blue;"&gt;}&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
Compile the program and package it into a jar named, say, usercount.jar.&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;Step 7: Run the MapReduce program&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;span style="color: purple;"&gt;hadoop-2.0.1-alpha$ bin/hadoop jar ~/projects/usercount.jar com.mj.UserCount /user/joe/input /user/joe/output&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
You should see output like the following (only part of it is shown).&lt;br /&gt;
&lt;br /&gt;
&lt;span style="color: purple;"&gt;12/09/12 17:41:28 INFO mapreduce.Job: The url to track the job: http://joe.local:8088/proxy/application_1347494786422_0003/&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: purple;"&gt;12/09/12 17:41:28 INFO mapreduce.Job: Running job: job_1347494786422_0003&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: purple;"&gt;12/09/12 17:41:37 INFO mapreduce.Job: Job job_1347494786422_0003 running in uber mode : false&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: purple;"&gt;12/09/12 17:41:37 INFO mapreduce.Job:&amp;nbsp; map 0% reduce 0%&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: purple;"&gt;12/09/12 17:41:43 INFO mapreduce.Job:&amp;nbsp; map 100% reduce 0%&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: purple;"&gt;12/09/12 17:41:45 INFO mapreduce.Job:&amp;nbsp; map 100% reduce 100%&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: purple;"&gt;12/09/12 17:41:45 INFO mapreduce.Job: Job job_1347494786422_0003 completed successfully&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
You can see the status of the job at http://localhost:8088&lt;br /&gt;
&lt;br /&gt;
&lt;span style="color: purple;"&gt;hadoop-2.0.1-alpha$ bin/hadoop fs -ls /user/joe/output&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: purple;"&gt;12/09/12 17:45:29 WARN util.KerberosName: Kerberos krb5 configuration not found, setting default realm to empty&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: purple;"&gt;Found 2 items&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: purple;"&gt;-rw-r--r--&amp;nbsp;&amp;nbsp; 1 manoj supergroup&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 0 2012-09-12 17:41 /user/joe/output/_SUCCESS&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: purple;"&gt;-rw-r--r--&amp;nbsp;&amp;nbsp; 1 manoj supergroup&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 39 2012-09-12 17:41 /user/joe/output/part-r-00000&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
The file part-r-00000 contains the output:&lt;br /&gt;
&lt;br /&gt;
&lt;span style="color: orange;"&gt;user01&amp;nbsp;&amp;nbsp;&amp;nbsp; 5&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: orange;"&gt;user02&amp;nbsp;&amp;nbsp;&amp;nbsp; 2&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: orange;"&gt;user03&amp;nbsp;&amp;nbsp;&amp;nbsp; 2&lt;/span&gt;&lt;br /&gt;
&lt;span style="color: orange;"&gt;user04&amp;nbsp;&amp;nbsp;&amp;nbsp; 1&lt;/span&gt;&lt;br /&gt;
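&lt;br /&gt;
The counting logic can be tried without a cluster. The following is a minimal plain-Java sketch, not the Hadoop job itself: sorting stands in for the shuffle phase and the loop stands in for UserCountReducer. The class name LocalUserCount and the sample user ids are made up for illustration.&lt;br /&gt;

```java
import java.util.Arrays;

// Local, Hadoop-free sketch of the counting done by UserCount.
// The class name and the sample user ids are hypothetical.
class LocalUserCount {

    // Sorts the user ids (as the shuffle phase would group them by key),
    // then sums the occurrences of each id, one output line per user.
    static String countUsers(String[] userIds) {
        String[] sorted = userIds.clone();
        Arrays.sort(sorted);
        StringBuilder out = new StringBuilder();
        String current = null;
        int count = 0;
        for (String user : sorted) {
            if (!user.equals(current)) {
                // A new key starts: emit the total for the previous key.
                if (current != null) {
                    out.append(current).append('\t').append(count).append('\n');
                }
                current = user;
                count = 0;
            }
            count = count + 1;
        }
        // Emit the total for the last key.
        if (current != null) {
            out.append(current).append('\t').append(count).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String[] userIds = {"user01", "user02", "user01", "user03", "user01"};
        System.out.print(countUsers(userIds));
    }
}
```

Each output line has the same shape as a line of part-r-00000: the user id, a tab and the count.&lt;br /&gt;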
&lt;br /&gt;
I hope these steps help you get started with Hadoop and on your way to writing more complex MapReduce jobs to analyze your big data.&lt;br /&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;img src="http://feeds.feedburner.com/~r/TheKhangaonkarReport/~4/C_l54ZN4a7A" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://khangaonkar.blogspot.com/feeds/1380384492156079798/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://khangaonkar.blogspot.com/2012/09/hadoop-2x-tutorial.html#comment-form" title="1 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/5008017311510568944/posts/default/1380384492156079798?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/5008017311510568944/posts/default/1380384492156079798?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/TheKhangaonkarReport/~3/C_l54ZN4a7A/hadoop-2x-tutorial.html" title="Hadoop 2.x Tutorial" /><author><name>Manoj Khangaonkar</name><uri>https://plus.google.com/105019772402521748386</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>1</thr:total><gd:extendedProperty name="commentSource" value="1" /><gd:extendedProperty name="commentModerationMode" value="FILTERED_POSTMOD" /><feedburner:origLink>http://khangaonkar.blogspot.com/2012/09/hadoop-2x-tutorial.html</feedburner:origLink></entry><entry gd:etag="W/&quot;C04FRXg9cSp7ImA9WhJWEkg.&quot;"><id>tag:blogger.com,1999:blog-5008017311510568944.post-8333510851894405764</id><published>2012-08-17T18:05:00.000-07:00</published><updated>2012-08-17T18:05:14.669-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2012-08-17T18:05:14.669-07:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="java" /><title>JAVA enum tutorial</title><content type="html">&lt;div dir="ltr" style="text-align: left;" trbidi="on"&gt;
The Java language has supported the enum type since Java 5. Yet many programmers do not use it, or do not fully understand all of its features.&lt;br /&gt;
&lt;br /&gt;
We still see a lot of code like this:&lt;br /&gt;
&lt;br /&gt;
&lt;div style="color: blue;"&gt;
public static final int LIGHT = 1 ;&lt;/div&gt;
&lt;div style="color: blue;"&gt;
public static final int MEDIUM = 2 ;&lt;/div&gt;
&lt;div style="color: blue;"&gt;
public static final int HEAVY = 3 ;&lt;/div&gt;
&lt;div style="color: blue;"&gt;
public static final int SUPERHEAVY = 4 ; &lt;/div&gt;
&lt;div style="color: blue;"&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div style="color: blue;"&gt;
int weight_range&amp;nbsp; = getRange() ;&lt;/div&gt;
&lt;div style="color: blue;"&gt;
if (weight_range == LIGHT&amp;nbsp; ) {&lt;/div&gt;
&lt;div style="color: blue;"&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div style="color: blue;"&gt;
} else if (weight_range == MEDIUM) {&lt;/div&gt;
&lt;div style="color: blue;"&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div style="color: blue;"&gt;
} else if (weight_range == HEAVY) {&lt;/div&gt;
&lt;div style="color: blue;"&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div style="color: blue;"&gt;
}&lt;/div&gt;
&lt;br /&gt;
Such code is error prone and lacks type safety: any int can be passed where a weight range is expected, and the compiler will not complain. If weight_range is serialized and deserialized somewhere, you have to remember what 1, 2 and 3 represent.&lt;br /&gt;
&lt;br /&gt;
A Java enum is a cleaner, type-safe way of working with constants. It is a type with a fixed set of constants, each of which is an instance of that type. &lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;1. Defining enum&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
Defining an enum is like defining a class.&lt;br /&gt;
&lt;br /&gt;
&lt;div style="color: blue;"&gt;
public enum WeightRange {&lt;/div&gt;
&lt;div style="color: blue;"&gt;
&amp;nbsp;LIGHT, MEDIUM, HEAVY, SUPERHEAVY&amp;nbsp; &lt;/div&gt;
&lt;div style="color: blue;"&gt;
}&lt;/div&gt;
&lt;br /&gt;
This defines a WeightRange enum type with four constants. &lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;2. Creating a variable of type enum&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;span style="color: blue;"&gt;WeightRange wclass = WeightRange.MEDIUM ;&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;span style="color: blue;"&gt;&lt;span style="color: black;"&gt;Declaring a variable of an enum type is like declaring a variable of any other type. &lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;3. Using the enum&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;div style="color: blue;"&gt;
WeightRange boxer_class&amp;nbsp; = getWtRangeFromSomeWhere();&lt;/div&gt;
&lt;div style="color: blue;"&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div style="color: blue;"&gt;
if (boxer_class == WeightRange.LIGHT) {&lt;/div&gt;
&lt;div style="color: blue;"&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div style="color: blue;"&gt;
} else if (boxer_class == WeightRange.HEAVY) {&lt;/div&gt;
&lt;div style="color: blue;"&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div style="color: blue;"&gt;
}&lt;br /&gt;
&lt;br /&gt;
&lt;span style="color: black;"&gt;This is more type safe than the code without enums: the compiler rejects any value that is not a WeightRange. &lt;/span&gt;&lt;/div&gt;
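&lt;br /&gt;
An enum can also be used as the selector of a switch statement, where the case labels use the constant names without the type prefix. The following is a small sketch; the describe method and its labels are hypothetical and only illustrate the dispatch.&lt;br /&gt;

```java
// Sketch: dispatching on an enum with switch. The enum mirrors the
// WeightRange defined above; describe() and its labels are hypothetical.
enum WeightRange { LIGHT, MEDIUM, HEAVY, SUPERHEAVY }

class WeightRangeSwitchDemo {

    // Returns a human readable label for each constant.
    static String describe(WeightRange range) {
        switch (range) {
            case LIGHT:
                return "light";
            case MEDIUM:
                return "medium";
            case HEAVY:
                return "heavy";
            case SUPERHEAVY:
                return "super heavy";
            default:
                // Reached only if a constant is added later without a case.
                throw new IllegalArgumentException("unknown range: " + range);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(WeightRange.HEAVY)); // prints heavy
    }
}
```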
&lt;br /&gt;
&lt;b&gt;4. Enum is a class.&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
As mentioned above, an enum is a class: every enum type implicitly extends java.lang.Enum.&amp;nbsp; Like any other class, an enum type can have additional fields and constructors. &lt;br /&gt;
&lt;br /&gt;
The above WeightRange enum can be enhanced with fields for the low and high ends of each range. The values are passed to the constructor when each constant is declared.&lt;br /&gt;
&lt;br /&gt;
&lt;div style="color: blue;"&gt;
public enum WeightRange {&lt;br /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;br /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; LIGHT(0,70) ,&lt;br /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; MEDIUM(71,150),&lt;br /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; HEAVY(151,225),&lt;br /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; SUPERHEAVY(226,350) ;&lt;br /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;br /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; private final int low ;&lt;br /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; private final int high ;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;br /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;br /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; WeightRange(int low, int high) {&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;br /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; this.low = low ;&lt;br /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; this.high = high ;&amp;nbsp; &lt;br /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/div&gt;
&lt;span style="color: blue;"&gt;}&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;5. An enum can also have methods.&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
In the above enum we can add a method to check if a given weight is within a weight range.&lt;br /&gt;
&lt;br /&gt;
&lt;div style="color: blue;"&gt;
public boolean isInRange(int wt) {&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;br /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp; &amp;nbsp;if (wt &amp;gt;= low &amp;amp;&amp;amp; wt &amp;lt;= high)&lt;br /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;return true ;&lt;br /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp; &amp;nbsp;else&lt;br /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;return false ;&lt;/div&gt;
&lt;div style="color: blue;"&gt;
}&lt;/div&gt;
&lt;br /&gt;
&lt;b&gt;5. It can have a static factory method that takes a weight as a parameter and returns the matching enum constant. &lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;div style="color: blue;"&gt;
public static WeightRange getWeightRange(int weight) {&lt;br /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;br /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; if (weight &amp;lt;= 70)&lt;br /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; return LIGHT ;&lt;br /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; else if (weight &amp;lt;= 150)&lt;br /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; return MEDIUM ;&lt;br /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; else if (weight &amp;lt;= 225)&lt;br /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; return HEAVY ;&lt;br /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; else&lt;br /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; return SUPERHEAVY ;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;/div&gt;
&lt;span style="color: blue;"&gt;}&lt;/span&gt; &lt;br /&gt;
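&lt;br /&gt;
The factory above repeats the thresholds that are already stored in the constants. One possible variation, shown here as a sketch rather than as the recommended form, loops over values() and reuses isInRange. The name fromWeight is hypothetical, and unlike getWeightRange it rejects weights that fall outside every range.&lt;br /&gt;

```java
// Sketch: a factory that reuses the ranges stored in each constant instead
// of repeating the thresholds. fromWeight is a hypothetical name.
enum WeightRange {
    LIGHT(0, 70),
    MEDIUM(71, 150),
    HEAVY(151, 225),
    SUPERHEAVY(226, 350);

    private final int low;
    private final int high;

    WeightRange(int low, int high) {
        this.low = low;
        this.high = high;
    }

    // True when wt lies between low and high, both inclusive.
    boolean isInRange(int wt) {
        return !(low > wt || wt > high);
    }

    // Returns the constant whose range contains the weight.
    static WeightRange fromWeight(int weight) {
        for (WeightRange range : values()) {
            if (range.isInRange(weight)) {
                return range;
            }
        }
        throw new IllegalArgumentException("no weight range for " + weight);
    }
}

class WeightRangeFactoryDemo {
    public static void main(String[] args) {
        System.out.println(WeightRange.fromWeight(100)); // prints MEDIUM
    }
}
```

Adding or changing a range now only requires editing the constant list.&lt;br /&gt;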
&lt;br /&gt;
&lt;b&gt;6.&amp;nbsp; Calling toString on an enum value returns the name used to define the constant field.&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;div style="color: blue;"&gt;
System.out.println(WeightRange.LIGHT) ;&amp;nbsp;&lt;/div&gt;
&lt;div style="color: blue;"&gt;
prints LIGHT&lt;/div&gt;
&lt;br /&gt;
&lt;b&gt;7. Conversely, an enum constant can be obtained from a String using the valueOf method.&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;div style="color: blue;"&gt;
WeightRange w3 = WeightRange.valueOf("MEDIUM") ;&lt;/div&gt;
&lt;div style="color: blue;"&gt;
System.out.println(w3) ;&lt;/div&gt;
&lt;div style="color: blue;"&gt;
will print MEDIUM&lt;/div&gt;
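&lt;br /&gt;
Note that valueOf is case sensitive and throws IllegalArgumentException when no constant has the given name. When the string comes from user input or a file, a small wrapper can help. The sketch below assumes that falling back to a default is acceptable; parseOrDefault is a hypothetical helper, not part of the JDK.&lt;br /&gt;

```java
import java.util.Locale;

// Sketch: tolerant lookup of an enum constant by name.
// parseOrDefault is a hypothetical helper, not a JDK method.
enum WeightRange { LIGHT, MEDIUM, HEAVY, SUPERHEAVY }

class WeightRangeParseDemo {

    // Normalizes the name, then falls back when valueOf rejects it.
    static WeightRange parseOrDefault(String name, WeightRange fallback) {
        try {
            return WeightRange.valueOf(name.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseOrDefault("medium", WeightRange.LIGHT));  // prints MEDIUM
        System.out.println(parseOrDefault("unknown", WeightRange.LIGHT)); // prints LIGHT
    }
}
```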
&lt;br /&gt;
&lt;b&gt;8. You can iterate over the constants defined in the enum.&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;div style="color: blue;"&gt;
for (WeightRange r : WeightRange.values()) {&lt;br /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; System.out.println(r) ;&lt;/div&gt;
&lt;span style="color: blue;"&gt;}&lt;/span&gt;&lt;br /&gt;
&lt;b&gt;&lt;br /&gt;&lt;/b&gt;
&lt;b&gt;9. enum constants are final.&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;div style="color: red;"&gt;
WeightRange.LIGHT = WeightRange.HEAVY ; // compilation error&lt;/div&gt;
&lt;br /&gt;
&lt;b&gt;10. The only instances of an enum that can be created are the constants defined in the enum definition.&lt;/b&gt;&lt;br /&gt;
&lt;div style="color: blue;"&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div style="color: red;"&gt;
WeightRange r = new WeightRange(12,100) ; // compilation error&lt;/div&gt;
&lt;br /&gt;
&lt;span style="color: blue;"&gt;&lt;span style="color: black;"&gt;Next time you need a fixed set of constants, consider using enum. It is type safe, leads to better code and your constants are within a namespace.&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;
&lt;img src="http://feeds.feedburner.com/~r/TheKhangaonkarReport/~4/VXy7pIQnE0E" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://khangaonkar.blogspot.com/feeds/8333510851894405764/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://khangaonkar.blogspot.com/2012/08/java-enum-tutorial.html#comment-form" title="3 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/5008017311510568944/posts/default/8333510851894405764?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/5008017311510568944/posts/default/8333510851894405764?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/TheKhangaonkarReport/~3/VXy7pIQnE0E/java-enum-tutorial.html" title="JAVA enum tutorial" /><author><name>Manoj Khangaonkar</name><uri>https://plus.google.com/105019772402521748386</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>3</thr:total><gd:extendedProperty name="commentSource" value="1" /><gd:extendedProperty name="commentModerationMode" value="FILTERED_POSTMOD" /><feedburner:origLink>http://khangaonkar.blogspot.com/2012/08/java-enum-tutorial.html</feedburner:origLink></entry><entry gd:etag="W/&quot;Dk8BQ387eyp7ImA9WhJRFUo.&quot;"><id>tag:blogger.com,1999:blog-5008017311510568944.post-3268879145227165770</id><published>2012-07-17T18:27:00.000-07:00</published><updated>2012-07-17T18:27:32.103-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2012-07-17T18:27:32.103-07:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="scalability architecture database web" /><title>Scaling The Relational Database</title><content type="html">&lt;div dir="ltr" style="text-align: left;" trbidi="on"&gt;
Scalability of a web application is its ability to handle increased load, whether in requests, users or data, without having to redesign or re-architect the application. Scalability should not be confused with performance or raw speed.&lt;br /&gt;
&lt;br /&gt;
One can scale by using bigger components: a bigger machine, more memory, more CPU. This is vertical scaling. One can also scale by adding more copies of the same component to share the workload. This is horizontal scaling.&lt;br /&gt;
&lt;br /&gt;
In a typical multi-tiered web application, the middle tier, where the application logic executes, scales easily by going stateless or by using a session cookie with the state kept in centralized storage. The middle tier thus scales horizontally by just adding more application servers. In reality, this only pushes the problem down the stack to the centralized storage, which is generally a relational database. The database thus becomes the hardest component to scale.&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://4.bp.blogspot.com/-7xRMwLBi7JM/T_n8c1nF-mI/AAAAAAAACy0/UoSiBRJwJS4/s1600/Database+Figure+1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="480" src="http://4.bp.blogspot.com/-7xRMwLBi7JM/T_n8c1nF-mI/AAAAAAAACy0/UoSiBRJwJS4/s640/Database+Figure+1.png" width="640" /&gt;&lt;/a&gt;&lt;/div&gt;
The typical multi-tiered web application starts with the architecture shown in figure 1. As the application becomes popular, the number of users grows and so does the number of concurrent reads and writes. The application slows to a crawl and eventually grinds to a halt. In the rest of this article we discuss some strategies to avoid such a situation. &lt;br /&gt;
&lt;br /&gt;
To understand the issues involved in scaling the database, it is useful to think in terms of the two primary client operations on a database: READ and WRITE. Clients either read from the database or write to it. READs can be scaled easily by adding servers, replicating the data and distributing the read requests across the servers. Scaling WRITEs is much more complicated. Simply distributing write requests across servers will not work, because it is difficult to keep the data consistent across servers. &lt;br /&gt;
&lt;h2 style="text-align: left;"&gt;
Scaling reads: Master - Slave configuration&lt;/h2&gt;
&lt;br /&gt;
As mentioned above, a simple master - slave configuration, as shown in figure 2, will scale READs. In many web applications reads far outnumber writes; a common rule of thumb is roughly 80% read requests to 20% write requests. Hence, most of the time, this configuration provides significant relief.&lt;br /&gt;
&lt;a href="http://3.bp.blogspot.com/-fm1PRHP9yuU/T_n-KcipNYI/AAAAAAAACy8/pLEGoQMlFzA/s1600/Figure+2+_+Master+Slave.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="480" src="http://3.bp.blogspot.com/-fm1PRHP9yuU/T_n-KcipNYI/AAAAAAAACy8/pLEGoQMlFzA/s640/Figure+2+_+Master+Slave.png" width="640" /&gt;&lt;/a&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
All WRITE requests are sent only to the master. READ requests are sent to the slaves. The master is replicated to the slaves. Note that a READ from a slave is not any faster than a READ from the master, because every WRITE on the master leads to a WRITE on each slave through replication. However, because there can be multiple slaves and READ requests are distributed across them, the system as a whole has higher throughput. As the number of READ requests goes up, you can continue to scale by simply adding more slaves.&lt;br /&gt;
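&lt;br /&gt;
The routing rule can be sketched in a few lines of application code. This is only an illustration under simple assumptions: servers are identified by name, READs are spread round-robin, and the class name ReadWriteRouter and the host names are hypothetical. A real application would hand out connections rather than names.&lt;br /&gt;

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the routing rule: WRITEs go to the master, READs rotate
// across the slaves. The class name and host names are hypothetical.
class ReadWriteRouter {
    private final String master;
    private final String[] slaves;
    private final AtomicInteger next = new AtomicInteger();

    ReadWriteRouter(String master, String... slaves) {
        if (slaves.length == 0) {
            throw new IllegalArgumentException("need at least one slave");
        }
        this.master = master;
        this.slaves = slaves.clone();
    }

    // Every write is sent to the single master.
    String serverForWrite() {
        return master;
    }

    // Reads are distributed round-robin; floorMod keeps the index valid
    // even after the counter overflows.
    String serverForRead() {
        int index = Math.floorMod(next.getAndIncrement(), slaves.length);
        return slaves[index];
    }

    public static void main(String[] args) {
        ReadWriteRouter router = new ReadWriteRouter("db-master", "db-slave-1", "db-slave-2");
        System.out.println(router.serverForWrite()); // prints db-master
        System.out.println(router.serverForRead());  // prints db-slave-1
        System.out.println(router.serverForRead());  // prints db-slave-2
        System.out.println(router.serverForRead());  // prints db-slave-1
    }
}
```

Adding a slave then means adding one more name to the constructor call; the rest of the application is unchanged.&lt;br /&gt;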
&lt;h2 style="text-align: left;"&gt;
Master - Master configuration&lt;/h2&gt;
&lt;br /&gt;
In the master - master configuration shown in figure 3, the two servers are set up to replicate to each other. READ and WRITE requests are sent to both servers. While this gives the appearance of scaling WRITEs as well, this approach has some serious disadvantages.&lt;br /&gt;
&lt;a href="http://1.bp.blogspot.com/-yPWQtwkEr1A/T_oGdnbqWDI/AAAAAAAACzI/DizGPfkbFUY/s1600/Figure+3_+Master+Master.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="480" src="http://1.bp.blogspot.com/-yPWQtwkEr1A/T_oGdnbqWDI/AAAAAAAACzI/DizGPfkbFUY/s640/Figure+3_+Master+Master.png" width="640" /&gt;&lt;/a&gt;&lt;br /&gt;
Since there can be a replication lag, the data on the two servers might not be identical during certain time windows, leading to inconsistent reads. If any columns are ids that need to be incremented, the id generation has to be coordinated across the servers, typically in application logic; a plain auto-incrementing id column on each server would hand out colliding values. This approach does not scale beyond a couple of&amp;nbsp; servers, as each WRITE on every server has to be replicated to every other server.&lt;br /&gt;
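&lt;br /&gt;
One way to coordinate ids without a round trip between servers is to give each server its own interleaved sequence. This scheme is an assumption added here for illustration, not something the configuration above provides by itself, and the class name InterleavedIdGenerator is hypothetical.&lt;br /&gt;

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch: each of serverCount masters hands out ids from its own
// arithmetic sequence, so two servers never produce the same id.
// The class and parameter names are hypothetical.
class InterleavedIdGenerator {
    private final int serverCount;
    private final AtomicLong next;

    // serverIndex runs from 1 to serverCount, one value per server.
    InterleavedIdGenerator(int serverIndex, int serverCount) {
        if (1 > serverIndex || serverIndex > serverCount) {
            throw new IllegalArgumentException("serverIndex must be between 1 and serverCount");
        }
        this.serverCount = serverCount;
        this.next = new AtomicLong(serverIndex);
    }

    // Returns the current id and advances by the number of servers.
    long nextId() {
        return next.getAndAdd(serverCount);
    }

    public static void main(String[] args) {
        InterleavedIdGenerator serverA = new InterleavedIdGenerator(1, 2);
        InterleavedIdGenerator serverB = new InterleavedIdGenerator(2, 2);
        System.out.println(serverA.nextId() + " " + serverA.nextId()); // prints 1 3
        System.out.println(serverB.nextId() + " " + serverB.nextId()); // prints 2 4
    }
}
```

The cost of this design is that the number of servers is baked into the sequence, so adding a server later requires re-planning the id space.&lt;br /&gt;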
&lt;br /&gt;
&lt;h2 style="text-align: left;"&gt;
Scaling writes: partitioning the database&lt;/h2&gt;
&lt;br /&gt;
The only way to scale WRITEs is to partition the database. The WRITE requests are sent to different instances of the database which may have the same or different schema. There is no replication or sharing between the instances.&lt;br /&gt;
&lt;br /&gt;
Figure 4 shows an architecture where the database is partitioned by moving some of the tables to different database instances. Tables that are joined need to be on the same instance, because you cannot do SQL joins across servers. This approach works when you have many tables in the schema and some of the tables are not really related to the others. It does increase application complexity: the application needs connections to several instances and must know which instance has which table.&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://3.bp.blogspot.com/-fAybWu6c03s/T_oGnzVq8MI/AAAAAAAACzQ/lIf5zPn6b-Y/s1600/Figure+4_+Database+partitions+by+tables.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="480" src="http://3.bp.blogspot.com/-fAybWu6c03s/T_oGnzVq8MI/AAAAAAAACzQ/lIf5zPn6b-Y/s640/Figure+4_+Database+partitions+by+tables.png" width="640" /&gt;&lt;/a&gt;&lt;/div&gt;
If you have a schema with few tables but a large number of rows per table, another strategy is to keep the schema the same on every instance but partition the data across servers based on some key range. For example, a USER table that has a billion rows, with users from all over the world, can be partitioned across instances based on the geographical location of the user, say the continent. Figure 5 shows such an architecture. Again, this requires the application logic to be smart enough to know which database instance to connect to, based on, say, a key value. To keep the application logic simple, it helps to write a layer that handles the partitioning for the application.&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;img border="0" height="480" src="http://1.bp.blogspot.com/-o36SB49pyMI/T_oG5ahDQ9I/AAAAAAAACzY/0Hvhpkyf5nA/s640/Figure+5_+Database+shards.png" width="640" /&gt;&lt;/div&gt;
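&lt;br /&gt;
Such a partitioning layer can start out very small. The sketch below maps a continent to the instance that holds its users. The grouping of continents into three shards, the enum name UserShard and the JDBC URLs are all hypothetical; only the idea of hiding the lookup behind one method comes from the discussion above.&lt;br /&gt;

```java
import java.util.Locale;

// Sketch of a thin partitioning layer: callers ask for the instance by
// partition key and never hard-code server names elsewhere.
// The shard grouping and the JDBC URLs are hypothetical.
enum UserShard {
    AMERICAS("jdbc:mysql://users-americas.example.com/app"),
    EUROPE_AFRICA("jdbc:mysql://users-emea.example.com/app"),
    ASIA_PACIFIC("jdbc:mysql://users-apac.example.com/app");

    private final String jdbcUrl;

    UserShard(String jdbcUrl) {
        this.jdbcUrl = jdbcUrl;
    }

    String jdbcUrl() {
        return jdbcUrl;
    }

    // Maps the partition key (the user's continent) to a shard.
    static UserShard forContinent(String continent) {
        switch (continent.trim().toUpperCase(Locale.ROOT)) {
            case "NORTH AMERICA":
            case "SOUTH AMERICA":
                return AMERICAS;
            case "EUROPE":
            case "AFRICA":
                return EUROPE_AFRICA;
            case "ASIA":
            case "OCEANIA":
                return ASIA_PACIFIC;
            default:
                throw new IllegalArgumentException("no shard for continent: " + continent);
        }
    }
}

class UserShardDemo {
    public static void main(String[] args) {
        UserShard shard = UserShard.forContinent("Europe");
        System.out.println(shard + " at " + shard.jdbcUrl());
    }
}
```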
&lt;h2 style="text-align: left;"&gt;
Scaling even further: NoSql&lt;/h2&gt;
If your data is even larger, of the order of&amp;nbsp; petabytes or several hundred terabytes, and ACID consistency is not a hard requirement, you might consider NoSql datastores as discussed in &lt;a href="http://khangaonkar.blogspot.com/2011/11/what-is-nosql.html"&gt;What is NoSql ?&lt;/a&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/TheKhangaonkarReport/~4/z2JobBkLhb4" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://khangaonkar.blogspot.com/feeds/3268879145227165770/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://khangaonkar.blogspot.com/2012/07/scaling-relational-database.html#comment-form" title="1 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/5008017311510568944/posts/default/3268879145227165770?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/5008017311510568944/posts/default/3268879145227165770?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/TheKhangaonkarReport/~3/z2JobBkLhb4/scaling-relational-database.html" title="Scaling The Relational Database" /><author><name>Manoj Khangaonkar</name><uri>https://plus.google.com/105019772402521748386</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://4.bp.blogspot.com/-7xRMwLBi7JM/T_n8c1nF-mI/AAAAAAAACy0/UoSiBRJwJS4/s72-c/Database+Figure+1.png" height="72" width="72" /><thr:total>1</thr:total><gd:extendedProperty name="commentSource" value="1" /><gd:extendedProperty name="commentModerationMode" value="FILTERED_POSTMOD" /><feedburner:origLink>http://khangaonkar.blogspot.com/2012/07/scaling-relational-database.html</feedburner:origLink></entry><entry 
gd:etag="W/&quot;A0ACRHY6fSp7ImA9WhVaFUk.&quot;"><id>tag:blogger.com,1999:blog-5008017311510568944.post-8164104270867875663</id><published>2012-06-12T18:09:00.000-07:00</published><updated>2012-06-12T18:09:25.815-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2012-06-12T18:09:25.815-07:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="architecture" /><category scheme="http://www.blogger.com/atom/ns#" term="low latency" /><category scheme="http://www.blogger.com/atom/ns#" term="web" /><title>5 Tips for building low latency web applications</title><content type="html">&lt;div dir="ltr" style="text-align: left;" trbidi="on"&gt;
Low latency applications are those that need to service requests in a few milliseconds or microseconds. Examples of low latency applications are:&lt;br /&gt;
&lt;ul style="text-align: left;"&gt;
&lt;li&gt;Servers that serve ads to be shown on web pages. If you don't serve the ad within the SLA dictated by the publisher, you will not be given the opportunity to serve the ad. Typically the server has a few milliseconds to respond.&lt;/li&gt;
&lt;li&gt;Servers that participate in real time bidding. Again, taking internet advertising as an example, if you are bidding for the opportunity to show an ad, you have just a few milliseconds to make a bid.&lt;/li&gt;
&lt;li&gt;Applications that provide real time quotes, such as a travel portal. The portal makes requests to its partner agents. For the portal to be usable, the quotes need to be displayed before the user loses interest and moves on to another travel site. Typically the portal app gives each partner a few milliseconds to respond. Otherwise the response is ignored.&lt;/li&gt;
&lt;/ul&gt;
Here are 5 simple guidelines to remember when building low latency applications.&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;1. Read from Memory:&lt;/b&gt; Keep the data required to serve the request in memory. Keep as much as possible in local memory and&amp;nbsp;then&amp;nbsp;use external&amp;nbsp;caches like memcached, Ehcache and the more recent NoSql key value stores. Having to read data from a database or filesystem is slow and should be avoided.&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;2. Write asynchronously:&lt;/b&gt; The thread that services the request should never be involved in writing to any external storage such as a database or disk. The main thread should hand off the write task to worker threads that do the writing asynchronously.&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;3. Avoid locks and contention:&lt;/b&gt; Design the application in a way that multiple threads do not contend for the same data and require locks. This is generally not an issue if the application mostly reads data. But multiple threads trying to write requires acquiring locks, which slows down the application. You should consider a design where write operations are delegated to a single thread. You might need to relax the requirement of ACID properties on data. In many cases, your users will be able to tolerate data that becomes eventually consistent.&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;4. Architect with loosely coupled components:&lt;/b&gt; When tuning for low latency, you might need to co-locate certain components. Co-location reduces hops and hence latency. In other cases, you might need to geographically distribute components, so that they are closer to the end user. An application built with loosely coupled components can accommodate such physical changes without requiring a rewrite of the application.&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;5. Horizontal scalability:&lt;/b&gt; Architect the application so that you can scale horizontally as the load on the application goes up. Even if you read all data from a cache, latency will go up as the number of requests and the size of the data increase, because of things like garbage collection (in the case of Java). To handle more requests without increasing latency, you will need to add more&amp;nbsp;servers. If you are using a key-value store, you might need to shard the data across multiple servers to keep response times under a millisecond.&lt;br /&gt;
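&lt;br /&gt;
Tips 2 and 3 can be combined in one small sketch: request threads hand their writes to a single background thread, so they neither block on storage nor compete for a write lock. The class name AsyncWriter is hypothetical, and the in-memory StringBuilder merely stands in for a real database or file.&lt;br /&gt;

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Sketch for tips 2 and 3: all writes are funneled through one
// background thread. The StringBuilder is a stand-in for real storage.
class AsyncWriter {
    private final ExecutorService writer = Executors.newSingleThreadExecutor();
    // Modified only by the writer thread, so no lock is needed.
    private final StringBuilder storage = new StringBuilder();

    // Called on the request thread; queues the write and returns at once.
    void write(String record) {
        writer.execute(() -> storage.append(record).append('\n'));
    }

    // Stops accepting writes, waits for the queued ones to finish,
    // then returns everything that was stored.
    String shutdownAndDump() throws InterruptedException {
        writer.shutdown();
        if (!writer.awaitTermination(5, TimeUnit.SECONDS)) {
            throw new IllegalStateException("writes did not finish in time");
        }
        return storage.toString();
    }

    public static void main(String[] args) throws InterruptedException {
        AsyncWriter asyncWriter = new AsyncWriter();
        asyncWriter.write("click ad=42");
        asyncWriter.write("click ad=43");
        System.out.print(asyncWriter.shutdownAndDump());
    }
}
```

The trade-off is that a request can be answered before its write is durable, which is the relaxed consistency mentioned in tip 3.&lt;br /&gt;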
&lt;br /&gt;
Most importantly, building and maintaining low latency is an iterative process that involves designing correctly, building, measuring performance and then tuning by looking at the design and code for improvements.&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/TheKhangaonkarReport/~4/_he6mluXINY" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://khangaonkar.blogspot.com/feeds/8164104270867875663/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://khangaonkar.blogspot.com/2012/06/5-tips-for-building-low-latency-web.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/5008017311510568944/posts/default/8164104270867875663?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/5008017311510568944/posts/default/8164104270867875663?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/TheKhangaonkarReport/~3/_he6mluXINY/5-tips-for-building-low-latency-web.html" title="5 Tips for building low latency web applications" /><author><name>Manoj Khangaonkar</name><uri>https://plus.google.com/105019772402521748386</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>0</thr:total><gd:extendedProperty name="commentSource" value="1" /><gd:extendedProperty name="commentModerationMode" value="FILTERED_POSTMOD" /><feedburner:origLink>http://khangaonkar.blogspot.com/2012/06/5-tips-for-building-low-latency-web.html</feedburner:origLink></entry><entry gd:etag="W/&quot;CEQBSHs_fyp7ImA9WhVaGEw.&quot;"><id>tag:blogger.com,1999:blog-5008017311510568944.post-4754451380172302243</id><published>2012-05-16T18:00:00.000-07:00</published><updated>2012-06-15T19:05:59.547-07:00</updated><app:edited 
xmlns:app="http://www.w3.org/2007/app">2012-06-15T19:05:59.547-07:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="java" /><category scheme="http://www.blogger.com/atom/ns#" term="concurrency" /><title>When to use explicit Locks in JAVA ?</title><content type="html">&lt;div dir="ltr" style="text-align: left;" trbidi="on"&gt;
Prior to JDK 5, the only way to protect data from concurrent access was to use the synchronized keyword. The limitations of using synchronized are: &lt;br /&gt;
&lt;br /&gt;
(1) A thread that tries to acquire a lock has to wait until it gets the lock. There is no way to time out.&lt;br /&gt;
(2) A thread that is waiting for a lock cannot be interrupted.&lt;br /&gt;
(3) Since synchronized applies to a block of code, the lock has to be acquired and released in the same block. While this is good most of the time, there are cases where you need the flexibility of acquiring and releasing the lock in different blocks. &lt;br /&gt;
&lt;br /&gt;
The Lock interfaces and classes are well documented at &lt;a href="http://docs.oracle.com/javase/6/docs/api/java/util/concurrent/locks/package-summary.html"&gt;java.util.concurrent.locks&lt;/a&gt;.&lt;br /&gt;
The basic usage of the Lock interface is: &lt;br /&gt;
&lt;br /&gt;
Lock l = new ReentrantLock() ;&lt;br /&gt;
l.lock() ;&lt;br /&gt;
try {&lt;br /&gt;
// update &lt;br /&gt;
} finally {&lt;br /&gt;
l.unlock() ;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
You might be tempted to say that this can be done using synchronized. However, the Lock interface has several additional features that synchronized does not offer.&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;1. Non blocking method&lt;/b&gt;&lt;br /&gt;
The tryLock() method (without parameters) acquires the lock if it is available and returns true. If the lock is not available, it returns false immediately instead of waiting. This is very useful in avoiding deadlocks when you are trying to acquire multiple locks.&lt;br /&gt;
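&lt;br /&gt;
As a minimal sketch of the multiple-lock case, the helper below takes two locks with tryLock() and backs off if either one is busy, so it never waits while holding a lock. The class name TryLockSketch and the method runWithBoth are made up for this illustration.&lt;br /&gt;

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical example class; the names are illustrative only.
class TryLockSketch {

    // Runs the action only if BOTH locks can be taken without waiting.
    // Returns true if the action ran, false if either lock was busy.
    static boolean runWithBoth(Lock first, Lock second, Runnable action) {
        if (!first.tryLock()) {
            return false;           // first lock is busy: give up immediately
        }
        try {
            if (!second.tryLock()) {
                return false;       // second lock is busy: back off, finally releases first
            }
            try {
                action.run();
                return true;
            } finally {
                second.unlock();
            }
        } finally {
            first.unlock();
        }
    }

    public static void main(String[] args) {
        Lock a = new ReentrantLock();
        Lock b = new ReentrantLock();
        boolean ran = runWithBoth(a, b, () -> System.out.println("holding both locks"));
        System.out.println("action ran: " + ran);
    }
}
```

A caller that gets false back can retry later or report the resource as busy. Because no thread ever blocks while holding one of the locks, the classic two-lock deadlock cannot occur.&lt;br /&gt;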
&lt;br /&gt;
&lt;b&gt;2. Timed&lt;/b&gt; &lt;br /&gt;
tryLock(long time, TimeUnit unit) waits up to the given time for the lock. It returns true if the lock was acquired within that time and false otherwise. The thread can be interrupted while it waits.&lt;br /&gt;
&lt;br /&gt;
This is useful when you have service time requirements, such as in real-time bidding. Say the method needs to respond within 10 milliseconds; otherwise the response is of no use because the bid is lost.&lt;br /&gt;
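&lt;br /&gt;
The bidding scenario could be sketched roughly as follows. The class TimedLockSketch, its bestBid field and the submitBid method are hypothetical. The only library call that matters is Lock.tryLock(long, TimeUnit).&lt;br /&gt;

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical example class; the names are illustrative only.
class TimedLockSketch {
    private final Lock lock = new ReentrantLock();
    private long bestBid = 0;   // shared state guarded by lock

    // Tries to record a bid, waiting at most timeoutMillis for the lock.
    // Returns false if the lock was not obtained in time or the wait was interrupted.
    boolean submitBid(long amount, long timeoutMillis) {
        try {
            if (!lock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
                return false;   // too late to be useful, the caller can drop the bid
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();   // preserve the interrupt status
            return false;
        }
        try {
            bestBid = Math.max(bestBid, amount);
            return true;
        } finally {
            lock.unlock();
        }
    }

    long bestBid() {
        lock.lock();
        try {
            return bestBid;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        TimedLockSketch bids = new TimedLockSketch();
        System.out.println(bids.submitBid(250, 10));   // the lock is free, so this prints true
    }
}
```

The timeout bounds only the wait for the lock, not the work done while holding it, so the critical section itself should stay short.&lt;br /&gt;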
&lt;br /&gt;
&lt;b&gt;3. Interruptible&lt;/b&gt;&lt;br /&gt;
The lockInterruptibly() method waits for the lock until it is acquired or until the waiting thread is interrupted, in which case it throws InterruptedException. &lt;br /&gt;
This is useful in implementing abort or cancel features.&lt;br /&gt;
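&lt;br /&gt;
A cancel feature along these lines might look like the sketch below. The calling thread holds the lock, a worker waits for it with lockInterruptibly(), and the caller then cancels the worker by interrupting it. The class and method names are invented for the example.&lt;br /&gt;

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical example class; the names are illustrative only.
class InterruptibleLockSketch {

    // Returns true if a worker waiting in lockInterruptibly() was cancelled by interrupt().
    static boolean cancelWaitingWorker() {
        Lock lock = new ReentrantLock();
        AtomicBoolean cancelled = new AtomicBoolean(false);
        CountDownLatch started = new CountDownLatch(1);

        Thread worker = new Thread(() -> {
            started.countDown();
            try {
                lock.lockInterruptibly();   // waits until acquired or interrupted
                try {
                    // critical section (never reached here, the caller keeps the lock)
                } finally {
                    lock.unlock();
                }
            } catch (InterruptedException e) {
                cancelled.set(true);        // the wait was aborted
            }
        });

        lock.lock();                        // the caller holds the lock, so the worker must wait
        try {
            worker.start();
            started.await();                // make sure the worker is running
            worker.interrupt();             // request cancellation
            worker.join();                  // wait for the worker to finish
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            lock.unlock();
        }
        return cancelled.get();
    }

    public static void main(String[] args) {
        System.out.println("worker cancelled: " + cancelWaitingWorker());
    }
}
```

With synchronized, the worker in this situation would stay blocked until the lock became free, regardless of the interrupt.&lt;br /&gt;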
&lt;br /&gt;
&lt;b&gt;4. Non block structured locking&lt;/b&gt;&lt;br /&gt;
You can acquire the lock in one method and release it in another. Or you can wrap the lock and unlock calls in your own domain-specific acquireLock() and releaseLock() methods.&lt;br /&gt;
&lt;br /&gt;
This is useful in avoiding race conditions on read, update, save operations on data stored in caches. The synchronization provided by ConcurrentHashMap or a synchronized Map protects the map only for the duration of each individual get or put call, not while the data is being modified between those calls.&lt;br /&gt;
&lt;br /&gt;
cache.acquireLock(key) ;&lt;br /&gt;
try {&lt;br /&gt;
Data d = cache.get(key) ;&lt;br /&gt;
d.update1() ;&lt;br /&gt;
d.update2() ;&lt;br /&gt;
d.update3() ;&lt;br /&gt;
cache.put(key,d) ;&lt;br /&gt;
} finally {&lt;br /&gt;
cache.releaseLock(key) ; // always release, even if an update throws&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
Acquiring and releasing the lock are abstracted away in the acquireLock and releaseLock methods.&lt;br /&gt;
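&lt;br /&gt;
One way such acquireLock and releaseLock methods could be built is sketched below. This is an assumption for illustration, not the implementation behind the snippet above. It uses lock striping: each key is mapped by its hash code to one of a fixed number of ReentrantLock objects. The class name KeyLocks is made up.&lt;br /&gt;

```java
import java.util.Arrays;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical helper: maps each key to one of a fixed number of locks (lock striping).
class KeyLocks {
    private final Lock[] stripes;

    KeyLocks(int stripeCount) {
        stripes = new Lock[stripeCount];
        Arrays.setAll(stripes, i -> new ReentrantLock());
    }

    private Lock lockFor(Object key) {
        // floorMod keeps the index non-negative even for negative hash codes
        return stripes[Math.floorMod(key.hashCode(), stripes.length)];
    }

    // Acquire in one method ...
    void acquireLock(Object key) {
        lockFor(key).lock();
    }

    // ... and release in another, which synchronized cannot express.
    void releaseLock(Object key) {
        lockFor(key).unlock();
    }

    public static void main(String[] args) {
        KeyLocks locks = new KeyLocks(16);
        locks.acquireLock("account-42");
        try {
            System.out.println("updating account-42");
        } finally {
            locks.releaseLock("account-42");
        }
    }
}
```

Two different keys may land on the same stripe and then contend for the same lock. That costs some concurrency but keeps the number of lock objects fixed.&lt;br /&gt;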
&lt;br /&gt;
&lt;b&gt;5. Read /Write Locks&lt;/b&gt;&lt;br /&gt;
This is my favorite feature. The ReadWriteLock interface exposes two lock objects: a read lock and a write lock.&lt;br /&gt;
&lt;br /&gt;
You acquire the read lock when all you are doing is reading. Multiple threads can hold the read lock at the same time. By allowing multiple readers, you achieve greater concurrency. A read lock cannot be acquired while the write lock is held by another thread.&lt;br /&gt;
&lt;br /&gt;
You acquire the write lock when you need to write data. Only one thread can hold the write lock at a time. The write lock cannot be acquired while other threads hold read locks.&lt;br /&gt;
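&lt;br /&gt;
A minimal sketch of this pattern, using the JDK's ReentrantReadWriteLock to guard a single field, is shown here. The class name PriceHolder and its methods are hypothetical.&lt;br /&gt;

```java
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical example class; the names are illustrative only.
class PriceHolder {
    private final ReadWriteLock rw = new ReentrantReadWriteLock();
    private double price;   // guarded by rw

    double read() {
        rw.readLock().lock();       // shared: many readers may hold this at once
        try {
            return price;
        } finally {
            rw.readLock().unlock();
        }
    }

    void write(double newPrice) {
        rw.writeLock().lock();      // exclusive: waits for readers and other writers
        try {
            price = newPrice;
        } finally {
            rw.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        PriceHolder p = new PriceHolder();
        p.write(19.99);
        System.out.println(p.read());   // prints 19.99
    }
}
```

This pays off when reads greatly outnumber writes. For data that is written as often as it is read, a plain lock is usually just as good.&lt;br /&gt;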
&lt;br /&gt;
&lt;b&gt;Is the use of synchronized obsolete ?&lt;/b&gt;&lt;br /&gt;
Not really. Synchronized blocks are simple to use and are widely used. Most programmers are very familiar with their usage. They are less error prone because the lock is automatically released. It is reasonable to continue using synchronized for the simpler use cases of locking. But if you need any of the features described above, using explicit locks is well worth the extra coding. Performance-wise there is not much difference, though some benchmarks have shown explicit locks to be slightly faster.&lt;br /&gt;
&lt;br /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/TheKhangaonkarReport/~4/Q6h2D-W1SA0" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://khangaonkar.blogspot.com/feeds/4754451380172302243/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://khangaonkar.blogspot.com/2012/05/explicit-locks-in-java.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/5008017311510568944/posts/default/4754451380172302243?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/5008017311510568944/posts/default/4754451380172302243?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/TheKhangaonkarReport/~3/Q6h2D-W1SA0/explicit-locks-in-java.html" title="When to use explicit Locks in JAVA ?" /><author><name>Manoj Khangaonkar</name><uri>https://plus.google.com/105019772402521748386</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>0</thr:total><gd:extendedProperty name="commentSource" value="1" /><gd:extendedProperty name="commentModerationMode" value="FILTERED_POSTMOD" /><feedburner:origLink>http://khangaonkar.blogspot.com/2012/05/explicit-locks-in-java.html</feedburner:origLink></entry><entry gd:etag="W/&quot;CU4DRnwyeyp7ImA9WhVXGUg.&quot;"><id>tag:blogger.com,1999:blog-5008017311510568944.post-233149452442460878</id><published>2012-04-20T13:31:00.000-07:00</published><updated>2012-04-20T13:32:57.293-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2012-04-20T13:32:57.293-07:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="Spring JAVA distributed programming" /><title>Build distributed applications using Spring HTTP Invoker</title><content type="html">&lt;div dir="ltr" 
style="text-align: left;" trbidi="on"&gt;
Building distributed applications involves calling methods on objects that are remote: on different machines and/or in different JVMs. Code running on machine A invokes a method on an object running on machine B, and it works just as if the caller and the target were in the same JVM. In the past, CORBA, RMI and EJBs were the technologies used for remote invocation. But they are complicated to use. The protocols are binary and difficult to troubleshoot. They are also not well suited for use across network boundaries because they use ports that network admins hate to open.&lt;br /&gt;
&lt;br /&gt;
Since 2000, SOAP-based web services have enabled remote invocation using HTTP as the transport and XML for the payload. While HTTP solved the problems of troubleshooting and firewalls, the performance of XML was not very good. Some developers prefer web services using JSON over HTTP, but that requires modeling the data in JSON. &lt;br /&gt;
&lt;br /&gt;
Spring HTTP Invoker is a remoting mechanism where the programming model is plain Java, HTTP is used as the transport, and the payload is created using Java serialization. Spring HTTP Invoker gives developers the benefit of HTTP without the performance overhead of XML-based web services. In the rest of the article, we explain remoting with Spring HTTP Invoker using a simple example.&lt;br /&gt;
&lt;br /&gt;
For this tutorial you will need&lt;br /&gt;
&lt;br /&gt;
(1) &lt;a href="http://www.springsource.org/spring-framework"&gt;Spring&lt;/a&gt;&lt;br /&gt;
(2) &lt;a href="http://www.oracle.com/technetwork/java/javase/downloads/index.html"&gt;JDK&lt;/a&gt;&lt;br /&gt;
(3) &lt;a href="http://www.eclipse.org/downloads/"&gt;Eclipse&lt;/a&gt;&lt;br /&gt;
(4) &lt;a href="http://tomcat.apache.org/"&gt;Tomcat&lt;/a&gt;&lt;br /&gt;
&lt;br /&gt;
In this example, we create a service AccountService with a method getAccount. The service is deployed to Tomcat. We invoke the getAccount method from a J2SE client in a different JVM. You may download the full source code for this sample at &lt;a href="https://sites.google.com/site/khangaonkar/home/spring"&gt;RemoteService&lt;/a&gt;.&lt;br /&gt;
&lt;br /&gt;
&lt;span style="color: blue;"&gt; &lt;b&gt; Step 1: create the service and implementation &lt;/b&gt; &lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
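Let us define an interface AccountService and its implementation AccountServiceImpl in plain Java. Because HTTP Invoker uses Java serialization for the payload, the Account class returned by the service must implement java.io.Serializable.&lt;br /&gt;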
&lt;pre&gt;// AccountService.java
public interface AccountService {
  Account getAccount(int id);
}

// AccountServiceImpl.java
public class AccountServiceImpl implements AccountService {
  @Override
  public Account getAccount(int id) {
    // Returns a hard-coded account for demonstration purposes
    return new Account(id, "testacct", 100, 2999.99F);
  }
}
&lt;/pre&gt;
&lt;span style="color: blue;"&gt;&lt;b&gt;Step 2: Spring Application context for server side&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;
The Spring application context is defined in the file remoting-servlet.xml.
&lt;br /&gt;
&lt;pre&gt;&amp;lt;beans&amp;gt;
    &amp;lt;bean id="accountService" class="com.mj.account.AccountServiceImpl"/&amp;gt;
    &amp;lt;bean name="/AccountService" class="org.springframework.remoting.httpinvoker.HttpInvokerServiceExporter"&amp;gt;
       &amp;lt;property name="service" ref="accountService"/&amp;gt;
       &amp;lt;property name="serviceInterface" value="com.mj.account.AccountService"/&amp;gt;
   &amp;lt;/bean&amp;gt;
&amp;lt;/beans&amp;gt;
&lt;/pre&gt;
The first bean, accountService, needs no explanation: it is a simple Spring bean. The second bean exports it for remote access using HttpInvokerServiceExporter, a class provided by Spring. Its service property refers to the accountService bean, and serviceInterface names the interface exposed to clients. Because the bean name starts with a slash, Spring's default BeanNameUrlHandlerMapping maps it to the URL path /AccountService, which is the URL clients will invoke over HTTP.&lt;br /&gt;
&lt;br /&gt;
&lt;span style="color: blue;"&gt;&lt;b&gt;Step 3: package as war and deploy to tomcat&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
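The classes and context XML need to be packaged as a war and deployed to Tomcat. The standard Spring MVC DispatcherServlet needs to be wired into web.xml. Because the servlet is named remoting, Spring looks for its context in WEB-INF/remoting-servlet.xml by default.&lt;br /&gt;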
&lt;pre&gt;&amp;lt;servlet&amp;gt;
        &amp;lt;servlet-name&amp;gt;remoting&amp;lt;/servlet-name&amp;gt;
        &amp;lt;servlet-class&amp;gt;
            org.springframework.web.servlet.DispatcherServlet
        &amp;lt;/servlet-class&amp;gt;
        &amp;lt;load-on-startup&amp;gt;1&amp;lt;/load-on-startup&amp;gt;
&amp;lt;/servlet&amp;gt;
&amp;lt;servlet-mapping&amp;gt;
        &amp;lt;servlet-name&amp;gt;remoting&amp;lt;/servlet-name&amp;gt;
        &amp;lt;url-pattern&amp;gt;/*&amp;lt;/url-pattern&amp;gt;
&amp;lt;/servlet-mapping&amp;gt;
&lt;/pre&gt;
&lt;span style="color: blue;"&gt;&lt;b&gt;Step 4: Create an application context for the client with the following entry&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;
&lt;pre&gt;&amp;lt;bean id="AccountProxy" class="org.springframework.remoting.httpinvoker.HttpInvokerProxyFactoryBean"&amp;gt;
     &amp;lt;property name="serviceUrl" value="http://localhost:8080/remoteservice/AccountService"/&amp;gt;
     &amp;lt;property name="serviceInterface" value="com.mj.account.AccountService"/&amp;gt;
&amp;lt;/bean&amp;gt;
&lt;/pre&gt;
This defines a bean AccountProxy whose implementation is HttpInvokerProxyFactoryBean, which creates the HTTP invoker proxy. The URL to invoke is http://localhost:8080/remoteservice/AccountService. http://localhost:8080 is where the target web server is listening. /remoteservice is the Tomcat context (I deployed the service as remoteservice.war). We defined /AccountService as the URL for our bean in remoting-servlet.xml.&lt;br /&gt;
&lt;br /&gt;
&lt;span style="color: blue;"&gt;&lt;b&gt;Step 5: Remote invoke the service&lt;/b&gt;&lt;/span&gt; &lt;br /&gt;
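&lt;pre&gt;import org.springframework.context.ApplicationContext;
import org.springframework.context.support.ClassPathXmlApplicationContext;

// Assumes this class is in the same package as Account and AccountService
public class AccountServiceClient {
  public static void main(String[] args) {
    // Load the client-side context that defines the AccountProxy bean
    ApplicationContext applicationContext = new ClassPathXmlApplicationContext("remoteclient.xml");
    AccountService testService = (AccountService) applicationContext.getBean("AccountProxy");
    // This call goes over HTTP to the service deployed in Tomcat
    Account a = testService.getAccount(25);
    System.out.println(a);
  }
}
&lt;/pre&gt;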
You should see output such as 25 testacct 100 2999.99 (the exact format depends on how Account implements toString()).&lt;br /&gt;
&lt;br /&gt;
In summary, it is very easy to do remote invocation and distribute services using Spring HTTP invokers. You get the ease of plain JAVA programming and the ease of maintainence and troubleshooting because of HTTP. There is simply no reason to use RMI like protocols any more.&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/TheKhangaonkarReport/~4/x9NMCyEDuko" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://khangaonkar.blogspot.com/feeds/233149452442460878/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://khangaonkar.blogspot.com/2012/04/build-distributed-applications-using.html#comment-form" title="2 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/5008017311510568944/posts/default/233149452442460878?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/5008017311510568944/posts/default/233149452442460878?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/TheKhangaonkarReport/~3/x9NMCyEDuko/build-distributed-applications-using.html" title="Build distributed applications using Spring HTTP Invoker" /><author><name>Manoj Khangaonkar</name><uri>https://plus.google.com/105019772402521748386</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>2</thr:total><gd:extendedProperty name="commentSource" value="1" /><gd:extendedProperty name="commentModerationMode" value="FILTERED_POSTMOD" /><feedburner:origLink>http://khangaonkar.blogspot.com/2012/04/build-distributed-applications-using.html</feedburner:origLink></entry><entry 
gd:etag="W/&quot;CEYCRH8zfSp7ImA9WhVREEQ.&quot;"><id>tag:blogger.com,1999:blog-5008017311510568944.post-636640408138462879</id><published>2012-03-18T10:36:00.000-07:00</published><updated>2012-03-18T10:36:05.185-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2012-03-18T10:36:05.185-07:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="architecture web HA" /><title>High Availability for Web applications</title><content type="html">As more mission critical applications move to the cloud, making the application highly available becomes super critical. An application not available for whatever reason, web server down, database down etc mean lost users, lost revenue that can be devastating to your business. In this blog we examine some basic high availability concepts.&lt;br /&gt;
&lt;br /&gt;
Availability means your web application is available for your users to use. We would all like our applications to be available 100% of the time. But for various reasons that does not happen. The goal of high availability is to make the application available as much as possible. Generally, availability is expressed as the percentage of time per year that the application is available. One may say availability is 99% or 99.9% and so on. &lt;br /&gt;
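&lt;br /&gt;
To put those percentages in perspective, it helps to convert them into allowed downtime. Taking a year as 365 days (8760 hours), 99% availability allows about 87.6 hours of downtime a year and 99.9% allows about 8.76 hours. The small sketch below does the arithmetic. The class and method names are made up for illustration.&lt;br /&gt;

```java
// Hypothetical example class; the names are illustrative only.
class Downtime {

    // Hours per year the application may be unavailable at the given availability percentage.
    static double hoursPerYear(double availabilityPercent) {
        double hoursInYear = 365 * 24;   // 8760, ignoring leap years
        return (100.0 - availabilityPercent) / 100.0 * hoursInYear;
    }

    public static void main(String[] args) {
        for (double a : new double[] {99.0, 99.9, 99.99}) {
            System.out.printf("%.2f%% available: about %.2f hours of downtime per year%n",
                    a, hoursPerYear(a));
        }
    }
}
```

Each additional nine cuts the allowed downtime by a factor of ten, which is why the later phases in this article add redundancy at every layer.&lt;br /&gt;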
&lt;br /&gt;
Redundancy and failover are techniques used to achieve high availability. Redundancy is achieved by having multiple copies of your server. Instead of one Apache web server, you have two. One is the active server. The active server is monitored, and if for some reason it fails, you fail over to the second server, which becomes active. Another approach is to use a cluster of active servers, as is done in a Tomcat cluster. All servers are active. A load balancer distributes load among the members of the cluster. If one or two members of the cluster go down, no users are affected because the other servers continue processing. Of course, the load balancer can itself become a point of failure and needs redundancy and failover.&lt;br /&gt;
&lt;br /&gt;
If you were launching a new web application to the cloud, you might start off with a basic architecture as shown below, without any HA consideration.&lt;br /&gt;
&lt;br /&gt;
&lt;font color=blue&gt; &lt;b&gt; Phase 1: 1 Tomcat web server &lt;/b&gt; &lt;/font&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/-jMSvcYj7-84/T1J3fM5rwZI/AAAAAAAACxw/bIkTgSAdyZo/s1600/step1.png" imageanchor="1" style="clear:left; margin-right:1em; margin-bottom:1em"&gt;&lt;img border="0" height="298" width="400" src="http://4.bp.blogspot.com/-jMSvcYj7-84/T1J3fM5rwZI/AAAAAAAACxw/bIkTgSAdyZo/s400/step1.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Very soon you hit two issues. First, whenever the server goes down, by accident or by intent, your users cannot use the application. Second, as the number of users goes up, a single server is not able to handle the load.&lt;br /&gt;
&lt;br /&gt;
&lt;font color=blue&gt; &lt;b&gt; Phase 2: Tomcat cluster &lt;/b&gt; &lt;/font&gt;&lt;br /&gt;
&lt;br /&gt;
You add redundancy and scalability by using a Tomcat cluster as shown in the figure below. The cluster is fronted by an Apache web server with mod_proxy, which distributes requests to the individual servers. mod_proxy is the load balancer.&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-v45kc6SA9NI/T1vbdO7Q1VI/AAAAAAAACyI/JBClZgzMIbE/s1600/Step2.png" imageanchor="1" style="margin-left:1em; margin-right:1em"&gt;&lt;img border="0" height="300" width="400" src="http://2.bp.blogspot.com/-v45kc6SA9NI/T1vbdO7Q1VI/AAAAAAAACyI/JBClZgzMIbE/s400/Step2.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;
&lt;br /&gt;
Now the application scales horizontally. A Tomcat or application failure is not an issue because there are other servers in the cluster. But we have introduced a new point of failure, the load balancer. If Apache+mod_proxy goes down, the application is unavailable.&lt;br /&gt;
&lt;br /&gt;
To read more about setting up a tomcat cluster see &lt;a href="http://tomcat.apache.org/tomcat-7.0-doc/cluster-howto.html"&gt; Tomcat clustering&lt;/a&gt;&lt;br /&gt;
&lt;br /&gt;
To learn how to use a load balancer with tomcat see &lt;a href="http://tomcat.apache.org/tomcat-7.0-doc/balancer-howto.html"&gt; Loadbalancing with Tomcat&lt;/a&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;font color=blue&gt; &lt;b&gt; Phase 3: Highly available Tomcat cluster &lt;/b&gt; &lt;/font&gt;&lt;br /&gt;
The figure below shows how to eliminate the point of failure and make the load balancer highly available.&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-kxSe72R1r7c/T2YWyVBAXeI/AAAAAAAACyg/4AHybkBYqVA/s1600/step3%2B%25281%2529.png" imageanchor="1" style="margin-left:1em; margin-right:1em"&gt;&lt;img border="0" height="300" width="400" src="http://2.bp.blogspot.com/-kxSe72R1r7c/T2YWyVBAXeI/AAAAAAAACyg/4AHybkBYqVA/s400/step3%2B%25281%2529.png" /&gt;&lt;/a&gt;&lt;/div&gt;You add redundancy by adding a second Apache+mod_proxy. However, only one of the Apache servers is active. The second, passive server is not handling any requests. It merely monitors the active server using a tool like heartbeat. If for some reason the active server goes down, the passive server detects it, takes over the IP address and starts handling requests. How does this happen ?&lt;br /&gt;
&lt;br /&gt;
This is possible because the IP address for this application that is advertised to the world is shared by the two Apache servers. This is known as a virtual IP address. While the two servers share the virtual IP, TCP/IP routes packets only to the active server. When the active server goes down, the passive server tells TCP/IP to start routing packets intended for this IP address to it. There are TCP/IP commands that let a server start and stop listening on the virtual IP address.&lt;br /&gt;
&lt;br /&gt;
Tools like heartbeat and Ultramonkey enable you to maintain a heartbeat with another server and fail over when necessary. With heartbeat, there is a heartbeat process on each server. Config files hold information on the virtual IP address, the active server and the passive server. There are several articles on the internet on how to set up heartbeat.&lt;br /&gt;
&lt;br /&gt;
In summary, you can build highly available applications using open source tools. The key cocepts of HA, redundancy, monitoring &amp; failover, virtual ip address apply to any service and not just web servers. You can use the same concepts to make your database server highly available.&lt;img src="http://feeds.feedburner.com/~r/TheKhangaonkarReport/~4/m9Rvt_a10Sg" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://khangaonkar.blogspot.com/feeds/636640408138462879/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://khangaonkar.blogspot.com/2012/03/high-availability-for-web-applications.html#comment-form" title="2 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/5008017311510568944/posts/default/636640408138462879?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/5008017311510568944/posts/default/636640408138462879?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/TheKhangaonkarReport/~3/m9Rvt_a10Sg/high-availability-for-web-applications.html" title="High Availability for Web applications" /><author><name>Manoj Khangaonkar</name><uri>https://plus.google.com/105019772402521748386</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://4.bp.blogspot.com/-jMSvcYj7-84/T1J3fM5rwZI/AAAAAAAACxw/bIkTgSAdyZo/s72-c/step1.png" height="72" width="72" /><thr:total>2</thr:total><gd:extendedProperty name="commentSource" value="1" /><gd:extendedProperty name="commentModerationMode" value="FILTERED_POSTMOD" /><feedburner:origLink>http://khangaonkar.blogspot.com/2012/03/high-availability-for-web-applications.html</feedburner:origLink></entry><entry 
gd:etag="W/&quot;AkMFQn46fCp7ImA9WhRaF0w.&quot;"><id>tag:blogger.com,1999:blog-5008017311510568944.post-5101157289746735835</id><published>2012-02-19T22:13:00.000-08:00</published><updated>2012-02-19T22:13:33.014-08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2012-02-19T22:13:33.014-08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="java" /><title>Java Generics #2 : what is "? super T"  ?</title><content type="html">Consider the merge method below that copies all elements from List source to List target.&lt;br /&gt;
&lt;pre&gt;&lt;font color=blue&gt;public static &amp;lt;T&amp;gt; void merge(List&amp;lt;? super T&amp;gt; target, List&amp;lt;? extends T&amp;gt; source)&lt;/font&gt;&lt;/pre&gt;The &amp;lt;T&amp;gt; following static declares T as a new type variable for this method. We discussed "? extends T" in the blog &lt;a href="http://khangaonkar.blogspot.com/2012/01/java-generics-subtyping-using-wildcard.html"&gt;Java Generics #1&lt;/a&gt;. Here, let us examine "? super T". It is a wildcard that means any type that is a supertype of T, including T itself. If T is Integer, then List&amp;lt;? super T&amp;gt; could be List&amp;lt;Integer&amp;gt;, List&amp;lt;Number&amp;gt; or List&amp;lt;Object&amp;gt;. The code below shows the use of the merge method. In line 6, T is Integer and "? super T" is Number. In line 10, T is Number and "? super T" is Object.&lt;br /&gt;
&lt;pre&gt;&lt;font color=blue&gt;1 List&amp;lt;Integer&amp;gt; aInts = new ArrayList&amp;lt;Integer&amp;gt;() ;
2 aInts.add(5) ;
3 aInts.add(7) ;
4 List&amp;lt;Number&amp;gt; aNums = new ArrayList&amp;lt;Number&amp;gt;() ;
5 aNums.add(12.5) ;
6 MCollections.&amp;lt;Integer&amp;gt;merge(aNums,aInts) ; // works
7 System.out.println(aNums.toString()) ; // prints [12.5, 5, 7]
8 List&amp;lt;Object&amp;gt; aObjs = new ArrayList&amp;lt;Object&amp;gt;() ;
9 aObjs.add("hello") ;
10 MCollections.&amp;lt;Number&amp;gt;merge(aObjs,aNums) ; // works as well
11 System.out.println(aObjs.toString()) ; // prints [hello, 12.5, 5, 7]
&lt;/font&gt;&lt;/pre&gt;We discussed in the last blog that if you have a Collection&amp;lt;? extends T&amp;gt; you can get values out of it, but you cannot put stuff into it. So what can you do with Collection&amp;lt;? super T&amp;gt; ?&lt;br /&gt;
&lt;br /&gt;
In our merge example above, List&amp;lt;? super T&amp;gt; is the target, which means the implementation is putting elements into it. "? super T" means the element type of the list is T or some supertype of T. Whatever that type turns out to be, a value of type T is also a value of that type, so it is always safe to put a T into the list. &lt;br /&gt;
&lt;br /&gt;
The implementation of merge could be&lt;br /&gt;
&lt;pre&gt;&lt;font color=blue&gt;import java.util.List;

public class MCollections {
  // Copies every element of source to the end of target
  public static &amp;lt;T&amp;gt; void merge(List&amp;lt;? super T&amp;gt; target,
                               List&amp;lt;? extends T&amp;gt; source) {
    for (int i = 0; i &amp;lt; source.size(); i++) {
      T e = source.get(i);   // reading from "? extends T" yields a T
      target.add(e);         // writing a T into "? super T" is safe
    }
  }
} &lt;/font&gt;
&lt;/pre&gt;But if you were to do a get, what would be the returned type? The compiler cannot know which supertype of T the list actually holds, so the result cannot be assigned to any specific type. Hence it is not allowed, as shown in line 4 of the code below. &lt;br /&gt;
&lt;pre&gt;&lt;font color=blue&gt;1 List&amp;lt;? super Integer&amp;gt; aNums = new ArrayList&amp;lt;Number&amp;gt;() ;
2 aNums.add(11) ;
3 aNums.add(12) ; &lt;/font&gt;  
4 &lt;font color=red&gt;Number n = aNums.get(0) ; // Compilation Error - not allowed &lt;/font&gt;
5 &lt;font color=blue&gt;Object o = aNums.get(0) ; // allowed -- No compile error &lt;/font&gt;
&lt;/pre&gt;The exception to the rule is getting an Object, which is allowed because Object is a supertype of every other Java reference type.&lt;br /&gt;
&lt;br /&gt;
In summary, you can enable subtyping using "? super T" when you need to put objects into the collection. ( But you can get them out only as Object). You can enable subtyping using ? extends T when you need to get objects out of the collection. It follows that if you need to do both get and put, then you cannot use either of these wildcard mechanisms and you need to use a explicit type.&lt;img src="http://feeds.feedburner.com/~r/TheKhangaonkarReport/~4/Vx_Qh7JyfSk" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://khangaonkar.blogspot.com/feeds/5101157289746735835/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://khangaonkar.blogspot.com/2012/02/java-generics-2-what-is-super-t.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/5008017311510568944/posts/default/5101157289746735835?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/5008017311510568944/posts/default/5101157289746735835?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/TheKhangaonkarReport/~3/Vx_Qh7JyfSk/java-generics-2-what-is-super-t.html" title="Java Generics #2 : what is &quot;? super T&quot;  ?" 
/><author><name>Manoj Khangaonkar</name><uri>https://plus.google.com/105019772402521748386</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>0</thr:total><gd:extendedProperty name="commentSource" value="1" /><gd:extendedProperty name="commentModerationMode" value="FILTERED_POSTMOD" /><feedburner:origLink>http://khangaonkar.blogspot.com/2012/02/java-generics-2-what-is-super-t.html</feedburner:origLink></entry><entry gd:etag="W/&quot;DkEAQnw6cCp7ImA9WhRaF00.&quot;"><id>tag:blogger.com,1999:blog-5008017311510568944.post-5360899428435192165</id><published>2012-01-15T15:09:00.000-08:00</published><updated>2012-02-19T18:24:03.218-08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2012-02-19T18:24:03.218-08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="java" /><title>Java Generics #1 : Subtyping using wildcard with extends</title><content type="html">Generics is one of those more complicated language features in Java that is not well understood by many programmers. Many avoid it altogether. This is not without reason. While writing your program, if you have to stop and think a lot about syntax, there is more than a good chance, you would try to avoid that language construct. In this blog I discuss one type of subtyping with generics which can be tricky.&lt;br /&gt;
&lt;br /&gt;
In Java we know that Integer extends Number. In other words, Integer is a subtype of Number. Anywhere a Number is required, you can pass in an Integer. But does this mean that List&amp;lt;Integer&amp;gt; is a subtype of List&amp;lt;Number&amp;gt; ?&lt;br /&gt;
&lt;br /&gt;
Consider the code. Will it work ?&lt;br /&gt;
&lt;pre&gt;&lt;font color=blue&gt;1 List&amp;lt;Integer&amp;gt; aList = new ArrayList&amp;lt;Integer&amp;gt;() ;
2 aList.add(11) ;
3 aList.add(13) ;
4 List&amp;lt;Number&amp;gt; nList = aList ;
5 nList.add(11.5) ;
&lt;/font&gt;&lt;/pre&gt;aList is a list of Integers. nList is a list of Numbers. In line 4, nList is made to reference aList. In line 5, we add a double through nList, but the underlying list is aList, which is a list of Integers. This is obviously not correct, and Java will not allow it: line 4 causes a compilation error. So List&amp;lt;Integer&amp;gt; is not a subtype of List&amp;lt;Number&amp;gt;. But sometimes we want to be able to use subtypes. Generics have the concept of wildcards that enable subtyping when logically appropriate.&lt;br /&gt;
&lt;br /&gt;
Consider the addAll method of the Collection interface.&lt;br /&gt;
&lt;pre&gt;&lt;font color=blue&gt;interface Collection&amp;lt;T&amp;gt; {
   public boolean addAll(Collection&amp;lt;? extends T&amp;gt; x) ;

}
&lt;/font&gt;&lt;/pre&gt;"? extends T" says that, given a collection of type T, you can add to it elements from any collection whose element type is T or a subtype of T. The following code is valid.&lt;br /&gt;
&lt;pre&gt;&lt;font color=blue&gt;1 List&amp;lt;Number&amp;gt; aList = new ArrayList&amp;lt;Number&amp;gt;() ;
2 List&amp;lt;Integer&amp;gt; intList = Arrays.asList(11,12) ;
3 List&amp;lt;Double&amp;gt; dList = Arrays.asList(15.15) ;
4 aList.addAll(intList) ;
5 aList.addAll(dList) ;
&lt;/font&gt;&lt;/pre&gt;The implementation of the addAll method gets elements from the collection passed in as the parameter and puts them into the target collection. Note that it only does a get operation on Collection&amp;lt;? extends T&amp;gt;. A put on Collection&amp;lt;? extends T&amp;gt; would, however, not be allowed. To understand why, consider the code below.&lt;br /&gt;
&lt;pre&gt;&lt;font color=blue&gt;1 List &amp;lt;? extends Number&amp;gt; numList ;
2 List&amp;lt;Integer&amp;gt; intList = Arrays.asList(11,12) ;
3 numList = intList ; // Will this work ?
4 numList.add(5.67) ; // Will this work ?
&lt;/font&gt;&lt;/pre&gt;Should line 3 work ? What about line 4 ?&lt;br /&gt;
The Java compiler allows line 3 because List&amp;lt;Integer&amp;gt; is considered a subtype of List &amp;lt;? extends Number&amp;gt;. But line 4 is a compilation error because you should not be allowed to add a double to List&amp;lt;Integer&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
In summary, when you have a Collection&amp;lt;? extends T&amp;gt;, it is safe to get elements out of the collection but not safe to put elements into it. Hence the compiler does not allow it.&lt;img src="http://feeds.feedburner.com/~r/TheKhangaonkarReport/~4/3Bx66awHWzE" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://khangaonkar.blogspot.com/feeds/5360899428435192165/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://khangaonkar.blogspot.com/2012/01/java-generics-subtyping-using-wildcard.html#comment-form" title="1 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/5008017311510568944/posts/default/5360899428435192165?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/5008017311510568944/posts/default/5360899428435192165?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/TheKhangaonkarReport/~3/3Bx66awHWzE/java-generics-subtyping-using-wildcard.html" title="Java Generics #1 : Subtyping using wildcard with extends" /><author><name>Manoj Khangaonkar</name><uri>https://plus.google.com/105019772402521748386</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>1</thr:total><gd:extendedProperty name="commentSource" value="1" /><gd:extendedProperty name="commentModerationMode" value="FILTERED_POSTMOD" /><feedburner:origLink>http://khangaonkar.blogspot.com/2012/01/java-generics-subtyping-using-wildcard.html</feedburner:origLink></entry><entry gd:etag="W/&quot;A0QMRHwzeip7ImA9WhRXEk8.&quot;"><id>tag:blogger.com,1999:blog-5008017311510568944.post-6738031263630481</id><published>2011-12-18T09:29:00.000-08:00</published><updated>2011-12-18T09:29:45.282-08:00</updated><app:edited 
xmlns:app="http://www.w3.org/2007/app">2011-12-18T09:29:45.282-08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="security architecture web" /><title>Single Sign On for the cloud: SAML &amp; OpenId</title><content type="html">When accessing different applications owned by different organizations, having to authenticate every time you go from one application to another is annoying. Not only is it time consuming, but you also have to remember multiple passwords, which are often forgotten.&lt;br /&gt;
&lt;br /&gt;
Single sign on is the ability to authenticate once and be able to move between applications seamlessly using the authenticated identity.&lt;br /&gt;
&lt;br /&gt;
Within an intranet or between applications that are under the control of one development organization, single sign on for web applications can be easily implemented by generating a session id and passing it around using cookies. However, such a solution is proprietary and will not work if you need to leave the intranet and access other applications in the cloud. To interoperate with applications in the cloud, a more standards based solution is required.&lt;br /&gt;
&lt;br /&gt;
A related concept and benefit is federated identity. Organizations can agree on a common name to refer to users. The user and their attributes need to be created in only one place, and other organizations can refer to this information.&lt;br /&gt;
&lt;br /&gt;
In this post, we briefly examine two popular protocols that can be used for single sign on in the cloud: SAML and OpenId.&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt; &lt;font color=blue size=5&gt;OpenId &lt;/font&gt; &lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
The problem OpenId solves is that you as a user do not have to maintain and provide a password to each and every site you visit.&lt;br /&gt;
&lt;br /&gt;
You maintain your password or other identifying credential with one provider known as the OpenId provider.&lt;br /&gt;
&lt;br /&gt;
The website or application that you visit, and that requires proof of who you are, relies on the OpenId provider to verify that you are who you claim to be. This website or application is known as the relying party. &lt;br /&gt;
&lt;br /&gt;
The basics of the OpenId protocol are:&lt;br /&gt;
&lt;br /&gt;
1. You visit a web application (the relying party) and enter your OpenId.&lt;br /&gt;
&lt;br /&gt;
2. Based on your OpenId, the relying party determines who your OpenId provider is.&lt;br /&gt;
&lt;br /&gt;
3. The relying party redirects your request to the OpenId provider.&lt;br /&gt;
&lt;br /&gt;
4. The OpenId provider authenticates you by asking for a password or other information. The provider warns you that the relying party is requesting information about you. If you are already authenticated with the provider, this step is skipped.&lt;br /&gt;
&lt;br /&gt;
5. The provider redirects your browser back to the relying party, where you are shown the page you were trying to access.&lt;br /&gt;
&lt;br /&gt;
The protocol does not require providers or relying parties to be registered anywhere. It uses plain HTTP requests and responses. The protocol messages are plain text key value pairs. The protocol works well with modern "Web 2.0" AJAX style applications.&lt;br /&gt;
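&lt;br /&gt;
To make the key value format concrete, here is a minimal Java sketch of the Key-Value Form encoding from the OpenId 2.0 specification, in which each field is written as key:value followed by a newline. The class name KeyValueForm and the example.com identifier are made up for illustration; this is not how openid4java is structured.&lt;br /&gt;

```java
final class KeyValueForm {

    /** Encodes ordered key/value pairs as "key:value" lines, each ending in a newline. */
    static String encode(String[][] fields) {
        StringBuilder out = new StringBuilder();
        for (String[] field : fields) {
            String key = field[0];
            String value = field[1];
            // Keys may not contain ':' or a newline; values may not contain a newline.
            if (key.contains(":") || key.contains("\n") || value.contains("\n")) {
                throw new IllegalArgumentException("illegal character in field " + key);
            }
            out.append(key).append(':').append(value).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String[][] fields = {
            {"ns", "http://specs.openid.net/auth/2.0"},
            {"mode", "id_res"},
            {"claimed_id", "https://alice.example.com/"}, // hypothetical identifier
        };
        System.out.print(encode(fields));
    }
}
```

When messages travel through browser redirects, the same fields are sent as URL parameters with an openid. prefix (for example openid.mode) rather than as lines of text.&lt;br /&gt;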
&lt;br /&gt;
The OpenId protocol originated in the world of consumer oriented websites such as Google, Twitter and Facebook, and that is where it is popular.&lt;br /&gt;
&lt;br /&gt;
The OpenId specification is described at &lt;a href=http://openid.net/specs/openid-authentication-2_0.html&gt; OpenId specification &lt;/a&gt;&lt;br /&gt;
There is a Java implementation of OpenId at &lt;a href=http://code.google.com/p/openid4java&gt; openid4java &lt;/a&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt; &lt;font color=blue size=5&gt; SAML (Security Assertion Markup Language) &lt;/font&gt; &lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
SAML is an XML based protocol that enables web based authentication, authorization and single sign on.&lt;br /&gt;
&lt;br /&gt;
SAML involves a relying party requesting an assertion and a SAML provider responding with the assertion.&lt;br /&gt;
&lt;br /&gt;
Examples of assertions are :&lt;br /&gt;
Authentication Assertion : This user was authenticated using such and such method at time t.&lt;br /&gt;
Attribute Assertion : This user has a title supermanager.&lt;br /&gt;
Authorization Assertion : This user has permission to delete file xyz.doc.&lt;br /&gt;
&lt;br /&gt;
A typical SAML interaction would be as follows:&lt;br /&gt;
&lt;br /&gt;
1. A user tries to access a URL or a web application, which is the relying party.&lt;br /&gt;
2. The relying party creates a SAML authentication request.&lt;br /&gt;
3. The relying party redirects the user's browser to a SAML provider. Embedded in the redirect is the SAML authentication request.&lt;br /&gt;
4. The SAML provider evaluates the SAML request and authenticates the user.&lt;br /&gt;
5. The SAML provider returns a SAML authentication response to the user's browser.&lt;br /&gt;
6. The browser forwards the SAML response back to the relying party.&lt;br /&gt;
7. The relying party verifies and interprets the SAML response.&lt;br /&gt;
8. If the response implies successful authentication, the user is redirected to the URL they were originally trying to reach. &lt;br /&gt;
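&lt;br /&gt;
Step 7 is where most of the care is needed. As a rough sketch, assuming the XML has already been parsed and its signature already verified by a library such as OpenSAML, the remaining checks on an authentication assertion might look like the Java below. The SamlResponseCheck and Assertion classes are hypothetical; the field names mirror the issuer, audience and validity window that a SAML assertion carries.&lt;br /&gt;

```java
import java.time.Instant;

final class SamlResponseCheck {

    /** Hypothetical, already parsed view of an authentication assertion. */
    static final class Assertion {
        final String issuer;         // who made the assertion
        final String audience;       // which relying party it is meant for
        final Instant notBefore;     // start of the validity window
        final Instant notOnOrAfter;  // end of the validity window (exclusive)

        Assertion(String issuer, String audience, Instant notBefore, Instant notOnOrAfter) {
            this.issuer = issuer;
            this.audience = audience;
            this.notBefore = notBefore;
            this.notOnOrAfter = notOnOrAfter;
        }
    }

    /** Returns true only if every check passes. Signature checking is NOT shown here. */
    static boolean isAcceptable(Assertion a, String trustedIssuer, String myEntityId, Instant now) {
        if (!a.issuer.equals(trustedIssuer)) {
            return false; // issued by a provider we do not trust
        }
        if (!a.audience.equals(myEntityId)) {
            return false; // issued for some other relying party
        }
        if (now.isBefore(a.notBefore)) {
            return false; // not valid yet
        }
        return now.isBefore(a.notOnOrAfter); // false once the assertion has expired
    }
}
```

A real relying party should leave parsing and signature validation to a SAML library and treat a sketch like this only as a reminder of what "verifies and interprets" involves.&lt;br /&gt;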
&lt;br /&gt;
SAML has the concept of profiles. The interaction is different based on the profile. The interaction above is the Web SSO profile. &lt;br /&gt;
&lt;br /&gt;
SAML has its origins in enterprise software, web services and B2B communication, and dates from the early 2000s when XML was very popular. In fact, SAML 1.x had only a SOAP binding.&lt;br /&gt;
&lt;br /&gt;
The SAML specification is at &lt;a href=http://saml.xml.org/saml-specifications&gt; SAML Specification &lt;/a&gt;&lt;br /&gt;
There is a SAML implementation at &lt;a href=https://wiki.shibboleth.net/confluence/display/OpenSAML/Home&gt; OpenSAML &lt;/a&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt; &lt;font color=blue&gt; Which protocol should I use ? &lt;/font&gt; &lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
OpenId is a simpler protocol, but SAML has more features.&lt;br /&gt;
&lt;br /&gt;
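OpenId supports discovery of the OpenId provider. With SAML, the relying party and the provider are typically configured to know about each other in advance, and using SAML has generally required expensive integration projects.&lt;br /&gt;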
&lt;br /&gt;
OpenId supports only service provider initiated SSO. You go to a service provider web site and they need authentication. They start the conversation with the OpenId provider. SAML can also support identity provider initiated SSO. Say you are authenticated into your company's portal, and your company has a partner travel website for business travel. With SAML, you can go from your company's portal (the SAML provider) to the partner website (the relying party) without needing to reauthenticate.&lt;br /&gt;
&lt;br /&gt;
SAML has been around longer than OpenId. SAML is more popular in the enterprise, whereas OpenId is more popular in consumer oriented applications.&lt;br /&gt;
&lt;br /&gt;
Both OpenId and SAML rely on external transport layer security protocols such as SSL/TLS to protect protocol messages in transit.&lt;br /&gt;
&lt;br /&gt;
If you are starting a new website and want to accept users from other popular sites such as Google or Twitter, you might consider OpenId. However, if you are an enterprise and you want your authenticated users to access your partner sites without re-authentication, you might need SAML.&lt;br /&gt;
&lt;br /&gt;
In summary, SAML is a feature rich protocol more popular in the enterprise. OpenId is simpler protocol with some limitations.&lt;img src="http://feeds.feedburner.com/~r/TheKhangaonkarReport/~4/HV5u465m_WM" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://khangaonkar.blogspot.com/feeds/6738031263630481/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://khangaonkar.blogspot.com/2011/12/single-sign-on-for-cloud-saml-openid.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/5008017311510568944/posts/default/6738031263630481?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/5008017311510568944/posts/default/6738031263630481?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/TheKhangaonkarReport/~3/HV5u465m_WM/single-sign-on-for-cloud-saml-openid.html" title="Single Sign On for the cloud: SAML &amp; OpenId" /><author><name>Manoj Khangaonkar</name><uri>https://plus.google.com/105019772402521748386</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>0</thr:total><gd:extendedProperty name="commentSource" value="1" /><gd:extendedProperty name="commentModerationMode" value="FILTERED_POSTMOD" /><feedburner:origLink>http://khangaonkar.blogspot.com/2011/12/single-sign-on-for-cloud-saml-openid.html</feedburner:origLink></entry><entry gd:etag="W/&quot;Dk8NSHk5eSp7ImA9WhRSF0s.&quot;"><id>tag:blogger.com,1999:blog-5008017311510568944.post-590033416923967457</id><published>2011-11-19T20:54:00.000-08:00</published><updated>2011-11-19T20:54:59.721-08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-11-19T20:54:59.721-08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="scalability 
cloud database" /><title>What is NoSQL ?</title><content type="html">NoSQL is a term used to refer to a class of database systems that differ from the traditional relational database management systems (RDBMS) in many ways. RDBMSs are accessed using SQL. Hence the term NoSQL implies not accessed by SQL. More specifically not RDBMS or more accurately not relational. &lt;br /&gt;
&lt;br /&gt;
Some key characteristics of NoSQL databases are:&lt;br /&gt;
&lt;font color=blue&gt; &lt;ul&gt;&lt;li&gt;They are distributed, can scale horizontally and can handle data volumes of the order of several terabytes or petabytes, with low latency.&lt;/li&gt;
&lt;li&gt;They have less rigid schemas than a traditional RDBMS.&lt;/li&gt;
&lt;li&gt;They have weaker transactional guarantees.&lt;/li&gt;
&lt;li&gt; As suggested by the name, these databases do not support SQL.&lt;/li&gt;
&lt;li&gt;Many NoSQL databases model data as rows with column families, key value pairs or documents.&lt;/li&gt;
&lt;/ul&gt; &lt;/font&gt;&lt;br /&gt;
To understand what non relational means, it might be useful to recap what relational means.&lt;br /&gt;
&lt;br /&gt;
Theoretically, relational databases comply with Codd's 12 rules for the relational model. More simply, in an RDBMS, a table is a relation and a database has a set of such relations. A table has rows and columns. Each table has constraints, and the database enforces the constraints to ensure the integrity of data. Each row in a table is identified by a primary key, and tables are related using foreign keys. You eliminate duplicate data during the process of normalization, by moving columns into separate tables but keeping the relation using foreign keys. Getting data out of multiple tables requires joining the tables using the foreign keys. This relational model has been useful in modeling most real world problems and has been in widespread use for the last 20 years.&lt;br /&gt;
&lt;br /&gt;
In addition, RDBMS vendors have gone to great lengths to ensure that RDBMSs do a great job of maintaining ACID (atomic, consistent, isolated, durable) transactional properties for the data stored. Recovery from unexpected failures is supported. This has led to relational databases becoming the de facto standard for storing enterprise data.&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;If RDBMSs are so good, why does anyone need NoSQL databases ?&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
Even the largest enterprises have users only in the order of thousands and data requirements in the order of a few terabytes. But when your application is on the internet, where you are dealing with millions of users and data in the order of petabytes, things start to slow down with an RDBMS. The basic operations with any database are read and write. Reads can be scaled by replicating data to multiple machines and load balancing read requests. However, this does not work for writes because data consistency needs to be maintained. Writes can be scaled only by partitioning the data. But this affects reads, as distributed joins can be slow and hard to implement. Additionally, to maintain ACID properties, databases need to lock data at the cost of performance. &lt;br /&gt;
&lt;br /&gt;
Companies such as Google, Facebook and Twitter have found that relaxing the constraints of RDBMSs and distributing data gives them better performance for use cases where&lt;br /&gt;
&lt;font color=blue&gt; &lt;ul&gt;&lt;li&gt;Datasets are large, of the order of petabytes. Typically this data needs to be stored using multiple machines.&lt;/li&gt;
&lt;li&gt;The application does a lot of writes.&lt;/li&gt;
&lt;li&gt;Reads require low latency. &lt;/li&gt;
&lt;li&gt;Data is semi structured.&lt;/li&gt;
&lt;li&gt;You need to be able to scale without hitting a bottleneck.&lt;/li&gt;
&lt;li&gt;The application knows what it is looking for. Ad hoc queries are not required. &lt;/li&gt;
&lt;/ul&gt;&lt;/font&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;What are the NoSQL solutions out there ? &lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
There are a few different types.&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt; 1. Key Value Stores &lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
They allow clients to read and write values using a key. Amazon's Dynamo is an example of a key value store.&lt;br /&gt;
&lt;br /&gt;
get(key) returns an object or list of objects&lt;br /&gt;
put(key,object) store the object as a blob&lt;br /&gt;
&lt;br /&gt;
Dynamo uses consistent hashing to partition data across the hosts that store it. To ensure high availability, each write is replicated across several hosts. Hosts are equal and there is no master. The advantage of Dynamo is that the key value model is simple and it is highly available for writes.&lt;br /&gt;
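&lt;br /&gt;
The partitioning and replication idea can be sketched in a few lines of Java. This is a simplified illustration and not Dynamo's implementation: Dynamo uses consistent hashing on a ring, while the sketch below uses plain modulo hashing to stay short. The HashPartitioner class and its method names are hypothetical.&lt;br /&gt;

```java
final class HashPartitioner {
    private final String[] hosts;
    private final int replicas;

    HashPartitioner(String[] hosts, int replicas) {
        if (hosts.length == 0) {
            throw new IllegalArgumentException("at least one host is required");
        }
        this.hosts = hosts.clone();
        // Never ask for more copies than there are hosts, and always keep at least one.
        this.replicas = Math.max(1, Math.min(replicas, hosts.length));
    }

    /** Returns the hosts that should store the given key, primary host first. */
    String[] hostsFor(String key) {
        String[] result = new String[replicas];
        // floorMod keeps the index non-negative even when hashCode() is negative.
        int first = Math.floorMod(key.hashCode(), hosts.length);
        for (int i = 0; i != replicas; i++) {
            // Extra copies go to the hosts that follow the primary, wrapping around.
            result[i] = hosts[(first + i) % hosts.length];
        }
        return result;
    }
}
```

A put(key, object) would write the blob to every host returned by hostsFor(key), and a get(key) could read from any of them. The weakness of modulo hashing is that adding or removing a host moves most keys, which is the problem consistent hashing is designed to reduce.&lt;br /&gt;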
&lt;br /&gt;
&lt;b&gt; 2. Document stores &lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
The key value pairs that make up the data are encapsulated as a document. Apache CouchDB is an example of a document store. In CouchDB , documents have fields. Each field has a key and value. A document could be&lt;br /&gt;
&lt;pre&gt;"firstname " : " John ",
"lastname " : "Doe" ,
"street " : "1 main st",
"city " : "New york"
&lt;/pre&gt;In CouchDB, distribution and replication are peer to peer. The client interface is RESTful HTTP, which integrates well with existing HTTP load balancing solutions.&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt; 3. Column based stores &lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
Reads and writes are done using columns rather than rows. The best known examples are Google's BigTable and the likes of HBase and Cassandra that were inspired by BigTable. The BigTable paper says that BigTable is a sparse, distributed, persistent, multidimensional sorted map. While that sentence seems complicated, taking each word individually makes it clear. &lt;br /&gt;
sparse - some cells can be empty&lt;br /&gt;
distributed - data is partitioned across many hosts&lt;br /&gt;
persistent - stored to disk&lt;br /&gt;
multidimensional - values are indexed by more than one key (row, column and timestamp)&lt;br /&gt;
Map - key and value&lt;br /&gt;
sorted - maps are generally not sorted but this one is&lt;br /&gt;
&lt;br /&gt;
This sample might help you visualize a BigTable map&lt;br /&gt;
&lt;pre&gt;{
row1:{
    user:{
          name: john
          id : 123
    },
    post: {
          title:This is a post    
          text : xyxyxyxx
    }
}
row2:{
    user:{
          name: joe
          id : 124
    },
    post: {
          title:This is a post    
          text : xyxyxyxx
    }
}
row3:{
    user:{
          name: jill
          id : 125
    },
    post: {
          title:This is a post    
          text : xyxyxyxx
    }
}

}
&lt;/pre&gt;The outermost keys row1, row2 and row3 are analogous to rows. user and post are what are called column families. The column family user has columns name and id. post has columns title and text. Columnfamily:column is how you refer to a column, for example user:id or post:text. In HBase, the column families need to be specified when you create the table, but columns can be added on the fly. HBase provides high availability and scalability using a master slave architecture.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt; Do I need a NoSQL store ? &lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
You do not need a NoSQL store if &lt;br /&gt;
&lt;br /&gt;
&lt;font color=blue&gt;&lt;ul&gt;&lt;li&gt;All your data fits into 1 machine and does not need to be partitioned.&lt;/li&gt;
&lt;li&gt;You are doing OLTP, which requires the ACID transaction properties and data consistency that RDBMSs are good at.&lt;/li&gt;
&lt;li&gt; You need ad hoc querying using a language like SQL. &lt;/li&gt;
&lt;li&gt; You have complicated relationships between the entities in your applications. &lt;/li&gt;
&lt;li&gt; Decoupling data from application is important to you.&lt;/li&gt;
&lt;/ul&gt;&lt;/font&gt;&lt;br /&gt;
&lt;br /&gt;
You might want to start considering NoSQL stores if &lt;br /&gt;
&lt;br /&gt;
&lt;font color=blue&gt;&lt;ul&gt;&lt;li&gt;Your data has grown so large that it can no longer be handled without partitioning.&lt;/li&gt;
&lt;li&gt;Your RDBMS can no longer handle the load. &lt;/li&gt;
&lt;li&gt;You need very high write performance and low latency reads.&lt;/li&gt;
&lt;li&gt;Your data is not very structured. &lt;/li&gt;
&lt;li&gt;You can have no single point of failure. &lt;/li&gt;
&lt;li&gt;You can tolerate some data inconsistency. &lt;/li&gt;
&lt;/ul&gt; &lt;/font&gt;&lt;br /&gt;
Bottomline is that NoSql stores are a new and complex technology. There are many choices and no standards. There are specific use cases for which NoSql is a good fit. But RDBMS does just fine for most vanilla use cases.&lt;img src="http://feeds.feedburner.com/~r/TheKhangaonkarReport/~4/CTLoBYBBxMg" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://khangaonkar.blogspot.com/feeds/590033416923967457/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://khangaonkar.blogspot.com/2011/11/what-is-nosql.html#comment-form" title="4 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/5008017311510568944/posts/default/590033416923967457?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/5008017311510568944/posts/default/590033416923967457?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/TheKhangaonkarReport/~3/CTLoBYBBxMg/what-is-nosql.html" title="What is NoSQL ?" 
/><author><name>Manoj Khangaonkar</name><uri>https://plus.google.com/105019772402521748386</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>4</thr:total><gd:extendedProperty name="commentSource" value="1" /><gd:extendedProperty name="commentModerationMode" value="FILTERED_POSTMOD" /><feedburner:origLink>http://khangaonkar.blogspot.com/2011/11/what-is-nosql.html</feedburner:origLink></entry><entry gd:etag="W/&quot;CEICR3w_eyp7ImA9WhdbGUU.&quot;"><id>tag:blogger.com,1999:blog-5008017311510568944.post-1332761830472358933</id><published>2011-10-18T17:02:00.000-07:00</published><updated>2011-10-18T17:02:46.243-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-10-18T17:02:46.243-07:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="java security shiro" /><title>Apache Shiro : Application Security Made Easy</title><content type="html">Considering that JAVA is over 10+ years old, the number of choices for application developers that need to build authentication and authorization into their applications is shockingly low.&lt;br /&gt;
&lt;br /&gt;
In JAVA &amp; J2EE, the JAAS specification was an attempt to address security. While JAAS works for authentication, the authorization part is just too cumbersome to use. The EJB and Servlet specifications offer coarse grained authorization at a method and resource level. But these are too coarse to be of any use in real world applications. For Spring users, Spring Security is an alternative. But it is a little complicated to use, especially the authorization model. A majority of applications end up building their home grown solutions for authentication and authorization.&lt;br /&gt;
&lt;a href="http://shiro.apache.org/"&gt;&lt;br /&gt;
Apache Shiro&lt;/a&gt; is a open source JAVA security framework that addresses this problem. It is an elegant framework that lets you add authentication, authorization and session management to your application with ease.&lt;br /&gt;
&lt;br /&gt;
&lt;font color=blue&gt; &lt;b&gt; The highlights of Shiro are: &lt;/b&gt; &lt;/font&gt;&lt;br /&gt;
&lt;br /&gt;
It is a pure Java framework. It works with all kinds of Java applications: J2SE, J2EE, web, standalone or distributed.&lt;br /&gt;
&lt;br /&gt;
It can integrate easily with various repositories that may host user and permissions metadata, such as relational databases and LDAP directories.&lt;br /&gt;
&lt;br /&gt;
It has a simple and intuitive permissions model that can apply to a wide variety of problem domains. It is a model that lets you focus on your problem domain without getting bogged down in the framework.&lt;br /&gt;
&lt;br /&gt;
It has built in support for session management.&lt;br /&gt;
&lt;br /&gt;
It has built in support for caching metadata.&lt;br /&gt;
&lt;br /&gt;
It integrates very easily with Spring. Same applies to any J2EE application server.&lt;br /&gt;
&lt;br /&gt;
Most importantly, it is very easy to use. Most of the time, all you will need to do to integrate Shiro is implement a Realm that ties Shiro to your user and permissions metadata.&lt;br /&gt;
&lt;br /&gt;
&lt;font color="blue"&gt; &lt;b&gt; Shiro Concepts &lt;/b&gt; &lt;/font&gt;&lt;br /&gt;
&lt;br /&gt;
The SecurityManager encapsulates the security configuration of an application that uses Shiro.&lt;br /&gt;
&lt;br /&gt;
Subject is the runtime's view of a user that is using the system. When the subject is created, it is not authenticated. For authentication, the login method must be called, passing in the proper credentials. &lt;br /&gt;
&lt;br /&gt;
Session represents the session associated with an authenticated Subject. The session has a session id. Applications can store arbitrary data in the session. The session is valid until the user logs out or the session times out.&lt;br /&gt;
&lt;br /&gt;
A permission represents what actions a subject may perform on a resource in the application. Out of the box Shiro supports permissions represented by colon separated tokens. Each token has some logical meaning. For example, my application may define a permission as ResourceType:actions:ResourceInstance. More concretely File:read:contacts.doc represents a permission to read a file contacts.doc. The permission must be associated with a user, to grant that permission to the user.&lt;br /&gt;
&lt;br /&gt;
A Role is a collection of permissions that might represent ability to perform some organizational function. Roles make the association between users and permissions more manageable.&lt;br /&gt;
&lt;br /&gt;
A Realm abstracts your user, permission and role metadata for Shiro. You make this data available to Shiro by implementing a realm and plugging it into Shiro. Typical realms use either a relational database or LDAP to store user data.&lt;br /&gt;
&lt;br /&gt;
&lt;font color=blue&gt; &lt;b&gt; Tutorial &lt;/b&gt; &lt;/font&gt;&lt;br /&gt;
Let us build a simple java application that does some authentication and authorization. For this tutorial you will need:&lt;br /&gt;
(1)&lt;a href="http://shiro.apache.org/index.html"&gt; Apache Shiro&lt;/a&gt;&lt;br /&gt;
(2) A java development environment. I use &lt;a href="http://www.eclipse.org"&gt;Eclipse&lt;/a&gt;. But you can use other IDEs or command line tools as well.&lt;br /&gt;
(3) You may download the source code for this example at &lt;a href="https://sites.google.com/site/khangaonkar/home/shirosamples"&gt;simpleshiro.zip&lt;/a&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;font color=purple&gt; &lt;b&gt; Step 1: Create a Shiro.ini configuration file&lt;/b&gt;&lt;/font&gt;&lt;br /&gt;
&lt;br /&gt;
We will use the default file based realm that comes with Shiro. This reads the user/permission metadata from the shiro.ini file. In a subsequent tutorial, I will show how to build a realm that gets data from a relational database.&lt;br /&gt;
&lt;br /&gt;
In the ini file, let us define some users and associate some roles with them.&lt;br /&gt;
&lt;font color=green&gt;&lt;br /&gt;
# Simple shiro.ini file&lt;br /&gt;
[users]&lt;br /&gt;
# user admin with password 123456 and role Administrator&lt;br /&gt;
admin = 123456, Administrator&lt;br /&gt;
# user mike with password abcdef and role Reader&lt;br /&gt;
mike = abcdef, Reader&lt;br /&gt;
# user joe with password !23abC2 and role Writer&lt;br /&gt;
joe = !23abC2, Writer&lt;br /&gt;
# -----------------------------------------------------------------------------&lt;br /&gt;
# Roles with assigned permissions&lt;br /&gt;
[roles]&lt;br /&gt;
# A permission is modeled as Resourcetype:actions:resourceinstances&lt;br /&gt;
# Administrator has permission to do all actions on all resources&lt;br /&gt;
Administrator = *:*:*&lt;br /&gt;
# Reader has permission to read all files&lt;br /&gt;
Reader = File:read:*&lt;br /&gt;
# Writer role has permission to read and write all files&lt;br /&gt;
Writer = File:read,write:*&lt;br /&gt;
&lt;/font&gt;&lt;br /&gt;
In the above shiro.ini we have defined 3 users and 3 roles. A permission is modeled as colon separated tokens. Each token can have multiple comma separated parts, and * in a token matches any value. Each token and part grants permission within some application specific domain.&lt;br /&gt;
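&lt;br /&gt;
To see how such permission strings are evaluated, here is a small plain Java sketch of the matching rule. It is a simplified illustration and not Shiro's own implementation: the SimplePermission class is hypothetical, it only accepts the three token form used in this tutorial, and it compares tokens case sensitively.&lt;br /&gt;

```java
import java.util.Arrays;

final class SimplePermission {

    /** Returns true if the granted permission string covers the requested one. */
    static boolean implies(String granted, String requested) {
        String[] g = granted.split(":");
        String[] r = requested.split(":");
        if (g.length != 3 || r.length != 3) {
            throw new IllegalArgumentException("expected ResourceType:actions:ResourceInstance");
        }
        for (int i = 0; i != 3; i++) {
            // A granted token may list several comma separated parts, e.g. "read,write".
            boolean wildcard = Arrays.asList(g[i].split(",")).contains("*");
            boolean listed = Arrays.asList(g[i].split(",")).contains(r[i]);
            if (!(wildcard || listed)) {
                return false; // this token of the request is not covered
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Reader role from the shiro.ini above
        System.out.println(implies("File:read:*", "File:read:xyz.doc"));
        System.out.println(implies("File:read:*", "File:write:xyz.doc"));
        // Writer role from the shiro.ini above
        System.out.println(implies("File:read,write:*", "File:write:xyz.doc"));
    }
}
```

With the roles defined above, the Reader permission covers reading any file but not writing, while the Writer and Administrator permissions cover both.&lt;br /&gt;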
&lt;br /&gt;
&lt;font color=purple&gt;&lt;b&gt;Step 2: Bootstrap Shiro into your application&lt;/b&gt;&lt;/font&gt;&lt;br /&gt;
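&lt;pre&gt;// Load shiro.ini from the classpath and build the SecurityManager from it
// SecurityManager here is org.apache.shiro.mgt.SecurityManager, not java.lang.SecurityManager
Factory&amp;lt;SecurityManager&amp;gt; factory = new IniSecurityManagerFactory("classpath:shiro.ini");
SecurityManager securityManager = factory.getInstance();
// Make the SecurityManager available to SecurityUtils.getSubject()
SecurityUtils.setSecurityManager(securityManager);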
&lt;/pre&gt;IniSecurityManagerFactory loads the configuration from shiro.ini and creates a singleton SecurityManager for the application. For simplicity, our shiro.ini goes with the default SecurityManager configuration, which uses a text based realm and gets user, permission and role metadata from the shiro.ini file. &lt;br /&gt;
&lt;br /&gt;
&lt;font color=purple&gt;&lt;b&gt;Step 3: Login&lt;/b&gt;&lt;/font&gt;&lt;br /&gt;
&lt;pre&gt;&lt;font color=green&gt;Subject usr = SecurityUtils.getSubject();
UsernamePasswordToken token = new UsernamePasswordToken("mike", "abcdef");
try {
    usr.login(token);
} 
catch (AuthenticationException ae) {
    log.error(ae.toString()) ;
    return ;
}
log.info("User [" + usr.getPrincipal() + "] logged in successfully.");&lt;/font&gt;
&lt;/pre&gt;SecurityUtils is a factory class for getting an existing subject or creating a new one. Credentials are passed in using an AuthenticationToken. In this case, we want to pass in a username and password and hence use the UsernamePasswordToken. Then we call the login method on the Subject passing in the authentication token.&lt;br /&gt;
&lt;br /&gt;
&lt;font color=purple&gt;&lt;b&gt;Step 4: Check if the user has permission&lt;/b&gt;&lt;/font&gt;&lt;br /&gt;
&lt;font color=green&gt;&lt;pre&gt;if (usr.isPermitted("File:write:xyz.doc")) {
    log.info(usr.getPrincipal() + " has permission to write xyz.doc ");
} else {
    log.info(usr.getPrincipal() + " does not have permission to write xyz.doc ");
}
if (usr.isPermitted("File:read:xyz.doc")) {
    log.info(usr.getPrincipal() + " has permission to read xyz.doc ");
} else {
    log.info(usr.getPrincipal() + " does not have permission to read xyz.doc ");
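}&lt;/pre&gt;&lt;/font&gt;Subject has an isPermitted method that takes a permission string as a parameter and returns true or false. With the shiro.ini above, mike has the Reader role (File:read:*), so the write check fails and the read check succeeds. &lt;br /&gt;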
&lt;br /&gt;
&lt;font color=purple&gt;&lt;b&gt;Step 5: Logout&lt;/b&gt;&lt;/font&gt;&lt;br /&gt;
&lt;font color=green&gt; &lt;pre&gt;usr.logout() ;&lt;/pre&gt;&lt;/font&gt;&lt;br /&gt;
The logout method logs the user out.&lt;br /&gt;
To get familiar with Shiro, try changing the UsernamePasswordToken and log in as a different user. Check some other permissions. Modify the shiro.ini file to create new users and roles with different permissions. Run the program a few times with different metadata and different input.&lt;br /&gt;
&lt;br /&gt;
In a production environment, you will not want users and roles in an ini file. You want them in a secure repository like a relational database or LDAP. In the next part, I will show you how to build a Shiro Realm that can use user,role, permission metadata from a relational database.&lt;img src="http://feeds.feedburner.com/~r/TheKhangaonkarReport/~4/iPtvDjfGHXs" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://khangaonkar.blogspot.com/feeds/1332761830472358933/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://khangaonkar.blogspot.com/2011/10/apache-shiro-application-security-made.html#comment-form" title="1 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/5008017311510568944/posts/default/1332761830472358933?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/5008017311510568944/posts/default/1332761830472358933?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/TheKhangaonkarReport/~3/iPtvDjfGHXs/apache-shiro-application-security-made.html" title="Apache Shiro : Application Security Made Easy" /><author><name>Manoj Khangaonkar</name><uri>https://plus.google.com/105019772402521748386</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>1</thr:total><gd:extendedProperty name="commentSource" value="1" /><gd:extendedProperty name="commentModerationMode" value="FILTERED_POSTMOD" /><feedburner:origLink>http://khangaonkar.blogspot.com/2011/10/apache-shiro-application-security-made.html</feedburner:origLink></entry><entry 
gd:etag="W/&quot;CEQDQXk9fip7ImA9WhdWFkQ.&quot;"><id>tag:blogger.com,1999:blog-5008017311510568944.post-154374829406296529</id><published>2011-09-10T15:06:00.000-07:00</published><updated>2011-09-10T15:06:10.766-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-09-10T15:06:10.766-07:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="JAVA Spring Transactions" /><title>Spring and Declarative Transactions</title><content type="html">A transaction is a unit of work that has ACID (atomic, consistent, isolated and durable) properties. Atomic means that the changes all happen or nothing happens. If money is debited from an account and credited to another account, a transaction ensures that either both the debit and credit complete or neither completes. Consistent implies that the changes leave the data in a consistent state. Isolated implies that changes do not interfere with other changes. Durable implies that once the changes are committed, they stay committed.   &lt;br /&gt;
&lt;br /&gt;
Resource managers such as relational databases provide a transaction manager and an API to control transactions. Those familiar with JDBC will know that a connection starts out with autocommit = true, so every statement that changes the database runs in its own transaction and is committed automatically. This behavior can be changed by setting autocommit to false. The programmer must then explicitly commit or roll back each transaction.&lt;br /&gt;
&lt;br /&gt;
Transactions that deal with just one resource such as one database are known as local transactions. Transactions that span multiple resources such as more than one database or a database and a messaging engine are called global transactions. Global transactions are implemented using the XA protocol, which involves a two phase commit. The JTA specification describes a Java API for programmers to work with global transactions. The transaction methods in JDBC such as commit and rollback work only with JDBC and relational databases, whereas JTA can work with any transactional resource.&lt;br /&gt;
&lt;br /&gt;
The code involved in working with transactions, however, is boilerplate code that can be handled by a framework. At the start of the method, you need to begin a transaction and when the method completes, you need to either commit or roll back the transaction. If you have worked with EJBs, you may know that you can specify, in the deployment descriptor, the transactional environment in which a method should execute. For example you might say RequiresNew, which means start a new transaction before invoking the method. The container starts a new transaction before the method is invoked and commits it when the method returns. The programmer does not need to write any Java code to handle the transaction.&lt;br /&gt;
&lt;br /&gt;
In the rest of the article, we walk through an example of declarative transaction management with Spring. &lt;br /&gt;
&lt;br /&gt;
For this tutorial you will need:&lt;br /&gt;
&lt;br /&gt;
(1) &lt;a href="http://www.springsource.org/download"&gt;Spring 3.0&lt;/a&gt;&lt;br /&gt;
(2) &lt;a href="http://www.eclipse.org"&gt;Eclipse&lt;/a&gt; is optional. I use eclipse as my IDE. Eclipse lets you export the war that can be deployed to Tomcat. But you can use other IDEs or command line tools as well.&lt;br /&gt;
(3) You may download the source code for this example at &lt;a href="https://sites.google.com/site/khangaonkar/home/spring"&gt;springjdbcwithTransaction.zip&lt;/a&gt; .&lt;br /&gt;
&lt;br /&gt;
We reuse the example from the &lt;a href="http://khangaonkar.blogspot.com/2010/09/database-access-made-simple-with-spring.html"&gt;JDBC with Spring&lt;/a&gt; blog we wrote some time ago. Let us add transaction support to MemberSpringJDBCDAO. This class has the insertMember method that inserts a member into the database. Let us modify the method a little bit to throw a RuntimeException after the insert into the database. The runtime exception is added to pretend that an error occurred in business logic while updating the database.&lt;br /&gt;
&lt;font color=green&gt;&lt;pre&gt;public int insertMember(Member member) {
    JdbcTemplate jt = getJdbcTemplate() ;
    Object[] params = new Object[] {member.getFirstname(),
        member.getLastname(),
        member.getStreet(),member.getCity(),
        member.getZip(),member.getEmail(),member.getPassword()} ;
  
    int ret = jt.update(insert_sql, params) ;
    // simulate a failure after the insert; "if (true)" keeps the return below reachable for the compiler
    if (true) throw new RuntimeException("simulate Error condition") ;
    return ret ;
}
&lt;/pre&gt;&lt;/font&gt;&lt;br /&gt;
In this method, would you expect the insert to be committed to the database? The answer is yes, though that is not the desirable behavior. The default behavior of JDBC is autocommit = true, which means each insert or update is committed immediately. You could set autocommit = false and explicitly commit or roll back at the end of the method. But it is much easier to let your container handle this.&lt;br /&gt;
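&lt;br /&gt;
For comparison, here is a minimal sketch of what the explicit approach could look like with plain JDBC. It is not part of the downloadable sample: the ManualTransaction class and its Work callback are hypothetical names made up for this illustration, and it swallows the failure and returns false only to keep the sketch short.&lt;br /&gt;
&lt;pre&gt;
```java
import java.sql.Connection;
import java.sql.SQLException;

class ManualTransaction {
    // Hypothetical callback holding the statements that must succeed or fail together.
    interface Work {
        void execute(Connection con) throws SQLException;
    }

    // Runs the work in one transaction; returns true if committed, false if rolled back.
    static boolean run(Connection con, Work work) {
        try {
            boolean previous = con.getAutoCommit();
            con.setAutoCommit(false);          // stop committing after every statement
            try {
                work.execute(con);
                con.commit();                  // make all changes permanent together
                return true;
            } catch (SQLException | RuntimeException e) {
                con.rollback();                // undo everything since the last commit
                return false;
            } finally {
                con.setAutoCommit(previous);   // restore the original setting
            }
        } catch (SQLException e) {
            throw new IllegalStateException("could not manage the transaction", e);
        }
    }
}
```
&lt;/pre&gt;
A caller passes the Connection and the statements to run, for example ManualTransaction.run(con, c -> c.createStatement().executeUpdate(sql)). This is the commit and rollback plumbing that declarative transactions let you leave out of every DAO method.&lt;br /&gt;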
&lt;br /&gt;
To add declarative transaction management to the above method:&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;Step 1: &lt;/b&gt; Define a transaction manager in springjdbcdao.xml&lt;br /&gt;
&lt;font color=purple&gt;&lt;br /&gt;
&amp;lt;bean id="txManager" class="org.springframework.jdbc.datasource.DataSourceTransactionManager"/&amp;gt; &lt;/font&gt;&lt;br /&gt;
&lt;br /&gt;
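Spring works with the transaction manager to begin and complete transactions. In a working configuration, the txManager bean also needs its dataSource property set to the DataSource that the DAO uses.&lt;br /&gt;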
&lt;br /&gt;
&lt;b&gt; Step 2:&lt;/b&gt; Turn on support for transaction annotations &lt;br /&gt;
&lt;br /&gt;
Add to springjdbcdao.xml&lt;br /&gt;
&lt;font color=purple&gt;&lt;br /&gt;
&amp;lt;tx:annotation-driven transaction-manager="txManager"/&amp;gt;&lt;/font&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt; Step 3:&lt;/b&gt; Add the @Transactional annotation to the insertMember method&lt;br /&gt;
&lt;br /&gt;
&lt;pre&gt;&lt;font color=green&gt;@Transactional
public int insertMember(Member member) {
&lt;/font&gt;&lt;/pre&gt;&lt;br /&gt;
@Transactional can take properties but we will go with default values which are:&lt;br /&gt;
&lt;br /&gt;
Propagation : Required&lt;br /&gt;
&lt;br /&gt;
Required means a transaction is required. If there is no transaction, Spring requests the transaction manager to start one; if one already exists, the method joins it. Another possible value is Requires_New (Propagation.REQUIRES_NEW), which tells the transaction manager to always suspend the existing transaction and start a new one.&lt;br /&gt;
&lt;br /&gt;
Isolation level : Default&lt;br /&gt;
&lt;br /&gt;
Use the default isolation level of the underlying resource manager.&lt;br /&gt;
&lt;br /&gt;
Rollback : Any runtime exception (or Error) triggers a rollback. A checked exception does not, unless the annotation is configured otherwise.&lt;br /&gt;
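&lt;br /&gt;
The default rollback rule can be summed up in one line of Java. The sketch below is my own restatement of that rule, using a hypothetical RollbackRule class; it is not Spring code.&lt;br /&gt;
&lt;pre&gt;
```java
class RollbackRule {
    // Restates the default: unchecked exceptions and errors roll back, checked exceptions do not.
    static boolean rollsBackByDefault(Throwable failure) {
        return failure instanceof RuntimeException || failure instanceof Error;
    }
}
```
&lt;/pre&gt;
If a checked exception should also cause a rollback, @Transactional accepts a rollbackFor attribute, for example @Transactional(rollbackFor = IOException.class).&lt;br /&gt;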
&lt;br /&gt;
&lt;b&gt; Step 4:&lt;/b&gt; Run the updated insertMember method using the JUnit test MemberSpringJDBCDAOTest.&lt;br /&gt;
&lt;br /&gt;
You will see the following logs from the transaction manager indicating the transaction rolled back.&lt;br /&gt;
&lt;font color=blue&gt;&lt;br /&gt;
2501 [main] DEBUG org.springframework.jdbc.datasource.DataSourceTransactionManager  - Initiating transaction rollback&lt;br /&gt;
2501 [main] DEBUG org.springframework.jdbc.datasource.DataSourceTransactionManager  - Rolling back JDBC transaction on Connection [org.apache.derby.impl.jdbc.EmbedConnection40@13320911 (XID = 2827), (SESSIONID = 1), (DATABASE = c:\manoj\mjprojects\database\pumausers), (DRDAID = null) ]&lt;br /&gt;
&lt;/font&gt;&lt;br /&gt;
Use SQL to check the database table. Confirm that no record is added.&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt; Step 5:&lt;/b&gt; Remove the RuntimeException from the insertMember method and run the test again.&lt;br /&gt;
&lt;br /&gt;
The Spring debug log will show that the transaction is committed. Use SQL to check the database table. Confirm that a record is added to the table.&lt;br /&gt;
&lt;br /&gt;
In summary, Transactions are necessary to maintain ACID properties for data sources. Declarative transactions using Spring makes that task easier.&lt;img src="http://feeds.feedburner.com/~r/TheKhangaonkarReport/~4/KP8UWppHBoU" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://khangaonkar.blogspot.com/feeds/154374829406296529/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://khangaonkar.blogspot.com/2011/09/spring-and-declarative-transactions.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/5008017311510568944/posts/default/154374829406296529?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/5008017311510568944/posts/default/154374829406296529?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/TheKhangaonkarReport/~3/KP8UWppHBoU/spring-and-declarative-transactions.html" title="Spring and Declarative Transactions" /><author><name>Manoj Khangaonkar</name><uri>https://plus.google.com/105019772402521748386</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>0</thr:total><gd:extendedProperty name="commentSource" value="1" /><gd:extendedProperty name="commentModerationMode" value="FILTERED_POSTMOD" /><feedburner:origLink>http://khangaonkar.blogspot.com/2011/09/spring-and-declarative-transactions.html</feedburner:origLink></entry><entry gd:etag="W/&quot;DkMCQXg9cCp7ImA9WhdSFUo.&quot;"><id>tag:blogger.com,1999:blog-5008017311510568944.post-1500829509644776674</id><published>2011-07-24T23:14:00.000-07:00</published><updated>2011-07-24T23:14:20.668-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-07-24T23:14:20.668-07:00</app:edited><title>Handling completion of concurrent 
tasks</title><content type="html">In the last blog on &lt;a href="http://khangaonkar.blogspot.com/2011/06/java-executors.html"&gt;Java Executors&lt;/a&gt; I introduced Executors, which have been the preferred way of writing multi-threaded programs in Java since JDK5. In this blog we explain java.util.concurrent.Future, which helps in processing the results from concurrent tasks.&lt;br /&gt;
&lt;br /&gt;
Prior to java.util.concurrent, if you wanted to wait for a task to complete and get some result from the task after it completed, you had to implement the wait/notify mechanism.&lt;br /&gt;
&lt;br /&gt;
Up to JDK5, the interface used to model concurrent tasks was the Runnable, which has a run method. A major limitation of the run method is that it cannot return a result or throw a checked exception.&lt;br /&gt;
&lt;br /&gt;
JDK5 introduced interface Callable to model concurrent tasks. The call method returns a result and can throw a checked exception.&lt;br /&gt;
&lt;pre&gt;public interface Callable&amp;lt;V&amp;gt; {
    V call() throws Exception ;
}
&lt;/pre&gt;How to get the result returned by call() ?&lt;br /&gt;
The interface Future provides methods to check if the task is completed and to retrieve the result.&lt;br /&gt;
&lt;pre&gt;public interface Future&amp;lt;V&amp;gt; {
    boolean cancel(boolean mayInterruptIfRunning) ;
    boolean isCancelled() ;
    boolean isDone() ;
    V get() throws .......
    V get(long timeout, TimeUnit unit) throws ....... 
}
&lt;/pre&gt;You call one of the get methods on Future to get the result. But how do you create or get a handle to a Future and how is it related to the concurrent task which might be a Callable ?&lt;br /&gt;
&lt;br /&gt;
In the previous blog we used the ExecutorService to execute tasks by calling the execute method and passing it a Runnable. The ExecutorService also has a submit method that takes a Callable as a parameter and returns a Future.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;T&amp;gt; Future&amp;lt;T&amp;gt; submit(Callable&amp;lt;T&amp;gt; task) ;&lt;br /&gt;
&lt;br /&gt;
The sample below brings it all together:&lt;br /&gt;
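&lt;pre&gt;&lt;font color=green&gt;
1 import java.io.File;
2 import java.io.IOException;
3 import java.util.concurrent.Callable;
4 import java.util.concurrent.ExecutionException;
5 import java.util.concurrent.ExecutorService;
6 import java.util.concurrent.Executors;
7 import java.util.concurrent.Future;
8
9 public class FileHandler {
10  private static final ExecutorService tp = Executors.newFixedThreadPool(50);
11
12  public void downloadandProcess(FileMetaData fdata) throws IOException {
13      final String filepath = fdata.getfilePath() ;
14      // the task that runs on a pool thread
15      Callable&amp;lt;File&amp;gt; task = new Callable&amp;lt;File&amp;gt;() {
16          public File call() throws IOException {
17              return download(filepath) ;
18          }
19      } ;
20      Future&amp;lt;File&amp;gt; future = tp.submit(task) ;
21      try {
22          // wait for the file to be downloaded
23          File f = future.get() ;
24          process(f) ;
25      } catch (InterruptedException e) {
26          Thread.currentThread().interrupt() ;
27          future.cancel(true) ;
28      } catch (ExecutionException e) {
29          // the download itself failed; surface the cause to the caller
30          throw new IOException("download failed", e.getCause()) ;
31      }
32  }
33 }
&lt;/font&gt;
&lt;/pre&gt;&lt;br /&gt;
In the above example there are 2 threads. The first is the thread that calls the method downloadandProcess. This thread kicks off another task (thread) in line 20 to download a file. The task is defined as a Callable in lines 15-19. In line 23 the calling thread blocks until a result is available from the download thread. If the download fails, get() throws an ExecutionException, which is handled in lines 28-31. The download and process helper methods are not shown.&lt;br /&gt;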
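&lt;br /&gt;
If you want something you can run right away, here is a small self-contained variant of the same idea. It is a sketch, not the code from above: FutureSketch and lengthOf are made-up names, the task just computes the length of a string, and it uses a lambda and var (so Java 10 or later) to stay short. It also uses the timed version of get so the caller does not wait forever.&lt;br /&gt;
&lt;pre&gt;
```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class FutureSketch {
    // Computes the length of the text on a worker thread and waits up to one second for it.
    static int lengthOf(final String text) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            // submit returns a Future that stands in for the eventual Integer result
            var future = pool.submit(() -> text.length());
            // get blocks the calling thread until the result is ready or the timeout expires
            return future.get(1, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException | TimeoutException e) {
            throw new IllegalStateException("task did not produce a result", e);
        } finally {
            pool.shutdown();
        }
    }
}
```
&lt;/pre&gt;
Calling FutureSketch.lengthOf("hello") hands the work to the pool thread and returns 5 once the Future completes.&lt;br /&gt;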
&lt;br /&gt;
Using Executor, Callable &amp; Future is thus much simpler than using java.lang.Thread and the wait/notify mechanism.&lt;img src="http://feeds.feedburner.com/~r/TheKhangaonkarReport/~4/r0hsHiqgCgc" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://khangaonkar.blogspot.com/feeds/1500829509644776674/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://khangaonkar.blogspot.com/2011/07/handling-completion-of-concurrent-tasks.html#comment-form" title="1 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/5008017311510568944/posts/default/1500829509644776674?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/5008017311510568944/posts/default/1500829509644776674?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/TheKhangaonkarReport/~3/r0hsHiqgCgc/handling-completion-of-concurrent-tasks.html" title="Handling completion of concurrent tasks" /><author><name>Manoj Khangaonkar</name><uri>https://plus.google.com/105019772402521748386</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>1</thr:total><gd:extendedProperty name="commentSource" value="1" /><gd:extendedProperty name="commentModerationMode" value="FILTERED_POSTMOD" /><feedburner:origLink>http://khangaonkar.blogspot.com/2011/07/handling-completion-of-concurrent-tasks.html</feedburner:origLink></entry><entry gd:etag="W/&quot;CU4ERng-fSp7ImA9WhZUE08.&quot;"><id>tag:blogger.com,1999:blog-5008017311510568944.post-3990476351073390625</id><published>2011-06-05T18:31:00.000-07:00</published><updated>2011-06-05T18:31:47.655-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-06-05T18:31:47.655-07:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="java" 
/><title>Java Executors</title><content type="html">The old way of creating threads in Java was to extend the java.lang.Thread class or to implement the java.lang.Runnable interface and pass it to Thread as argument. In this approach, the task is modeled as a Runnable and you create one or more threads for each task. There were no built in facilities to re-use threads such as thread pools. Additionally, once a task was started, there was no easy way to know when the task completed without implementing the wait/notify mechanism.&lt;br /&gt;
&lt;br /&gt;
Since JDK5, another abstraction for concurrent execution of tasks is the &lt;font color=blue&gt; Executor &lt;/font&gt; interface. &lt;br /&gt;
&lt;pre&gt;&lt;font color=green&gt;public interface Executor {
   void execute(Runnable cmd) ; 
}&lt;/font&gt;&lt;/pre&gt;The task to be executed is coded by implementing the Runnable interface, similar to the older model. However in the old model, execution is typically hard coded by extending the java.lang.Thread class. &lt;br /&gt;
&lt;br /&gt;
With the executor framework, submission and execution are decoupled by using the Executors class to create different kinds of Executors that can execute the Runnable.&lt;br /&gt;
&lt;br /&gt;
The &lt;font color=blue&gt;ExecutorService&lt;/font&gt; interface extends Executor and provides additional methods that let callers submit tasks for concurrent execution.&lt;br /&gt;
&lt;br /&gt;
While one can implement Executor or ExecutorService and delegate to Thread to do the actual work, that is generally not the recommended way.&lt;br /&gt;
&lt;br /&gt;
&lt;font color=blue&gt; java.util.concurrent.Executors &lt;/font&gt; is a class with factory methods to create concrete implementations of ExecutorService. This includes flexible thread pool based implementations. Using thread pools for concurrent execution has the advantage that it lets your application scale.&lt;br /&gt;
&lt;br /&gt;
&lt;font color=blue&gt; public static ExecutorService newFixedThreadPool(int nThreads)&lt;/font&gt; creates a thread pool that creates threads up to the size nThreads. After that it creates additional threads only if one of the existing threads dies.&lt;br /&gt;
&lt;br /&gt;
&lt;font color=blue&gt; public static ExecutorService newCachedThreadPool() &lt;/font&gt; creates a thread pool with no upper bound on the number of threads. It will create threads as needed but reuses idle threads when they are available.&lt;br /&gt;
&lt;br /&gt;
&lt;font color=blue&gt; public static ExecutorService newSingleThreadExecutor() &lt;/font&gt; creates an executor backed by a single thread. If the existing one dies, it will create a new one. But you will never have more than one thread.&lt;br /&gt;
&lt;br /&gt;
&lt;font color=blue&gt; public static ScheduledExecutorService newScheduledThreadPool(int corePoolSize) &lt;/font&gt; lets you create a thread pool that can run tasks after a delay or periodically.&lt;br /&gt;
&lt;br /&gt;
Let us write a simple concurrent server using fixed thread pool.&lt;br /&gt;
&lt;pre&gt;&lt;font color=green&gt;
1 import java.io.IOException;
2 import java.net.ServerSocket;
3 import java.net.Socket;
4 import java.util.concurrent.ExecutorService;
5 import java.util.concurrent.Executors;
6
7 public class WebServer {
8  private final ExecutorService tpool = Executors.newFixedThreadPool(50) ;
9
10  public void start() throws IOException {
11    ServerSocket socket = new ServerSocket(8080) ;
12    while(!tpool.isShutdown()) {
13        try {
14           // final so that the anonymous Runnable can capture it
15           final Socket c = socket.accept() ;
16           tpool.execute(new Runnable() {
17               public void run() { process(c) ; }
18           }) ;
19        } catch(Exception e) {
20           log("Error occurred .." + e) ;
21        }
22    }
23  }
24  private void process(Socket c) {
25    // service the request   
26  }
27  public void stop() {
28    tpool.shutdown() ;
29  }
30  private void log(String msg) {
31    System.err.println(msg) ;
32  }
33 }
&lt;/font&gt;&lt;/pre&gt;In line 8 we create an ExecutorService. In lines 16-18 we create a Runnable for each task and submit it for execution. The shutdown call in line 28 is a graceful shutdown. Tasks that are already submitted will be completed and no new tasks are accepted.&lt;br /&gt;
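&lt;br /&gt;
To see the graceful shutdown in isolation, here is a small sketch. The ShutdownSketch class and its runAll method are hypothetical and not part of the web server above. It pairs shutdown with awaitTermination, which blocks until the submitted tasks finish or the timeout expires.&lt;br /&gt;
&lt;pre&gt;
```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class ShutdownSketch {
    // Submits taskCount trivial tasks, shuts the pool down and returns how many tasks ran.
    static int runAll(int taskCount) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        AtomicInteger completed = new AtomicInteger();
        for (int i = taskCount; i > 0; i--) {
            pool.execute(() -> completed.incrementAndGet());
        }
        pool.shutdown();                       // no new tasks; queued ones still run
        try {
            // wait for the queued tasks to finish; give up after five seconds
            if (!pool.awaitTermination(5, TimeUnit.SECONDS)) {
                pool.shutdownNow();            // interrupt whatever is still running
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            pool.shutdownNow();
        }
        return completed.get();
    }
}
```
&lt;/pre&gt;
Because shutdown lets queued work finish, ShutdownSketch.runAll(10) returns 10 even though the pool has only 4 threads.&lt;br /&gt;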
&lt;br /&gt;
The advantages of using Executors are:&lt;br /&gt;
&lt;pre&gt;&lt;font color=purple&gt;   Decoupling of task creation from submission/execution.
   Built in thread pools.
   Built in orderly shutdown.
   Built in scheduling. (mentioned above but not discussed)
   Ability to check or block on completion of tasks.&lt;/font&gt;&lt;/pre&gt;&lt;br /&gt;
Today, if you need a data structure such as a List or a Map, you use the classes in the collections framework in java.util.*. Similarly for concurrent programming, you should use the executor framework in java.util.concurrent.* as it gives you a lot of functionality that you would otherwise have to code yourself.&lt;img src="http://feeds.feedburner.com/~r/TheKhangaonkarReport/~4/sfgxCNZ-uYI" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://khangaonkar.blogspot.com/feeds/3990476351073390625/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://khangaonkar.blogspot.com/2011/06/java-executors.html#comment-form" title="1 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/5008017311510568944/posts/default/3990476351073390625?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/5008017311510568944/posts/default/3990476351073390625?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/TheKhangaonkarReport/~3/sfgxCNZ-uYI/java-executors.html" title="Java Executors" /><author><name>Manoj Khangaonkar</name><uri>https://plus.google.com/105019772402521748386</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>1</thr:total><gd:extendedProperty name="commentSource" value="1" /><gd:extendedProperty name="commentModerationMode" value="FILTERED_POSTMOD" /><feedburner:origLink>http://khangaonkar.blogspot.com/2011/06/java-executors.html</feedburner:origLink></entry><entry gd:etag="W/&quot;DU4GRXs_eyp7ImA9WhZXGEQ.&quot;"><id>tag:blogger.com,1999:blog-5008017311510568944.post-4152015243755338230</id><published>2011-05-08T16:38:00.000-07:00</published><updated>2011-05-08T16:38:44.543-07:00</updated><app:edited 
xmlns:app="http://www.w3.org/2007/app">2011-05-08T16:38:44.543-07:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="MapReduce Hadoop &quot;distributed programming&quot;" /><title>What is MapReduce ?</title><content type="html">MapReduce is a parallel programming technique made popular by Google. It is used for processing very large amounts of data. Such processing can be completed in a reasonable amount of time only by distributing the work to multiple machines in parallel. Each machine processes a small subset of the data. MapReduce is a programming model that lets developers focus on writing the code that processes their data without having to worry about the details of parallel execution. &lt;br /&gt;
&lt;br /&gt;
MapReduce requires modeling the data to be processed as key,value pairs. The developer codes a map function and a reduce function. &lt;br /&gt;
&lt;br /&gt;
The MapReduce runtime calls the map function for each key,value pair. The map function takes as input a key,value pair and produces zero or more output key,value pairs.  &lt;br /&gt;
&lt;br /&gt;
The MapReduce runtime sorts and groups the output from the map functions by key. It then calls the reduce function once for each intermediate key, passing it the key and the list of values associated with that key. The output from the reduce function is a key,value pair. The value is generally an aggregate or something calculated by processing the list of values that were passed in for the input key. The output from the reduce function is the required result.&lt;br /&gt;
&lt;br /&gt;
As an example, let us say you have a large number of log files that contain audit logs for some event such as access to an account. You need to find out how many times each account was accessed in the last 10 years.&lt;br /&gt;
Assume each line in the log file is an audit record. We are processing log files line by line. The map and reduce functions would look like this:&lt;br /&gt;
&lt;pre&gt;map(key , value) {

    // key = byte offset in log file 
    // value = a line in the log file
    if ( value is an account access audit log) {
        account number = parse account from value
        output key = account number, value = 1
    }
}
reduce(key, list of values) {
    // key = account number
    // list of values {1,1,1,1.....}
    count = 0
    for each value
       count = count + value

    output key , count 
}
&lt;/pre&gt;The map function is called for each line in each log file. Lines that are not relevant are ignored. Account number is parsed out of relevant lines and output with a value 1. The MapReduce runtime sorts and groups the output by account number. The reduce function is called for each account. The reduce function aggregates the values for each account, which is the required result.&lt;br /&gt;
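&lt;br /&gt;
The same logic can be tried out on a single machine. Below is a rough Java sketch that imitates the three phases with the streams API instead of a real MapReduce runtime. The AccessCount class and the record format (a line starting with ACCESS followed by the account number) are assumptions made for this illustration, and the result is returned as text only to keep the sketch short.&lt;br /&gt;
&lt;pre&gt;
```java
import java.util.Arrays;
import java.util.TreeMap;
import java.util.stream.Collectors;

class AccessCount {
    // Assumed record format for this sketch: "ACCESS accountNumber"; other lines are ignored.
    static final String PREFIX = "ACCESS ";

    // Returns the per-account totals, sorted by account, rendered as text.
    static String countAccesses(String[] logLines) {
        return Arrays.stream(logLines)
            // map phase: keep relevant lines and emit the account number as the key
            .filter(line -> line.startsWith(PREFIX))
            .map(line -> line.substring(PREFIX.length()).trim())
            // shuffle and reduce: group equal keys in sorted order and add 1 per occurrence
            .collect(Collectors.groupingBy(account -> account, TreeMap::new,
                                           Collectors.summingInt(account -> 1)))
            .toString();
    }
}
```
&lt;/pre&gt;
For the lines ACCESS A1, LOGIN bob, ACCESS B7 and ACCESS A1, countAccesses returns {A1=2, B7=1}. A real job would run the map and reduce steps on different machines, but the data flow is the same.&lt;br /&gt;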
&lt;br /&gt;
MapReduce jobs are generally executed on a cluster of machines. Each machine executes a task which is either a map task or reduce task. Each task is processing a subset of the data. In the above example, let us say we start with a set of large input files. The MapReduce runtime breaks the input data into partitions called splits or shards. Each split or shard is processed by a map task on a machine. The output from each map task is sorted and partitioned by key. The outputs from all the maps are merged to create partitions that are input to the reduce tasks.&lt;br /&gt;
&lt;br /&gt;
There can be multiple machines each running a reduce task. Each reduce task gets a partition to process. The partition can have multiple keys. But all the data for each key is in 1 partition. In other words, each key is processed by exactly 1 reduce task.&lt;br /&gt;
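&lt;br /&gt;
One common way to make sure all values for a key land in the same partition is to derive the partition from a hash of the key. The sketch below shows the idea; the HashPartition class is a hypothetical helper, and a real MapReduce runtime may use a different partitioning function.&lt;br /&gt;
&lt;pre&gt;
```java
class HashPartition {
    // Maps a key to one of reducerCount partitions; equal keys always get the same partition.
    static int partitionFor(String key, int reducerCount) {
        // floorMod keeps the result between 0 and reducerCount - 1 even for negative hash codes
        return Math.floorMod(key.hashCode(), reducerCount);
    }
}
```
&lt;/pre&gt;
Since the partition depends only on the key, every map task sends the records for a given account to the same reduce task, no matter which machine produced them.&lt;br /&gt;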
&lt;br /&gt;
The number of machines, the number of map tasks, the number of reduce tasks and several other things are configurable.&lt;br /&gt;
&lt;br /&gt;
MapReduce is useful for problems that require some processing of large data sets. The algorithm can be broken into map and reduce functions. MapReduce runtime takes care of distributing the processing to multiple machines and aggregating the results. &lt;br /&gt;
&lt;br /&gt;
&lt;a href="http://hadoop.apache.org/"&gt;Apache Hadoop&lt;/a&gt; is an open source Java implementation of mapreduce. Stay tuned for future blog / tutorial on mapreduce using hadoop.&lt;img src="http://feeds.feedburner.com/~r/TheKhangaonkarReport/~4/qc7RtQMkDAQ" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://khangaonkar.blogspot.com/feeds/4152015243755338230/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://khangaonkar.blogspot.com/2011/05/what-is-mapreduce.html#comment-form" title="1 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/5008017311510568944/posts/default/4152015243755338230?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/5008017311510568944/posts/default/4152015243755338230?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/TheKhangaonkarReport/~3/qc7RtQMkDAQ/what-is-mapreduce.html" title="What is MapReduce ?" 
/><author><name>Manoj Khangaonkar</name><uri>https://plus.google.com/105019772402521748386</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>1</thr:total><gd:extendedProperty name="commentSource" value="1" /><gd:extendedProperty name="commentModerationMode" value="FILTERED_POSTMOD" /><feedburner:origLink>http://khangaonkar.blogspot.com/2011/05/what-is-mapreduce.html</feedburner:origLink></entry><entry gd:etag="W/&quot;Ak4HQnY6eip7ImA9WhZQGUg.&quot;"><id>tag:blogger.com,1999:blog-5008017311510568944.post-5392848595135570703</id><published>2011-04-10T17:45:00.000-07:00</published><updated>2011-04-27T19:48:53.812-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-04-27T19:48:53.812-07:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="Spring security web" /><title>Spring Security Tutorial #1 - Authentication and Authorization</title><content type="html">Spring Security is a framework that lets you add security to Spring based applications. Before Spring Security, developers had to rely on J2EE security to secure Java applications. J2EE security provides very limited function. You could secure web resources or EJB methods, but nothing more. An additional drawback of J2EE security is that implementations are application server specific. If you do anything more than what is in the servlet or EJB specification, it is not portable.&lt;br /&gt;
&lt;br /&gt;
In addition to basic authentication and authorization, Spring Security has support for:&lt;br /&gt;
&lt;i&gt;Remember me authentication&lt;br /&gt;
Session management&lt;br /&gt;
ACL based security&lt;br /&gt;
Integration with CAS, LDAP, Open ID&lt;/i&gt;&lt;br /&gt;
and many other things.&lt;br /&gt;
&lt;br /&gt;
It is not possible to cover all those topics in one article. I plan to write about Spring Security as a series of tutorials. This first installment #1 will get you started with Spring Security and help you set up basic authentication and authorization.&lt;br /&gt;
&lt;br /&gt;
Authentication refers to ensuring that the user is who he or she claims to be. Any application that has any security will typically force the user to present a name and a password. If they match what is stored in the system, we say the user is authenticated and allow the user to continue using the application.&lt;br /&gt;
&lt;br /&gt;
Authorization refers to ensuring that an authenticated user has the necessary permissions to perform some operation or access some data. Authorization involves checking a pre-defined policy which may say that these users have permission to perform these actions on this resource.&lt;br /&gt;
&lt;br /&gt;
For this tutorial, let us take the simple web application we developed in &lt;a href="http://khangaonkar.blogspot.com/2010/10/developing-web-applications-with-spring.html"&gt;Spring MVC blog&lt;/a&gt; and add security to it. Springmvc.zip was renamed to Springsecurityv1 and security metadata added. You can download the sample at &lt;a href="https://sites.google.com/site/khangaonkar/home/spring"&gt;Springsecurityv1.zip&lt;/a&gt;.&lt;br /&gt;
&lt;br /&gt;
For this tutorial you will also need&lt;br /&gt;
(1) &lt;a href="http://www.springsource.org/download"&gt;Spring 3.0&lt;/a&gt;&lt;br /&gt;
(2) &lt;a href="http://www.springsource.org/download"&gt;Spring Security 3.x.&lt;/a&gt; &lt;br /&gt;
(3) &lt;a href="http://www.eclipse.org"&gt;Eclipse&lt;/a&gt; is optional. I use eclipse as my IDE. But you can use other IDEs or command line tools as well. &lt;br /&gt;
(4) A webserver like &lt;a href="http://tomcat.apache.org/"&gt;Tomcat&lt;/a&gt;&lt;br /&gt;
(5) Some familiarity with the Spring framework.&lt;br /&gt;
&lt;br /&gt;
Note that Spring Security is packaged separately and is not included in core Spring.&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;Step 1:&lt;/b&gt;&lt;br /&gt;
As always, Spring configuration begins in the application context config file. We start by adding the namespace that has the Spring Security configuration elements. In springsecurity-servlet.xml, the security namespace http://www.springframework.org/schema/security is added.&lt;br /&gt;
&lt;font color="purple"&gt;&lt;pre&gt;&amp;lt;beans xmlns:security="http://www.springframework.org/schema/security" 
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
       xmlns="http://www.springframework.org/schema/beans" 
       xsi:schemaLocation="http://www.springframework.org/schema/beans
       http://www.springframework.org/schema/beans/spring-beans-2.0.xsd 
       http://www.springframework.org/schema/security
       http://www.springframework.org/schema/security/spring-security-3.0.3.xsd"&amp;gt;
&amp;lt;/beans&amp;gt;&lt;/pre&gt;&lt;/font&gt;&lt;br /&gt;
&lt;b&gt;Step 2:&lt;/b&gt;&lt;br /&gt;
Add the Spring Security filter to web.xml. This filter intercepts every request and enforces authentication and authorization.&lt;br /&gt;
&lt;font color="purple"&gt;&lt;pre&gt;&amp;lt;filter&amp;gt;
    &amp;lt;filter-name&amp;gt;springSecurityFilterChain&amp;lt;/filter-name&amp;gt;
    &amp;lt;filter-class&amp;gt;org.springframework.web.filter.DelegatingFilterProxy&amp;lt;/filter-class&amp;gt;
&amp;lt;/filter&amp;gt;
&amp;lt;filter-mapping&amp;gt;
    &amp;lt;filter-name&amp;gt;springSecurityFilterChain&amp;lt;/filter-name&amp;gt;
    &amp;lt;url-pattern&amp;gt;/*&amp;lt;/url-pattern&amp;gt;
&amp;lt;/filter-mapping&amp;gt;&lt;/pre&gt;&lt;/font&gt;&lt;br /&gt;
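DelegatingFilterProxy hands each request to a Spring bean with the same name as the filter (springSecurityFilterChain), and by default it looks that bean up in the root web application context. The fragment below is a minimal sketch of one way to load the security configuration into the root context from web.xml. The file location /WEB-INF/springsecurity-servlet.xml is an assumption, and the downloadable sample may wire this differently.&lt;br /&gt;
&lt;font color="purple"&gt;&lt;pre&gt;&amp;lt;!-- Assumed location of the security configuration file --&amp;gt;
&amp;lt;context-param&amp;gt;
    &amp;lt;param-name&amp;gt;contextConfigLocation&amp;lt;/param-name&amp;gt;
    &amp;lt;param-value&amp;gt;/WEB-INF/springsecurity-servlet.xml&amp;lt;/param-value&amp;gt;
&amp;lt;/context-param&amp;gt;
&amp;lt;!-- Creates the root application context when the web application starts --&amp;gt;
&amp;lt;listener&amp;gt;
    &amp;lt;listener-class&amp;gt;org.springframework.web.context.ContextLoaderListener&amp;lt;/listener-class&amp;gt;
&amp;lt;/listener&amp;gt;&lt;/pre&gt;&lt;/font&gt;&lt;br /&gt;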
&lt;b&gt;Step 3:&lt;/b&gt;&lt;br /&gt;
Add a minimal Spring security configuration to springsecurity-servlet.xml.&lt;br /&gt;
&lt;font color="purple"&gt;&lt;pre&gt;&amp;lt;security:http auto-config="true"&amp;gt;
    &amp;lt;security:intercept-url access="ROLE_manager" pattern="/**"&amp;gt;
&amp;lt;/security:intercept-url&amp;gt;
&amp;lt;/security:http&amp;gt;&lt;/pre&gt;&lt;/font&gt;&lt;br /&gt;
The http element is the parent element under which all web application security configuration is defined. The intercept-url element says that for a request to any URL (pattern = "/**"), the user must be authenticated and must hold the authority ROLE_manager.&lt;br /&gt;
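&lt;br /&gt;
An http element can hold several intercept-url elements. They are evaluated in the order listed and the first matching pattern wins, so specific patterns should come before the catch-all. The sketch below assumes a hypothetical /public/ area that anyone may view; that path is for illustration only and is not part of the sample application.&lt;br /&gt;
&lt;font color="purple"&gt;&lt;pre&gt;&amp;lt;security:http auto-config="true"&amp;gt;
    &amp;lt;!-- Hypothetical public area: any visitor, logged in or not, may view it --&amp;gt;
    &amp;lt;security:intercept-url pattern="/public/**" access="IS_AUTHENTICATED_ANONYMOUSLY" /&amp;gt;
    &amp;lt;!-- Everything else still requires ROLE_manager --&amp;gt;
    &amp;lt;security:intercept-url pattern="/**" access="ROLE_manager" /&amp;gt;
&amp;lt;/security:http&amp;gt;&lt;/pre&gt;&lt;/font&gt;&lt;br /&gt;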
&lt;br /&gt;
When you try to access any URL in this application, Spring will redirect you to a page where you need to enter a username and password. The login page is generated by Spring.&lt;br /&gt;
&lt;br /&gt;
But who are the valid users? And which of them have the role ROLE_manager?&lt;br /&gt;
&lt;b&gt;Step 4:&lt;/b&gt; &lt;br /&gt;
A really simple way is to define the users in springsecurity-servlet.xml using the authentication-provider element. &lt;br /&gt;
&lt;font color="purple"&gt;&lt;pre&gt;&amp;lt;security:authentication-manager&amp;gt;
    &amp;lt;security:authentication-provider&amp;gt;
        &amp;lt;security:user-service&amp;gt;
            &amp;lt;security:user authorities="ROLE_manager" name="tony" password="tony12" /&amp;gt;
            &amp;lt;security:user authorities="" name="raul" password="raul12" /&amp;gt;
        &amp;lt;/security:user-service&amp;gt;
    &amp;lt;/security:authentication-provider&amp;gt;
&amp;lt;/security:authentication-manager&amp;gt;&lt;/pre&gt;&lt;/font&gt;&lt;br /&gt;
This authentication provider defines two users, tony and raul, directly in the configuration. Tony has the authority ROLE_manager; raul has no authorities.&lt;br /&gt;
&lt;br /&gt;
For a production application, you will want to define users, passwords and roles in a database (or LDAP). See step 7 for the database configuration.&lt;br /&gt;
&lt;b&gt;Step 5:&lt;/b&gt;&lt;br /&gt;
From Eclipse, export the war file (or use the provided war file).&lt;br /&gt;
Deploy the war file to Tomcat.&lt;br /&gt;
&lt;br /&gt;
Point your browser to http://localhost:8080/springsecurityv1/test.htm. You will be shown the login screen below.&lt;br /&gt;
&lt;a href="http://3.bp.blogspot.com/-PfI9w80-O2Q/TZk0cr_Z5RI/AAAAAAAACuU/NQaP4hXi3BU/s1600/login.png" imageanchor="1" style="clear:left; left;margin-right:1em; margin-bottom:1em"&gt;&lt;img border="0" height="214" width="320" src="http://3.bp.blogspot.com/-PfI9w80-O2Q/TZk0cr_Z5RI/AAAAAAAACuU/NQaP4hXi3BU/s320/login.png" /&gt;&lt;/a&gt;&lt;br /&gt;
Type in user tony and password tony12. The login will succeed and you will be served the web page below.&lt;br /&gt;
&lt;a href="http://4.bp.blogspot.com/-9U5xBnQ2Vm4/TZk0yOcYxtI/AAAAAAAACuc/pVhJB4VdioI/s1600/tonylogin.png" imageanchor="1" style="clear:left; left;margin-right:1em; margin-bottom:1em"&gt;&lt;img border="0" height="215" width="320" src="http://4.bp.blogspot.com/-9U5xBnQ2Vm4/TZk0yOcYxtI/AAAAAAAACuc/pVhJB4VdioI/s320/tonylogin.png" /&gt;&lt;/a&gt;&lt;br /&gt;
&lt;b&gt;Step 6:&lt;/b&gt;&lt;br /&gt;
Restart the browser. Clear cookies and sessions. Type the same URL.&lt;br /&gt;
You will be prompted for a username and password. This time type in abc / abc12. The login will fail because abc is not a valid user.&lt;br /&gt;
&lt;a href="http://1.bp.blogspot.com/-PKIvXAxxmZY/TZk3Qro__bI/AAAAAAAACuk/3vJVjEEvfMA/s1600/loginfailed.png" imageanchor="1" style="clear:left; left;margin-right:1em; margin-bottom:1em"&gt;&lt;img border="0" height="233" width="320" src="http://1.bp.blogspot.com/-PKIvXAxxmZY/TZk3Qro__bI/AAAAAAAACuk/3vJVjEEvfMA/s320/loginfailed.png" /&gt;&lt;/a&gt;&lt;br /&gt;
Repeat with user raul / raul12. With raul the login succeeds, but access is denied because raul does not have the ROLE_manager authority.&lt;br /&gt;
&lt;b&gt;Step 7:&lt;/b&gt;&lt;br /&gt;
As mentioned earlier, you would normally not keep usernames and passwords in the Spring configuration file. Spring Security provides the jdbc-user-service element, which lets you authenticate against users defined in a database.&lt;br /&gt;
&lt;br /&gt;
In this app, springsecurity2-servlet.xml is a second configuration that uses jdbc-user-service. The configuration is:&lt;br /&gt;
&lt;font color="purple"&gt;&lt;pre&gt;&amp;lt;security:authentication-manager&amp;gt;
    &amp;lt;security:authentication-provider&amp;gt;
        &amp;lt;security:jdbc-user-service data-source-ref="securityDataSource"/&amp;gt;
    &amp;lt;/security:authentication-provider&amp;gt;
&amp;lt;/security:authentication-manager&amp;gt;

&amp;lt;bean id="securityDataSource"
      class="org.springframework.jdbc.datasource.DriverManagerDataSource"&amp;gt;
    &amp;lt;property name="driverClassName" value="org.apache.derby.jdbc.EmbeddedDriver" /&amp;gt;
    &amp;lt;property name="url" value="jdbc:derby:/home/mks/mkprojects/database/springusers" /&amp;gt;
&amp;lt;/bean&amp;gt;
&lt;/pre&gt;&lt;/font&gt;&lt;br /&gt;
&lt;br /&gt;
For this to work, you need to create the database tables that Spring Security expects (by default, a users table and an authorities table). See the Spring Security documentation for the table schema.&lt;br /&gt;
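&lt;br /&gt;
To make the expected table layout visible, the sketch below writes out the lookup queries as attributes on jdbc-user-service. To the best of my knowledge these match the defaults Spring Security uses when the attributes are omitted, but treat them as an assumption and check them against the documentation for your version. If your tables or columns are named differently, these two attributes are where you would adapt the queries.&lt;br /&gt;
&lt;font color="purple"&gt;&lt;pre&gt;&amp;lt;!-- users: username, password, enabled; authorities: username, authority --&amp;gt;
&amp;lt;security:jdbc-user-service data-source-ref="securityDataSource"
    users-by-username-query="select username, password, enabled from users where username = ?"
    authorities-by-username-query="select username, authority from authorities where username = ?" /&amp;gt;&lt;/pre&gt;&lt;/font&gt;&lt;br /&gt;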
&lt;br /&gt;
In summary, getting started with Spring Security is quite easy. In a few simple steps, by adding some configuration metadata, we have enabled authentication and authorization for our web application.&lt;img src="http://feeds.feedburner.com/~r/TheKhangaonkarReport/~4/mOlX1xP70GE" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://khangaonkar.blogspot.com/feeds/5392848595135570703/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://khangaonkar.blogspot.com/2011/04/spring-security-tutorial-1.html#comment-form" title="3 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/5008017311510568944/posts/default/5392848595135570703?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/5008017311510568944/posts/default/5392848595135570703?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/TheKhangaonkarReport/~3/mOlX1xP70GE/spring-security-tutorial-1.html" title="Spring Security Tutorial #1 - Authentication and Authorization" /><author><name>Manoj Khangaonkar</name><uri>https://plus.google.com/105019772402521748386</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://3.bp.blogspot.com/-PfI9w80-O2Q/TZk0cr_Z5RI/AAAAAAAACuU/NQaP4hXi3BU/s72-c/login.png" height="72" width="72" /><thr:total>3</thr:total><gd:extendedProperty name="commentSource" value="1" /><gd:extendedProperty name="commentModerationMode" value="FILTERED_POSTMOD" /><feedburner:origLink>http://khangaonkar.blogspot.com/2011/04/spring-security-tutorial-1.html</feedburner:origLink></entry></feed>
