<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/atom10full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><feed xmlns="http://www.w3.org/2005/Atom" xmlns:openSearch="http://a9.com/-/spec/opensearch/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:thr="http://purl.org/syndication/thread/1.0" xmlns:gd="http://schemas.google.com/g/2005" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" gd:etag="W/&quot;CEcAQnY_fSp7ImA9Wx5TEks.&quot;"><id>tag:blogger.com,1999:blog-860423771829255614</id><updated>2010-07-27T13:20:43.845-07:00</updated><title>Lineland</title><subtitle type="html">Everything's a dot.</subtitle><link rel="http://schemas.google.com/g/2005#feed" type="application/atom+xml" href="http://www.larsgeorge.com/feeds/posts/default" /><link rel="alternate" type="text/html" href="http://www.larsgeorge.com/" /><link rel="next" type="application/atom+xml" href="http://www.blogger.com/feeds/860423771829255614/posts/default?start-index=26&amp;max-results=25&amp;redirect=false&amp;v=2" /><author><name>Lars George</name><uri>http://www.blogger.com/profile/18168538475015227467</uri><email>noreply@blogger.com</email></author><generator version="7.00" uri="http://www.blogger.com">Blogger</generator><openSearch:totalResults>34</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>25</openSearch:itemsPerPage><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/atom+xml" href="http://feeds.feedburner.com/Lineland" /><feedburner:info uri="lineland" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><entry gd:etag="W/&quot;A0EGQng6fCp7ImA9WxFWEEs.&quot;"><id>tag:blogger.com,1999:blog-860423771829255614.post-3521736165062825306</id><published>2010-05-28T10:27:00.000-07:00</published><updated>2010-05-28T11:00:23.614-07:00</updated><app:edited 
xmlns:app="http://www.w3.org/2007/app">2010-05-28T11:00:23.614-07:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="hbase" /><category scheme="http://www.blogger.com/atom/ns#" term="hadoop" /><title>HBase File Locality in HDFS</title><content type="html">One of the more ambiguous things in &lt;a href="http://hadoop.apache.org/common/"&gt;Hadoop&lt;/a&gt; is block replication: it happens automatically and you should not have to worry about it. &lt;a href="http://hadoop.apache.org/hbase/"&gt;HBase&lt;/a&gt; relies on it 100% to provide data safety, as it stores its files in the &lt;a href="http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html"&gt;distributed file system&lt;/a&gt;. While that works completely transparently, one of the more advanced questions that comes up is how this affects performance. This usually arises when the user starts writing &lt;a href="http://hadoop.apache.org/mapreduce/"&gt;MapReduce&lt;/a&gt; jobs against either HBase or Hadoop directly. Especially with larger data sets being stored in HBase, how does the system take care of placing the data close to where it is needed? This is referred to as data locality, and in the case of HBase using the Hadoop file system (HDFS) there may be doubts about how that works. &lt;br /&gt;
&lt;br /&gt;
First let's see how Hadoop handles this. The MapReduce documentation advertises the fact that tasks run close to the data they process. This is achieved by breaking up large files in HDFS into smaller chunks, so-called blocks. That is also the reason why the block size in Hadoop is much larger than what you may know from operating systems and their file systems. The default setting is 64MB, but usually 128MB is chosen, or even larger when you are sure all your files are larger than a single block. Each block maps to a task that is run to process the contained data. That also means larger block sizes equal fewer map tasks to run, as the number of mappers is driven by the number of blocks that need processing. Hadoop knows where blocks are located and tries to run the map tasks directly on the node that hosts the block (actually one of them, as replication means it has a few hosts to choose from). This is how it achieves data locality during MapReduce.&lt;br /&gt;
&lt;br /&gt;
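To put numbers on the relationship between block size and map tasks, here is a minimal sketch in plain Java. It assumes exactly one map task per block and ignores any split-size tuning; the class name &lt;code&gt;BlockMath&lt;/code&gt; and the 1GB file size are made up for illustration and are not part of Hadoop:&lt;br /&gt;

```java
// Hypothetical helper, not part of Hadoop: estimates how many blocks
// (and therefore roughly how many map tasks) a file of a given size needs.
final class BlockMath {

  static final long MB = 1024L * 1024L;

  // Ceiling division: a partially filled last block still counts as a block.
  static long blockCount(long fileSize, long blockSize) {
    return (fileSize + blockSize - 1) / blockSize;
  }

  public static void main(String[] args) {
    long fileSize = 1024 * MB; // an assumed 1GB input file
    System.out.println("64MB blocks:  " + blockCount(fileSize, 64 * MB));
    System.out.println("128MB blocks: " + blockCount(fileSize, 128 * MB));
  }
}
```

With these assumed numbers the file needs 16 blocks at 64MB but only 8 at 128MB, so doubling the block size halves the number of map tasks.&lt;br /&gt;
&lt;br /&gt;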
Back to HBase. Once you have arrived at that point with Hadoop and understand that it can process data locally, you start to question how this may work with HBase. If you have read my &lt;a href="http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html"&gt;post&lt;/a&gt; on HBase's storage architecture you saw that HBase simply stores files in HDFS. It does so for the actual data files (HFile) as well as its log (WAL). And if you look into the code it simply uses &lt;code&gt;FileSystem.create(Path path)&lt;/code&gt; to create these. When you then consider two access patterns, a) direct random access and b) MapReduce scanning of tables, you wonder if care was taken that the HDFS blocks are close to where they are read by HBase. &lt;br /&gt;
&lt;br /&gt;
One thing upfront: if you do not share your cluster between Hadoop and HBase but instead employ a separate Hadoop as well as a stand-alone HBase cluster, then there is &lt;u&gt;no&lt;/u&gt; data locality - and there can't be. That is equivalent to running a separate MapReduce cluster, which would not be able to execute tasks directly on the datanode. It is imperative for data locality to have them all running on the same cluster: Hadoop (as in HDFS), MapReduce and HBase. End of story. &lt;br /&gt;
&lt;br /&gt;
OK, you have them all co-located on a single (hopefully larger) cluster? Then read on. How does Hadoop figure out where data is located as HBase accesses it? Remember the access patterns above: both go through a single piece of software called a RegionServer. Case a) uses random access patterns while b) scans all contiguous rows of a table but does so through the same API. As explained in my referenced post and mentioned above, HBase simply stores files and those get distributed as replicated blocks across all data nodes of the HDFS. Now imagine you stop HBase after saving a lot of data and restart it subsequently. The region servers are restarted and are assigned a seemingly random set of regions. At this very point there is no data locality guaranteed - how could there be?&lt;br /&gt;
&lt;br /&gt;
The most important factor is that HBase is not restarted frequently and that it performs housekeeping on a regular basis. These so-called compactions rewrite files as new data is added over time. All files in HDFS are immutable once written (for all sorts of reasons). Because of that, data is written into new files, and as their number grows HBase compacts them into another set of new, consolidated files. And here is the kicker: HDFS is smart enough to put the data where it is needed! How does that work, you ask? We need to take a deep dive into Hadoop's source code and see how the above &lt;code&gt;FileSystem.create(Path path)&lt;/code&gt; that HBase uses works. We are running on HDFS here, so we are actually using &lt;code&gt;DistributedFileSystem.create(Path path)&lt;/code&gt; which looks like this:&lt;br /&gt;
&lt;pre class="brush:plain; gutter: false;"&gt;public FSDataOutputStream create(Path f) throws IOException {
  return create(f, true);
}&lt;/pre&gt;It returns a &lt;code&gt;FSDataOutputStream&lt;/code&gt; and that is created like so:&lt;br /&gt;
&lt;pre class="brush:plain; gutter: false;"&gt;public FSDataOutputStream create(Path f, FsPermission permission, boolean overwrite, int bufferSize, short replication, long blockSize, Progressable progress) throws IOException {
  return new FSDataOutputStream(dfs.create(getPathName(f), permission, overwrite, replication, blockSize, progress, bufferSize), statistics);
}&lt;/pre&gt;It uses a &lt;code&gt;DFSClient&lt;/code&gt; instance that is the "umbilical" cord connecting the client with the NameNode:&lt;br /&gt;
&lt;pre class="brush:plain; gutter: false;"&gt;this.dfs = new DFSClient(namenode, conf, statistics);&lt;/pre&gt;What is returned though is a &lt;code&gt;DFSClient.DFSOutputStream&lt;/code&gt; instance. As you write data into the stream the &lt;code&gt;DFSClient&lt;/code&gt; aggregates it into "packages" which are then written as blocks to the data nodes. This happens in &lt;code&gt;DFSClient.DFSOutputStream.DataStreamer&lt;/code&gt; (please hang in there, we are close!) which runs as a daemon thread in the background. The magic unfolds now in a few hops on the stack, first in the daemon &lt;code&gt;run()&lt;/code&gt; it gets the list of nodes to store the data on:&lt;br /&gt;
&lt;pre class="brush:plain; gutter: false;"&gt;nodes = nextBlockOutputStream(src);&lt;/pre&gt;This in turn calls:&lt;br /&gt;
&lt;pre class="brush:plain; gutter: false;"&gt;long startTime = System.currentTimeMillis();
lb = locateFollowingBlock(startTime);
block = lb.getBlock();
nodes = lb.getLocations();&lt;/pre&gt;We follow further down and see that &lt;code&gt;locateFollowingBlock()&lt;/code&gt; calls:&lt;br /&gt;
&lt;pre class="brush:plain; gutter: false;"&gt;return namenode.addBlock(src, clientName);&lt;/pre&gt;Here is where it all comes together. The name node is called to add a new block and the &lt;code&gt;src&lt;/code&gt; parameter indicates for what file, while &lt;code&gt;clientName&lt;/code&gt; is the name of the &lt;code&gt;DFSClient&lt;/code&gt; instance. I skip one more small method in between and show you the next bigger step involved:&lt;br /&gt;
&lt;pre class="brush:plain; gutter: false;"&gt;public LocatedBlock getAdditionalBlock(String src, String clientName) throws IOException {
  ...
  INodeFileUnderConstruction pendingFile  = checkLease(src, clientName);
  ...
  fileLength = pendingFile.computeContentSummary().getLength();
  blockSize = pendingFile.getPreferredBlockSize();
  clientNode = pendingFile.getClientNode();
  replication = (int)pendingFile.getReplication();

  // choose targets for the new block tobe allocated.
  DatanodeDescriptor targets[] = replicator.chooseTarget(replication, clientNode, null, blockSize);
  ...
}&lt;/pre&gt;We are finally getting to the core of this code in the &lt;code&gt;replicator.chooseTarget()&lt;/code&gt; call:&lt;br /&gt;
&lt;pre class="brush:plain; gutter: false;"&gt;private DatanodeDescriptor chooseTarget(int numOfReplicas, DatanodeDescriptor writer, List&amp;lt;Node&amp;gt; excludedNodes, long blocksize, int maxNodesPerRack, List&amp;lt;DatanodeDescriptor&amp;gt; results) {
  
  if (numOfReplicas == 0 || clusterMap.getNumOfLeaves()==0) {
    return writer;
  }
  
  int numOfResults = results.size();
  boolean newBlock = (numOfResults==0);
  if (writer == null &amp;&amp; !newBlock) {
    writer = (DatanodeDescriptor)results.get(0); 
  }
  
  try {
    switch(numOfResults) {
    case 0:
      writer = chooseLocalNode(writer, excludedNodes, blocksize, maxNodesPerRack, results);
      if (--numOfReplicas == 0) {
        break;
      }
    case 1:
      chooseRemoteRack(1, results.get(0), excludedNodes, blocksize, maxNodesPerRack, results);
      if (--numOfReplicas == 0) {
        break;
      }
    case 2:
      if (clusterMap.isOnSameRack(results.get(0), results.get(1))) {
        chooseRemoteRack(1, results.get(0), excludedNodes, blocksize, maxNodesPerRack, results);
      } else if (newBlock) {
        chooseLocalRack(results.get(1), excludedNodes, blocksize, maxNodesPerRack, results);
      } else {
        chooseLocalRack(writer, excludedNodes, blocksize, maxNodesPerRack, results);
      }
      if (--numOfReplicas == 0) {
        break;
      }
    default:
      chooseRandom(numOfReplicas, NodeBase.ROOT, excludedNodes, blocksize, maxNodesPerRack, results);
    }
  } catch (NotEnoughReplicasException e) {
    FSNamesystem.LOG.warn("Not able to place enough replicas, still in need of " + numOfReplicas);
  }
  return writer;
}&lt;/pre&gt;Recall that we started with the &lt;code&gt;DFSClient&lt;/code&gt; and created a file which was subsequently filled with data. As the blocks need writing out, the above code first checks if that can be done on the same host that the client is on, i.e. the "writer". That is "case 0". In "case 1" the code tries to find a remote rack to have a distant replica of the block. Lastly it fills the list of required replicas with machines from the local or another rack. &lt;br /&gt;
&lt;br /&gt;
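The three cases can be condensed into a small, self-contained sketch. This is a simplified model and not Hadoop code: nodes are plain strings of the form "rack/host", the class name &lt;code&gt;PlacementSketch&lt;/code&gt; and the cluster layout are made up, candidates are picked in list order instead of randomly, and the capacity and load checks of the real policy are left out:&lt;br /&gt;

```java
// Simplified, hypothetical model of the placement shown above. Nodes are
// strings of the form "rack/host"; the real policy also checks load and
// free space and picks candidates randomly.
final class PlacementSketch {

  static String rackOf(String node) {
    return node.substring(0, node.indexOf('/'));
  }

  // Returns the first unused candidate whose rack matches the reference
  // node's rack (sameRack == true) or differs from it (sameRack == false).
  static String pick(String[] cluster, String reference, boolean sameRack, String[] used) {
    for (String candidate : cluster) {
      boolean taken = false;
      for (String u : used) {
        if (candidate.equals(u)) {
          taken = true;
        }
      }
      boolean rackMatches = rackOf(candidate).equals(rackOf(reference));
      if (!taken) {
        if (rackMatches == sameRack) {
          return candidate;
        }
      }
    }
    throw new IllegalStateException("not enough nodes for " + reference);
  }

  // Three replicas for a new block: the writer's own node ("case 0"),
  // a node on a remote rack ("case 1"), then a second node on that
  // same remote rack ("case 2" for a new block).
  static String[] chooseTargets(String[] cluster, String writer) {
    String first = writer;
    String second = pick(cluster, first, false, new String[] { first });
    String third = pick(cluster, second, true, new String[] { first, second });
    return new String[] { first, second, third };
  }

  public static void main(String[] args) {
    String[] cluster = { "rack1/node1", "rack1/node2", "rack2/node3", "rack2/node4" };
    for (String target : chooseTargets(cluster, "rack1/node1")) {
      System.out.println(target);
    }
  }
}
```

The point to take away is the first line of &lt;code&gt;chooseTargets()&lt;/code&gt;: the first replica lands on the writer's own node, which for HBase is the host the region server runs on.&lt;br /&gt;
&lt;br /&gt;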
So for HBase this means that as long as the region server stays up for long enough (which is the default), after a major compaction on all tables - which can be invoked manually or is triggered by a configuration setting - it has the files local on the same host. The data node that shares the same physical host has a copy of all data the region server requires. If you are running a scan or get or any other use-case, reads can then be served from the local data node, which should give you the best performance.&lt;br /&gt;
&lt;br /&gt;
Finally, a good overview of the HDFS design and data replication can be found &lt;a href="http://hadoop.apache.org/common/docs/r0.20.2/hdfs_design.html#Data+Replication"&gt;here&lt;/a&gt;. Also note that the HBase team is working on redesigning how the Master assigns the regions to servers. The plan is to improve it so that regions are deployed on the server where most of their blocks are. This will be particularly useful after a restart because it would give better data locality right off the bat. Stay tuned!
</content><link rel="replies" type="application/atom+xml" href="http://www.larsgeorge.com/feeds/3521736165062825306/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.larsgeorge.com/2010/05/hbase-file-locality-in-hdfs.html#comment-form" title="1 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/860423771829255614/posts/default/3521736165062825306?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/860423771829255614/posts/default/3521736165062825306?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/Lineland/~3/KtdBt5UCuB4/hbase-file-locality-in-hdfs.html" title="HBase File Locality in HDFS" /><author><name>Lars George</name><uri>http://www.blogger.com/profile/18168538475015227467</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="07492346046977538662" /></author><thr:total>1</thr:total><feedburner:origLink>http://www.larsgeorge.com/2010/05/hbase-file-locality-in-hdfs.html</feedburner:origLink></entry><entry gd:etag="W/&quot;CkMNSH06fyp7ImA9WxFQGEg.&quot;"><id>tag:blogger.com,1999:blog-860423771829255614.post-6086740884807129234</id><published>2010-05-14T08:05:00.000-07:00</published><updated>2010-05-14T08:21:39.317-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2010-05-14T08:21:39.317-07:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="katta" /><category scheme="http://www.blogger.com/atom/ns#" term="lucene" /><title>Minimal Katta Lucene Client</title><content type="html">A quick post explaining how a minimal &lt;a 
href="http://katta.sourceforge.net/"&gt;Katta&lt;/a&gt; Lucene Client is set up. I found this was sort of missing from the Katta site and documentation, and since I ran into an issue along the way I thought I would post my notes here for others who may attempt the same.&lt;br /&gt;
&lt;br /&gt;
The first question was which of the libs need to be supplied for a client to use a remote Katta cluster. Please note that I am referring here to a "canonical" setup with a distributed Lucene index (which I created on &lt;a href="http://hadoop.apache.org/common/"&gt;Hadoop&lt;/a&gt; from data in &lt;a href="http://hadoop.apache.org/hbase/"&gt;HBase&lt;/a&gt; using a &lt;a href="http://www.larsgeorge.com/2009/05/hbase-mapreduce-101-part-i.html"&gt;MapReduce&lt;/a&gt; job). I found that these libs need to be added; the rest are for the server:&lt;br /&gt;
&lt;br /&gt;
&lt;pre class="brush:plain; gutter: false;"&gt;katta-core-0.6.rc1.jar
lucene-core-3.0.0.jar
zookeeper-3.2.2.jar
zkclient-0.1-dev.jar
hadoop-core-0.20.1.jar
log4j-1.2.15.jar
commons-logging-1.0.4.jar&lt;/pre&gt;&lt;br /&gt;
Here is the code for the client. Please note that this is a simple test app that expects the name of the index, the default Lucene search field and the query on the command line. I did not add usage info as this is just a proof of concept.&lt;br /&gt;
&lt;br /&gt;
&lt;pre class="brush:java"&gt;package com.worldlingo.test;

import net.sf.katta.lib.lucene.Hit;
import net.sf.katta.lib.lucene.Hits;
import net.sf.katta.lib.lucene.LuceneClient;
import net.sf.katta.util.ZkConfiguration;
import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.Writable;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.Version;

import java.util.Map;

public class KattaLuceneClient {

  public static void main(String[] args) {
    try {
      // args[0] = index name, args[1] = default search field, args[2] = query string
      Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
      Query query = new QueryParser(Version.LUCENE_CURRENT, args[1], analyzer).parse(args[2]);

      // assumes "/katta.zk.properties" available on classpath!
      ZkConfiguration conf = new ZkConfiguration();
      LuceneClient luceneClient = new LuceneClient(conf);
      // search the single named index, returning at most 99 hits
      Hits hits = luceneClient.search(query, new String[] { args[0] }, 99);

      // print every stored field of every hit
      int num = 0;
      for (Hit hit : hits.getHits()) {
        MapWritable mw = luceneClient.getDetails(hit);
        for (Map.Entry&amp;lt;Writable, Writable&amp;gt; entry : mw.entrySet()) {
          System.out.println("[" + (num++) + "] key -&amp;gt; " + entry.getKey() + ", value -&amp;gt; " + entry.getValue());
        }
      }
    } catch (Exception e) {
      e.printStackTrace();
    }
  }

}&lt;/pre&gt;&lt;br /&gt;
The first part is standard Lucene code where we parse the query string with an analyzer. The second part is Katta related as it creates a configuration object, which assumes we have a &lt;a href="http://hadoop.apache.org/zookeeper/"&gt;ZooKeeper&lt;/a&gt; configuration on the classpath. That config only needs to have these lines set:&lt;br /&gt;
&lt;br /&gt;
&lt;pre class="brush:plain; gutter: false;"&gt;zookeeper.embedded=false
zookeeper.servers=server-1:2181,server-2:2181&lt;/pre&gt;&lt;br /&gt;
The first line is really only used on the server, so it can be left out on the client. I simply copied the server &lt;code&gt;katta.zk.properties&lt;/code&gt; to match my setup. The important line is the second one, which tells the client where the ZooKeeper responsible for managing the Katta cluster is running. With this info the client is able to distribute the search calls to the correct Katta slaves.&lt;br /&gt;
&lt;br /&gt;
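To illustrate the format of that second line, the following sketch parses such a configuration with the JDK's &lt;code&gt;java.util.Properties&lt;/code&gt;. It only shows the comma-separated host:port layout and makes no claim about how Katta itself reads the file; the class name &lt;code&gt;ZkServersSketch&lt;/code&gt; is made up for this example:&lt;br /&gt;

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical illustration only: parses the two properties shown above
// with the JDK, independent of Katta and ZooKeeper.
final class ZkServersSketch {

  // Splits the comma-separated "host:port" list into its entries.
  static String[] servers(String config) {
    Properties props = new Properties();
    try {
      props.load(new StringReader(config));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return props.getProperty("zookeeper.servers").split(",");
  }

  public static void main(String[] args) {
    String config = "zookeeper.embedded=false\n"
        + "zookeeper.servers=server-1:2181,server-2:2181\n";
    for (String server : servers(config)) {
      int colon = server.lastIndexOf(':');
      System.out.println("host=" + server.substring(0, colon)
          + " port=" + Integer.parseInt(server.substring(colon + 1)));
    }
  }
}
```

Each entry is one ZooKeeper server the client may connect to, so listing more than one gives the client an alternative if a server is unreachable.&lt;br /&gt;
&lt;br /&gt;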
Further along we create a &lt;code&gt;LuceneClient&lt;/code&gt; instance and start the actual search. Here I simply used no sorting and set the maximum number of hits returned to 99. These two values could be optionally added to the command line parameters but are trivial and not required here - this is a minimal test client after all ;)&lt;br /&gt;
&lt;br /&gt;
The last part of the app simply prints out the fields and their values for each document found. Please note that Katta is using the low-level &lt;a href="http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/io/Writable.html"&gt;&lt;code&gt;Writable&lt;/code&gt;&lt;/a&gt; class as part of its response. This is not too intuitive for the uninitiated. These are actually &lt;a href="http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/io/Text.html"&gt;&lt;code&gt;Text&lt;/code&gt;&lt;/a&gt; instances so they can safely be converted to text using ".toString()".&lt;br /&gt;
&lt;br /&gt;
Finally, I also checked the test project into my &lt;a href="http://github.com/larsgeorge/katta-lucene-client"&gt;GitHub&lt;/a&gt; account for your perusal. Have fun!
</content><link rel="replies" type="application/atom+xml" href="http://www.larsgeorge.com/feeds/6086740884807129234/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.larsgeorge.com/2010/05/minimal-katta-lucene-client.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/860423771829255614/posts/default/6086740884807129234?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/860423771829255614/posts/default/6086740884807129234?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/Lineland/~3/4BsRcahNUzk/minimal-katta-lucene-client.html" title="Minimal Katta Lucene Client" /><author><name>Lars George</name><uri>http://www.blogger.com/profile/18168538475015227467</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="07492346046977538662" /></author><thr:total>0</thr:total><feedburner:origLink>http://www.larsgeorge.com/2010/05/minimal-katta-lucene-client.html</feedburner:origLink></entry><entry gd:etag="W/&quot;CUUCR306fCp7ImA9WxFRF04.&quot;"><id>tag:blogger.com,1999:blog-860423771829255614.post-286632541657250407</id><published>2010-05-01T10:01:00.000-07:00</published><updated>2010-05-01T10:01:06.314-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2010-05-01T10:01:06.314-07:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="nosql" /><category scheme="http://www.blogger.com/atom/ns#" term="hbase" /><category scheme="http://www.blogger.com/atom/ns#" term="openhug" /><category 
scheme="http://www.blogger.com/atom/ns#" term="hadoop" /><title>3rd Munich OpenHUG Meeting</title><content type="html">I am pleased to invite you to our third Munich Open Hadoop User Group Meeting! &lt;br /&gt;
&lt;br /&gt;
As always, we are looking forward to seeing everyone again and welcome new attendees to join our group. We are enthusiastic about all things related to scalable, distributed storage systems. We are not limiting ourselves to a particular system but appreciate anyone who would like to share their experiences.&lt;br /&gt;
&lt;br /&gt;
When: Thursday May 6th, 2010 at 6pm (open end)&lt;br /&gt;
Where: eCircle AG, Nymphenburger Straße 86, 80636 München ["Bruckmann" Building, "U1 Mailinger Str", &lt;a href="http://www.ecircle.com/de/kontakt/anfahrt.html"&gt;map&lt;/a&gt; (in German) and look for the signs]&lt;br /&gt;
&lt;br /&gt;
Thanks again to Bob Schulze from eCircle for providing the infrastructure.&lt;br /&gt;
&lt;br /&gt;
We have a talk scheduled by Stefan Seelmann who is a member of the project committee for the Apache Directory project. This is followed by an open discussion.&lt;br /&gt;
&lt;br /&gt;
Please RSVP at&amp;nbsp;&lt;a href="https://www.xing.com/events/3rd-munich-openhug-meeting-506082"&gt;Xing&lt;/a&gt;&amp;nbsp;and&amp;nbsp;Yahoo's&amp;nbsp;&lt;a href="http://upcoming.yahoo.com/event/5771044/BY/Mnchen/3rd-Munich-OpenHUG-Meeting/eCircle-AG"&gt;Upcoming&lt;/a&gt;.&lt;br /&gt;
&lt;br /&gt;
Looking forward to seeing you there!&lt;br /&gt;
&lt;br /&gt;
Cheers,&lt;br /&gt;
Lars
</content><link rel="replies" type="application/atom+xml" href="http://www.larsgeorge.com/feeds/286632541657250407/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.larsgeorge.com/2010/05/3rd-munich-openhug-meeting.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/860423771829255614/posts/default/286632541657250407?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/860423771829255614/posts/default/286632541657250407?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/Lineland/~3/HYmUZIY07LY/3rd-munich-openhug-meeting.html" title="3rd Munich OpenHUG Meeting" /><author><name>Lars George</name><uri>http://www.blogger.com/profile/18168538475015227467</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="07492346046977538662" /></author><thr:total>0</thr:total><feedburner:origLink>http://www.larsgeorge.com/2010/05/3rd-munich-openhug-meeting.html</feedburner:origLink></entry><entry gd:etag="W/&quot;A0UDR38zfip7ImA9WxBVEEw.&quot;"><id>tag:blogger.com,1999:blog-860423771829255614.post-2766092510491206648</id><published>2010-02-12T16:01:00.000-08:00</published><updated>2010-02-12T16:01:16.186-08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2010-02-12T16:01:16.186-08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="nosql" /><category scheme="http://www.blogger.com/atom/ns#" term="hbase" /><category scheme="http://www.blogger.com/atom/ns#" term="fosdem" /><title>FOSDEM 2010 NoSQL Talk</title><content 
type="html">Let me take a minute to wrap up my &lt;a href="http://fosdem.org/2010/"&gt;FOSDEM 2010&lt;/a&gt; experience. I was part of the &lt;a href="http://fosdem.org/2010/schedule/devrooms/nosql"&gt;NoSQL DevRoom&lt;/a&gt; organized by &lt;a href="http://twitter.com/stevenn"&gt;@stevenn&lt;/a&gt; from &lt;a href="http://outerthought.org/"&gt;Outerthought&lt;/a&gt;, whom I had the pleasure of visiting a few months back as an&amp;nbsp;&lt;a href="http://www.larsgeorge.com/2009/05/european-hbase-ambassador.html"&gt;HBase Ambassador&lt;/a&gt;. &lt;br /&gt;
&lt;br /&gt;
First things first: the NoSQL DevRoom was an absolute success and I had a blast attending it. I also made a point of not walking around to see talks outside the NoSQL track, although there were plenty of good ones. I did so deliberately to see the other projects and what they have to offer. I thought it was great; a good vibe was felt throughout the whole day as the audience got a whirlwind tour through the NoSQL landscape. The room was full to the brim for most presentations and some folks had to miss out as we could not let more people in. This proved the great interest in this fairly new kind of technology. Exciting!&lt;br /&gt;
&lt;br /&gt;
The focus of my talk was the history I have with HBase, starting with it in late 2007. At this point I would like to take the opportunity to thank Michael Stack, the lead of HBase, as he helped me many times back then to sort out the problems I ran into. I would also like to say that if you start with HBase today you will not have these problems as HBase has matured tremendously since then. It is an overall stable product and can solve scalability issues you may face with regular RDBMSs today - and that with great ease.&lt;br /&gt;
&lt;br /&gt;
So the talk I gave did not really sell all the features nor did it explain everything fully. I felt this could be left to the reader to look up on the project's website (or here on my blog) and hence I focused on my use case only. First up, here are the slides.  &lt;br /&gt;
&lt;br /&gt;
&lt;object data="http://viewer.docstoc.com/" height="450" id="_ds_24778119" name="_ds_24778119" type="application/x-shockwave-flash" width="500"&gt; &lt;param name="FlashVars" value="doc_id=24778119&amp;mem_id=602922&amp;doc_type=pdf&amp;fullscreen=0&amp;showrelated=0&amp;showotherdocs=0&amp;showstats=0 "/&gt;&lt;param name="movie" value="http://viewer.docstoc.com/" /&gt;&lt;param name="allowScriptAccess" value="always" /&gt;&lt;param name="allowFullScreen" value="true" /&gt;&lt;/object&gt; &lt;br /&gt;
&lt;span style="font-size: xx-small;"&gt;&lt;a href="http://www.docstoc.com/docs/24778119/My%20Life%20with%20HBase%20-%20FOSDEM%202010%20NoSQL"&gt;My Life with HBase - FOSDEM 2010 NoSQL&lt;/a&gt;&lt;/span&gt; &lt;br /&gt;
&lt;br /&gt;
After my talk and throughout the rest of the day I also had great conversations with the attendees, who had many great questions.&lt;br /&gt;
&lt;br /&gt;
Having listened to the other talks, though, I felt I probably could have done a better job selling HBase to the audience. I could have reported on use-cases in well-known companies, given better performance numbers and so on. I have learned a lesson and will make sure I do a better job next time around. I guess this is also another facet of what this is about, i.e. learning to achieve a higher level of professionalism.&lt;br /&gt;
&lt;br /&gt;
But as I said above, my intent was to report on &lt;u&gt;my&lt;/u&gt; life with HBase. I am grateful that it was accepted as that, and please let me cite Todd Hoff (see &lt;a href="http://highscalability.com/blog/2010/2/12/hot-scalability-links-for-february-12-2010.html"&gt;Hot Scalability Links for February 12, 2010&lt;/a&gt;) who put it in such nice words: &lt;br /&gt;
&lt;blockquote&gt;"The hardscabble tale of HBase's growth from infancy to maturity. A very good introduction and overview of HBase."&lt;/blockquote&gt;Thank you!&lt;br /&gt;
&lt;br /&gt;
Finally here is the video of the talk:&lt;br /&gt;
&lt;br /&gt;
&lt;object height="443" width="474"&gt;&lt;param name="movie" value="http://www.parleys.com/share/parleysshare2.swf?pageId=1859"&gt;&lt;/param&gt;&lt;param name="allowFullScreen" value="true"&gt;&lt;/param&gt;&lt;param name="pageId" value="1859"&gt;&lt;/param&gt;&lt;embed src="http://www.parleys.com/share/parleysshare2.swf?pageId=1859" type="application/x-shockwave-flash" allowfullscreen="true" width="474" height="443"&gt;&lt;/embed&gt;&lt;/object&gt;&lt;br /&gt;
&lt;br /&gt;
I am looking forward to more NoSQL events in Europe in the near future and will attempt to represent HBase once more (including those adjustments I mentioned above). My hope is that we as Europeans are able to adopt these new technologies and stay abreast with the rest of the world. We sure have smart people to do so.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/860423771829255614-2766092510491206648?l=www.larsgeorge.com' alt='' /&gt;&lt;/div&gt;
&lt;p&gt;&lt;a href="http://feedads.g.doubleclick.net/~a/-G3T69EXn3vw2cEU1zbdKOrYCz0/0/da"&gt;&lt;img src="http://feedads.g.doubleclick.net/~a/-G3T69EXn3vw2cEU1zbdKOrYCz0/0/di" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;br/&gt;
&lt;a href="http://feedads.g.doubleclick.net/~a/-G3T69EXn3vw2cEU1zbdKOrYCz0/1/da"&gt;&lt;img src="http://feedads.g.doubleclick.net/~a/-G3T69EXn3vw2cEU1zbdKOrYCz0/1/di" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/Lineland/~4/QrMgBO56JAQ" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://www.larsgeorge.com/feeds/2766092510491206648/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.larsgeorge.com/2010/02/fosdem-2010-nosql-talk.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/860423771829255614/posts/default/2766092510491206648?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/860423771829255614/posts/default/2766092510491206648?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/Lineland/~3/QrMgBO56JAQ/fosdem-2010-nosql-talk.html" title="FOSDEM 2010 NoSQL Talk" /><author><name>Lars George</name><uri>http://www.blogger.com/profile/18168538475015227467</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="07492346046977538662" /></author><thr:total>0</thr:total><feedburner:origLink>http://www.larsgeorge.com/2010/02/fosdem-2010-nosql-talk.html</feedburner:origLink></entry><entry gd:etag="W/&quot;DkcGQX8_fSp7ImA9WxBWFE0.&quot;"><id>tag:blogger.com,1999:blog-860423771829255614.post-4780795765478762214</id><published>2010-02-05T12:20:00.000-08:00</published><updated>2010-02-05T13:07:00.145-08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2010-02-05T13:07:00.145-08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="eclipse" /><category scheme="http://www.blogger.com/atom/ns#" term="hbase" /><title>IvyDE and HBase 0.21 (trunk)</title><content type="html">If you are staying on top of HBase development and frequently 
update to the HBase trunk (0.21.0-dev at the time of this post) you may have noticed that we now have support for &lt;a href="http://ant.apache.org/ivy/"&gt;Apache Ivy&lt;/a&gt;&amp;nbsp;(see &lt;a href="http://issues.apache.org/jira/browse/HBASE-1433"&gt;HBASE-1433&lt;/a&gt;). This is good because it gives you better control over the dependencies on the required jar files. It does have a few drawbacks though. One issue is that you must be online to get your initial set of jars. You can also set up a local mirror, or reuse the one you need for Hadoop anyway, to share some of them.&lt;br /&gt;
&lt;br /&gt;
Another issue is that it pulls in many more libs as part of the dependency resolution process. This reminds me a bit of &lt;code&gt;aptitude&lt;/code&gt; when you try to install Java, for example on Debian. It often wants to pull in a litany of "required" packages, but on closer inspection many are only recommended and need not be installed.&lt;br /&gt;
&lt;br /&gt;
Finally you need to get the jar files somehow into your development environment. I am using Eclipse 3.5 on my Windows 7 PC as well as on my Mac OS machines. If you have not run &lt;code&gt;ant&lt;/code&gt; from the command line yet, you have no jars downloaded, and opening the project in Eclipse yields an endless list of errors. You have two choices. The first is to run &lt;code&gt;ant&lt;/code&gt; to get all jars and then add them to the project in Eclipse. But that is rather static and does not work well with future changes. It also is not the "ivy" way of resolving the libraries automatically.&lt;br /&gt;
&lt;br /&gt;
&lt;a href="http://2.bp.blogspot.com/_Cib_A77V54U/S2xMXVWKaxI/AAAAAAAAAGA/11gv-69kUD4/s1600-h/ivy-edit-libs.png" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"&gt;&lt;img border="0" height="200" src="http://2.bp.blogspot.com/_Cib_A77V54U/S2xMXVWKaxI/AAAAAAAAAGA/11gv-69kUD4/s200/ivy-edit-libs.png" width="145" /&gt;&lt;/a&gt;The other option you have is adding a plugin to Eclipse that can handle Ivy for you, right within the IDE. Luckily for Eclipse there is &lt;a href="http://ant.apache.org/ivy/ivyde/"&gt;IvyDE&lt;/a&gt;. You install it according to its &lt;a href="http://ant.apache.org/ivy/ivyde/download.cgi"&gt;documentation&lt;/a&gt; and then add a "Classpath Container" as described &lt;a href="http://ant.apache.org/ivy/ivyde/history/latest-milestone/cp_container.html"&gt;here&lt;/a&gt;. That part works quite well and after a restart IvyDE is ready to go.&lt;br /&gt;
&lt;br /&gt;
A few more steps have to be done to get HBase working now - as in compiling without errors. The crucial one is editing the Ivy library and setting the HBase specific Ivy files, in particular the "Ivy settings path" and the properties file. The latter is especially important as it specifies all the version numbers that the ivy.xml is using. Without it the Ivy resolve process will fail with many errors all over the place. Please note that the screen shot I added shows what it looks like on my Windows PC. The paths will be slightly different for your setup and will probably use another format if you are on a Mac or Linux machine. As long as you specify both you should be fine.&lt;br /&gt;
&lt;br /&gt;
&lt;a href="http://4.bp.blogspot.com/_Cib_A77V54U/S2xMUo_zXbI/AAAAAAAAAF4/UKgdgVW-2Lg/s1600-h/ivy-build-path.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="136" src="http://4.bp.blogspot.com/_Cib_A77V54U/S2xMUo_zXbI/AAAAAAAAAF4/UKgdgVW-2Lg/s200/ivy-build-path.png" width="200" /&gt;&lt;/a&gt;The other important issue is that you have to repeat that same step of adding the Classpath Container two more times: each of the two larger contrib packages, "contrib/stargate" and "contrib/transactional", has its own ivy.xml! For both you have to go into the respective directory, right click on the ivy.xml and follow the steps described in the Ivy documentation. Enter the same information as mentioned above to make the resolve work and leave everything else the way it is. You may notice that the contrib packages have a few more targets unticked. That is OK and can be used as-is.&lt;br /&gt;
&lt;br /&gt;
As a temporary step you have to add two more static libraries that are in the &lt;code&gt;$HBASE_HOME/lib&lt;/code&gt; directory: &lt;code&gt;libthrift-0.2.0.jar&lt;/code&gt; and &lt;code&gt;zookeeper-3.2.2.jar&lt;/code&gt;. Those will eventually be published on the Ivy repositories and then this step is obsolete (see &lt;a href="http://issues.apache.org/jira/browse/INFRA-2461"&gt;INFRA-2461&lt;/a&gt;).&lt;br /&gt;
&lt;br /&gt;
Eventually you end up with three containers as shown in the second and third screen shot. The Eclipse toolbar now also has an Ivy "Resolve All Dependencies" button which you can use to trigger the download process. Personally I had to do this a few times as the mirrors with the jars seem to be flaky at times. I ended up with for example "hadoop-mapred.jar" missing. Another resolve run fixed the problem.&lt;br /&gt;
&lt;br /&gt;
&lt;a href="http://4.bp.blogspot.com/_Cib_A77V54U/S2x4bMfpQMI/AAAAAAAAAGQ/-7XucgUd8kU/s1600-h/ivy-eclipse.png" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"&gt;&lt;img border="0" height="125" src="http://4.bp.blogspot.com/_Cib_A77V54U/S2x4bMfpQMI/AAAAAAAAAGQ/-7XucgUd8kU/s200/ivy-eclipse.png" width="200" /&gt;&lt;/a&gt;The last screen shot shows the three Ivy related containers once more in the tree view of the Package Explorer in the Java perspective. What you also see is the Ivy console, which is also installed with the plugin. You have to open it as usual using the "Window - Show View - Console" menu (if you do not have the Console View open already) and then use the drop down menu next to the "Open Console" button in that view to open the Ivy console. It gives you access to all the details when resolving the dependencies and can hint at what you have done wrong. Please note though that it also lists a lot of connection errors, one for every mirror or repository that does not respond or does not yet have the required package available. One of them should respond though, or as mentioned above you will have to try again later.&lt;br /&gt;
&lt;br /&gt;
Eclipse automatically compiles the project and if everything worked out it does so now without a hitch. Good luck!&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;Update:&lt;/b&gt; Added info about the yet still static thrift and zookeeper jars. See Kay Kay's comment below.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/860423771829255614-4780795765478762214?l=www.larsgeorge.com' alt='' /&gt;&lt;/div&gt;
&lt;p&gt;&lt;a href="http://feedads.g.doubleclick.net/~a/jolTg3_KBaRzu8nlvIQ5IAiz5PM/0/da"&gt;&lt;img src="http://feedads.g.doubleclick.net/~a/jolTg3_KBaRzu8nlvIQ5IAiz5PM/0/di" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;br/&gt;
&lt;a href="http://feedads.g.doubleclick.net/~a/jolTg3_KBaRzu8nlvIQ5IAiz5PM/1/da"&gt;&lt;img src="http://feedads.g.doubleclick.net/~a/jolTg3_KBaRzu8nlvIQ5IAiz5PM/1/di" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/Lineland/~4/0tLBrmU_PE8" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://www.larsgeorge.com/feeds/4780795765478762214/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.larsgeorge.com/2010/02/ivyde-and-hbase-021-trunk.html#comment-form" title="4 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/860423771829255614/posts/default/4780795765478762214?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/860423771829255614/posts/default/4780795765478762214?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/Lineland/~3/0tLBrmU_PE8/ivyde-and-hbase-021-trunk.html" title="IvyDE and HBase 0.21 (trunk)" /><author><name>Lars George</name><uri>http://www.blogger.com/profile/18168538475015227467</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="07492346046977538662" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://2.bp.blogspot.com/_Cib_A77V54U/S2xMXVWKaxI/AAAAAAAAAGA/11gv-69kUD4/s72-c/ivy-edit-libs.png" height="72" width="72" /><thr:total>4</thr:total><feedburner:origLink>http://www.larsgeorge.com/2010/02/ivyde-and-hbase-021-trunk.html</feedburner:origLink></entry><entry gd:etag="W/&quot;DEEDR34zfSp7ImA9WxBXGEQ.&quot;"><id>tag:blogger.com,1999:blog-860423771829255614.post-6702483680144268895</id><published>2010-01-30T15:45:00.000-08:00</published><updated>2010-01-30T16:11:16.085-08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2010-01-30T16:11:16.085-08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" 
term="hbase" /><category scheme="http://www.blogger.com/atom/ns#" term="hadoop" /><title>HBase Architecture 101 - Write-ahead-Log</title><content type="html">What is the Write-ahead-Log you ask? In my previous &lt;a href="http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html"&gt;post&lt;/a&gt; we had a look at the general storage architecture of HBase. One thing that was mentioned is the Write-ahead-Log, or WAL. This post explains how the log works in detail, but bear in mind that it describes the current version, which is 0.20.3. I will address the various plans to improve the log for 0.21 at the end of this article. For the term itself please read &lt;a href="http://en.wikipedia.org/wiki/Write-ahead_logging"&gt;here&lt;/a&gt;.&lt;br /&gt;
&lt;br /&gt;
&lt;u&gt;Big Picture&lt;/u&gt;&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/_Cib_A77V54U/S2M98DazIVI/AAAAAAAAAFw/cmp0W38kWGY/s1600-h/wal-flow.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="230" src="http://1.bp.blogspot.com/_Cib_A77V54U/S2M98DazIVI/AAAAAAAAAFw/cmp0W38kWGY/s400/wal-flow.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;The WAL is the lifeline that is needed when disaster strikes. Similar to a BIN log in MySQL it records all changes to the data. This is important in case something happens to the primary storage. So if the server crashes it can effectively replay that log to get everything up to where the server should have been just before the crash. It also means that if writing the record to the WAL fails the whole operation must be considered a failure.&lt;br /&gt;
&lt;br /&gt;
Let's look at the high level view of how this is done in HBase. First the client initiates an action that modifies data. This is currently a call to &lt;code&gt;put(Put)&lt;/code&gt;, &lt;code&gt;delete(Delete)&lt;/code&gt; or &lt;code&gt;incrementColumnValue()&lt;/code&gt; (abbreviated as "incr" here at times). Each of these modifications is wrapped into a &lt;a href="http://hadoop.apache.org/hbase/docs/r0.20.2/api/org/apache/hadoop/hbase/KeyValue.html"&gt;KeyValue&lt;/a&gt;&amp;nbsp;object instance and sent over the wire using RPC calls. The calls are sent (ideally batched) to the &lt;a href="http://hadoop.apache.org/hbase/docs/r0.20.2/api/org/apache/hadoop/hbase/regionserver/HRegionServer.html"&gt;HRegionServer&lt;/a&gt;&amp;nbsp;that serves the affected regions.&amp;nbsp;Once it arrives, the payload, the said KeyValue, is routed to the &lt;a href="http://hadoop.apache.org/hbase/docs/r0.20.2/api/org/apache/hadoop/hbase/regionserver/HRegion.html"&gt;HRegion&lt;/a&gt;&amp;nbsp;that is responsible for the affected row. The data is written to the WAL and then put into the &lt;a href="http://hadoop.apache.org/hbase/docs/r0.20.2/api/org/apache/hadoop/hbase/regionserver/MemStore.html"&gt;MemStore&lt;/a&gt;&amp;nbsp;of the actual &lt;a href="http://hadoop.apache.org/hbase/docs/r0.20.2/api/org/apache/hadoop/hbase/regionserver/Store.html"&gt;Store&lt;/a&gt;&amp;nbsp;that holds the record. And that also pretty much describes the write-path of HBase.&lt;br /&gt;
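&lt;br /&gt;
The ordering described above (log first, memory second) can be sketched with a toy model. This is plain Python, not HBase code; &lt;code&gt;ToyRegion&lt;/code&gt; and its attributes are names I made up purely for illustration.&lt;br /&gt;
&lt;pre&gt;
```python
# Toy model of the write path: WAL first, MemStore second.
# ToyRegion and its attributes are illustrative, not HBase classes.
class ToyRegion:
    def __init__(self, wal):
        self.wal = wal          # shared, append-only list standing in for the log
        self.memstore = {}      # in-memory edits, lost on a crash

    def put(self, row, value):
        # 1) Record the edit in the log; if this raises, the put fails as a whole.
        self.wal.append((row, value))
        # 2) Only then make the edit visible in memory.
        self.memstore[row] = value


wal = []
region = ToyRegion(wal)
region.put("row-1", "a")
region.put("row-2", "b")
print(len(wal), sorted(region.memstore))
```
&lt;/pre&gt;
Because the log append comes first, an exception while logging leaves the in-memory state untouched, which matches the rule that a failed WAL write fails the whole operation.&lt;br /&gt;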
&lt;br /&gt;
Eventually, when the MemStore reaches a certain size or after a specific time, the data is asynchronously persisted to the file system. Until then the data is held only in volatile memory. And if the HRegionServer&amp;nbsp;hosting that memory crashes the data is lost... but for the existence of what is the topic of this post, the WAL!&lt;br /&gt;
&lt;br /&gt;
Let's now have a look at the various classes or "wheels" working the magic of the WAL. First up is one of the main classes of this contraption.&lt;br /&gt;
&lt;br /&gt;
&lt;u&gt;HLog&lt;/u&gt;&lt;br /&gt;
&lt;br /&gt;
The class which implements the WAL is called &lt;a href="http://hadoop.apache.org/hbase/docs/r0.20.2/api/org/apache/hadoop/hbase/regionserver/HLog.html"&gt;HLog&lt;/a&gt;. What you may have read in my previous post, and what is also illustrated above, is that there is only one instance of the HLog class per HRegionServer. When an HRegion is instantiated the single HLog is passed on as a parameter to the constructor of HRegion.&lt;br /&gt;
&lt;br /&gt;
A central part of HLog's functionality is the &lt;code&gt;append()&lt;/code&gt; method, which internally eventually calls &lt;code&gt;doWrite()&lt;/code&gt;. It is what is called when the above mentioned modification methods are invoked... or is it? One thing to note here is that for performance reasons there is an option for &lt;code&gt;put()&lt;/code&gt;, &lt;code&gt;delete()&lt;/code&gt;, and &lt;code&gt;incrementColumnValue()&lt;/code&gt; to be called with an extra parameter set: &lt;code&gt;setWriteToWAL(boolean)&lt;/code&gt;. If you invoke this method with &lt;code&gt;false&lt;/code&gt; while setting up, for example, a &lt;code&gt;Put&lt;/code&gt; instance then the writing to the WAL is skipped! That is also why the downward arrow in the big picture above is drawn with a dashed line to indicate the optional step. By default you certainly want the WAL, no doubt about that. But say you run a large bulk import MapReduce job that you can rerun at any time. Skipping the WAL gains you extra performance, but you need to take extra care that no data was lost during the import. The choice is yours.&lt;br /&gt;
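&lt;br /&gt;
To make the trade-off concrete, here is a small toy model in plain Python (not HBase code). The &lt;code&gt;write_to_wal&lt;/code&gt; flag only mirrors the idea of &lt;code&gt;setWriteToWAL(boolean)&lt;/code&gt;; the function names are hypothetical.&lt;br /&gt;
&lt;pre&gt;
```python
# Toy model: edits that skip the log cannot be replayed after a crash.
# The write_to_wal flag mirrors the idea of setWriteToWAL(boolean); nothing here is HBase code.
def apply_edit(wal, memstore, row, value, write_to_wal=True):
    if write_to_wal:
        wal.append((row, value))
    memstore[row] = value


def recover(wal):
    # After a crash the MemStore is gone; only logged edits come back.
    return dict(wal)


wal, memstore = [], {}
apply_edit(wal, memstore, "row-1", "a")
apply_edit(wal, memstore, "row-2", "b", write_to_wal=False)
recovered = recover(wal)
print(recovered)
```
&lt;/pre&gt;
Both rows are visible before the simulated crash, but only the logged one survives recovery. That is why skipping the log is only reasonable for jobs you can rerun.&lt;br /&gt;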
&lt;br /&gt;
Another important feature of the HLog is keeping track of the changes. This is done by using a "sequence number". It uses an &lt;a href="http://java.sun.com/javase/6/docs/api/java/util/concurrent/atomic/AtomicLong.html"&gt;AtomicLong&lt;/a&gt; internally to be thread-safe and is either starting out at zero - or at that last known number persisted to the file system. So as the region is opening its storage file, it reads the highest sequence number which is stored as a meta field in each &lt;a href="http://hadoop.apache.org/hbase/docs/r0.20.2/api/org/apache/hadoop/hbase/io/hfile/HFile.html"&gt;HFile&lt;/a&gt;&amp;nbsp;and sets the HLog sequence number to that value if it is higher than what has been recorded before. So at the end of opening all storage files the HLog is initialized to reflect where persisting has ended and where to continue. You will see in a minute where this is used.&lt;br /&gt;
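&lt;br /&gt;
A rough sketch of that initialization, as a toy model in plain Python: the sequence ids are made-up numbers, and &lt;code&gt;itertools.count&lt;/code&gt; merely stands in for the AtomicLong (thread safety is not modeled). Whether the first new edit gets exactly the next number is an assumption of this sketch.&lt;br /&gt;
&lt;pre&gt;
```python
import itertools

# Toy model: resume the log sequence number from the store files.
# The ids passed in are made up; HBase reads them from HFile metadata.
def initial_sequence(store_file_max_seq_ids):
    # Start at zero when nothing has been persisted yet.
    return max(store_file_max_seq_ids, default=0)


start = initial_sequence([120, 87, 133])
counter = itertools.count(start + 1)   # assumption: the next edit gets the next number
first, second = next(counter), next(counter)
print(start, first, second)
```
&lt;/pre&gt;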
&lt;br /&gt;
&lt;a href="http://2.bp.blogspot.com/_Cib_A77V54U/S2IVD_TiqsI/AAAAAAAAAFo/dXO_-nZWz1c/s1600-h/wal.png" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"&gt;&lt;img border="0" height="243" src="http://2.bp.blogspot.com/_Cib_A77V54U/S2IVD_TiqsI/AAAAAAAAAFo/dXO_-nZWz1c/s400/wal.png" width="400" /&gt;&lt;/a&gt;The image to the right shows three different regions, each of them covering a different row key range. As mentioned above, each of these regions shares the same single instance of HLog. What that means in this context is that as the data arrives at each region it is written to the WAL in an unpredictable order. We will address this further below.&lt;br /&gt;
&lt;br /&gt;
Finally the HLog has the facilities to recover and split a log left by a crashed HRegionServer. These are invoked by the&amp;nbsp;&lt;a href="http://hadoop.apache.org/hbase/docs/r0.20.2/api/org/apache/hadoop/hbase/master/HMaster.html"&gt;HMaster&lt;/a&gt;&amp;nbsp;before regions are deployed again.&lt;br /&gt;
&lt;br /&gt;
&lt;u&gt;HLogKey&lt;/u&gt;&lt;br /&gt;
&lt;br /&gt;
Currently the WAL is using a Hadoop &lt;a href="http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/io/SequenceFile.html"&gt;SequenceFile&lt;/a&gt;, which stores records as sets of key/values. For the WAL the value is simply the KeyValue sent from the client. The key is represented by an&amp;nbsp;&lt;a href="http://hadoop.apache.org/hbase/docs/r0.20.2/api/org/apache/hadoop/hbase/regionserver/HLogKey.html"&gt;HLogKey&lt;/a&gt;&amp;nbsp;instance. As you may recall from my first &lt;a href="http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html"&gt;post&lt;/a&gt;&amp;nbsp;in this series, the KeyValue only represents the row, column family, qualifier, timestamp, and value as well as the "Key Type". Last time I did not address that field since there was no context. Now we have one because the Key Type is what identifies what the KeyValue represents, a "put" or a "delete" (where there are a few more variations of the latter to express what is to be deleted: a value, a column family or a specific column).&lt;br /&gt;
&lt;br /&gt;
What we are missing though is where the KeyValue belongs, i.e. the region and the table name. That is stored in the HLogKey. What is also stored is the above sequence number. With each record that number is incremented to be able to keep a sequential order of edits. Finally it records the "Write Time", a time stamp to record when the edit was written to the log.&lt;br /&gt;
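&lt;br /&gt;
Put together, one log record can be pictured as a pair of two small structures. The following is a plain Python illustration; the field names follow the description above and are my own, not the ones used in the HBase source.&lt;br /&gt;
&lt;pre&gt;
```python
from collections import namedtuple

# Field names are illustrative; they follow the description above, not the HBase source.
LogKey = namedtuple("LogKey", ["region", "table", "seq_num", "write_time"])
KeyValue = namedtuple("KeyValue", ["row", "family", "qualifier", "timestamp", "key_type", "value"])

key = LogKey(region="foobar,1b2dc5f3b5d4,1260083783909", table="foobar",
             seq_num=134, write_time=1264895100000)
edit = KeyValue("row-1", "cf", "col", 1264895100000, "Put", b"hello")
record = (key, edit)   # one log record: the key says where, the value says what
print(record[0].table, record[0].seq_num, record[1].key_type)
```
&lt;/pre&gt;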
&lt;br /&gt;
&lt;u&gt;LogFlusher&lt;/u&gt;&lt;br /&gt;
&lt;br /&gt;
As mentioned above, data arriving at an HRegionServer in the form of KeyValue instances is written (optionally) to the WAL, which in turn writes it to a SequenceFile. While this seems trivial, it is not. One of the base classes in Java IO is the Stream. Streams writing to a file system in particular are often buffered to improve performance, as the OS is much faster writing data in batches, or blocks. If you wrote every record separately IO throughput would be really bad. But in the context of the WAL this causes a gap where data is supposedly written to disk but in reality is in limbo. To mitigate the issue the underlying stream needs to be flushed on a regular basis.&amp;nbsp;This functionality is provided by the &lt;a href="http://hadoop.apache.org/hbase/docs/r0.20.2/api/org/apache/hadoop/hbase/regionserver/LogFlusher.html"&gt;LogFlusher&lt;/a&gt; class and thread. It simply calls &lt;code&gt;HLog.optionalSync()&lt;/code&gt;, which checks if the&amp;nbsp;&lt;code&gt;hbase.regionserver.optionallogflushinterval&lt;/code&gt;, set to 10 seconds by default, has been exceeded and if that is the case invokes &lt;code&gt;HLog.sync()&lt;/code&gt;. The other place invoking the sync method is &lt;code&gt;HLog.doWrite()&lt;/code&gt;. Once it has written the current edit to the stream it checks if the number of unflushed edits exceeds the &lt;code&gt;hbase.regionserver.flushlogentries&lt;/code&gt; parameter, set to 100 by default, and calls sync as well.&lt;br /&gt;
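&lt;br /&gt;
The two triggers can be sketched as a toy model in plain Python. The class and attribute names are made up, the defaults mirror the settings quoted above, and two details are assumptions of this sketch: the exact comparison at the threshold, and skipping the timed sync when nothing is buffered.&lt;br /&gt;
&lt;pre&gt;
```python
# Toy model of the two sync triggers described above.
# Class and attribute names are made up; the defaults mirror the quoted settings.
class ToyLog:
    def __init__(self, flush_entries=100, flush_interval_ms=10000):
        self.flush_entries = flush_entries
        self.flush_interval_ms = flush_interval_ms
        self.unflushed = 0
        self.last_sync_ms = 0
        self.syncs = 0

    def sync(self, now_ms):
        self.unflushed = 0
        self.last_sync_ms = now_ms
        self.syncs += 1

    def do_write(self, now_ms):
        # Count-based trigger, checked after every edit.
        self.unflushed += 1
        if self.unflushed >= self.flush_entries:
            self.sync(now_ms)

    def optional_sync(self, now_ms):
        # Time-based trigger, called periodically by a background thread.
        if self.unflushed > 0 and now_ms - self.last_sync_ms >= self.flush_interval_ms:
            self.sync(now_ms)


log = ToyLog()
for i in range(150):
    log.do_write(now_ms=i)        # the 100th edit triggers a sync
log.optional_sync(now_ms=5000)    # too early, 50 edits stay buffered
log.optional_sync(now_ms=12000)   # interval exceeded, sync again
print(log.syncs, log.unflushed)
```
&lt;/pre&gt;
In this model the 50 edits written after the count-based sync sit in the buffer until the timed sync picks them up, which is exactly the window in which a crash would lose them.&lt;br /&gt;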
&lt;br /&gt;
Sync itself invokes &lt;code&gt;HLog.Writer.sync()&lt;/code&gt; and is implemented in &lt;code&gt;SequenceFileLogWriter&lt;/code&gt;. For now we assume it flushes the stream to disk and all is well. In reality this is a bit more complicated, as discussed below.&lt;br /&gt;
&lt;br /&gt;
&lt;u&gt;LogRoller&lt;/u&gt;&lt;br /&gt;
&lt;br /&gt;
Obviously it makes sense to have some size restrictions on the logs written. We also want to make sure a log is persisted on a regular basis. This is done by the LogRoller class and thread. It is controlled by the &lt;code&gt;hbase.regionserver.logroll.period&lt;/code&gt; parameter in the &lt;code&gt;$HBASE_HOME/conf/hbase-site.xml&lt;/code&gt; file. By default this is set to 1 hour. So every 60 minutes the log is closed and a new one started. Over time we accumulate a bunch of log files that way, and those need to be maintained as well. The &lt;code&gt;HLog.rollWriter()&lt;/code&gt; method, which is called by the LogRoller to do the above rolling of the current log file, takes care of that too by subsequently calling &lt;code&gt;HLog.cleanOldLogs()&lt;/code&gt;. It checks what the highest sequence number written to a storage file is, because up to that number all edits are persisted. It then checks if there are logs left whose edits are all below that number. If that is the case it deletes said logs and leaves just those that are still needed.&lt;br /&gt;
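&lt;br /&gt;
The cleanup rule boils down to a simple filter. Here is a toy model in plain Python; the file names and the dictionary layout are invented for illustration, and the real bookkeeping in HBase is more involved than a single number.&lt;br /&gt;
&lt;pre&gt;
```python
# Toy model of log cleanup: a log file can go once all of its edits are persisted.
# File names and the dict layout are made up for illustration.
def clean_old_logs(logs, persisted_up_to):
    # logs maps a log file name to the highest sequence number it contains.
    return {name: top for name, top in logs.items() if top > persisted_up_to}


logs = {"hlog.1": 100, "hlog.2": 250, "hlog.3": 400}
remaining = clean_old_logs(logs, persisted_up_to=300)
print(sorted(remaining))
```
&lt;/pre&gt;
Only the file that still holds edits above the persisted sequence number is kept; the two older files can be deleted.&lt;br /&gt;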
&lt;br /&gt;
&lt;div style="background-color: #eeeeee; border: 1px solid #000000; padding: 5px;"&gt;This is a good place to talk about the following obscure message you may see in your logs:&lt;br /&gt;
&lt;br /&gt;
&lt;code&gt;2009-12-15 01:45:48,427 INFO org.apache.hadoop.hbase.regionserver.HLog: Too&lt;br /&gt;
many hlogs: logs=130, maxlogs=96; forcing flush of region with oldest edits:&lt;br /&gt;
foobar,1b2dc5f3b5d4,1260083783909&lt;/code&gt;&lt;br /&gt;
&lt;br /&gt;
It is printed because the number of log files that have to be kept, since they still contain outstanding edits that have not yet been persisted, exceeds the configured maximum number of log files. The main reason I saw for this is stressing the file system so much that it cannot keep up persisting the data at the rate new data is added. Otherwise log flushes should take care of this. Note though that when this message is printed the server goes into a special mode trying to force flushing out edits to reduce the number of logs required to be kept.&lt;/div&gt;&lt;br /&gt;
The other parameters controlling the log rolling are &lt;code&gt;hbase.regionserver.hlog.blocksize&lt;/code&gt; and &lt;code&gt;hbase.regionserver.logroll.multiplier&lt;/code&gt;, which are set by default to rotate logs when they are at 95% of the blocksize of the SequenceFile, typically 64M. So the logs are switched out either when they are considered full or when a certain amount of time has passed, whichever comes first. &lt;br /&gt;
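&lt;br /&gt;
As a back-of-the-envelope check, the two roll conditions can be written down in a few lines of plain Python. The 64 MB block size, the 0.95 multiplier and the one hour period are the values quoted in this post; the variable and function names are made up.&lt;br /&gt;
&lt;pre&gt;
```python
# Back-of-the-envelope check of the roll conditions.
# 64 MB, 0.95 and one hour are the values quoted in the post; names are made up.
block_size = 64 * 1024 * 1024
multiplier = 0.95
roll_size = int(block_size * multiplier)


def should_roll(current_size, seconds_since_roll, period_seconds=3600):
    # Roll when the file is nearly full or the period has elapsed, whichever comes first.
    return current_size >= roll_size or seconds_since_roll >= period_seconds


print(roll_size, should_roll(10, 3600), should_roll(roll_size, 5), should_roll(10, 5))
```
&lt;/pre&gt;
With these defaults a log is rolled at a little under 61 MB, or after an hour for a server that sees few writes.&lt;br /&gt;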
&lt;br /&gt;
&lt;u&gt;Replay&lt;/u&gt;&lt;br /&gt;
&lt;br /&gt;
Once a HRegionServer starts and is opening the regions it hosts it checks if there are some left over log files and applies those all the way down in &lt;code&gt;Store.doReconstructionLog()&lt;/code&gt;. Replaying a log is simply done by reading the log and adding the contained edits to the current MemStore. At the end an explicit flush of the MemStore (note, this is not the flush of the log!) helps writing those changes out to disk.&lt;br /&gt;
&lt;br /&gt;
The old logs usually come from a previous region server crash. When the HMaster is started or detects that a region server has crashed, it splits the log files belonging to that server into separate files and stores those in the region directories on the file system they belong to. After that the above mechanism takes care of replaying the logs. One thing to note is that regions from a crashed server can only be redeployed once the logs have been split and copied. Splitting itself is done in &lt;code&gt;HLog.splitLog()&lt;/code&gt;. The old log is read into memory in the main thread (i.e. single threaded) and then written to all region directories using a pool of threads, one thread for each region. &lt;br /&gt;
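&lt;br /&gt;
Conceptually the split is a group-by on the region. The following toy model in plain Python shows the idea; the region names and the tuple layout are invented, and the thread pool that writes the per-region files is left out.&lt;br /&gt;
&lt;pre&gt;
```python
from collections import defaultdict

# Toy model of log splitting: one interleaved log becomes one edit list per region.
# Region names and the tuple layout are made up for illustration.
def split_log(entries):
    per_region = defaultdict(list)
    for region, seq_num, edit in entries:
        per_region[region].append((seq_num, edit))   # log order is kept within a region
    return dict(per_region)


shared_log = [
    ("region-a", 1, "put row-1"),
    ("region-b", 2, "put row-9"),
    ("region-a", 3, "delete row-1"),
]
split = split_log(shared_log)
print(split["region-a"])
```
&lt;/pre&gt;
Each region ends up with only its own edits, still in sequence order, which is what the replay step needs.&lt;br /&gt;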
&lt;br /&gt;
&lt;u&gt;Issues&lt;/u&gt;&lt;br /&gt;
&lt;br /&gt;
As mentioned above all edits are written to one HLog per HRegionServer. You may ask why that is the case. Why not write all edits for a specific region into its own log file? Let's quote the &lt;a href="http://labs.google.com/papers/bigtable.html"&gt;BigTable&lt;/a&gt;&amp;nbsp;paper once more:&lt;br /&gt;
&lt;br /&gt;
&lt;blockquote style="border: 1px solid #000000; margin: 1em 20px; padding: 5px;"&gt;If we kept the commit log for each tablet in a separate log file, a very large number of files would be written concurrently in GFS. Depending on the underlying file system implementation on each GFS server, these writes could cause a large number of disk seeks to write to the different physical log files.&lt;/blockquote&gt;&lt;br /&gt;
HBase followed that principle for pretty much the same reasons. As explained above you end up with many files since logs are rolled and kept until they are safe to be deleted. If you do this for every region separately this would not scale well - or at least be an itch that sooner or later is causing pain.&lt;br /&gt;
&lt;br /&gt;
So far that seems to be no issue. But again, it causes problems when things go wrong. As long as you have applied all edits in time and persisted the data safely, all is well. But if you have to split the log because of a server crash then you need to divide it into suitable pieces, as described above in the "Replay" section. But as you have seen above as well, all edits are intermingled in the log and there is no index of what is stored at all. For that reason the HMaster cannot redeploy any region from a crashed server until it has split the logs for that very server. And that can be quite a number of logs if the server was behind applying the edits. &lt;br /&gt;
&lt;br /&gt;
Another problem is data safety. You want to be able to rely on the system to save all your data, no matter what newfangled algorithms are employed behind the scenes. As far as HBase and the log are concerned you can turn down the log flush times to as low as you want - you are still dependent on the underlying file system as mentioned above; the stream used to store the data is flushed, but is it written to disk yet? We are talking about &lt;a href="http://en.wikipedia.org/wiki/Sync_(Unix)"&gt;fsync&lt;/a&gt; style issues. Now for HBase we are most likely talking about Hadoop's HDFS as the file system that is persisted to.&lt;br /&gt;
&lt;br /&gt;
Up to this point it should be abundantly clear that the log is what keeps data safe. For that reason a log could be kept open for up to an hour (or more if configured so). As data arrives a new key/value pair is written to the SequenceFile and occasionally flushed to disk. But that is not how Hadoop was set out to work. It was meant to provide an API that allows you to open a file, write data into it (preferably a lot) and close it right away, leaving an immutable file for everyone else to read many times. Only after a file is closed is it visible and readable to others. If a process dies while writing the data the file is pretty much considered lost. What is required is a feature that allows reading the log up to the point where the crashed server has written it (or as close as possible). &lt;br /&gt;
&lt;br /&gt;
&lt;div style="background-color: #eeeeee; border: 1px solid #000000; padding: 5px;"&gt;&lt;b&gt;Interlude:&lt;/b&gt; &lt;u&gt;HDFS append, hflush, hsync, sync... wth?&lt;/u&gt;&lt;br /&gt;
&lt;br /&gt;
It all started with &lt;a href="http://issues.apache.org/jira/browse/HADOOP-1700"&gt;HADOOP-1700&lt;/a&gt; reported by HBase lead Michael Stack. It was committed in Hadoop 0.19.0 and meant to solve the problem. But that was not the case. So the issue was tackled again in HADOOP-4379 aka &lt;a href="http://issues.apache.org/jira/browse/HDFS-200"&gt;HDFS-200&lt;/a&gt; and implemented &lt;code&gt;syncFs()&lt;/code&gt; that was meant to help syncing changes to a file to be more reliable. For a while we had custom code (see &lt;a href="http://issues.apache.org/jira/browse/HBASE-1470"&gt;HBASE-1470&lt;/a&gt;) that detected a patched Hadoop that exposed that API. But again this did not solve the issue entirely. &lt;br /&gt;
&lt;br /&gt;
Then came &lt;a href="http://issues.apache.org/jira/browse/HDFS-265"&gt;HDFS-265&lt;/a&gt;, which revisits the append idea in general. It also introduces a &lt;code&gt;Syncable&lt;/code&gt; interface that exposes &lt;code&gt;hsync()&lt;/code&gt; and &lt;code&gt;hflush()&lt;/code&gt;. &lt;br /&gt;
&lt;br /&gt;
Lastly &lt;code&gt;SequenceFile.Writer.sync()&lt;/code&gt; is &lt;u&gt;not&lt;/u&gt; the same as the above, it simply writes a synchronization marker into the file that helps reading it later - or recover data if broken.&lt;/div&gt;&lt;br /&gt;
While append for HDFS in general is useful it is not used in HBase, but &lt;code&gt;hflush()&lt;/code&gt; is. What it does is write out everything to disk as the log is written. In case of a server crash we can safely read that "dirty" file up to the last edits. The append in Hadoop 0.19.0 was so badly suited that a &lt;code&gt;hadoop fsck /&lt;/code&gt; would report the DFS being corrupt because of the open log files HBase kept.&lt;br /&gt;
&lt;br /&gt;
Bottom line is, without Hadoop 0.21.0 you can very well face data loss. With Hadoop 0.21.0 you have a state-of-the-art system.&lt;br /&gt;
&lt;br /&gt;
&lt;u&gt;Planned Improvements&lt;/u&gt;&lt;br /&gt;
&lt;br /&gt;
For HBase 0.21.0 there are quite a few things lined up that affect the WAL architecture. Here are some of the noteworthy ones.&lt;br /&gt;
&lt;br /&gt;
&lt;u&gt;SequenceFile Replacement&lt;/u&gt;&lt;br /&gt;
&lt;br /&gt;
One of the central building blocks around the WAL is the actual storage file format. The SequenceFile currently in use has quite a few shortcomings that need to be addressed. One example is its suboptimal performance, since all writing in SequenceFile is synchronized, as documented in &lt;a href="http://issues.apache.org/jira/browse/HBASE-2105"&gt;HBASE-2105&lt;/a&gt;.&lt;br /&gt;
&lt;br /&gt;
As with HFile replacing MapFile in HBase 0.20.0, it makes sense to think about a complete replacement. A first step was to make the HBase classes independent of the underlying file format. &lt;a href="http://issues.apache.org/jira/browse/HBASE-2059"&gt;HBASE-2059&lt;/a&gt; made the class implementing the log configurable.&lt;br /&gt;
&lt;br /&gt;
Another idea is to change to a different serialization altogether. &lt;a href="http://issues.apache.org/jira/browse/HBASE-2055"&gt;HBASE-2055&lt;/a&gt; proposes such a format using Hadoop's &lt;a href="http://hadoop.apache.org/avro/"&gt;Avro&lt;/a&gt; as the low level system. Avro is also slated to be the new RPC format for Hadoop, which does help as more people are familiar with it.&lt;br /&gt;
&lt;br /&gt;
&lt;u&gt;Append/Sync&lt;/u&gt;&lt;br /&gt;
&lt;br /&gt;
Even with &lt;code&gt;hflush()&lt;/code&gt; we have the problem that calling it too often may cause the system to slow down. Previous tests using the older &lt;code&gt;syncFs()&lt;/code&gt; call showed that calling it for every record slows down the system considerably. One step to help is to implement a "Group Commit", done in &lt;a href="http://issues.apache.org/jira/browse/HBASE-1939"&gt;HBASE-1939&lt;/a&gt;. It flushes out records in batches. In addition, &lt;a href="http://issues.apache.org/jira/browse/HBASE-1944"&gt;HBASE-1944&lt;/a&gt;&amp;nbsp;adds the notion of a "deferred log flush" as a parameter of a Column Family. If set to &lt;code&gt;true&lt;/code&gt;, it leaves the syncing of changes to the log to the newly added LogSyncer class and thread. Finally, &lt;a href="http://issues.apache.org/jira/browse/HBASE-2041"&gt;HBASE-2041&lt;/a&gt; sets &lt;code&gt;flushlogentries&lt;/code&gt; to 1 and &lt;code&gt;optionallogflushinterval&lt;/code&gt; to 1000 msecs. The &lt;code&gt;.META.&lt;/code&gt; table is always synced on every change; user tables can be configured as needed. &lt;br /&gt;
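&lt;br /&gt;
To make the interplay of batch size and flush interval concrete, here is a minimal sketch of a group commit. It is written in Python rather than HBase's Java and is not HBase code: the &lt;code&gt;GroupCommitLog&lt;/code&gt; class, its &lt;code&gt;tick()&lt;/code&gt; method and the &lt;code&gt;deferred&lt;/code&gt; flag are made-up names for illustration, and "deferred" is modeled per edit instead of per Column Family to keep it short. The two constructor arguments only mimic the roles of &lt;code&gt;flushlogentries&lt;/code&gt; and &lt;code&gt;optionallogflushinterval&lt;/code&gt;.&lt;br /&gt;
&lt;pre&gt;
```python
from operator import ge  # ge(a, b) means "a is greater than or equal to b"


class GroupCommitLog:
    """Toy write-ahead log that syncs edits in batches (all names hypothetical)."""

    def __init__(self, flush_entries, flush_interval_ms):
        self.flush_entries = flush_entries          # sync once this many edits are pending
        self.flush_interval_ms = flush_interval_ms  # or once this much time has passed
        self.pending = []      # appended but not yet synced
        self.synced = []       # edits considered durable
        self.sync_calls = 0    # how often the expensive sync ran
        self.last_sync_ms = 0

    def append(self, edit, now_ms, deferred=False):
        self.pending.append(edit)
        # Deferred edits wait for the timer; all others sync once the batch is full.
        if not deferred and ge(len(self.pending), self.flush_entries):
            self.sync(now_ms)

    def tick(self, now_ms):
        """Stand-in for a background syncer thread: sync when the interval has elapsed."""
        if self.pending and ge(now_ms - self.last_sync_ms, self.flush_interval_ms):
            self.sync(now_ms)

    def sync(self, now_ms):
        # One sync call makes the whole batch durable at once.
        self.synced.extend(self.pending)
        self.pending.clear()
        self.sync_calls += 1
        self.last_sync_ms = now_ms


log = GroupCommitLog(flush_entries=3, flush_interval_ms=1000)
for i in range(6):
    log.append("edit-%d" % i, now_ms=i)

log.append("late-edit", now_ms=10, deferred=True)
log.tick(now_ms=500)     # interval not yet elapsed, nothing happens
log.tick(now_ms=1200)    # interval elapsed, the deferred edit is synced
```
&lt;/pre&gt;
With a batch size of 3 the six regular edits cost two syncs instead of six, and the deferred edit only becomes durable once the timer fires. Setting the batch size to 1, as HBASE-2041 does, makes the sketch sync every non-deferred edit right away.&lt;br /&gt;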
&lt;br /&gt;
&lt;u&gt;Distributed Log Splitting&lt;/u&gt;&lt;br /&gt;
&lt;br /&gt;
As remarked, splitting the log is an issue when regions need to be redeployed. One idea is to keep a list of regions with edits in ZooKeeper. That way at least all "clean" regions can be deployed instantly. Only those with edits then need to wait until the logs are split. &lt;br /&gt;
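&lt;br /&gt;
As a rough illustration of that idea, the following Python sketch separates the regions of a failed server into those that could be deployed at once and those that have to wait for the log split. It makes no ZooKeeper calls; the function name, the region names and the plain list standing in for the list kept in ZooKeeper are all assumptions for the example.&lt;br /&gt;
&lt;pre&gt;
```python
def plan_redeployment(regions_on_failed_server, regions_with_edits):
    """Split regions into those deployable at once and those that must wait."""
    dirty = set(regions_with_edits)
    deploy_now = [r for r in regions_on_failed_server if r not in dirty]
    wait_for_split = [r for r in regions_on_failed_server if r in dirty]
    return deploy_now, wait_for_split


regions = ["region-a", "region-b", "region-c", "region-d"]
# Hypothetical snapshot of the "regions with edits" list.
with_edits = ["region-b", "region-d"]
deploy_now, wait_for_split = plan_redeployment(regions, with_edits)
```
&lt;/pre&gt;
Here &lt;code&gt;region-a&lt;/code&gt; and &lt;code&gt;region-c&lt;/code&gt; have no pending edits and need not wait for any log to be split.&lt;br /&gt;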
&lt;br /&gt;
What is left is to improve how the logs are split to make the process faster. Here is how BigTable addresses the issue:&lt;br /&gt;
&lt;blockquote style="border: 1px solid #000000; margin: 1em 20px; padding: 5px;"&gt;One approach would be for each new tablet server to read this full commit log file and apply just the entries needed for the tablets it needs to recover. However, under such a scheme, if 100 machines were each assigned a single tablet from a failed tablet server, then the log file would be read 100 times (once by each server).&lt;/blockquote&gt;and further &lt;br /&gt;
&lt;blockquote style="border: 1px solid #000000; margin: 1em 20px; padding: 5px;"&gt;We avoid duplicating log reads by first sorting the commit log entries in order of the keys (table, row name, log sequence number). In the sorted output, all mutations for a particular tablet are contiguous and can therefore be read efficiently with one disk seek followed by a sequential read. To parallelize the sorting, we partition the log file into 64 MB segments, and sort each segment in parallel on different tablet servers. This sorting process is coordinated by the master and is initiated when a tablet server indicates that it needs to recover mutations from some commit log file.&lt;/blockquote&gt;This is where it's at. As part of the HMaster rewrite (see &lt;a href="http://issues.apache.org/jira/browse/HBASE-1816"&gt;HBASE-1816&lt;/a&gt;) the log splitting will be addressed as well. &lt;a href="http://issues.apache.org/jira/browse/HBASE-1364"&gt;HBASE-1364&lt;/a&gt; wraps the splitting of logs into one issue. But I am sure that will evolve into more subtasks as the details get discussed.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/860423771829255614-6702483680144268895?l=www.larsgeorge.com' alt='' /&gt;&lt;/div&gt;
&lt;p&gt;&lt;a href="http://feedads.g.doubleclick.net/~a/MeJZikzxUavt8JybqRQL25-GBYA/0/da"&gt;&lt;img src="http://feedads.g.doubleclick.net/~a/MeJZikzxUavt8JybqRQL25-GBYA/0/di" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;br/&gt;
&lt;a href="http://feedads.g.doubleclick.net/~a/MeJZikzxUavt8JybqRQL25-GBYA/1/da"&gt;&lt;img src="http://feedads.g.doubleclick.net/~a/MeJZikzxUavt8JybqRQL25-GBYA/1/di" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/Lineland/~4/MUdpK092KeM" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://www.larsgeorge.com/feeds/6702483680144268895/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.larsgeorge.com/2010/01/hbase-architecture-101-write-ahead-log.html#comment-form" title="1 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/860423771829255614/posts/default/6702483680144268895?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/860423771829255614/posts/default/6702483680144268895?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/Lineland/~3/MUdpK092KeM/hbase-architecture-101-write-ahead-log.html" title="HBase Architecture 101 - Write-ahead-Log" /><author><name>Lars George</name><uri>http://www.blogger.com/profile/18168538475015227467</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="07492346046977538662" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://1.bp.blogspot.com/_Cib_A77V54U/S2M98DazIVI/AAAAAAAAAFw/cmp0W38kWGY/s72-c/wal-flow.png" height="72" width="72" /><thr:total>1</thr:total><feedburner:origLink>http://www.larsgeorge.com/2010/01/hbase-architecture-101-write-ahead-log.html</feedburner:origLink></entry><entry gd:etag="W/&quot;CUYCSH87fyp7ImA9WxBXFkU.&quot;"><id>tag:blogger.com,1999:blog-860423771829255614.post-5322959831908772873</id><published>2010-01-28T04:47:00.000-08:00</published><updated>2010-01-28T04:52:49.107-08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2010-01-28T04:52:49.107-08:00</app:edited><category 
scheme="http://www.blogger.com/atom/ns#" term="nosql" /><category scheme="http://www.blogger.com/atom/ns#" term="openhug" /><category scheme="http://www.blogger.com/atom/ns#" term="hadoop" /><title>2nd Munich OpenHUG Meeting</title><content type="html">At the end of last year we had a first meeting in Munich to allow everyone interested in all things Hadoop and related technologies to gather, get to know each other and exchange their knowledge, experiences and information. We would like to invite you to the second Munich OpenHUG Meeting and hope to see you all again and meet even more new enthusiasts there. We would also be thrilled if those attending would be willing to share their story so that we can learn about other projects and how people are using the exciting NoSQL related technologies. No pressure though: come along and simply listen if you prefer, we welcome anyone and everybody!&lt;br /&gt;
&lt;br /&gt;
When: Thursday February 25, 2010 at 5:30pm open end &lt;br /&gt;
Where: eCircle AG, Nymphenburger Straße 86, 80636 München ["Bruckmann" Building, "U1 Mailinger Str", &lt;a href="http://www.ecircle.com/de/kontakt/anfahrt.html"&gt;map&lt;/a&gt; (in German) and look for the signs]&lt;br /&gt;
&lt;br /&gt;
Thanks again to Bob Schulze from eCircle for providing the location and projector. So far we have a talk scheduled by Christoph Rupp about &lt;a href="http://hamsterdb.com/"&gt;HamsterDB&lt;/a&gt;. We are still looking for volunteers who would like to present on any related topic (please &lt;a href="mailto:info@larsgeorge.com"&gt;contact me&lt;/a&gt;)! Otherwise we will have an open discussion about whatever is brought up by the attendees.&lt;br /&gt;
&lt;br /&gt;
Last but not least there will be something to drink and we will get pizzas in. Since we do not know how many of you will come, we will simply stay at the event's location and continue our chats over food. &lt;br /&gt;
&lt;br /&gt;
Looking forward to seeing you there!&lt;br /&gt;
&lt;br /&gt;
Please RSVP at &lt;a href="http://upcoming.yahoo.com/event/5279322/BY/Mnchen/2nd-Munich-OpenHUG-Meeting/eCircle-AG"&gt;Upcoming&lt;/a&gt; or &lt;a href="https://www.xing.com/events/2nd-munich-openhug-meeting-458852"&gt;Xing&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/860423771829255614-5322959831908772873?l=www.larsgeorge.com' alt='' /&gt;&lt;/div&gt;
&lt;p&gt;&lt;a href="http://feedads.g.doubleclick.net/~a/fjTo8Vxm-z8SucWB0_urMVkm0ZY/0/da"&gt;&lt;img src="http://feedads.g.doubleclick.net/~a/fjTo8Vxm-z8SucWB0_urMVkm0ZY/0/di" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;br/&gt;
&lt;a href="http://feedads.g.doubleclick.net/~a/fjTo8Vxm-z8SucWB0_urMVkm0ZY/1/da"&gt;&lt;img src="http://feedads.g.doubleclick.net/~a/fjTo8Vxm-z8SucWB0_urMVkm0ZY/1/di" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/Lineland/~4/3G7L-Uwlspc" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://www.larsgeorge.com/feeds/5322959831908772873/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.larsgeorge.com/2010/01/2nd-munich-openhug-meeting-at-end-of.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/860423771829255614/posts/default/5322959831908772873?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/860423771829255614/posts/default/5322959831908772873?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/Lineland/~3/3G7L-Uwlspc/2nd-munich-openhug-meeting-at-end-of.html" title="2nd Munich OpenHUG Meeting" /><author><name>Lars George</name><uri>http://www.blogger.com/profile/18168538475015227467</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="07492346046977538662" /></author><thr:total>0</thr:total><feedburner:origLink>http://www.larsgeorge.com/2010/01/2nd-munich-openhug-meeting-at-end-of.html</feedburner:origLink></entry><entry gd:etag="W/&quot;CUQMQX45fCp7ImA9WxBQEUg.&quot;"><id>tag:blogger.com,1999:blog-860423771829255614.post-5176142705160110735</id><published>2010-01-10T11:52:00.000-08:00</published><updated>2010-01-10T11:56:20.024-08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2010-01-10T11:56:20.024-08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="openhug" /><category scheme="http://www.blogger.com/atom/ns#" term="hadoop" /><title>First Munich OpenHUG Meeting - Summary</title><content type="html">On 
December 17th we had the first Munich OpenHUG meeting. The location was kindly provided by eCircle's Bob Schulze. This was the first in a series of meetings in Munich on everything Hadoop and related technologies. Let's use the term NoSQL with care, as this is not about blame or finger pointing (I feel that it was used like that in the past, which is why I am making this point explicitly). We are trying to get together the brightest local and remote talent to report and present on new age and evolved existing technologies. &lt;br /&gt;
&lt;br /&gt;
The first talk of the night was given by Bob himself, presenting his findings from evaluating HBase and Hadoop for their internal use. He went into detail explaining how HBase structures its data and how it can be used for their needs. One thing that I noted in particular was his &lt;a href="http://sourceforge.net/projects/hbaseexplorer/"&gt;HBase Explorer&lt;/a&gt;, which he subsequently published on SourceForge as an Open Source project. The talk was concluded by an open discussion about HBase.&lt;br /&gt;
&lt;br /&gt;
The second part of the meeting was my own &lt;a href="http://www.docstoc.com/docs/21748013/HBase-at-WorldLingo---Munich-OpenHUG"&gt;presentation&lt;/a&gt; about how we at WorldLingo use HBase and Hadoop (as well as Lucene etc.)&lt;br /&gt;
&lt;br /&gt;
We continued our discussion on HBase with the developers of eCircle present. This was very interesting and fruitful and we had the chance to exchange experiences made along our similar paths. &lt;br /&gt;
&lt;br /&gt;
I would have wished for the overall attendance to be a little higher, but it was a great start. Talking to other hosts of similar events it seems that this is normal and therefore my hopes are up for the next meetings throughout this year. We have planned the next meeting for &lt;b&gt;February 25th, 2010&lt;/b&gt; at the same location. If you have interest in presenting a talk on any related topic, please &lt;a href="mailto:info@larsgeorge.com"&gt;contact&lt;/a&gt; me!&lt;br /&gt;
&lt;br /&gt;
I am looking forward to meeting you all there!&lt;br /&gt;
&lt;br /&gt;
Lars&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/860423771829255614-5176142705160110735?l=www.larsgeorge.com' alt='' /&gt;&lt;/div&gt;
&lt;p&gt;&lt;a href="http://feedads.g.doubleclick.net/~a/PFHjzgSU1ojyLpPPgFyT1oCAkrM/0/da"&gt;&lt;img src="http://feedads.g.doubleclick.net/~a/PFHjzgSU1ojyLpPPgFyT1oCAkrM/0/di" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;br/&gt;
&lt;a href="http://feedads.g.doubleclick.net/~a/PFHjzgSU1ojyLpPPgFyT1oCAkrM/1/da"&gt;&lt;img src="http://feedads.g.doubleclick.net/~a/PFHjzgSU1ojyLpPPgFyT1oCAkrM/1/di" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/Lineland/~4/_muycWcMem8" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://www.larsgeorge.com/feeds/5176142705160110735/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.larsgeorge.com/2010/01/first-munich-openhug-meeting-summary.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/860423771829255614/posts/default/5176142705160110735?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/860423771829255614/posts/default/5176142705160110735?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/Lineland/~3/_muycWcMem8/first-munich-openhug-meeting-summary.html" title="First Munich OpenHUG Meeting - Summary" /><author><name>Lars George</name><uri>http://www.blogger.com/profile/18168538475015227467</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="07492346046977538662" /></author><thr:total>0</thr:total><feedburner:origLink>http://www.larsgeorge.com/2010/01/first-munich-openhug-meeting-summary.html</feedburner:origLink></entry><entry gd:etag="W/&quot;CEENR3k4eSp7ImA9WxBTF0U.&quot;"><id>tag:blogger.com,1999:blog-860423771829255614.post-3007165915101675372</id><published>2009-12-14T01:22:00.000-08:00</published><updated>2009-12-14T01:24:56.731-08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-12-14T01:24:56.731-08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="nosql" /><category scheme="http://www.blogger.com/atom/ns#" term="hbase" /><category scheme="http://www.blogger.com/atom/ns#" term="hadoop" 
/><title>First Munich OpenHUG Meeting</title><content type="html">First Munich OpenHUG Meeting&lt;br /&gt;
&lt;br /&gt;
We are trying to gauge the interest in a south Germany Hadoop User Group Meeting. After seeing quite a big interest in the Berlin meetings a few of us got together and decided to test the waters for another meeting at the other end of the country. We are therefore happy to announce the first Munich OpenHUG Meeting.&lt;br /&gt;
&lt;br /&gt;
When: Thursday December 17, 2009 at 5:30pm open end&lt;br /&gt;
Where: eCircle AG, Nymphenburger Straße 86, 80636 München ("Bruckmann" Building, "U1 Mailinger Str", &lt;a href="http://www.ecircle.com/de/kontakt/anfahrt.html"&gt;map&lt;/a&gt; in German and look for the signs)&lt;br /&gt;
&lt;br /&gt;
Thanks to Bob Schulze from eCircle for providing the location and projector, and also for giving a first presentation on how eCircle is planning to use the Hadoop stack.&lt;br /&gt;
&lt;br /&gt;
We also have Dave Butlerdi giving an overview of his usage of Hadoop. &lt;br /&gt;
Finally I will give a state of affairs of the HBase project: what it is, what it does and how I have been using it (since early 2008).&lt;br /&gt;
&lt;br /&gt;
We are also open for everyone who wants to talk about anything related to these new technologies often combined under the rather new term "NoSQL". Take the opportunity to talk about what you are working on and find like minded people to bounce ideas off. This is also why we chose the title OpenHUG for the meeting. While we mostly work with Hadoop and its subprojects we also like to learn about related projects and technologies.&lt;br /&gt;
&lt;br /&gt;
Last but not least there will be something to drink and we will get pizzas in. Since we do not know how many of you will come on such short notice, we will simply stay at Bob's place and continue our chats over food.&lt;br /&gt;
&lt;br /&gt;
As this is the first meeting in Munich on this topic, we scheduled it for the day after the Berlin meeting. Given there is interest, we will in the future settle on dates that fit nicely between the Berlin dates so that we have no overlap and you can attend both meetings.&lt;br /&gt;
&lt;br /&gt;
Please RSVP at &lt;a href="http://upcoming.yahoo.com/event/4897497/BY/Mnchen/First-Munich-OpenHUG/eCircle-AG"&gt;Yahoo's Upcoming&lt;/a&gt; or &lt;a href="http://www.xing.com/events/munich-openhug-437166"&gt;Xing&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/860423771829255614-3007165915101675372?l=www.larsgeorge.com' alt='' /&gt;&lt;/div&gt;
&lt;p&gt;&lt;a href="http://feedads.g.doubleclick.net/~a/YE2ODPjAr2DT9DuLX4bxbaNCJL4/0/da"&gt;&lt;img src="http://feedads.g.doubleclick.net/~a/YE2ODPjAr2DT9DuLX4bxbaNCJL4/0/di" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;br/&gt;
&lt;a href="http://feedads.g.doubleclick.net/~a/YE2ODPjAr2DT9DuLX4bxbaNCJL4/1/da"&gt;&lt;img src="http://feedads.g.doubleclick.net/~a/YE2ODPjAr2DT9DuLX4bxbaNCJL4/1/di" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/Lineland/~4/D1qV5J1kd5U" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://www.larsgeorge.com/feeds/3007165915101675372/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.larsgeorge.com/2009/12/first-munich-openhug-meeting.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/860423771829255614/posts/default/3007165915101675372?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/860423771829255614/posts/default/3007165915101675372?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/Lineland/~3/D1qV5J1kd5U/first-munich-openhug-meeting.html" title="First Munich OpenHUG Meeting" /><author><name>Lars George</name><uri>http://www.blogger.com/profile/18168538475015227467</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="07492346046977538662" /></author><thr:total>0</thr:total><feedburner:origLink>http://www.larsgeorge.com/2009/12/first-munich-openhug-meeting.html</feedburner:origLink></entry><entry gd:etag="W/&quot;DEcMSHw7fCp7ImA9WxNaFkU.&quot;"><id>tag:blogger.com,1999:blog-860423771829255614.post-6396866955477406261</id><published>2009-11-24T06:08:00.000-08:00</published><updated>2009-12-01T08:48:09.204-08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-12-01T08:48:09.204-08:00</app:edited><title>HBase vs. BigTable Comparison</title><content type="html">HBase is an open-source implementation of the Google &lt;a href="http://labs.google.com/papers/bigtable.html"&gt;BigTable&lt;/a&gt; architecture. 
That part is fairly easy to understand and grasp. What I personally feel is a bit more difficult is to understand how much HBase covers and where there are differences (still) compared to the BigTable specification. This post is an attempt to compare the two systems.&lt;br /&gt;
&lt;br /&gt;
Before we embark onto the &lt;strike&gt;dark&lt;/strike&gt; technology side of things I would like to point out one thing upfront: HBase is very close to what the BigTable paper describes. Putting aside minor differences, as of &lt;a href="http://hadoop.apache.org/hbase/releases.html"&gt;HBase 0.20&lt;/a&gt;, which is using &lt;a href="http://hadoop.apache.org/zookeeper/"&gt;ZooKeeper&lt;/a&gt; as its &lt;strike&gt;lock&lt;/strike&gt; distributed coordination service, it has all the means to be nearly an exact implementation of BigTable's functionality. What I will be looking into below are mainly subtle variations or differences. Where possible I will try to point out how the HBase team is working on improving the situation given there is a need to do so. &lt;br /&gt;
&lt;br /&gt;
&lt;h4&gt;Scope&lt;/h4&gt;The comparison in this post is based on the OSDI'06 paper that describes the system Google implemented in about seven person-years and which has been in operation since 2005. The paper was published in 2006, while the HBase sub-project of &lt;a href="http://hadoop.apache.org/"&gt;Hadoop&lt;/a&gt; was established only around the end of that same year to early 2007. Back then the current version of Hadoop was 0.15.0. Given we are now about 2 years in, with Hadoop 0.20.1 and HBase 0.20.2 available, you can hopefully understand that indeed much has happened since. Please also note that I am comparing a 14 page high level technical paper with an open-source project that can be examined freely from top to bottom. It usually means that there is more to tell about how HBase does things because the information is available. &lt;br /&gt;
&lt;br /&gt;
Towards the end I will also address a few newer features that BigTable has nowadays and how HBase is comparing to those. We start though with naming conventions.&lt;br /&gt;
&lt;br /&gt;
&lt;h4&gt;Terminology&lt;/h4&gt;There are a few different terms used in either system describing the same thing. The most prominent is what HBase calls a "region", which Google refers to as a "tablet". These are the partitions of contiguous rows spread across many "region servers", or "tablet servers" respectively. Apart from that, most differences are minor or caused by the usage of related technologies, since Google's code is obviously closed-source and therefore only mirrored by open-source projects. The open-source projects are free to use other terms and, most importantly, other names for the projects themselves.&lt;br /&gt;
&lt;br /&gt;
&lt;h4&gt;Features&lt;/h4&gt;The following table lists various "features" of BigTable and compares them with what HBase has to offer. Some are actual implementation details, some are configurable options and so on. This may be confusing, but it would be difficult to sort them into categories without ending up with only one entry in each of them. &lt;br /&gt;
 &lt;br /&gt;
&lt;table border="1" bordercolor="#000000" cellpadding="8" cellspacing="0" style="page-break-before: always;" width="95%"&gt;&lt;tbody&gt;
&lt;tr valign="top"&gt;
&lt;td bgcolor="#000000" nowrap&gt;&lt;span style="color: white;"&gt;&lt;b&gt;Feature&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td bgcolor="#000000" nowrap&gt;&lt;div align="center"&gt;&lt;span style="color: white;"&gt;&lt;b&gt;&amp;nbsp;Google BigTable&amp;nbsp;&lt;/b&gt;&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;
&lt;td bgcolor="#000000" nowrap&gt;&lt;div align="center"&gt;&lt;span style="color: white;"&gt;&lt;b&gt;&amp;nbsp;Apache HBase&amp;nbsp;&lt;/b&gt;&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;
&lt;td bgcolor="#000000"&gt;&lt;span style="color: white;"&gt;&lt;b&gt;Notes&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr valign="top"&gt;       
&lt;td style="border: solid black 1.0pt;"&gt;Atomic Read/Write/Modify&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;&lt;div align="center"&gt;Yes, per row&lt;/div&gt;&lt;/td&gt;
&lt;td style="border: solid black 1.0pt; background-color: #00ff00"&gt;&lt;div align="center"&gt;Yes, per row&lt;/div&gt;&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;Since BigTable does not strive to be a relational database it does not have transactions. The closest to such a mechanism is the atomic access to each row in the table. HBase also implements a row lock API which allows the user to lock more than one row at a time.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr valign="top"&gt;       
&lt;td style="border: solid black 1.0pt;"&gt;Lexicographic Row Order&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;&lt;div align="center"&gt;Yes&lt;/div&gt;&lt;/td&gt;
&lt;td style="border: solid black 1.0pt; background-color: #00ff00"&gt;&lt;div align="center"&gt;Yes&lt;/div&gt;&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;All rows are sorted lexicographically in one order and that one order only. Again, this is no SQL database where you can have different sorting orders.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr valign="top"&gt;       
&lt;td style="border: solid black 1.0pt;"&gt;Block Support&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;&lt;div align="center"&gt;Yes&lt;/div&gt;&lt;/td&gt;
&lt;td style="border: solid black 1.0pt; background-color: #00ff00"&gt;&lt;div align="center"&gt;Yes&lt;/div&gt;&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;Within each storage file data is written as smaller blocks of data. This enables faster loading of data from large storage files. The size is configurable in either system. The typical size is 64K.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr valign="top"&gt;       
&lt;td style="border: solid black 1.0pt;"&gt;Block Compression&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;&lt;div align="center"&gt;Yes, per column family&lt;/div&gt;&lt;/td&gt;
&lt;td style="border: solid black 1.0pt; background-color: #00ff00"&gt;&lt;div align="center"&gt;Yes, per column family&lt;/div&gt;&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;Google uses BMDiff and Zippy in a two step process. BMDiff works really well because neighboring key-value pairs in the store files are often very similar. This can be achieved by using versioning so that all modifications to a value are stored next to each other but still have a lot in common. Or by designing the row keys in such a way that for example web pages from the same site are all bundled. Zippy then is a modified LZW algorithm. HBase on the other hand uses the standard Java supplied GZip or with a little &lt;a href="http://wiki.apache.org/hadoop/UsingLzoCompression"&gt;effort&lt;/a&gt; the GPL licensed LZO format. There are indications though that Hadoop also may want to have BMDiff (&lt;a href="http://issues.apache.org/jira/browse/HADOOP-5793"&gt;HADOOP-5793&lt;/a&gt;) and possibly Zippy as well.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr valign="top"&gt;       
&lt;td style="border: solid black 1.0pt;"&gt;Number of Column Families&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;&lt;div align="center"&gt;Hundreds at Most&lt;/div&gt;&lt;/td&gt;
&lt;td style="border: solid black 1.0pt; background-color: #00ff00"&gt;&lt;div align="center"&gt;Less than 100&lt;/div&gt;&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;While the number of rows and columns is theoretically unbounded, the number of column families is not. This is a design trade-off but does not impose too many restrictions if the tables and keys are designed accordingly.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr valign="top"&gt;       
&lt;td style="border: solid black 1.0pt;"&gt;Column Family Name Format&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;&lt;div align="center"&gt;Printable&lt;/div&gt;&lt;/td&gt;
&lt;td style="border: solid black 1.0pt; background-color: #00ff00"&gt;&lt;div align="center"&gt;Printable&lt;/div&gt;&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;The main reason for HBase here is that column family names are used as directories in the file system.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr valign="top"&gt;       
&lt;td style="border: solid black 1.0pt;"&gt;Qualifier Format&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;&lt;div align="center"&gt;Arbitrary&lt;/div&gt;&lt;/td&gt;
&lt;td style="border: solid black 1.0pt; background-color: #00ff00"&gt;&lt;div align="center"&gt;Arbitrary&lt;/div&gt;&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;Any arbitrary byte[] array can be used.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr valign="top"&gt;       
&lt;td style="border: solid black 1.0pt;"&gt;Key/Value Format&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;&lt;div align="center"&gt;Arbitrary&lt;/div&gt;&lt;/td&gt;
&lt;td style="border: solid black 1.0pt; background-color: #00ff00"&gt;&lt;div align="center"&gt;Arbitrary&lt;/div&gt;&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;Like above, any arbitrary byte[] array can be used.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr valign="top"&gt;       
&lt;td style="border: solid black 1.0pt;"&gt;Access Control&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;&lt;div align="center"&gt;Yes&lt;/div&gt;&lt;/td&gt;
&lt;td style="border: solid black 1.0pt; background-color: #ffff00"&gt;&lt;div align="center"&gt;No&lt;/div&gt;&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;BigTable enforces access control on a column family level. HBase does not yet have that feature (see &lt;a href="http://issues.apache.org/jira/browse/hbase-1697"&gt;HBASE-1697&lt;/a&gt;).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr valign="top"&gt;       
&lt;td style="border: solid black 1.0pt;"&gt;Cell Versions&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;&lt;div align="center"&gt;Yes&lt;/div&gt;&lt;/td&gt;
&lt;td style="border: solid black 1.0pt; background-color: #00ff00"&gt;&lt;div align="center"&gt;Yes&lt;/div&gt;&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;Versioning is done using timestamps. See the next feature below too. The number of versions that should be kept is freely configurable on a column family level.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr valign="top"&gt;       
&lt;td style="border: solid black 1.0pt;"&gt;Custom Timestamps&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;&lt;div align="center"&gt;Yes (micro)&lt;/div&gt;&lt;/td&gt;
&lt;td style="border: solid black 1.0pt; background-color: #00ff00"&gt;&lt;div align="center"&gt;Yes (milli)&lt;/div&gt;&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;With both systems you can either set the timestamp of a value that is stored yourself or leave the default "now". There are "known" restrictions in HBase that the outcome is indeterminate when adding older timestamps after already having stored newer ones beforehand.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr valign="top"&gt;       
&lt;td style="border: solid black 1.0pt;"&gt;Data Time-To-Live&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;&lt;div align="center"&gt;Yes&lt;/div&gt;&lt;/td&gt;
&lt;td style="border: solid black 1.0pt; background-color: #00ff00"&gt;&lt;div align="center"&gt;Yes&lt;/div&gt;&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;Besides keeping versions of data cells, the user can also set a time-to-live on the stored data so that it is discarded after a specific amount of time.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr valign="top"&gt;       
&lt;td style="border: solid black 1.0pt;"&gt;Batch Writes&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;&lt;div align="center"&gt;Yes&lt;/div&gt;&lt;/td&gt;
&lt;td style="border: solid black 1.0pt; background-color: #00ff00"&gt;&lt;div align="center"&gt;Yes&lt;/div&gt;&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;Both systems allow table operations to be batched.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr valign="top"&gt;       
&lt;td style="border: solid black 1.0pt;"&gt;Value based Counters&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;&lt;div align="center"&gt;Yes&lt;/div&gt;&lt;/td&gt;
&lt;td style="border: solid black 1.0pt; background-color: #00ff00"&gt;&lt;div align="center"&gt;Yes&lt;/div&gt;&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;BigTable and HBase can use a specific column as an atomic counter. HBase does this by acquiring a row lock before the value is incremented.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr valign="top"&gt;       
&lt;td style="border: solid black 1.0pt;"&gt;Row Filters&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;&lt;div align="center"&gt;Yes&lt;/div&gt;&lt;/td&gt;
&lt;td style="border: solid black 1.0pt; background-color: #00ff00"&gt;&lt;div align="center"&gt;Yes&lt;/div&gt;&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;Again, both systems allow filters to be applied when scanning rows.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr valign="top"&gt;       
&lt;td style="border: solid black 1.0pt;"&gt;Client Script Execution&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;&lt;div align="center"&gt;Yes&lt;/div&gt;&lt;/td&gt;
&lt;td style="border: solid black 1.0pt; background-color: #ffff00"&gt;&lt;div align="center"&gt;No&lt;/div&gt;&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;BigTable uses &lt;a href="http://research.google.com/archive/sawzall.html"&gt;Sawzall&lt;/a&gt; to enable users to process the stored data.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr valign="top"&gt;       
&lt;td style="border: solid black 1.0pt;"&gt;MapReduce Support&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;&lt;div align="center"&gt;Yes&lt;/div&gt;&lt;/td&gt;
&lt;td style="border: solid black 1.0pt; background-color: #00ff00"&gt;&lt;div align="center"&gt;Yes&lt;/div&gt;&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;Both systems have convenience classes that allow scanning a table in MapReduce jobs.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr valign="top"&gt;       
&lt;td style="border: solid black 1.0pt;"&gt;Storage Systems&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;&lt;div align="center"&gt;GFS&lt;/div&gt;&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;&lt;div align="center"&gt;HDFS, S3, S3N, EBS&lt;/div&gt;&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;While BigTable works on Google's GFS, HBase has the option to use any file system as long as there is a proxy or driver class for it.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr valign="top"&gt;       
&lt;td style="border: solid black 1.0pt;"&gt;File Format&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;&lt;div align="center"&gt;SSTable&lt;/div&gt;&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;&lt;div align="center"&gt;HFile&lt;/div&gt;&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;&amp;nbsp;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr valign="top"&gt;       
&lt;td style="border: solid black 1.0pt;"&gt;Block Index&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;&lt;div align="center"&gt;At end of file&lt;/div&gt;&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;&lt;div align="center"&gt;At end of file&lt;/div&gt;&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;Both storage file formats have a similar block oriented structure with the block index stored at the end of the file.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr valign="top"&gt;       
&lt;td style="border: solid black 1.0pt;"&gt;Memory Mapping&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;&lt;div align="center"&gt;Yes&lt;/div&gt;&lt;/td&gt;
&lt;td style="border: solid black 1.0pt; background-color: #ffff00"&gt;&lt;div align="center"&gt;No&lt;/div&gt;&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;BigTable can memory map storage files directly into memory.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr valign="top"&gt;       
&lt;td style="border: solid black 1.0pt;"&gt;Lock Service&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;&lt;div align="center"&gt;Chubby&lt;/div&gt;&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;&lt;div align="center"&gt;ZooKeeper&lt;/div&gt;&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;The difference is that ZooKeeper is used to coordinate tasks in HBase rather than to provide locking services. Overall, though, ZooKeeper does for HBase pretty much what Chubby does for BigTable, with slightly different semantics.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr valign="top"&gt;       
&lt;td style="border: solid black 1.0pt;"&gt;Single Master&lt;/td&gt;
&lt;td style="border: solid black 1.0pt; background-color: #ff0000"&gt;&lt;div align="center"&gt;Yes&lt;/div&gt;&lt;/td&gt;
&lt;td style="border: solid black 1.0pt; background-color: #00ff00"&gt;&lt;div align="center"&gt;No&lt;/div&gt;&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;HBase recently added support for multiple masters. These are on "hot" standby and monitor the master's ZooKeeper node.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr valign="top"&gt;       
&lt;td style="border: solid black 1.0pt;"&gt;Tablet/Region Count&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;&lt;div align="center"&gt;10-1000&lt;/div&gt;&lt;/td&gt;
&lt;td style="border: solid black 1.0pt; background-color: #00ff00"&gt;&lt;div align="center"&gt;10-1000&lt;/div&gt;&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;Both systems recommend about the same number of regions per region server. Of course this depends on many things, but given a similar setup as far as "commodity" machines are concerned, it seems to result in the same amount of load on each server.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr valign="top"&gt;       
&lt;td style="border: solid black 1.0pt;"&gt;Tablet/Region Size&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;&lt;div align="center"&gt;100-200MB&lt;/div&gt;&lt;/td&gt;
&lt;td style="border: solid black 1.0pt; background-color: #00ff00"&gt;&lt;div align="center"&gt;256MB&lt;/div&gt;&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;The maximum region size can be configured for HBase and BigTable. HBase uses 256MB as the default value.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr valign="top"&gt;       
&lt;td style="border: solid black 1.0pt;"&gt;Root Location&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;&lt;div align="center"&gt;1st META / Chubby&lt;/div&gt;&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;&lt;div align="center"&gt;-ROOT- / ZooKeeper&lt;/div&gt;&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;HBase handles the Root table slightly differently from BigTable, where it is the first region in the Meta table. HBase uses its own table with a single region to store the Root table. Once either system starts, the address of the server hosting the Root region is stored in ZooKeeper or Chubby so that the clients can resolve its location without hitting the master.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr valign="top"&gt;       
&lt;td style="border: solid black 1.0pt;"&gt;Client Region Cache&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;&lt;div align="center"&gt;Yes&lt;/div&gt;&lt;/td&gt;
&lt;td style="border: solid black 1.0pt; background-color: #00ff00"&gt;&lt;div align="center"&gt;Yes&lt;/div&gt;&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;The clients in either system cache the location of regions and have appropriate mechanisms to detect stale information and update the local cache accordingly.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr valign="top"&gt;       
&lt;td style="border: solid black 1.0pt;"&gt;Meta Prefetch&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;&lt;div align="center"&gt;Yes&lt;/div&gt;&lt;/td&gt;
&lt;td style="border: solid black 1.0pt; background-color: #ffff00"&gt;&lt;div align="center"&gt;No (?)&lt;/div&gt;&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;A design feature of BigTable is to fetch the information of more than one Meta region at a time. This proactively fills the client cache for future lookups.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr valign="top"&gt;       
&lt;td style="border: solid black 1.0pt;"&gt;Historian&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;&lt;div align="center"&gt;Yes&lt;/div&gt;&lt;/td&gt;
&lt;td style="border: solid black 1.0pt; background-color: #00ff00"&gt;&lt;div align="center"&gt;Yes&lt;/div&gt;&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;The history of region related events (such as splits, assignment, reassignment) is recorded in the Meta table.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr valign="top"&gt;       
&lt;td style="border: solid black 1.0pt;"&gt;Locality Groups&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;&lt;div align="center"&gt;Yes&lt;/div&gt;&lt;/td&gt;
&lt;td style="border: solid black 1.0pt; background-color: #ff0000"&gt;&lt;div align="center"&gt;No&lt;/div&gt;&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;It is not entirely clear, but it seems everything in BigTable is defined by Locality Groups. They group multiple column families into one so that they get stored together and also share the same configuration parameters. A single column family is probably a Locality Group with one member. HBase does not have this option and handles each column family separately.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr valign="top"&gt;       
&lt;td style="border: solid black 1.0pt;"&gt;In-Memory Column Families&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;&lt;div align="center"&gt;Yes&lt;/div&gt;&lt;/td&gt;
&lt;td style="border: solid black 1.0pt; background-color: #00ff00"&gt;&lt;div align="center"&gt;Yes&lt;/div&gt;&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;These are for relatively small tables that need very fast access times.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr valign="top"&gt;       
&lt;td style="border: solid black 1.0pt;"&gt;KeyValue (Cell) Cache&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;&lt;div align="center"&gt;Yes&lt;/div&gt;&lt;/td&gt;
&lt;td style="border: solid black 1.0pt; background-color: #ffff00"&gt;&lt;div align="center"&gt;No&lt;/div&gt;&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;This is a cache that serves hot cells.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr valign="top"&gt;       
&lt;td style="border: solid black 1.0pt;"&gt;Block Cache&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;&lt;div align="center"&gt;Yes&lt;/div&gt;&lt;/td&gt;
&lt;td style="border: solid black 1.0pt; background-color: #00ff00"&gt;&lt;div align="center"&gt;Yes&lt;/div&gt;&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;Blocks read from the storage files are cached internally in configurable caches.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr valign="top"&gt;       
&lt;td style="border: solid black 1.0pt;"&gt;Bloom Filters&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;&lt;div align="center"&gt;Yes&lt;/div&gt;&lt;/td&gt;
&lt;td style="border: solid black 1.0pt; background-color: #00ff00"&gt;&lt;div align="center"&gt;Yes&lt;/div&gt;&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;At the cost of memory on the region server, these filters allow a quick check whether a specific cell may exist or definitely does not.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr valign="top"&gt;       
&lt;td style="border: solid black 1.0pt;"&gt;Write-Ahead Log (WAL)&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;&lt;div align="center"&gt;Yes&lt;/div&gt;&lt;/td&gt;
&lt;td style="border: solid black 1.0pt; background-color: #00ff00"&gt;&lt;div align="center"&gt;Yes&lt;/div&gt;&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;Each region server in either system stores one modification log for all regions it hosts.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr valign="top"&gt;       
&lt;td style="border: solid black 1.0pt;"&gt;Secondary Log&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;&lt;div align="center"&gt;Yes&lt;/div&gt;&lt;/td&gt;
&lt;td style="border: solid black 1.0pt; background-color: #ff0000"&gt;&lt;div align="center"&gt;No&lt;/div&gt;&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;In addition to the Write-Ahead log mentioned above BigTable has a second log that it can use when the first is going slow. This is a performance optimization.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr valign="top"&gt;       
&lt;td style="border: solid black 1.0pt;"&gt;Skip Write-Ahead Log&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;&lt;div align="center"&gt;?&lt;/div&gt;&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;&lt;div align="center"&gt;Yes&lt;/div&gt;&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;For bulk imports the client in HBase can opt to skip writing into the WAL.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr valign="top"&gt;       
&lt;td style="border: solid black 1.0pt;"&gt;Fast Table/Region Split&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;&lt;div align="center"&gt;Yes&lt;/div&gt;&lt;/td&gt;
&lt;td style="border: solid black 1.0pt; background-color: #00ff00"&gt;&lt;div align="center"&gt;Yes&lt;/div&gt;&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;Splitting a region or tablet is fast as the daughter regions first read the original storage file until a compaction finally rewrites the data into the region's local store.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;&lt;br /&gt;
&lt;h4&gt;New Features&lt;/h4&gt;As mentioned above, a few years have passed since the original OSDI'06 BigTable paper. Jeff Dean - a fellow at Google - has mentioned a few new BigTable &lt;a href="http://www.scribd.com/doc/21244790/Google-Designs-Lessons-and-Advice-from-Building-Large-Distributed-Systems"&gt;features&lt;/a&gt; during speeches and presentations he gave recently. We will have a look at some of them here.&lt;br /&gt;
&lt;br /&gt;
&lt;table border="1" bordercolor="#000000" cellpadding="8" cellspacing="0" style="page-break-before: always;" width="95%"&gt;&lt;tbody&gt;
&lt;tr valign="top"&gt;
&lt;td bgcolor="#000000" nowrap&gt;&lt;span style="color: white;"&gt;&lt;b&gt;Feature&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td bgcolor="#000000" nowrap&gt;&lt;div align="center"&gt;&lt;span style="color: white;"&gt;&lt;b&gt;&amp;nbsp;Google BigTable&amp;nbsp;&lt;/b&gt;&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;
&lt;td bgcolor="#000000" nowrap&gt;&lt;div align="center"&gt;&lt;span style="color: white;"&gt;&lt;b&gt;&amp;nbsp;Apache HBase&amp;nbsp;&lt;/b&gt;&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;
&lt;td bgcolor="#000000"&gt;&lt;span style="color: white;"&gt;&lt;b&gt;Notes&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr valign="top"&gt;       
&lt;td style="border: solid black 1.0pt;"&gt;Client Isolation&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;&lt;div align="center"&gt;Yes&lt;/div&gt;&lt;/td&gt;
&lt;td style="border: solid black 1.0pt; background-color: #ffff00"&gt;&lt;div align="center"&gt;No&lt;/div&gt;&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;BigTable is used internally to serve many separate clients and can therefore keep the data between them isolated.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr valign="top"&gt;       
&lt;td style="border: solid black 1.0pt;"&gt;Coprocessors&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;&lt;div align="center"&gt;Yes&lt;/div&gt;&lt;/td&gt;
&lt;td style="border: solid black 1.0pt; background-color: #ffff00"&gt;&lt;div align="center"&gt;No&lt;/div&gt;&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;BigTable can host code that resides with the regions and splits with them as well. See &lt;a href="http://issues.apache.org/jira/browse/HBASE-2000"&gt;HBASE-2000&lt;/a&gt; for progress on this feature within HBase.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr valign="top"&gt;       
&lt;td style="border: solid black 1.0pt;"&gt;Corruption Safety&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;&lt;div align="center"&gt;Yes&lt;/div&gt;&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;&lt;div align="center"&gt;No&lt;/div&gt;&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;This is an interesting topic. BigTable uses CRC checksums to verify that data has been written safely. While HBase does not have this, the question is whether it is built into Hadoop's HDFS.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr valign="top"&gt;       
&lt;td style="border: solid black 1.0pt;"&gt;Replication&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;&lt;div align="center"&gt;Yes&lt;/div&gt;&lt;/td&gt;
&lt;td style="border: solid black 1.0pt; background-color: #ffff00"&gt;&lt;div align="center"&gt;No&lt;/div&gt;&lt;/td&gt;
&lt;td style="border: solid black 1.0pt;"&gt;HBase is working on the same topic in &lt;a href="http://issues.apache.org/jira/browse/HBASE-1295"&gt;HBASE-1295&lt;/a&gt;.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;&lt;br /&gt;
Note: the color codes indicate which features have a direct match and which are (still) missing. Weaker features are colored yellow, as I am not sure if they are immediately necessary or even applicable given HBase's implementation.&lt;br /&gt;
&lt;br /&gt;
&lt;h4&gt;Variations and Differences&lt;/h4&gt;Some of the above features need a bit more looking into, as they are difficult to narrow down to simple "Yay or Nay" questions. I am addressing them separately below.&lt;br /&gt;
&lt;br /&gt;
&lt;u&gt;Lock Service&lt;/u&gt;&lt;br /&gt;
&lt;br /&gt;
This is from the BigTable paper:&lt;br /&gt;
&lt;blockquote style="margin:1em 20px; border: 1px solid #000000; padding: 5px;"&gt;Bigtable uses Chubby for a variety of tasks: to ensure that there is at most one active master at any time; to store the bootstrap location of Bigtable data (see Section 5.1); to discover tablet servers and finalize tablet server deaths (see Section 5.2); to store Bigtable schema information (the column family information for each table); and to store access control lists. If Chubby becomes unavailable for an extended period of time, Bigtable becomes unavailable.&lt;br /&gt;
&lt;/blockquote&gt;&lt;br /&gt;
There is a lot of overlap with how HBase uses ZooKeeper. What is different, though, is that schema information is not stored in ZooKeeper (yet; see http://wiki.apache.org/hadoop/ZooKeeper/HBaseUseCases for details). What is important here is the same reliance on the lock service being available. From my own experience and from reading the threads on the HBase mailing list, it is often underestimated what can happen when ZooKeeper does not get the resources it needs to react in a timely manner. It is better to have a small ZooKeeper cluster on older machines that do nothing else than to have ZooKeeper nodes running next to the already heavy Hadoop or HBase processes. Once you starve ZooKeeper you will see a domino effect of HBase nodes going down with it - including the master(s).&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;Update:&lt;/b&gt; After talking to a few members of the ZooKeeper team I would like to point out that this is indeed &lt;u&gt;not&lt;/u&gt; a ZooKeeper issue. It has to do with the fact that if an already heavily loaded node also has to respond to ZooKeeper in time, you may face a timeout situation where the HBase RegionServers and even the Master think that their coordination service is gone and shut themselves down. Patrick Hunt has responded to this by &lt;a href="http://www.mail-archive.com/hbase-user@hadoop.apache.org/msg07290.html"&gt;mail&lt;/a&gt; and by &lt;a href="http://wiki.apache.org/hadoop/ZooKeeper/ServiceLatencyOverview"&gt;post&lt;/a&gt;. Please read both to see that ZooKeeper is able to handle the load. I personally recommend setting up ZooKeeper in combination with HBase on a separate cluster, maybe a set of spare machines you have from a recent update to the cluster and which are slightly outdated (no 2xquad core CPU with 16GB of memory) but are otherwise perfectly fine. This also allows you to monitor the machines separately, instead of seeing a combined CPU load of 100% on the servers without really knowing where it comes from and what effect it may have.&lt;br /&gt;
&lt;br /&gt;
Another important difference is that ZooKeeper is not a lock service like Chubby - and I do not think it has to be as far as HBase is concerned. ZooKeeper is a distributed coordination service enabling HBase to do Master node elections etc. It also allows using semaphores to indicate state or actions required. So where Chubby creates a lock file to indicate that a tablet server is up and running, HBase in turn uses ephemeral nodes that exist as long as the session between the RegionServer that created the node and ZooKeeper is active. This also causes the differences in semantics: in BigTable the master can delete a tablet server's lock file to indicate that it has lost its lease on tablets. In HBase this has to be handled differently because of the slightly less restrictive architecture of ZooKeeper. These are only semantics, as mentioned, and do not mean one is better than the other - just different.&lt;br /&gt;
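&lt;br /&gt;
The ephemeral node idea can be illustrated with a toy model. This is only a minimal sketch in plain Python and not the ZooKeeper client API: the class ToyCoordinator, its methods and the /rs/ paths are hypothetical names, and a real ensemble expires sessions through timeouts rather than through an explicit call.&lt;br /&gt;
&lt;pre&gt;
```python
class ToyCoordinator:
    """Toy stand-in for a coordination service with ephemeral nodes."""

    def __init__(self):
        self.nodes = {}  # node path -> id of the session that owns it

    def create_ephemeral(self, session_id, path):
        # The node lives only as long as the creating session does.
        self.nodes[path] = session_id

    def expire_session(self, session_id):
        # Drop every node owned by the expired session.
        self.nodes = {p: s for p, s in self.nodes.items() if s != session_id}

    def live_servers(self, prefix):
        return sorted(p for p in self.nodes if p.startswith(prefix))


zk = ToyCoordinator()
zk.create_ephemeral("session-1", "/rs/server-a")
zk.create_ephemeral("session-2", "/rs/server-b")
zk.expire_session("session-1")  # server-a stopped talking to the service
print(zk.live_servers("/rs/"))  # ['/rs/server-b']
```
&lt;/pre&gt;
Nobody has to delete the node of the failed server explicitly; it disappears together with its session, which is the semantic difference to the lock file handling described above.&lt;br /&gt;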
&lt;br /&gt;
&lt;blockquote style="margin:1em 20px; border: 1px solid #000000; padding: 5px;"&gt;The first level is a file stored in Chubby that contains the location of the root tablet. The root tablet contains the location of all tablets in a special METADATA table. Each METADATA tablet contains the location of a set of user tablets. The root tablet is just the first tablet in the METADATA table, but is treated specially - it is never split - to ensure that the tablet location hierarchy has no more than three levels.&lt;br /&gt;
&lt;/blockquote&gt;&lt;br /&gt;
As mentioned above, in HBase the root region is its own table with a single region. I strongly doubt that this makes a difference compared to having it as the first (non-splittable) region of the meta table. It is just the same feature but implemented differently.&lt;br /&gt;
&lt;br /&gt;
&lt;blockquote style="margin:1em 20px; border: 1px solid #000000; padding: 5px;"&gt;The METADATA table stores the location of a tablet under a row key that is an encoding of the tablet's table identifier and its end row. &lt;br /&gt;
&lt;/blockquote&gt;&lt;br /&gt;
HBase does have a different layout here. It stores the start and end row with each region where the end row is exclusive and denotes the first (or start) row of the next region. Again, these are minor differences and I am not sure if there is a better or worse solution. It is just done differently.&lt;br /&gt;
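&lt;br /&gt;
To make the start row and exclusive end row layout concrete, here is a small sketch of how a row key could be mapped to its region. The region list and the function find_region are made up for illustration and do not reflect the actual catalog table format or client code; an empty key stands for an open-ended boundary.&lt;br /&gt;
&lt;pre&gt;
```python
import bisect

# Hypothetical regions as (start_row, end_row, name); b"" means "open ended".
REGIONS = [
    (b"", b"g", "region-1"),
    (b"g", b"p", "region-2"),
    (b"p", b"", "region-3"),
]
START_ROWS = [start for start, _end, _name in REGIONS]


def find_region(row):
    # The hosting region is the last one whose start row is not after the row key.
    index = bisect.bisect_right(START_ROWS, row) - 1
    return REGIONS[index][2]


print(find_region(b"apple"))  # region-1
print(find_region(b"g"))      # region-2, because the end row "g" of region-1 is exclusive
print(find_region(b"zebra"))  # region-3
```
&lt;/pre&gt;
Since the end row of one region is the start row of the next, the start rows alone are enough for the lookup in this sketch.&lt;br /&gt;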
&lt;br /&gt;
&lt;u&gt;Master Operation&lt;/u&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;blockquote style="margin:1em 20px; border: 1px solid #000000; padding: 5px;"&gt;To detect when a tablet server is no longer serving its tablets, the master periodically asks each tablet server for the status of its lock. If a tablet server reports that it has lost its lock, or if the master was unable to reach a server during its last several attempts, the master attempts to acquire an exclusive lock on the server's file. If the master is able to acquire the lock, then Chubby is live and the tablet server is either dead or having trouble reaching Chubby, so the master ensures that the tablet server can never serve again by deleting its server file. Once a server's file has been deleted, the master can move all the tablets that were previously assigned to that server into the set of unassigned tablets. To ensure that a Bigtable cluster is not vulnerable to networking issues between the master and Chubby, the master kills itself if its Chubby session expires.&lt;br /&gt;
&lt;/blockquote&gt;&lt;br /&gt;
This is quite different, even in the current HBase 0.20.2. Here the region servers use a heartbeat protocol to report for duty to the master and subsequently to signal that they are still alive. I am not sure if this topic is covered by the master rewrite umbrella issue &lt;a href="http://issues.apache.org/jira/browse/HBASE-1816"&gt;HBASE-1816&lt;/a&gt; - and if it needs to be addressed at all. It could well be that what we have in HBase now is sufficient and does its job just fine. It was created when there was no lock service yet and therefore could be considered legacy code too.&lt;br /&gt;
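&lt;br /&gt;
A heartbeat based liveness check can be sketched in a few lines. This is an illustration only, assuming a fixed lease time; the class ToyMaster and its method names are hypothetical and are not taken from the HBase master code.&lt;br /&gt;
&lt;pre&gt;
```python
class ToyMaster:
    """Tracks region servers by the time of their last heartbeat."""

    def __init__(self, lease_seconds):
        self.lease_seconds = lease_seconds
        self.last_seen = {}  # server name -> time of the last report

    def heartbeat(self, server, now):
        # The first heartbeat doubles as "reporting for duty".
        self.last_seen[server] = now

    def expired_servers(self, now):
        # A server counts as dead once its lease ran out without a new report.
        return sorted(
            server
            for server, seen in self.last_seen.items()
            if now - seen > self.lease_seconds
        )


master = ToyMaster(lease_seconds=10)
master.heartbeat("rs1", now=0)
master.heartbeat("rs2", now=0)
master.heartbeat("rs1", now=8)
print(master.expired_servers(now=12))  # ['rs2']
```
&lt;/pre&gt;
In contrast to the Chubby approach quoted above, the master here decides on its own, based on missing reports, that a server is gone.&lt;br /&gt;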
&lt;br /&gt;
&lt;u&gt;Master Startup&lt;/u&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;blockquote style="margin:1em 20px; border: 1px solid #000000; padding: 5px;"&gt;The master executes the following steps at startup. (1) The master grabs a unique master lock in Chubby, which prevents concurrent master instantiations. (2) The master scans the servers directory in Chubby to find the live servers. (3) The master communicates with every live tablet server to discover what tablets are already assigned to each server. (4) The master scans the METADATA table to learn the set of tablets. Whenever this scan encounters a tablet that is not already assigned, the master adds the tablet to the set of unassigned tablets, which makes the tablet eligible for tablet assignment.&lt;br /&gt;
&lt;/blockquote&gt;&lt;br /&gt;
Along the lines of what I mentioned above, this part of the code was created before ZooKeeper was available. So HBase actually waits for the region servers to report for duty. It also scans the .META. table to learn what is there and which server is assigned to it. ZooKeeper is (so far) only used to publish the server hosting the -ROOT- region.&lt;br /&gt;
&lt;br /&gt;
&lt;u&gt;Tablet/Region Splits&lt;/u&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;blockquote style="margin:1em 20px; border: 1px solid #000000; padding: 5px;"&gt;In case the split notification is lost (either because the tablet server or the master died), the master detects the new tablet when it asks a tablet server to load the tablet that has now split. The tablet server will notify the master of the split, because the tablet entry it finds in the METADATA table will specify only a portion of the tablet that the master asked it to load.&lt;br /&gt;
&lt;/blockquote&gt;&lt;br /&gt;
The master node in HBase uses the .META. solely to detect when a region was split but the message was lost. For that reason it scans the .META. on a regular basis to see when a region appears that is not yet assigned. It will then assign that region as per its default strategy.&lt;br /&gt;
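&lt;br /&gt;
The periodic scan can be pictured as follows. This is a rough sketch with a made-up catalog layout; the "least loaded server" rule is my assumption for the example and not necessarily the default strategy HBase uses.&lt;br /&gt;
&lt;pre&gt;
```python
# Hypothetical catalog rows: region name -> hosting server (None if unassigned).
meta = {
    "users,,1": "rs1",
    "users,m,2": None,  # daughter region whose split message was lost
    "users,t,3": "rs2",
}
servers = ["rs1", "rs2"]


def assign_unassigned(meta, servers):
    """Scan the catalog and hand every unassigned region to the least loaded server."""
    assigned = []
    for region in sorted(meta):
        if meta[region] is None:
            load = {s: sum(1 for host in meta.values() if host == s) for s in servers}
            target = min(servers, key=lambda s: load[s])
            meta[region] = target
            assigned.append((region, target))
    return assigned


print(assign_unassigned(meta, servers))  # [('users,m,2', 'rs1')]
```
&lt;/pre&gt;
A second scan finds nothing left to do, so running the check on a regular basis is harmless.&lt;br /&gt;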
&lt;br /&gt;
&lt;u&gt;Compactions&lt;/u&gt;&lt;br /&gt;
&lt;br /&gt;
The following are more terminology differences than anything else.&lt;br /&gt;
&lt;br /&gt;
&lt;blockquote style="margin:1em 20px; border: 1px solid #000000; padding: 5px;"&gt;As write operations execute, the size of the memtable increases. When the memtable size reaches a threshold, the memtable is frozen, a new memtable is created, and the frozen memtable is converted to an SSTable and written to GFS. This minor compaction process has two goals: it shrinks the memory usage of the tablet server, and it reduces the amount of data that has to be read from the commit log during recovery if this server dies. Incoming read and write operations can continue while compactions occur.&lt;br /&gt;
&lt;/blockquote&gt;&lt;br /&gt;
HBase has a similar operation but it is referred to as a "flush". In contrast, "minor compactions" in HBase rewrite the last N used store files, i.e. those with the most recent mutations, as they are probably much smaller than previously created files that have more data in them.&lt;br /&gt;
&lt;br /&gt;
&lt;blockquote style="margin:1em 20px; border: 1px solid #000000; padding: 5px;"&gt;... we bound the number of such files by periodically executing a merging compaction in the background. A merging compaction reads the contents of a few SSTables and the memtable, and writes out a new SSTable. The input SSTables and memtable can be discarded as soon as the compaction has finished. &lt;br /&gt;
&lt;/blockquote&gt;&lt;br /&gt;
This again refers to what is called "minor compaction" in HBase.&lt;br /&gt;
&lt;br /&gt;
&lt;blockquote style="margin:1em 20px; border: 1px solid #000000; padding: 5px;"&gt;A merging compaction that rewrites all SSTables into exactly one SSTable is called a major compaction.&lt;br /&gt;
&lt;/blockquote&gt;&lt;br /&gt;
Here we have an exact match though: a "major compaction" in HBase also rewrites all files into one.&lt;br /&gt;
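&lt;br /&gt;
The three terms can be lined up in a toy store. This is a simplified sketch using the HBase terminology, with a hypothetical class ToyStore and a flush threshold counted in cells instead of bytes; it ignores deletes, versions and everything else a real store has to handle.&lt;br /&gt;
&lt;pre&gt;
```python
class ToyStore:
    """One column family store: an in-memory buffer plus immutable files."""

    def __init__(self, flush_threshold):
        self.flush_threshold = flush_threshold
        self.memstore = {}
        self.files = []  # oldest first; each file is a sorted, read-only tuple

    def put(self, key, value):
        self.memstore[key] = value
        if len(self.memstore) == self.flush_threshold:
            self.flush()

    def flush(self):
        # BigTable calls this step a minor compaction, HBase calls it a flush.
        self.files.append(tuple(sorted(self.memstore.items())))
        self.memstore = {}

    def compact(self, count):
        # Merge the newest `count` files; newer values win over older ones.
        count = min(count, len(self.files))
        if count == 0:
            return
        keep, merge = self.files[:-count], self.files[-count:]
        merged = {}
        for store_file in merge:
            merged.update(store_file)
        self.files = keep + [tuple(sorted(merged.items()))]

    def minor_compaction(self, count=2):
        # HBase term: rewrite only the last few store files.
        self.compact(count)

    def major_compaction(self):
        # Same meaning in both systems: rewrite all files into one.
        self.compact(len(self.files))


store = ToyStore(flush_threshold=2)
for key, value in [("a", 1), ("b", 2), ("c", 3), ("a", 4), ("d", 5), ("e", 6)]:
    store.put(key, value)
print(len(store.files))  # 3
store.minor_compaction()
print(len(store.files))  # 2
store.major_compaction()
print(store.files)  # [(('a', 4), ('b', 2), ('c', 3), ('d', 5), ('e', 6))]
```
&lt;/pre&gt;
The files are never modified in place; every compaction writes a new file and drops its inputs.&lt;br /&gt;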
&lt;br /&gt;
&lt;u&gt;Immutable Files&lt;/u&gt;&lt;br /&gt;
&lt;br /&gt;
Knowing that files are fixed once written BigTable makes the following assumption:&lt;br /&gt;
&lt;br /&gt;
&lt;blockquote style="margin:1em 20px; border: 1px solid #000000; padding: 5px;"&gt;The only mutable data structure that is accessed by both reads and writes is the memtable. To reduce contention during reads of the memtable, we make each memtable row copy-on-write and allow reads and writes to proceed in parallel.&lt;br /&gt;
&lt;/blockquote&gt;&lt;br /&gt;
I believe this is done similarly in HBase but I am not sure. It certainly has the same architecture, as HDFS files, for example, are also immutable once written.&lt;br /&gt;
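&lt;br /&gt;
The copy-on-write idea from the quote can be shown with a tiny example. This sketch illustrates the principle described in the BigTable paper only; the class ToyMemtable is hypothetical and says nothing about how HBase implements its in-memory store.&lt;br /&gt;
&lt;pre&gt;
```python
class ToyMemtable:
    """Rows are never changed in place; a write swaps in a fresh copy."""

    def __init__(self):
        self.rows = {}

    def write(self, row_key, column, value):
        old_row = self.rows.get(row_key, {})
        new_row = dict(old_row)       # copy the row ...
        new_row[column] = value       # ... change only the copy ...
        self.rows[row_key] = new_row  # ... and publish it in one step

    def read(self, row_key):
        # Readers get a reference to a row version that no writer will touch.
        return self.rows.get(row_key, {})


table = ToyMemtable()
table.write("row1", "cf:a", "v1")
snapshot = table.read("row1")
table.write("row1", "cf:a", "v2")
print(snapshot["cf:a"])            # v1, the reader's view is unchanged
print(table.read("row1")["cf:a"])  # v2
```
&lt;/pre&gt;
A reader that started before the second write keeps a consistent view of the row without taking a lock.&lt;br /&gt;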
&lt;br /&gt;
I can only recommend that you read the BigTable paper too and make up your own mind. This post was inspired by the idea to learn what BigTable really has to offer and how much HBase has already covered. The difficult part is of course that there is not too much information available on BigTable. But the numbers even the 2006 paper lists are more than impressive. If HBase, as an open-source project with just a handful of committers, most of whom have full-time day jobs, can achieve something even remotely comparable, I think this is a huge success. And looking at the 0.21 and 0.22 road map, the already small gap is going to shrink even further!
</content><link rel="replies" type="application/atom+xml" href="http://www.larsgeorge.com/feeds/6396866955477406261/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.larsgeorge.com/2009/11/hbase-vs-bigtable-comparison.html#comment-form" title="10 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/860423771829255614/posts/default/6396866955477406261?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/860423771829255614/posts/default/6396866955477406261?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/Lineland/~3/vW2Cz93Zfik/hbase-vs-bigtable-comparison.html" title="HBase vs. 
BigTable Comparison" /><author><name>Lars George</name><uri>http://www.blogger.com/profile/18168538475015227467</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="07492346046977538662" /></author><thr:total>10</thr:total><feedburner:origLink>http://www.larsgeorge.com/2009/11/hbase-vs-bigtable-comparison.html</feedburner:origLink></entry><entry gd:etag="W/&quot;DUYGRng8fCp7ImA9WxNbF08.&quot;"><id>tag:blogger.com,1999:blog-860423771829255614.post-5411860328427146528</id><published>2009-11-20T06:05:00.000-08:00</published><updated>2009-11-20T06:25:27.674-08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-11-20T06:25:27.674-08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="linux" /><category scheme="http://www.blogger.com/atom/ns#" term="hbase" /><category scheme="http://www.blogger.com/atom/ns#" term="hadoop" /><title>HBase on Cloudera Training Virtual Machine (0.3.2)</title><content type="html">Note: This is a follow up to my earlier &lt;a href="http://www.larsgeorge.com/2009/10/hbase-on-cloudera-training-virtual.html"&gt;post&lt;/a&gt;. Since then Cloudera released a new VM that includes the current 0.20 branch of Hadoop. Below I have the same post adjusted to work with that new release. Please note that there are subtle changes, for example the NameNode port has changed. So if you in any way still have the older post please make sure you forget about it and follow this one here instead for the new VM version.&lt;br /&gt;
&lt;br /&gt;
You might want to run HBase on Cloudera's &lt;a href="http://www.cloudera.com/hadoop-training-virtual-machine"&gt;Virtual Machine&lt;/a&gt; to get a quick start to a prototyping setup. In theory you download the VM, start it and you are ready to go. The main issue though is that the current Hadoop Training VM does not include HBase at all (yet?). Apart from that, installing a local HBase instance is a straightforward process. &lt;br /&gt;
&lt;br /&gt;
Here are the steps to get HBase running on Cloudera's VM:&lt;br /&gt;
&lt;ol&gt;&lt;li&gt;Download VM &lt;br /&gt;
&lt;br /&gt;
Get it from Cloudera's &lt;a href="http://www.cloudera.com/hadoop-training-virtual-machine"&gt;website&lt;/a&gt;.&lt;br /&gt;
&lt;br /&gt;
&lt;/li&gt;
&lt;li&gt;Start VM&lt;br /&gt;
&lt;br /&gt;
As the above page states: "To launch the VMWare image, you will either need VMware Player for windows and linux, or VMware Fusion for Mac."&lt;br /&gt;
&lt;br /&gt;
Note: I have Parallels for Mac and wanted to use that. I used Parallels Transporter to convert the "cloudera-training-0.3.2.vmx" to a new "cloudera-training-0.2-cl4-000001.hdd", then created a new VM in Parallels, selecting Ubuntu Linux as the OS and the newly created .hdd as the disk image. Boot up the VM and you are up and running. I gave it a bit more memory for the graphics to be able to switch the VM to 1440x900, which is the native screen resolution of the MacBook Pro I am using.&lt;br /&gt;
&lt;br /&gt;
Finally follow the steps explained on the page above, i.e. open a Terminal and issue:&lt;br /&gt;
&lt;pre class="brush:plain; gutter: false;"&gt;$ cd ~/git
$ ./update-exercises --workspace
&lt;/pre&gt;&lt;br /&gt;
&lt;/li&gt;
&lt;li&gt;Pull HBase branch&lt;br /&gt;
&lt;br /&gt;
We are using the brand new HBase 0.20.2 release. Open a new Terminal (or issue a &lt;code&gt;$ cd ..&lt;/code&gt; in the open one), then:&lt;br /&gt;
&lt;pre class="brush:plain; gutter: false;"&gt;$ sudo -u hadoop git clone http://git.apache.org/hbase.git /home/hadoop/hbase
$ sudo -u hadoop sh -c "cd /home/hadoop/hbase ; git checkout origin/tags/0.20.2"
Note: moving to "origin/tags/0.20.2" which isn't a local branch
If you want to create a new branch from this checkout, you may do so
(now or later) by using -b with the checkout command again. Example:
  git checkout -b &amp;lt;new_branch_name&amp;gt;
HEAD is now at 777fb63... HBase release 0.20.2
&lt;/pre&gt;&lt;br /&gt;
First we clone the repository, then check out the release tag. You will notice that I am using &lt;code&gt;sudo -u hadoop&lt;/code&gt; because Hadoop itself is started under that account and I wanted HBase to match. Also, the default "training" account does not have SSH set up as explained in Hadoop's &lt;a href="http://hadoop.apache.org/common/docs/current/quickstart.html"&gt;quick-start&lt;/a&gt; guide. When &lt;code&gt;sudo&lt;/code&gt; asks for a password, use the default, which is set to "training". &lt;br /&gt;
&lt;br /&gt;
You can ignore the messages git prints out while performing the checkout.&lt;br /&gt;
&lt;br /&gt;
&lt;/li&gt;
&lt;li&gt;Build Branch&lt;br /&gt;
&lt;br /&gt;
Continue in Terminal:&lt;br /&gt;
&lt;pre class="brush:plain; gutter: false;"&gt;$ sudo -u hadoop sh -c "cd /home/hadoop/hbase/ ; export PATH=$PATH:/usr/share/apache-ant-1.7.1/bin ; ant package"
...
BUILD SUCCESSFUL
&lt;/pre&gt;&lt;br /&gt;
&lt;/li&gt;
&lt;li&gt;Configure HBase&lt;br /&gt;
&lt;br /&gt;
There are a few edits to be made to get HBase running. &lt;br /&gt;
&lt;pre class="brush:plain; gutter: false;"&gt;$ sudo -u hadoop vim /home/hadoop/hbase/build/conf/hbase-site.xml

&amp;lt;configuration&amp;gt;

&amp;nbsp;&amp;nbsp;&amp;lt;property&amp;gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;name&amp;gt;hbase.rootdir&amp;lt;/name&amp;gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;value&amp;gt;hdfs://localhost:8022/hbase&amp;lt;/value&amp;gt;
&amp;nbsp;&amp;nbsp;&amp;lt;/property&amp;gt;

&amp;lt;/configuration&amp;gt;

$ sudo -u hadoop vim /home/hadoop/hbase/build/conf/hbase-env.sh 

# The java implementation to use.  Java 1.6 required.
# export JAVA_HOME=/usr/java/jdk1.6.0/
export JAVA_HOME=/usr/lib/jvm/java-6-sun
...
&lt;/pre&gt;&lt;br /&gt;
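If you would rather generate the file than edit it by hand, the property shown above can be produced with a few lines of Python. This is only a sketch: the helper name build_hbase_site is made up for illustration, and port 8022 is simply the value used in this post, so check what your NameNode actually listens on before using it.&lt;br /&gt;

```python
import xml.etree.ElementTree as ET


def build_hbase_site(namenode_host="localhost", namenode_port=8022):
    """Return hbase-site.xml content that sets hbase.rootdir (hypothetical helper)."""
    configuration = ET.Element("configuration")
    prop = ET.SubElement(configuration, "property")
    ET.SubElement(prop, "name").text = "hbase.rootdir"
    # The host and port must match the NameNode address of the local HDFS.
    ET.SubElement(prop, "value").text = "hdfs://%s:%d/hbase" % (namenode_host, namenode_port)
    return ET.tostring(configuration, encoding="unicode")


if __name__ == "__main__":
    # Prints the XML; redirect it to a file and copy it into place as the hadoop user.
    print(build_hbase_site())
```

Calling build_hbase_site(namenode_port=8020) would give the value the older VM release needed.&lt;br /&gt;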
&lt;/li&gt;
&lt;li&gt;Rev up the Engine!&lt;br /&gt;
&lt;br /&gt;
The final thing is to start HBase:&lt;br /&gt;
&lt;pre class="brush:plain; gutter: false;"&gt;$ sudo -u hadoop /home/hadoop/hbase/build/bin/start-hbase.sh

$ sudo -u hadoop /home/hadoop/hbase/build/bin/hbase shell
HBase Shell; enter 'help&amp;lt;RETURN&amp;gt;' for list of supported commands.
Version: 0.20.2, r777fb63ff0c73369abc4d799388a45b8bda9e5fd, Thu Nov 19 15:32:17 PST 2009
hbase(main):001:0&amp;gt;
&lt;/pre&gt;&lt;br /&gt;
Done!&lt;br /&gt;
&lt;br /&gt;
Let's create a table and check if it was created OK.&lt;br /&gt;
&lt;pre class="brush:plain; gutter: false;"&gt;hbase(main):001:0&amp;gt; list
0 row(s) in 0.0910 seconds

hbase(main):002:0&amp;gt; create 't1', 'f1', 'f2', 'f3'
0 row(s) in 6.1260 seconds

hbase(main):003:0&amp;gt; list                         
t1                                                                                                            
1 row(s) in 0.0470 seconds

hbase(main):004:0&amp;gt; describe 't1'                
DESCRIPTION                                                             ENABLED                               
 {NAME =&amp;gt; 't1', FAMILIES =&amp;gt; [{NAME =&amp;gt; 'f1', COMPRESSION =&amp;gt; 'NONE', VERS true                                  
 IONS =&amp;gt; '3', TTL =&amp;gt; '2147483647', BLOCKSIZE =&amp;gt; '65536', IN_MEMORY =&amp;gt; '                                       
 false', BLOCKCACHE =&amp;gt; 'true'}, {NAME =&amp;gt; 'f2', COMPRESSION =&amp;gt; 'NONE', V                                       
 ERSIONS =&amp;gt; '3', TTL =&amp;gt; '2147483647', BLOCKSIZE =&amp;gt; '65536', IN_MEMORY =                                       
 &amp;gt; 'false', BLOCKCACHE =&amp;gt; 'true'}, {NAME =&amp;gt; 'f3', COMPRESSION =&amp;gt; 'NONE'                                       
 , VERSIONS =&amp;gt; '3', TTL =&amp;gt; '2147483647', BLOCKSIZE =&amp;gt; '65536', IN_MEMOR                                       
 Y =&amp;gt; 'false', BLOCKCACHE =&amp;gt; 'true'}]}                                                                        
1 row(s) in 0.0750 seconds
hbase(main):005:0&amp;gt; 
&lt;/pre&gt;&lt;/li&gt;
&lt;/ol&gt;This sums it up. I hope you give HBase on the Cloudera Training VM a whirl as it also has Eclipse installed and therefore provides a quick start into Hadoop and HBase. &lt;br /&gt;
&lt;br /&gt;
Just keep in mind that this is for prototyping only! With such a setup you will only be able to insert a handful of rows. If you overdo it you will bring it to its knees very quickly. But you can safely use it to play around with the shell to create tables or use the API to get used to it and test changes in your code etc.&lt;br /&gt;
&lt;br /&gt;
Finally a screenshot of the running HBase UI:&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: left;"&gt;&lt;a href="http://2.bp.blogspot.com/_Cib_A77V54U/SwaVeKlhZyI/AAAAAAAAAEw/0G60iZ0axIk/s1600/hbase-cloudera.png" imageanchor="1" style="clear: right; margin-bottom: 1em; margin-left: 1em;"&gt;&lt;img border="0" src="http://2.bp.blogspot.com/_Cib_A77V54U/SwaVeKlhZyI/AAAAAAAAAEw/0G60iZ0axIk/s320/hbase-cloudera.png" /&gt;&lt;/a&gt;&lt;br /&gt;
&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/860423771829255614-5411860328427146528?l=www.larsgeorge.com' alt='' /&gt;&lt;/div&gt;
&lt;p&gt;&lt;a href="http://feedads.g.doubleclick.net/~a/AqKLIUzsdesMXKoSNVcYBaygg4k/0/da"&gt;&lt;img src="http://feedads.g.doubleclick.net/~a/AqKLIUzsdesMXKoSNVcYBaygg4k/0/di" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;br/&gt;
&lt;a href="http://feedads.g.doubleclick.net/~a/AqKLIUzsdesMXKoSNVcYBaygg4k/1/da"&gt;&lt;img src="http://feedads.g.doubleclick.net/~a/AqKLIUzsdesMXKoSNVcYBaygg4k/1/di" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/Lineland/~4/3Wq33fSikOc" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://www.larsgeorge.com/feeds/5411860328427146528/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.larsgeorge.com/2009/11/hbase-on-cloudera-training-virtual.html#comment-form" title="2 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/860423771829255614/posts/default/5411860328427146528?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/860423771829255614/posts/default/5411860328427146528?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/Lineland/~3/3Wq33fSikOc/hbase-on-cloudera-training-virtual.html" title="HBase on Cloudera Training Virtual Machine (0.3.2)" /><author><name>Lars George</name><uri>http://www.blogger.com/profile/18168538475015227467</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="07492346046977538662" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://2.bp.blogspot.com/_Cib_A77V54U/SwaVeKlhZyI/AAAAAAAAAEw/0G60iZ0axIk/s72-c/hbase-cloudera.png" height="72" width="72" /><thr:total>2</thr:total><feedburner:origLink>http://www.larsgeorge.com/2009/11/hbase-on-cloudera-training-virtual.html</feedburner:origLink></entry><entry gd:etag="W/&quot;C08ARncyeSp7ImA9WxNbF08.&quot;"><id>tag:blogger.com,1999:blog-860423771829255614.post-3856481156377183352</id><published>2009-10-20T10:22:00.000-07:00</published><updated>2009-11-20T04:57:27.991-08:00</updated><app:edited 
xmlns:app="http://www.w3.org/2007/app">2009-11-20T04:57:27.991-08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="linux" /><category scheme="http://www.blogger.com/atom/ns#" term="hbase" /><category scheme="http://www.blogger.com/atom/ns#" term="hadoop" /><title>HBase on Cloudera Training Virtual Machine (0.3.1)</title><content type="html">You might want to run HBase on Cloudera's &lt;a href="http://www.cloudera.com/hadoop-training-virtual-machine"&gt;Virtual Machine&lt;/a&gt; to get a quick start to a prototyping setup. In theory you download the VM, start it and you are ready to go. There are a few issues though, the worst being that the current Hadoop Training VM does not include HBase at all. Also, Cloudera is using a specific version of Hadoop that it deems stable and maintains its own release cycle. So Cloudera's version of Hadoop is 0.18.3. HBase though needs Hadoop 0.20 - but we are in luck as Andrew Purtell of TrendMicro maintains a &lt;a href="http://svn.apache.org/repos/asf/hadoop/hbase/branches/0.20_on_hadoop-0.18.3/"&gt;special branch&lt;/a&gt; of HBase 0.20 that works with Cloudera's release. &lt;br /&gt;
&lt;br /&gt;
Here are the steps to get HBase running on Cloudera's VM:&lt;br /&gt;
&lt;ol&gt;&lt;li&gt;Download VM &lt;br /&gt;
&lt;br /&gt;
Get it from Cloudera's &lt;a href="http://www.cloudera.com/hadoop-training-virtual-machine"&gt;website&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Start VM&lt;br /&gt;
&lt;br /&gt;
As the above page states: "To launch the VMWare image, you will either need VMware Player for windows and linux, or VMware Fusion for Mac."&lt;br /&gt;
&lt;br /&gt;
Note: I have Parallels for Mac and wanted to use that. I used Parallels Transporter to convert the "cloudera-training-0.3.1.vmx" to a new "cloudera-training-0.2-cl3-000002.hdd", then created a new VM in Parallels, selecting Ubuntu Linux as the OS and the newly created .hdd as the disk image. Boot up the VM and you are up and running. I gave it a bit more memory for the graphics to be able to switch the VM to 1440x900, which is the native screen resolution of the MacBook Pro I am using.&lt;br /&gt;
&lt;br /&gt;
Finally follow the steps explained on the page above, i.e. open a Terminal and issue:&lt;br /&gt;
&lt;pre class="brush:plain; gutter: false;"&gt;$ cd ~/git
$ ./update-exercises --workspace
&lt;/pre&gt;&lt;/li&gt;
&lt;li&gt;Pull HBase branch&lt;br /&gt;
&lt;br /&gt;
Open a new Terminal (or issue a &lt;code&gt;$ cd ..&lt;/code&gt; in the open one), then:&lt;br /&gt;
&lt;pre class="brush:plain; gutter: false;"&gt;$ sudo -u hadoop git clone http://git.apache.org/hbase.git /home/hadoop/hbase
$ sudo -u hadoop sh -c "cd /home/hadoop/hbase ; git checkout origin/0.20_on_hadoop-0.18.3"
...
HEAD is now at c050f68... pull up to release
&lt;/pre&gt;&lt;br /&gt;
First we clone the repository, then switch to the actual branch. You will notice that I am using &lt;code&gt;sudo -u hadoop&lt;/code&gt; because Hadoop itself is started under that account and I wanted HBase to match. Also, the default "training" account does not have SSH set up as explained in Hadoop's &lt;a href="http://hadoop.apache.org/common/docs/current/quickstart.html"&gt;quick-start&lt;/a&gt; guide. When &lt;code&gt;sudo&lt;/code&gt; asks for a password, use the default, which is set to "training". &lt;br /&gt;
&lt;/li&gt;
&lt;li&gt;Build Branch&lt;br /&gt;
&lt;br /&gt;
Continue in Terminal:&lt;br /&gt;
&lt;pre class="brush:plain; gutter: false;"&gt;$ sudo -u hadoop sh -c "cd /home/hadoop/hbase/ ; export PATH=$PATH:/usr/share/apache-ant-1.7.1/bin ; ant package"
...
BUILD SUCCESSFUL
&lt;/pre&gt;&lt;/li&gt;
&lt;li&gt;Configure HBase&lt;br /&gt;
&lt;br /&gt;
There are a few edits to be made to get HBase running. &lt;br /&gt;
&lt;pre class="brush:plain; gutter: false;"&gt;$ sudo -u hadoop vim /home/hadoop/hbase/build/conf/hbase-site.xml

&amp;lt;configuration&amp;gt;

&amp;nbsp;&amp;nbsp;&amp;lt;property&amp;gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;name&amp;gt;hbase.rootdir&amp;lt;/name&amp;gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;value&amp;gt;hdfs://localhost:8020/hbase&amp;lt;/value&amp;gt;
&amp;nbsp;&amp;nbsp;&amp;lt;/property&amp;gt;

&amp;lt;/configuration&amp;gt;

$ sudo -u hadoop vim /home/hadoop/hbase/build/conf/hbase-env.sh 

# The java implementation to use.  Java 1.6 required.
# export JAVA_HOME=/usr/java/jdk1.6.0/
export JAVA_HOME=/usr/lib/jvm/java-6-sun
...
&lt;/pre&gt;&lt;br /&gt;
Note: There is a small glitch in revision 826669 of that Cloudera-specific HBase branch. The master UI (on port 60010 on localhost) will not start because a path is different and the Jetty packages are missing because of it. You can fix it by editing the startup script and changing the path scanned: &lt;br /&gt;
&lt;pre class="brush:plain; gutter: false;"&gt;$ sudo -u hadoop vim /home/hadoop/hbase/build/bin/hbase
&lt;/pre&gt;&lt;br /&gt;
Replace&lt;br /&gt;
&lt;code&gt;for f in $HBASE_HOME/lib/jsp-2.1/*.jar; do&lt;/code&gt;&lt;br /&gt;
with&lt;br /&gt;
&lt;code&gt;for f in $HBASE_HOME/lib/jetty-ext/*.jar; do&lt;/code&gt;&lt;br /&gt;
&lt;br /&gt;
This is only until the developers have fixed this in the branch (compare the revision I used r813052 with what you get). Or if you do not want the UI you can ignore this and the error in the logs too. HBase will still run, just not its web based interface. &lt;br /&gt;
&lt;/li&gt;
&lt;li&gt;Rev up the Engine!&lt;br /&gt;
&lt;br /&gt;
The final thing is to start HBase:&lt;br /&gt;
&lt;pre class="brush:plain; gutter: false;"&gt;$ sudo -u hadoop /home/hadoop/hbase/build/bin/start-hbase.sh
$ sudo -u hadoop /home/hadoop/hbase/build/bin/hbase shell

HBase Shell; enter 'help&amp;lt;RETURN&amp;gt;' for list of supported commands.
Version: 0.20.0-0.18.3, r813052, Mon Oct 19 06:51:57 PDT 2009
hbase(main):001:0&amp;gt; list
0 row(s) in 0.2320 seconds
hbase(main):002:0&amp;gt;
&lt;/pre&gt;&lt;br /&gt;
Done!&lt;br /&gt;
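&lt;br /&gt;
To see whether the master UI mentioned in the note above actually came up, a plain TCP probe is enough. The following is a minimal sketch, not part of HBase: the helper name port_is_open is my own, and 60010 is the UI port quoted earlier in this post.&lt;br /&gt;

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds (hypothetical helper)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # 60010 is the master UI port mentioned above; adjust it if yours differs.
    print("master UI reachable:", port_is_open("localhost", 60010))
```

A False result only tells you that nothing accepts connections on that port; the HBase logs remain the place to find out why.&lt;br /&gt;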
&lt;/li&gt;
&lt;/ol&gt;&lt;br /&gt;
This sums it up. I hope you give HBase on the Cloudera Training VM a whirl as it also has Eclipse installed and therefore provides a quick start into Hadoop and HBase. &lt;br /&gt;
&lt;br /&gt;
Just keep in mind that this is for prototyping only! With such a setup you will only be able to insert a handful of rows. If you overdo it you will bring it to its knees very quickly. But you can safely use it to play around with the shell to create tables or use the API to get used to it and test changes in your code etc.&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;Update:&lt;/b&gt; Updated title to include version number, fixed XML&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/860423771829255614-3856481156377183352?l=www.larsgeorge.com' alt='' /&gt;&lt;/div&gt;
&lt;p&gt;&lt;a href="http://feedads.g.doubleclick.net/~a/CbLyZIFRdohBXzZYb6Nsr5-vWno/0/da"&gt;&lt;img src="http://feedads.g.doubleclick.net/~a/CbLyZIFRdohBXzZYb6Nsr5-vWno/0/di" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;br/&gt;
&lt;a href="http://feedads.g.doubleclick.net/~a/CbLyZIFRdohBXzZYb6Nsr5-vWno/1/da"&gt;&lt;img src="http://feedads.g.doubleclick.net/~a/CbLyZIFRdohBXzZYb6Nsr5-vWno/1/di" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/Lineland/~4/Ch9-Mh5uBU4" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://www.larsgeorge.com/feeds/3856481156377183352/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.larsgeorge.com/2009/10/hbase-on-cloudera-training-virtual.html#comment-form" title="2 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/860423771829255614/posts/default/3856481156377183352?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/860423771829255614/posts/default/3856481156377183352?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/Lineland/~3/Ch9-Mh5uBU4/hbase-on-cloudera-training-virtual.html" title="HBase on Cloudera Training Virtual Machine (0.3.1)" /><author><name>Lars George</name><uri>http://www.blogger.com/profile/18168538475015227467</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="07492346046977538662" /></author><thr:total>2</thr:total><feedburner:origLink>http://www.larsgeorge.com/2009/10/hbase-on-cloudera-training-virtual.html</feedburner:origLink></entry><entry gd:etag="W/&quot;D0YDSXk4eip7ImA9WxNWGUs.&quot;"><id>tag:blogger.com,1999:blog-860423771829255614.post-8073228651920917483</id><published>2009-10-12T14:35:00.000-07:00</published><updated>2009-10-19T08:12:58.732-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-10-19T08:12:58.732-07:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="hbase" /><category scheme="http://www.blogger.com/atom/ns#" term="hadoop" /><title>HBase Architecture 101 - Storage</title><content 
type="html">One of the more hidden aspects of &lt;a href="http://hadoop.apache.org/hbase/"&gt;HBase&lt;/a&gt; is how data is actually stored. While the majority of users may never have to bother with it, you may have to get up to speed when you want to learn what the various advanced configuration options at your disposal mean. "How can I tune HBase to my needs?" and similar questions are certainly interesting once you get over the (at times steep) learning curve of setting up a basic system. Another reason to want to know more is when, for whatever reason, disaster strikes and you have to recover an HBase installation. &lt;br /&gt;&lt;br /&gt;In my own efforts to get to know the respective classes that handle the various files I started to sketch a picture in my head illustrating the storage architecture of HBase. But while the ingenious and blessed committers of HBase easily navigate back and forth through that maze, I find it much more difficult to keep a coherent image. So I decided to put that sketch to paper. Here it is.  &lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_Cib_A77V54U/StorLZRjHSI/AAAAAAAAAEI/4IznGhslNxw/s1600-h/hbase-files.png"&gt;&lt;img style="float:left; margin:0 10px 10px 0;cursor:pointer; cursor:hand;width: 400px; height: 202px;" src="http://1.bp.blogspot.com/_Cib_A77V54U/StorLZRjHSI/AAAAAAAAAEI/4IznGhslNxw/s400/hbase-files.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5393670978492636450" /&gt;&lt;/a&gt; Please note that this is not a UML diagram or call graph but a merged picture of classes and the files they handle. It is by no means complete but focuses on the topic of this post. I will discuss the details below and also look at the configuration options and how they affect the low-level storage files.&lt;br /&gt;&lt;br /&gt;&lt;u&gt;The Big Picture&lt;/u&gt;&lt;br /&gt;&lt;br /&gt;So what does my sketch of the HBase innards really say? 
You can see that HBase handles basically two kinds of files. One is used for the write-ahead log and the other for the actual data storage. The files are primarily handled by the &lt;code&gt;HRegionServer&lt;/code&gt; instances. But in certain scenarios even the &lt;code&gt;HMaster&lt;/code&gt; will have to perform low-level file operations. You may also notice that the actual files are in fact divided up into smaller blocks when stored within the Hadoop Distributed Filesystem (HDFS). This is also one of the areas where you can configure the system to handle larger or smaller data better. More on that later.&lt;br /&gt;&lt;br /&gt;The general flow is that a new client contacts the Zookeeper quorum (a separate cluster of Zookeeper nodes) first to find a particular row key. It does so by retrieving the server name (i.e. host name) that hosts the -ROOT- region from Zookeeper. With that information it can query that server to get the server that hosts the .META. table. Both of these details are cached and only looked up once. Lastly it can query the .META. server and retrieve the server that has the row the client is looking for.&lt;br /&gt;&lt;br /&gt;Once it has been told where the row resides, i.e. in what region, it caches this information as well and contacts the &lt;code&gt;HRegionServer&lt;/code&gt; hosting that region directly. So over time the client has a pretty complete picture of where to get rows from without needing to query the .META. server again. &lt;br /&gt;&lt;br /&gt;Note: The &lt;code&gt;HMaster&lt;/code&gt; is responsible for assigning the regions to each &lt;code&gt;HRegionServer&lt;/code&gt; when you start HBase. This also includes the "special" -ROOT- and .META. tables.&lt;br /&gt;&lt;br /&gt;Next, when the &lt;code&gt;HRegionServer&lt;/code&gt; opens the region, it creates a corresponding &lt;code&gt;HRegion&lt;/code&gt; object. 
When the &lt;code&gt;HRegion&lt;/code&gt; is "opened" it sets up a &lt;code&gt;Store&lt;/code&gt; instance for each &lt;code&gt;HColumnFamily&lt;/code&gt; for every table as defined by the user beforehand. Each of the &lt;code&gt;Store&lt;/code&gt; instances can in turn have one or more &lt;code&gt;StoreFile&lt;/code&gt; instances, which are lightweight wrappers around the actual storage file called &lt;code&gt;HFile&lt;/code&gt;. An &lt;code&gt;HRegion&lt;/code&gt; also has a &lt;code&gt;MemStore&lt;/code&gt; and an &lt;code&gt;HLog&lt;/code&gt; instance. We will now have a look at how they work together but also where there are exceptions to the rule. &lt;br /&gt;&lt;br /&gt;&lt;u&gt;Stay Put&lt;/u&gt;&lt;br /&gt;&lt;br /&gt;So how is data written to the actual storage? The client issues an &lt;code&gt;HTable.put(Put)&lt;/code&gt; request to the &lt;code&gt;HRegionServer&lt;/code&gt;, which hands the details to the matching &lt;code&gt;HRegion&lt;/code&gt; instance. The first step is to decide whether the data should first be written to the "Write-Ahead-Log" (WAL) represented by the &lt;code&gt;HLog&lt;/code&gt; class. The decision is based on the flag set by the client using the &lt;code&gt;Put.writeToWAL(boolean)&lt;/code&gt; method. The WAL is a standard Hadoop &lt;code&gt;SequenceFile&lt;/code&gt; (although it is currently being discussed whether that should be changed to a file format more suitable for HBase) and it stores &lt;code&gt;HLogKey&lt;/code&gt; instances. These keys contain a sequential number as well as the actual data and are used to replay not yet persisted data after a server crash.&lt;br /&gt;&lt;br /&gt;Once the data is written (or not) to the WAL it is placed in the &lt;code&gt;MemStore&lt;/code&gt;. At the same time it is checked whether the &lt;code&gt;MemStore&lt;/code&gt; is full and in that case a flush to disk is requested. 
When the request is served by a separate thread in the &lt;code&gt;HRegionServer&lt;/code&gt;, it writes the data to an &lt;code&gt;HFile&lt;/code&gt; located in HDFS. It also saves the last written sequence number so the system knows what was persisted so far. Let's have a look at the files now.&lt;br /&gt;&lt;br /&gt;&lt;u&gt;Files&lt;/u&gt;&lt;br /&gt;&lt;br /&gt;HBase has a configurable root directory in HDFS; the default is &lt;code&gt;/hbase&lt;/code&gt;. You can simply use the DFS tool of the Hadoop command line tool to look at the various files HBase stores.&lt;br /&gt;&lt;br /&gt;&lt;pre class="brush:plain"&gt;$ hadoop dfs -lsr /hbase/docs&lt;br /&gt;...&lt;br /&gt;drwxr-xr-x   - hadoop supergroup          0 2009-09-28 14:22 /hbase/.logs&lt;br /&gt;drwxr-xr-x   - hadoop supergroup          0 2009-10-15 14:33 /hbase/.logs/srv1.foo.bar,60020,1254172960891&lt;br /&gt;-rw-r--r--   3 hadoop supergroup      14980 2009-10-14 01:32 /hbase/.logs/srv1.foo.bar,60020,1254172960891/hlog.dat.1255509179458&lt;br /&gt;-rw-r--r--   3 hadoop supergroup       1773 2009-10-14 02:33 /hbase/.logs/srv1.foo.bar,60020,1254172960891/hlog.dat.1255512781014&lt;br /&gt;-rw-r--r--   3 hadoop supergroup      37902 2009-10-14 03:33 /hbase/.logs/srv1.foo.bar,60020,1254172960891/hlog.dat.1255516382506&lt;br /&gt;...&lt;br /&gt;-rw-r--r--   3 hadoop supergroup  137648437 2009-09-28 14:20 /hbase/docs/1905740638/oldlogfile.log&lt;br /&gt;...&lt;br /&gt;drwxr-xr-x   - hadoop supergroup          0 2009-09-27 18:03 /hbase/docs/999041123&lt;br /&gt;-rw-r--r--   3 hadoop supergroup       2323 2009-09-01 23:16 /hbase/docs/999041123/.regioninfo&lt;br /&gt;drwxr-xr-x   - hadoop supergroup          0 2009-10-13 01:36 /hbase/docs/999041123/cache&lt;br /&gt;-rw-r--r--   3 hadoop supergroup   91540404 2009-10-13 01:36 /hbase/docs/999041123/cache/5151973105100598304&lt;br /&gt;drwxr-xr-x   - hadoop supergroup          0 2009-09-27 18:03 /hbase/docs/999041123/contents&lt;br /&gt;-rw-r--r--   3 
hadoop supergroup  333470401 2009-09-27 18:02 /hbase/docs/999041123/contents/4397485149704042145&lt;br /&gt;drwxr-xr-x   - hadoop supergroup          0 2009-09-04 01:16 /hbase/docs/999041123/language&lt;br /&gt;-rw-r--r--   3 hadoop supergroup      39499 2009-09-04 01:16 /hbase/docs/999041123/language/8466543386566168248&lt;br /&gt;drwxr-xr-x   - hadoop supergroup          0 2009-09-04 01:16 /hbase/docs/999041123/mimetype&lt;br /&gt;-rw-r--r--   3 hadoop supergroup     134729 2009-09-04 01:16 /hbase/docs/999041123/mimetype/786163868456226374&lt;br /&gt;drwxr-xr-x   - hadoop supergroup          0 2009-10-08 22:45 /hbase/docs/999882558&lt;br /&gt;-rw-r--r--   3 hadoop supergroup       2867 2009-10-08 22:45 /hbase/docs/999882558/.regioninfo&lt;br /&gt;drwxr-xr-x   - hadoop supergroup          0 2009-10-09 23:01 /hbase/docs/999882558/cache&lt;br /&gt;-rw-r--r--   3 hadoop supergroup   45473255 2009-10-09 23:01 /hbase/docs/999882558/cache/974303626218211126&lt;br /&gt;drwxr-xr-x   - hadoop supergroup          0 2009-10-12 00:37 /hbase/docs/999882558/contents&lt;br /&gt;-rw-r--r--   3 hadoop supergroup  467410053 2009-10-12 00:36 /hbase/docs/999882558/contents/2507607731379043001&lt;br /&gt;drwxr-xr-x   - hadoop supergroup          0 2009-10-09 23:02 /hbase/docs/999882558/language&lt;br /&gt;-rw-r--r--   3 hadoop supergroup        541 2009-10-09 23:02 /hbase/docs/999882558/language/5662037059920609304&lt;br /&gt;drwxr-xr-x   - hadoop supergroup          0 2009-10-09 23:02 /hbase/docs/999882558/mimetype&lt;br /&gt;-rw-r--r--   3 hadoop supergroup      84447 2009-10-09 23:02 /hbase/docs/999882558/mimetype/2642281535820134018&lt;br /&gt;drwxr-xr-x   - hadoop supergroup          0 2009-10-14 10:58 /hbase/docs/compaction.dir&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;The first set of files are the log files handled by the &lt;code&gt;HLog&lt;/code&gt; instances and which are created in a directory called &lt;code&gt;.logs&lt;/code&gt; underneath the HBase root directory. 
Then there is another subdirectory for each &lt;code&gt;HRegionServer&lt;/code&gt; and then a log for each &lt;code&gt;HRegion&lt;/code&gt;. &lt;br /&gt;&lt;br /&gt;Next there is a file called &lt;code&gt;oldlogfile.log&lt;/code&gt; which you may not even see on your cluster. It is created by one of the exceptions I mentioned earlier as far as file access is concerned: it is the result of a so-called "log split". When the &lt;code&gt;HMaster&lt;/code&gt; starts and finds a log file that is no longer handled by an &lt;code&gt;HRegionServer&lt;/code&gt;, it splits the log, copying the &lt;code&gt;HLogKey&lt;/code&gt; instances to the new regions they should be in. It places them directly in the region's directory in a file named &lt;code&gt;oldlogfile.log&lt;/code&gt;. Now when the respective &lt;code&gt;HRegion&lt;/code&gt; is instantiated it reads this file, inserts the contained data into its local &lt;code&gt;MemStore&lt;/code&gt; and starts a flush to persist the data right away and delete the file. &lt;br /&gt;&lt;br /&gt;Note: Sometimes you may see left-over &lt;code&gt;oldlogfile.log.old&lt;/code&gt; files (yes, there is another .old at the end) which are caused by the &lt;code&gt;HMaster&lt;/code&gt; trying repeatedly to split the log and finding there was already another split log in place. At that point you have to consult the &lt;code&gt;HRegionServer&lt;/code&gt; or &lt;code&gt;HMaster&lt;/code&gt; logs to see what is going on and whether you can remove those files. I found at times that they were empty and therefore could safely be removed.&lt;br /&gt;&lt;br /&gt;The next set of files are the actual regions. Each region name is encoded using a Jenkins Hash function and a directory is created for it. The reason to hash the region name is that it may contain characters that cannot be used in a path name in DFS. The Jenkins Hash always returns legal characters, as simple as that. 
So you get the following path structure:&lt;br /&gt;&lt;br /&gt;&lt;code&gt;/hbase/&amp;lt;tablename&amp;gt;/&amp;lt;encoded-regionname&amp;gt;/&amp;lt;column-family&amp;gt;/&amp;lt;filename&amp;gt;&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;In the root of the region directory there is also a &lt;code&gt;.regioninfo&lt;/code&gt; file holding metadata about the region. This will be used in the future by an HBase &lt;code&gt;fsck&lt;/code&gt; utility (see &lt;a href="http://issues.apache.org/jira/browse/HBASE-7"&gt;HBASE-7&lt;/a&gt;) to be able to rebuild a broken &lt;code&gt;.META.&lt;/code&gt; table. A first usage of the region info can be seen in &lt;a href="http://issues.apache.org/jira/browse/HBASE-1867"&gt;HBASE-1867&lt;/a&gt;. &lt;br /&gt;&lt;br /&gt;In each column-family directory you can see the actual data files, which I explain in detail in the following section. &lt;br /&gt;&lt;br /&gt;Something that I have not shown above are split regions with their initial daughter reference files. When a data file within a region grows larger than the configured &lt;code&gt;hbase.hregion.max.filesize&lt;/code&gt;, the region is split in two. Initially this is done very quickly because the system simply creates two reference files in the new regions that are now supposed to host each half. The name of the reference file is an ID with the hashed name of the referenced region as a postfix, e.g. &lt;code&gt;1278437856009925445.3323223323&lt;/code&gt;. The reference files hold only a little information: the key the original region was split at and whether it is the top or bottom reference. Of note is that these references are then used by the &lt;code&gt;HalfHFileReader&lt;/code&gt; class (which I also omitted from the big picture above as it is only used temporarily) to read the original region data files. Only upon a compaction are the original files rewritten into separate files in the new region directory. 
This also removes the small reference files as well as the original data file in the original region. &lt;br /&gt;&lt;br /&gt;This concludes the file dump. The last thing you see is a &lt;code&gt;compaction.dir&lt;/code&gt; directory in each table directory. These directories are used when splitting or compacting regions, as noted above. They are usually empty and serve as a scratch area to stage the new data files before swapping them into place.&lt;br /&gt;&lt;br /&gt;&lt;u&gt;HFile&lt;/u&gt;&lt;br /&gt;&lt;br /&gt;So we are now at a very low level of HBase's architecture. &lt;code&gt;HFile&lt;/code&gt;s (kudos to Ryan Rawson) are the actual storage files, specifically created to serve one purpose: store HBase's data fast and efficiently. They are apparently based on Hadoop's &lt;code&gt;TFile&lt;/code&gt; (see &lt;a href="http://issues.apache.org/jira/browse/HADOOP-3315"&gt;HADOOP-3315&lt;/a&gt;) and mimic the SSTable format used in Google's BigTable architecture. The previous use of Hadoop's &lt;code&gt;MapFile&lt;/code&gt;s in HBase proved to be not good enough performance-wise. So what do the files look like?&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_Cib_A77V54U/SteEzNS2qPI/AAAAAAAAAD4/z13-DGcA_qs/s1600-h/hfile.png"&gt;&lt;img style="float:left; margin:0 10px 10px 0;cursor:pointer; cursor:hand;width: 400px; height: 118px;" src="http://4.bp.blogspot.com/_Cib_A77V54U/SteEzNS2qPI/AAAAAAAAAD4/z13-DGcA_qs/s400/hfile.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5392925094076393714" /&gt;&lt;/a&gt; The files have a variable length; the only fixed blocks are the FileInfo and Trailer blocks. As the picture shows, it is the Trailer that has the pointers to the other blocks. It is written at the end of persisting the data to the file, finalizing the now immutable data store. The Index blocks record the offsets of the Data and Meta blocks. 
Both the Data and the Meta blocks are actually optional. But you would most likely always find data in a data store file.&lt;br /&gt;&lt;br /&gt;How is the block size configured? It is driven solely by the &lt;code&gt;HColumnDescriptor&lt;/code&gt;, which in turn is specified at table creation time by the user or defaults to reasonable standard values. Here is an example as shown in the master web-based interface: &lt;br /&gt;&lt;br /&gt;&lt;code&gt;{NAME =&gt; 'docs', FAMILIES =&gt; [{NAME =&gt; 'cache', COMPRESSION =&gt; 'NONE', VERSIONS =&gt; '3', TTL =&gt; '2147483647', BLOCKSIZE =&gt; '65536', IN_MEMORY =&gt; 'false', BLOCKCACHE =&gt; 'false'}, {NAME =&gt; 'contents', COMPRESSION =&gt; 'NONE', VERSIONS =&gt; '3', TTL =&gt; '2147483647', BLOCKSIZE =&gt; '65536', IN_MEMORY =&gt; 'false', BLOCKCACHE =&gt; 'false'}, ...&lt;br /&gt;&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;The default is "64KB" (or 65536 bytes). Here is what the HFile JavaDoc explains:&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;"Minimum block size. We recommend a setting of minimum block size between 8KB to 1MB for general usage. Larger block size is preferred if files are primarily for sequential access. However, it would lead to inefficient random access (because there are more data to decompress). Smaller blocks are good for random access, but require more memory to hold the block index, and may be slower to create (because we must flush the compressor stream at the conclusion of each data block, which leads to an FS I/O flush). Further, due to the internal caching in Compression codec, the smallest possible block size would be around 20KB-30KB."&lt;/blockquote&gt;&lt;br /&gt;&lt;br /&gt;So each block with its prefixed "magic" header contains either plain or compressed data. 
We will look at what that looks like in the next section.&lt;br /&gt;&lt;br /&gt;One thing you may notice is that the default block size for files in DFS is 64MB, which is 1024 times the &lt;code&gt;HFile&lt;/code&gt; default block size. So the HBase storage file blocks do &lt;u&gt;not&lt;/u&gt; match the Hadoop blocks. Therefore you have to think about both parameters separately and find the sweet spot in terms of performance for your particular setup.&lt;br /&gt;&lt;br /&gt;One option in the HBase configuration you may see is &lt;code&gt;hfile.min.blocksize.size&lt;/code&gt;. It seems to be used only during migration from earlier versions of HBase (since they had no block file format) and when directly creating &lt;code&gt;HFile&lt;/code&gt;s, for example during bulk imports.&lt;br /&gt;&lt;br /&gt;So far so good, but how can you see if an &lt;code&gt;HFile&lt;/code&gt; is OK or what data it contains? There is an App for that!&lt;br /&gt;&lt;br /&gt;The &lt;code&gt;HFile.main()&lt;/code&gt; method provides the tools to dump a data file:&lt;br /&gt;&lt;br /&gt;&lt;pre class="brush:plain"&gt;$ hbase org.apache.hadoop.hbase.io.hfile.HFile&lt;br /&gt;usage: HFile  [-f &amp;lt;arg&amp;gt;] [-v] [-r &amp;lt;arg&amp;gt;] [-a] [-p] [-m] [-k]&lt;br /&gt; -a,--checkfamily    Enable family check&lt;br /&gt; -f,--file &amp;lt;arg&amp;gt;     File to scan. Pass full-path; e.g.&lt;br /&gt;                     hdfs://a:9000/hbase/.META./12/34&lt;br /&gt; -k,--checkrow       Enable row order check; looks for out-of-order keys&lt;br /&gt; -m,--printmeta      Print meta data of file&lt;br /&gt; -p,--printkv        Print key/value pairs&lt;br /&gt; -r,--region &amp;lt;arg&amp;gt;   Region to scan. Pass region name; e.g. 
'.META.,,1'&lt;br /&gt; -v,--verbose        Verbose output; emits file and meta data delimiters&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Here is an example of what the output will look like (shortened here):&lt;br /&gt;&lt;br /&gt;&lt;pre class="brush:plain"&gt;$ hbase org.apache.hadoop.hbase.io.hfile.HFile -v -p -m -f \&lt;br /&gt;  hdfs://srv1.foo.bar:9000/hbase/docs/999882558/mimetype/2642281535820134018&lt;br /&gt;&lt;br /&gt;Scanning -&gt; hdfs://srv1.foo.bar:9000/hbase/docs/999882558/mimetype/2642281535820134018&lt;br /&gt;...&lt;br /&gt;K: \x00\x04docA\x08mimetype\x00\x00\x01\x23y\x60\xE7\xB5\x04 V: text\x2Fxml&lt;br /&gt;K: \x00\x04docB\x08mimetype\x00\x00\x01\x23x\x8C\x1C\x5E\x04 V: text\x2Fxml&lt;br /&gt;K: \x00\x04docC\x08mimetype\x00\x00\x01\x23xz\xC08\x04 V: text\x2Fxml&lt;br /&gt;K: \x00\x04docD\x08mimetype\x00\x00\x01\x23y\x1EK\x15\x04 V: text\x2Fxml&lt;br /&gt;K: \x00\x04docE\x08mimetype\x00\x00\x01\x23x\xF3\x23n\x04 V: text\x2Fxml&lt;br /&gt;Scanned kv count -&gt; 1554&lt;br /&gt;&lt;br /&gt;Block index size as per heapsize: 296&lt;br /&gt;reader=hdfs://srv1.foo.bar:9000/hbase/docs/999882558/mimetype/2642281535820134018, \&lt;br /&gt;  compression=none, inMemory=false, \&lt;br /&gt;  firstKey=US6683275_20040127/mimetype:/1251853756871/Put, \&lt;br /&gt;  lastKey=US6684814_20040203/mimetype:/1251864683374/Put, \&lt;br /&gt;  avgKeyLen=37, avgValueLen=8, \&lt;br /&gt;  entries=1554, length=84447&lt;br /&gt;fileinfoOffset=84055, dataIndexOffset=84277, dataIndexCount=2, metaIndexOffset=0, \&lt;br /&gt;  metaIndexCount=0, totalBytes=84055, entryCount=1554, version=1&lt;br /&gt;Fileinfo:&lt;br /&gt;MAJOR_COMPACTION_KEY = \xFF&lt;br /&gt;MAX_SEQ_ID_KEY = 32041891&lt;br /&gt;hfile.AVG_KEY_LEN = \x00\x00\x00\x25&lt;br /&gt;hfile.AVG_VALUE_LEN = \x00\x00\x00\x08&lt;br /&gt;hfile.COMPARATOR = org.apache.hadoop.hbase.KeyValue\x24KeyComparator&lt;br /&gt;hfile.LASTKEY = \x00\x12US6684814_20040203\x08mimetype\x00\x00\x01\x23x\xF3\x23n\x04&lt;br /&gt;&lt;/pre&gt;&lt;br 
/&gt;The first part is the actual data stored as &lt;code&gt;KeyValue&lt;/code&gt; pairs, explained in detail in the next section. The second part dumps the internal &lt;code&gt;HFile.Reader&lt;/code&gt; properties as well as the Trailer block details and finally the FileInfo block values. This is a great way to check if a data file is still healthy. &lt;br /&gt;&lt;br /&gt;&lt;u&gt;KeyValues&lt;/u&gt;&lt;br /&gt;&lt;br /&gt;In essence each &lt;code&gt;KeyValue&lt;/code&gt; in the &lt;code&gt;HFile&lt;/code&gt; is simply a low-level byte array that allows for "zero-copy" access to the data, even with lazy or custom parsing if necessary. How are the instances arranged?&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_Cib_A77V54U/StZMrzaKufI/AAAAAAAAADo/ZhK7bGoJdMQ/s1600-h/KeyValue.png"&gt;&lt;img style="float:left; margin:0 10px 10px 0;cursor:pointer; cursor:hand;width: 400px; height: 62px;" src="http://2.bp.blogspot.com/_Cib_A77V54U/StZMrzaKufI/AAAAAAAAADo/ZhK7bGoJdMQ/s400/KeyValue.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5392581919240796658" /&gt;&lt;/a&gt; The structure starts with two fixed-length numbers indicating the size of the key and the value part. With that info you can offset into the array to, for example, get direct access to the value, ignoring the key - if you know what you are doing. Otherwise you can get the required information from the key part. Once parsed into a &lt;code&gt;KeyValue&lt;/code&gt; object you have getters to access the details.&lt;br /&gt;&lt;br /&gt;Note: One thing to watch out for is the difference between &lt;code&gt;KeyValue.getKey()&lt;/code&gt; and &lt;code&gt;KeyValue.getRow()&lt;/code&gt;. I think for me the confusion arose from referring to "row keys" as the primary key to get a row out of HBase. That would be the latter of the two methods, i.e. &lt;code&gt;KeyValue.getRow()&lt;/code&gt;. 
The former simply returns the complete byte array part representing the raw "key" as colored and labeled in the diagram. &lt;br /&gt;&lt;br /&gt;This concludes my analysis of the HBase storage architecture. I hope it provides a starting point for your own efforts to dig into the grimy details. Have fun!&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Update:&lt;/b&gt; Slightly updated with more links to JIRA issues. Also added Zookeeper to be more precise about the current mechanisms to look up a region.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Update 2:&lt;/b&gt; Added details about region references.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Update 3:&lt;/b&gt; Added more details about region lookup as requested.
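&lt;br /&gt;&lt;br /&gt;The &lt;code&gt;KeyValue&lt;/code&gt; layout described above can be sketched in a few lines of Python. This is only an illustration, not HBase's Java API: the function name &lt;code&gt;decode_keyvalue&lt;/code&gt; is made up, and the field widths are assumptions (two big-endian 4-byte lengths up front; inside the key a 2-byte row length, the row, a 1-byte family length, the family, the qualifier, an 8-byte timestamp and a 1-byte type), read off the dump shown earlier. The sample bytes are the first key of that dump with the two length fields prepended.&lt;br /&gt;&lt;br /&gt;

```python
def decode_keyvalue(buf, offset=0):
    """Split one serialized KeyValue into its parts (assumed layout, see text)."""
    # Two fixed-length, big-endian 4-byte integers: key length and value length.
    key_len = int.from_bytes(buf[offset:offset + 4], "big")
    value_len = int.from_bytes(buf[offset + 4:offset + 8], "big")
    key_start = offset + 8
    value_start = key_start + key_len
    key = buf[key_start:value_start]
    value = buf[value_start:value_start + value_len]

    # Inside the key: 2-byte row length, row, 1-byte family length, family,
    # qualifier, 8-byte timestamp, 1-byte type.
    row_len = int.from_bytes(key[0:2], "big")
    row = key[2:2 + row_len]
    family_len = key[2 + row_len]
    family_start = 3 + row_len
    family = key[family_start:family_start + family_len]
    tail = len(key) - 9  # timestamp (8 bytes) plus type (1 byte) close the key
    qualifier = key[family_start + family_len:tail]
    timestamp = int.from_bytes(key[tail:tail + 8], "big")
    kv_type = key[tail + 8]
    return {"key": key, "row": row, "family": family, "qualifier": qualifier,
            "timestamp": timestamp, "type": kv_type, "value": value}


# First key from the dump, with the two length fields prepended (assumption).
sample_key = b"\x00\x04docA\x08mimetype\x00\x00\x01\x23y\x60\xE7\xB5\x04"
sample_value = b"text/xml"
record = (len(sample_key).to_bytes(4, "big")
          + len(sample_value).to_bytes(4, "big")
          + sample_key + sample_value)

kv = decode_keyvalue(record)
print(kv["row"], kv["family"], kv["timestamp"], kv["value"])
```

&lt;br /&gt;&lt;br /&gt;Here &lt;code&gt;kv["key"]&lt;/code&gt; plays the role of &lt;code&gt;KeyValue.getKey()&lt;/code&gt; (the whole raw key) and &lt;code&gt;kv["row"]&lt;/code&gt; the role of &lt;code&gt;KeyValue.getRow()&lt;/code&gt; (just the row key inside it).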
</content><link rel="replies" type="application/atom+xml" href="http://www.larsgeorge.com/feeds/8073228651920917483/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html#comment-form" title="15 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/860423771829255614/posts/default/8073228651920917483?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/860423771829255614/posts/default/8073228651920917483?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/Lineland/~3/WZzB1ksv078/hbase-architecture-101-storage.html" title="HBase Architecture 101 - Storage" /><author><name>Lars George</name><uri>http://www.blogger.com/profile/18168538475015227467</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="07492346046977538662" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://1.bp.blogspot.com/_Cib_A77V54U/StorLZRjHSI/AAAAAAAAAEI/4IznGhslNxw/s72-c/hbase-files.png" height="72" width="72" /><thr:total>15</thr:total><feedburner:origLink>http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html</feedburner:origLink></entry><entry gd:etag="W/&quot;CEQCRXszeip7ImA9WxBTF0U.&quot;"><id>tag:blogger.com,1999:blog-860423771829255614.post-2315355857313374748</id><published>2009-10-12T13:20:00.000-07:00</published><updated>2009-12-14T01:19:24.582-08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-12-14T01:19:24.582-08:00</app:edited><category 
scheme="http://www.blogger.com/atom/ns#" term="hadoop" /><title>Hive vs. Pig</title><content type="html">While I was looking at &lt;a href="http://wiki.apache.org/hadoop/Hive"&gt;Hive&lt;/a&gt; and &lt;a href="http://hadoop.apache.org/pig/"&gt;Pig&lt;/a&gt; for processing large amounts of data without the need to write MapReduce code, I found that there is no easy way to compare them without reading into both in greater detail.&lt;br /&gt;
&lt;br /&gt;
In this post I am trying to give you a 10,000ft view of both and compare some of the more prominent and interesting features. The following table - which is discussed below - compares what I deemed to be such features:&lt;br /&gt;
&lt;br /&gt;
&lt;table border="1" cellspacing="0" cellpadding="0" style="border-collapse:collapse;border:none; "&gt;&lt;tbody&gt;
&lt;tr style="height:.3in"&gt; &lt;td width="335" style="width:251.6pt;border-top:solid black 1.0pt;border-left:solid black 1.0pt; border-bottom:none;border-right:solid windowtext 1.0pt; background:black;padding:0in 5.4pt 0in 5.4pt; height:.3in"&gt; &lt;p style="margin-bottom:0in;margin-bottom:.0001pt;line-height: normal;"&gt;&lt;b&gt;&lt;span style="font-size:9.0pt;font-family:&amp;quot;Verdana&amp;quot;,&amp;quot;sans-serif&amp;quot;;color:white;"&gt;Feature&lt;/span&gt;&lt;/b&gt;&lt;/p&gt;&lt;/td&gt; &lt;td width="156" style="width:117.0pt;border:solid windowtext 1.0pt;border-left: none; background:black;padding:0in 5.4pt 0in 5.4pt; height:.3in"&gt; &lt;p align="center" style="margin-bottom:0in;margin-bottom:.0001pt; text-align:center;line-height:normal;"&gt;&lt;b&gt;&lt;span style="font-size:9.0pt;font-family:&amp;quot;Verdana&amp;quot;,&amp;quot;sans-serif&amp;quot;; color:white;"&gt;Hive&lt;/span&gt;&lt;/b&gt;&lt;/p&gt;&lt;/td&gt; &lt;td width="147" style="width:110.2pt;border:solid windowtext 1.0pt;border-left: none; background:black;padding:0in 5.4pt 0in 5.4pt; height:.3in"&gt; &lt;p align="center" style="margin-bottom:0in;margin-bottom:.0001pt; text-align:center;line-height:normal;"&gt;&lt;b&gt;&lt;span style="font-size:9.0pt;font-family:&amp;quot;Verdana&amp;quot;,&amp;quot;sans-serif&amp;quot;; color:white;"&gt;Pig&lt;/span&gt;&lt;/b&gt;&lt;/p&gt;&lt;/td&gt; &lt;/tr&gt;
&lt;tr style="height:.3in"&gt; &lt;td width="335" style="width:251.6pt;border:solid black 1.0pt;border-right:solid windowtext 1.0pt; padding:0in 5.4pt 0in 5.4pt;height:.3in"&gt; &lt;p style="margin-bottom:0in;margin-bottom:.0001pt;line-height: normal;"&gt;&lt;b&gt;&lt;span style="font-size:9.0pt;font-family:&amp;quot;Verdana&amp;quot;,&amp;quot;sans-serif&amp;quot;"&gt;Language&lt;/span&gt;&lt;/b&gt;&lt;/p&gt;&lt;/td&gt; &lt;td width="156" style="width:117.0pt;border-top:none;border-left:none; border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt; padding:0in 5.4pt 0in 5.4pt;height:.3in"&gt; &lt;p align="center" style="margin-bottom:0in;margin-bottom:.0001pt; text-align:center;line-height:normal;"&gt;&lt;span style="font-size:9.0pt;font-family:&amp;quot;Verdana&amp;quot;,&amp;quot;sans-serif&amp;quot;"&gt;SQL-like&lt;/span&gt;&lt;/p&gt;&lt;/td&gt; &lt;td width="147" style="width:110.2pt;border-top:none;border-left:none; border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt; padding:0in 5.4pt 0in 5.4pt;height:.3in"&gt; &lt;p align="center" style="margin-bottom:0in;margin-bottom:.0001pt; text-align:center;line-height:normal;"&gt;&lt;span style="font-size:9.0pt;font-family:&amp;quot;Verdana&amp;quot;,&amp;quot;sans-serif&amp;quot;"&gt;PigLatin&lt;/span&gt;&lt;/p&gt;&lt;/td&gt; &lt;/tr&gt;
&lt;tr style="height:.3in"&gt; &lt;td width="335" style="width:251.6pt;border-top:none;border-left:solid black 1.0pt; border-bottom:none;border-right:solid windowtext 1.0pt; padding:0in 5.4pt 0in 5.4pt; height:.3in"&gt; &lt;p style="margin-bottom:0in;margin-bottom:.0001pt;line-height: normal;"&gt;&lt;b&gt;&lt;span style="font-size:9.0pt;font-family:&amp;quot;Verdana&amp;quot;,&amp;quot;sans-serif&amp;quot;"&gt;Schemas/Types&lt;/span&gt;&lt;/b&gt;&lt;/p&gt;&lt;/td&gt; &lt;td width="156" style="width:117.0pt;border-top:none;border-left:none; border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt; padding:0in 5.4pt 0in 5.4pt;height:.3in"&gt; &lt;p align="center" style="margin-bottom:0in;margin-bottom:.0001pt; text-align:center;line-height:normal"&gt;&lt;span style="font-size:9.0pt; font-family:&amp;quot;Verdana&amp;quot;,&amp;quot;sans-serif&amp;quot;"&gt;Yes (explicit)&lt;/span&gt;&lt;/p&gt;&lt;/td&gt; &lt;td width="147" style="width:110.2pt;border-top:none;border-left:none; border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt; padding:0in 5.4pt 0in 5.4pt;height:.3in"&gt; &lt;p align="center" style="margin-bottom:0in;margin-bottom:.0001pt; text-align:center;line-height:normal"&gt;&lt;span style="font-size:9.0pt; font-family:&amp;quot;Verdana&amp;quot;,&amp;quot;sans-serif&amp;quot;"&gt;Yes (implicit)&lt;/span&gt;&lt;/p&gt;&lt;/td&gt; &lt;/tr&gt;
&lt;tr style="height:.3in"&gt; &lt;td width="335" style="width:251.6pt;border:solid black 1.0pt;border-right:solid windowtext 1.0pt; padding:0in 5.4pt 0in 5.4pt;height:.3in"&gt; &lt;p style="margin-bottom:0in;margin-bottom:.0001pt;line-height: normal;"&gt;&lt;b&gt;&lt;span style="font-size:9.0pt;font-family:&amp;quot;Verdana&amp;quot;,&amp;quot;sans-serif&amp;quot;"&gt;Partitions&lt;/span&gt;&lt;/b&gt;&lt;/p&gt;&lt;/td&gt; &lt;td width="156" style="width:117.0pt;border-top:none;border-left:none; border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt; padding:0in 5.4pt 0in 5.4pt;height:.3in"&gt; &lt;p align="center" style="margin-bottom:0in;margin-bottom:.0001pt; text-align:center;line-height:normal;"&gt;&lt;span style="font-size:9.0pt;font-family:&amp;quot;Verdana&amp;quot;,&amp;quot;sans-serif&amp;quot;"&gt;Yes&lt;/span&gt;&lt;/p&gt;&lt;/td&gt; &lt;td width="147" style="width:110.2pt;border-top:none;border-left:none; border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt; padding:0in 5.4pt 0in 5.4pt;height:.3in"&gt; &lt;p align="center" style="margin-bottom:0in;margin-bottom:.0001pt; text-align:center;line-height:normal;"&gt;&lt;span style="font-size:9.0pt;font-family:&amp;quot;Verdana&amp;quot;,&amp;quot;sans-serif&amp;quot;"&gt;No&lt;/span&gt;&lt;/p&gt;&lt;/td&gt; &lt;/tr&gt;
&lt;tr style="height:.3in"&gt; &lt;td width="335" style="width:251.6pt;border-top:none;border-left:solid black 1.0pt; border-bottom:none;border-right:solid windowtext 1.0pt; padding:0in 5.4pt 0in 5.4pt; height:.3in"&gt; &lt;p style="margin-bottom:0in;margin-bottom:.0001pt;line-height: normal;"&gt;&lt;b&gt;&lt;span style="font-size:9.0pt;font-family:&amp;quot;Verdana&amp;quot;,&amp;quot;sans-serif&amp;quot;"&gt;Server&lt;/span&gt;&lt;/b&gt;&lt;/p&gt;&lt;/td&gt; &lt;td width="156" style="width:117.0pt;border-top:none;border-left:none; border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt; padding:0in 5.4pt 0in 5.4pt;height:.3in"&gt; &lt;p align="center" style="margin-bottom:0in;margin-bottom:.0001pt; text-align:center;line-height:normal"&gt;&lt;span style="font-size:9.0pt; font-family:&amp;quot;Verdana&amp;quot;,&amp;quot;sans-serif&amp;quot;"&gt;Optional (Thrift)&lt;/span&gt;&lt;/p&gt;&lt;/td&gt; &lt;td width="147" style="width:110.2pt;border-top:none;border-left:none; border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt; padding:0in 5.4pt 0in 5.4pt;height:.3in"&gt; &lt;p align="center" style="margin-bottom:0in;margin-bottom:.0001pt; text-align:center;line-height:normal"&gt;&lt;span style="font-size:9.0pt; font-family:&amp;quot;Verdana&amp;quot;,&amp;quot;sans-serif&amp;quot;"&gt;No&lt;/span&gt;&lt;/p&gt;&lt;/td&gt; &lt;/tr&gt;
&lt;tr style="height:.3in"&gt; &lt;td width="335" style="width:251.6pt;border:solid black 1.0pt;border-right:solid windowtext 1.0pt; padding:0in 5.4pt 0in 5.4pt;height:.3in"&gt; &lt;p style="margin-bottom:0in;margin-bottom:.0001pt;line-height: normal;"&gt;&lt;b&gt;&lt;span style="font-size:9.0pt;font-family:&amp;quot;Verdana&amp;quot;,&amp;quot;sans-serif&amp;quot;"&gt;User Defined Functions (UDF)&lt;/span&gt;&lt;/b&gt;&lt;/p&gt;&lt;/td&gt; &lt;td width="156" style="width:117.0pt;border-top:none;border-left:none; border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt; padding:0in 5.4pt 0in 5.4pt;height:.3in"&gt; &lt;p align="center" style="margin-bottom:0in;margin-bottom:.0001pt; text-align:center;line-height:normal;"&gt;&lt;span style="font-size:9.0pt;font-family:&amp;quot;Verdana&amp;quot;,&amp;quot;sans-serif&amp;quot;"&gt;Yes (Java)&lt;/span&gt;&lt;/p&gt;&lt;/td&gt; &lt;td width="147" style="width:110.2pt;border-top:none;border-left:none; border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt; padding:0in 5.4pt 0in 5.4pt;height:.3in"&gt; &lt;p align="center" style="margin-bottom:0in;margin-bottom:.0001pt; text-align:center;line-height:normal;"&gt;&lt;span style="font-size:9.0pt;font-family:&amp;quot;Verdana&amp;quot;,&amp;quot;sans-serif&amp;quot;"&gt;Yes (Java)&lt;/span&gt;&lt;/p&gt;&lt;/td&gt; &lt;/tr&gt;
&lt;tr style="height:.3in"&gt; &lt;td width="335" style="width:251.6pt;border-top:none;border-left:solid black 1.0pt; border-bottom:none;border-right:solid windowtext 1.0pt; padding:0in 5.4pt 0in 5.4pt; height:.3in"&gt; &lt;p style="margin-bottom:0in;margin-bottom:.0001pt;line-height: normal;"&gt;&lt;b&gt;&lt;span style="font-size:9.0pt;font-family:&amp;quot;Verdana&amp;quot;,&amp;quot;sans-serif&amp;quot;"&gt;Custom Serializer/Deserializer&lt;/span&gt;&lt;/b&gt;&lt;/p&gt;&lt;/td&gt; &lt;td width="156" style="width:117.0pt;border-top:none;border-left:none; border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt; padding:0in 5.4pt 0in 5.4pt;height:.3in"&gt; &lt;p align="center" style="margin-bottom:0in;margin-bottom:.0001pt; text-align:center;line-height:normal"&gt;&lt;span style="font-size:9.0pt; font-family:&amp;quot;Verdana&amp;quot;,&amp;quot;sans-serif&amp;quot;"&gt;Yes&lt;/span&gt;&lt;/p&gt;&lt;/td&gt; &lt;td width="147" style="width:110.2pt;border-top:none;border-left:none; border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt; padding:0in 5.4pt 0in 5.4pt;height:.3in"&gt; &lt;p align="center" style="margin-bottom:0in;margin-bottom:.0001pt; text-align:center;line-height:normal"&gt;&lt;span style="font-size:9.0pt; font-family:&amp;quot;Verdana&amp;quot;,&amp;quot;sans-serif&amp;quot;"&gt;Yes&lt;/span&gt;&lt;/p&gt;&lt;/td&gt; &lt;/tr&gt;
&lt;tr style="height:.3in"&gt; &lt;td width="335" style="width:251.6pt;border:solid black 1.0pt;border-right:solid windowtext 1.0pt; padding:0in 5.4pt 0in 5.4pt;height:.3in"&gt; &lt;p style="margin-bottom:0in;margin-bottom:.0001pt;line-height: normal;"&gt;&lt;b&gt;&lt;span style="font-size:9.0pt;font-family:&amp;quot;Verdana&amp;quot;,&amp;quot;sans-serif&amp;quot;"&gt;DFS Direct Access&lt;/span&gt;&lt;/b&gt;&lt;/p&gt;&lt;/td&gt; &lt;td width="156" style="width:117.0pt;border-top:none;border-left:none; border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt; padding:0in 5.4pt 0in 5.4pt;height:.3in"&gt; &lt;p align="center" style="margin-bottom:0in;margin-bottom:.0001pt; text-align:center;line-height:normal;"&gt;&lt;span style="font-size:9.0pt;font-family:&amp;quot;Verdana&amp;quot;,&amp;quot;sans-serif&amp;quot;"&gt;Yes (implicit)&lt;/span&gt;&lt;/p&gt;&lt;/td&gt; &lt;td width="147" style="width:110.2pt;border-top:none;border-left:none; border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt; padding:0in 5.4pt 0in 5.4pt;height:.3in"&gt; &lt;p align="center" style="margin-bottom:0in;margin-bottom:.0001pt; text-align:center;line-height:normal;"&gt;&lt;span style="font-size:9.0pt;font-family:&amp;quot;Verdana&amp;quot;,&amp;quot;sans-serif&amp;quot;"&gt;Yes (explicit)&lt;/span&gt;&lt;/p&gt;&lt;/td&gt; &lt;/tr&gt;
&lt;tr style="height:.3in"&gt; &lt;td width="335" style="width:251.6pt;border-top:none;border-left:solid black 1.0pt; border-bottom:none;border-right:solid windowtext 1.0pt; padding:0in 5.4pt 0in 5.4pt; height:.3in"&gt; &lt;p style="margin-bottom:0in;margin-bottom:.0001pt;line-height: normal;"&gt;&lt;b&gt;&lt;span style="font-size:9.0pt;font-family:&amp;quot;Verdana&amp;quot;,&amp;quot;sans-serif&amp;quot;"&gt;Join/Order/Sort&lt;/span&gt;&lt;/b&gt;&lt;/p&gt;&lt;/td&gt; &lt;td width="156" style="width:117.0pt;border-top:none;border-left:none; border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt; padding:0in 5.4pt 0in 5.4pt;height:.3in"&gt; &lt;p align="center" style="margin-bottom:0in;margin-bottom:.0001pt; text-align:center;line-height:normal"&gt;&lt;span style="font-size:9.0pt; font-family:&amp;quot;Verdana&amp;quot;,&amp;quot;sans-serif&amp;quot;"&gt;Yes&lt;/span&gt;&lt;/p&gt;&lt;/td&gt; &lt;td width="147" style="width:110.2pt;border-top:none;border-left:none; border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt; padding:0in 5.4pt 0in 5.4pt;height:.3in"&gt; &lt;p align="center" style="margin-bottom:0in;margin-bottom:.0001pt; text-align:center;line-height:normal"&gt;&lt;span style="font-size:9.0pt; font-family:&amp;quot;Verdana&amp;quot;,&amp;quot;sans-serif&amp;quot;"&gt;Yes&lt;/span&gt;&lt;/p&gt;&lt;/td&gt; &lt;/tr&gt;
&lt;tr style="height:.3in"&gt; &lt;td width="335" style="width:251.6pt;border:solid black 1.0pt;border-right:solid windowtext 1.0pt; padding:0in 5.4pt 0in 5.4pt;height:.3in"&gt; &lt;p style="margin-bottom:0in;margin-bottom:.0001pt;line-height: normal;"&gt;&lt;b&gt;&lt;span style="font-size:9.0pt;font-family:&amp;quot;Verdana&amp;quot;,&amp;quot;sans-serif&amp;quot;"&gt;Shell&lt;/span&gt;&lt;/b&gt;&lt;/p&gt;&lt;/td&gt; &lt;td width="156" style="width:117.0pt;border-top:none;border-left:none; border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt; padding:0in 5.4pt 0in 5.4pt;height:.3in"&gt; &lt;p align="center" style="margin-bottom:0in;margin-bottom:.0001pt; text-align:center;line-height:normal;"&gt;&lt;span style="font-size:9.0pt;font-family:&amp;quot;Verdana&amp;quot;,&amp;quot;sans-serif&amp;quot;"&gt;Yes&lt;/span&gt;&lt;/p&gt;&lt;/td&gt; &lt;td width="147" style="width:110.2pt;border-top:none;border-left:none; border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt; padding:0in 5.4pt 0in 5.4pt;height:.3in"&gt; &lt;p align="center" style="margin-bottom:0in;margin-bottom:.0001pt; text-align:center;line-height:normal;"&gt;&lt;span style="font-size:9.0pt;font-family:&amp;quot;Verdana&amp;quot;,&amp;quot;sans-serif&amp;quot;"&gt;Yes&lt;/span&gt;&lt;/p&gt;&lt;/td&gt; &lt;/tr&gt;
&lt;tr style="height:.3in"&gt; &lt;td width="335" style="width:251.6pt;border-top:none;border-left:solid black 1.0pt; border-bottom:none;border-right:solid windowtext 1.0pt; padding:0in 5.4pt 0in 5.4pt; height:.3in"&gt; &lt;p style="margin-bottom:0in;margin-bottom:.0001pt;line-height: normal;"&gt;&lt;b&gt;&lt;span style="font-size:9.0pt;font-family:&amp;quot;Verdana&amp;quot;,&amp;quot;sans-serif&amp;quot;"&gt;Streaming&lt;/span&gt;&lt;/b&gt;&lt;/p&gt;&lt;/td&gt; &lt;td width="156" style="width:117.0pt;border-top:none;border-left:none; border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt; padding:0in 5.4pt 0in 5.4pt;height:.3in"&gt; &lt;p align="center" style="margin-bottom:0in;margin-bottom:.0001pt; text-align:center;line-height:normal"&gt;&lt;span style="font-size:9.0pt; font-family:&amp;quot;Verdana&amp;quot;,&amp;quot;sans-serif&amp;quot;"&gt;Yes&lt;/span&gt;&lt;/p&gt;&lt;/td&gt; &lt;td width="147" style="width:110.2pt;border-top:none;border-left:none; border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt; padding:0in 5.4pt 0in 5.4pt;height:.3in"&gt; &lt;p align="center" style="margin-bottom:0in;margin-bottom:.0001pt; text-align:center;line-height:normal"&gt;&lt;span style="font-size:9.0pt; font-family:&amp;quot;Verdana&amp;quot;,&amp;quot;sans-serif&amp;quot;"&gt;Yes&lt;/span&gt;&lt;/p&gt;&lt;/td&gt; &lt;/tr&gt;
&lt;tr style="height:.3in"&gt; &lt;td width="335" style="width:251.6pt;border:solid black 1.0pt;border-right:solid windowtext 1.0pt; padding:0in 5.4pt 0in 5.4pt;height:.3in"&gt; &lt;p style="margin-bottom:0in;margin-bottom:.0001pt;line-height: normal;"&gt;&lt;b&gt;&lt;span style="font-size:9.0pt;font-family:&amp;quot;Verdana&amp;quot;,&amp;quot;sans-serif&amp;quot;"&gt;Web Interface&lt;/span&gt;&lt;/b&gt;&lt;/p&gt;&lt;/td&gt; &lt;td width="156" style="width:117.0pt;border-top:none;border-left:none; border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt; padding:0in 5.4pt 0in 5.4pt;height:.3in"&gt; &lt;p align="center" style="margin-bottom:0in;margin-bottom:.0001pt; text-align:center;line-height:normal;"&gt;&lt;span style="font-size:9.0pt;font-family:&amp;quot;Verdana&amp;quot;,&amp;quot;sans-serif&amp;quot;"&gt;Yes&lt;/span&gt;&lt;/p&gt;&lt;/td&gt; &lt;td width="147" style="width:110.2pt;border-top:none;border-left:none; border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt; padding:0in 5.4pt 0in 5.4pt;height:.3in"&gt; &lt;p align="center" style="margin-bottom:0in;margin-bottom:.0001pt; text-align:center;line-height:normal;"&gt;&lt;span style="font-size:9.0pt;font-family:&amp;quot;Verdana&amp;quot;,&amp;quot;sans-serif&amp;quot;"&gt;No&lt;/span&gt;&lt;/p&gt;&lt;/td&gt; &lt;/tr&gt;
&lt;tr style="height:.3in"&gt; &lt;td width="335" style="width:251.6pt;border-top:none;border-left:solid black 1.0pt; border-bottom:solid black 1.0pt;border-right:solid windowtext 1.0pt; padding:0in 5.4pt 0in 5.4pt;height:.3in"&gt; &lt;p style="margin-bottom:0in;margin-bottom:.0001pt;line-height: normal;"&gt;&lt;b&gt;&lt;span style="font-size:9.0pt;font-family:&amp;quot;Verdana&amp;quot;,&amp;quot;sans-serif&amp;quot;"&gt;JDBC/ODBC&lt;/span&gt;&lt;/b&gt;&lt;/p&gt;&lt;/td&gt; &lt;td width="156" style="width:117.0pt;border-top:none;border-left:none; border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt; padding:0in 5.4pt 0in 5.4pt;height:.3in"&gt; &lt;p align="center" style="margin-bottom:0in;margin-bottom:.0001pt; text-align:center;line-height:normal"&gt;&lt;span style="font-size:9.0pt; font-family:&amp;quot;Verdana&amp;quot;,&amp;quot;sans-serif&amp;quot;"&gt;Yes (limited)&lt;/span&gt;&lt;/p&gt;&lt;/td&gt; &lt;td width="147" style="width:110.2pt;border-top:none;border-left:none; border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt; padding:0in 5.4pt 0in 5.4pt;height:.3in"&gt; &lt;p align="center" style="margin-bottom:0in;margin-bottom:.0001pt; text-align:center;line-height:normal"&gt;&lt;span style="font-size:9.0pt; font-family:&amp;quot;Verdana&amp;quot;,&amp;quot;sans-serif&amp;quot;"&gt;No&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;&lt;br /&gt;
&lt;br /&gt;
Let us now look at each of these in a bit more detail.&lt;br /&gt;
&lt;br /&gt;
&lt;u&gt;General Purpose&lt;/u&gt;&lt;br /&gt;
&lt;br /&gt;
The question is "What do Hive and Pig solve?". Both have a very similar goal, which is lucky for us when it comes to comparing them. They try to ease the complexity of writing MapReduce jobs in a programming language like Java by giving the user a set of tools that they may be more familiar with (more on this below). The raw data is stored in Hadoop's HDFS and can be in any format, although natively it is usually a TAB separated text file; internally both may also make use of Hadoop's SequenceFile file format. The idea is to be able to parse a raw data file, for example a web server log file, and to slice and dice the contained information into whatever the business needs. To that end they provide means to aggregate fields based on specific keys. In the end they both emit the result again in either text or a custom file format. Efforts are also underway to have both use other systems as a source for data, for example HBase.&lt;br /&gt;
&lt;br /&gt;
The features I am comparing are chosen pretty much at random because they stood out when I read into each of these two frameworks. So keep in mind that this is a subjective list.&lt;br /&gt;
&lt;br /&gt;
&lt;u&gt;Language&lt;/u&gt;&lt;br /&gt;
&lt;br /&gt;
Hive lends itself to SQL. But since we can only read already existing files in HDFS, it lacks UPDATE or DELETE support, for example. It focuses primarily on the query part of SQL. But even there it has its own spin on things to better reflect the underlying MapReduce process. Overall it seems that someone familiar with SQL can very quickly learn Hive's version of it and get results fast.&lt;br /&gt;
&lt;br /&gt;
Pig, on the other hand, looks more like a very simplistic scripting language. As with all of those (and this is a nearly religious topic), some are more intuitive and some are less. With PigLatin I was able to see what the samples do, but lacking full knowledge of its syntax I found myself wondering whether I would really be able to get what I needed without too many trial-and-error loops. Sure, Hive's SQL probably needs as many iterations to fully grasp - but there is at least a greater understanding of what to expect.&lt;br /&gt;
&lt;br /&gt;
&lt;u&gt;Schemas/Types&lt;/u&gt;&lt;br /&gt;
&lt;br /&gt;
Hive once more uses a specific variation of SQL's Data Definition Language (DDL). It defines the "tables" beforehand and stores the schema in either a shared or a local database. Any JDBC offering will do, but it also comes with a built-in Derby instance to get you started quickly. If the database is local then only you can run specific Hive commands. If you share the database then others can also run these - otherwise they would have to set up their own local database copy. Types are also defined upfront; supported types are INT, BIGINT, BOOLEAN, STRING and so on. There are also array types that let you handle specific fields in the raw data files as a group.&lt;br /&gt;
&lt;br /&gt;
Pig has no such metadata database. Datatypes and schemas are defined within each script. Furthermore, types are usually determined automatically by their use. So if you use a field as an Integer it is handled that way by Pig. You do have the option, though, to override this with explicit type definitions, again within the script where you need them. Pig has a set of types similar to Hive's. For example it also has an array-like type called "bag".&lt;br /&gt;
&lt;br /&gt;
&lt;u&gt;Partitions&lt;/u&gt;&lt;br /&gt;
&lt;br /&gt;
Hive has a notion of partitions. They are basically subdirectories in HDFS. They allow you, for example, to process only a subset of the data selected by alphabet or date. It is up to the user to create these "partitions" as they are neither enforced nor required.&lt;br /&gt;
&lt;br /&gt;
Pig does not seem to have such a feature. It may be that filters can achieve the same but it is not immediately obvious to me.&lt;br /&gt;
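&lt;br /&gt;
To make the subdirectory idea a bit more concrete, here is a minimal Python sketch (this is not Hive code) of how partition values could map to directory paths and how restricting a query to one partition avoids touching the rest. The "column=value" directory naming mirrors how Hive lays out partitions as far as I can tell; the warehouse root, the table name "weblogs" and both helper functions are made up for illustration.&lt;br /&gt;
&lt;br /&gt;
```python
import posixpath

# Hypothetical warehouse root, for illustration only.
WAREHOUSE_ROOT = "/user/hive/warehouse"


def partition_path(table, partition_spec):
    """Build the directory for one partition from ordered (column, value) pairs."""
    parts = ["{}={}".format(column, value) for column, value in partition_spec]
    return posixpath.join(WAREHOUSE_ROOT, table, *parts)


def prune(paths, column, value):
    """Keep only the directories that belong to the partition column=value."""
    wanted = "{}={}".format(column, value)
    return [path for path in paths if wanted in path.split("/")]


if __name__ == "__main__":
    days = ["2009-10-01", "2009-10-02"]
    all_paths = [partition_path("weblogs", [("dt", day), ("country", country)])
                 for day in days for country in ("de", "us")]
    # Only the two directories for the second day would have to be read.
    for path in prune(all_paths, "dt", "2009-10-02"):
        print(path)
```
&lt;br /&gt;
The point is only that a partition is a path component, so selecting by partition becomes a cheap directory selection rather than a scan over all the data.&lt;br /&gt;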
&lt;br /&gt;
&lt;u&gt;Server&lt;/u&gt;&lt;br /&gt;
&lt;br /&gt;
Hive can start an optional server, which is allegedly Thrift based. With the server I presume you can send queries from anywhere to the Hive server which in turn executes them.&lt;br /&gt;
&lt;br /&gt;
Pig does not seem to have such a facility yet.&lt;br /&gt;
&lt;br /&gt;
&lt;u&gt;User Defined Functions&lt;/u&gt;&lt;br /&gt;
&lt;br /&gt;
Hive and Pig allow for user functionality by supplying Java code to the query process. These functions can add any additional feature that is required to crunch the numbers as required.&lt;br /&gt;
&lt;br /&gt;
&lt;u&gt;Custom Serializer/Deserializer&lt;/u&gt;&lt;br /&gt;
&lt;br /&gt;
Again, both Hive and Pig allow for custom Java classes that can read or write any file format required. I also assume that this is how they will eventually connect to HBase (just a guess). You can write a parser for Apache log files or, for example, the binary &lt;a href="http://github.com/larsgeorge/ulog-reader"&gt;Tokyo Tyrant Ulog&lt;/a&gt; format. The same goes for the output: write a database output class and you can write the results back into a database.&lt;br /&gt;
&lt;br /&gt;
&lt;u&gt;DFS Direct Access&lt;/u&gt;&lt;br /&gt;
&lt;br /&gt;
Hive is smart about how to access the raw data. A "select * from table limit 10" for example does a direct read from the file. If the query is too complicated it will fall back to use a full MapReduce run to determine the outcome, just as expected.&lt;br /&gt;
&lt;br /&gt;
With Pig I am not sure if it does the same to speed up simple PigLatin scripts. At least it does not seem to be mentioned anywhere as an important feature.&lt;br /&gt;
&lt;br /&gt;
&lt;u&gt;Join/Order/Sort&lt;/u&gt;&lt;br /&gt;
&lt;br /&gt;
Hive and Pig both support joining, ordering and sorting data dynamically. These features serve pretty much the same purpose in both, allowing you to aggregate and sort the result as needed. Pig also has a COGROUP feature that allows you to do OUTER JOINs and so on. I think this is where you spend most of your time with either package - especially when you start out. But from a cursory look it seems both can do pretty much the same.&lt;br /&gt;
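&lt;br /&gt;
Since COGROUP may be less familiar than a join, here is a small plain-Python sketch of the idea, not PigLatin and not how Pig implements it: group two inputs by key, keep the values of each side in their own list, and derive an outer join from that. The function names and the sample data are hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
from collections import defaultdict


def cogroup(left, right):
    """Group two lists of (key, value) pairs by key.

    Returns {key: (left_values, right_values)}; a key missing on one side
    gets an empty list there, loosely mirroring an empty bag.
    """
    grouped = defaultdict(lambda: ([], []))
    for key, value in left:
        grouped[key][0].append(value)
    for key, value in right:
        grouped[key][1].append(value)
    return dict(grouped)


def full_outer_join(left, right):
    """Derive a full outer join from the cogroup result, padding with None."""
    rows = []
    for key, (lvals, rvals) in sorted(cogroup(left, right).items()):
        for lval in lvals or [None]:
            for rval in rvals or [None]:
                rows.append((key, lval, rval))
    return rows


if __name__ == "__main__":
    visits = [("alice", "/index"), ("bob", "/about"), ("alice", "/faq")]
    ages = [("alice", 31), ("carol", 45)]
    for row in full_outer_join(visits, ages):
        print(row)
```
&lt;br /&gt;
Keys that appear on only one side survive with an empty list on the other, which is exactly what an outer join needs.&lt;br /&gt;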
&lt;br /&gt;
&lt;u&gt;Shell&lt;/u&gt;&lt;br /&gt;
&lt;br /&gt;
Both Hive and Pig have a shell that allows you to query specific things or run the actual queries. Pig also passes on DFS commands such as "cat" to allow you to quickly check what an outcome of a specific PigLatin script was.&lt;br /&gt;
&lt;br /&gt;
&lt;u&gt;Streaming&lt;/u&gt;&lt;br /&gt;
&lt;br /&gt;
Once more, both frameworks seem to provide streaming interfaces so that you can process data with external tools or languages, such as Ruby or Python. I do not know how well streaming performs, or whether it affects the two frameworks differently. This is for you to tell me :)&lt;br /&gt;
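&lt;br /&gt;
An external streaming tool is essentially a filter: it reads rows on standard input and writes rows to standard output. Below is a minimal Python sketch of such a script. I am assuming tab-separated rows, which as far as I know is the default hand-off in both frameworks; the column layout (a URL followed by a hit count) and the function names are invented for this example.&lt;br /&gt;
&lt;br /&gt;
```python
import sys


def transform(line):
    """Turn one tab-separated input row into one tab-separated output row.

    Assumed (hypothetical) layout: column 0 is a URL, column 1 a hit count.
    """
    url, hits = line.rstrip("\n").split("\t")[:2]
    host = url.split("/")[2] if "://" in url else url
    return "{}\t{}".format(host.lower(), int(hits))


def main(stdin=sys.stdin, stdout=sys.stdout):
    # Rows arrive on standard input; transformed rows leave on standard output.
    for line in stdin:
        if line.strip():
            stdout.write(transform(line) + "\n")


if __name__ == "__main__":
    main()
```
&lt;br /&gt;
Because the script only talks to stdin and stdout, you can try it on a sample file from the command line before wiring it into a Hive query or a PigLatin script.&lt;br /&gt;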
&lt;br /&gt;
&lt;u&gt;Web Interface&lt;/u&gt;&lt;br /&gt;
&lt;br /&gt;
Only Hive has a &lt;a href="http://wiki.apache.org/hadoop/Hive/HiveWebInterface"&gt;web interface&lt;/a&gt; or UI that can be used to visualize the various schemas and issue queries. This is different to the above mentioned Server as it is an interactive web UI for a human operator. The Hive Server is for use from another programming or scripting language for example.&lt;br /&gt;
&lt;br /&gt;
&lt;u&gt;JDBC/ODBC&lt;/u&gt;&lt;br /&gt;
&lt;br /&gt;
Another Hive-only feature is the availability of a JDBC/ODBC driver, again with limited functionality. It is another way for programmers to use Hive without having to bother with its shell or web interface, or even the Hive Server. Since only a subset of features is available it will require small adjustments on the programmer's side of things, but otherwise it seems like a nice-to-have feature.&lt;br /&gt;
&lt;br /&gt;
&lt;u&gt;Conclusion&lt;/u&gt;&lt;br /&gt;
&lt;br /&gt;
Well, it seems to me that both can help you achieve the same goals, while Hive comes more naturally to database developers and Pig to "script kiddies" (just kidding). Hive has more features as far as access choices are concerned. The two projects also reportedly have roughly the same number of committers and are going strong development wise.&lt;br /&gt;
&lt;br /&gt;
This is it from me. If you have a different opinion or a comment on the above, please feel free to reply below. Over and out!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/860423771829255614-2315355857313374748?l=www.larsgeorge.com' alt='' /&gt;&lt;/div&gt;
&lt;p&gt;&lt;a href="http://feedads.g.doubleclick.net/~a/M9s0PrJI9e5_c5j5RDwv62aL_oQ/0/da"&gt;&lt;img src="http://feedads.g.doubleclick.net/~a/M9s0PrJI9e5_c5j5RDwv62aL_oQ/0/di" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;br/&gt;
&lt;a href="http://feedads.g.doubleclick.net/~a/M9s0PrJI9e5_c5j5RDwv62aL_oQ/1/da"&gt;&lt;img src="http://feedads.g.doubleclick.net/~a/M9s0PrJI9e5_c5j5RDwv62aL_oQ/1/di" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/Lineland/~4/T9h-4AlOUEw" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://www.larsgeorge.com/feeds/2315355857313374748/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.larsgeorge.com/2009/10/hive-vs-pig.html#comment-form" title="4 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/860423771829255614/posts/default/2315355857313374748?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/860423771829255614/posts/default/2315355857313374748?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/Lineland/~3/T9h-4AlOUEw/hive-vs-pig.html" title="Hive vs. 
Pig" /><author><name>Lars George</name><uri>http://www.blogger.com/profile/18168538475015227467</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="07492346046977538662" /></author><thr:total>4</thr:total><feedburner:origLink>http://www.larsgeorge.com/2009/10/hive-vs-pig.html</feedburner:origLink></entry><entry gd:etag="W/&quot;C0YBR385fCp7ImA9WxJQE0s.&quot;"><id>tag:blogger.com,1999:blog-860423771829255614.post-3650504781091875191</id><published>2009-05-26T05:12:00.000-07:00</published><updated>2009-05-26T11:05:56.124-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-05-26T11:05:56.124-07:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="java" /><category scheme="http://www.blogger.com/atom/ns#" term="hbase" /><title>HBase Schema Manager</title><content type="html">As already mentioned in one of my &lt;a href="http://www.larsgeorge.com/2009/01/changing-hbase-tables-in-code.html"&gt;previous&lt;/a&gt; posts, HBase at times makes it difficult to maintain or even create a new table structure. Imagine you have a running cluster and quite an elaborate table setup. Now you want to create a backup cluster for load balancing and general tasks like reporting etc. How do you get all the values from one system into the other?&lt;br /&gt;&lt;br /&gt;While you can use various &lt;a href="http://hadoop.apache.org/hbase/docs/r0.19.0/api/org/apache/hadoop/hbase/mapred/package-summary.html"&gt;examples&lt;/a&gt; that help you back up the data and eventually restore it, how do you "clone" the table schemas?&lt;br /&gt;&lt;br /&gt;Or imagine you have an existing system like the one we talked about above and you simply want to change a few things around. With an RDBMS you can save the required steps in a DDL statement and execute it on the server - or the backup server etc. 
But with HBase there is no DDL, nor even the possibility of executing pre-built scripts against a running cluster.&lt;br /&gt;&lt;br /&gt;What I described in my &lt;a href="http://www.larsgeorge.com/2009/01/changing-hbase-tables-in-code.html"&gt;previous&lt;/a&gt; post was a way to store the table schemas in an XML configuration file and run that against a cluster. The code handles adding new tables and more importantly the addition, removal and modification of column families for any named table. &lt;br /&gt;&lt;br /&gt;I have put this all into a separate Java application that may be useful to you. You can get it from my GitHub &lt;a href="http://github.com/larsgeorge/hbase-schema-manager/tree/master"&gt;repository&lt;/a&gt;. It is really simple to use: you create an XML based configuration file, for example:&lt;br /&gt;&lt;pre class="brush:xml"&gt;&amp;lt;?xml version=&amp;quot;1.0&amp;quot; encoding=&amp;quot;UTF-8&amp;quot;?&amp;gt;&lt;br /&gt;&amp;lt;configurations&amp;gt;&lt;br /&gt;  &amp;lt;configuration&amp;gt;&lt;br /&gt;    &amp;lt;name&amp;gt;foo&amp;lt;/name&amp;gt;&lt;br /&gt;    &amp;lt;description&amp;gt;Configuration for the FooBar HBase cluster.&amp;lt;/description&amp;gt;&lt;br /&gt;    &amp;lt;hbase_master&amp;gt;foo.bar.com:60000&amp;lt;/hbase_master&amp;gt;&lt;br /&gt;    &amp;lt;schema&amp;gt;&lt;br /&gt;      &amp;lt;table&amp;gt;&lt;br /&gt;        &amp;lt;name&amp;gt;test&amp;lt;/name&amp;gt;&lt;br /&gt;        &amp;lt;description&amp;gt;Test table.&amp;lt;/description&amp;gt;&lt;br /&gt;        &amp;lt;column_family&amp;gt;&lt;br /&gt;          &amp;lt;name&amp;gt;sample&amp;lt;/name&amp;gt;&lt;br /&gt;          &amp;lt;description&amp;gt;Sample column.&amp;lt;/description&amp;gt;&lt;br /&gt;          &amp;lt;!-- Default: 3 --&amp;gt;&lt;br /&gt;          &amp;lt;max_versions&amp;gt;1&amp;lt;/max_versions&amp;gt;&lt;br /&gt;          &amp;lt;!-- Default: DEFAULT_COMPRESSION_TYPE --&amp;gt;&lt;br /&gt;          
&amp;lt;compression_type/&amp;gt;&lt;br /&gt;          &amp;lt;!-- Default: false --&amp;gt;&lt;br /&gt;          &amp;lt;in_memory/&amp;gt;&lt;br /&gt;          &amp;lt;!-- Default: false --&amp;gt;&lt;br /&gt;          &amp;lt;block_cache_enabled/&amp;gt;&lt;br /&gt;          &amp;lt;!-- Default: -1 (forever) --&amp;gt;&lt;br /&gt;          &amp;lt;time_to_live/&amp;gt;&lt;br /&gt;          &amp;lt;!-- Default: 2147483647 --&amp;gt;&lt;br /&gt;          &amp;lt;max_value_length/&amp;gt;&lt;br /&gt;          &amp;lt;!-- Default: DEFAULT_BLOOM_FILTER_DESCRIPTOR --&amp;gt;&lt;br /&gt;          &amp;lt;bloom_filter/&amp;gt;&lt;br /&gt;        &amp;lt;/column_family&amp;gt;&lt;br /&gt;      &amp;lt;/table&amp;gt;&lt;br /&gt;    &amp;lt;/schema&amp;gt;&lt;br /&gt;  &amp;lt;/configuration&amp;gt;&lt;br /&gt;&amp;lt;/configurations&amp;gt;&lt;/pre&gt;&lt;br /&gt;Then all you have to do is run the application like so:&lt;br /&gt;&lt;br /&gt;&lt;code&gt;java -jar hbase-manager-1.0.jar schema.xml&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;The "schema.xml" is the above XML configuration saved on your local machine. 
The output shows the steps performed:&lt;br /&gt;&lt;pre class="brush:plain"&gt;$ java -jar hbase-manager-1.0.jar schema.xml&lt;br /&gt; creating table test...&lt;br /&gt; table created&lt;br /&gt;done.&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;You can also specify more options on the command line:&lt;br /&gt;&lt;pre class="brush:plain"&gt;usage: HbaseManager [&amp;lt;options&amp;gt;] &amp;lt;schema-xml-filename&amp;gt; [&amp;lt;config-name&amp;gt;]&lt;br /&gt; -l,--list       lists all tables but performs no further action.&lt;br /&gt; -n,--nocreate   do not create non-existent tables.&lt;br /&gt; -v,--verbose    print verbose output.&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;If you use the "verbose" option you get more details:&lt;br /&gt;&lt;pre class="brush:plain"&gt;$ java -jar hbase-manager-1.0.jar -v schema.xml&lt;br /&gt;schema filename: schema.xml&lt;br /&gt;configuration used: null&lt;br /&gt;using config number: default&lt;br /&gt;table schemas read from config: &lt;br /&gt;  [name -&gt; test&lt;br /&gt;  description -&amp;gt; Test table.&lt;br /&gt;  columns -&amp;gt; {sample=name -&gt; sample&lt;br /&gt;  description -&amp;gt; Sample column.&lt;br /&gt;  maxVersions -&amp;gt; 1&lt;br /&gt;  compressionType -&amp;gt; NONE&lt;br /&gt;  inMemory -&amp;gt; false&lt;br /&gt;  blockCacheEnabled -&amp;gt; false&lt;br /&gt;  maxValueLength -&amp;gt; 2147483647&lt;br /&gt;  timeToLive -&amp;gt; -1&lt;br /&gt;  bloomFilter -&amp;gt; false}]&lt;br /&gt; hbase.master -&amp;gt; foo.bar.com:60000&lt;br /&gt; authoritative -&amp;gt; true&lt;br /&gt; name -&amp;gt; test&lt;br /&gt; tableExists -&amp;gt; true&lt;br /&gt; changing table test...&lt;br /&gt; no changes detected!&lt;br /&gt;done.&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Finally you can use the "list" option to check initial connectivity and the successful changes:&lt;br /&gt;&lt;pre class="brush:plain"&gt;$ java -jar hbase-manager-1.0.jar -l schema.xml&lt;br /&gt;tables found: 1&lt;br /&gt;  test&lt;br /&gt;done.&lt;br 
/&gt;&lt;/pre&gt;&lt;br /&gt;A few notes: First and most importantly, if you change a large table, i.e. one with thousands of regions, this process can take quite a long time. This is caused by the &lt;code&gt;enableTable()&lt;/code&gt; call having to scan the complete .META. table to assign the regions to their respective region servers. There is possibly room for improvement in my little application to handle this better - suggestions welcome!&lt;br /&gt;&lt;br /&gt;Also, I do not have &lt;span style="font-style:italic;"&gt;Bloom Filter&lt;/span&gt; settings implemented, as this is still changing from 0.19 to 0.20. Once it has been finalized I will add support for it.&lt;br /&gt;&lt;br /&gt;If you do not specify a configuration name then the first one is used. Having more than one configuration allows you to have multiple clusters defined in one schema file and by specifying the name you can execute only a specific one when you need to.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/860423771829255614-3650504781091875191?l=www.larsgeorge.com' alt='' /&gt;&lt;/div&gt;
&lt;p&gt;&lt;a href="http://feedads.g.doubleclick.net/~a/nrImnmcjHYz91bIoHlbLeFtYx4g/0/da"&gt;&lt;img src="http://feedads.g.doubleclick.net/~a/nrImnmcjHYz91bIoHlbLeFtYx4g/0/di" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;br/&gt;
&lt;a href="http://feedads.g.doubleclick.net/~a/nrImnmcjHYz91bIoHlbLeFtYx4g/1/da"&gt;&lt;img src="http://feedads.g.doubleclick.net/~a/nrImnmcjHYz91bIoHlbLeFtYx4g/1/di" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/Lineland/~4/oz3d_oirec0" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://www.larsgeorge.com/feeds/3650504781091875191/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.larsgeorge.com/2009/05/hbase-schema-manager.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/860423771829255614/posts/default/3650504781091875191?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/860423771829255614/posts/default/3650504781091875191?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/Lineland/~3/oz3d_oirec0/hbase-schema-manager.html" title="HBase Schema Manager" /><author><name>Lars George</name><uri>http://www.blogger.com/profile/18168538475015227467</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="07492346046977538662" /></author><thr:total>0</thr:total><feedburner:origLink>http://www.larsgeorge.com/2009/05/hbase-schema-manager.html</feedburner:origLink></entry><entry gd:etag="W/&quot;DU4DR30_eCp7ImA9WxJRF0s.&quot;"><id>tag:blogger.com,1999:blog-860423771829255614.post-410827567364136915</id><published>2009-05-11T15:08:00.000-07:00</published><updated>2009-05-19T14:19:36.340-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-05-19T14:19:36.340-07:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="java" /><category scheme="http://www.blogger.com/atom/ns#" term="hbase" /><category scheme="http://www.blogger.com/atom/ns#" term="hadoop" /><title>HBase MapReduce 101 - Part I</title><content type="html">In this 
and the following posts I would like to take the opportunity to go into detail about the &lt;a href="http://labs.google.com/papers/mapreduce.html"&gt;MapReduce&lt;/a&gt; process as provided by &lt;a href="http://hadoop.apache.org/core/docs/current/mapred_tutorial.html"&gt;Hadoop&lt;/a&gt; but more importantly how it applies to &lt;a href="http://hadoop.apache.org/hbase/"&gt;HBase&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;MapReduce&lt;/h3&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_Cib_A77V54U/ShJ8K99N0fI/AAAAAAAAACY/aFbcbtIK4nI/s1600-h/MapReduce2.png"&gt;&lt;img style="float:right; margin:0 0 10px 10px;cursor:pointer; cursor:hand;width: 400px; height: 340px;" src="http://1.bp.blogspot.com/_Cib_A77V54U/ShJ8K99N0fI/AAAAAAAAACY/aFbcbtIK4nI/s400/MapReduce2.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5337465036259316210" /&gt;&lt;/a&gt;&lt;br /&gt;MapReduce as a process was designed to solve the problem of processing terabytes of data and more in a scalable way. There should be a way to design such a system so that its performance increases linearly with the number of physical machines added. That is what MapReduce strives to do. It follows a divide-and-conquer approach by splitting the data located on a distributed file system so that the servers (or rather CPUs, or in more modern terms "cores") available can access these pieces and process them as fast as they can. The problem with this approach is that you will have to consolidate the data at the end. Again, MapReduce has this built right into it.&lt;br /&gt;&lt;br /&gt;The above simplified image of the MapReduce process shows you how the data is processed. The first thing that happens is the &lt;span style="font-style:italic;"&gt;split&lt;/span&gt;, which is responsible for dividing the input data into reasonably sized chunks that are then processed by one server at a time. 
This splitting has to be done somewhat smartly to make best use of the available servers and the infrastructure in general. In this example the data may be a very large log file that is divided into equal-sized pieces on line boundaries. This works well for, say, Apache log files. Input data may also be binary, though, in which case you may have to write your own &lt;code&gt;getSplits()&lt;/code&gt; method - but more on that below. &lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Classes&lt;/h3&gt;&lt;br /&gt;The above image also shows you the classes that are involved in the Hadoop implementation of MapReduce. Let's look at them and also at the specific implementations that HBase provides on top of those. &lt;br /&gt;&lt;br /&gt;&lt;u&gt;InputFormat&lt;/u&gt;&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_Cib_A77V54U/ShLU17tI4bI/AAAAAAAAADI/w1ah_hgGZpM/s1600-h/input.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 600px;" src="http://4.bp.blogspot.com/_Cib_A77V54U/ShLU17tI4bI/AAAAAAAAADI/w1ah_hgGZpM/s800/input.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5337562531412631986" /&gt;&lt;/a&gt;&lt;br /&gt;The first class to deal with is the &lt;code&gt;InputFormat&lt;/code&gt; class. It is responsible for two things. First it does the actual splitting of the input data. Second it returns a &lt;code&gt;RecordReader&lt;/code&gt; instance that defines the classes of the &lt;span style="font-style:italic;"&gt;key&lt;/span&gt; and &lt;span style="font-style:italic;"&gt;value&lt;/span&gt; objects and provides a &lt;code&gt;next()&lt;/code&gt; method that is used to iterate over each input record. &lt;br /&gt;&lt;br /&gt;As far as HBase is concerned there is a special implementation called &lt;code&gt;TableInputFormatBase&lt;/code&gt; as well as its subclass &lt;code&gt;TableInputFormat&lt;/code&gt;. 
The former implements the majority of the functionality but remains abstract. The subclass is a light-weight concrete version of TableInputFormatBase and is used by many supplied sample and real MapReduce classes.&lt;br /&gt;&lt;br /&gt;But most importantly these classes implement a full turn-key solution for scanning a HBase table. You can provide the name of the table to scan and the columns you want to process during the Map phase. It splits the table into proper pieces for you and hands them over to the subsequent classes. There are quite a few tweaks which we will address below and in the following installments of this series.&lt;br /&gt;&lt;br /&gt;For now let's look at the other classes involved.&lt;br /&gt;&lt;br /&gt;&lt;u&gt;Mapper&lt;/u&gt;&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_Cib_A77V54U/ShKM4M07-nI/AAAAAAAAACo/x99sr5Evg7s/s1600-h/mapper.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 474px;" src="http://4.bp.blogspot.com/_Cib_A77V54U/ShKM4M07-nI/AAAAAAAAACo/x99sr5Evg7s/s800/mapper.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5337483405531282034" /&gt;&lt;/a&gt;&lt;br /&gt;The &lt;code&gt;Mapper&lt;/code&gt; class(es) are for the next stage of the MapReduce process and one of its namesakes. In this step each record read using the &lt;code&gt;RecordReader&lt;/code&gt; is processed using the &lt;code&gt;map()&lt;/code&gt; method. What is also somewhat visible from the first figure above is that the Mapper reads a specific type of key/value pair but possibly emits another. This is handy to convert the raw data into something more useful for further processing. &lt;br /&gt;&lt;br /&gt;Again, looking at HBase's extensions to this, you will find a &lt;code&gt;TableMap&lt;/code&gt; class that is specific to iterating over a HBase table. 
One specific implementation is the &lt;code&gt;IdentityTableMap&lt;/code&gt;, which is also a good example of how to add your own functionality to the supplied classes. The &lt;code&gt;TableMap&lt;/code&gt; class itself does not implement anything but only adds the signatures of what the actual key/value pair classes are. The &lt;code&gt;IdentityTableMap&lt;/code&gt; simply passes the records on to the next stage of the processing.&lt;br /&gt;&lt;br /&gt;&lt;u&gt;Reducer&lt;/u&gt;&lt;br /&gt;&lt;br /&gt; &lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_Cib_A77V54U/ShKPELsMsMI/AAAAAAAAACw/wwLwE9Ez9hY/s1600-h/reduce.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 504px;" src="http://1.bp.blogspot.com/_Cib_A77V54U/ShKPELsMsMI/AAAAAAAAACw/wwLwE9Ez9hY/s800/reduce.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5337485810407878850" /&gt;&lt;/a&gt;&lt;br /&gt;The &lt;code&gt;Reduce&lt;/code&gt; stage and class layout are very similar to the Mapper ones explained above. This time we get the output of a Mapper class and process it after the data was &lt;span style="font-style:italic;"&gt;shuffled&lt;/span&gt; and &lt;span style="font-style:italic;"&gt;sorted&lt;/span&gt;. In the implicit shuffle between the Mapper and Reducer stages the intermediate data is copied from the different Map servers to the Reduce servers, and the sort combines the shuffled (copied) data so that the Reducer sees the intermediate data as a nicely sorted set where each unique key (and that is something I will get back to later) is now associated with all of the values it was found with. 
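&lt;br /&gt;&lt;br /&gt;To illustrate what the shuffle and sort hand to the Reducer, here is a tiny single-process Python sketch. It is not the Hadoop or HBase API; the function names, the log format and the sample data are all made up, and a real job distributes these steps across many servers.&lt;br /&gt;&lt;br /&gt;
```python
from itertools import groupby
from operator import itemgetter


def map_phase(records, mapper):
    """Apply the mapper to every input record and collect the emitted pairs."""
    return [pair for record in records for pair in mapper(record)]


def shuffle_and_sort(pairs):
    """Sort intermediate pairs by key and group all values of each key."""
    ordered = sorted(pairs, key=itemgetter(0))
    return [(key, [value for _, value in group])
            for key, group in groupby(ordered, key=itemgetter(0))]


def reduce_phase(grouped, reducer):
    """Call the reducer once per unique key with all of that key's values."""
    return [reducer(key, values) for key, values in grouped]


def status_mapper(line):
    # Hypothetical log format "path status"; emit one (status, 1) pair per line.
    _path, status = line.split()
    yield status, 1


def count_reducer(key, values):
    return key, sum(values)


if __name__ == "__main__":
    log = ["/index 200", "/missing 404", "/about 200"]
    grouped = shuffle_and_sort(map_phase(log, status_mapper))
    print(grouped)
    print(reduce_phase(grouped, count_reducer))
```
&lt;br /&gt;The grouped list is the "nicely sorted set" from the paragraph above: every unique key appears once, together with all of its values.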
&lt;br /&gt;&lt;br /&gt;&lt;u&gt;OutputFormat&lt;/u&gt;&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_Cib_A77V54U/ShLQzvCHrwI/AAAAAAAAADA/FFzv3MsXg2I/s1600-h/output.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 580px;" src="http://2.bp.blogspot.com/_Cib_A77V54U/ShLQzvCHrwI/AAAAAAAAADA/FFzv3MsXg2I/s800/output.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5337558095604723458" /&gt;&lt;/a&gt;&lt;br /&gt;The final stage is the OutputFormat class and its job to persist the data in various locations. There are specific implementations that allow output to files or to HBase tables in case of the &lt;code&gt;TableOutputFormat&lt;/code&gt;. It uses a &lt;code&gt;RecordWriter&lt;/code&gt; to write the data into the specific HBase output table. &lt;br /&gt;&lt;br /&gt;It is important to note the cardinality as well. While there are many Mappers handing records to many Reducers, there is only one OutputFormat that takes each output record from its Reducer subsequently. It is the final class handling the key/value pairs and writes them to their final destination, this being a file or a table.&lt;br /&gt;&lt;br /&gt;The name of the output table is specified when the job is created. Otherwise it does not add much more complexity. One rather significant thing it does is set the table's &lt;span style="font-style:italic;"&gt;auto flush&lt;/span&gt; to "false" and handles the buffer flushing implicitly. This helps a lot &lt;a href="http://ryantwopointoh.blogspot.com/2009/01/performance-of-hbase-importing.html"&gt;speeding up&lt;/a&gt; the import of large data sets.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;To Map or Reduce or Not Map or Reduce&lt;/h3&gt;&lt;br /&gt;This is now a crucial point we are at deciding on how to process tables stored in HBase. 
From the above it seems that we simply use a TableInputFormat to feed a TableMap and TableReduce and eventually persist the data with a TableOutputFormat. But this may not be what you want when you deal with HBase. The question is if there is a better way to handle the process given certain specific architectural features HBase provides.  Depending on your data source and target there are a few different scenarios we could think of.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_Cib_A77V54U/ShKfjYi6gGI/AAAAAAAAAC4/9y5pLXdhl_c/s1600-h/table.png"&gt;&lt;img style="float:right; margin:0 0 10px 10px;cursor:pointer; cursor:hand;width: 400px; height: 268px;" src="http://4.bp.blogspot.com/_Cib_A77V54U/ShKfjYi6gGI/AAAAAAAAAC4/9y5pLXdhl_c/s400/table.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5337503938620588130" /&gt;&lt;/a&gt; If you want to import a large set of data into a HBase table you can read the data using a Mapper, aggregate it on a per-key basis using a Reducer, and finally write it into a HBase table. This involves the whole MapReduce stack including the shuffle and sort using intermediate files. But what if you know that the data, for example, has a unique key? Why go through the extra step of copying and sorting when there is always exactly one key/value pair? At this point you can ask yourself, wouldn't it be better if I could skip that whole reduce stage? The answer is &lt;u&gt;yes&lt;/u&gt;, you can! And you should, as you can harness the pure computational power of all CPUs to crunch the data and write it at top IO speed to its final target. &lt;br /&gt;&lt;br /&gt;As you can see from the matrix, there are quite a few scenarios where you can decide if you want Map only or both, Map and Reduce. But when it comes to handling HBase tables as sources and targets there are a few exceptions to this rule and I highlighted them accordingly. 
&lt;br /&gt;&lt;br /&gt;&lt;u&gt;Same or different Tables&lt;/u&gt;&lt;br /&gt;&lt;br /&gt;This is an important distinction. The bottom line is, when you read a table in the Map stage you should consider not writing back to that very same table in the same process. On one hand it could hinder the proper distribution of regions across the servers (open scanners block region splits) and on the other hand you may or may not see the new data as you scan. But when you read from one table and write to another then you can do that in a single stage. So for two different tables you can write your table updates directly in the &lt;code&gt;TableMap.map()&lt;/code&gt; while with the same table you must write the same code in the &lt;code&gt;TableReduce.reduce()&lt;/code&gt; - or in its &lt;code&gt;TableOutputFormat&lt;/code&gt; (or even better simply use that class as is and you are done). The reason is that the Map stage completely reads a table and then passes the data on in intermediate files to the Reduce stage. In turn this means that the Reducer reads from the distributed file system (DFS) and writes into the now idle HBase table. And all is well.&lt;br /&gt;&lt;br /&gt;All of the above are simply recommendations based on what is currently available with HBase. There is of course no reason not to scan and modify a table in the same process. But to avoid certain non-deterministic issues I personally would not recommend this - especially if you are new to HBase. Maybe this could also be taken into consideration when designing your HBase tables. Maybe you separate distinct data into two tables or separate column families so you can scan one table while changing the other. &lt;br /&gt;&lt;br /&gt;&lt;u&gt;Key Distribution&lt;/u&gt;&lt;br /&gt;&lt;br /&gt;Another specific requirement for an effective import is to have random keys as they are read. 
While this will be difficult if you scan a table in the Map phase, as keys are sorted, you may be able to make use of this when reading from a raw data file. Instead of keeping the file offset as the key, as created by the &lt;code&gt;TextInputFormat&lt;/code&gt; for example, you could simply replace the rather useless offset with a random key. This spreads the data across all servers more evenly. The &lt;code&gt;HRegionServers&lt;/code&gt; especially will be very thankful, as they each host a set of regions and random keys make for a random load on these regions. &lt;br /&gt;&lt;br /&gt;Of course this depends on how the data is written to the raw files or how the real row keys are computed, but it is still a very valuable thing to keep in mind.&lt;br /&gt;&lt;br /&gt;In the next post I will show you how to import data from a raw data file into an HBase table and how you eventually process the data in the HBase table. We will address questions like how many mappers and/or reducers are needed and how to improve import and processing performance. &lt;br /&gt;&lt;br /&gt;Until then, have a good day!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/860423771829255614-410827567364136915?l=www.larsgeorge.com' alt='' /&gt;&lt;/div&gt;
&lt;p&gt;&lt;a href="http://feedads.g.doubleclick.net/~a/CGqi7XQ-nZ0C_6ps8JmezGnhqMw/0/da"&gt;&lt;img src="http://feedads.g.doubleclick.net/~a/CGqi7XQ-nZ0C_6ps8JmezGnhqMw/0/di" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;br/&gt;
&lt;a href="http://feedads.g.doubleclick.net/~a/CGqi7XQ-nZ0C_6ps8JmezGnhqMw/1/da"&gt;&lt;img src="http://feedads.g.doubleclick.net/~a/CGqi7XQ-nZ0C_6ps8JmezGnhqMw/1/di" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/Lineland/~4/3VDwmQ-wkBs" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://www.larsgeorge.com/feeds/410827567364136915/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.larsgeorge.com/2009/05/hbase-mapreduce-101-part-i.html#comment-form" title="2 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/860423771829255614/posts/default/410827567364136915?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/860423771829255614/posts/default/410827567364136915?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/Lineland/~3/3VDwmQ-wkBs/hbase-mapreduce-101-part-i.html" title="HBase MapReduce 101 - Part I" /><author><name>Lars George</name><uri>http://www.blogger.com/profile/18168538475015227467</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="07492346046977538662" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://1.bp.blogspot.com/_Cib_A77V54U/ShJ8K99N0fI/AAAAAAAAACY/aFbcbtIK4nI/s72-c/MapReduce2.png" height="72" width="72" /><thr:total>2</thr:total><feedburner:origLink>http://www.larsgeorge.com/2009/05/hbase-mapreduce-101-part-i.html</feedburner:origLink></entry><entry gd:etag="W/&quot;A0UGRHc4fSp7ImA9WxNbFk8.&quot;"><id>tag:blogger.com,1999:blog-860423771829255614.post-1603546021082141639</id><published>2009-05-03T11:09:00.000-07:00</published><updated>2009-11-19T03:13:45.935-08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-11-19T03:13:45.935-08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" 
term="hbase" /><title>European HBase Ambassador</title><content type="html">&lt;p&gt;I am on a mission! The mission is to spread the word on HBase. There are only a few choices when it comes to large scale data storage solutions. What I am referring to is not your vertical scale, big-iron relational database system. The time is now for a &lt;a href="http://www.cringely.com/2009/05/the-sequel-dilemma/"&gt;paradigm shift&lt;/a&gt;.&lt;/p&gt;&lt;p&gt;There are &lt;a href="http://www.metabrew.com/article/anti-rdbms-a-list-of-distributed-key-value-stores/"&gt;various&lt;/a&gt; key/value or more structured stores that strive to achieve the same - but getting proper information is often a big challenge. Either there are no proper examples, or no use cases for that matter. Even worse are performance details, the usual rap being "all depends!" - of course it does. Without real examples it is difficult to determine whether a system that looks suitable design-wise also holds up to the task at hand. How many servers are needed? How should the hardware stack be organized?&lt;/p&gt;&lt;p&gt;Here is where I feel I can help. I have been using &lt;a href="http://hadoop.apache.org/hbase/"&gt;HBase&lt;/a&gt; for over a year now (started prototyping in late 2007), in production with three clusters spread over more than 50 servers. And I pretty much set them up myself, from plugging in the hardware to designing and running the cluster and the system on top of that. While this is in itself nothing special, I feel that I gained a lot of experience with HBase. I also feel that it could be very helpful to others who are thinking about HBase and how it may help them.&lt;/p&gt;&lt;p&gt;So I hereby declare myself to be a &lt;span style="font-weight:bold;"&gt;European HBase Ambassador&lt;/span&gt;. Mind you, this is no official title. The purpose is to help further the adoption of HBase in research and/or commercial projects. So what does this "position" entail? I offer this:&lt;br /&gt;
&lt;blockquote&gt;For the cost of travel (and if necessary accommodation) I will present any aspect of HBase in production to whoever is interested.&lt;/blockquote&gt;Yes, I will come to you if you ask, no matter where in Europe (or beyond). I will &lt;a href="http://www.larsgeorge.com/2009/03/hbase-vs-couchdb-in-berlin.html"&gt;present&lt;/a&gt; on the internals of HBase, its API and so on to developers, or on higher-level concepts to architects. Or to management on a white paper level. You want to know about HBase and how it can help you? Let me know and I will show you.&lt;/p&gt;&lt;p&gt;Why me? Besides what I mentioned above I have over 13 years of experience in software engineering and am the CTO at &lt;a href="http://www.worldlingo.com"&gt;WorldLingo&lt;/a&gt;, which for example is the sole provider for all machine translations in Microsoft Office for Windows and MacOS - given the text is longer than a few words, because otherwise the internal word dictionaries take precedence. I write and speak German (native) and English fluently. Last but not least I have regular contact with the core developers of HBase and am a contributor myself - as much as time allows.&lt;/p&gt;&lt;p&gt;So there you have it. This is my offer. I hope you see value in it and take me up on it! I surely am looking forward to meeting you.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/860423771829255614-1603546021082141639?l=www.larsgeorge.com' alt='' /&gt;&lt;/div&gt;
&lt;p&gt;&lt;a href="http://feedads.g.doubleclick.net/~a/JGj5W8IDEywNQLsIuRbVuGNhBV8/0/da"&gt;&lt;img src="http://feedads.g.doubleclick.net/~a/JGj5W8IDEywNQLsIuRbVuGNhBV8/0/di" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;br/&gt;
&lt;a href="http://feedads.g.doubleclick.net/~a/JGj5W8IDEywNQLsIuRbVuGNhBV8/1/da"&gt;&lt;img src="http://feedads.g.doubleclick.net/~a/JGj5W8IDEywNQLsIuRbVuGNhBV8/1/di" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/Lineland/~4/J9khYAJtmRs" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://www.larsgeorge.com/feeds/1603546021082141639/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.larsgeorge.com/2009/05/european-hbase-ambassador.html#comment-form" title="8 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/860423771829255614/posts/default/1603546021082141639?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/860423771829255614/posts/default/1603546021082141639?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/Lineland/~3/J9khYAJtmRs/european-hbase-ambassador.html" title="European HBase Ambassador" /><author><name>Lars George</name><uri>http://www.blogger.com/profile/18168538475015227467</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="07492346046977538662" /></author><thr:total>8</thr:total><feedburner:origLink>http://www.larsgeorge.com/2009/05/european-hbase-ambassador.html</feedburner:origLink></entry><entry gd:etag="W/&quot;A0MASXs7eyp7ImA9WxVbFU4.&quot;"><id>tag:blogger.com,1999:blog-860423771829255614.post-6770725984106216636</id><published>2009-03-31T15:04:00.001-07:00</published><updated>2009-03-31T15:44:08.503-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-03-31T15:44:08.503-07:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="work" /><title>10 Years in one Project</title><content type="html">&lt;p&gt;For about ten years now I am the CTO at &lt;a href="http://www.worldlingo.com"&gt;WorldLingo&lt;/a&gt;. 
During those years I have seen quite a few people join and eventually leave us. Below is a small snapshot of how time has passed. Obviously I am quite proud to be something of a rock in the sea.&lt;/p&gt;&lt;p&gt;&lt;br /&gt;&lt;object width="640" height="480"&gt;&lt;param name="allowfullscreen" value="true" /&gt;&lt;param name="allowscriptaccess" value="always" /&gt;&lt;param name="movie" value="http://vimeo.com/moogaloop.swf?clip_id=3944920&amp;amp;server=vimeo.com&amp;amp;show_title=1&amp;amp;show_byline=1&amp;amp;show_portrait=0&amp;amp;color=00ADEF&amp;amp;fullscreen=1" /&gt;&lt;embed src="http://vimeo.com/moogaloop.swf?clip_id=3944920&amp;amp;server=vimeo.com&amp;amp;show_title=1&amp;amp;show_byline=1&amp;amp;show_portrait=0&amp;amp;color=00ADEF&amp;amp;fullscreen=1" type="application/x-shockwave-flash" allowfullscreen="true" allowscriptaccess="always" width="640" height="480"&gt;&lt;/embed&gt;&lt;/object&gt;&lt;/p&gt;&lt;p&gt;If you would like to know how the video was created, then read on.&lt;/p&gt;&lt;p&gt;I downloaded the &lt;a href="http://code.google.com/p/codeswarm/source/checkout"&gt;source&lt;/a&gt; of the code_swarm project following the description, i.e. 
I used&lt;br /&gt;&lt;pre name="code" class="bash"&gt;&lt;br /&gt;svn checkout http://codeswarm.googlecode.com/svn/trunk/ codeswarm-read-only&lt;br /&gt;cd codeswarm-read-only&lt;br /&gt;ant all&lt;/pre&gt;&lt;br /&gt;to get the code and then ran &lt;code&gt;ant all&lt;/code&gt; in its root directory:&lt;br /&gt;&lt;pre name="code" class="bash"&gt;&lt;br /&gt;C:\CODESW~1&amp;gt;ant all&lt;br /&gt;Buildfile: build.xml&lt;br /&gt;&lt;br /&gt;init:&lt;br /&gt;     [echo] Running INIT&lt;br /&gt;&lt;br /&gt;build:&lt;br /&gt;     [echo] Running BUILD&lt;br /&gt;    [mkdir] Created dir: C:\CODESW~1\build&lt;br /&gt;    [javac] Compiling 18 source files to C:\CODESW~1\build&lt;br /&gt;     [copy] Copying 1 file to C:\CODESW~1\build&lt;br /&gt;&lt;br /&gt;jar:&lt;br /&gt;     [echo] Running JAR&lt;br /&gt;    [mkdir] Created dir: C:\CODESW~1\dist&lt;br /&gt;      [jar] Building jar: C:\CODESW~1\dist\code_swarm.jar&lt;br /&gt;&lt;br /&gt;all:&lt;br /&gt;     [echo] Building ALL&lt;br /&gt;&lt;br /&gt;BUILD SUCCESSFUL&lt;br /&gt;Total time: 6 seconds&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Note that this is on my Windows machine. After the build you will have to edit the config file to have your settings and regular expressions match your project. 
I simply took the supplied sample config file, copied it and modified these lines:&lt;br /&gt;&lt;pre name="code" class="bash"&gt;&lt;br /&gt;# This is a sample configuration file for code_swarm&lt;br /&gt;&lt;br /&gt;...&lt;br /&gt;&lt;br /&gt;# Input file&lt;br /&gt;InputFile=data/wl-repevents.xml&lt;br /&gt;&lt;br /&gt;...&lt;br /&gt;&lt;br /&gt;# Project time per frame&lt;br /&gt;#MillisecondsPerFrame=51600000&lt;br /&gt;&lt;br /&gt;...&lt;br /&gt;&lt;br /&gt;# Optional Method instead of MillisecondsPerFrame&lt;br /&gt;FramesPerDay=2&lt;br /&gt;&lt;br /&gt;...&lt;br /&gt;&lt;br /&gt;ColorAssign1="wlsystem",".*wlsystem.*", 0,0,255, 0,0,255&lt;br /&gt;ColorAssign2="www",".*www.*", 0,255,0, 0,255,0&lt;br /&gt;ColorAssign3="docs",".*docs.*", 102,0,255, 102,0,255&lt;br /&gt;ColorAssign4="serverconfig",".*serverconf.*", 255,0,0, 255,0,0&lt;br /&gt;&lt;br /&gt;# Save each frame to an image?&lt;br /&gt;TakeSnapshots=true&lt;br /&gt;&lt;br /&gt;...&lt;br /&gt;&lt;br /&gt;DrawNamesHalos=true&lt;br /&gt;&lt;br /&gt;...&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;This just adjusts the input file, the frame rate and the labels, and turns on the snapshots to be able to create a video at the end. I found a &lt;a href="http://code.google.com/p/codeswarm/wiki/GeneratingAVideo"&gt;tutorial&lt;/a&gt; that explained how to set this up.&lt;/p&gt;&lt;p&gt;What did not work for me was getting mencoder to run. I downloaded the MPlayer Windows installer from its &lt;a href="http://www.mplayerhq.hu/design7/dload.html"&gt;official site&lt;/a&gt; and although it is meant to have mencoder included it does not. Or I am blind.&lt;/p&gt;&lt;p&gt;So, I simply ran &lt;br /&gt;&lt;pre name="code" class="bash"&gt;mkdir frames&lt;br /&gt;runrepositoryfetch.bat data\wl.config&lt;/pre&gt; &lt;br /&gt;to fetch the history of our repository spanning about 10 years - going from Visual SourceSafe, to CVS and currently running on Subversion. 
One further problem was that the output file of the above script was not named as I had previously specified in the config file, so I had to rename it like so: &lt;br /&gt;&lt;pre name="code" class="bash"&gt;cd data&lt;br /&gt;ren realtime_sample1157501935.xml wl-repevents.xml&lt;/pre&gt;&lt;br /&gt;After that I was able to use &lt;code&gt;run.bat data\wl.config&lt;/code&gt; to see the full movie in real time.&lt;/p&gt;&lt;p&gt;With the snapshots created, and not willing to dig any further into the absence of mencoder, I fired up my trusted MacBook Pro and used QuickTime to create the movie from an image sequence.&lt;/p&gt;&lt;p&gt;Once QuickTime had done its magic I saved the .mov file and used VisualHub to convert it to a proper video format to upload to Vimeo. And that was it really.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/860423771829255614-6770725984106216636?l=www.larsgeorge.com' alt='' /&gt;&lt;/div&gt;
&lt;p&gt;&lt;a href="http://feedads.g.doubleclick.net/~a/wIRQWo8rf-feDmC-4S-pnNAQI2s/0/da"&gt;&lt;img src="http://feedads.g.doubleclick.net/~a/wIRQWo8rf-feDmC-4S-pnNAQI2s/0/di" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;br/&gt;
&lt;a href="http://feedads.g.doubleclick.net/~a/wIRQWo8rf-feDmC-4S-pnNAQI2s/1/da"&gt;&lt;img src="http://feedads.g.doubleclick.net/~a/wIRQWo8rf-feDmC-4S-pnNAQI2s/1/di" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/Lineland/~4/8glZakxUPOU" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://www.larsgeorge.com/feeds/6770725984106216636/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.larsgeorge.com/2009/03/10-years-in-one-project.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/860423771829255614/posts/default/6770725984106216636?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/860423771829255614/posts/default/6770725984106216636?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/Lineland/~3/8glZakxUPOU/10-years-in-one-project.html" title="10 Years in one Project" /><author><name>Lars George</name><uri>http://www.blogger.com/profile/18168538475015227467</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="07492346046977538662" /></author><thr:total>0</thr:total><feedburner:origLink>http://www.larsgeorge.com/2009/03/10-years-in-one-project.html</feedburner:origLink></entry><entry gd:etag="W/&quot;CEcDRns-fCp7ImA9WxVUEkU.&quot;"><id>tag:blogger.com,1999:blog-860423771829255614.post-2663151232898805205</id><published>2009-03-16T13:11:00.000-07:00</published><updated>2009-03-17T02:27:57.554-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-03-17T02:27:57.554-07:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="couchdb" /><title>CouchDB and CouchApp</title><content type="html">&lt;p&gt;As mentioned in my previous post, I wanted to see CouchDB in action and decided to "push" one of the available sample 
applications into it. CouchDB has a built-in web server that serves its REST-based API. The developers were smart enough to see its broader use and allow for applications to be uploaded into a database. An application is a set of static HTML files, images, and JavaScript files that can form a fully functional web application, including the data stored directly in that very same database. With CouchDB's built-in replication you get a fully distributed application. Sweet! &lt;/p&gt;&lt;p&gt;Sure, you will have to install a load balancer in front of multiple instances of CouchDB, but that is a simple engineering task. I am not sure how session handling will work though. Without a somewhat standard session handling framework it may be difficult to build state-aware applications. You can save the session in the database of course and reload it upon each request. Is it replicated fast enough though for random access to cluster nodes?&lt;/p&gt;&lt;p&gt;Back to the sample application. I chose the &lt;a href="http://github.com/jchris/couchdb-twitter-client/tree/master"&gt;Twitter client&lt;/a&gt; provided by &lt;a href="http://jchrisa.net/drl/_design/sofa/_list/index/recent-posts?descending=true&amp;amp;limit=5"&gt;Chris Anderson&lt;/a&gt; from the CouchDB team (btw, his blog is now hosted directly on CouchDB). I got the tar ball from the above GitHub repository and unpacked it. To get it "pushed" into a CouchDB database you need the actual &lt;a href="http://github.com/jchris/couchapp/tree/master"&gt;CouchApp&lt;/a&gt; as well. Download its tar ball too and unpack it. Now we can install CouchApp first. 
I went with the README file provided and tried the Ruby version, installing it as a gem:&lt;br /&gt;&lt;pre name="code" class="bash"&gt;&lt;br /&gt;$ sudo gem update --system&lt;br /&gt;$ sudo gem install couchapp&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;While that installed fine, the command syntax documented for pushing the application into the database did not match the Ruby version of CouchApp. When I talked to Chris on the IRC channel he advised uninstalling the Ruby version and using the Python-based version instead. I uninstalled the gem and ran:&lt;br /&gt;&lt;pre name="code" class="bash"&gt;&lt;br /&gt;$ sudo easy_install couchapp&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Now the syntax of &lt;code&gt;couchapp&lt;/code&gt; did match and I was ready to upload the Twitter sample application - or was I?&lt;/p&gt;&lt;p&gt;Not so fast though, every attempt to upload the application resulted in partial failures. I compared what I had in the uploaded database with what Chris had. His had the attachments needed, mine did not. The database had been created but half of the files were missing. After another quick chat with Chris we realized that he was using the trunk version of CouchDB, while I was using the latest release. Mine was simply outdated and was missing the new application extensions. 
Here are the next steps I had to run:&lt;br /&gt;&lt;pre name="code" class="bash"&gt;&lt;br /&gt;$ cd /downloads&lt;br /&gt;$ svn co http://svn.apache.org/repos/asf/couchdb/trunk couchdb&lt;br /&gt;$ cd couchdb/&lt;br /&gt;$ ./bootstrap&lt;br /&gt;$ ./configure&lt;br /&gt;$ make &amp;amp;&amp;amp; sudo make install&lt;br /&gt;$ sudo -i -u couchdb couchdb&lt;br /&gt;&lt;/pre&gt;and to push the application&lt;br /&gt;&lt;pre name="code" class="bash"&gt;&lt;br /&gt;$ cd ../jchris-couchdb-twitter-client-6bee14ae1b3525d56d77dd9c114002582dc0abe8&lt;br /&gt;$ couchapp push http://localhost:5984/test-twitter&lt;br /&gt;&lt;/pre&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_Cib_A77V54U/Sb9mWtSmdXI/AAAAAAAAACA/oE3LRxwNNrQ/s1600-h/cdb-design.png"&gt;&lt;img style="float:right; margin:0 0 10px 10px;cursor:pointer; cursor:hand;width: 200px; height: 174px;" src="http://2.bp.blogspot.com/_Cib_A77V54U/Sb9mWtSmdXI/AAAAAAAAACA/oE3LRxwNNrQ/s200/cdb-design.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5314078625621243250" /&gt;&lt;/a&gt;&lt;br /&gt;As expected, the application push did succeed now. I was able to see all the files in the newly created database. The screen shot shows the design view of the new "test-twitter" database. Everything you need to serve the application is there, even the favicon.ico to display in the browsers address bar. 
If you click on the "index.html" the newly uploaded application is started, and after logging into Twitter I had everything running as I wanted it.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_Cib_A77V54U/Sb9o-9wK5gI/AAAAAAAAACQ/aQCw8_mfnJA/s1600-h/cdb-twitter.png"&gt;&lt;img style="float:left; margin:0 10px 10px 0;cursor:pointer; cursor:hand;width: 200px; height: 156px;" src="http://3.bp.blogspot.com/_Cib_A77V54U/Sb9o-9wK5gI/AAAAAAAAACQ/aQCw8_mfnJA/s200/cdb-twitter.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5314081516258256386" /&gt;&lt;/a&gt;&lt;br /&gt;Here is a screenshot of the application running. This is really great. I just have to figure out now for myself how to use it either for work or privately. So many choices - but only 24 hours in a day. &lt;br /&gt;&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/860423771829255614-2663151232898805205?l=www.larsgeorge.com' alt='' /&gt;&lt;/div&gt;
&lt;p&gt;&lt;a href="http://feedads.g.doubleclick.net/~a/jqwN3DG9vMFUSxz9sikSOYypyPU/0/da"&gt;&lt;img src="http://feedads.g.doubleclick.net/~a/jqwN3DG9vMFUSxz9sikSOYypyPU/0/di" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;br/&gt;
&lt;a href="http://feedads.g.doubleclick.net/~a/jqwN3DG9vMFUSxz9sikSOYypyPU/1/da"&gt;&lt;img src="http://feedads.g.doubleclick.net/~a/jqwN3DG9vMFUSxz9sikSOYypyPU/1/di" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/Lineland/~4/t_r3QzWBqe0" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://www.larsgeorge.com/feeds/2663151232898805205/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.larsgeorge.com/2009/03/couchdb-and-couchapp.html#comment-form" title="1 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/860423771829255614/posts/default/2663151232898805205?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/860423771829255614/posts/default/2663151232898805205?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/Lineland/~3/t_r3QzWBqe0/couchdb-and-couchapp.html" title="CouchDB and CouchApp" /><author><name>Lars George</name><uri>http://www.blogger.com/profile/18168538475015227467</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="07492346046977538662" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://2.bp.blogspot.com/_Cib_A77V54U/Sb9mWtSmdXI/AAAAAAAAACA/oE3LRxwNNrQ/s72-c/cdb-design.png" height="72" width="72" /><thr:total>1</thr:total><feedburner:origLink>http://www.larsgeorge.com/2009/03/couchdb-and-couchapp.html</feedburner:origLink></entry><entry gd:etag="W/&quot;DkMFQns6eCp7ImA9WxVUEUk.&quot;"><id>tag:blogger.com,1999:blog-860423771829255614.post-6818531142587163943</id><published>2009-03-15T07:28:00.000-07:00</published><updated>2009-03-15T12:13:33.510-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-03-15T12:13:33.510-07:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="macos" /><category 
scheme="http://www.blogger.com/atom/ns#" term="erlang" /><title>Erlang and CouchDB on MacOS</title><content type="html">&lt;p&gt;As I mentioned before I am looking into how I could use Erlang in our own efforts. Sure, it is not the silver bullet for all problems ("Can it make coffee?"), despite all the hype on the developer ether. But it is certainly built to be the basis of large concurrent systems, for example Facebook's &lt;a href="http://www.facebook.com/notes.php?id=9445547199"&gt;chat system&lt;/a&gt;. Or the below mentioned Amazon Dynamo clone called &lt;a href="http://github.com/cliffmoon/dynomite/tree/master"&gt;Dynomite&lt;/a&gt;.&lt;/p&gt;&lt;p&gt;For starters I purchased "The Pragmatic Programmers" screencast &lt;a href="http://www.pragprog.com/screencasts/v-kserl/erlang-in-practice"&gt;Erlang in Practice&lt;/a&gt; with Kevin Smith. I have to say, it is worth every cent! I watched it on a flight from Munich to Las Vegas and it made the hours literally "fly by". There is something really cool about seeing a program being developed in front of your eyes and re-written many times to explain more advanced concepts as you go along. Highly recommended. &lt;/p&gt;&lt;p&gt;Now I have a background in Prolog and Lisp so getting into the Erlang way of doing things was not too difficult. Of course, the difficulty yet again is how to use it for something of my own. I decided to first build Erlang on my MacBook Pro and then try an Erlang based system to see it working. Building Erlang on MacOS is described &lt;a href="http://tim.dysinger.net/2007/12/20/compiling-erlang-on-mac-os-x-leopard-from-scratch/"&gt;here&lt;/a&gt; but overall it is a simple "wget &amp;&amp; tar -zxvf &amp;&amp; ./configure &amp;&amp; make" obstacle course with a few hiccups thrown in for good measure. As the post describes, you first need to install XCode from either the supplied MacOS disks or by downloading it from the net - easy peasy. Next is to build libgd. 
The above post has a link to the &lt;a href="http://www.libgd.org/DOC_Compiling_GD_on_Mac_OS_X_HOWTO"&gt;details&lt;/a&gt; required to build libgd on MacOS.  It first required downloading all the required library tar balls and then running the following commands: &lt;br /&gt;&lt;pre name="code"  class="bash"&gt;&lt;br /&gt;$ cd /downloads/&lt;br /&gt;$ tar -zxvf zlib-1.2.3.tar.gz &lt;br /&gt;$ tar -zxvf gd-2.0.35.tar.gz &lt;br /&gt;$ tar -zxvf freetype-2.3.8.tar.gz &lt;br /&gt;$ tar -zxvf jpegsrc.v6b.tar.gz &lt;br /&gt;$ tar -zxvf libpng-1.2.34.tar.gz &lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;This assumes all tar balls are saved in the "/downloads" directory. Next is &lt;code&gt;zlib&lt;/code&gt;:&lt;br /&gt;&lt;pre name="code"  class="bash"&gt;&lt;br /&gt;$ cd zlib-1.2.3 ; ./configure --shared &amp;&amp; make &amp;&amp; sudo make install&lt;br /&gt;$ ./example&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Then &lt;code&gt;libpng&lt;/code&gt;:&lt;br /&gt;&lt;pre name="code"  class="bash"&gt;&lt;br /&gt;$ cd ../libpng-1.2.34&lt;br /&gt;$ cp scripts/makefile.darwin Makefile&lt;br /&gt;$ vim Makefile&lt;br /&gt;$ make &amp;&amp; sudo make install&lt;br /&gt;$ export srcdir=.; ./test-pngtest.sh&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Next is the &lt;code&gt;jpeg&lt;/code&gt; library. Here I did not have to symlink the &lt;code&gt;libtool&lt;/code&gt; as described in the post. I assume that is because I am on MacOS 10.5. So all I did is this:&lt;br /&gt;&lt;pre name="code"  class="bash"&gt;&lt;br /&gt;$ cd ../jpeg-6b/&lt;br /&gt;$ cp /usr/share/libtool/config.sub .&lt;br /&gt;$ cp /usr/share/libtool/config.guess .&lt;br /&gt;$ ./configure --enable-shared&lt;br /&gt;$ make&lt;br /&gt;$ sudo make install&lt;br /&gt;$ sudo ranlib /usr/local/lib/libjpeg.a&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;We are getting closer. 
The &lt;code&gt;freetype&lt;/code&gt; library needs these steps, where the last line is for the subsequent &lt;code&gt;libgd&lt;/code&gt; build:&lt;br /&gt;&lt;pre name="code"  class="bash"&gt;&lt;br /&gt;$ cd ../freetype-2.3.8&lt;br /&gt;$ ./configure &amp;&amp; make &amp;&amp; sudo make install&lt;br /&gt;$ sudo ln -s /usr/X11R6/include/fontconfig /usr/local/include&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;OK, now the &lt;code&gt;libgd&lt;/code&gt; library:&lt;br /&gt;&lt;pre name="code"  class="bash"&gt;&lt;br /&gt;$ cd ../gd-2.0.35&lt;br /&gt;$ ln -s `which glibtool` ./libtool&lt;br /&gt;$ ./configure &lt;br /&gt;$ make &amp;&amp; sudo make install &lt;br /&gt;$ ./gdtest test/gdtest.png&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;With this all done we can build Erlang from sources like so:&lt;br /&gt;&lt;pre name="code"  class="bash"&gt;&lt;br /&gt;$ cd /downloads&lt;br /&gt;$ tar -zxvf otp_src_R12B-5.tar.gz &lt;br /&gt;$ cd otp_src_R12B-5&lt;br /&gt;$ ./configure --enable-hipe --enable-smp-support --enable-threads&lt;br /&gt;$ make &amp;&amp; sudo make install&lt;br /&gt;$ erl&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;All cool, the Erlang shell starts up and is ready for action:&lt;br /&gt;&lt;pre name="code"  class="bash"&gt;&lt;br /&gt;$ erl&lt;br /&gt;Erlang (BEAM) emulator version 5.6.5 [source] [smp:2] [async-threads:0] [hipe] [kernel-poll:false]&lt;br /&gt;&lt;br /&gt;Eshell V5.6.5  (abort with ^G)&lt;br /&gt;1&amp;gt; "hello world".&lt;br /&gt;"hello world"&lt;br /&gt;2&amp;gt; &lt;br /&gt;&lt;/pre&gt;&lt;/p&gt;&lt;p&gt;With this in place I decided to try CouchDB. 
Again, here are the steps to get it running:&lt;br /&gt;&lt;pre name="code"  class="bash"&gt;&lt;br /&gt;$ cd /downloads&lt;br /&gt;$ tar -zxvf apache-couchdb-0.8.1-incubating.tar.gz &lt;br /&gt;$ cd apache-couchdb-0.8.1-incubating&lt;br /&gt;$ less README &lt;br /&gt;$ sudo port install automake autoconf libtool help2man&lt;br /&gt;$ sudo port install icu spidermonkey&lt;br /&gt;$ ./configure &lt;br /&gt;$ make&lt;br /&gt;$ sudo make install&lt;br /&gt;$ sudo -i -u couchdb couchdb&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;I did not have to execute this line &lt;code&gt;$ open /Applications/Installers/Xcode\ Tools/XcodeTools.mpkg&lt;/code&gt;. I assume it is because I had the full XCode install done beforehand. You can also see that you need to install the &lt;a href="http://www.macports.org/"&gt;MacPorts&lt;/a&gt; tools. With those you will have to add a few packages to be able to build CouchDB, especially the Unicode library &lt;code&gt;ICU&lt;/code&gt; and &lt;code&gt;spidermonkey&lt;/code&gt;, the C based JavaScript library provided by the Mozilla organization. The last line starts up the database, and directing your browser to &lt;code&gt;http://localhost:5984/_utils/&lt;/code&gt; allows you to see its internal UI. Now relax! ;)&lt;/p&gt;&lt;p&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_Cib_A77V54U/Sb0g9ohb13I/AAAAAAAAAB4/7v3ZwsWsi54/s1600-h/couchdb.png"&gt;&lt;img style="float:left; margin:0 10px 10px 0;cursor:pointer; cursor:hand;width: 400px; height: 306px;" src="http://1.bp.blogspot.com/_Cib_A77V54U/Sb0g9ohb13I/AAAAAAAAAB4/7v3ZwsWsi54/s400/couchdb.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5313439378588817266" /&gt;&lt;/a&gt;If you look closely at the screenshot you will see an inconsistency with the notes above. 
I will describe this in more detail in a future post about getting the sample Twitter client for CouchDB running using CouchApp.&lt;/p&gt;</content><link rel="replies" type="application/atom+xml" href="http://www.larsgeorge.com/feeds/6818531142587163943/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.larsgeorge.com/2009/03/erlang-and-couchdb-on-macos.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/860423771829255614/posts/default/6818531142587163943?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/860423771829255614/posts/default/6818531142587163943?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/Lineland/~3/laqIGXrpowM/erlang-and-couchdb-on-macos.html" title="Erlang and CouchDB on MacOS" /><author><name>Lars George</name><uri>http://www.blogger.com/profile/18168538475015227467</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="07492346046977538662" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://1.bp.blogspot.com/_Cib_A77V54U/Sb0g9ohb13I/AAAAAAAAAB4/7v3ZwsWsi54/s72-c/couchdb.png" height="72" width="72" /><thr:total>0</thr:total><feedburner:origLink>http://www.larsgeorge.com/2009/03/erlang-and-couchdb-on-macos.html</feedburner:origLink></entry><entry gd:etag="W/&quot;A0cHRngyeSp7ImA9WxVVFE8.&quot;"><id>tag:blogger.com,1999:blog-860423771829255614.post-8908268768859093340</id><published>2009-03-07T02:16:00.000-08:00</published><updated>2009-03-07T04:30:37.691-08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-03-07T04:30:37.691-08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" 
term="erlang" /><category scheme="http://www.blogger.com/atom/ns#" term="work" /><category scheme="http://www.blogger.com/atom/ns#" term="hbase" /><title>HBase vs. CouchDB in Berlin</title><content type="html">I had the pleasure of presenting our involvement with HBase at the &lt;a href="http://newthinking-store.de/event/2009/03/5/day"&gt;4th Berlin&lt;/a&gt; &lt;a href="http://upcoming.yahoo.com/event/1764187"&gt;Hadoop Get Together&lt;/a&gt;. It was organized by &lt;a href="http://www.isabel-drost.de/"&gt;Isabel Drost&lt;/a&gt;. Thanks again to Isabel for having me there; I thoroughly enjoyed it. First off, here are the slides:&lt;p&gt;&lt;br /&gt;&lt;object id="_ds_4769422" name="_ds_4769422" width="450" height="350" type="application/x-shockwave-flash" data="http://viewer.docstoc.com/"&gt;&lt;param name="FlashVars" value="doc_id=4769422&amp;mem_id=602922&amp;doc_type=ppt&amp;fullscreen=0" /&gt;&lt;param name="movie" value="http://viewer.docstoc.com/"/&gt;&lt;param name="allowScriptAccess" value="always" /&gt;&lt;param name="allowFullScreen" value="true" /&gt;&lt;/object&gt;&lt;br /&gt;&lt;font size="1"&gt;&lt;a href="http://www.docstoc.com/docs/4769422/HBase--WorldLingo"&gt;HBase @ WorldLingo&lt;/a&gt; - Get more &lt;a href="http://www.docstoc.com/documents/technology/"&gt;Information Technology&lt;/a&gt;&lt;/font&gt;&lt;/p&gt;&lt;p&gt;The second talk was given by &lt;a href="http://jan.prima.de/"&gt;Jan Lehnardt&lt;/a&gt;, a &lt;a href="http://couchdb.apache.org/"&gt;CouchDB&lt;/a&gt; team member. I have been looking into Erlang for the last few months to see how we could use it for our own efforts. CouchDB is one of the projects you come across when reading articles about Erlang. So it was really great to have Jan present too. &lt;/p&gt;&lt;p&gt;At the end of both our talks it was great to see how the questions from the audience at times tried to compare the two. So is HBase better or worse than CouchDB? Of course, you cannot compare them directly. 
While they share common features (well, they store data, right?) they are made to solve different problems. CouchDB offers schema-free storage with built-in replication, which can even be used to create offline clients that sync their changes with another site when they have connectivity again. One of the features puzzling me most is the ability to use it to serve your own applications to the world. You create the pages and scripts you need and push them into the database using &lt;a href="http://groups.google.com/group/couchapp?pli=1"&gt;CouchApp&lt;/a&gt;. Since the database already has a built-in web server it can handle your application's requirements implicitly. Nice!&lt;/p&gt;&lt;p&gt;I asked Jan if he had concerns about scaling this, or if it wouldn't be better to use Apache or Nginx to serve the static content. His argument was that Erlang can handle many, many more concurrent requests than Apache can, for example. I read up on &lt;a href="http://en.wikipedia.org/wiki/Yaws_(web_server)"&gt;Yaws&lt;/a&gt; and saw his point. So I guess it is then a question of memory and CPU requirements. Memory is apparently another strength of CouchDB, which has proven to serve thousands of concurrent requests while needing only about 10MB of RAM - how awesome is that?!?! I am not sure about CPU - but I would guess that it is equally sane.&lt;/p&gt;&lt;p&gt;Another Erlang project I am interested in is &lt;a href="http://github.com/cliffmoon/dynomite/tree/master"&gt;Dynomite&lt;/a&gt;, an Erlang-based Amazon Dynamo "clone" (or rather implementation). Talking to Cliff, it seems to be just as awesome, leveraging the Erlang OTP abilities to create something that a normal Java developer and their JRE are just not used to.&lt;/p&gt;&lt;p&gt;And that brings me to HBase. I told the crowd in Berlin that as of version 0.18.0 HBase is ready for anyone to get started with - given they read the Wiki to set the file handles right and a few other bits and pieces. 
&lt;/p&gt;&lt;p&gt;&lt;span style="font-weight:bold;"&gt;Note:&lt;/span&gt; I was actually thinking about suggesting an improvement to the HBase team: a command line check, invoked separately or whenever "start-hbase.sh" is called, that checks a few of these common parameters and prints out warnings to the user. I know that the file handle count is printed out in the log files, but for a newbie this is a bit too deep down. What could be checked? First off, the file handle limit being set to, say, 32K. Next are the newer resource limits, introduced with Hadoop for example, that now need tweaking. An example is the "xciever" (sic) value. This again is documented in the Wiki, but who reads it, right? Another common issue is RAM. If the master knows the number of regions (or while it is scanning the META to determine it) it could warn if the JRE is not given enough memory. Sure, there are no hard boundaries, but better to see a &lt;code&gt;Warning: Found x regions. Your configured memory for the JRE seems too low for the system to run stable.&lt;/code&gt;&lt;/p&gt;&lt;p&gt;Back to HBase. I also told the audience that as of HBase 0.19.0 scanning was much improved speed-wise and that I am happy with where we are nowadays in terms of stability and speed. Sure, it could be faster for random reads so that I may be able to drop my MemCached layer. And the team is working on that. So, here's hoping that we will see the best HBase ever in the upcoming version. I for one am 100% sure that the HBase guys can deliver - they have done so in the past and will now as well. All I can say - give it a shot!&lt;/p&gt;&lt;p&gt;So, CouchDB is lean and mean while HBase is a resource hog in my experience. But it is also built to scale to Petabyte-size data. With CouchDB, you would have to add sharding on top of it, including all the issues that come with it, for example rebalancing, fail-over, recovery, adding more servers and so on. 
For me HBase is the system of choice - for this particular problem. That does not mean I could not use CouchDB, or even Erlang for that matter, in a separate area. Until then I will keep a very close eye on this exciting (though in the case of Erlang not new!) technology. May the open-source projects rule, and live long and prosper!&lt;/p&gt;
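&lt;p&gt;The start-up check suggested in the note above could begin as a small shell function. This is only a sketch under assumptions: the function name &lt;code&gt;check_file_handles&lt;/code&gt; is made up for illustration and is not part of HBase or "start-hbase.sh", and the default threshold of 32768 simply spells out the "say 32K" from the note.&lt;/p&gt;

```shell
#!/bin/bash
# Hypothetical helper, not part of HBase: warn when the open-file limit
# is below a minimum. The default of 32768 is the "say 32K" from the note.
check_file_handles() {
  local limit="$1"
  local minimum="${2:-32768}"
  if [ "$limit" = "unlimited" ]; then
    echo "OK: file handle limit is unlimited."
  elif [ "$limit" -lt "$minimum" ]; then
    echo "Warning: file handle limit is $limit, expected at least $minimum."
  else
    echo "OK: file handle limit is $limit."
  fi
}

# Check the limit of the current shell.
check_file_handles "$(ulimit -n)"
```

&lt;p&gt;Passing the limit in as an argument, rather than reading it inside the function, keeps the check easy to try out with made-up values before wiring it into a start script.&lt;/p&gt;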
</content><link rel="replies" type="application/atom+xml" href="http://www.larsgeorge.com/feeds/8908268768859093340/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.larsgeorge.com/2009/03/hbase-vs-couchdb-in-berlin.html#comment-form" title="5 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/860423771829255614/posts/default/8908268768859093340?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/860423771829255614/posts/default/8908268768859093340?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/Lineland/~3/GxwxpLi8U9M/hbase-vs-couchdb-in-berlin.html" title="HBase vs. 
CouchDB in Berlin" /><author><name>Lars George</name><uri>http://www.blogger.com/profile/18168538475015227467</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="07492346046977538662" /></author><thr:total>5</thr:total><feedburner:origLink>http://www.larsgeorge.com/2009/03/hbase-vs-couchdb-in-berlin.html</feedburner:origLink></entry><entry gd:etag="W/&quot;AkYHQXkycSp7ImA9WxVWE08.&quot;"><id>tag:blogger.com,1999:blog-860423771829255614.post-3728820754682891186</id><published>2009-02-21T18:13:00.000-08:00</published><updated>2009-02-22T10:42:10.799-08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-02-22T10:42:10.799-08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="java" /><category scheme="http://www.blogger.com/atom/ns#" term="hbase" /><category scheme="http://www.blogger.com/atom/ns#" term="hadoop" /><title>Mini Local HBase Cluster</title><content type="html">I am trying to get a local setup going where I have everything I need on my PC - heck, even on my MacBookPro. I use Eclipse and Java to develop, so that is easy. I also use &lt;a href="http://www.danga.com/memcached/"&gt;Memcached&lt;/a&gt; and there is a nice &lt;a href="http://trac.macports.org/browser/trunk/dports/sysutils/memcached/Portfile"&gt;MacPorts&lt;/a&gt; version of it available. But what I also need is a working Hadoop/HBase cluster! &lt;br /&gt;&lt;br /&gt;At work we have a few of these, large and small, but they are either in production or simply too complex to use for day-to-day testing, especially when you try to debug a MapReduce job or code talking directly to HBase. I found that the excellent HBase team already had a class in place that is used to set up the JUnit tests they run. And the same goes for Hadoop. So I set out to extract the bare essentials, if you will, to create a tiny HBase cluster running on a tiny Hadoop distributed filesystem. 
&lt;br /&gt;&lt;br /&gt;After a couple of issues that had to be resolved the below class is my "culmination" of sweet cluster goodness ;)&lt;br /&gt;&lt;pre name="code" class="java"&gt;&lt;br /&gt;/* File:    MiniLocalHBase.java&lt;br /&gt; * Created: Feb 21, 2009&lt;br /&gt; * Author:  Lars George&lt;br /&gt; *&lt;br /&gt; * Copyright (c) 2009 larsgeorge.com&lt;br /&gt; */&lt;br /&gt;&lt;br /&gt;package com.larsgeorge.hadoop.hbase;&lt;br /&gt;&lt;br /&gt;import java.io.IOException;&lt;br /&gt;import org.apache.hadoop.fs.FileSystem;&lt;br /&gt;import org.apache.hadoop.fs.Path;&lt;br /&gt;import org.apache.hadoop.hbase.HBaseConfiguration;&lt;br /&gt;import org.apache.hadoop.hbase.HConstants;&lt;br /&gt;import org.apache.hadoop.hbase.MiniHBaseCluster;&lt;br /&gt;import org.apache.hadoop.hbase.util.FSUtils;&lt;br /&gt;import org.apache.hadoop.hdfs.MiniDFSCluster;&lt;br /&gt;&lt;br /&gt;/**&lt;br /&gt; * Starts a small local DFS and HBase cluster.&lt;br /&gt; *&lt;br /&gt; * @author Lars George&lt;br /&gt; */&lt;br /&gt;public class MiniLocalHBase {&lt;br /&gt;&lt;br /&gt;  static HBaseConfiguration conf = null;&lt;br /&gt;  static MiniDFSCluster dfs = null;&lt;br /&gt;  static MiniHBaseCluster hbase = null;&lt;br /&gt;  &lt;br /&gt;  /**&lt;br /&gt;   * Main entry point to this class. &lt;br /&gt;   *&lt;br /&gt;   * @param args  The command line arguments.&lt;br /&gt;   */&lt;br /&gt;  public static void main(String[] args) {&lt;br /&gt;    try {&lt;br /&gt;      int n = args.length &amp;gt; 0 &amp;&amp; args[0] != null ? 
&lt;br /&gt;        Integer.parseInt(args[0]) : 4;&lt;br /&gt;      conf = new HBaseConfiguration();&lt;br /&gt;      dfs = new MiniDFSCluster(conf, 2, true, (String[]) null);&lt;br /&gt;      // set file system to the mini dfs just started up&lt;br /&gt;      FileSystem fs = dfs.getFileSystem();&lt;br /&gt;      conf.set("fs.default.name", fs.getUri().toString());      &lt;br /&gt;      Path parentdir = fs.getHomeDirectory();&lt;br /&gt;      conf.set(HConstants.HBASE_DIR, parentdir.toString());&lt;br /&gt;      fs.mkdirs(parentdir);&lt;br /&gt;      FSUtils.setVersion(fs, parentdir);&lt;br /&gt;      conf.set(HConstants.REGIONSERVER_ADDRESS, HConstants.DEFAULT_HOST + ":0");&lt;br /&gt;      // disable UI or it clashes for more than one RegionServer&lt;br /&gt;      conf.set("hbase.regionserver.info.port", "-1");&lt;br /&gt;      hbase = new MiniHBaseCluster(conf, n);&lt;br /&gt;      // add close hook&lt;br /&gt;      Runtime.getRuntime().addShutdownHook(new Thread() {&lt;br /&gt;        public void run() {&lt;br /&gt;          hbase.shutdown();&lt;br /&gt;          if (dfs != null) {&lt;br /&gt;            try {&lt;br /&gt;              FileSystem fs = dfs.getFileSystem();&lt;br /&gt;              if (fs != null) fs.close();&lt;br /&gt;            } catch (IOException e) {&lt;br /&gt;              System.err.println("error closing file system: " + e);&lt;br /&gt;            }&lt;br /&gt;            try {&lt;br /&gt;              dfs.shutdown();&lt;br /&gt;            } catch (Exception e) { /*ignore*/ }&lt;br /&gt;          }&lt;br /&gt;        }&lt;br /&gt;      } );&lt;br /&gt;    } catch (Exception e) {&lt;br /&gt;      e.printStackTrace();&lt;br /&gt;    }&lt;br /&gt;  } // main&lt;br /&gt;&lt;br /&gt;} // MiniLocalHBase&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;The critical part for me was that if you wanted to be able to start more than one region server you have to disable the UI of each of these region servers or they will fail trying to bind the same info 
port, usually 60030. &lt;br /&gt;&lt;br /&gt;I also added a small shutdown hook so that when you quit the process it will shut down nicely and keep the data in such a condition that you can restart the local cluster again later on for further testing. Otherwise you may end up having to redo the file system - no biggie I guess, but hey, why not? You can specify the number of RegionServers to start on the command line. It defaults to 4 in my sample code above. Also, you do not need any &lt;code&gt;hbase-site.xml&lt;/code&gt; or &lt;code&gt;hadoop-site.xml&lt;/code&gt; to set anything else. All required settings are hardcoded to start the different servers in separate threads. You can of course add one and tweak further settings - just keep in mind that the ones hardcoded in the code cannot be reassigned by the external XML settings files. You would have to change those directly in the code.&lt;br /&gt;&lt;br /&gt;To start this mini cluster you can either run it from within Eclipse, for example, which makes it really easy since all the required libraries are in place, or you start it from the command line. This could work like so:&lt;br /&gt;&lt;pre name="code" class="bash"&gt;&lt;br /&gt;hadoop$ java -Xms512m -Xmx512m -cp bin:lib/hadoop-0.19.0-core.jar:lib/hadoop-0.19.0-test.jar:lib/hbase-0.19.0.jar:lib/hbase-0.19.0-test.jar:lib/commons-logging-1.0.4.jar:lib/jetty-5.1.4.jar:lib/servlet-api.jar:lib/jetty-ext/jasper-runtime.jar:lib/jetty-ext/jsp-api.jar:lib/jetty-ext/jasper-compiler.jar:lib/jetty-ext/commons-el.jar com.larsgeorge.hadoop.hbase.MiniLocalHBase&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;What I did was create a small project, have the class compile into the "bin" directory, and throw all Hadoop and HBase libraries into the "lib" directory. This was only for the sake of keeping the command line short. 
I suggest you have the classpath set already or have it point to the original locations where you have untar'ed the respective packages.&lt;br /&gt;&lt;br /&gt;Running it from within Eclipse of course lets you use the integrated debugging tools at hand. The next step is to follow through with what the test classes already have implemented and be able to start Map/Reduce jobs with debugging enabled. Mind you, though, the local cluster is not very powerful - even if you give it more memory than I did above. But fill it with a few hundred rows and use it to debug your code, and once it runs fine, run it happily ever after on your production site.&lt;br /&gt;&lt;br /&gt;All the credit goes to the Hadoop and HBase teams of course; I simply gathered their code from various places.
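&lt;br /&gt;&lt;br /&gt;If the long classpath in the command line above gets unwieldy, it could be assembled by a small helper instead. The following is a minimal sketch, assuming the "bin" and "lib" project layout described above; the function name &lt;code&gt;build_classpath&lt;/code&gt; is made up for illustration.&lt;br /&gt;

```shell
#!/bin/bash
# Hypothetical helper: join "bin" and every jar found below the given
# lib directory into a colon-separated classpath string.
# Note: paths containing whitespace are not handled by this simple loop.
build_classpath() {
  local libdir="$1"
  local cp="bin"
  local jar
  for jar in $(find "$libdir" -name '*.jar' | sort); do
    cp="$cp:$jar"
  done
  echo "$cp"
}

# Usage, run from the project directory:
# java -Xms512m -Xmx512m -cp "$(build_classpath lib)" com.larsgeorge.hadoop.hbase.MiniLocalHBase
```

&lt;br /&gt;Sorting the jar list only makes the resulting classpath reproducible between runs; the order otherwise depends on what &lt;code&gt;find&lt;/code&gt; returns.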
</content><link rel="replies" type="application/atom+xml" href="http://www.larsgeorge.com/feeds/3728820754682891186/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.larsgeorge.com/2009/02/mini-local-hbase-cluster.html#comment-form" title="4 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/860423771829255614/posts/default/3728820754682891186?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/860423771829255614/posts/default/3728820754682891186?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/Lineland/~3/d-oy3XmNhfw/mini-local-hbase-cluster.html" title="Mini Local HBase Cluster" /><author><name>Lars George</name><uri>http://www.blogger.com/profile/18168538475015227467</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="07492346046977538662" /></author><thr:total>4</thr:total><feedburner:origLink>http://www.larsgeorge.com/2009/02/mini-local-hbase-cluster.html</feedburner:origLink></entry><entry gd:etag="W/&quot;A08ERHk-fip7ImA9WxVQGEg.&quot;"><id>tag:blogger.com,1999:blog-860423771829255614.post-8517736107078165827</id><published>2009-02-05T10:21:00.001-08:00</published><updated>2009-02-05T10:50:05.756-08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-02-05T10:50:05.756-08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="linux" /><category scheme="http://www.blogger.com/atom/ns#" term="work" /><category scheme="http://www.blogger.com/atom/ns#" term="apache" /><title>Apache fails on Semaphores</title><content 
type="html">Twice in the last few years I had an issue with our Apache web servers where all of a sudden they would crash and not start again. While there are obvious &lt;a href="http://www.cyberciti.biz/faq/troubleshooting-apache-webserver-will-not-restart-start/"&gt;reasons&lt;/a&gt; in case the configuration is screwed up, there are also cases where you simply do not know why it would not restart. There is enough drive space and RAM, and no other process is locking the port (even checked with lsof). &lt;br /&gt;&lt;br /&gt;All you get is an error message in the log saying:&lt;br /&gt;&lt;br /&gt;&lt;code&gt;[Fri May 21 15:34:22 2008] [crit] (28)No space left on device: mod_rewrite: could not create rewrite_log_lock&lt;br /&gt;Configuration Failed &lt;/code&gt;&lt;br /&gt;&lt;br /&gt;After some digging the issue turned out to be that all semaphores were used up and had to be deleted first. Here is a script I use to do that:&lt;br /&gt;&lt;pre name="code" class="bash"&gt;echo "Semaphores found: "&lt;br /&gt;ipcs -s | awk '$2 ~ /^[0-9]+$/ { print $2 }' | wc -l&lt;br /&gt;# removes every semaphore array; the numeric filter skips the ipcs header lines&lt;br /&gt;ipcs -s | awk '$2 ~ /^[0-9]+$/ { print $2 }' | xargs -r -n 1 ipcrm -s&lt;br /&gt;echo "Semaphores found after removal: "&lt;br /&gt;ipcs -s | awk '$2 ~ /^[0-9]+$/ { print $2 }' | wc -l&lt;/pre&gt;&lt;br /&gt;Sometimes you really wonder what else could go wrong.
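&lt;br /&gt;&lt;br /&gt;The script above removes every semaphore array on the machine, whoever owns it. A more cautious variant could limit the removal to arrays owned by the web server user. The following is a sketch under assumptions: the function name &lt;code&gt;sem_ids_for_owner&lt;/code&gt; is made up, "apache" is an assumed user name, and the column layout (key, semid, owner, perms, nsems) is the one printed by &lt;code&gt;ipcs -s&lt;/code&gt; on Linux.&lt;br /&gt;

```shell
#!/bin/bash
# Hypothetical helper: print the IDs of semaphore arrays owned by a user.
# Reads "ipcs -s" style output on stdin and assumes the Linux column
# layout "key semid owner perms nsems"; header lines are skipped because
# their second column is not numeric.
sem_ids_for_owner() {
  local owner="$1"
  awk -v owner="$owner" '$2 ~ /^[0-9]+$/ { if ($3 == owner) print $2 }'
}

# Usage (left commented out because it is destructive; "apache" is an
# assumed user name and "xargs -r" is a GNU extension):
# ipcs -s | sem_ids_for_owner apache | xargs -r -n 1 ipcrm -s
```

&lt;br /&gt;Reading from standard input keeps the filter separate from the removal, so the list of IDs can be inspected before anything is deleted.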
</content><link rel="replies" type="application/atom+xml" href="http://www.larsgeorge.com/feeds/8517736107078165827/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.larsgeorge.com/2009/02/apache-fails-on-semaphores.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/860423771829255614/posts/default/8517736107078165827?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/860423771829255614/posts/default/8517736107078165827?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/Lineland/~3/hkRV9fF-la8/apache-fails-on-semaphores.html" title="Apache fails on Semaphores" /><author><name>Lars George</name><uri>http://www.blogger.com/profile/18168538475015227467</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="07492346046977538662" /></author><thr:total>0</thr:total><feedburner:origLink>http://www.larsgeorge.com/2009/02/apache-fails-on-semaphores.html</feedburner:origLink></entry><entry gd:etag="W/&quot;C0EMQXkzfip7ImA9WxVQF0g.&quot;"><id>tag:blogger.com,1999:blog-860423771829255614.post-2102802439764735087</id><published>2009-02-04T03:48:00.001-08:00</published><updated>2009-02-04T04:48:00.786-08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-02-04T04:48:00.786-08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="work" /><category scheme="http://www.blogger.com/atom/ns#" term="xml" /><category scheme="http://www.blogger.com/atom/ns#" term="xsl" /><category scheme="http://www.blogger.com/atom/ns#" 
term="xslt" /><title>String starts with a number in XSL</title><content type="html">I needed a way to test if a particular value in an XML file started with a letter-based prefix. If not, then the value would start with a number and needed to be prefixed first before being output. While I found a great &lt;a href="http://www.melandri.net/xml-programming/remove-leading-zeros-in-an-xsl/"&gt;post&lt;/a&gt; on how to remove leading zeros, I could not find how to check if the first character is of a particular type, for example a letter or a number. In Java you can do that easily like so (using &lt;a href="http://www.beanshell.org/"&gt;BeanShell&lt;/a&gt; here):&lt;br /&gt;&lt;pre name="code" class="java"&gt;&lt;br /&gt;bsh % String s1 = "1234";&lt;br /&gt;bsh % String s2 = "A1234";&lt;br /&gt;bsh % print(Character.isLetter(s1.charAt(0)));&lt;br /&gt;false&lt;br /&gt;bsh % print(Character.isLetter(s2.charAt(0)));&lt;br /&gt;true &lt;/pre&gt;&lt;br /&gt;This is of course Unicode safe. With XSL, though, I could not find a similar feature, but for my purposes it was sufficient to reverse the check and see if I had a Latin digit first. 
Here is how: &lt;br /&gt;&lt;pre name="code" class="xml"&gt;&amp;lt;xsl:template match="/person/personnumber"&amp;gt;&lt;br /&gt;  &amp;lt;reference&amp;gt;&lt;br /&gt;    &amp;lt;xsl:variable name="num"&amp;gt;&lt;br /&gt;      &amp;lt;xsl:value-of select="."/&amp;gt;&lt;br /&gt;    &amp;lt;/xsl:variable&amp;gt;&lt;br /&gt;    &amp;lt;xsl:choose&amp;gt;&lt;br /&gt;      &amp;lt;xsl:when test="contains('0123456789', substring($num, 1, 1))"&amp;gt;&lt;br /&gt;        &amp;lt;xsl:variable name="snum"&amp;gt;&lt;br /&gt;          &amp;lt;xsl:call-template name="removeLeadingZeros"&amp;gt;&lt;br /&gt;            &amp;lt;xsl:with-param name="originalString" select="$num"/&amp;gt;&lt;br /&gt;          &amp;lt;/xsl:call-template&amp;gt;&lt;br /&gt;        &amp;lt;/xsl:variable&amp;gt;&lt;br /&gt;        &amp;lt;xsl:value-of select="concat('PE', $snum)"/&amp;gt;&lt;br /&gt;      &amp;lt;/xsl:when&amp;gt;&lt;br /&gt;      &amp;lt;xsl:otherwise&amp;gt;&lt;br /&gt;        &amp;lt;xsl:value-of select="."/&amp;gt;&lt;br /&gt;      &amp;lt;/xsl:otherwise&amp;gt;&lt;br /&gt;    &amp;lt;/xsl:choose&amp;gt;&lt;br /&gt;  &amp;lt;/reference&amp;gt;&lt;br /&gt;&amp;lt;/xsl:template&amp;gt; &lt;/pre&gt;</content><link rel="replies" type="application/atom+xml" href="http://www.larsgeorge.com/feeds/2102802439764735087/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.larsgeorge.com/2009/02/string-starts-with-number-in-xsl.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/860423771829255614/posts/default/2102802439764735087?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/860423771829255614/posts/default/2102802439764735087?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/Lineland/~3/JEQ9Zh65jbA/string-starts-with-number-in-xsl.html" title="String starts with a number in XSL" /><author><name>Lars George</name><uri>http://www.blogger.com/profile/18168538475015227467</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="07492346046977538662" /></author><thr:total>0</thr:total><feedburner:origLink>http://www.larsgeorge.com/2009/02/string-starts-with-number-in-xsl.html</feedburner:origLink></entry><entry gd:etag="W/&quot;AkIHRH4zeCp7ImA9WxVQFko.&quot;"><id>tag:blogger.com,1999:blog-860423771829255614.post-2511722452909278051</id><published>2009-02-02T05:27:00.001-08:00</published><updated>2009-02-03T08:28:55.080-08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-02-03T08:28:55.080-08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="work" /><category scheme="http://www.blogger.com/atom/ns#" term="hbase" /><category scheme="http://www.blogger.com/atom/ns#" term="hadoop" /><title>Hadoop 
Scripts</title><content type="html">If you work with HBase, and Hadoop in particular, you start off doing most things on the command line. After a while this gets tedious and - in the end - becomes a nuisance. And it is error prone! While I wish there were an existing and established solution out there that helps manage a Hadoop cluster, I find that there are few that you can use right now. One of the few that come to mind is the "Hadoop on Demand" (HOD) package residing in the contribution folder of the Hadoop releases. The other is &lt;a href="http://hadoop.apache.org/zookeeper/docs/current/"&gt;ZooKeeper&lt;/a&gt;. &lt;br /&gt;&lt;br /&gt;Interesting things are in the pipeline though, for example &lt;a href="http://wiki.apache.org/hadoop-data/attachments/HadoopPresentations/attachments/aw-apachecon-eu-2008.pdf"&gt;Simon&lt;/a&gt; from Yahoo!.&lt;br /&gt;&lt;br /&gt;What I often need are small helpers that allow me to clean up behind me or that help me deploy new servers. There are different solutions that usually involve some sort of combination of SSH and rsync. Tools I found, and in some cases even tried, are &lt;a href="http://www.smartfrog.org/"&gt;SmartFrog&lt;/a&gt;, &lt;a href="http://reductivelabs.com/trac/puppet/"&gt;Puppet&lt;/a&gt;, and &lt;a href="http://tentakel.biskalar.de/"&gt;Tentakel&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Especially in the beginning you often find that you do not know these tools well enough or that they do one thing - but not another. Of course, you can combine them and make that work somehow. I usually resort to a set of well-known and proven scripts that I created over time to simplify working with a particular system. 
With Hadoop most of these scripts are run on the master and since it already is set up to use SSH to talk to all slaves it makes it easy to use the same mechanism.&lt;br /&gt;&lt;br /&gt;The first one is to show all Java processes across the machines to see that they are up - or all shut down before attempting a new start:&lt;br /&gt;&lt;pre name="code" class="bash"&gt;#!/bin/bash&lt;br /&gt;# $Revision: 1.0 $&lt;br /&gt;#&lt;br /&gt;# Shows all Java processes on the Hadoop cluster.&lt;br /&gt;#&lt;br /&gt;# Created 2008/01/07 by Lars George&lt;br /&gt;#&lt;br /&gt;&lt;br /&gt;servers="$(cat /usr/local/hadoop/conf/masters /usr/local/hadoop/conf/slaves)"&lt;br /&gt;&lt;br /&gt;for srv in $servers; do&lt;br /&gt;  echo "Sending command to $srv..."; &lt;br /&gt;  ssh $srv "ps aux | grep -v grep | grep java"&lt;br /&gt;done&lt;br /&gt;&lt;br /&gt;echo "done."&lt;/pre&gt;&lt;br /&gt;The next one is a poor-man's deployment thingamajig. It helps copying a new release across the machines and setting up the symbolic link I use for the current version in production. Of course this all varies with your setup.&lt;br /&gt;&lt;pre name="code" class="bash"&gt;#!/bin/bash&lt;br /&gt;# $Revision: 1.0 $&lt;br /&gt;#&lt;br /&gt;# Rsync's Hadoop files across all slaves. 
Must run on namenode.&lt;br /&gt;#&lt;br /&gt;# Created 2008/01/03 by Lars George&lt;br /&gt;#&lt;br /&gt;&lt;br /&gt;if [ "$#" != "2" ]; then&lt;br /&gt;  echo "usage: $(basename $0) &amp;lt;dir-name&amp;gt; &amp;lt;ln-name&amp;gt;"&lt;br /&gt;  echo "  example: $(basename $0) hbase-0.1 hbase"&lt;br /&gt;  exit 1&lt;br /&gt;fi&lt;br /&gt;&lt;br /&gt;for srv in $(cat /usr/local/hadoop/conf/slaves); do&lt;br /&gt;  echo "Sending command to $srv..."; &lt;br /&gt;  rsync -vaz --exclude='logs/*' /usr/local/$1 $srv:/usr/local/&lt;br /&gt;  ssh $srv "rm -fR /usr/local/$2 ; ln -s /usr/local/$1 /usr/local/$2"&lt;br /&gt;done&lt;br /&gt;&lt;br /&gt;echo "done."&lt;/pre&gt;&lt;br /&gt;I basically download a new version on the master (or build one) and issue a &lt;br /&gt;&lt;br /&gt;&lt;code&gt;$ rsyncnewhadoop hadoop-0.19.0 hadoop&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;Note that the first argument is the directory name relative to "/usr/local", as the script prepends that path itself. It copies the directory across and changes the "/usr/local/hadoop" symbolic link to point to this new release.&lt;br /&gt;&lt;br /&gt;Another helper I use quite often diffs an existing and a new version before I actually copy them across the cluster. It can be used like so:&lt;br /&gt;&lt;br /&gt;&lt;code&gt;$ diffnewversion /usr/local/hbase-0.19.0 /usr/local/hadoop-0.19.0&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;Again I assume that the current version is symlinked as explained above. 
Otherwise you would obviously have to make adjustments.&lt;br /&gt;&lt;pre name="code" class="bash"&gt;#!/bin/bash&lt;br /&gt;#&lt;br /&gt;# Diff's the configuration files between the current symlinked versions and the given one.&lt;br /&gt;#&lt;br /&gt;# Created 2009/01/23 by Lars George&lt;br /&gt;#&lt;br /&gt;&lt;br /&gt;if [[ $# == 0 ]]; then&lt;br /&gt;  echo "usage: $(basename $0) &amp;lt;new_dir&amp;gt; [&amp;lt;new_dir&amp;gt;]"&lt;br /&gt;  exit 1;&lt;br /&gt;fi&lt;br /&gt;&lt;br /&gt;DIRS="conf bin"&lt;br /&gt;&lt;br /&gt;for path in "$@"; do&lt;br /&gt;  # pick the symlinked install to compare against from the given path&lt;br /&gt;  if [[ "$path" == *hadoop* ]]; then&lt;br /&gt;    kind="hadoop"&lt;br /&gt;  else&lt;br /&gt;    kind="hbase"&lt;br /&gt;  fi&lt;br /&gt;  for dir in $DIRS; do  &lt;br /&gt;    echo&lt;br /&gt;    echo "Comparing $kind $dir directory..."&lt;br /&gt;    echo&lt;br /&gt;    for f in /usr/local/$kind/$dir/*; do &lt;br /&gt;      echo&lt;br /&gt;      echo&lt;br /&gt;      echo "Checking $(basename $f)" &lt;br /&gt;      diff -w $f $path/$dir/$(basename $f)&lt;br /&gt;      if [[ $? == 0 ]]; then&lt;br /&gt;        echo "Files are the same..."&lt;br /&gt;      fi&lt;br /&gt;      echo &lt;br /&gt;      echo "================================================================"&lt;br /&gt;    done &lt;br /&gt;  done&lt;br /&gt;done&lt;br /&gt;&lt;br /&gt;echo "done."&lt;/pre&gt;&lt;br /&gt;The last one I am posting here helps remove the Distributed File System (DFS) after, for example, a complete corruption (I didn't say it happens) or when you want a clean start. &lt;br /&gt;&lt;br /&gt;Note: It assumes that the data is stored under "/data1/hadoop" and "/data2/hadoop" - that is where I have my data. 
If yours is different, adjust the paths or - if you like - grep/awk the hadoop-site.xml and parse the paths out of the "dfs.name.dir" and "dfs.data.dir" properties, respectively.&lt;br /&gt;&lt;pre name="code" class="bash"&gt;#!/bin/bash&lt;br /&gt;# $Revision: 1.0 $&lt;br /&gt;#&lt;br /&gt;# Deletes all files and directories pertaining to the Hadoop DFS.&lt;br /&gt;#&lt;br /&gt;# Created 2008/12/12 by Lars George&lt;br /&gt;#&lt;br /&gt;&lt;br /&gt;servers="$(cat /usr/local/hadoop/conf/masters /usr/local/hadoop/conf/slaves)"&lt;br /&gt;# optionally allow single server use&lt;br /&gt;if [[ $# -gt 0 ]]; then &lt;br /&gt;  servers="$*"&lt;br /&gt;fi&lt;br /&gt;first="$(echo $servers | head -n 1 | awk -F. '{ print $1 }')"&lt;br /&gt;dirs="/tmp/hbase* /tmp/hsperfdata* /tmp/task* /tmp/Jetty* /data1/hadoop/* /data2/hadoop/*"&lt;br /&gt;# disable local globbing so the patterns in $dirs are expanded on the remote side&lt;br /&gt;set -f&lt;br /&gt;&lt;br /&gt;echo "IMPORTANT: Are you sure you want to delete the DFS starting with $first?"&lt;br /&gt;echo "Type \"yes\" to continue:"&lt;br /&gt;read yes&lt;br /&gt;if [ "$yes" == "yes" ]; then&lt;br /&gt;  for srv in $servers; do&lt;br /&gt;    echo "Sending command to $srv..."; &lt;br /&gt;    for dir in $dirs; do&lt;br /&gt;      pa=$(dirname $dir)&lt;br /&gt;      fn=$(basename $dir)&lt;br /&gt;      echo "removing $pa/$fn...";&lt;br /&gt;      ssh $srv "find $pa -name \"$fn\" -type f -delete ; rm -fR $pa/$fn" &lt;br /&gt;    done&lt;br /&gt;  done&lt;br /&gt;else &lt;br /&gt;  echo "aborted."&lt;br /&gt;fi&lt;br /&gt;&lt;br /&gt;echo "done."&lt;/pre&gt;&lt;br /&gt;I have a few others that, for example, let me kill runaway Java processes, sync only config changes across the cluster machines, start and stop safely, and so on. I won't post them here as they are either pretty trivial, like the ones above, or do not differ much. 
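&lt;br /&gt;&lt;br /&gt;To give an idea of how small such a helper can be, here is a minimal sketch of a config-only sync. It is an illustration, not one of the scripts mentioned above: the function name sync_conf_cmds, its two optional arguments and the decision to only print the rsync commands instead of running them are all assumptions of this example.&lt;br /&gt;

```shell
#!/bin/bash
#
# Sketch only: prints the rsync commands that would push the conf directory
# of the symlinked Hadoop install to every slave. Nothing is executed here.
# The function name and both default paths are assumptions of this example.

sync_conf_cmds() {
  # $1: file listing one slave host per line, $2: conf directory to push
  local slaves_file="${1:-/usr/local/hadoop/conf/slaves}"
  local conf_dir="${2:-/usr/local/hadoop/conf}"
  local srv
  for srv in $(cat "$slaves_file"); do
    # The trailing slashes make rsync copy the directory contents in place.
    echo "rsync -vaz $conf_dir/ $srv:$conf_dir/"
  done
}

# Review the output first, then pipe it into a shell to actually run it:
#   sync_conf_cmds | bash
```

&lt;br /&gt;Printing the commands first keeps the sketch harmless on a live cluster; once the output looks right it can be piped into a shell.&lt;br /&gt;&lt;br /&gt;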
Let me know if you have similar scripts or better ones!
</content><link rel="replies" type="application/atom+xml" href="http://www.larsgeorge.com/feeds/2511722452909278051/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.larsgeorge.com/2009/02/hadoop-scripts-part-1.html#comment-form" title="2 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/860423771829255614/posts/default/2511722452909278051?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/860423771829255614/posts/default/2511722452909278051?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/Lineland/~3/9FRO5_3AOsQ/hadoop-scripts-part-1.html" title="Hadoop Scripts" /><author><name>Lars George</name><uri>http://www.blogger.com/profile/18168538475015227467</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="07492346046977538662" /></author><thr:total>2</thr:total><feedburner:origLink>http://www.larsgeorge.com/2009/02/hadoop-scripts-part-1.html</feedburner:origLink></entry></feed>
