<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" version="2.0">

<channel>
	<title>DataStax</title>
	
	<link>http://www.datastax.com</link>
	<description>DataStax - Software, support, and training for Apache Cassandra</description>
	<lastBuildDate>Fri, 24 Feb 2012 00:25:23 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1</generator>
		<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.feedburner.com/datastax-dev" /><feedburner:info xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" uri="datastax-dev" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><item>
		<title>Shopsavvy, big data, and Cassandra</title>
		<link>http://www.datastax.com/dev/blog/shopsavvy-big-data-and-cassandra</link>
		<comments>http://www.datastax.com/dev/blog/shopsavvy-big-data-and-cassandra#comments</comments>
		<pubDate>Thu, 23 Feb 2012 14:25:02 +0000</pubDate>
		<dc:creator>Robin Schumacher</dc:creator>
				<category><![CDATA[Blog Post]]></category>

		<guid isPermaLink="false">http://www.datastax.com/?post_type=dev-post&amp;p=9402</guid>
		<description><![CDATA[
When you get a chance, check out this <a href="http://shopsavvy.mobi/2012/02/18/big-data-meet-shopsavvy/"> blog post</a> from Shopsavvy on how they&#8217;re using Cassandra and Hadoop to manage their big data systems. Some interesting stats there&#8230;]]></description>
			<content:encoded><![CDATA[<p>
When you get a chance, check out this <a href="http://shopsavvy.mobi/2012/02/18/big-data-meet-shopsavvy/"> blog post</a> from Shopsavvy on how they&#8217;re using Cassandra and Hadoop to manage their big data systems. Some interesting stats there&#8230;]]></content:encoded>
			<wfw:commentRss>http://www.datastax.com/dev/blog/shopsavvy-big-data-and-cassandra/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Coming up in Cassandra 1.1: Row Level Isolation</title>
		<link>http://www.datastax.com/dev/blog/row-level-isolation</link>
		<comments>http://www.datastax.com/dev/blog/row-level-isolation#comments</comments>
		<pubDate>Tue, 21 Feb 2012 14:46:57 +0000</pubDate>
		<dc:creator>Sylvain Lebresne</dc:creator>
		
		<guid isPermaLink="false">http://www.datastax.com/?post_type=dev-post&amp;p=9373</guid>
		<description><![CDATA[While Apache Cassandra does not provide <a href="http://en.wikipedia.org/wiki/ACID">ACID</a> properties (no complex transactions support), it still provides some useful atomicity guarantees.



More precisely, Cassandra has always provided row-level atomicity of batch mutations. This means that multiple batched writes to the same row are persisted by nodes atomically. When doing
<code></code>&#8230;]]></description>
			<content:encoded><![CDATA[<p>
While Apache Cassandra does not provide <a href="http://en.wikipedia.org/wiki/ACID">ACID</a> properties (no complex transactions support), it still provides some useful atomicity guarantees.
</p>

<p>
More precisely, Cassandra has always provided row-level atomicity of batch mutations. This means that multiple batched writes to the same row are persisted by nodes atomically. When doing
<code>
&nbsp;&nbsp;&nbsp;&nbsp;UPDATE Users
&nbsp;&nbsp;&nbsp;&nbsp;SET login='eric22', password='f3g$dq!'
&nbsp;&nbsp;&nbsp;&nbsp;WHERE key='550e8400-e29b-41d4-a716-446655440000'
</code>
Cassandra guarantees that the new <tt>login</tt> and <tt>password</tt> are either both persisted or neither is.
</p>

<p>
However, up to Cassandra 1.0, the isolation of such an update was not guaranteed. In other words, it was possible (for a very brief moment during the update) that a read like<br />
<code>
&nbsp;&nbsp;&nbsp;&nbsp;SELECT login, password
&nbsp;&nbsp;&nbsp;&nbsp;FROM Users
&nbsp;&nbsp;&nbsp;&nbsp;WHERE key='550e8400-e29b-41d4-a716-446655440000'
</code>
returns the new login (<tt>'eric22'</tt>) but not the new password (<tt>'f3g$dq!'</tt>). This changes in Cassandra 1.1 as row-level updates are now <a href="https://issues.apache.org/jira/browse/CASSANDRA-2893">made in isolation</a>. Cassandra 1.1 guarantees that if you update both the login and password in the same update (for the same row key) then no concurrent read may see only a partial update.
</p>

<p>
These atomicity and isolation guarantees apply to columns written under the same physical row, i.e. that are within the same column family and share the same <a href="http://www.datastax.com/dev/blog/schema-in-cassandra-1-1">partition key</a>. For atomicity, the guarantee actually extends across column families (within the same keyspace): updates for the same partition key are persisted atomically even for different column families. This is not the case however for isolation (updates to different column families are not isolated).
</p>

<p>
Note that when we say that Cassandra persists row-level writes atomically, this applies to each node of the cluster individually; Cassandra does not provide any cluster-wide rollback mechanism. In the preceding example, the guarantee is that the new login cannot be persisted without the new password being persisted too (and vice versa). It is however possible for both to be persisted even if the client operation ends up with a timeout (because not enough nodes have acknowledged the write to satisfy the requested consistency level). It is up to the client to retry a failed write in such cases.
</p>

<h2>Implementation details</h2>


<h3>Atomicity</h3>

<p>
Internally, the row-level atomicity is guaranteed mainly by the commit log.  Upon reception by the coordinator, each write query is transformed into a set of RowMutation objects. Each RowMutation groups all updates for a given row key (even for different column families). On every replica, each RowMutation is first serialized and written to the commit log as one mutation (individually checksummed so that its integrity can be assessed after a failure).  This ensures that on failure, that RowMutation is either replayed entirely (if it had been completely written to the commit log and isn&#8217;t corrupted) or not at all. The other part of guaranteeing the atomicity of persistence comes from the fact that a given RowMutation is applied to one and only one memtable. It follows that the RowMutation (all the updates from a client query for a given row key) can only be persisted together or not at all.
</p>
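<p>
The all-or-nothing replay described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Cassandra's commit log format: the record layout (length, CRC32, JSON payload) and the names <tt>append_mutation</tt> and <tt>replay</tt> are assumptions made for the example.
</p>

```python
import json
import struct
import zlib


def append_mutation(log, row_key, columns):
    """Serialize one row mutation and append it to the log as a single checksummed record."""
    payload = json.dumps({"key": row_key, "columns": columns}, sort_keys=True).encode("utf-8")
    # Illustrative record layout: 4-byte length, 4-byte CRC32, then the payload.
    log.extend(struct.pack(">II", len(payload), zlib.crc32(payload)))
    log.extend(payload)


def replay(log):
    """Return every mutation that was written completely and passes its checksum."""
    mutations = []
    offset = 0
    header_size = struct.calcsize(">II")
    while offset + header_size <= len(log):
        length, checksum = struct.unpack_from(">II", log, offset)
        start = offset + header_size
        payload = bytes(log[start:start + length])
        if len(payload) != length or zlib.crc32(payload) != checksum:
            break  # torn or corrupted record: drop it entirely, never apply half of it
        mutations.append(json.loads(payload))
        offset = start + length
    return mutations


log = bytearray()
append_mutation(log, "550e8400", {"login": "eric22", "password": "f3g$dq!"})
append_mutation(log, "550e8401", {"login": "amy7", "password": "s3cret"})

# Simulate a crash that cut off the tail of the second record.
torn_log = log[:-5]
recovered = replay(torn_log)
```

<p>
In this sketch the first mutation is replayed with both of its columns, while the truncated second mutation is skipped as a whole.
</p>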

<h3>Isolation</h3>

<p>
To a large extent, the log-structured nature of Cassandra&#8217;s storage engine makes row-level isolation easier. Writes are applied to memtables that are then persisted as sstables, which are immutable. Thus, ensuring that a RowMutation is applied to the current memtable in isolation (of other writes and reads) is enough to ensure complete isolation. That is what was added in Cassandra 1.1: the application of a RowMutation to memtables in isolation. Technically, we use <a href="https://github.com/nbronson/snaptree/">SnapTree</a>&#8217;s copy-on-write clone facilities: all the columns of a new mutation are applied to a non-visible (and thus isolated) copy of the in-memtable row they are applied to, and then we atomically replace the original row with the new copy through a <a href="http://en.wikipedia.org/wiki/Compare-and-swap">compare-and-set</a>.
</p>
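<p>
The copy-then-swap pattern can be illustrated with a small Python sketch. It is not the Cassandra implementation: a plain <tt>dict</tt> copy stands in for the SnapTree clone, and the <tt>AtomicReference</tt> class (emulating compare-and-set with a lock) and <tt>apply_mutation</tt> are hypothetical names introduced for the example.
</p>

```python
import threading


class AtomicReference:
    """Minimal stand-in for an atomic reference that supports compare-and-set."""

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def get(self):
        return self._value

    def compare_and_set(self, expected, new):
        # Swap only if nobody replaced the value since we read it.
        with self._lock:
            if self._value is expected:
                self._value = new
                return True
            return False


def apply_mutation(row_ref, updates):
    """Apply all column updates to a private copy, then publish the copy atomically."""
    while True:
        current = row_ref.get()
        candidate = dict(current)   # private copy, invisible to readers
        candidate.update(updates)   # apply every column of the mutation
        if row_ref.compare_and_set(current, candidate):
            return candidate        # readers now see all updates at once
        # Another writer won the race: retry against the newer row.


row = AtomicReference({"login": "eric", "password": "old"})
snapshot_before = row.get()
apply_mutation(row, {"login": "eric22", "password": "f3g$dq!"})
snapshot_after = row.get()
```

<p>
A reader holding <tt>snapshot_before</tt> keeps seeing the old login and the old password together; a reader that fetches the row after the swap sees both new values. No reader can observe a mix of the two.
</p>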
]]></content:encoded>
			<wfw:commentRss>http://www.datastax.com/dev/blog/row-level-isolation/feed</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>DataStax Enterprise 1.0.2 Service Pack Now Available</title>
		<link>http://www.datastax.com/dev/blog/datastax-enterprise-1-0-2-service-pack-now-available</link>
		<comments>http://www.datastax.com/dev/blog/datastax-enterprise-1-0-2-service-pack-now-available#comments</comments>
		<pubDate>Fri, 17 Feb 2012 13:39:54 +0000</pubDate>
		<dc:creator>Robin Schumacher</dc:creator>
				<category><![CDATA[Blog Post]]></category>

		<guid isPermaLink="false">http://www.datastax.com/?post_type=dev-post&amp;p=9363</guid>
		<description><![CDATA[
We&#8217;re pleased to let you know that DataStax Enterprise service pack 1.0.2 is now available for <a href="http://www.datastax.com/download/enterprise/versions">download</a>. Please see the <a href="http://www.datastax.com/docs/1.0/datastax_enterprise/dse_release_notes">release notes</a> for the changes included in the service pack and the <a href="http://www.datastax.com/docs">online documentation</a> for upgrade instructions. 

Remember that DataStax Enterprise is completely free for non-production use. ]]></description>
			<content:encoded><![CDATA[<p>
We&#8217;re pleased to let you know that DataStax Enterprise service pack 1.0.2 is now available for <a href="http://www.datastax.com/download/enterprise/versions">download</a>. Please see the <a href="http://www.datastax.com/docs/1.0/datastax_enterprise/dse_release_notes">release notes</a> for the changes included in the service pack and the <a href="http://www.datastax.com/docs">online documentation</a> for upgrade instructions. 

Remember that DataStax Enterprise is completely free for non-production use. ]]></content:encoded>
			<wfw:commentRss>http://www.datastax.com/dev/blog/datastax-enterprise-1-0-2-service-pack-now-available/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>PDF Documentation Now Available</title>
		<link>http://www.datastax.com/dev/blog/pdf-documentation-now-available</link>
		<comments>http://www.datastax.com/dev/blog/pdf-documentation-now-available#comments</comments>
		<pubDate>Fri, 17 Feb 2012 00:25:25 +0000</pubDate>
		<dc:creator>Robin Schumacher</dc:creator>
				<category><![CDATA[Blog Post]]></category>

		<guid isPermaLink="false">http://www.datastax.com/?post_type=dev-post&amp;p=9358</guid>
		<description><![CDATA[
I&#8217;m happy to let you know that our docs team has delivered on one of your top requests, which was to make our documentation available in downloadable PDF format. You can now find both the online HTML and PDF formats on our <a href="http://www.datastax.com/docs">documentation page</a>. Enjoy!]]></description>
			<content:encoded><![CDATA[<p>
I&#8217;m happy to let you know that our docs team has delivered on one of your top requests, which was to make our documentation available in downloadable PDF format. You can now find both the online HTML and PDF formats on our <a href="http://www.datastax.com/docs">documentation page</a>. Enjoy!]]></content:encoded>
			<wfw:commentRss>http://www.datastax.com/dev/blog/pdf-documentation-now-available/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>What’s your Biggest Cassandra Pain? Vote in New Poll…</title>
		<link>http://www.datastax.com/dev/blog/whats-your-biggest-cassandra-pain-vote-in-new-poll</link>
		<comments>http://www.datastax.com/dev/blog/whats-your-biggest-cassandra-pain-vote-in-new-poll#comments</comments>
		<pubDate>Thu, 16 Feb 2012 23:51:55 +0000</pubDate>
		<dc:creator>Robin Schumacher</dc:creator>
				<category><![CDATA[Blog Post]]></category>

		<guid isPermaLink="false">http://www.datastax.com/?post_type=dev-post&amp;p=9353</guid>
		<description><![CDATA[Is it performance tuning? Using the Cassandra data model? Loading data? Something else? 

We&#8217;re figuring out the Cassandra roadmap for the rest of 2012 and need input now on your biggest Cassandra pain point. Help us out and take a couple of seconds to <a href="http://www.datastax.com/dev">vote in our new</a>&#8230;]]></description>
			<content:encoded><![CDATA[<p>
Is it performance tuning? Using the Cassandra data model? Loading data? Something else? 

We&#8217;re figuring out the Cassandra roadmap for the rest of 2012 and need input now on your biggest Cassandra pain point. Help us out and take a couple of seconds to <a href="http://www.datastax.com/dev">vote in our new poll</a>. Thanks for the help!  ]]></content:encoded>
			<wfw:commentRss>http://www.datastax.com/dev/blog/whats-your-biggest-cassandra-pain-vote-in-new-poll/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Schema in Cassandra 1.1</title>
		<link>http://www.datastax.com/dev/blog/schema-in-cassandra-1-1</link>
		<comments>http://www.datastax.com/dev/blog/schema-in-cassandra-1-1#comments</comments>
		<pubDate>Wed, 15 Feb 2012 22:01:18 +0000</pubDate>
		<dc:creator>Jonathan Ellis</dc:creator>
		
		<guid isPermaLink="false">http://www.datastax.com/?post_type=dev-post&amp;p=9307</guid>
		<description><![CDATA[The evolution of schema in Cassandra
When Cassandra was first released several years ago, it followed closely the data model outlined in Google&#8217;s <a href="http://research.google.com/archive/bigtable.html">Bigtable</a> paper (with the notable addition of SuperColumns &#8212; more on these later): ColumnFamilies grouping related columns needed to be defined up-front, but column names were&#8230;]]></description>
			<content:encoded><![CDATA[<h2>The evolution of schema in Cassandra</h2>
When Cassandra was first released several years ago, it followed closely the data model outlined in Google&#8217;s <a href="http://research.google.com/archive/bigtable.html">Bigtable</a> paper (with the notable addition of SuperColumns &#8212; more on these later): ColumnFamilies grouping related columns needed to be defined up-front, but column names were just byte arrays interpreted by the application.  It would be fair to characterize this early Cassandra data model as &#8220;schemaless.&#8221;

However, as systems deployed on Cassandra grew and matured, lack of schema became a pain point.  When multiple teams are using the same data, it&#8217;s very useful to be able to ask &#8220;what data is in this table (or columnfamily),&#8221; without diving into the source of the code that uses it.  And as more codebases share a database, it also becomes more useful to have the database validate that the birth_date column in one row is always an integer.

So, starting with the 0.7 release roughly a year ago, Cassandra has first allowed, then encouraged telling Cassandra about your data types.  I&#8217;ve taken to describing Cassandra as &#8220;Schema-optional:&#8221; it&#8217;s not required, and you can ignore it at first then go back and add it later if you&#8217;d rather, but it&#8217;s a good habit to get into.  Today, doing this in <a href="http://www.datastax.com/dev/blog/what%E2%80%99s-new-in-cassandra-0-8-part-1-cql-the-cassandra-query-language">CQL</a> looks familiar:

<code>
CREATE TABLE users (
&nbsp;&nbsp;&nbsp;&nbsp;id uuid PRIMARY KEY,
&nbsp;&nbsp;&nbsp;&nbsp;name varchar,
&nbsp;&nbsp;&nbsp;&nbsp;state varchar
);</code>


<code>ALTER TABLE users ADD birth_date INT;
</code>

(Using UUIDs as a <a href="http://en.wikipedia.org/wiki/Surrogate_key">surrogate key</a> is common in Cassandra, so that you don&#8217;t need to worry about sequence or autoincrement synchronization across multiple machines.)
<h2>The best of both worlds</h2>
Superficially it may sound like Cassandra is headed back where relational databases started: every column predefined and typed.  The big difference is in the practical limitations of Cassandra&#8217;s <a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.44.2782">log-structured merge-tree</a> storage engine, compared to RDBMS <a href="http://en.wikipedia.org/wiki/B-tree">b-trees</a>.

Without going into too much detail, traditional storage engines allocate room for each column in each row, up front.  (Rows that have different sets of columns are grudgingly accommodated via nulls.)

<div id="attachment_9331" class="wp-caption aligncenter" style="width: 479px"><a rel="attachment wp-att-9331" href="http://www.datastax.com/wp-content/uploads/2012/02/Screen-shot-2012-02-16-at-4.06.24-PM.png" rel="facebox"><img class="size-full wp-image-9331" title="Static columns" src="http://www.datastax.com/wp-content/uploads/2012/02/Screen-shot-2012-02-16-at-4.06.24-PM.png" alt="" width="469" height="211" /></a><p class="wp-caption-text">In a static-column storage engine, each row must reserve space for every column</p></div>

In Cassandra&#8217;s storage engine, each row is sparse: for a given row, we store only the columns present in that row.  Technically this implies that we store the column names redundantly in each row, trading disk space to gain flexibility.  Thus, adding columns to a Cassandra table always only takes a few milliseconds, rather than growing from minutes to hours or even <a href="http://twitoaster.com/country-us/abraham/15-months-ago-twitter-had-800-million-tweets-and-alter-table-took-2-weeks-twitter-is-now-over-12-billion-chirp/">weeks</a> as data is added to the table with a storage engine that needs to re-allocate space row by row to accommodate the new data.
<p style="text-align: center;">&nbsp;</p>


<div id="attachment_9333" class="wp-caption aligncenter" style="width: 578px"><a rel="attachment wp-att-9333" href="http://www.datastax.com/wp-content/uploads/2012/02/Screen-shot-2012-02-16-at-4.11.01-PM1.png" rel="facebox"><img class="size-full wp-image-9333 " title="Sparse columns" src="http://www.datastax.com/wp-content/uploads/2012/02/Screen-shot-2012-02-16-at-4.11.01-PM1.png" alt="" width="568" height="183" /></a><p class="wp-caption-text">In a sparse-column engine, space is only used by columns present in each row</p></div>

This also means that Cassandra can easily support thousands of columns per table, without wasting space if each row only needs a few of them.

Thus, Cassandra gives you the flexibility normally associated with schemaless systems, while also delivering the benefits of having a defined schema.
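
<p>
The difference between the two layouts can be made concrete with a toy Python model. This is a deliberately simplified sketch: the row contents are made up, and <tt>add_column_static</tt> and <tt>add_column_sparse</tt> are hypothetical helpers that only count how many stored rows an <tt>ALTER TABLE ... ADD</tt> would have to touch.
</p>

```python
# Static layout: every row reserves a slot for every declared column.
static_columns = ["name", "state", "birth_date"]
static_rows = {
    "u1": ["Ann", "TX", None],
    "u2": ["Bob", None, None],
}

# Sparse layout: each row stores only the (column name, value) pairs it has.
sparse_schema = {"name", "state", "birth_date"}
sparse_rows = {
    "u1": {"name": "Ann", "state": "TX"},
    "u2": {"name": "Bob"},
}


def add_column_static(columns, rows, name):
    """Adding a column touches every existing row to reserve its slot."""
    columns.append(name)
    touched = 0
    for values in rows.values():
        values.append(None)
        touched += 1
    return touched


def add_column_sparse(schema, name):
    """Adding a column only records it in the schema; stored rows are untouched."""
    schema.add(name)
    return 0


static_touched = add_column_static(static_columns, static_rows, "email")
sparse_touched = add_column_sparse(sparse_schema, "email")
```

<p>
With the static layout the cost of the change grows with the number of rows; with the sparse layout it stays constant, at the price of repeating column names in every row.
</p>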
<h2>Clustering, composite keys, and more</h2>
Starting in the upcoming Cassandra 1.1 release, CQL (the <a href="http://www.datastax.com/dev/blog/what%E2%80%99s-new-in-cassandra-0-8-part-1-cql-the-cassandra-query-language">Cassandra Query Language</a>) supports defining columnfamilies with composite primary keys.  The first column in a composite key definition continues to be used as the <a href="http://www.datastax.com/docs/1.0/cluster_architecture/partitioning">partition key</a>, and remaining columns are automatically <a href="http://en.wikipedia.org/wiki/Database_index#Clustered">clustered</a>: that is, all the rows sharing a given partition key will be sorted by the remaining components of the primary key.

For example, consider the <tt>sblocks</tt> table in the <a href="http://www.datastax.com/dev/blog/cassandra-file-system-design">CassandraFS</a> data model:

<code>
CREATE TABLE sblocks (
&nbsp;&nbsp;&nbsp;&nbsp;block_id uuid,
&nbsp;&nbsp;&nbsp;&nbsp;subblock_id uuid,
&nbsp;&nbsp;&nbsp;&nbsp;data blob,
&nbsp;&nbsp;&nbsp;&nbsp;PRIMARY KEY (block_id, subblock_id)
)
WITH COMPACT STORAGE;
</code>

The first element of the primary key, <tt>block_id</tt>, is the partition key, which means that all subblocks of a given block will be routed to the same replicas.  For each block, subblocks are also ordered by the subblock id.  <a href="http://www.datastax.com/products/enterprise">DataStax Enterprise</a> uses this property to make sure that <tt>SELECT data FROM sblocks WHERE block_id = ?</tt> is sequential i/o in subblock_id order.

Composite keys can also be useful when denormalizing data for faster queries.  Consider a Twitter data model like <a href="https://github.com/twissandra/twissandra">Twissandra&#8217;s</a>.  We have tweet data:

<code>
CREATE TABLE tweets (
&nbsp;&nbsp;&nbsp;&nbsp;tweet_id uuid PRIMARY KEY,
&nbsp;&nbsp;&nbsp;&nbsp;author varchar,
&nbsp;&nbsp;&nbsp;&nbsp;body varchar
);
</code>

But the most frequent query (&#8220;show me the 20 most recent tweets from people I follow&#8221;) would be expensive against a normalized model.  So we denormalize into another table:

<code>
CREATE TABLE timeline (
&nbsp;&nbsp;&nbsp;&nbsp;user_id varchar,
&nbsp;&nbsp;&nbsp;&nbsp;tweet_id uuid,
&nbsp;&nbsp;&nbsp;&nbsp;author varchar,
&nbsp;&nbsp;&nbsp;&nbsp;body varchar,
&nbsp;&nbsp;&nbsp;&nbsp;PRIMARY KEY (user_id, tweet_id)
);
</code>

That is, any time a given author makes a tweet, we look up who follows him, and insert a copy of the tweet into the followers&#8217; timeline.  Cassandra orders <a href="http://en.wikipedia.org/wiki/Universally_unique_identifier#Version_1_.28MAC_address.29">version 1 UUIDs</a> by their time component, so <tt>SELECT * FROM timeline WHERE user_id = ? ORDER BY tweet_id DESC LIMIT 20</tt> requires no sort at query time.

(At the time of this writing, ORDER BY syntax is <a href="https://issues.apache.org/jira/browse/CASSANDRA-3925">being finalized</a>; this is my best guess as to what it will look like.)
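
<p>
To make the write-time fan-out concrete, here is a small in-memory Python sketch of the two tables. It is an illustration under stated assumptions, not driver code: the follower lists and tweet bodies are made up, <tt>timeuuid</tt> is a hypothetical helper that builds a version 1 UUID from an explicit timestamp, and the sketch sorts at read time for brevity, whereas Cassandra keeps each partition sorted as it is written.
</p>

```python
import uuid
from collections import defaultdict


def timeuuid(timestamp):
    """Build a version 1 UUID with the given 60-bit time field (hypothetical helper)."""
    fields = (timestamp & 0xFFFFFFFF, (timestamp >> 32) & 0xFFFF,
              (timestamp >> 48) & 0x0FFF, 0x80, 0, 1)
    return uuid.UUID(fields=fields, version=1)


followers = {"gwashington": ["jadams", "ahamilton"], "jmadison": ["jadams"]}
tweets = {}                    # tweet_id -> (author, body), like the tweets table
timeline = defaultdict(dict)   # user_id -> {tweet_id: (author, body)}, like the timeline table


def post_tweet(tweet_id, author, body):
    """Store the tweet, then copy it into each follower's timeline partition."""
    tweets[tweet_id] = (author, body)
    for user_id in followers.get(author, []):
        timeline[user_id][tweet_id] = (author, body)


def recent_tweets(user_id, limit=20):
    """Newest first, ordered by the time component of the version 1 UUIDs."""
    ids = sorted(timeline[user_id], key=lambda tid: tid.time, reverse=True)
    newest = ids[:limit]
    return [timeline[user_id][tid] for tid in newest]


post_tweet(timeuuid(1), "gwashington", "first tweet")
post_tweet(timeuuid(2), "jmadison", "second tweet")
post_tweet(timeuuid(3), "gwashington", "third tweet")
latest_for_jadams = recent_tweets("jadams", limit=2)
latest_for_ahamilton = recent_tweets("ahamilton")
```

<p>
Reading a timeline touches a single partition and never joins against the followers list, which is the point of the denormalization.
</p>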
<h2>Under the hood and historical notes</h2>
Cassandra&#8217;s storage engine uses <a href="http://www.datastax.com/dev/blog/introduction-to-composite-columns-part-1">composite columns</a> under the hood to store clustered rows. This means that all the logical rows with the same partition key get stored as a single physical &#8220;wide row.&#8221; This is why Cassandra supports up to <a href="https://blogs.apache.org/foundation/entry/the_apache_software_foundation_announces9">2 billion columns</a> per (physical) row, and why Cassandra&#8217;s old Thrift api has methods to take &#8220;slices&#8221; of such rows.

To illustrate this, let&#8217;s consider three tweets for our timeline data model above:
<p style="text-align: center;">

<div id="attachment_9336" class="wp-caption aligncenter" style="width: 622px"><a rel="attachment wp-att-9336" href="http://www.datastax.com/wp-content/uploads/2012/02/Screen-shot-2012-02-16-at-4.13.49-PM.png" rel="facebox"><img class="size-full wp-image-9336 " title="Tweets" src="http://www.datastax.com/wp-content/uploads/2012/02/Screen-shot-2012-02-16-at-4.13.49-PM.png" alt="" width="612" height="262" /></a><p class="wp-caption-text">Raw tweet data</p></div>

</p>
We&#8217;ll have timeline entries for jadams, who follows gwashington and jmadison, and ahamilton, who follows gwashington and gmason. I&#8217;ve colored these rows by their partition key, the user_id:
<p style="text-align: center;">

<div id="attachment_9337" class="wp-caption aligncenter" style="width: 548px"><a rel="attachment wp-att-9337" href="http://www.datastax.com/wp-content/uploads/2012/02/Screen-shot-2012-02-16-at-4.17.44-PM.png" rel="facebox"><img class="size-full wp-image-9337 " title="Denormalized tweets, logical representation" src="http://www.datastax.com/wp-content/uploads/2012/02/Screen-shot-2012-02-16-at-4.17.44-PM.png" alt="" width="538" height="206" /></a><p class="wp-caption-text">Logical representation of the denormalized timeline rows</p></div>

</p>
The physical layout of this data looks like this to Cassandra&#8217;s storage engine:

<div id="attachment_9338" class="wp-caption aligncenter" style="width: 589px"><a rel="attachment wp-att-9338" href="http://www.datastax.com/wp-content/uploads/2012/02/Screen-shot-2012-02-16-at-4.20.23-PM.png" rel="facebox"><img class="size-full wp-image-9338" title="Denormalized tweets, physical representation" src="http://www.datastax.com/wp-content/uploads/2012/02/Screen-shot-2012-02-16-at-4.20.23-PM.png" alt="" width="579" height="137" /></a><p class="wp-caption-text">Physical representation of the denormalized timeline rows</p></div>
<p style="text-align: center;">&nbsp;</p>
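<p>
The mapping from logical rows to a physical wide row can be sketched in Python as follows. This is a simplified model: the values are made up, tweet ids are shortened to integers, and <tt>to_wide_rows</tt> is a hypothetical helper. It assumes each physical column name is a composite of the clustering value and the logical column name.
</p>

```python
from collections import defaultdict

# Logical rows of the timeline table: (user_id, tweet_id, author, body).
logical_rows = [
    ("jadams", 2, "jmadison", "second tweet"),
    ("jadams", 1, "gwashington", "first tweet"),
    ("ahamilton", 1, "gwashington", "first tweet"),
]


def to_wide_rows(rows):
    """Group logical rows by partition key into one physical row of composite columns."""
    physical = defaultdict(dict)
    for user_id, tweet_id, author, body in rows:
        # Composite column name = (clustering value, logical column name).
        physical[user_id][(tweet_id, "author")] = author
        physical[user_id][(tweet_id, "body")] = body
    # Keep each wide row sorted by composite column name.
    return {key: dict(sorted(cols.items())) for key, cols in physical.items()}


wide = to_wide_rows(logical_rows)
jadams_columns = list(wide["jadams"])
```

<p>
Both of jadams' logical rows end up in one physical row, with the columns of each tweet adjacent and tweets ordered by <tt>tweet_id</tt>. That ordering is what makes a slice over a range of tweet ids a contiguous read.
</p>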
The  <tt>WITH COMPACT STORAGE</tt> directive is provided for backwards compatibility with older Cassandra applications, as in the CassandraFS example above; new applications should avoid it.  Using <tt>COMPACT STORAGE</tt> will prevent you from adding new columns that are not part of the <tt>PRIMARY KEY</tt>. With <tt>COMPACT STORAGE</tt>, each logical row corresponds to exactly one physical column:

<div id="attachment_9339" class="wp-caption aligncenter" style="width: 415px"><a rel="attachment wp-att-9339" href="http://www.datastax.com/wp-content/uploads/2012/02/Screen-shot-2012-02-16-at-4.21.40-PM.png" rel="facebox"><img class="size-full wp-image-9339" title="WITH COMPACT" src="http://www.datastax.com/wp-content/uploads/2012/02/Screen-shot-2012-02-16-at-4.21.40-PM.png" alt="" width="405" height="137" /></a><p class="wp-caption-text">Physical representation of the denormalized timeline rows, WITH COMPACT STORAGE</p></div>

<a href="http://www.datastax.com/docs/1.0/ddl/column_family">SuperColumns</a> were an early attempt at providing the same kinds of denormalization tools discussed above.  They have important limitations (e.g., reading any subcolumn from a SuperColumn pulls the entire SuperColumn into memory) and will eventually be <a href="https://issues.apache.org/jira/browse/CASSANDRA-3237">replaced by a composite column implementation</a> with the same API.  So if you have an application using SuperColumns, you don&#8217;t need to rewrite anything, but if you are starting fresh, you should use the more flexible approach described above.]]></content:encoded>
			<wfw:commentRss>http://www.datastax.com/dev/blog/schema-in-cassandra-1-1/feed</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Cassandra File System Design</title>
		<link>http://www.datastax.com/dev/blog/cassandra-file-system-design</link>
		<comments>http://www.datastax.com/dev/blog/cassandra-file-system-design#comments</comments>
		<pubDate>Fri, 10 Feb 2012 17:38:16 +0000</pubDate>
		<dc:creator>Jake Luciani</dc:creator>
				<category><![CDATA[Blog Post]]></category>
		<category><![CDATA[Cassandra]]></category>
		<category><![CDATA[hadoop]]></category>
		<category><![CDATA[hdfs]]></category>

		<guid isPermaLink="false">http://www.datastax.com/?post_type=dev-post&amp;p=9249</guid>
		<description><![CDATA[The Cassandra File System (CFS) is an <a href="http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/fs/FileSystem.html">HDFS compatible filesystem</a> built to replace the traditional Hadoop NameNode, Secondary NameNode and DataNode daemons. It is the foundation of our Hadoop support in <a href="http://www.datastax.com/solutions/reliable-scalable-hadoop">DataStax Enterprise</a>.

The main design goals for the Cassandra File System were to first, simplify the operational overhead&#8230;]]></description>
			<content:encoded><![CDATA[<p>The Cassandra File System (CFS) is an <a href="http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/fs/FileSystem.html">HDFS compatible filesystem</a> built to replace the traditional Hadoop NameNode, Secondary NameNode and DataNode daemons. It is the foundation of our Hadoop support in <a href="http://www.datastax.com/solutions/reliable-scalable-hadoop">DataStax Enterprise</a>.</p>

<p>The main design goals for the Cassandra File System were, first, to simplify the operational overhead of Hadoop by removing the single points of failure in the Hadoop NameNode and, second, to offer easy Hadoop integration for Cassandra users (one distributed system is enough).</p>

<p>In order to support massive files in Cassandra we came up with a novel approach that relies heavily on the fundamentals of Cassandra&#8217;s architecture, both Dynamo and BigTable.</p>
<h2>CFS Data Model</h2>
<a rel="attachment wp-att-9250" href="http://www.datastax.com/wp-content/uploads/2012/02/CFSmodel-1.png" rel="facebox"><img class="aligncenter size-large wp-image-9250" title="CFSmodel " src="http://www.datastax.com/wp-content/uploads/2012/02/CFSmodel-1-1024x716.png" alt="" width="550" height="384" /></a>

<p><strong>CFS</strong> is modeled as a Keyspace with two Column Families in Cassandra.  The Keyspace is where replication settings are, so unlike HDFS, you can&#8217;t change replication per file.  You can, however, accomplish the same with multiple CFS Keyspaces. The two Column Families represent the two primary HDFS services. The HDFS NameNode service, which tracks each file&#8217;s metadata and block locations, is replaced with the &#8220;inode&#8221; Column Family.  The HDFS DataNode service, which stores file blocks, is replaced with the &#8220;sblocks&#8221; Column Family. By doing this we have removed three services of the traditional Hadoop stack and replaced them with one fault-tolerant, scalable component.</p>

<p>The &#8216;inode&#8217; Column Family contains meta information about a file and uses a <a href="https://github.com/rantav/hector/wiki/DynamicCompositeType-with-templates">DynamicCompositeType comparator</a>.  Meta information includes: filename, parent path, user, group, permissions, filetype and a list of block ids that make up the file. For block ids it uses <a href="http://en.wikipedia.org/wiki/Universally_Unique_Identifier#Version_1_.28MAC_address.29">TimeUUID</a>, so blocks are naturally ordered by creation time.  This makes supporting <a href="http://www.cloudera.com/blog/2009/07/file-appends-in-hdfs/">HDFS append</a>() simple.</p>

<a rel="attachment wp-att-9251" href="http://www.datastax.com/wp-content/uploads/2012/02/Screen-Shot-2012-02-10-at-11.21.55-AM.png" rel="facebox"><img class="aligncenter size-full wp-image-9251" title="inode model" src="http://www.datastax.com/wp-content/uploads/2012/02/Screen-Shot-2012-02-10-at-11.21.55-AM.png" alt="" width="479" height="301" /></a>

<p>Secondary indexes are used to support operations like “ls” and “rmdir”. The corresponding <a href="http://www.datastax.com/docs/1.0/dml/using_cql">CQL</a> looks something like:</p>
<pre>select filename from inode where parent_path='/tmp';</pre>
<pre>select filename from inode where filename &gt; '/tmp' and filename &lt; '/tmq' and sentinel = 'x';</pre>
<p>The sentinel is needed because of how Cassandra implements <a href="http://www.datastax.com/dev/blog/whats-new-cassandra-07-secondary-indexes">secondary indexes</a>. It chooses the predicate with the most selectivity and filters from there.</p>
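
<p>The bounds in the second query (<tt>'/tmp'</tt> and <tt>'/tmq'</tt>) express a prefix match as a range: the upper bound is the prefix with its last character incremented. A minimal Python sketch of that trick follows; the file names are made up, and <tt>prefix_upper_bound</tt> and <tt>list_by_prefix</tt> are hypothetical helpers, not part of CFS.</p>

```python
def prefix_upper_bound(prefix):
    """Return the prefix with its last character incremented, e.g. '/tmp' becomes '/tmq'."""
    return prefix[:-1] + chr(ord(prefix[-1]) + 1)


def list_by_prefix(filenames, prefix):
    """Emulate the range predicate: filename > prefix AND filename < upper bound."""
    upper = prefix_upper_bound(prefix)
    return sorted(name for name in filenames if prefix < name < upper)


filenames = ["/tmp", "/tmp/a.log", "/tmp/b/c.txt", "/tmq/x", "/var/log/syslog"]
upper = prefix_upper_bound("/tmp")
matches = list_by_prefix(filenames, "/tmp")
```

<p>Because the comparison is strict, the directory entry <tt>/tmp</tt> itself is excluded, and everything that sorts between the two bounds is returned.</p>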

<p>Since files are looked up through secondary indexes, CFS can use a <a href="http://en.wikipedia.org/wiki/Universally_unique_identifier">UUID</a> as the row key, giving us good distribution of data across the cluster. To summarize, an entire file&#8217;s meta information is stored under one wide row key.</p>

<p>The second column family &#8216;sblocks&#8217; stores the actual contents of the file.</p>

<a rel="attachment wp-att-9252" href="http://www.datastax.com/wp-content/uploads/2012/02/Screen-Shot-2012-02-10-at-12.18.32-PM.png" rel="facebox"><img class="aligncenter size-full wp-image-9252" title="sblocks model" src="http://www.datastax.com/wp-content/uploads/2012/02/Screen-Shot-2012-02-10-at-12.18.32-PM.png" alt="" width="474" height="138" /></a>

<p>Each row represents a block of data associated with an inode record.  The row key is a block TimeUUID from an inode record. The columns are time-ordered, compressed sub-blocks that, when decompressed and combined, equal one HDFS block.  Let&#8217;s walk through the read and write paths to make this all more clear.</p>
<h2>CFS Write Path</h2>
<p>Hadoop has the “dfs.block.size” parameter to specify how big a file block should be per file write. When a file comes in, CFS writes the static attributes to the inode column family.</p>

<p>CFS then allocates a new block object and reads dfs.block.size worth of data.   As that data is read, it is split into sub-blocks of size “cfs.local.subblock.size”.  Those sub-blocks are compressed using <a href="http://code.google.com/p/snappy/">Google Snappy compression</a>.  Once a block is finished, the block id is written to the inode row and the sub-blocks are written to Cassandra with the block id as the row key and the sub-block ids as the columns.  CFS splits a block into sub-blocks because it relies on Thrift, which does not support streaming, so it must ensure it doesn&#8217;t OOM the node by sending 256 MB or 512 MB of data at once. To Hadoop, though, the block looks like a single block, so it doesn&#8217;t cause any change in the job split logic of MapReduce.</p>
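
<p>The split, compress and store steps can be sketched in Python. This is an illustration only, not the CFS code: <tt>zlib</tt> from the standard library stands in for Snappy, a dictionary stands in for the sblocks column family, integer indexes stand in for the time-ordered sub-block ids, and the tiny <tt>SUBBLOCK_SIZE</tt> is chosen just to keep the example readable.</p>

```python
import uuid
import zlib

SUBBLOCK_SIZE = 4   # stand-in for cfs.local.subblock.size; tiny on purpose
sblocks = {}        # block id (row key) -> {sub-block id: compressed bytes}


def write_block(data):
    """Split one block into sub-blocks, compress each, and store them under one row key."""
    block_id = uuid.uuid1()  # time-based UUID, standing in for the block TimeUUID
    columns = {}
    for start in range(0, len(data), SUBBLOCK_SIZE):
        chunk = data[start:start + SUBBLOCK_SIZE]
        # zlib is used here instead of Snappy so the sketch has no dependencies.
        columns[start // SUBBLOCK_SIZE] = zlib.compress(chunk)
    sblocks[block_id] = columns
    return block_id


def read_block(block_id):
    """Decompress the sub-blocks in column order and join them back into one block."""
    columns = sblocks[block_id]
    return b"".join(zlib.decompress(columns[index]) for index in sorted(columns))


original = b"hello cassandra fs"
block_id = write_block(original)
restored = read_block(block_id)
subblock_count = len(sblocks[block_id])
```

<p>The 18-byte block is stored as five columns under a single row key, and reading the columns back in order reproduces the original bytes, which is why the block still looks like a single block to Hadoop.</p>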
<h2>CFS Read Path</h2>
<p>When a read comes in for a file or part of a file (let&#8217;s assume Hadoop has already looked up the UUID from the secondary index), CFS reads the inode info and finds the block and sub-block to read.  CFS then executes a custom Thrift call that returns either the specified sub-block data or, if the call was made on a node that holds the data locally, the file and offset information of the Cassandra SSTable file containing the sub-block.  It does this because during a MapReduce task the JobTracker tries to place each computation on the node with the actual data. Reading through the SSTable information is much faster, since the mapper can access the data directly without serializing and deserializing it via Thrift.</p>
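<p>The two possible replies on the read path can be sketched as follows. Everything here is an assumption made for illustration: <code>get_subblock</code> and <code>read_subblock</code> are hypothetical names rather than the real Thrift call, a flat temporary file stands in for an SSTable (the real on-disk format is more involved), and zlib again stands in for Snappy.</p>

```python
import os
import tempfile
import zlib


def get_subblock(store, sstable_index, block_id, subblock_id, caller_is_local):
    """Hypothetical call: a file location for local callers, raw bytes otherwise."""
    key = (block_id, subblock_id)
    if caller_is_local and key in sstable_index:
        sstable_file, offset, length = sstable_index[key]
        return ("location", sstable_file, offset, length)
    return ("data", store[block_id][subblock_id])


def read_subblock(reply):
    """Turn either kind of reply into decompressed bytes."""
    if reply[0] == "location":
        _, sstable_file, offset, length = reply
        with open(sstable_file, "rb") as handle:  # direct read from local disk
            handle.seek(offset)
            compressed = handle.read(length)
    else:
        compressed = reply[1]                     # bytes shipped over the wire
    return zlib.decompress(compressed)            # ASSUMPTION: zlib, not Snappy


original = b"cassandra file system " * 50
compressed = zlib.compress(original)
padding = b"\x00" * 16                            # pretend other data precedes it

# ASSUMPTION: a flat temp file plays the role of the SSTable.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    handle.write(padding + compressed)
    fake_sstable = handle.name

store = {"block-1": {"sub-1": compressed}}
sstable_index = {("block-1", "sub-1"): (fake_sstable, len(padding), len(compressed))}

local = read_subblock(get_subblock(store, sstable_index, "block-1", "sub-1", True))
remote = read_subblock(get_subblock(store, sstable_index, "block-1", "sub-1", False))
os.remove(fake_sstable)
print(local == remote == original)
```

<p>Both paths yield the same bytes. The difference is that the local path only passes a file name, an offset and a length, so the sub-block itself never has to be serialized.</p>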

<p>One question that comes up is why CFS compresses sub-blocks itself rather than using the ColumnFamily compression in Cassandra 1.0.  The reason is that compressing and decompressing on the client side cuts down on the network traffic between nodes.</p>
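<p>A rough way to see the effect is to compare how many bytes would cross the network for one sub-block with and without client-side compression. The sample log line is made up, and zlib is used only as a stand-in for Snappy, so the actual ratio in CFS will differ.</p>

```python
import zlib

# Made-up, highly repetitive sample data standing in for one sub-block.
subblock = b"2012-02-10 12:18:32 GET /index.html 200\n" * 1000

# Compressed on the node only: the client ships the raw bytes.
on_wire_uncompressed = len(subblock)

# Compressed on the client: only the compressed bytes are shipped.
# ASSUMPTION: zlib stands in for Snappy here.
on_wire_compressed = len(zlib.compress(subblock))

print(on_wire_uncompressed, on_wire_compressed)
```

<p>The more compressible the file contents, the larger the saving. Data that is already compressed, such as a .gz file, will see little or no benefit.</p>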
<h2>Integrating with Hadoop</h2>
<p>Hadoop makes it very simple to hook in a custom file system implementation, and everything just works.  Only the following change to core-site.xml is needed:</p>
<pre>&lt;property&gt;
 &lt;name&gt;fs.cfs.impl&lt;/name&gt;
 &lt;value&gt;com.datastax.bdp.hadoop.cfs.CassandraFileSystem&lt;/value&gt;
&lt;/property&gt;</pre>
<p>Now it&#8217;s possible to execute the following commands:</p>
<pre>hadoop fs -copyFromLocal /tmp/giant_log.gz cfs://cassandrahost/tmp</pre>
<p>or even</p>
<pre>hadoop distcp hdfs:/// cfs:///</pre>]]></content:encoded>
			<wfw:commentRss>http://www.datastax.com/dev/blog/cassandra-file-system-design/feed</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>The Benefits of Data Compression in Cassandra</title>
		<link>http://www.datastax.com/dev/blog/the-benefits-of-data-compression-in-cassandra</link>
		<comments>http://www.datastax.com/dev/blog/the-benefits-of-data-compression-in-cassandra#comments</comments>
		<pubDate>Wed, 08 Feb 2012 14:47:35 +0000</pubDate>
		<dc:creator>Robin Schumacher</dc:creator>
				<category><![CDATA[Blog Post]]></category>

		<guid isPermaLink="false">http://www.datastax.com/?post_type=dev-post&amp;p=9240</guid>
		<description><![CDATA[In case you missed it, Edward Capriolo posted a <a href="http://www.edwardcapriolo.com/roller/edwardcapriolo/entry/cassandra_compression_is_like_getting">nice blog entry</a> on his tests with data compression in Cassandra 1.0.7 (compression was added in version 1.0.0). If you&#8217;ve been wondering whether your particular app/deployment of Cassandra would benefit from enabling compression on column families, you should give&#8230;]]></description>
			<content:encoded><![CDATA[<p>
In case you missed it, Edward Capriolo posted a <a href="http://www.edwardcapriolo.com/roller/edwardcapriolo/entry/cassandra_compression_is_like_getting">nice blog entry</a> on his tests with data compression in Cassandra 1.0.7 (compression was added in version 1.0.0). If you&#8217;ve been wondering whether your particular app/deployment of Cassandra would benefit from enabling compression on column families, you should give Edward&#8217;s writeup a look through&#8230; ]]></content:encoded>
			<wfw:commentRss>http://www.datastax.com/dev/blog/the-benefits-of-data-compression-in-cassandra/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Nice Presentation on Cassandra from Scandit</title>
		<link>http://www.datastax.com/dev/blog/nice-presentation-on-cassandra-from-scandit</link>
		<comments>http://www.datastax.com/dev/blog/nice-presentation-on-cassandra-from-scandit#comments</comments>
		<pubDate>Fri, 03 Feb 2012 21:00:58 +0000</pubDate>
		<dc:creator>Robin Schumacher</dc:creator>
				<category><![CDATA[Blog Post]]></category>

		<guid isPermaLink="false">http://www.datastax.com/?post_type=dev-post&amp;p=9227</guid>
		<description><![CDATA[<a href="http://www.iscandit.com/">Scandit</a> is doing some pretty cool things with barcode recognition technology. Their SDK integrates into your iPhone or Android app and gives you access to their web analytics platform (Scanalytics), which lets you see what kind of products your users scan with their phones (e.g. electronics, food&#8230;]]></description>
			<content:encoded><![CDATA[<p>
<a href="http://www.iscandit.com/">Scandit</a> is doing some pretty cool things with barcode recognition technology. Their SDK integrates into your iPhone or Android app and gives you access to their web analytics platform (Scanalytics), which lets you see what kind of products your users scan with their phones (e.g. electronics, food items, etc.) and where they scan them.

Naturally, they have to consume and manage a lot of data very fast, which is why they use Cassandra. Below is a recent presentation Scandit&#8217;s COO gave on their use of Cassandra. Enjoy!

<div style="width:425px" id="__ss_11388713"> <strong style="display:block;margin:12px 0 4px"><a href="http://www.slideshare.net/Scandit/netcetera" title="Netcetera" target="_blank">Netcetera</a></strong> <iframe src="http://www.slideshare.net/slideshow/embed_code/11388713" width="425" height="355" frameborder="0" marginwidth="0" marginheight="0" scrolling="no"></iframe> <div style="padding:5px 0 12px"> View more <a href="http://www.slideshare.net/" target="_blank">presentations</a> from <a href="http://www.slideshare.net/Scandit" target="_blank">Scandit</a> </div> </div>]]></content:encoded>
			<wfw:commentRss>http://www.datastax.com/dev/blog/nice-presentation-on-cassandra-from-scandit/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Meet the Experts</title>
		<link>http://www.datastax.com/dev/blog/meet-the-experts</link>
		<comments>http://www.datastax.com/dev/blog/meet-the-experts#comments</comments>
		<pubDate>Wed, 01 Feb 2012 15:36:47 +0000</pubDate>
		<dc:creator>Matt Pfeil</dc:creator>
		
		<guid isPermaLink="false">http://www.datastax.com/?post_type=dev-post&amp;p=9182</guid>
		<description><![CDATA[As Cassandra adoption continues to increase, we&#8217;ve noticed that people have a lot of random questions about using it. What&#8217;s a super column? How do I change my cache settings? How do I add a 3rd data center to my existing cluster?
We&#8217;re here to help. We&#8217;ve set up &#8220;Meet the&#8230;]]></description>
			<content:encoded><![CDATA[<p>As Cassandra adoption continues to increase, we&#8217;ve noticed that people have a lot of random questions about using it. What&#8217;s a super column? How do I change my cache settings? How do I add a 3rd data center to my existing cluster?</p>
<p>We&#8217;re here to help. We&#8217;ve set up &#8220;Meet the Experts&#8221; sessions around the <a href="http://www.datastax.com/events#MeettheExperts">country</a> on regular schedules. Stop by and meet Cassandra developers and experts to get your questions answered.</p>
<p>We&#8217;re adding more and more cities to the list, but if none of these work for you, feel free to stop by our support forums for the digital <a href="http://www.datastax.com/support-forums/">version</a>.  We&#8217;ll answer your questions online with the same level of expertise.</p>]]></content:encoded>
			<wfw:commentRss>http://www.datastax.com/dev/blog/meet-the-experts/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

