<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">

<channel>
	<title>www.BenStopford.com</title>
	
	<link>http://www.benstopford.com</link>
	<description>Gently flexing the grid</description>
	<lastBuildDate>Tue, 24 Jan 2012 08:35:56 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.1</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.feedburner.com/benstopford" /><feedburner:info uri="benstopford" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><item>
		<title>Welcome Jon ‘The Gridman’ Knight</title>
		<link>http://feedproxy.google.com/~r/benstopford/~3/rKvvYs-osMU/</link>
		<comments>http://www.benstopford.com/2012/01/24/welcome-jon-the-gridman-knight/#comments</comments>
		<pubDate>Tue, 24 Jan 2012 08:25:12 +0000</pubDate>
		<dc:creator>ben</dc:creator>
				<category><![CDATA[Blog]]></category>

		<guid isPermaLink="false">http://www.benstopford.com/?p=1990</guid>
		<description><![CDATA[Jon &#8216;The Gridman&#8217; Knight has finally dusted off his keyboard and entered the blogosphere with a fantastic post on how we implement a reliable version of Coherence&#8217;s putAll() over here on ODC. One to add to your feed if you are interested in all things Coherence.
http://thegridman.com/coherence/coherence-alternative-putall-2/
]]></description>
			<content:encoded><![CDATA[<p>Jon &#8216;The Gridman&#8217; Knight has finally dusted off his keyboard and entered the blogosphere with a fantastic post on how we implement a reliable version of Coherence&#8217;s putAll() over here on ODC. One to add to your feed if you are interested in all things Coherence.</p>
<p><a href="http://thegridman.com/coherence/coherence-alternative-putall-2/">http://thegridman.com/coherence/coherence-alternative-putall-2/</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.benstopford.com/2012/01/24/welcome-jon-the-gridman-knight/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.benstopford.com/2012/01/24/welcome-jon-the-gridman-knight/#utm_source=feed&amp;utm_medium=feed&amp;utm_campaign=feed</feedburner:origLink></item>
		<item>
		<title>Interesting Links Dec 2011</title>
		<link>http://feedproxy.google.com/~r/benstopford/~3/ufJe1F5pW0E/</link>
		<comments>http://www.benstopford.com/2011/12/31/interesting-links-for-dec-2011/#comments</comments>
		<pubDate>Sat, 31 Dec 2011 15:32:51 +0000</pubDate>
		<dc:creator>ben</dc:creator>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[Links]]></category>

		<guid isPermaLink="false">http://www.benstopford.com/?p=1945</guid>
		<description><![CDATA[Hardware

Intel managing to squeeze 50 cores on a single chip, breaking through the teraflop boundary as they do so: Brier Dudley&#8217;s Blog &#124; Wow: Intel unveils 1 teraflop chip with 50-plus cores &#124; Seattle Times Newspaper
RISC architectures have had a renaissance thanks largely to the needs of the mobile sector, could their low power consumption [...]]]></description>
			<content:encoded><![CDATA[<p><strong>Hardware</strong></p>
<ul>
<li>Intel managing to squeeze 50 cores on a single chip, breaking through the teraflop boundary as they do so: <a href="http://seattletimes.nwsource.com/html/technologybrierdudleysblog/2016775145_wow_intel_unveils_1_teraflop_c.html">Brier Dudley&#8217;s Blog | Wow: Intel unveils 1 teraflop chip with 50-plus cores | Seattle Times Newspaper</a></li>
<li>RISC architectures have had a renaissance, thanks largely to the needs of the mobile sector. Could their low power consumption make them a serious contender in the enterprise space? <a href="http://drdobbs.com/architecture-and-design/231903053">x86 Faces Unexpected RISC Competition</a></li>
<li>AMD announce four memory channels, allowing massive addressable spaces of up to 364GB per CMP: <a href="http://www.eetimes.com/electronics-news/4230565/AMD-s-Interlagos-and-Valencia-finally-emerge">AMD&#8217;s Interlagos and Valencia finally emerge</a></li>
<li>Anyone who follows my blog will know of my belief that large address spaces will reshape the landscape, certainly for enterprise applications. This article echoes these views: <a href="http://drdobbs.com/architecture-and-design/232300106">Megatrend: Cheap RAM Reshaping All of Computing | Dr Dobb&#8217;s</a></li>
</ul>
<p><strong>FPGA</strong></p>
<ul>
<li>IBM’s Lime is an interesting approach to simplifying the programming of secondary devices. See <a href="http://domino.watson.ibm.com/comm/research_people.nsf/pages/bacon.refereed-pubs.html/$FILE/Auerbach10Lime.pdf">Lime paper</a> and the related <a href="https://researcher.ibm.com/researcher/view_project.php?id=122">Liquid Metal</a> project.</li>
<li>JVM on FPGA: <a href="http://www.jopdesign.com/">JOP: A Tiny Java Processor Core for FPGA</a></li>
<li>An interesting paper on using FPGA for Monte Carlo Simulation: <a href="http://www.doc.ic.ac.uk/~dt10/research/thomas-08-finance-mcmc.pdf">FPGA for monte carlos</a></li>
</ul>
<p><strong>High Performance Java</strong></p>
<ul>
<li>An excellent talk about using memory efficiently in Java applications, showing that the costs are often higher than we think. It includes clear descriptions of the footprint of common Java objects and utilities: <a href="http://www.cs.virginia.edu/kim/publicity/pldi09tutorials/memory-efficient-java-tutorial.pdf">Building Memory Efficient Java Applications</a></li>
<li>There has been a flurry of activity coming from Azul Systems recently, most notably the release of <a href="http://www.infoq.com/news/2011/11/zing5-native">Zing</a>, their pauseless garbage collector. Gil Tene’s talk about the State of the Art in GC from QCon SF 2011 is one of the best I’ve seen (QConSF 2011: <a href="http://www.benstopford.com/uploads/MemoryWallQConSF2011.pdf#utm_source=feed&amp;utm_medium=feed&amp;utm_campaign=feed">State of the Art in Garbage Collection</a>).</li>
<li>Azul have also recently released jHiccup, an interesting utility that measures operating system stalls: <a href="http://www.azulsystems.com/dev_resources/jhiccup#Help_Developers">Java Developer Tools: jHiccup Java Performance Analysis</a></li>
<li>Charles Nutter’s comments on his favourite JVM flags including my favourite (-XX:+PrintOptoAssembly): <a href="http://blog.headius.com/2009/01/my-favorite-hotspot-jvm-flags.html">Headius: My Favorite Hotspot JVM Flags</a></li>
</ul>
<p><strong>Distributed Data Storage</strong></p>
<ul>
<li>A great paper from VLDB describing an approach for balancing replication and partitioning, something close to my own heart: <a href="http://www.vldb.org/pvldb/vldb2010/papers/R04.pdf">Schism: a Workload-Driven Approach to Database Replication and Partitioning</a></li>
<li>Hasso Plattner (the P in SAP) wrote this paper, which provides an insightful view of where he believes the field should be going (and, of course, of SAP’s solution, HANA): <a href="http://epic.hpi.uni-potsdam.de/pub/Home/TrendsAndConceptsII2010/Plattner-AcommonDBApproach.pdf">Hasso Plattner on In-Memory OLAP &amp; OLTP</a></li>
<li>I enjoyed watching this talk about Mongo: <a href="http://www.infoq.com/presentations/Scaling-with-MongoDB">InfoQ: Scaling with MongoDB</a></li>
</ul>
<p><strong>Interesting:</strong></p>
<ul>
<li>An entertaining article from the Economist about David Gelernter’s predictions of the future of computing: <a href="http://www.economist.com/node/21540383">Brain scan: Seer of the mirror world | The Economist</a></li>
<li>Could Prezi really dislodge PowerPoint? <a href="http://prezi.com/index/">Prezi</a></li>
<li>Double Loop Learning – a different view on organizational learning. <a href="http://www.infed.org/thinkers/argyris.htm">Chris Argyris</a>.</li>
<li>Worth reading if you are not familiar with the idea already: <a href="http://martinfowler.com/bliki/CQRS.html">CQRS</a></li>
<li>An interesting twist on the traditional storyboard approach: <a href="http://agilesoftwaredevelopment.com/blog/janusz-gorycki/out-story-board-better-yours">Our Story Board is Better Than Yours&#8230;</a> I’m a big fan of replacing estimation with uniformly sized stories.</li>
<li>Booked your next holiday? What about a <a href="http://coderetreat.com/history.html">Code Retreat with Corey Haines</a>?</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.benstopford.com/2011/12/31/interesting-links-for-dec-2011/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.benstopford.com/2011/12/31/interesting-links-for-dec-2011/#utm_source=feed&amp;utm_medium=feed&amp;utm_campaign=feed</feedburner:origLink></item>
		<item>
		<title>Data Storage for Extreme Use Cases: The Lay of the Land and a Peek at ODC (Slides and Notes for Advanced Database Course)</title>
		<link>http://feedproxy.google.com/~r/benstopford/~3/HwUOqG_qces/</link>
		<comments>http://www.benstopford.com/2011/12/13/data-storage-for-extreme-use-cases-the-lay-of-the-land-and-a-peek-at-odc-slides-and-notes-for-advanced-database-course/#comments</comments>
		<pubDate>Tue, 13 Dec 2011 22:09:58 +0000</pubDate>
		<dc:creator>ben</dc:creator>
				<category><![CDATA[Blog]]></category>

		<guid isPermaLink="false">http://www.benstopford.com/?p=1930</guid>
		<description><![CDATA[Full slides available here: http://www.slideshare.net/benstopford/advanced-databases-ben-stopford

Introduction
Comp Sci 101 normally teaches that one of the simplest and most efficient ways to hold and access data is a hashmap. Hashmaps provide rapid key-based access to data: around 20 nanoseconds for a fast implementation in Java. [...]]]></description>
			<content:encoded><![CDATA[<p>Full slides available here: <a href="http://www.slideshare.net/benstopford/advanced-databases-ben-stopford">http://www.slideshare.net/benstopford/advanced-databases-ben-stopford</a></p>
<p><strong>Introduction</strong></p>
<p>Comp Sci 101 normally teaches that one of the simplest and most efficient ways to hold and access data is a hashmap. Hashmaps provide rapid key-based access to data: around 20 nanoseconds for a fast implementation in Java. This speed is largely due to the structure sitting wholly in memory, allowing the computer to take advantage of its various layers of caching to optimise performance. In fact a hashmap lookup will complete in around the time it takes a light beam to travel around a typical room. That’s pretty fast!</p>
<p>Querying a database is a somewhat different affair. There are more steps for a start, and far more code to be executed; the OS gets involved, as does the network, and of course a disk. This brings a simple database query in at around the 20 millisecond mark. That’s a big difference from our hashmap: around six orders of magnitude!</p>
<p>A comparison between these two is of course unfair, but it highlights the importance of mechanical sympathy when thinking about how we store our data. We need to be aware of the performance characteristics of each element of our system, because each extra step costs performance. In fact there are two key factors that separate the database and the hashmap. First, they are physically very different: one is a single structure inside a single process, while the other involves a variety of processes and a variety of steps. Second, they are functionally different: the database provides far more functionality than the hashmap.</p>
<p>Modern times have brought with them a huge array of different data storage systems. These systems are built using a variety of architectures with differing physical characteristics, which allows them to take different positions on the performance trade-off curve.</p>
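<p>The figures above are easy to sanity-check. The snippet below is a back-of-the-envelope sketch that simply plugs in the approximate numbers quoted in the text (20 ns and 20 ms). The class and constant names are illustrative and do not come from any library.</p>

```java
// Back-of-the-envelope check of the latency figures quoted in the text.
final class LatencyArithmetic {
    static final double HASHMAP_LOOKUP_SECONDS = 20e-9;  // ~20 ns per hashmap lookup
    static final double DATABASE_QUERY_SECONDS = 20e-3;  // ~20 ms per simple database query
    static final double SPEED_OF_LIGHT_METRES_PER_SECOND = 299_792_458.0;

    private LatencyArithmetic() {}

    /** How many times slower the database query is than the hashmap lookup. */
    static double slowdown() {
        return DATABASE_QUERY_SECONDS / HASHMAP_LOOKUP_SECONDS;
    }

    /** Orders of magnitude between the two: log10 of the ratio. */
    static double ordersOfMagnitude() {
        return Math.log10(slowdown());
    }

    /** Distance light travels in a vacuum during one hashmap lookup, in metres. */
    static double lightDistanceDuringLookupMetres() {
        return SPEED_OF_LIGHT_METRES_PER_SECOND * HASHMAP_LOOKUP_SECONDS;
    }
}
```

<p>The ratio works out at 10<sup>6</sup>, which is six orders of magnitude. In 20 ns, light covers roughly 6 metres, a room-sized distance.</p>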
<p>The onset of these new technologies has prompted some pretty vocal debate about the applicability of the traditional database architecture, characterised by row-oriented operations on a magnetic disk. Michael Stonebraker, a leading database expert, puts it quite bluntly:</p>
<p><em>“Because RDBMSs can be beaten by more than an order of magnitude on the standard OLTP benchmark, then there is no market where they are competitive. As such, they should be considered as legacy technology more than a quarter of a century in age, for which a complete redesign and re-architecting is the appropriate next step.”</em></p>
<p>The point he makes is that, if performance is truly a factor (and the data size and population are appropriate), solutions that change the architecture are more likely to win out. In the wealth of solutions available today there are a few common themes, and we’ll elaborate on these next.</p>
<p><strong>Simplifying the Contract &amp; NoSQL</strong></p>
<p>One of the most recent, and pertinent, developments has been the idea of simplifying the contract. For some years data storage has been synonymous with the implementation of ACID. However, the last few years have seen a notable move away from ACID when dealing with very large data sets, where the amount of distribution required makes implementing ACID prohibitive. What’s more, many applications simply don’t require this level of guarantee. This brings us to the idea of <em>simplifying the contract</em>. The Internet currently contains around 5 exabytes of data. That is a fantastically large amount, certainly in database terms. By comparison, the average enterprise database is around 1 terabyte (based on research in 2009). The point is simple: the context of data management has changed, and for those dealing in web-scale data volumes simplifying the contract is absolutely mandatory.</p>
<p>An interesting development of the last few years has been the rather poorly named NoSQL movement. If the name is indicative of anything, it is a (not so subtle) hint that the movers and shakers in early NoSQL technology were keen to shrug off the constraints of traditional data storage. In fact the early NoSQL stores, like Voldemort and Cassandra, really grew out of simply storing data in lots of files, in an attempt to gain the scalability of simple “sharded” storage.</p>
<p>The idea of simplifying the contract is not limited solely to distributed datastores. Even traditional databases, residing on a single machine, have large operational overheads (with one piece of research suggesting less than 10% of instructions contribute to “useful” work).</p>
<p>If there is a point, it is this: as you increase the level of distribution (needed to process large data sets), the cost of implementing ACID starts to spiral out of control.</p>
<p>However, the risks of dropping ACID, in particular of embracing eventual consistency, should not be taken lightly. Drop ACID because you have to, not because you think DBAs are yesteryear weirdos who wear sandals and attach their BlackBerrys to their belts ;-)</p>
<p><strong>The Three Directions of Database Progression</strong></p>
<p>There are essentially three mechanisms for providing better performance than the ‘traditional’ database architecture (and I’ve tacked a fourth on the end; you’ll see why later):</p>
<ul>
<li>Shared disk: several machines share a single disk array. Popular for mid-range data sets, but suffers from disk and lock contention. Oracle RAC is a good example, but there are many more in the enterprise space.</li>
<li>Shared nothing: characterised by partitioning the data across different machines so that each node has complete autonomy over the data it holds. More scalable, and popular for high-end data sets; the Big Data era has created a real need for this architecture. However, it is limited by the performance of joins across different nodes.</li>
<li>In-memory database: everything lives in a single address space. Query planning is less important, as the penalty for getting it wrong is not as crippling as in disk-based systems. The speed improvement comes from memory being at least 100 times faster than disk, as well as being far better suited to random access (cf. the TPC-H benchmark results [1]). The problem is that the address space is relatively small and of a fixed size (though it has grown rapidly in recent years, from GBs to TBs). There is also the durability issue of main-memory databases.</li>
<li>A solution to these two problems with in-memory databases is to add distribution. Distributed, in-memory, shared-nothing architectures solve the ‘one more bit’ problem and also enable durability: the fixed data space is overcome by simply adding more machines, and durability is provided by keeping backups elsewhere in the cluster. The downside, however, is that we have lost the single address space and all the advantages that go with it.</li>
</ul>
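<p>The fourth option can be illustrated with a toy model. This is a minimal sketch under simplifying assumptions: a fixed set of nodes, modulo-hash placement, and exactly one backup held on the next node round the ring. It is not how any particular product, Coherence included, assigns data to members. It shows two of the properties described above: each node holds only a slice of the data, and losing any single node loses no data.</p>

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy distributed, shared-nothing, in-memory store: each node owns a slice of
// the keys, and every entry is also written to a backup on a different node.
final class ToyPartitionedStore {
    private final List<Map<String, String>> primaries = new ArrayList<>();
    private final List<Map<String, String>> backups = new ArrayList<>();

    ToyPartitionedStore(int nodeCount) {
        if (nodeCount < 2) {
            throw new IllegalArgumentException("need at least 2 nodes to hold a backup");
        }
        for (int i = 0; i < nodeCount; i++) {
            primaries.add(new HashMap<>());
            backups.add(new HashMap<>());
        }
    }

    /** Node that owns the primary copy of a key (simple modulo hashing). */
    int primaryNode(String key) {
        return Math.floorMod(key.hashCode(), primaries.size());
    }

    /** The backup lives on the next node round the ring, never on the primary. */
    int backupNode(String key) {
        return (primaryNode(key) + 1) % primaries.size();
    }

    void put(String key, String value) {
        primaries.get(primaryNode(key)).put(key, value);
        backups.get(backupNode(key)).put(key, value);
    }

    /** Reads the primary copy unless that node has failed (-1 means no failure). */
    String get(String key, int failedNode) {
        int primary = primaryNode(key);
        return primary != failedNode
                ? primaries.get(primary).get(key)
                : backups.get(backupNode(key)).get(key);
    }
}
```

<p>With four nodes, for example, every key can still be read after any one node fails, because its backup always sits on a different node from its primary.</p>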
<p><strong>ODC – RBS’s In-Memory Datastore</strong></p>
<p>(A better textual version can be found at [2] and a video covering this can be found at [3])</p>
<p>ODC is RBS’s in-memory data-storage solution, built on Oracle Coherence. ODC occupies an interesting position on the performance trade-off curve: being in-memory gives it very low latency, whilst being distributed and shared-nothing allows it to sustain high throughput. The downside is the cost of all that RAM.</p>
<p>ODC uses an interesting approach to a problem that plagues all shared-nothing data stores: the distributed join problem. This occurs when data that lives on different nodes must be joined together across the network, with the network “hops” needed to transfer the intermediate results degrading performance.</p>
<p>One approach to this distributed join problem is to denormalise, so that related ‘rows’ (or object graphs in our case) are always bound together. There is no need to join them across the network because all relations are held in one row (or object graph). This is great for reducing communication costs, but hugely increases the amount of data duplication, particularly when data is versioned. The consequence is that a lot of memory is used up (memory being a precious resource in in-memory solutions, even today). There is an additional problem of maintaining the replicated data, more specifically the need for large shared locks across the multiple replicas.</p>
<p>So what we really want is all the advantages of normalised data with the speed of denormalisation!</p>
<p>The solution to this problem has two stages:</p>
<p>The first is to use (or rather bastardise) a Snowflake structure (of the type typical in Data Warehouse schemas) to collocate records that have the same keys. “Fact Table” records are spread across the cluster of machines while “Dimension Table” records are replicated at all nodes. Fact tables are generally much larger than Dimension tables, which is why it is reasonable to replicate the Dimensions.</p>
<p>This is best demonstrated with a simple example. Consider that you are building an online shopping application (think Amazon). Let’s say we decide to partition (“shard”) by userId. The “facts” of the system would be your basket, orders, order delivery details etc.: all facts that are specific to <em>you</em> (i.e. to one userId), and hence can be collocated on the same machine by using userId in the hash function that determines where data is held in the network. The “dimensions” provide the context for the “facts”. Some of this context will be user-specific, like your address, but other items are shared across many users, like the list of products the site sells. Dimensions such as the list of products have keys that ‘crosscut’ the key used to partition the facts. That is to say, it is not possible to uniquely partition products so that they are collocated with orders, because the keys simply don’t ‘line up’. This inability to collocate Dimensions leads us to the cross-network joins we are trying to avoid.</p>
<p>The solution is simple: Partition facts and Replicate Dimensions. By doing so any join is possible without the need for network “hops” (i.e. no distributed joins) because all the related records are collocated at one network node.</p>
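<p>The partition-Facts, replicate-Dimensions rule can be sketched in a few lines of plain Java. The types below are hypothetical and model the shopping example. Orders (Facts) are placed by hashing userId, while products (Dimensions) are copied to every node, so joining a user’s orders to product names touches only one node. This illustrates the principle only and is not ODC’s implementation.</p>

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical Fact type from the shopping example.
final class Order {
    final String orderId;
    final String userId;
    final String productId;

    Order(String orderId, String userId, String productId) {
        this.orderId = orderId;
        this.userId = userId;
        this.productId = productId;
    }
}

final class SnowflakeCluster {
    // Index i holds the data stored on node i.
    private final List<List<Order>> factsByNode = new ArrayList<>();
    private final List<Map<String, String>> productsByNode = new ArrayList<>();

    SnowflakeCluster(int nodeCount) {
        for (int i = 0; i < nodeCount; i++) {
            factsByNode.add(new ArrayList<>());
            productsByNode.add(new HashMap<>());
        }
    }

    /** Facts are partitioned: userId alone decides which node holds an order. */
    int nodeFor(String userId) {
        return Math.floorMod(userId.hashCode(), factsByNode.size());
    }

    void addOrder(Order order) {
        factsByNode.get(nodeFor(order.userId)).add(order);
    }

    /** Dimensions are replicated: every node receives its own copy. */
    void addProduct(String productId, String name) {
        for (Map<String, String> products : productsByNode) {
            products.put(productId, name);
        }
    }

    /** Joins a user's orders to product names using only the node that owns the user. */
    List<String> productNamesOrderedBy(String userId) {
        int node = nodeFor(userId);
        Map<String, String> localProducts = productsByNode.get(node);
        List<String> names = new ArrayList<>();
        for (Order order : factsByNode.get(node)) {
            if (order.userId.equals(userId)) {
                names.add(localProducts.get(order.productId)); // local lookup, no network hop
            }
        }
        return names;
    }
}
```

<p>Because addProduct() writes to every node, the product lookup inside productNamesOrderedBy() is always local, whichever node the user hashes to.</p>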
<p>However, there is a problem, the solution to which brings us to the second stage mentioned above. Inevitably, not all Dimensions will play to our nice heuristic. In fact, in reality, some Dimension tables will be quite large. Because they are replicated, large Dimension tables are impractical due to the total memory they consume across the cluster.</p>
<p>The solution is to make use of the “Connected Replication” pattern. This simply tracks whether, at a point in time, a certain Dimension record is ‘connected’ (via some path of foreign keys) to a Fact record in the database. Put another way, it tracks whether a Dimension record is actually <em>used</em>. This ‘trick’ works because, in reality, much of the Dimension data we hold is not actually used; in fact one recent study showed that 80% of the data we hold is no longer used. By implementing a simple, recursive process that navigates the hierarchy of foreign key relationships when data is inserted, we can track which Dimension records are used and which are not. Only the connected records then need to be replicated, which reduces the cost of replicated storage to around 10% of its original size and, by doing so, makes replicating Dimensions practical in in-memory architectures.</p>
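<p>Here is a minimal sketch of the recursive walk described above. It assumes each Dimension record lists the keys of the Dimension records it references. The class and method names are hypothetical. For brevity it handles only inserts; tracking connectivity ‘at a point in time’ would also mean un-marking records when Facts are removed.</p>

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch of Connected Replication: only Dimension records reachable from some
// Fact via foreign keys are marked for replication.
final class ConnectedReplicationTracker {
    // Foreign keys from each Dimension record to other Dimension records.
    private final Map<String, List<String>> dimensionForeignKeys = new HashMap<>();
    // Dimension keys currently connected to at least one Fact.
    private final Set<String> connected = new HashSet<>();

    void addDimension(String key, List<String> foreignKeys) {
        dimensionForeignKeys.put(key, foreignKeys);
    }

    /** Called when a Fact is inserted, with the Dimension keys it references. */
    void onFactInserted(List<String> referencedDimensionKeys) {
        for (String key : referencedDimensionKeys) {
            markConnected(key);
        }
    }

    /** Recursively follows foreign keys, stopping at keys already visited. */
    private void markConnected(String key) {
        if (!connected.add(key)) {
            return; // already connected, so everything it references is too
        }
        for (String parent : dimensionForeignKeys.getOrDefault(key, Collections.emptyList())) {
            markConnected(parent);
        }
    }

    /** Dimension records that would be replicated to every node. */
    Set<String> replicatedKeys() {
        return Collections.unmodifiableSet(connected);
    }
}
```

<p>Only the keys returned by replicatedKeys() would be copied to every node. A Dimension record that no Fact reaches, either directly or through other Dimensions, stays out of the replicated set. The visited check also stops the recursion from looping if the foreign keys ever form a cycle.</p>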
<p><strong>In conclusion:</strong></p>
<ul>
<li>Traditional database architectures are inappropriate for applications that require very low latency or very high throughput.</li>
<li>At one end of the scale are the huge shared-nothing architectures, favouring scalability.</li>
<li>At the other end are in-memory architectures, leveraging the simplicity and speed of a single address space.</li>
<li>You can blend the two approaches (as, for example, in ODC).</li>
<li>ODC attacks the Distributed Join Problem in an unusual way: By balancing Replication and Partitioning we can do any join in a single step. Connected Replication adds an additional ‘twist’ that reduces the amount of data replicated by an order of magnitude, making replication in an in-memory architecture practical.</li>
</ul>
<p><strong>References</strong></p>
<p>[1] <a href="http://www.tpc.org/tpch/results/tpch_perf_results.asp?resulttype=cluster">http://www.tpc.org/tpch/results/tpch_perf_results.asp?resulttype=cluster</a></p>
<p>[2] <a href="http://www.benstopford.com/2011/09/22/achieving-fast-joins-in-distributed-data-stores-through-the-application-of-snowflake-schemas-and-the-connected-replication-pattern-2/#utm_source=feed&amp;utm_medium=feed&amp;utm_campaign=feed">http://www.benstopford.com/2011/09/22/achieving-fast-joins-in-distributed-data-stores-through-the-application-of-snowflake-schemas-and-the-connected-replication-pattern-2/</a></p>
<p>[3] <a href="http://www.infoq.com/presentations/ODC-Beyond-The-Data-Grid">http://www.infoq.com/presentations/ODC-Beyond-The-Data-Grid</a></p>
<p><strong>Further Reading</strong></p>
<p>A fantastic paper covering many of these issues. Strongly recommended:</p>
<ul>
<li>The End of An Architectural Era (It’s Time for a Complete Rewrite), M. Stonebraker et al. VLDB 2007, pp 1150-1159. At <a href="http://www.vldb.org/conf/2007/papers/industrial/p1150-stonebraker.pdf">http://www.vldb.org/conf/2007/papers/industrial/p1150-stonebraker.pdf</a></li>
</ul>
<p>Good blog to follow:</p>
<ul>
<li><a href="http://dbmsmusings.blogspot.com/">http://dbmsmusings.blogspot.com/</a></li>
</ul>
<p>Related modern database technologies:</p>
<ul>
<li><a href="http://www.exasol.com/en/home.html">http://www.exasol.com/en/home.html</a></li>
<li><a href="http://www.paraccel.com/">http://www.paraccel.com/</a></li>
</ul>
<p>Related articles from me:</p>
<ul>
<li><a href="http://www.benstopford.com/2009/12/06/are-databases-a-thing-of-the-past/#utm_source=feed&amp;utm_medium=feed&amp;utm_campaign=feed">http://www.benstopford.com/2009/12/06/are-databases-a-thing-of-the-past/</a></li>
<li><a href="http://www.benstopford.com/2009/11/24/understanding-the-shared-nothing-architecture/#utm_source=feed&amp;utm_medium=feed&amp;utm_campaign=feed">http://www.benstopford.com/2009/11/24/understanding-the-shared-nothing-architecture/</a></li>
<li><a href="http://www.benstopford.com/2011/08/14/distributed-storage-phase-change-memory-and-the-rebirth-of-the-in-memory-database/#utm_source=feed&amp;utm_medium=feed&amp;utm_campaign=feed">http://www.benstopford.com/2011/08/14/distributed-storage-phase-change-memory-and-the-rebirth-of-the-in-memory-database/</a></li>
<li><a href="http://www.benstopford.com/2011/01/27/beyond-the-data-grid-building-a-normalised-data-store-using-coherence/#utm_source=feed&amp;utm_medium=feed&amp;utm_campaign=feed">http://www.benstopford.com/2011/01/27/beyond-the-data-grid-building-a-normalised-data-store-using-coherence/</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.benstopford.com/2011/12/13/data-storage-for-extreme-use-cases-the-lay-of-the-land-and-a-peek-at-odc-slides-and-notes-for-advanced-database-course/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.benstopford.com/2011/12/13/data-storage-for-extreme-use-cases-the-lay-of-the-land-and-a-peek-at-odc-slides-and-notes-for-advanced-database-course/#utm_source=feed&amp;utm_medium=feed&amp;utm_campaign=feed</feedburner:origLink></item>
		<item>
		<title>The Collections Cache</title>
		<link>http://feedproxy.google.com/~r/benstopford/~3/FeRcXs7GzA8/</link>
		<comments>http://www.benstopford.com/2011/11/07/the-collections-cache/#comments</comments>
		<pubDate>Mon, 07 Nov 2011 09:59:23 +0000</pubDate>
		<dc:creator>ben</dc:creator>
				<category><![CDATA[Coherence]]></category>
		<category><![CDATA[Coherence Patterns]]></category>

		<guid isPermaLink="false">http://www.benstopford.com/?p=1889</guid>
		<description><![CDATA[This is a very simple pattern for storing data in a collection in the cache]]></description>
			<content:encoded><![CDATA[<p>This is a very simple pattern that can be used to solve a variety of problems. The structure uses a &#8220;Collections Cache&#8221;: a cache that appends values to a collection using a Trigger. You can then access the entire collection using a get() or alternatively use an EntryProcessor to extract a certain value from the collection.</p>
<p>The pattern is used on ODC to track aggregate views. Say you want a materialised view of trades grouped by book: we keep a reverse index of trade references for each book. The view is updated asynchronously, via an async CacheStore, as data is added to the cache. The pattern is applicable to a variety of other use cases, one being an approach to managing version history.</p>
<p>One downside of this pattern is that, if the concept is not well known, it can be confusing: after all, the object you put() is not the same type as the one returned from get(). Simple naming, such as collections*, can help avoid this confusion though.</p>
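<p>To make the put()/get() asymmetry concrete, here is a plain-Java model of the pattern. It is an illustration, not Coherence code. The CollectionsCache class is hypothetical, a BiFunction stands in for the Coherence trigger, and the asynchronous CacheStore that feeds the view on ODC is not modelled. Keys are book IDs and values are trade references, mirroring the trades-by-book view above.</p>

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

// Plain-Java model of a "Collections Cache". In Coherence the append would be
// performed by a trigger on the cache; here a BiFunction plays that role.
final class CollectionsCache {
    private final Map<String, List<String>> store = new HashMap<>();

    // Trigger-like hook: takes the existing collection (null if absent) and the
    // incoming value, and returns the collection to store, with the value appended.
    private final BiFunction<List<String>, String, List<String>> appendTrigger =
            (existing, value) -> {
                List<String> updated = existing == null ? new ArrayList<>() : new ArrayList<>(existing);
                updated.add(value);
                return updated;
            };

    /** Callers put a single value... */
    void put(String key, String value) {
        store.put(key, appendTrigger.apply(store.get(key), value));
    }

    /** ...but get() returns the whole collection for that key. */
    List<String> get(String key) {
        return Collections.unmodifiableList(store.getOrDefault(key, Collections.emptyList()));
    }
}
```

<p>Each put() stores a single trade reference, while get() returns every reference recorded for that book.</p>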
<p style="text-align: center;"><a href="http://www.benstopford.com/wp-content/uploads/2011/11/collections-cache.png#utm_source=feed&amp;utm_medium=feed&amp;utm_campaign=feed"><img class="aligncenter size-full wp-image-1891" title="collections-cache" src="http://www.benstopford.com/wp-content/uploads/2011/11/collections-cache.png" alt="" width="519" height="272" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.benstopford.com/2011/11/07/the-collections-cache/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.benstopford.com/2011/11/07/the-collections-cache/#utm_source=feed&amp;utm_medium=feed&amp;utm_campaign=feed</feedburner:origLink></item>
		<item>
		<title>A Singleton Service</title>
		<link>http://feedproxy.google.com/~r/benstopford/~3/d7jmczbEN48/</link>
		<comments>http://www.benstopford.com/2011/11/05/coherence-the-singleton-service/#comments</comments>
		<pubDate>Sat, 05 Nov 2011 13:27:44 +0000</pubDate>
		<dc:creator>ben</dc:creator>
				<category><![CDATA[Coherence]]></category>
		<category><![CDATA[Coherence Patterns]]></category>
		<category><![CDATA[Top4]]></category>

		<guid isPermaLink="false">http://www.benstopford.com/?p=1792</guid>
		<description><![CDATA[A pattern for running a reliable service just once in a Coherence Cluster.]]></description>
			<content:encoded><![CDATA[<p><em><a href="http://www.benstopford.com/wp-content/uploads/2011/11/singleton2.png#utm_source=feed&amp;utm_medium=feed&amp;utm_campaign=feed"><img class="alignright size-medium wp-image-1909" title="singleton2" src="http://www.benstopford.com/wp-content/uploads/2011/11/singleton2-300x80.png" alt="" width="300" height="80" /></a></em>Being a data grid, Coherence is very good at doing things in a distributed way. However, it doesn’t (currently) offer any functionality for running something once in a reliable manner. Most applications solve this problem by simply running another process; for example, you might start a second process that reads data off some queue and keeps your cluster up to date. It’d be nice, however, if you could leverage Coherence’s fault tolerance to ensure that, if the cluster was running, your QueueListener was always running too. In fact this is fairly simple to do, and it can be used for a host of common applications including loading data, keeping it up to date, adding indexes and regulating a cluster-wide timestamp (article to follow).</p>
<p><em>What we want is a service that will always run on one of our Coherence nodes no matter what happens to the cluster.</em></p>
<p>The solution is conceptually simple. Your cluster contains many processes. When each node starts, it checks whether the service has already been started elsewhere by attempting to lock a fictitious, well-known key:</p>
<pre>lockCache.lock("SingletonLockKey");</pre>
<p>Only one process in the cluster will acquire the lock. The process that does so starts the singleton service, adds indexes, loads data or whatever else is required. Simple. If the node running the service dies, the lock is released and another process acquires it and starts the singleton service there.</p>
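<pre>// Run in a new thread on a wrapped DefaultCacheServer, i.e. this should run on every node
import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;

// "singleton-locks" is an illustrative cache name; any distributed cache will do
NamedCache lockCache = CacheFactory.getCache("singleton-locks");
long blockUntilLockAcquired = -1; // a wait of -1 blocks until the lock is obtained
boolean locked = lockCache.lock("SingletonLockKey", blockUntilLockAcquired);
if (locked) {
   // start singletons here: load data, add indexes, start the QueueListener etc.
   // Then park this thread for the life of the node so the lock stays held;
   // if this member dies the lock is released and another node acquires it.
   synchronized (this) {
      while (true) {
         wait();
      }
   }
}</pre>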
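<p>The failover behaviour can be illustrated without a cluster. The sketch below is a hypothetical, single-JVM stand-in: SimulatedLockCache and SingletonElection are illustrative names, not Coherence classes, and the release of a departed member's locks is modelled by an explicit memberLeft() call. Where real nodes block on lock(key, -1), this sketch simply retries.</p>

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for a clustered lock cache (not the Coherence API).
// Each lock is owned by a member name and is released when that member leaves.
class SimulatedLockCache {
    private final Map<String, String> owners = new HashMap<>();

    // Non-blocking attempt, comparable to lock(key, 0): succeeds if the key is free
    // or already held by the same member.
    synchronized boolean tryLock(String key, String member) {
        String current = owners.get(key);
        if (current == null || current.equals(member)) {
            owners.put(key, member);
            return true;
        }
        return false;
    }

    // Models the cluster releasing every lock held by a member that has died or left.
    synchronized void memberLeft(String member) {
        owners.values().removeIf(member::equals);
    }

    synchronized String ownerOf(String key) {
        return owners.get(key);
    }
}

class SingletonElection {
    static final String LOCK_KEY = "SingletonLockKey";

    // Called by every node; only the node holding the lock starts the singleton service.
    static boolean tryBecomeSingleton(SimulatedLockCache locks, String member) {
        return locks.tryLock(LOCK_KEY, member);
    }
}
```

<p>For example, if node-A and node-B both start, only node-A wins the lock; once memberLeft("node-A") is called, node-B's next attempt succeeds and it would start the singleton service.</p>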
<img src="http://feeds.feedburner.com/~r/benstopford/~4/d7jmczbEN48" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.benstopford.com/2011/11/05/coherence-the-singleton-service/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		<feedburner:origLink>http://www.benstopford.com/2011/11/05/coherence-the-singleton-service/#utm_source=feed&amp;utm_medium=feed&amp;utm_campaign=feed</feedburner:origLink></item>
		<item>
		<title>A Reliable version of putAll()</title>
		<link>http://feedproxy.google.com/~r/benstopford/~3/aK8vS5YlaLk/</link>
		<comments>http://www.benstopford.com/2011/11/04/coherence-implementing-a-reliable-version-of-putall/#comments</comments>
		<pubDate>Fri, 04 Nov 2011 21:29:02 +0000</pubDate>
		<dc:creator>ben</dc:creator>
				<category><![CDATA[Coherence]]></category>
		<category><![CDATA[Coherence Patterns]]></category>

		<guid isPermaLink="false">http://www.benstopford.com/?p=1806</guid>
		<description><![CDATA[I like triggers in Coherence. They allow us to do lots of cool stuff to our objects as we add them to the cache: implement versioning, stamp them with cluster time, save them to a messaging system, check for duplicate writes, check for concurrent writes … the list goes on. But with all this processing [...]]]></description>
			<content:encoded><![CDATA[<p>I like triggers in Coherence. They allow us to do lots of cool stuff to our objects as we add them to the cache. Implement versioning, stamp them with cluster time, save them to a messaging system, check for duplicate writes, check for concurrent writes … the list goes on. But with all this processing comes the risk of failure and Coherence provides little in the way of exception reporting. In fact it provides no information on the individual failures, something that quickly becomes a problem as the level of trigger functionality increases. On ODC this caused us a real problem so we re-implemented putAll() so that it correctly reported those writes that failed. Credit goes to Jonathan Knight and Andrew Wilson for working this implementation through.</p>
<p><img class="alignright size-medium wp-image-1807" style="float: right; border: 0px initial initial;" title="ExceptionHandlingPutAll" src="http://www.benstopford.com/wp-content/uploads/2011/11/ExceptionHandlingPutAll-300x163.png" alt="" width="300" height="163" />The pattern is pretty simple at a high level and involves two Invocables. The first executes on the Extend proxy, as we need to be inside the cluster to get access to the key assignment strategy. Once there, the data being written is split into the subsets owned by each node using getKeyOwner(). These subsets are then sent, via a second Invocable, to the members that own them, where EntryProcessors write to the backing map directly (although this is no longer needed in 3.7). This is shown pictorially in the accompanying figure.</p>
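<p>The split-and-report idea can be sketched in plain Java. This is a minimal sketch under stated assumptions, not the ODC implementation: the ownerOf function stands in for PartitionedService.getKeyOwner(), the writer callback stands in for the per-member Invocable and EntryProcessor, and every write happens locally rather than on the owning member. ReliablePutAll is a hypothetical name.</p>

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.Function;

// Hypothetical sketch: split a batch by owning member, write each subset,
// and report failures per key instead of failing (or hiding) the whole batch.
class ReliablePutAll {

    // Group entries by the member that owns each key (ownerOf plays getKeyOwner()).
    static <K, V, M> Map<M, Map<K, V>> splitByOwner(Map<K, V> entries, Function<K, M> ownerOf) {
        Map<M, Map<K, V>> byOwner = new LinkedHashMap<>();
        for (Map.Entry<K, V> e : entries.entrySet()) {
            byOwner.computeIfAbsent(ownerOf.apply(e.getKey()), m -> new LinkedHashMap<>())
                   .put(e.getKey(), e.getValue());
        }
        return byOwner;
    }

    // Write every subset; any exception (e.g. a trigger rejecting a value) is recorded
    // against its key and the remaining writes carry on.
    static <K, V, M> Map<K, Exception> putAll(Map<K, V> entries, Function<K, M> ownerOf,
                                              BiConsumer<K, V> writer) {
        Map<K, Exception> failures = new HashMap<>();
        for (Map<K, V> subset : splitByOwner(entries, ownerOf).values()) {
            // In the real pattern this loop would run remotely on the owning member.
            for (Map.Entry<K, V> e : subset.entrySet()) {
                try {
                    writer.accept(e.getKey(), e.getValue());
                } catch (Exception ex) {
                    failures.put(e.getKey(), ex);
                }
            }
        }
        return failures;
    }
}
```

<p>Returning a map of key to exception, rather than throwing on the first failure, is what lets the caller see exactly which writes were rejected.</p>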
<p>[Edit Jan '12: My colleague Jon &#8216;The Gridman&#8217; Knight has written a fantastically detailed <a href="http://thegridman.com/coherence/coherence-alternative-putall-2/">post</a> that drills into how to implement this pattern in Coherence.]</p>
<img src="http://feeds.feedburner.com/~r/benstopford/~4/aK8vS5YlaLk" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.benstopford.com/2011/11/04/coherence-implementing-a-reliable-version-of-putall/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		<feedburner:origLink>http://www.benstopford.com/2011/11/04/coherence-implementing-a-reliable-version-of-putall/#utm_source=feed&amp;utm_medium=feed&amp;utm_campaign=feed</feedburner:origLink></item>
		<item>
		<title>Coherence Implementation Patterns – Slides from Coherence SIG</title>
		<link>http://feedproxy.google.com/~r/benstopford/~3/jO6B7cF1WGo/</link>
		<comments>http://www.benstopford.com/2011/11/04/coherence-implementation-patterns-slides-from-coherence-sig/#comments</comments>
		<pubDate>Fri, 04 Nov 2011 17:09:11 +0000</pubDate>
		<dc:creator>ben</dc:creator>
				<category><![CDATA[Blog]]></category>

		<guid isPermaLink="false">http://www.benstopford.com/?p=1802</guid>
		<description><![CDATA[You can view the PDF version here
Coherence Implementation Patterns &#8211; Sig Nov 2011 

]]></description>
			<content:encoded><![CDATA[<p><strong>You can view the PDF version <a href="http://www.benstopford.com/wp-content/uploads/2011/11/SIG-Presentation-Coherence-Patterns-Ben-Stopford.pdf#utm_source=feed&amp;utm_medium=feed&amp;utm_campaign=feed">here</a></strong></p>
<div id="__ss_10028401" style="width: 425px;"><strong style="display: block; margin: 12px 0 4px;"><a title="Coherence Implementation Patterns - Sig Nov 2011" href="http://www.slideshare.net/benstopford/coherence-implementation-patterns-sig-nov-2011" target="_blank">Coherence Implementation Patterns &#8211; Sig Nov 2011</a></strong> <object id="__sse10028401" classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="425" height="355" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowScriptAccess" value="always" /><param name="src" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=sigpresentation-coherencepatterns-111104120329-phpapp02&amp;stripped_title=coherence-implementation-patterns-sig-nov-2011&amp;userName=benstopford" /><param name="name" value="__sse10028401" /><param name="allowfullscreen" value="true" /><embed id="__sse10028401" type="application/x-shockwave-flash" width="425" height="355" src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=sigpresentation-coherencepatterns-111104120329-phpapp02&amp;stripped_title=coherence-implementation-patterns-sig-nov-2011&amp;userName=benstopford" name="__sse10028401" allowscriptaccess="always" allowfullscreen="true"></embed></object>
</div>
<img src="http://feeds.feedburner.com/~r/benstopford/~4/jO6B7cF1WGo" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.benstopford.com/2011/11/04/coherence-implementation-patterns-slides-from-coherence-sig/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.benstopford.com/2011/11/04/coherence-implementation-patterns-slides-from-coherence-sig/#utm_source=feed&amp;utm_medium=feed&amp;utm_campaign=feed</feedburner:origLink></item>
		<item>
		<title>Interesting Links Oct 2011</title>
		<link>http://feedproxy.google.com/~r/benstopford/~3/Dq_99cMC02Y/</link>
		<comments>http://www.benstopford.com/2011/10/25/links-for-october/#comments</comments>
		<pubDate>Tue, 25 Oct 2011 08:56:04 +0000</pubDate>
		<dc:creator>ben</dc:creator>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[Links]]></category>

		<guid isPermaLink="false">http://www.benstopford.com/?p=1778</guid>
		<description><![CDATA[High Performance Java

Not exactly lightweight reading, but one of the most detailed and influential papers on tuning your software for processing efficiency: What Every Programmer Should Know About Memory
If you read the above and want to put some of it into action then VTune should be your next port of call. Diagnostic software for CPU cache hits [...]]]></description>
			<content:encoded><![CDATA[<p><strong>High Performance Java</strong></p>
<ul>
<li>Not exactly lightweight reading, but one of the most detailed and influential papers on tuning your software for processing efficiency: <span style="text-decoration: underline;"><a href="http://www.akkadia.org/drepper/cpumemory.pdf">What Every Programmer Should Know About Memory</a></span></li>
<li>If you read the above and want to put some of it into action, VTune should be your next port of call; it is diagnostic software for measuring CPU cache hits and the like: <span style="text-decoration: underline;"><a id="NDQAAAAAQiu3N3o7SqwIg3rSk9-T8nLudAQ-title" href="http://www.google.com/url?q=http%3A%2F%2Fsoftware.intel.com%2Fen-us%2Farticles%2Fintel-vtune-amplifier-xe%2F&amp;sa=D&amp;usg=AFQjCNE76LmyWEYfMg-e4OOogSDPaRpbhg" target="_blank">VTune™ Amplifier XE 2011 from Intel &#8211; Intel® Software Network</a></span></li>
<li>When it really won’t go any faster, look at the Assembler: <a id="NDQAAAAAQu4zrjo_SqwIg2PvLmv3OlN3EAQ-title" href="http://www.google.com/url?q=http%3A%2F%2Fweblogs.java.net%2Fblog%2Fkohsuke%2Farchive%2F2008%2F03%2Fdeep_dive_into.html&amp;sa=D&amp;usg=AFQjCNF9j3TVTfxhXRTO7rBDsixnDOSp2g" target="_blank">Deep dive into assembly code from Java | Java.net</a></li>
<li>In anticipation of G1 (in case they ever get it finished), here’s the original paper with expected performance figures: <a href="http://www.google.com/url?q=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.63.6386%26rep%3Drep1%26type%3Dpdf&amp;sa=D&amp;usg=AFQjCNEidAH-zqps3zmyz8k-UKn_2KRa6Q">G1 paper with figures</a></li>
<li>A different approach to GC using processor-specific minor collections (in Haskell): <a href="http://www.google.com/url?q=http%3A%2F%2Fcommunity.haskell.org%2F~simonmar%2Fpapers%2Flocal-gc.pdf&amp;sa=D&amp;usg=AFQjCNF1hXVUo8GzoztALbEtuLezvohzqA">Multicore Garbage Collection with Local Heaps</a></li>
</ul>
<p><strong>Distributed Data Storage:</strong></p>
<ul>
<li>The new Oracle NoSQL database – this is the best article I’ve read summarising its position in the market: <a href="http://www.google.com/url?q=http%3A%2F%2Fdbmsmusings.blogspot.com%2F2011%2F10%2Foverview-of-oracle-nosql-database.html&amp;sa=D&amp;usg=AFQjCNFcoAUZh-ZS7foxTaKtfnKaAZhM_g">DBMS Musings: Overview of the Oracle NoSQL Database</a></li>
<li>The official Oracle NoSQL Whitepaper: <a href="http://www.oracle.com/technetwork/database/nosqldb/learnmore/nosql-database-498041.pdf">Oracle NoSQL Database White Paper</a></li>
<li>An interesting approach to data storage, an FPGA-based data warehouse: <a href="http://www.google.com/url?q=http%3A%2F%2Fwww.cse.buffalo.edu%2F~vipin%2Fpapers%2F2010%2Ftodd1.pdf&amp;sa=D&amp;usg=AFQjCNGDdiZ0xR89Uzsi8S3e_Y2dV_Lmhg">FPGA Data Warehouse</a></li>
<li>Google’s interesting SQL-wrapped MapReduce framework: <a href="http://www.google.com/url?q=http%3A%2F%2Fresearch.google.com%2Fpubs%2Fpub37200.html&amp;sa=D&amp;usg=AFQjCNHQz67AXSon61tegck3Y5tG2Hqkkg">Tenzing: A SQL Implementation On The MapReduce Framework</a></li>
</ul>
<p><strong>Distributed Computing:</strong></p>
<ul>
<li>The Actor Model &#8211; just in case you’re not familiar with it: <a id="NDQAAAAAQ-IbSxa7iqwIg8pvbsL_X6ZrzAQ-title" href="http://www.google.com/url?q=http%3A%2F%2Fdspace.mit.edu%2Fbitstream%2Fhandle%2F1721.1%2F6952%2FAITR-844.pdf%3Fsequence%3D2&amp;sa=D&amp;usg=AFQjCNFu9U89c4XHaN_HwNuiaX_QxjOA4Q" target="_blank">Actors model for distribution</a></li>
<li>Gluster – an open source distributed file system: <a id="NDQAAAAAQjOjkwfnsqwIgk8D02cTNsZH2AQ-title" href="http://www.google.com/url?q=http%3A%2F%2Fwww.gluster.org%2Fabout%2F&amp;sa=D&amp;usg=AFQjCNHqZ1-JuPfVLZPHXoCtdyYeXyHjFQ" target="_blank">Gluster</a></li>
<li>Running CUDA natively on x86 processors: <a id="NDQAAAAAQsfzjwuj0qgIg_4md0rn5-_7JAQ-title" href="http://www.google.com/url?q=http%3A%2F%2Fdrdobbs.com%2Fhigh-performance-computing%2F231500166&amp;sa=D&amp;usg=AFQjCNF4faA5OmAD4M1mOwk8W7ohlxhFxw" target="_blank">Running CUDA Code Natively on x86 Processors | Dr Dobb&#8217;s Journal</a></li>
</ul>
<p><strong>Coherence related:</strong></p>
<ul>
<li>Thinking about using 64-bit JVMs with compressed pointers: <a id="NDQAAAAAQvMai7-HsqgIg6aCI5NDQodbRAQ-title" href="http://www.google.com/url?q=http%3A%2F%2Fblog.juma.me.uk%2F2008%2F10%2F14%2F32-bit-or-64-bit-jvm-how-about-a-hybrid%2F&amp;sa=D&amp;usg=AFQjCNFX2AQwBuyYd9vCqtnUR8pJ5-hm3Q" target="_blank">32-bit or 64-bit JVM? How about a Hybrid?</a></li>
<li>Using different caches for reads and writes, a sensible pattern for Coherence implementations: <a href="http://aragozin.wordpress.com/2009/04/22/cache-for-read-and-cache-for-write/">Alexey Ragozin’s Blog</a></li>
<li>OCZ Z-Drive – an interesting and competitively priced alternative to FusionIO:
<ul>
<li><a id="NDQAAAAAQlP3A4JjjqwIg7JbBtPK94-7jAQ-title" href="http://www.google.com/url?q=http%3A%2F%2Fthessdreview.com%2Four-reviews%2Focz-z-drive-r4-pci-express-1-6tb-ssd-review%2F3%2F&amp;sa=D&amp;usg=AFQjCNEZpF7aZEO7a9s_vF8iJjFSdG_dsA" target="_blank">OCZ Z-Drive R4 C PCI Express 1.6TB SSD Review</a></li>
<li><a id="NDQAAAAAQuLXb5JjjqwIgpqei94ibxPdH-title" href="http://www.google.com/url?q=http%3A%2F%2Fm.engadget.com%2Fdefault%2Farticle.do%3FartUrl%3Dhttp%3A%2F%2Fwww.engadget.com%2F2011%2F09%2F28%2Focz-z-drive-r4-review-roundup-this-is-what-2-800mb-s-looks-like%2F%26category%3Dclassic%26postPage%3D4%26icid%3Deng_latest_art&amp;sa=D&amp;usg=AFQjCNFewaoDd8wE8B87N7Z_em5e9aJxaA" target="_blank">OCZ Z-Drive R4 review roundup</a></li>
</ul>
</li>
</ul>
<p><strong>Just Interesting:</strong></p>
<ul>
<li>The architecture of the Transputer: an interesting reflection on a couple of Bristol’s finest exports (other than Portishead), the Transputer and the occam programming language. <a href="http://www.google.com/url?q=http%3A%2F%2Fwww.reghardware.com%2F2011%2F08%2F18%2Fheroes_of_tech_david_may%2F&amp;sa=D&amp;usg=AFQjCNHFH9xmF7J7Dxo1gTdH1XWrIJWBOA">David May, parallel processing pioneer • reghardware</a></li>
<li>Is your brain like an iPhone? <a id="NDQAAAAAQpPalnfvvqwIgjZLbi_6s5LZT-title" href="http://www.google.com/url?q=http%3A%2F%2Fnovato.patch.com%2Fblog_posts%2Fis-your-brain-like-an-iphone-which-app-is-running-now&amp;sa=D&amp;usg=AFQjCNF-HxIaTUCTWEnlc8r9LuDpmOVDpQ" target="_blank">Is Your Brain Like an iPhone? Which App is Running Now? &#8211; Novato, CA Patch</a></li>
<li>Just be still for once: <a href="http://www.google.com/url?q=http%3A%2F%2Fundertheapricottree.wordpress.com%2F2011%2F10%2F18%2Fno-shame-in-stillness%2F&amp;sa=D&amp;usg=AFQjCNEfBD0QXrGGXLZuxGNlgx5pga1XbQ">No Shame in Stillness « Under the Apricot Tree</a></li>
<li>Of the huge amount of writing about Steve Jobs I thought the Economist’s coverage was the best: <a href="http://www.google.com/url?q=http%3A%2F%2Fwww.economist.com%2Fnode%2F21531529&amp;sa=D&amp;usg=AFQjCNGk2MSjCdwsBiuOM-DUypZPHKMDWg">Steve Jobs: The magician | The Economist</a></li>
<li>Scott Marcar’s thought-provoking discussion of running technology through a financial crisis: <span style="text-decoration: underline;"><a id="NDQAAAAAQ-OeRy-uZqwIg0aa9yoaMmYSqAQ-title" href="http://www.google.com/url?q=http%3A%2F%2Fwww.waterstechnology.com%2Fwaters%2Ffeature%2F2107476%2Fhaul-scott-marcar-leads-rbs-tech-team-financial-crisis&amp;sa=D&amp;usg=AFQjCNHFpohlurAdX45LdNxLCGmVk2e68Q" target="_blank">The Long Haul: Scott Marcar Leads RBS&#8217; Tech Team Through the Financial Crisis- WatersTechnology.com</a></span></li>
<li>Short but thought-provoking article on company culture: <a href="http://www.google.com/url?q=http%3A%2F%2Fblogs.hbr.org%2Fashkenas%2F2011%2F10%2Fwhy-you-should-question-your-c.html%3Futm_source%3Dfeedburner%26utm_medium%3Dfeed%26utm_campaign%3DFeed%3A%2Bharvardbusiness%2B(HBR.org)&amp;sa=D&amp;usg=AFQjCNEA05R3NBBLum_p-G0nx5ZfXvyzkA">Why You Should Question Your Culture &#8211; Ron Ashkenas &#8211; Harvard Business Review</a></li>
</ul>
<img src="http://feeds.feedburner.com/~r/benstopford/~4/Dq_99cMC02Y" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.benstopford.com/2011/10/25/links-for-october/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.benstopford.com/2011/10/25/links-for-october/#utm_source=feed&amp;utm_medium=feed&amp;utm_campaign=feed</feedburner:origLink></item>
		<item>
		<title>Slides for Financial Computing course @ UCL</title>
		<link>http://feedproxy.google.com/~r/benstopford/~3/tIaf2KjkL2M/</link>
		<comments>http://www.benstopford.com/2011/10/23/slides-for-financial-computing-course-ucl/#comments</comments>
		<pubDate>Sun, 23 Oct 2011 21:34:45 +0000</pubDate>
		<dc:creator>ben</dc:creator>
				<category><![CDATA[Blog]]></category>

		<guid isPermaLink="false">http://www.benstopford.com/?p=1769</guid>
		<description><![CDATA[ A Paradigm Shift: The Increasing Dominance of Memory-Oriented Solutions for High Performance Data Access 

]]></description>
			<content:encoded><![CDATA[<div style="width:425px" id="__ss_9847293"> <strong style="display:block;margin:12px 0 4px"><a href="http://www.slideshare.net/benstopford/a-paradigm-shift-the-increasing-dominance-of-memoryoriented-solutions-for-high-performance-data-access" title="A Paradigm Shift: The Increasing Dominance of Memory-Oriented Solutions for High Performance Data Access" target="_blank">A Paradigm Shift: The Increasing Dominance of Memory-Oriented Solutions for High Performance Data Access</a></strong> <object id="__sse9847293" width="425" height="355"><param name="movie" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=thedominanceofin-memorydatastorageforfinanceusecases-111023160517-phpapp01&#038;stripped_title=a-paradigm-shift-the-increasing-dominance-of-memoryoriented-solutions-for-high-performance-data-access&#038;userName=benstopford" /><param name="allowFullScreen" value="true"/><param name="allowScriptAccess" value="always"/><embed name="__sse9847293" src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=thedominanceofin-memorydatastorageforfinanceusecases-111023160517-phpapp01&#038;stripped_title=a-paradigm-shift-the-increasing-dominance-of-memoryoriented-solutions-for-high-performance-data-access&#038;userName=benstopford" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="450" height="355"></embed></object>
</div>
<img src="http://feeds.feedburner.com/~r/benstopford/~4/tIaf2KjkL2M" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.benstopford.com/2011/10/23/slides-for-financial-computing-course-ucl/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.benstopford.com/2011/10/23/slides-for-financial-computing-course-ucl/#utm_source=feed&amp;utm_medium=feed&amp;utm_campaign=feed</feedburner:origLink></item>
		<item>
		<title>Managing Versioning</title>
		<link>http://feedproxy.google.com/~r/benstopford/~3/R6WTQTvimzE/</link>
		<comments>http://www.benstopford.com/2011/10/19/coherence-implementation-pattern-latestversioned-caches/#comments</comments>
		<pubDate>Wed, 19 Oct 2011 11:52:21 +0000</pubDate>
		<dc:creator>ben</dc:creator>
				<category><![CDATA[Coherence]]></category>
		<category><![CDATA[Coherence Patterns]]></category>

		<guid isPermaLink="false">http://www.benstopford.com/?p=1710</guid>
		<description><![CDATA[Provides descriptions of two mechanisms for managing object versioning in Coherence ]]></description>
			<content:encoded><![CDATA[<p>This is the first in a series of posts describing some useful patterns for implementing Coherence data grids.</p>
<p>Most non-trivial caches need to version their objects. There are a number of reasons for wanting this:</p>
<ol>
<li>Versioning provides a historic record of changes.</li>
<li>By linking versions to wall-clock and business times (i.e. bi-temporally), views of the system at previous points in time can be recomposed. This is important for providing consistent views over your data.</li>
<li>Versioning allows concurrency to be managed through Multi-Version Concurrency Control (MVCC).</li>
</ol>
<p>However, simply adding versions to your objects (more precisely, to your object keys) has the downside that you can no longer look up a value via its business key alone: you must know the business key <em><strong>as well as the version</strong></em> of the object that you want.</p>
<address>Key = [Business Key][Version]</address>
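<p>A composite key of this shape might look like the hypothetical VersionedKey below. In a real grid the class would also need to be serializable (e.g. via POF) and would typically implement com.tangosol.net.cache.KeyAssociation, whose getAssociatedKey() method returns the business key so that all versions of an object land in the same partition; here that method is plain Java to keep the sketch self-contained.</p>

```java
import java.util.Objects;

// Hypothetical composite key: [Business Key][Version].
final class VersionedKey {
    private final String businessKey;
    private final long version;

    VersionedKey(String businessKey, long version) {
        this.businessKey = Objects.requireNonNull(businessKey);
        this.version = version;
    }

    String getBusinessKey() { return businessKey; }

    long getVersion() { return version; }

    // Every version of one business key returns the same associated key; in Coherence
    // this is the KeyAssociation hook that co-locates all versions of an object.
    Object getAssociatedKey() { return businessKey; }

    // Equality needs BOTH parts, which is why a lookup by business key alone fails.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof VersionedKey)) return false;
        VersionedKey other = (VersionedKey) o;
        return version == other.version && businessKey.equals(other.businessKey);
    }

    @Override
    public int hashCode() { return Objects.hash(businessKey, version); }

    @Override
    public String toString() { return "[" + businessKey + "][" + version + "]"; }
}
```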
<p>In Coherence, accessing objects directly via their key is far more performant than running a query (see <a href="http://www.benstopford.com/2009/12/12/coherence-the-falacy-of-linear-scalability/#utm_source=feed&amp;utm_medium=feed&amp;utm_campaign=feed">The Fallacy of Linear Scalability</a>), so it is preferable to keep the latest version of the object available via its business key alone. There are two common approaches to solving this problem: the Latest/Versioned pattern and the Latest Version Marker pattern.</p>
<p><img class="alignright size-medium wp-image-1718" style="border: 5px solid black; margin: 20px;" title="LatestVersioned" src="http://www.benstopford.com/wp-content/uploads/2011/10/LatestVersioned1-300x247.png" alt="" width="300" height="247" /></p>
<p><strong>Using Latest and Versioned Caches</strong></p>
<p>The first approach is to define two caches for every object: the Latest&#8230; cache and the Versioned&#8230; cache. The key of the &#8216;latest&#8217; cache is simply the business key:</p>
<address>Latest Cache Key = [Business Key]</address>
<p>This cache only ever contains the latest object. The &#8216;versioned&#8217; cache contains all versions of the object, with a (usually monotonically incrementing) version embedded in the key:</p>
<address>Versioned Cache Key = [Business Key][Version]</address>
<p>Writes must be directed at the &#8216;Latest&#8217; cache and a Coherence Trigger is used to copy the object reference to the &#8216;Versioned&#8217; cache adding the version onto the key as it does so. This is demonstrated in the first figure opposite.</p>
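<p>The following is an in-memory sketch of the Latest/Versioned write path, assuming two HashMaps in place of the Latest* and Versioned* caches and a put() method in place of the trigger. Unlike the post, which tracks the version inside the object, this hypothetical LatestVersionedStore keeps a per-key counter for brevity.</p>

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical model of the Latest/Versioned pattern (plain maps, not Coherence caches).
class LatestVersionedStore<V> {
    // Versioned cache key: [Business Key][Version]
    static final class Key {
        final String businessKey;
        final long version;

        Key(String businessKey, long version) { this.businessKey = businessKey; this.version = version; }

        @Override public boolean equals(Object o) {
            return o instanceof Key && ((Key) o).version == version
                    && ((Key) o).businessKey.equals(businessKey);
        }

        @Override public int hashCode() { return Objects.hash(businessKey, version); }
    }

    private final Map<String, V> latest = new HashMap<>();       // key = [Business Key]
    private final Map<Key, V> versioned = new HashMap<>();        // key = [Business Key][Version]
    private final Map<String, Long> lastVersion = new HashMap<>(); // per-key version counter

    // Writes go to the Latest map; the "trigger" step copies the same reference into
    // the Versioned map under the next monotonically increasing version.
    long put(String businessKey, V value) {
        latest.put(businessKey, value);
        long version = lastVersion.merge(businessKey, 1L, Long::sum);
        versioned.put(new Key(businessKey, version), value);
        return version;
    }

    V getLatest(String businessKey) { return latest.get(businessKey); }

    V getVersion(String businessKey, long version) { return versioned.get(new Key(businessKey, version)); }

    int latestSize() { return latest.size(); }

    int versionedSize() { return versioned.size(); }
}
```

<p>Note that the Latest and Versioned maps hold the same reference for the newest value; the memory cost of that arrangement in a real, backed-up cluster is the trade-off weighed below.</p>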
<p>The disadvantage of this approach is memory inefficiency: the latest object exists in both the Latest and Versioned caches. When the object is written, the same reference can be used in both caches to save space; however, the backup copies in each cache will be different instances and, should a node be lost, the process of recreating the primary from the backup copy will by default create new instances, eating further into memory. It is therefore advisable to use the LatestMarker pattern below when memory is a concern. The advantage of this approach is that it reduces the number of records in the Latest caches, which makes filter operations faster when they operate only on &#8216;Latest&#8217; data (a common use case in most applications).</p>
<p>Checklist:</p>
<ul>
<li>Define two cache schemes based on the masks Latest* and Versioned* ensuring that they are in the same CacheService.</li>
<li>In the Latest* scheme specify a trigger to forward objects to the versioned cache, incrementing the version as it does so.</li>
<li>Specify KeyAssociation (<a title="Affinity" href="http://wiki.tangosol.com/display/COH35UG/Data+Affinity" target="_blank">Affinity</a>) on the business key of the Latest* cache across both caches.</li>
<li>Write a trigger that adds a monotonically incrementing version to the business key as it copies the value&#8217;s reference to the Versioned cache. You&#8217;ll need to use direct backing map access to avoid reentrancy problems (I&#8217;ve discussed the issues of reentrancy in Coherence before; see <a title="&lt;p&gt;Coherence appears the perfect platform in which to merge data and processing. However the reality is slightly less rosy. This article looks at why.&lt;/p&gt; " href="http://www.benstopford.com/2009/08/30/the-trials-of-merging-data-and-processing-in-coherence-why-it-doesnt-quite-just-work/#utm_source=feed&amp;utm_medium=feed&amp;utm_campaign=feed">Merging Data And Processing: Why it doesn’t “just work”</a>). The code sample below is provided for reference.</li>
</ul>
<p><img class="alignright size-medium wp-image-1717" style="border: 5px solid black; margin: 20px;" title="LatestMarker" src="http://www.benstopford.com/wp-content/uploads/2011/10/LatestMarker-300x220.png" alt="" width="300" height="220" /></p>
<p><strong>Using Versioned Cache Only With a Latest Version Marker</strong></p>
<p>A second approach to solving the same problem is to use only a single cache, with the key format:</p>
<address>Key = [Business Key][Version]</address>
<p>but specifying that the latest version of an object has a special version marker:</p>
<address>KeyLatest = [Business Key][LatestVersionMarker]</address>
<p>Because clients know the LatestVersionMarker (-1 is a common choice), they can always access the latest value directly by calling:</p>
<address>cache.get([businessKey][-1])</address>
<p>This approach does not suffer from the duplication associated with separate Latest and Versioned caches, but has the disadvantage that versioned data lives in the same cache as latest data, marginally slowing down filters. To reiterate: in this pattern there is only one copy of the latest object, the one stored under the latest marker. This differs from the Latest/Versioned pattern, where the latest object exists in both caches (i.e. twice) so that the Versioned cache can contain every version of the object.</p>
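<p>A similar in-memory sketch for the Latest Version Marker pattern follows, again with a HashMap in place of the single cache and a put() method in place of the trigger. LatestMarkerStore and Stored are hypothetical names; each stored value carries its real version (as the post assumes objects do), and -1 is used as the marker.</p>

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical model of the Latest Version Marker pattern (one map, not a Coherence cache).
class LatestMarkerStore<V> {
    static final long LATEST_MARKER = -1L;

    // Cache key: [Business Key][Version]; LATEST_MARKER identifies the latest object
    static final class Key {
        final String businessKey;
        final long version;

        Key(String businessKey, long version) { this.businessKey = businessKey; this.version = version; }

        @Override public boolean equals(Object o) {
            return o instanceof Key && ((Key) o).version == version
                    && ((Key) o).businessKey.equals(businessKey);
        }

        @Override public int hashCode() { return Objects.hash(businessKey, version); }
    }

    // A stored value together with its real version
    static final class Stored<T> {
        final long version;
        final T value;

        Stored(long version, T value) { this.version = version; this.value = value; }
    }

    private final Map<Key, Stored<V>> cache = new HashMap<>();

    // Plays the trigger: demote the current latest object to its real-version key,
    // then store the new object under the marker only. Returns the new version.
    long put(String businessKey, V value) {
        Key latestKey = new Key(businessKey, LATEST_MARKER);
        Stored<V> previous = cache.get(latestKey);
        long nextVersion = 1L;
        if (previous != null) {
            cache.put(new Key(businessKey, previous.version), previous);
            nextVersion = previous.version + 1;
        }
        cache.put(latestKey, new Stored<>(nextVersion, value));
        return nextVersion;
    }

    // get(businessKey, LATEST_MARKER) corresponds to cache.get([businessKey][-1])
    V get(String businessKey, long version) {
        Stored<V> s = cache.get(new Key(businessKey, version));
        return s == null ? null : s.value;
    }

    int size() { return cache.size(); }
}
```

<p>After three writes to the same business key the map holds exactly three entries: versions 1 and 2 under their real keys, and version 3 only under the marker.</p>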
<p>Checklist:</p>
<ul>
<li>Create a cache with a KeyAssociation on the business key (i.e. the key parts without the version number). Add a trigger that replaces the current value for the &#8220;LatestMarker&#8221; with the new object whilst copying the old value to a key with the appropriate real version. You&#8217;ll need to use direct backing map access to avoid reentrancy problems (I&#8217;ve discussed the issues of reentrancy in Coherence before [<a href="http://www.benstopford.com/2009/08/30/the-trials-of-merging-data-and-processing-in-coherence-why-it-doesnt-quite-just-work/#utm_source=feed&amp;utm_medium=feed&amp;utm_campaign=feed">link</a>]). See the code sample below.</li>
</ul>
<p><strong>Implementing the trigger to avoid reentrancy issues</strong></p>
<p>The code below outlines one mechanism for moving objects (in this case for the Latest/Versioned pattern) from one cache to the other using direct backing map access.</p>
<blockquote>
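<pre>import com.tangosol.net.BackingMapManagerContext;
import com.tangosol.util.Binary;
import com.tangosol.util.BinaryEntry;
import com.tangosol.util.MapTrigger;
import java.util.Map;

public void copyObjectToVersionedCacheAddingVersion(MapTrigger.Entry entry) {
   // I'm assuming that you are tracking the version, and incrementing it, in your object.
   // Also note that it's more efficient to extract just the version rather than deserialise
   // the whole object, but this way is more succinct.
   MyValue value = (MyValue) entry.getValue();
   MyKey versionedKey = (MyKey) value.getKey();

   BinaryEntry binaryEntry = (BinaryEntry) entry;
   Binary binaryValue = binaryEntry.getBinaryValue();
   BackingMapManagerContext context = binaryEntry.getContext();

   // Write straight into the Versioned cache's backing map (same service and, thanks to
   // key association, same partition) so the trigger never re-enters the cache service.
   Map versionedCacheBackingMap = context.getBackingMap("VersionedCacheName");
   Object binaryKey = context.getKeyToInternalConverter().convert(versionedKey);
   versionedCacheBackingMap.put(binaryKey, binaryValue);
}</pre>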
</blockquote>
<p><strong>Summary</strong></p>
<p>Both approaches provide direct access to latest objects (most importantly, key-based access to the most commonly used &#8216;latest&#8217; version) without requiring knowledge of the version itself. Both retain a history of versions, something that is important for locking, MVCC and snapshotting (I&#8217;ll be writing more about these later). Affinity (Key Association) is used to ensure that the versioning process is entirely local to the JVM doing the write.</p>
<p><strong>Related Posts</strong></p>
<ol>
<li>Use normalisation to reduce the versioning burden through the application of Star Schemas and Connected Replication [<a href="http://www.benstopford.com/2011/10/05/session-on-fast-joins-in-distributed-databases-javaone/#utm_source=feed&amp;utm_medium=feed&amp;utm_campaign=feed">link</a>]</li>
<li>Performing cross cache joins in Coherence [<a href="http://www.benstopford.com/2009/11/20/how-to-perform-efficient-cross-cache-joins-in-coherence/#utm_source=feed&amp;utm_medium=feed&amp;utm_campaign=feed">link</a>]</li>
<li>Understanding problems of reentrancy in Coherence [<a href="http://www.benstopford.com/2009/08/30/the-trials-of-merging-data-and-processing-in-coherence-why-it-doesnt-quite-just-work/#utm_source=feed&amp;utm_medium=feed&amp;utm_campaign=feed">link</a>]</li>
</ol>
<img src="http://feeds.feedburner.com/~r/benstopford/~4/R6WTQTvimzE" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.benstopford.com/2011/10/19/coherence-implementation-pattern-latestversioned-caches/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		<feedburner:origLink>http://www.benstopford.com/2011/10/19/coherence-implementation-pattern-latestversioned-caches/#utm_source=feed&amp;utm_medium=feed&amp;utm_campaign=feed</feedburner:origLink></item>
	</channel>
</rss>

