<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/atom10full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><feed xmlns="http://www.w3.org/2005/Atom" xmlns:openSearch="http://a9.com/-/spec/opensearch/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:gd="http://schemas.google.com/g/2005" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" gd:etag="W/&quot;AkQCRnwyfyp7ImA9WxJUEU4.&quot;"><id>tag:blogger.com,1999:blog-3946011063058389308</id><updated>2009-07-09T04:19:27.297-07:00</updated><title>Grid Designer's Blog</title><subtitle type="html" /><link rel="http://schemas.google.com/g/2005#feed" type="application/atom+xml" href="http://blog.griddynamics.com/feeds/posts/default" /><link rel="alternate" type="text/html" href="http://blog.griddynamics.com/" /><author><name>Grid Dynamics</name><uri>http://www.blogger.com/profile/18125799569183836823</uri><email>noreply@blogger.com</email></author><generator version="7.00" uri="http://www.blogger.com">Blogger</generator><openSearch:totalResults>25</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>25</openSearch:itemsPerPage><link rel="self" href="http://feeds.feedburner.com/griddynamics" type="application/atom+xml" /><entry gd:etag="W/&quot;DUMCSX0ycCp7ImA9WxJRF0w.&quot;"><id>tag:blogger.com,1999:blog-3946011063058389308.post-3870382272773649304</id><published>2009-05-18T04:19:00.000-07:00</published><updated>2009-05-19T00:17:48.398-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-05-19T00:17:48.398-07:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="gemfire" /><category scheme="http://www.blogger.com/atom/ns#" term="data aware routing" /><category scheme="http://www.blogger.com/atom/ns#" term="grid computing" /><category scheme="http://www.blogger.com/atom/ns#" term="Sun Grid Engine" /><category 
scheme="http://www.blogger.com/atom/ns#" term="convergence" /><title>Data-Aware Routing on a Cloud, featuring Sun Grid Engine, GemFire and EC2</title><content type="html">&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_CQV12Vs8lZ0/ShFOS2t-TII/AAAAAAAAIGQ/2VNtsMtvZrU/s1600-h/demo-screen2.png"&gt;&lt;/a&gt;&lt;p class="MsoNormal"&gt;We are excited to announce that we have taken our Convergence project to the next step in the last few weeks. &lt;a href="http://blog.griddynamics.com/2008/02/data-aware-routing-datasynapse_14.html"&gt;Last time&lt;/a&gt; we demonstrated how data aware routing can speed up the combination of compute grids and data grids. Since then, we have developed new grid adapters for our Convergence project and moved to the cloud.&lt;/p&gt;&lt;p style="font-weight: bold;" class="MsoNormal"&gt;&lt;span class="Apple-style-span"  style="font-size:x-large;"&gt;New adapters: Sun Grid Engine, GemFire&lt;/span&gt;&lt;/p&gt;&lt;p class="MsoNormal"&gt;&lt;a href="http://www.sun.com/software/sge/"&gt;Sun Grid Engine&lt;/a&gt; (SGE) is an open source batch-queuing system, developed and supported by Sun Microsystems. SGE is typically used in a server farm or high-performance computing (HPC) cluster and is responsible for accepting, scheduling, dispatching, and managing the remote and distributed execution of large numbers of standalone, parallel or interactive user jobs. It also manages and schedules the allocation of distributed resources such as processors, memory, disk space, and software licenses. We have developed an adapter that wraps SGE's Java API and enables data aware routing of tasks.&lt;br /&gt;&lt;/p&gt;&lt;p class="MsoNormal"&gt;&lt;a href="http://www.gemstone.com/products/gemfire/"&gt;GemFire&lt;/a&gt; is an Enterprise Data Fabric (EDF) solution from GemStone Systems, Inc. 
It is a high-performance, distributed in-memory data grid (IMDG) that offers very low latency, high resiliency, scalability, and high-throughput data sharing and event distribution for high-performance computing applications that need access to real-time data. We have developed a monitor component that can query the location of data in partitioned GemFire regions.&lt;br /&gt;&lt;/p&gt;&lt;p style="font-weight: bold;" class="MsoNormal"&gt;&lt;span class="Apple-style-span"  style="font-size:x-large;"&gt;Running on the cloud&lt;/span&gt;&lt;/p&gt;&lt;p class="MsoNormal"&gt;&lt;span class="Apple-style-span"&gt;&lt;/span&gt;&lt;/p&gt;&lt;p class="MsoNormal"&gt;&lt;span class="Apple-style-span"  style="font-size:medium;"&gt;Along with integrating new grids, we changed our demo application to use a cloud infrastructure. It is now deployed on Amazon EC2. The demo allocates servers, sets up the cluster software (SGE and GemFire in this case) and starts the demo application server on the fly. A single click and a few minutes later, you have a new cluster up and running.&lt;/span&gt;&lt;/p&gt;&lt;p class="MsoNormal"&gt;&lt;span class="Apple-style-span"  style="font-size:medium;"&gt;The deployment process is straightforward and requires just a couple of scripts. First, we allocate one EC2 server and start the master node. Once the master is started, it allocates worker nodes and sets up the grid software. When the cluster setup is complete, we start the demo control UI on the master node, and from that moment our interactive data aware routing demo is available. Each cluster has a time to live, and when its lifespan expires, all servers are returned to EC2. We are using the OpenSolaris AMI provided by Sun Microsystems for all our servers. Deploying GemFire is trivial: we just need to copy a few jars. 
Installing SGE has its quirks, but once we figured out how to do it correctly, it just works.&lt;/span&gt;&lt;/p&gt;&lt;p class="MsoNormal"&gt;&lt;span class="Apple-style-span"  style="font-size:medium;"&gt;From a usability point of view, cloud hosting has huge benefits. Each developer can work with their own cluster, or even several clusters (e.g., to compare the effects of data aware routing between clusters of different sizes).&lt;br /&gt;&lt;/span&gt;&lt;/p&gt;&lt;p style="font-weight: bold;" class="MsoNormal"&gt;&lt;span class="Apple-style-span"  style="font-size:x-large;"&gt;Data-aware routing demo&lt;/span&gt;&lt;/p&gt;&lt;p class="MsoNormal"&gt;W&lt;span class="Apple-style-span"  style="font-size:medium;"&gt;e have adapted our existing demo application to work with the new grids and to scale on a cloud. The demo simulates a financial application. We have a set of trade objects, which are loaded into a partitioned GemFire region. 50K objects are placed on each server (each object is about 2 KB). The trades belong to equal-sized books (5K trades in each book), so each server holds 10 books. The job is to evaluate all books, using Sun Grid Engine to distribute the work across cluster nodes. With one task per book, a job consists of 10*(number of servers) tasks, and each task fetches 5K trades (about 10 MB of data) from the data grid and performs some calculation over them. For simplicity, we just sum the IDs of the trades and return the result.&lt;/span&gt;&lt;/p&gt;&lt;p class="MsoNormal"&gt;U&lt;span class="Apple-style-span"  style="font-size:medium;"&gt;nlike DataSynapse GridServer, SGE is a process-oriented computational grid. Each task in SGE is an operating system process, and its execution time includes all JVM startup and class loading overheads. Usually tasks are long enough that process startup time does not make much difference, but for an interactive demo we have to find a balance. 
We cannot make the user wait half an hour to see the first result, but if we make tasks too short, the effect of data aware routing will be masked by JVM startup overheads.&lt;/span&gt;&lt;/p&gt;&lt;p class="MsoNormal"&gt;&lt;/p&gt;&lt;p class="MsoNormal"&gt;&lt;span class="Apple-style-span"&gt;&lt;span class="Apple-style-span"&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" style="text-decoration: none;" href="http://1.bp.blogspot.com/_CQV12Vs8lZ0/ShFOS2t-TII/AAAAAAAAIGQ/2VNtsMtvZrU/s1600-h/demo-screen2.png"&gt;&lt;img style="margin: 0px auto 10px; text-align: center; width: 400px; display: block; height: 281px; cursor: pointer; text-decoration: underline;" id="BLOGGER_PHOTO_ID_5337133119243701378" alt="" src="http://1.bp.blogspot.com/_CQV12Vs8lZ0/ShFOS2t-TII/AAAAAAAAIGQ/2VNtsMtvZrU/s400/demo-screen2.png" border="0" /&gt;&lt;/a&gt;&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;&lt;p class="MsoNormal"&gt;&lt;span class="Apple-style-span"&gt;&lt;span class="Apple-style-span"&gt;&lt;span style="color: rgb(0, 0, 0);" class="Apple-style-span"&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;p class="MsoNormal"&gt;&lt;span class="Apple-style-span"  style="font-size:medium;"&gt;The demo runs a continuous flow of jobs and measures job completion latency and task completion latency. To illustrate the advantages of data aware routing, we introduced three scheduling modes. In “Data aware” mode we perform data aware routing, so each task reads its data locally. In “Neutral” mode scheduling is unguided: &lt;/span&gt;&lt;span class="Apple-style-span"  style="font-size:medium;"&gt;tasks are placed as SGE sees fit. In “Anti data aware” mode we deliberately violate data awareness, so each task reads its data over the network. 
The user can change the task scheduling mode on the fly and see its impact on performance.&lt;/span&gt;&lt;/p&gt;&lt;p class="MsoNormal"&gt;&lt;/p&gt;&lt;p class="MsoNormal"&gt;&lt;span class="Apple-style-span"  style="font-size:medium;"&gt;The diagram on the left shows the job completion time; the small gray bars show the cumulative task execution time on each server in the cluster. The gray bars do not include JVM startup time, so the cumulative execution time on each server is considerably smaller than the job execution time.&lt;/span&gt;&lt;/p&gt;&lt;p class="MsoNormal"&gt;&lt;span class="Apple-style-span"  style="font-size:medium;"&gt;The diagram on the right shows the average task completion latency (without JVM startup time), so you can see the effect of data aware routing at both the task and the job level.&lt;/span&gt;&lt;/p&gt;&lt;div style="font-weight: bold;"&gt;&lt;span class="Apple-style-span"  style="font-size:x-large;"&gt;Conclusion&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-size:medium;"&gt;We have demonstrated how a generic data-aware routing approach implemented in the Convergence project can be used with different grid products. We had a very good experience migrating our research/demo platform to the cloud. 
Using a cloud gave us flexibility and ease of development that are hard to achieve in a traditional resource-constrained environment.&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-size:medium;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-size:medium;"&gt;Stay tuned for more advances in the Convergence project!&lt;/span&gt;&lt;/div&gt;&lt;p&gt;&lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3946011063058389308-3870382272773649304?l=blog.griddynamics.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog.griddynamics.com/feeds/3870382272773649304/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=3946011063058389308&amp;postID=3870382272773649304" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/3870382272773649304?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/3870382272773649304?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/griddynamics/~3/HPcuswsh-3g/data-aware-routing-gemstone-gemfire-and.html" title="Data-Aware Routing on a Cloud, featuring Sun Grid Engine, GemFire and EC2" /><author><name>Alexey Ragozin</name><uri>http://www.blogger.com/profile/13720493857045012756</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="03800576285958180449" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://1.bp.blogspot.com/_CQV12Vs8lZ0/ShFOS2t-TII/AAAAAAAAIGQ/2VNtsMtvZrU/s72-c/demo-screen2.png" height="72" width="72" /><thr:total 
xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total><feedburner:origLink>http://blog.griddynamics.com/2009/05/data-aware-routing-gemstone-gemfire-and.html</feedburner:origLink></entry><entry gd:etag="W/&quot;CUEAR38-cSp7ImA9WxJRF0g.&quot;"><id>tag:blogger.com,1999:blog-3946011063058389308.post-1628332447054810167</id><published>2009-05-05T09:50:00.000-07:00</published><updated>2009-05-19T10:20:46.159-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-05-19T10:20:46.159-07:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="hpc" /><category scheme="http://www.blogger.com/atom/ns#" term="Waters" /><category scheme="http://www.blogger.com/atom/ns#" term="Microsoft HPC" /><category scheme="http://www.blogger.com/atom/ns#" term="Conference" /><category scheme="http://www.blogger.com/atom/ns#" term="~Shravan Kumar" /><title>Waters Power 2009 Conference</title><content type="html">Grid Dynamics was one of the sponsors of Incisive Media's &lt;a href="http://web.incisive-events.com/fit/2009/04/waters-power/"&gt;Waters Power 2009 event&lt;/a&gt;, held in New York City. The main agenda of the conference was to showcase the most up-to-date developments in HPC, with in-depth analysis of cloud computing, virtualization and SOA solutions, and to present the latest strategies, techniques and technologies for optimum performance and maximum efficiency in any data center. Major HPC vendors and many Wall Street firms were represented at the conference.&lt;br /&gt;&lt;br /&gt;The keynote speech was presented by Jeffrey Birnbaum from Merrill Lynch. It was a well-presented speech that highlighted the opportunities, challenges and approaches of cloud computing for HPC. Highlights from his speech:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;The main attraction of cloud computing is driving the cost of computing down. Google and Amazon have done a good job of getting the cost pretty low. 
How can enterprises do the same, or come close?&lt;/li&gt;&lt;li&gt;Most cloud providers, such as Amazon and Google, are GbE-based. Everyone is moving towards 10GbE, which serves as a foundation for viable clouds.&lt;/li&gt;&lt;li&gt;Enterprise cloud infrastructure needs a global file system on which all software is installed. This keeps things simple for end users: to run any compute environment, just mount a file system.&lt;/li&gt;&lt;li&gt;To get global scalability, use multiple copies of the files.&lt;/li&gt;&lt;li&gt;Replicate the files in real time on every update. It might be better to wait on an update than to deal with eventual consistency.&lt;/li&gt;&lt;li&gt;For better performance, cache the files in regional locations.&lt;/li&gt;&lt;li&gt;Do not increase the cost of hardware by providing node-level redundancy; buy commodity hardware and design for failure. Route the workload from a failed node to an available node (as Google and Amazon do).&lt;/li&gt;&lt;li&gt;Design data centers around PODs connected by layer 2, not layer 3.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;&lt;br /&gt;Technologies that will change the world:&lt;br /&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;Multi-core/multi-socket commodity compute nodes (running more VMs, writing more parallel code)&lt;/li&gt;&lt;li&gt;10GbE with iWARP or RDMA (lower latency to storage)&lt;/li&gt;&lt;li&gt;Flash-based storage at 200K IOPS (totally changes how you think about the problems)&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;&lt;br /&gt;The implications of these changes are dramatic and will affect the way we design data centers and applications.&lt;br /&gt;&lt;br /&gt;There was a panel discussion titled "A silver lining for cloud computing". The panel included Victoria Livschitz, Founder and CEO of Grid Dynamics. Questions asked by the moderator triggered insightful discussions and sometimes contradictory answers:&lt;br /&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;Define cloud computing? 
This is probably the most asked question in cloud computing forums, and it was good to hear that we are closing in on a definition. This brought up another question: what stage of technology maturity is cloud computing in? Victoria's response was that it is at a really early stage and will take about five years to become mature enough for enterprise adoption.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;What applications are not suitable for cloud computing? A few panel members acknowledged that low-latency applications may be bad candidates until cloud infrastructure matures, but one speaker thought that this depends on the design of the cloud and that it is possible to host low-latency applications if the cloud is architected well.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;An interesting conversation came up about when the cloud will see wider adoption. This is the same question that comes up whenever a new technology comes to light. Established companies with huge investments in existing infrastructure and approaches will need more time to adopt it, while newer players will benefit from it much sooner.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Victoria made an insightful argument about the way applications are designed and implemented with respect to data and its ownership. Most applications today are designed with data ownership as the premise, but consuming and providing data as a service opens up many opportunities. For example, cloud computing is turning infrastructure, software and computation into services, and making data a consumable entity as well gives applications unbounded possibilities. &lt;/li&gt;&lt;/ul&gt;&lt;p&gt;The thoughts about data ownership and the promise of being able to consume data as a service were echoed by a few other speakers in later discussions.&lt;br /&gt;&lt;br /&gt;The speech by Ken Michellini (from CitiHub), "Keeping your feet on the ground: Unveiling the truth about cloud computing", was educational. 
It covered ROI calculations for public cloud, third-party hosted and internal cloud scenarios, and gave some ideas on how to make these decisions when building your next big application. He also talked about the characteristics of a good cloud application and the typical challenges in building cloud applications.&lt;br /&gt;&lt;br /&gt;We attended a few more panel discussions, including "Building the 21st century data center", "Preparing for Next phase in grid computing: What it takes to build the perfect data grid" and "Virtual reality: Optimizing storage, application and network virtualization", as well as a presentation about real-life lessons learned while administering grids, "Nuts and Bolts: Practical issues in grid administration".&lt;br /&gt;&lt;br /&gt;I would like to congratulate Incisive Media on a well-run conference that let many financial industry experts brainstorm and discuss the opportunities of the cloud. Great job, guys!&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3946011063058389308-1628332447054810167?l=blog.griddynamics.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog.griddynamics.com/feeds/1628332447054810167/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=3946011063058389308&amp;postID=1628332447054810167" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/1628332447054810167?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/1628332447054810167?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/griddynamics/~3/g-u1ADPNv7Y/waters-power-2009-conference_05.html" title="Waters Power 2009 Conference" /><author><name>Shravan 
(Sean) Kumar</name><uri>http://www.blogger.com/profile/09712909110035296320</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="09230803115972803308" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total><feedburner:origLink>http://blog.griddynamics.com/2009/05/waters-power-2009-conference_05.html</feedburner:origLink></entry><entry gd:etag="W/&quot;CE4CSX0-eip7ImA9WxVXEkU.&quot;"><id>tag:blogger.com,1999:blog-3946011063058389308.post-5063298488428032343</id><published>2009-02-09T22:39:00.000-08:00</published><updated>2009-02-10T08:22:48.352-08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-02-10T08:22:48.352-08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="gigaspaces" /><category scheme="http://www.blogger.com/atom/ns#" term="grid computing" /><category scheme="http://www.blogger.com/atom/ns#" term="distributed cache" /><category scheme="http://www.blogger.com/atom/ns#" term="~Alexey Kharlamov" /><category scheme="http://www.blogger.com/atom/ns#" term="data grid" /><title>Full-text &amp; faceted search over In-Memory Data Grids</title><content type="html">Modern in-memory data grid (IMDG) solutions provide facilities of varying sophistication for executing queries over the whole stored data set. &lt;a href="http://www.oracle.com/technology/products/coherence/index.html"&gt;Oracle Coherence&lt;/a&gt; provides query facilities (one-time full scans and continuous queries, with cost-based optimization). &lt;a href="http://www.gigaspaces.com/"&gt;GigaSpaces&lt;/a&gt; has a JDBC query interface that can use hash and B-tree indexes. These solutions work quite well for many problem areas. However, under heavy load and with complex multi-criteria queries, these facilities can quickly become a bottleneck.&lt;br /&gt;&lt;br /&gt;There is a class of workloads that put high load on IMDGs with complex queries. 
Retail companies that use IMDGs for their item catalogs are a good example. Those catalogs are hit by a diverse stream of multi-criteria queries. A typical query looks like:&lt;br /&gt;&lt;blockquote&gt;give me red cell phones with MP3 support and Java.&lt;/blockquote&gt;&lt;br /&gt;Fortunately, the &lt;a href="http://www.compass-project.org/"&gt;Compass Framework&lt;/a&gt; allows you to process such queries effectively. You can build inverted indexes with &lt;a href="http://lucene.apache.org/java/docs/"&gt;Apache Lucene&lt;/a&gt; and store them on a grid. This capability relies on the very modular design of the Lucene framework: all index I/O operations are hidden behind the Directory abstraction.&lt;br /&gt;&lt;br /&gt;For now, Compass provides implementations for Coherence, GigaSpaces and &lt;a href="http://www.terracotta.org/"&gt;Terracotta&lt;/a&gt;, which makes it possible to build a vertical search solution on top of in-memory data grids.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_aTVQwIbMAHo/SY8yqV_ej-I/AAAAAAAAAk8/c-1ElxpXAqM/s1600-h/Compass+IMDG.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 142px;" src="http://1.bp.blogspot.com/_aTVQwIbMAHo/SY8yqV_ej-I/AAAAAAAAAk8/c-1ElxpXAqM/s320/Compass+IMDG.png" alt="" id="BLOGGER_PHOTO_ID_5300510989477646306" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;In addition, Compass has a sophisticated object-to-document mapping system that allows you to make stored objects searchable just by adding Java annotations or XML mapping files. Mappings can also be built at runtime.&lt;br /&gt;&lt;br /&gt;However, despite its great codebase, Compass documentation is pretty sparse. It may take significant time digging through the code and docs to get what you want. But the results will exceed your expectations. 
Search engine performance on top of data grids easily outperforms older-generation search technology.&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Enjoy!&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3946011063058389308-5063298488428032343?l=blog.griddynamics.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog.griddynamics.com/feeds/5063298488428032343/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=3946011063058389308&amp;postID=5063298488428032343" title="2 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/5063298488428032343?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/5063298488428032343?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/griddynamics/~3/HVkEK1k_iaQ/full-text-faceted-search-over-in-memory.html" title="Full-text &amp; faceted search over In-Memory Data Grids" /><author><name>Gurney</name><uri>http://www.blogger.com/profile/09326323020559522679</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="01349825872338679018" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://1.bp.blogspot.com/_aTVQwIbMAHo/SY8yqV_ej-I/AAAAAAAAAk8/c-1ElxpXAqM/s72-c/Compass+IMDG.png" height="72" width="72" /><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">2</thr:total><feedburner:origLink>http://blog.griddynamics.com/2009/02/full-text-faceted-search-over-in-memory.html</feedburner:origLink></entry><entry 
gd:etag="W/&quot;DkcMRXs8eCp7ImA9WxRaGUg.&quot;"><id>tag:blogger.com,1999:blog-3946011063058389308.post-2683915723008575189</id><published>2008-12-22T00:57:00.000-08:00</published><updated>2008-12-22T05:54:44.570-08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2008-12-22T05:54:44.570-08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="Velocity" /><category scheme="http://www.blogger.com/atom/ns#" term=".NET" /><category scheme="http://www.blogger.com/atom/ns#" term="grid computing" /><category scheme="http://www.blogger.com/atom/ns#" term="~Max Martynov" /><category scheme="http://www.blogger.com/atom/ns#" term="Microsoft HPC" /><category scheme="http://www.blogger.com/atom/ns#" term="distributed cache" /><category scheme="http://www.blogger.com/atom/ns#" term="scalability" /><category scheme="http://www.blogger.com/atom/ns#" term="data grid" /><title>Speeding up data-intensive HPC applications with Velocity</title><content type="html">Massively parallel, data-intensive applications that run on compute grids require timely access to their data. When the application data is distributed among the grid nodes, data access is not a problem because it scales with the number of computational tasks. However, when the application data is stored in a centralized database, access to data can quickly become a serious bottleneck.&lt;br /&gt;&lt;br /&gt;We encountered such a problem in one of our recent projects, where a single database was used as a centralized application data repository for the entire compute grid. In this case, access to application data slowed down enough to degrade overall system performance, even though the database was hosted on a sufficiently powerful server.&lt;br /&gt;&lt;br /&gt;In this article we will describe different ways of reducing database load and how they affect the overall system performance. 
We will also explore the advantages and disadvantages of &lt;a href="http://msdn.microsoft.com/en-us/data/cc655792.aspx"&gt;Velocity&lt;/a&gt; as one of the ways to reduce data access latency. More specifically, we will show that:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Database load can be reduced by introducing the Velocity distributed cache, or by simplifying SQL queries, moving all calculations from SQL to application logic.&lt;/li&gt;&lt;li&gt;The Velocity CTP1 distributed cache improves overall system performance dramatically. In our case, querying data with the Velocity cache active was up to 31 times faster than without it!&lt;/li&gt;&lt;li&gt;In CTP1, we experienced some scalability issues in particular configurations, but we are confident these issues will be resolved in future releases.&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;span style="font-weight: bold;font-size:100%;" &gt;Compute Environment Overview&lt;/span&gt;&lt;br /&gt;&lt;a href="http://labs.microsofthpc.net/"&gt;HPC++ CompFin Labs&lt;/a&gt; is a compute cloud that was created to give university students the ability to run massive analytical financial computations.&lt;br /&gt;&lt;br /&gt;To perform a computation, one has to create a custom computational model that defines how the calculation will be performed and how the resulting data will be handled. The computational model follows the MapReduce paradigm and includes an Excel-based UI with a list of input parameters, and a .NET assembly with the logic for splitting the computation into tasks, running a single task (map task) and combining task results into computation results (reduce task).&lt;br /&gt;&lt;br /&gt;For example, one may write a model that performs some statistical calculation for a given stock symbol over some period of time. Later, someone else can open the Excel UI, change the stock symbol and the time period, and re-run the calculation. 
The following diagram illustrates how the example model would work in the CompFin cloud:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_sZh4q0-S4-Y/SU9tu0y2e9I/AAAAAAAAABg/L4ZR7dvDJvA/s1600-h/CompFinArchitecture.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 600px; height: 379px;" src="http://3.bp.blogspot.com/_sZh4q0-S4-Y/SU9tu0y2e9I/AAAAAAAAABg/L4ZR7dvDJvA/s800/CompFinArchitecture.png" alt="" id="BLOGGER_PHOTO_ID_5282561539142220754" border="0" /&gt;&lt;/a&gt;&lt;p style="text-align: center;"&gt;&lt;span style="font-size:78%;"&gt;Picture 1. Original CompFin&lt;/span&gt;&lt;/p&gt;The system consists of the following components:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;A &lt;a href="http://www.blogger.com/www.microsoft.com/HPC/"&gt;Microsoft HPC Server 2008&lt;/a&gt; cluster.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;A SharePoint Server, where the CompFin website and the computational model files are hosted.&lt;/li&gt;&lt;li&gt;A SQL Server, where computation results are stored.&lt;/li&gt;&lt;li&gt;A SQL Server Intermediate Storage Provider (ISP), where task results are stored for subsequent consumption in the “reduce” phase.&lt;/li&gt;&lt;li&gt;A centralized market data database, where all financial data is located.&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;The computation proceeds as follows:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;The user navigates to the CompFin website. On this website, the user chooses the model to run and opens the appropriate Excel file. Next, the user fills in the required computation parameters and, with the CompFin Excel plug-in, submits the job.&lt;/li&gt;&lt;li&gt;The job start request is handled by the CompFin web service, which typically runs on the SharePoint server as well. 
This web service calls the model logic to split the computation into tasks.&lt;/li&gt;&lt;li&gt;All computational tasks are submitted to the Microsoft HPC Cluster.&lt;/li&gt;&lt;li&gt;The map tasks are started first. Each task retrieves the appropriate data from the central market database, performs calculations on this data, and submits intermediate results to the SQL Server ISP.&lt;/li&gt;&lt;li&gt;When all map tasks are finished, the reduce task is launched. It retrieves the intermediate data, combines it into the final results, and stores them in the result storage.&lt;/li&gt;&lt;li&gt;After the computation is finished, the user may request job results via the CompFin website.&lt;/li&gt;&lt;li&gt;The computation results are retrieved from the result storage and returned to the user.&lt;br /&gt;&lt;/li&gt;&lt;/ol&gt;The above compute grid uses one SQL Server database for storage of stock prices, and one SQL Server database for storage of intermediate and final results. The combined capacity of the HPC compute cluster is 400GB of RAM and 200 (2.00GHz) Xeon cores: 50 machines, each with 4 (2.00GHz) Xeon cores and 8GB of RAM. The capacity of each SQL Server is 32GB of RAM and 4 (2.33GHz) Xeon cores. So, the compute cluster had 12.5 times more RAM (400GB vs. 32GB) and 50 times more CPU cores (200 vs. 4) than each SQL Server. Based on these numbers, we anticipated two potential bottlenecks: the central market database and the ISP.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;font-size:100%;" &gt;Finding the bottleneck&lt;/span&gt;&lt;br /&gt;The computational model chosen for our experiments computed the correlation between stock prices over some period of time. It was very data-intensive and issued complex queries to the SQL Server that required significant grouping and sorting. The model used a significant amount of data, most of which could be cached. 
The anticipated maximum cache hit ratio for the model was greater than 90%.&lt;br /&gt;&lt;br /&gt;First, we needed to test the model. We decided to test all three ways of measuring scalability (one logical data unit is roughly equal to 32 million records in the database, or 512MB of tick data returned from queries):&lt;br /&gt;&lt;br /&gt;Keep the amount of data constant, increase processing power:&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_sZh4q0-S4-Y/SU9zQqMv72I/AAAAAAAAACA/p3_cOoo7z40/s1600-h/TestPlan3.png"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer; width: 339px; height: 61px;" src="http://3.bp.blogspot.com/_sZh4q0-S4-Y/SU9zQqMv72I/AAAAAAAAACA/p3_cOoo7z40/s400/TestPlan3.png" alt="" id="BLOGGER_PHOTO_ID_5282567617971744610" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Increase the amount of data, keep processing power constant:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_sZh4q0-S4-Y/SU9zQldAzSI/AAAAAAAAAB4/rsCkaaCeBk8/s1600-h/TestPlan2.png"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer; width: 307px; height: 61px;" src="http://4.bp.blogspot.com/_sZh4q0-S4-Y/SU9zQldAzSI/AAAAAAAAAB4/rsCkaaCeBk8/s400/TestPlan2.png" alt="" id="BLOGGER_PHOTO_ID_5282567616697781538" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Increase the amount of data and processing power together:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_sZh4q0-S4-Y/SU9zQtLiD1I/AAAAAAAAABw/BdyA24zBcQ0/s1600-h/TestPlan1.png"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer; width: 275px; height: 61px;" src="http://2.bp.blogspot.com/_sZh4q0-S4-Y/SU9zQtLiD1I/AAAAAAAAABw/BdyA24zBcQ0/s400/TestPlan1.png" alt="" id="BLOGGER_PHOTO_ID_5282567618771947346" 
border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;We tested the model using this test plan and measured the total map task time, the time spent retrieving financial data from the central market database (the GetTrades method), and the time spent in the SQL Server ISP. The chart also includes a line showing how the task time would behave if the system scaled linearly:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_sZh4q0-S4-Y/SU-aiaCCVNI/AAAAAAAAADI/GEOHmUKECg8/s1600-h/Benchmark1_1.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 600px; height: 301px;" src="http://1.bp.blogspot.com/_sZh4q0-S4-Y/SU-aiaCCVNI/AAAAAAAAADI/GEOHmUKECg8/s800/Benchmark1_1.png" alt="" id="BLOGGER_PHOTO_ID_5282610803822974162" border="0" /&gt;&lt;/a&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_sZh4q0-S4-Y/SU93LqRLLKI/AAAAAAAAACI/rO8E0QrKYFc/s1600-h/Benchmark1.png"&gt;&lt;/a&gt;&lt;p style="text-align: center;"&gt;&lt;span style="font-size:78%;"&gt;Picture 2. Map task time distribution of original model in CompFin&lt;/span&gt;&lt;/p&gt;You may notice that the bottleneck was in the central market database: the time spent retrieving financial data is almost equal to the entire task time. You may also notice that the system scales much worse than linearly. So, this was a good opportunity for Velocity to improve performance and scalability.&lt;br /&gt;&lt;span style="font-weight: bold;font-size:100%;" &gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-weight: bold;font-size:100%;" &gt;Getting rid of the bottleneck – Velocity cache for tick data&lt;/span&gt;&lt;br /&gt;&lt;span style="font-size:100%;"&gt;At first, we created a Velocity distributed cache for financial data. This was just an intra-job cache, intended to reduce the cost of fetching the same data repeatedly within a computation job. 
Completely replacing the central market database with the Velocity cache was not considered at this stage. This approach is shown in Picture 3 (blue cloud):&lt;br /&gt;&lt;/span&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_sZh4q0-S4-Y/SU9313AFm1I/AAAAAAAAACQ/3zhDivoV5uY/s1600-h/CompFinAndVelocityArchitecture.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 600px; height: 397px;" src="http://2.bp.blogspot.com/_sZh4q0-S4-Y/SU9313AFm1I/AAAAAAAAACQ/3zhDivoV5uY/s800/CompFinAndVelocityArchitecture.png" alt="" id="BLOGGER_PHOTO_ID_5282572655109970770" border="0" /&gt;&lt;/a&gt;&lt;p style="text-align: center;"&gt;&lt;span style="font-size:78%;"&gt;Picture 3. CompFin + Velocity improvements (distributed and local caches)&lt;br /&gt;&lt;/span&gt;&lt;/p&gt;&lt;span style="font-size:100%;"&gt;The Velocity cache for tick data ran on the HPC cluster nodes. More precisely, the same machines that ran the computation tasks were also used as Velocity hosts.&lt;br /&gt;&lt;br /&gt;We compared this approach with the original CompFin:&lt;br /&gt;&lt;/span&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_sZh4q0-S4-Y/SU-aiQyeqBI/AAAAAAAAADQ/W3M9p0WBDzM/s1600-h/Benchmark2_1.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 600px; height: 322px;" src="http://1.bp.blogspot.com/_sZh4q0-S4-Y/SU-aiQyeqBI/AAAAAAAAADQ/W3M9p0WBDzM/s800/Benchmark2_1.png" alt="" id="BLOGGER_PHOTO_ID_5282610801341802514" border="0" /&gt;&lt;/a&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_sZh4q0-S4-Y/SU94ki5iXRI/AAAAAAAAACY/PYrk4Ufy-_s/s1600-h/Benchmark2.png"&gt;&lt;/a&gt;&lt;p style="text-align: center;"&gt;&lt;span style="font-size:78%;"&gt;Picture 4. 
Comparison of map task time between original CompFin and CompFin with Velocity distributed cache&lt;/span&gt;&lt;/p&gt;&lt;span style="font-size:100%;"&gt;In the beginning, everything worked well. The Velocity distributed cache reduced the number of requests to the central market database, and performance increased dramatically.&lt;br /&gt;&lt;br /&gt;As the number of processors used for the computation increased to 128, performance started to decrease. On 32 machines (128 processors), Velocity still dramatically outperformed the original CompFin results, but we saw some performance degradation relative to the 16-machine (64 processors) scenario. On 50 machines (200 processors), in CTP1, we began to experience some failures and timeouts.&lt;br /&gt;&lt;br /&gt;Nevertheless, keeping in mind that Velocity was in its first CTP, the results were still quite amazing.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;font-size:100%;" &gt;Velocity scalability – local cache &amp;amp; data-aware routing&lt;/span&gt;&lt;br /&gt;Fortunately, we had a plan for how to further improve performance and reduce the load on the Velocity distributed cache. We enabled the Velocity “local cache” and added data-aware routing to CompFin (see Picture 3). This was possible because all tasks that execute on the same HPC node run in the same Windows process. The local in-process cache can improve performance if the data-reuse factor for tasks that run in the same process is high. 
Hence, data-aware routing was required.&lt;br /&gt;&lt;br /&gt;Below is the comparison between this approach and the approach with the Velocity distributed cache alone:&lt;br /&gt;&lt;/span&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_sZh4q0-S4-Y/SU-aijvpQ8I/AAAAAAAAADY/rcaRHLayrcU/s1600-h/Benchmark3_1.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 600px; height: 322px;" src="http://1.bp.blogspot.com/_sZh4q0-S4-Y/SU-aijvpQ8I/AAAAAAAAADY/rcaRHLayrcU/s800/Benchmark3_1.png" alt="" id="BLOGGER_PHOTO_ID_5282610806430188482" border="0" /&gt;&lt;/a&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_sZh4q0-S4-Y/SU94kweLrMI/AAAAAAAAACg/HpwbZ1Xvk1Q/s1600-h/Benchmark3.png"&gt;&lt;/a&gt;&lt;p style="text-align: center;"&gt;&lt;span style="font-size:78%;"&gt;Picture 5. Comparison of map task time between distributed cache and local cache&lt;/span&gt;&lt;/p&gt;&lt;span style="font-size:100%;"&gt;The problems with Velocity distributed cache scalability were solved, and we can conclude that Velocity can work well in CTP1 even on large clusters.&lt;br /&gt;&lt;br /&gt;This approach performed from 5.6 to 31.9 times better than the original CompFin. It also scaled better than any other approach, although not linearly. We believe that this is because interaction with the SQL Server was still required before data could be retrieved from the cache.&lt;br /&gt;However, in the “Finding the bottleneck” section we mentioned that the model used complex SQL Server queries with grouping. 
So, maybe performance can be improved just by simplifying these queries and moving all logic from the database to the compute nodes.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;font-size:100%;" &gt;Getting rid of the bottleneck via simplified queries&lt;/span&gt;&lt;br /&gt;Following the previous assumption, we tried to reduce the load on the central market database by removing grouping operations from the queries used in the model. The results were largely positive:&lt;br /&gt;&lt;/span&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_sZh4q0-S4-Y/SU-ai3Cnr1I/AAAAAAAAADg/8qlPBL1fOfw/s1600-h/Benchmark4_1.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 600px; height: 322px;" src="http://4.bp.blogspot.com/_sZh4q0-S4-Y/SU-ai3Cnr1I/AAAAAAAAADg/8qlPBL1fOfw/s800/Benchmark4_1.png" alt="" id="BLOGGER_PHOTO_ID_5282610811610050386" border="0" /&gt;&lt;/a&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_sZh4q0-S4-Y/SU94lGI322I/AAAAAAAAACo/dejG1Mo3i5A/s1600-h/Benchmark4.png"&gt;&lt;/a&gt;&lt;p style="text-align: center;"&gt;&lt;span style="font-size:78%;"&gt;Picture 6. Comparison of map task time between original CompFin with complex and simple queries&lt;/span&gt;&lt;/p&gt;The time spent retrieving tick data was reduced by a factor of 8.5 to 52, and the task time scaled almost linearly. 
However, the scalability problem with the SQL Server remained the same, so we tried to further improve the time by using our previous best approach – Velocity distributed cache with enabled local cache and data-aware routing:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_sZh4q0-S4-Y/SU-ajEvOvyI/AAAAAAAAADo/k5s2DEqY8yc/s1600-h/Benchmark5_1.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 600px; height: 322px;" src="http://4.bp.blogspot.com/_sZh4q0-S4-Y/SU-ajEvOvyI/AAAAAAAAADo/k5s2DEqY8yc/s800/Benchmark5_1.png" alt="" id="BLOGGER_PHOTO_ID_5282610815286820642" border="0" /&gt;&lt;/a&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_sZh4q0-S4-Y/SU94lRkXZqI/AAAAAAAAACw/fmk2w-FXyGQ/s1600-h/Benchmark5.png"&gt;&lt;/a&gt;&lt;p style="text-align: center;"&gt;&lt;span style="font-size:78%;"&gt;Picture 7. Comparison of map task time between original CompFin and CompFin with Velocity local cache. Simplified queries are used in both cases.&lt;/span&gt;&lt;/p&gt;Surprisingly, doing so did not improve the task performance significantly. Perhaps, the overhead of putting items in Velocity distributed cache was too big, or the cache services just stole CPU cycles from computations. 
Nevertheless, we believed that these factors had only a minor impact on performance, so we decided to compare only the time spent in the GetTrades method, where the financial data was retrieved:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_sZh4q0-S4-Y/SU-arLwWx2I/AAAAAAAAADw/i6mZsormu4w/s1600-h/Benchmark6_1.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 600px; height: 322px;" src="http://4.bp.blogspot.com/_sZh4q0-S4-Y/SU-arLwWx2I/AAAAAAAAADw/i6mZsormu4w/s800/Benchmark6_1.png" alt="" id="BLOGGER_PHOTO_ID_5282610954609543010" border="0" /&gt;&lt;/a&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_sZh4q0-S4-Y/SU94ljfNz8I/AAAAAAAAAC4/ROscU4dLyJ0/s1600-h/Benchmark6.png"&gt;&lt;/a&gt;&lt;p style="text-align: center;"&gt;&lt;span style="font-size:78%;"&gt;Picture 8. Comparison of time spent retrieving tick data between original CompFin and CompFin with Velocity local cache. Simplified queries are used in both cases.&lt;/span&gt;&lt;/p&gt;Our assumptions were right, and the local cache approach greatly reduced the time spent retrieving financial data. However, one question remained: why was the overall task time not reduced? 
To answer this question, we performed a number of tests to investigate how task time was distributed when the local cache was used:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_sZh4q0-S4-Y/SU-arCCLgUI/AAAAAAAAAD4/LjM-cp-GaoE/s1600-h/Benchmark7_1.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 600px; height: 322px;" src="http://1.bp.blogspot.com/_sZh4q0-S4-Y/SU-arCCLgUI/AAAAAAAAAD4/LjM-cp-GaoE/s800/Benchmark7_1.png" alt="" id="BLOGGER_PHOTO_ID_5282610951999947074" border="0" /&gt;&lt;/a&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_sZh4q0-S4-Y/SU94p6LzkyI/AAAAAAAAADA/HNjRIH8hHLU/s1600-h/Benchmark7.png"&gt;&lt;/a&gt;&lt;p style="text-align: center;"&gt;&lt;span style="font-size:78%;"&gt;Picture 9. Time distribution in CompFin with &lt;/span&gt;&lt;span style="font-size:78%;"&gt; simplified queries and Velocity &lt;/span&gt;&lt;span style="font-size:78%;"&gt;local cache.&lt;/span&gt;&lt;/p&gt;These tests revealed that the SQL Server ISP, which was typically the least time-consuming element of the system, had now moved to the foreground. Since retrieval of financial data was now very fast, the frequency of write operations to the SQL Server increased, and the SQL Server ISP became a bottleneck.&lt;br /&gt;&lt;br /&gt;Our conclusion is that Velocity can improve even a very fast model, where only simple queries are used. In this environment, however, the SQL Server ISP remained a bottleneck. We have yet to experiment with using Velocity as the ISP with simple queries.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;font-size:100%;" &gt;Conclusion&lt;/span&gt;&lt;br /&gt;In this article we have shown that the database is the bottleneck for data-intensive applications running on compute grids. 
We tried to reduce database load using two approaches: introducing a Velocity distributed cache between the database and the compute grid, and simplifying SQL queries to the database by moving all calculations and aggregations to the application logic.&lt;br /&gt;&lt;br /&gt;Both approaches improved database performance dramatically. In particular, querying data with an active Velocity distributed cache was up to 31 times faster than without it. In certain configurations, when experimenting with CTP1, we also observed some Velocity scalability issues; however, for a first CTP, Velocity performed very well. At the time of publishing, Velocity is already in CTP2 and remains under very active development. In the near future we expect the Velocity team to spend a lot of time improving scalability and performance prior to release. The Velocity team has big plans: not only will some issues soon be resolved, but lots of new functionality will be added as well.&lt;br /&gt;&lt;br /&gt;So, if you have problems with database scalability and you use the Microsoft technology stack, you may consider introducing Velocity into your system.&lt;br /&gt;&lt;br /&gt;There are still many interesting topics to investigate regarding Velocity and CompFin. 
Also, some other improvements can still be made in CompFin, so we hope this was not our last project working with Microsoft technologies.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3946011063058389308-2683915723008575189?l=blog.griddynamics.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog.griddynamics.com/feeds/2683915723008575189/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=3946011063058389308&amp;postID=2683915723008575189" title="1 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/2683915723008575189?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/2683915723008575189?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/griddynamics/~3/xd933VY1DO4/speeding-up-data-intensive-hpc.html" title="Speeding up data-intensive HPC applications with Velocity" /><author><name>Max Martynov</name><uri>http://www.blogger.com/profile/12000269073735478943</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="16905684925284551113" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://3.bp.blogspot.com/_sZh4q0-S4-Y/SU9tu0y2e9I/AAAAAAAAABg/L4ZR7dvDJvA/s72-c/CompFinArchitecture.png" height="72" width="72" /><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">1</thr:total><feedburner:origLink>http://blog.griddynamics.com/2008/12/speeding-up-data-intensive-hpc.html</feedburner:origLink></entry><entry 
gd:etag="W/&quot;D0EEQHw4eCp7ImA9WxRRE0s.&quot;"><id>tag:blogger.com,1999:blog-3946011063058389308.post-1941024032633285802</id><published>2008-09-24T14:45:00.000-07:00</published><updated>2008-09-25T11:00:01.230-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2008-09-25T11:00:01.230-07:00</app:edited><title>Cloud Performance Reports</title><content type="html">Cloud computing is getting a great deal of attention these days. Unfortunately, there is very little data available about the performance, scalability and usability of cloud deployment platforms.&lt;br /&gt;&lt;br /&gt;I was recently invited to speak at the &lt;a href="http://web.meetup.com/66/"&gt;Silicon Valley Cloud User Group&lt;/a&gt;, where I tried to bring a "practitioner's perspective" and present the results of three recent performance and scalability benchmarks related to cloud computing. The first benchmark aims to establish the &lt;a href="http://blog.griddynamics.com/2008/07/gridgain-on-ec2.html"&gt;scalability of EC2&lt;/a&gt; on a perfectly parallel mathematical problem, a Monte Carlo simulation, executed by &lt;a href="http://www.gridgain.com/"&gt;GridGain&lt;/a&gt;'s popular open source map/reduce platform, and to document lessons learned in making the application scale to 512 nodes.&lt;br /&gt;&lt;br /&gt;The second benchmark looks at the scalability of a more complex stateful application, typical of Risk Management, that required both an in-memory data grid and a compute grid. Both grids ran on &lt;a href="http://aws.amazon.com/ec2/"&gt;EC2&lt;/a&gt; and were executed by &lt;a href="http://www.gigaspaces.com/"&gt;GigaSpaces&lt;/a&gt;' data &amp;amp; compute grid platform.&lt;br /&gt;&lt;br /&gt;The third benchmark looks at a prototypical data-intensive Portfolio Analysis application used heavily in the financial services industry, and studies the performance impact of data being located close to computing, or "on the cloud" vs. "off the cloud". 
This work was done in collaboration with Microsoft on their &lt;a href="http://hpc.microsofthpc.net/compfin/"&gt;HPC++ CompFin Lab&lt;/a&gt;, which integrates &lt;a href="http://www.microsoft.com/hpc/en/us/default.aspx"&gt;Microsoft Windows HPC Server&lt;/a&gt;, a central market data database and Microsoft productivity products to provide the academic community with an online service to publish, execute and manage computational finance models.&lt;br /&gt;&lt;br /&gt;You can find the &lt;a href="http://files.griddynamics.net/CloudUG.ppt"&gt;presentation&lt;/a&gt; with a summary of the results. Please note that these results are very fresh and that two of the benchmarks are still in progress. You can find far more details on the first benchmark in our previous blog post. We will publish more detailed blog reports on the second and third benchmarks soon.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3946011063058389308-1941024032633285802?l=blog.griddynamics.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog.griddynamics.com/feeds/1941024032633285802/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=3946011063058389308&amp;postID=1941024032633285802" title="1 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/1941024032633285802?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/1941024032633285802?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/griddynamics/~3/PDiCVpn5-hY/cloud-performance-reports.html" title="Cloud Performance Reports" /><author><name>Victoria 
Livschitz</name><uri>http://www.blogger.com/profile/12264301035182704078</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="07316905052902410928" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">1</thr:total><feedburner:origLink>http://blog.griddynamics.com/2008/09/cloud-performance-reports.html</feedburner:origLink></entry><entry gd:etag="W/&quot;DUECQnk9eyp7ImA9WxdUGE4.&quot;"><id>tag:blogger.com,1999:blog-3946011063058389308.post-8746063359187804882</id><published>2008-07-31T06:08:00.000-07:00</published><updated>2008-08-04T01:14:23.763-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2008-08-04T01:14:23.763-07:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="Java" /><category scheme="http://www.blogger.com/atom/ns#" term="grid computing" /><category scheme="http://www.blogger.com/atom/ns#" term="~Max Gorbunov" /><category scheme="http://www.blogger.com/atom/ns#" term="scalability" /><category scheme="http://www.blogger.com/atom/ns#" term="Amazon" /><category scheme="http://www.blogger.com/atom/ns#" term="GridGain" /><category scheme="http://www.blogger.com/atom/ns#" term="EC2" /><title>Scalability Benchmark of Monte Carlo Simulation on Amazon EC2 with GridGain Software</title><content type="html">&lt;p&gt;This blogpost presents the report on recently concluded scalability benchmark of Monte Carlo simulations running on &lt;a href="http://aws.amazon.com/ec2"&gt;Amazon EC2 &lt;/a&gt; using the &lt;a href="http://www.gridgain.com/"&gt;GridGain framework&lt;/a&gt;. 
It consists of two parts: Part I is a technical report on the benchmark goals, method and results, and Part II is an account of the development process and lessons learned.&lt;br /&gt;&lt;/p&gt;&lt;p&gt;&lt;br /&gt;&lt;/p&gt;&lt;p style="font-weight: bold;"&gt;Part I: Benchmark description &amp;amp; results&lt;br /&gt;&lt;/p&gt;&lt;p&gt;The goal of the benchmark was to study the scalability characteristics of massively parallel algorithms executed on Amazon’s Elastic Compute Cloud (EC2) and managed by GridGain.&lt;/p&gt;&lt;p&gt;A Monte Carlo simulation was chosen as the algorithm because of its widespread use in financial applications. The same algorithm with different parameters was used on a wide range of grid sizes: 2, 4, 8, 16, ..., 256, 512. The parameters guaranteed that the amount of work performed by the whole grid was always linear with respect to the number of nodes. In other words, twice as many nodes always performed twice as much work. Perfect linear scalability would mean an identical *completion time* for Job-1 running on 2 nodes and Job-2 running on 512 nodes, given that Job-2 had 256 times more work to do.&lt;br /&gt;&lt;/p&gt;&lt;p&gt;As Amazon permitted us to use a maximum of 550 nodes in a single run, the upper limit of the benchmark was set at 512.&lt;br /&gt;&lt;/p&gt;&lt;p&gt;The test utilized a full open source software stack, including GridGain, the Linux operating system, Sun Microsystems’ Open MQ JMS messaging and Sun’s Java 5 VM. Amazon’s preinstalled Fedora Core 8 with a custom testing framework was used to conduct the benchmark.&lt;/p&gt;&lt;p&gt;The results justified our hopes: we could successfully run up to 512 nodes without significant performance degradation. 
The results graph is shown in Figure 1.&lt;br /&gt;&lt;/p&gt;&lt;p style="text-align: center;"&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp3.blogger.com/_5M9BNfStEvI/SJHvpM3gcqI/AAAAAAAAAzc/CZvODJ32klo/s1600-h/timings.PNG"&gt;&lt;img src="http://bp3.blogger.com/_5M9BNfStEvI/SJHvpM3gcqI/AAAAAAAAAzc/CZvODJ32klo/s400/timings.PNG" alt="" id="BLOGGER_PHOTO_ID_5229224133461570210" border="0" /&gt;&lt;/a&gt;&lt;/p&gt;&lt;p style="text-align: center;"&gt;&lt;span style="font-weight: bold;font-size:78%;" &gt;Figure 1. Average task execution times on grids of 2-512 nodes.&lt;/span&gt;&lt;/p&gt;&lt;p&gt;The performance degradation of 3 seconds (about 20%) should be considered minor given the roughly 250-fold increase in scale. The curve rises twice, in the ranges 2-8 and 256-512, while 8-256 remains almost flat.&lt;br /&gt;&lt;/p&gt;&lt;p&gt;The growth in the 2-8 range can be explained by the initial rapid growth of the grid size, which raises the maximum task execution time among nodes (the overall time is determined by the “weak link”). The growth in the 256-512 range could be explained by OpenMQ limitations as applied to our use case (the JMS load grows quadratically with the grid size). In order to continue scaling the grid beyond 512 nodes, clustering of OpenMQ is likely to be required.&lt;br /&gt;&lt;/p&gt;&lt;p&gt;We can conclude that the results showed near-linear scalability and performance improvements from 2 to 512 nodes in all test runs.&lt;/p&gt;&lt;p&gt;&lt;br /&gt;&lt;/p&gt;&lt;p&gt;&lt;span style="font-weight: bold;"&gt;Part II: How did we do it?&lt;/span&gt;&lt;br /&gt;&lt;/p&gt;&lt;p&gt;To start using EC2, just open the &lt;a href="http://docs.amazonwebservices.com/AWSEC2/2008-02-01/GettingStartedGuide/"&gt;Amazon EC2 Getting Started Guide&lt;/a&gt;, download the EC2 API tools and follow the instructions. 
You can use one of the predefined public machine images or create a bundle with your own image.&lt;/p&gt;&lt;p&gt;A relatively big problem is the lack of persistence options in EC2. When an instance goes down, all the data is lost. However, there are third-party solutions capable of mounting Amazon's &lt;a href="http://aws.amazon.com/s3"&gt;S3&lt;/a&gt; storage interface as a Linux filesystem.&lt;/p&gt;&lt;p&gt;Now, we needed to create a simple framework allowing us to manage the grid on Amazon’s EC2. The basic functionality would include:&lt;br /&gt;&lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;start the grid;&lt;/li&gt;&lt;li&gt;add/remove nodes to/from the grid;&lt;/li&gt;&lt;li&gt;display the grid health;&lt;/li&gt;&lt;li&gt;run a single calculation task interactively;&lt;/li&gt;&lt;li&gt;run a benchmark (batch task execution on 1, 2, 4, …, 512,... nodes).&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;The first problem we encountered was that multicast is not supported in the EC2 infrastructure. By default, GridGain is configured to perform initial discovery by IP multicast. Unfortunately, as far as we know, the EC2 guys don’t plan to create fully functional multicast support anytime soon.&lt;/p&gt;&lt;p&gt;Luckily, GridGain ships with a set of Service Provider Interfaces (SPIs). So, we could choose another DiscoverySPI that does not use IP multicast for discovery purposes, and we chose JMSDiscoverySPI. At first, we picked &lt;a href="http://activemq.apache.org/"&gt;Apache ActiveMQ&lt;/a&gt; as a JMS implementation and soon ran into stability issues. Then we switched to &lt;a href="https://mq.dev.java.net/"&gt;OpenMQ&lt;/a&gt;, and it proved to be sufficiently robust.&lt;/p&gt;&lt;p&gt;Our second problem turned out to be the default maximum number of running instances per user: 20. Since we were going to run grids much larger than 20 nodes, we needed to override the default limit. 
It took several steps and a few more days to negotiate with Amazon EC2, but eventually we were granted the right to run up to 550 nodes. It seems that the business process of requesting a large number of nodes is still not very well-defined by Amazon.&lt;br /&gt;&lt;/p&gt;&lt;p&gt;The image creation was not a very hard task: we got the public Fedora 8 i386 image, ran it, installed Java Runtime Environment 6, GridGain and ActiveMQ, then bundled this new image.&lt;br /&gt;&lt;/p&gt;&lt;p&gt;We still required some extra configuration, since the grid node must start on system startup. The hardest thing was to write shell scripts for starting the grid. These scripts had to parse user data sent to the instance (different for the Head Node and Worker Nodes), determine the node type and start the necessary software in each given configuration. Later these scripts were replaced by more convenient Java-based tools, and the Head Node’s ActiveMQ was replaced by a standalone OpenMQ.&lt;/p&gt;&lt;p&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp0.blogger.com/_5M9BNfStEvI/SJLg9fKjPII/AAAAAAAAAzk/AwnmEC8YY-o/s1600-h/ggec2-2.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://bp0.blogger.com/_5M9BNfStEvI/SJLg9fKjPII/AAAAAAAAAzk/AwnmEC8YY-o/s400/ggec2-2.png" alt="" id="BLOGGER_PHOTO_ID_5229489464272960642" border="0" /&gt;&lt;/a&gt;&lt;/p&gt;&lt;p style="text-align: center;"&gt;&lt;span style="font-weight: bold;font-size:78%;" &gt;Figure 2. Initial GridGain on Amazon EC2 architecture.&lt;/span&gt;&lt;br /&gt;&lt;/p&gt;&lt;p&gt;So, how does the grid work? First, we start the Head Node (see Figure 2). The Head Node automatically starts ActiveMQ (only in early versions) and GridGain in master mode (described later), and launches the requested number of Worker Nodes. Since Workers must connect to the JMS server, the Head Node passes its IP address as user data to the Worker Nodes. 
Worker Nodes just start GridGain.&lt;/p&gt;&lt;p&gt;We realized quickly that we needed some user-friendly interface to manage the grid. So, we rewrote the scripts in Java using the &lt;a href="http://code.google.com/p/typica/"&gt;typica&lt;/a&gt; library. Thus we integrated Amazon and GridGain management. Now we can automatically start EC2 instances, wait for them to start, get their IPs, track status, etc. Together with GridGain management it becomes a very powerful tool. Let’s imagine the grid running completely autonomously. The management module can automatically bring up more EC2 instances or shut them down depending on the current grid load.&lt;/p&gt;&lt;p&gt;Once we had a simple web UI that could show the table of grid nodes and run our benchmark, we felt ready to run grids larger than 20 nodes. We asked Amazon to increase our running instances limit to 1050 nodes. Amazon agreed to let us run 550 instances. The know-how of our UI was the embedded scripting engine, allowing us to change a benchmarking schema without restarting the grid. To understand how simple it is to run a benchmark consisting of several task runs on different numbers of nodes, just look at this code:&lt;br /&gt;&lt;/p&gt;&lt;pre style="padding-left: 20px;"&gt;var itersPerNode = 5000;&lt;br /&gt;var cnode = [1, 2, 4, 8, 16, 32, 64, 128, 256, 512];&lt;br /&gt;for (var i in cnode) {&lt;br /&gt;&lt;pre style="padding-left: 32px;"&gt;var n = cnode[i];&lt;br /&gt;grid.growEC2Grid(n, true);&lt;br /&gt;grid.waitForGridInstances(n);&lt;br /&gt;runTask(itersPerNode * n, n, 3);&lt;/pre&gt;}&lt;br /&gt;&lt;/pre&gt;&lt;p style="text-align: left;"&gt;&lt;span style="font-weight: bold;font-size:78%;" &gt;Listing 1. Sample benchmarking script.&lt;/span&gt;&lt;br /&gt;&lt;/p&gt;&lt;p&gt;&lt;br /&gt;&lt;/p&gt;&lt;p&gt;The closing remarks:&lt;br /&gt;&lt;/p&gt;&lt;p&gt;The framework we built is a good place to start developing a GridGain appliance. 
But what if we want to create, for instance, an In-Memory Data Grid on this basis? Our next step is making the framework more generic to allow easy integration with virtually any grid framework (either computing or IMDG). There are also other ideas such as creating a generic image and storing specific grid software/configurations in S3, which should ease debugging and small alterations to the appliance.&lt;/p&gt;&lt;p&gt;We are currently evaluating whether our framework will be something more than just a GridGain testing framework. We'd love to hear from the community if this line of work is interesting to others besides ourselves. If you'd like to know more about this benchmark, or get access to the full source code, please contact me at mg&lt;span&gt;o&lt;/span&gt;rbunov@&lt;span style="display:none;"&gt;&lt;img src="wrong"/&gt;&lt;/span&gt;griddynamics.com&lt;br /&gt;&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3946011063058389308-8746063359187804882?l=blog.griddynamics.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog.griddynamics.com/feeds/8746063359187804882/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=3946011063058389308&amp;postID=8746063359187804882" title="12 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/8746063359187804882?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/8746063359187804882?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/griddynamics/~3/MCaibAd4liA/gridgain-on-ec2.html" title="Scalability Benchmark of Monte Carlo Simulation on Amazon EC2 with GridGain Software" /><author><name>Max 
Gorbunov</name><uri>http://www.blogger.com/profile/05243241093447788300</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="13489806596952463622" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://bp3.blogger.com/_5M9BNfStEvI/SJHvpM3gcqI/AAAAAAAAAzc/CZvODJ32klo/s72-c/timings.PNG" height="72" width="72" /><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">12</thr:total><feedburner:origLink>http://blog.griddynamics.com/2008/07/gridgain-on-ec2.html</feedburner:origLink></entry><entry gd:etag="W/&quot;C0EBQ3cyfip7ImA9WxdXEkk.&quot;"><id>tag:blogger.com,1999:blog-3946011063058389308.post-7541093640272828945</id><published>2008-06-19T02:03:00.000-07:00</published><updated>2008-06-23T10:20:52.996-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2008-06-23T10:20:52.996-07:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="Java" /><category scheme="http://www.blogger.com/atom/ns#" term=".NET" /><category scheme="http://www.blogger.com/atom/ns#" term="grid computing" /><category scheme="http://www.blogger.com/atom/ns#" term="~Max Martynov" /><category scheme="http://www.blogger.com/atom/ns#" term="distributed cache" /><category scheme="http://www.blogger.com/atom/ns#" term="scalability" /><category scheme="http://www.blogger.com/atom/ns#" term="enterprise applications" /><category scheme="http://www.blogger.com/atom/ns#" term="data grid" /><title>Grid technologies in middle-size applications</title><content type="html">Grid technologies were born to solve extreme problems and currently they are used primarily by large-scale applications. However, like computers, which were initially used only for solving complex scientific tasks and later came to almost every house, grid technologies are coming into middle-size enterprise market. 
In this article I will try to answer the question of how a medium-size application can take advantage of grid technologies.&lt;br /&gt;&lt;br /&gt;When writing large applications that solve extreme problems, scalability is always an issue and you do not have a choice: you must invest resources in scalability, and developers often have enough time and knowledge to solve this problem. When writing small applications, you usually do not care about these kinds of problems, because everything will work fine on a single server. But when writing a middle-size application, you are in trouble, because you are already big enough to start thinking about scalability, but you are not big enough to invest a lot of resources in solving this problem. When middle-size applications grow from little ones, the trouble becomes really serious. However, scalability problems in middle-size applications are usually caused by a very limited set of architecture decisions. Knowing these causes and their resolutions will help to build a more scalable application.&lt;br /&gt;&lt;br /&gt;Whatever architecture you choose for a middle-size application, it will usually have a web server and a database. In the best case, it will have a single physical machine with both. In the worst case, it will have web servers, application servers and a database server on separate machines. In all cases, the request processing chain will include several processes and will take a lot of time. Usually, the database server is also a bottleneck for the entire system, because it doesn’t scale well. To address these problems, mankind invented caches. Caches are useful for storing data that is costly to compute or retrieve. They can significantly reduce both database load and request processing time. However, caches can also be scaling killers.&lt;br /&gt;&lt;br /&gt;Assume you have an architecture with a dedicated web server that hosts your application process. It is relatively easy to maintain a cache on a single server. 
But suppose you want to scale and the single server becomes a load balancing cluster, where each machine should maintain its own cache, and this cache should be synchronized with caches on other servers. For example, if some item is removed from the cache on one server, it should be immediately removed from every cache on every other server in the cluster. This is extremely hard to accomplish. But since developers often implement simple local caches by themselves, they try to enhance their caches to support distributed behavior. The problem is that developers often do not have enough knowledge and experience to do this. Fortunately, the problem of distributed caches is well known and the solution already exists.&lt;br /&gt;&lt;br /&gt;The solution is to use third-party distributed caches or &lt;a href="http://en.wikipedia.org/wiki/In_Memory_Data_Grid"&gt;In Memory Data Grids&lt;/a&gt; (IMDG). In Memory Data Grids were created to solve problems of scaling data in extreme applications, where the cluster contains hundreds of servers. Data stored on these servers can be partitioned between them or replicated. If data is partitioned, each server contains only one chunk of data and each chunk is stored on multiple servers to provide failover. This allows huge amounts of data to be stored in memory. If data is replicated, it is stored in full on each server. Of course, the data cached on each server is synchronized with data on other servers. This allows very fast access to data, because it is always available locally. In Memory Data Grids provide lots of other interesting features, which deserve an entire book to describe. Distributed caches are essentially a simplified form of an In Memory Data Grid. 
Currently there are many implementations of both IMDGs and distributed caches and if you choose to use these technologies you have a number of options.&lt;br /&gt;&lt;br /&gt;The concrete choice will depend on what technology or framework you use in your application:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;If you are using .NET, you may use &lt;a href="http://www.scaleoutsoftware.com/"&gt;ScaleOut&lt;/a&gt;, &lt;a href="http://www.alachisoft.com/ncache/"&gt;NCache&lt;/a&gt; or Microsoft’s new distributed cache &lt;a href="http://www.microsoft.com/downloads/details.aspx?FamilyId=B24C3708-EEFF-4055-A867-19B5851E7CD2&amp;amp;displaylang=en"&gt;Velocity&lt;/a&gt;, which is currently available as a Community Technology Preview. They all provide an ASP.NET session state provider – the easiest way to gain a benefit from grid technologies. With this session state provider you will not need to maintain a special SQL Server for storing ASP.NET session data, because all data will be distributed between web servers in the cluster in a reliable and robust way.&lt;/li&gt;&lt;li&gt;If you are using Java, &lt;a href="http://www.oracle.com/technology/products/coherence/index.html"&gt;Oracle Coherence&lt;/a&gt; and &lt;a href="http://www.gigaspaces.com/"&gt;GigaSpaces&lt;/a&gt; are the most famous In Memory Data Grids and, hence, can be used as distributed caches. They both provide a second level cache for Hibernate, so, if you use it, you can scale easily with no additional development efforts.&lt;/li&gt;&lt;li&gt;If you are using C++, PHP, Ruby or Python, you should consider &lt;a href="http://www.danga.com/memcached/"&gt;memcached&lt;/a&gt;. This is a very famous distributed cache initially developed for LiveJournal, which has already helped to scale many extreme applications, like Wikipedia, YouTube, Facebook and others.&lt;/li&gt;&lt;/ul&gt;All these implementations will help you to solve distributed cache problems and scale well. 
In simple cases, to start using them, you will need to replace your old local caches with new distributed ones. If you need to scale a Hibernate second-level cache, or an ASP.NET session state provider, you will not be required to write any code. However, in the case of complex and serious scalability problems you can consult with us at Grid Dynamics any time and we will help you to solve them in a most effective way.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3946011063058389308-7541093640272828945?l=blog.griddynamics.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog.griddynamics.com/feeds/7541093640272828945/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=3946011063058389308&amp;postID=7541093640272828945" title="1 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/7541093640272828945?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/7541093640272828945?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/griddynamics/~3/fVQ1LmDxzOM/grid-technologies-in-middle-size.html" title="Grid technologies in middle-size applications" /><author><name>Max Martynov</name><uri>http://www.blogger.com/profile/12000269073735478943</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="16905684925284551113" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">1</thr:total><feedburner:origLink>http://blog.griddynamics.com/2008/06/grid-technologies-in-middle-size.html</feedburner:origLink></entry><entry 
gd:etag="W/&quot;CkYGQXk9cCp7ImA9WxdXEkk.&quot;"><id>tag:blogger.com,1999:blog-3946011063058389308.post-6484477625251023454</id><published>2008-05-31T15:22:00.000-07:00</published><updated>2008-06-23T09:55:20.768-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2008-06-23T09:55:20.768-07:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="~Sylvia Kainz" /><category scheme="http://www.blogger.com/atom/ns#" term="Open Source Grid and Cluster Conference" /><category scheme="http://www.blogger.com/atom/ns#" term="gigaspaces" /><category scheme="http://www.blogger.com/atom/ns#" term="Sun Grid Engine" /><category scheme="http://www.blogger.com/atom/ns#" term="GridGain" /><category scheme="http://www.blogger.com/atom/ns#" term="Hadoop" /><title>Open Source Grid Expertise and Thought Leadership</title><content type="html">Last week, an exciting event took place here in the Bay Area: the Open Source Grid and Cluster Conference was held in Oakland. Engineers and scientists from around the world who care about the complex interactions between open source and grid computing discussed their projects and research insights. Grid Dynamics was represented in two ways:&lt;br /&gt;Eugene, our CTO, gave a presentation about Convergence, one of our key projects, which integrates data grids with compute grids. 
Check out his &lt;a href="http://www.griddynamics.com/opensource/Convergence_EugeneSteinberg.pdf"&gt;presentation&lt;/a&gt;.&lt;img style="margin: 0px auto 10px; display: block;" src="http://bp0.blogger.com/_N3LcjoJw03o/SDNhTt6BinI/AAAAAAAAALI/vtyjszCPFRY/s200/Eugene.JPG" alt="Eugene Steinberg giving talk on OSGE&amp;amp;TL Con" id="BLOGGER_PHOTO_ID_5202608985911429746" border="0" /&gt;&lt;span class="Apple-style-span"  style="font-size:x-small;"&gt;&lt;div style="text-align: center;"&gt;&lt;span class="Apple-style-span" style="font-size: x-small;"&gt;Eugene Steinberg, CTO, Grid Dynamics&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: center;"&gt;&lt;span class="Apple-style-span" style="font-size: 10px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;/span&gt;Eugene presented the latest version of Convergence, which integrates DataSynapse GridServer 5.0 and GigaSpaces XAP 6.0. Future versions of Convergence will provide adaptors for Sun Grid Engine, Oracle's Coherence and GridGain.&lt;br /&gt;&lt;br /&gt;The second event was moderated by Victoria Livschitz, our CEO: a panel about Trends in Open Source Data Grids with experts from the open source grid computing industry: &lt;span style="font-weight: bold;"&gt;Doug Cutting&lt;/span&gt;, creator of Hadoop, &lt;span style="font-weight: bold;"&gt;Nati Shalom&lt;/span&gt;, CTO and co-founder of GigaSpaces, &lt;span style="font-weight: bold;"&gt;Nikita Ivanov&lt;/span&gt;, President and Founder of GridGain and &lt;span style="font-weight: bold;"&gt;Daniel Templeton&lt;/span&gt;, Manager at Sun Grid Engine.&lt;br /&gt;&lt;br /&gt;The discussion started with each of the panelists introducing their products, current initiatives &lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp2.blogger.com/_N3LcjoJw03o/SDNpUN6BiqI/AAAAAAAAALg/IOCJcvQZMiI/s1600-h/Panelists.JPG"&gt;&lt;img style="margin: 0pt 0pt 10px 10px; float: right; cursor: pointer;" 
src="http://bp2.blogger.com/_N3LcjoJw03o/SDNpUN6BiqI/AAAAAAAAALg/IOCJcvQZMiI/s200/Panelists.JPG" alt="" id="BLOGGER_PHOTO_ID_5202617790594386594" border="0" /&gt;&lt;/a&gt; and their commitment to the Open Source community. Each participant had an interesting perspective, particularly on the question of why Data Grid technology is deeply anchored in the Open Source community, while many Compute Grids are not.&lt;br /&gt;&lt;br /&gt;Attendees participated directly by asking the panel specific questions about their technologies and development efforts.&lt;br /&gt;&lt;span style="display: block; padding: 0px; text-align: right; font-size:78%;"&gt;&lt;br /&gt;Left to right: Nati Shalom, Nikita Ivanov,&lt;br /&gt;Doug Cutting, Daniel Templeton&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;The discussion continued informally after the panel, with conference attendees and speakers discussing product-specific questions and key future technology.&lt;br /&gt;&lt;span style="display: block; padding: 0px; text-align: center;"&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp3.blogger.com/_N3LcjoJw03o/SDNpxd6BirI/AAAAAAAAALo/WHWwsawsa34/s1600-h/Cutting.JPG"&gt;&lt;img style="margin: 0px auto 10px; display: block; padding: 0px; text-align: center; cursor: pointer;" src="http://bp3.blogger.com/_N3LcjoJw03o/SDNpxd6BirI/AAAAAAAAALo/WHWwsawsa34/s200/Cutting.JPG" alt="" id="BLOGGER_PHOTO_ID_5202618293105560242" border="0" /&gt;&lt;/a&gt;&lt;span style="font-size:78%;"&gt;Doug Cutting in discussion with conference attendees&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp2.blogger.com/_N3LcjoJw03o/SDNqbN6BitI/AAAAAAAAAL4/s_ab7q-RTuk/s1600-h/Victoria+and+Nikita.JPG"&gt;&lt;img style="margin: 0px auto 10px; display: block; padding: 0px; text-align: center; cursor: pointer;" 
src="http://bp2.blogger.com/_N3LcjoJw03o/SDNqbN6BitI/AAAAAAAAAL4/s_ab7q-RTuk/s200/Victoria+and+Nikita.JPG" alt="" id="BLOGGER_PHOTO_ID_5202619010365098706" border="0" /&gt;&lt;/a&gt;&lt;span style="font-size:78%;"&gt;Victoria Livschitz and Nikita Ivanov&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp0.blogger.com/_N3LcjoJw03o/SDNp8t6BisI/AAAAAAAAALw/zHvLfJNqQfg/s1600-h/Hari.JPG"&gt;&lt;img style="margin: 0px auto 10px; display: block; padding: 0px; text-align: center; cursor: pointer;" src="http://bp0.blogger.com/_N3LcjoJw03o/SDNp8t6BisI/AAAAAAAAALw/zHvLfJNqQfg/s200/Hari.JPG" alt="" id="BLOGGER_PHOTO_ID_5202618486379088578" border="0" /&gt;&lt;/a&gt;&lt;span style="font-size:78%;"&gt;Nati Shalom with conference attendee&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;The conference setting allowed attendees to actively share their experience first hand, ask questions in an informal setting, and hear the experts talk about technical details and strategic directions of the industry.&lt;br /&gt;&lt;br /&gt;If you want to know more about the Open Source Grid and Cluster Conference, &lt;a href="http://www.opensourcegridcluster.org/"&gt;here &lt;/a&gt;is a link to this year's conference. I also found a few bloggers taping the &lt;a href="http://blogs.sun.com/deirdre/entry/where_i_am_now_open"&gt;keynote speech&lt;/a&gt; by Fritz Ferstl and some &lt;a href="http://www.flickr.com/photos/chrisdag/2492116609/in/pool-opensourcegridcluster/"&gt;pictures &lt;/a&gt;of the event. 
Enjoy!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3946011063058389308-6484477625251023454?l=blog.griddynamics.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog.griddynamics.com/feeds/6484477625251023454/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=3946011063058389308&amp;postID=6484477625251023454" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/6484477625251023454?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/6484477625251023454?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/griddynamics/~3/KdhkbzSIGgs/open-source-grid-expertise-and-thought.html" title="Open Source Grid Expertise and Thought Leadership" /><author><name>skainz</name><uri>http://www.blogger.com/profile/00636676289358399014</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="12766458989396206553" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://bp0.blogger.com/_N3LcjoJw03o/SDNhTt6BinI/AAAAAAAAALI/vtyjszCPFRY/s72-c/Eugene.JPG" height="72" width="72" /><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total><feedburner:origLink>http://blog.griddynamics.com/2008/05/open-source-grid-expertise-and-thought.html</feedburner:origLink></entry><entry gd:etag="W/&quot;D08FR346fyp7ImA9WxdREEk.&quot;"><id>tag:blogger.com,1999:blog-3946011063058389308.post-4413156283142998620</id><published>2008-05-29T04:15:00.000-07:00</published><updated>2008-05-29T00:23:36.017-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2008-05-29T00:23:36.017-07:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" 
term="~Alexander Kusnetsov" /><category scheme="http://www.blogger.com/atom/ns#" term="grid computing" /><category scheme="http://www.blogger.com/atom/ns#" term="convergence" /><title>Match Making in Grid Computing</title><content type="html">Data partitioning is widely used to store and manage huge amounts of data across several servers. Often there is a need to introduce global distributed transactions to keep the data consistent. Those global transactions represent a natural limit to the scalability of the entire system.&lt;br /&gt;&lt;br /&gt;Project &lt;a href="http://openspaces.org/display/CVG/Convergence"&gt;Convergence&lt;/a&gt; was born as an example of managing interaction between an in-memory data grid and a computational grid, specifically to address the problem of data consistency in systems with global distributed transactions. It is based on the idea of &lt;a href="http://blog.griddynamics.com/2008/02/data-aware-routing-datasynapse_14.html"&gt;data aware routing&lt;/a&gt;, which improves network utilization and overall performance. In addition, data aware routing can also improve scalability.&lt;br /&gt;&lt;br /&gt;How does it work? Let me illustrate the solution by using a simple example: You are the owner of a dating agency. A candidate comes to your agency and looks for a person with specific parameters such as a love of opera, an interest in cooking and a liking for dogs. Let's assume that the set of these parameters is unique for each person and all parameters need to match. So, your agency will find him a perfect match or he will be put into your database as a single new customer waiting for a match. Considering your candidate’s specific parameters, you expect that it will take some time to find him an exact match.&lt;br /&gt;&lt;br /&gt;Your company has an outstanding reputation, you are flooded with candidates, and your company cannot handle all the new candidates. 
You realize that you need to get a business partner and you partner with another dating agency in town. You also hire a secretary who greets your candidates and sends them randomly to one of the offices.&lt;br /&gt;&lt;br /&gt;One day, two candidates with the exact same parameters (a perfect match) enter your office and your secretary sends one of them to your office and the other to your business partner’s office. Both of you independently start looking for an exact match at the same time, but neither of you can find one in your own database. As a result, you and your partner mark both candidates as single and waiting for a match, and you miss the opportunity to generate a fee from the match. How can you solve this unfortunate incident?&lt;br /&gt;&lt;br /&gt;First, you can have your secretary manage the assignment better by having her search your and your business partner's combined database. But while she is using the database, nobody else can access it until she is done with her search. This may produce a match of the candidates, but it creates a bottleneck and hence hurts the performance of your match making capabilities.&lt;br /&gt;&lt;br /&gt;Second, you can divide all candidates into two groups (for example, those that like dogs and those that do not, assuming there are about the same number of candidates that like or dislike dogs). This way, you and your partner are responsible for an equal number of candidates. You will instruct your secretary to send incoming candidates to the offices depending on whether they like or dislike dogs. 
Because you are processing only half of the candidates with one parameter already matched in your database, you will achieve a match at a faster rate than before.&lt;br /&gt;&lt;br /&gt;This is exactly how in-memory data grids and computational grids interact and how data aware job scheduling can address the ‘matching of data grid and compute grid requests.’ As in our example, a database or an in-memory data grid (&lt;a href="http://www.oracle.com/technology/products/coherence/index.html"&gt;Oracle Coherence&lt;/a&gt; or &lt;a href="http://www.gigaspaces.com/xapOverview"&gt;GigaSpaces XAP&lt;/a&gt;) represents the databases with all the parameters of the candidates. You and your business partner represent the computational grid (&lt;a href="http://gridgain.com/"&gt;GridGain&lt;/a&gt;, &lt;a href="http://www.datasynapse.com/en/products/gridserver.php"&gt;DataSynapse GridServer&lt;/a&gt; or &lt;a href="http://www.sun.com/software/gridware/"&gt;Sun Grid Engine&lt;/a&gt;), and your secretary fulfills the function of a data aware job scheduler.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3946011063058389308-4413156283142998620?l=blog.griddynamics.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog.griddynamics.com/feeds/4413156283142998620/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=3946011063058389308&amp;postID=4413156283142998620" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/4413156283142998620?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/4413156283142998620?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/griddynamics/~3/mu0A4X6E9V8/data-partitioning-and-global.html" 
title="Match Making in Grid Computing" /><author><name>Alexander</name><uri>http://www.blogger.com/profile/12758684695414200209</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="08497447863300918744" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total><feedburner:origLink>http://blog.griddynamics.com/2008/05/data-partitioning-and-global.html</feedburner:origLink></entry><entry gd:etag="W/&quot;CUYARH8-eyp7ImA9WxdSGU4.&quot;"><id>tag:blogger.com,1999:blog-3946011063058389308.post-5847730442849246030</id><published>2008-05-27T02:22:00.000-07:00</published><updated>2008-05-27T17:05:45.153-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2008-05-27T17:05:45.153-07:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="open source" /><category scheme="http://www.blogger.com/atom/ns#" term="flex" /><category scheme="http://www.blogger.com/atom/ns#" term="visualization" /><category scheme="http://www.blogger.com/atom/ns#" term="RIA" /><category scheme="http://www.blogger.com/atom/ns#" term="~Ivan Bulanov" /><category scheme="http://www.blogger.com/atom/ns#" term="graph" /><title>Choosing graph manipulation library for your Flex app</title><content type="html">The open-source movement plays a pretty big role in everyday life here at &lt;a href="http://www.griddynamics.com/"&gt;Grid Dynamics&lt;/a&gt;. We always try to do our best to improve the software we're working with and contribute back into the community.&lt;br /&gt;&lt;br /&gt;While developing     an &lt;a href="http://en.wikipedia.org/wiki/Rich_Internet_application"&gt;RIA&lt;/a&gt; project  we faced a need to display complex graphs  within an Adobe Flex application. It turns out that there are not many solutions to this problem existing at the moment. 
Actually AFAIK there are only &lt;a href="http://flare.prefuse.org/"&gt;Flare toolkit&lt;/a&gt;, &lt;a href="http://www.yworks.com/en/products_yfilesflex_about.htm"&gt;yFiles Flex&lt;/a&gt;, &lt;a href="http://mark-shepherd.com/blog/springgraph-flex-component/"&gt;SpringGraph&lt;/a&gt; and &lt;a href="http://code.google.com/p/flexvizgraphlib"&gt;Flex Visual Graph Library&lt;/a&gt; (further on &lt;i&gt;FVG&lt;/i&gt;). Having done the comparison between the libraries we have chosen &lt;i&gt;Flex Visual Graph Library&lt;/i&gt; as our current graphing solution.&lt;br /&gt;&lt;h3&gt;Good Things and Bad Things&lt;/h3&gt;&lt;br /&gt;&lt;div align="left"&gt;Here is the list of the &lt;span&gt;most &lt;/span&gt;distinctive features of &lt;span&gt;the&lt;/span&gt; &lt;i&gt;FVG&lt;/i&gt;:&lt;br /&gt;&lt;/div&gt;&lt;ol&gt;&lt;li&gt;It is an open&lt;span&gt;-&lt;/span&gt;source project so everyone can use it for free and modify it in the way he likes;&lt;/li&gt;&lt;li&gt;It can render not only trees but also (relatively) &lt;span&gt;complex&lt;/span&gt;&lt;span&gt; graphs;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span&gt;It contains a number of algorithms for performing layout;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span&gt;Laying out the graph is perform&lt;span&gt;ed&lt;/span&gt;  on the client&lt;span&gt;-&lt;/span&gt;side and no server resources are involved in the process.&lt;br /&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ol&gt;The library has some documentation and samples, although sometimes (and maybe too often) one has to dig into its source code to figure out the way it works.&lt;br /&gt;&lt;br /&gt;However, it is a young project so  it is expected to be imperfect. 
Here are the main drawbacks that&lt;span&gt; sometimes &lt;/span&gt;annoy me:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;&lt;span&gt;A somewhat m&lt;/span&gt;&lt;span&gt;onolithic&lt;/span&gt; library design, with the &lt;span class="Apple-style-span"&gt;god-class&lt;/span&gt; anti-pattern here and there; state changes are favored over parameter passing. I would also mention the coding conventions here, although that is surely my personal impression;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Lack of documentation on layout algorithms; incomplete design diagrams, tutorials and examples.&lt;br /&gt;&lt;/li&gt;&lt;/ol&gt;&lt;div align="left"&gt;When dealing with someone else's products instead of homegrown ones, you will almost always find that their feature set is incomplete. This is the case not only for open-source projects but also for closed-source ones (including commercial ones), and the greatest advantage of open source is that it allows you to implement the desired features yourself rather than wait until someone else does it. And surely there were missing features in &lt;i&gt;FVG&lt;/i&gt; that our customers and we ourselves wanted to have.&lt;/div&gt;&lt;h3&gt;Our humble help&lt;/h3&gt; So here is a list of the things we improved:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Fixed and successfully harnessed the functionality for adding and removing nodes. Before the fix it was impossible to do that after the graph was initialized. It seems that the developers of the library just did not need it, so they never got it into shape.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Fixed support for directed graphs. It was one of the most controversial changes to the library. Before, the layout algorithms were tested only on undirected graphs, so the spanning tree in the case of a directed graph was computed in a wrong way. This led to the graph being &lt;span&gt;&lt;span&gt;laid out i&lt;/span&gt;&lt;/span&gt;nappropriately. 
Now the user can choose whether to get the new behavior or emulate the old one.&lt;/li&gt;&lt;li&gt;Enhanced the hierarchical layouter. It now supports interleaving of nodes. Sibling nodes (the nodes on the same level) are now placed in a checkerboard formation, so every even node is put a bit higher than an odd one. This improves the readability of the graph and its overall appearance when nodes have text labels under them.&lt;/li&gt;&lt;li&gt;Added the ability to sort sibling nodes in a specified order. This again improves readability. Suppose that you have some text on your nodes. You can sort the siblings in lexicographical order so that you can find the node you want more easily. Or you may want to display a certain type of node first. Plugging in your own sorting logic is very easy.&lt;/li&gt;&lt;li&gt;Added a very cool feature: graph zooming. There was nothing like that in the library before. One could only try to tune some layouter parameters to achieve a zooming effect, but real zooming for every layouter was impossible.&lt;br /&gt;&lt;/li&gt;&lt;/ol&gt;&lt;span&gt;In addition, we've been implementing numerous small fixes and improvements, and we are definitely going to continue this effort in the future.&lt;/span&gt;&lt;br /&gt;&lt;h3&gt;Conclusion&lt;/h3&gt;In general, supporting an open-source library has the following noticeable peculiarity: as with other source management models, you always need to keep compatibility with those who already use your library in their applications. But when a need to make significant and incompatible changes to the library appears, one can always commit the changes into a separate branch. It turns out to be a rather common practice in the open-source world. Although this is not as convenient as just committing the changes directly to the trunk, the approach has advantages. 
It allows you to avoid breaking compatibility and lets other developers test and review the new functionality. We've been maintaining a separate branch for quite a long time, partly because of tensions with the maintainers, partly because of a lack of time to perform all the necessary merges and testing. However, not long ago we decided that this would likely cause the branches to diverge at some point, which would subtract value from the library rather than add to it. So we've consolidated our efforts with the maintainer of the library and&lt;span&gt; presented a &lt;a href="http://flexvizgraphlib.googlegroups.com/web/Demo.swf"&gt;demo&lt;/a&gt; of the new features. The guys were very collaborative and accepted our changes into the trunk with just a couple of minor changes.&lt;/span&gt;&lt;div align="left"&gt;&lt;span&gt;&lt;br /&gt;As we are going to keep using the library, we will definitely continue to put effort into it. So our next step is going to be reviewing &lt;i&gt;FVG&lt;/i&gt;'s architecture and proposing plans for refactoring the code. On the features side, we're going to have a hierarchical layouter that supports multiple root nodes, a zooming engine that preserves label and node size, a node renderer that supports overlay icons, improved edge-node connections... 
lots of fun things to come.&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div align="left"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div align="left"&gt;Stay tuned!&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3946011063058389308-5847730442849246030?l=blog.griddynamics.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog.griddynamics.com/feeds/5847730442849246030/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=3946011063058389308&amp;postID=5847730442849246030" title="4 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/5847730442849246030?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/5847730442849246030?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/griddynamics/~3/blyY-PDL1KE/choosing-graph-manipulation-library-for.html" title="Choosing graph manipulation library for your Flex app" /><author><name>IvanBulanov</name><uri>http://www.blogger.com/profile/05720205714226318572</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="06761332474760822788" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">4</thr:total><feedburner:origLink>http://blog.griddynamics.com/2008/04/choosing-graph-manipulation-library-for.html</feedburner:origLink></entry><entry gd:etag="W/&quot;CEENR3czfCp7ImA9WxdTEEw.&quot;"><id>tag:blogger.com,1999:blog-3946011063058389308.post-676234938511704740</id><published>2008-05-05T08:52:00.000-07:00</published><updated>2008-05-05T11:38:16.984-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2008-05-05T11:38:16.984-07:00</app:edited><title>Partnership with GridGain</title><content type="html">Today we announced 
a new partnership with &lt;a href="http://www.gridgain.com/"&gt;GridGain&lt;/a&gt; (see the &lt;a href="http://enterpriseapps.itbusinessnet.com/articles/viewarticle.jsp?id=376803"&gt;press release here&lt;/a&gt;). I am extremely pleased to be launching this alliance and hope it will accelerate the already impressive adoption of GridGain's new and exciting open source Java platform for high-performance grid applications.&lt;br /&gt;&lt;br /&gt;At Grid Dynamics we see GridGain as a simple, elegant, extensible, open-source, pure-Java platform that makes traditional high-performance computing and all modern map/reduce applications significantly simpler and cheaper to develop and deploy than has ever been possible before. The platform's extensibility also allows GridGain to be used for a wider set of applications and enables integration with other tools, systems and frameworks.&lt;br /&gt;&lt;br /&gt;The partnership gives GridGain's customers an integrated Software + Service offering, enhancing GridGain's innovative technology with Grid Dynamics' expertise and track record in designing complex, scale-out applications. Customers are getting best-of-breed solutions from the industry leaders in their respective areas and should expect a seamless experience working across both organizations. 
Grid Dynamics and GridGain share the same core values, highly regarded by their mutual customers:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Both are fundamentally engineering companies, nearly obsessed with the performance, scalability and quality of the applications we write.&lt;/li&gt;&lt;li&gt;Both recognize the importance of open source as a medium for cost-effective and open software products with almost "organic" properties, which thrive and evolve in the marketplace under the watchful eye and through the collective efforts of their designers and users.&lt;/li&gt;&lt;li&gt;Both are cost-efficient, global-from-the-cradle organizations with significant R&amp;amp;D operations in Eastern Europe and headquarters in Silicon Valley.&lt;/li&gt;&lt;/ul&gt;Under the terms of the relationship, Grid Dynamics is creating a "Center of Excellence" around the GridGain product and extending its design patterns for scalable high-performance applications to include mappings to GridGain technology. Both companies are collaborating on joint R&amp;amp;D projects, including the planned release of a GridGain plug-in for Grid Dynamics' open source project &lt;a href="http://www.openspaces.org/display/CVG/Convergence"&gt;Convergence&lt;/a&gt;, designed to enable interoperability between the leading compute and data grid platforms. 
Together, Grid Dynamics and GridGain simplify high-performance computing, lower the barrier to adoption of the map/reduce paradigm for thousands of customers and ultimately enable better scalability for a wide range of applications, networks and web sites.&lt;br /&gt;&lt;br /&gt;I am personally looking forward to a lasting relationship between our companies that benefits GridGain users and the entire open source high-performance computing community.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3946011063058389308-676234938511704740?l=blog.griddynamics.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog.griddynamics.com/feeds/676234938511704740/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=3946011063058389308&amp;postID=676234938511704740" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/676234938511704740?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/676234938511704740?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/griddynamics/~3/nO3HHVVLZOY/partnership-with-gridgain.html" title="Partnership with GridGain" /><author><name>Victoria Livschitz</name><uri>http://www.blogger.com/profile/12264301035182704078</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="07316905052902410928" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total><feedburner:origLink>http://blog.griddynamics.com/2008/05/partnership-with-gridgain.html</feedburner:origLink></entry><entry 
gd:etag="W/&quot;CUEHSHo9fip7ImA9WxZbFk8.&quot;"><id>tag:blogger.com,1999:blog-3946011063058389308.post-7328923381976762682</id><published>2008-04-03T06:57:00.000-07:00</published><updated>2008-04-19T09:47:19.466-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2008-04-19T09:47:19.466-07:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="grid dynamics" /><category scheme="http://www.blogger.com/atom/ns#" term="gigaspaces" /><category scheme="http://www.blogger.com/atom/ns#" term="~Eugene Steinberg" /><category scheme="http://www.blogger.com/atom/ns#" term="binary calculator" /><category scheme="http://www.blogger.com/atom/ns#" term="testing" /><category scheme="http://www.blogger.com/atom/ns#" term="grid consulting" /><title>Binary Calculator Project - What is the footprint of my GigaSpaces Entries?</title><content type="html">In the initial stages of every Data Grid project it is essential to get good estimates of memory requirements. How much memory will my domain objects consume once converted to Entries, and what will the indexing overhead be? The answer to this question determines what JVM heap size to choose and how many JVMs will be needed to store the intended dataset, or how much data can be stored within a given hardware footprint.&lt;br /&gt;&lt;br /&gt;For GigaSpaces, it is hard to provide accurate theoretical estimates of your Entry size. Here are the reasons:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;The space doesn't store entries as heap objects: they are stored decomposed into fields&lt;/li&gt;&lt;li&gt;A String uid is generated and stored along with each entry&lt;/li&gt;&lt;li&gt;Index overhead depends on the type of index (ordered or unordered) and on the cardinality of the field's values.&lt;/li&gt;&lt;/ul&gt;For these reasons it is far more practical to take an experimental approach to measuring entry footprint than to try to apply a theoretical formula. 
The goal of the &lt;a href="http://www.openspaces.org/display/OBC/OpenSpaces+Binary+Calculator"&gt;Binary Calculator project&lt;/a&gt; is to build a convenient toolkit for this problem. The main idea is quite simple:&lt;br /&gt;&lt;br /&gt;First, collect basic statistics on how efficiently individual entries are stored:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Connect to the remote space&lt;/li&gt;&lt;li&gt;Get a batch of test entries from some entry source&lt;/li&gt;&lt;li&gt;Write the batch to the remote space&lt;/li&gt;&lt;li&gt;Perform a remote garbage collection&lt;/li&gt;&lt;li&gt;Measure memory usage&lt;/li&gt;&lt;li&gt;Repeat from step 2&lt;/li&gt;&lt;/ol&gt;After a sufficient number of iterations, we will have a set of data points in the format &lt;span style="font-style: italic;"&gt;(entriesWritten, memoryUsage)&lt;/span&gt;. Performing a linear approximation (e.g. a least-squares linear fit), we get an estimate of the single-entry footprint (including index overhead): the slope of the fitted line.&lt;br /&gt;&lt;br /&gt;&lt;div&gt;Implementing this idea, we have built an initial version of the Binary Calculator, which can be used as a toolkit for measuring the footprint of arbitrary entries. It has a very simple GUI that shows the progress of the memory experiment.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp3.blogger.com/_6Xw_KFI7ZPs/R_TqJBCJyyI/AAAAAAAABKk/Onpmui9wwCc/s1600-h/bincalc-0.1.0-screenshot.jpg"&gt;&lt;img style="cursor: pointer; display: block;" src="http://bp3.blogger.com/_6Xw_KFI7ZPs/R_TqJBCJyyI/AAAAAAAABKk/Onpmui9wwCc/s400/bincalc-0.1.0-screenshot.jpg" alt="" id="BLOGGER_PHOTO_ID_5185026511627471650" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span&gt;We are planning to turn this simple toolkit into a much more powerful tool, which will generate entries on the fly, based on user-supplied metadata. 
This way, the user can specify an Entry Description as a simple table in a GUI:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;table style="border: 1px solid black; margin: 0px; width: 70%;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;th&gt;Type&lt;/th&gt;&lt;th&gt;Indexed&lt;/th&gt;&lt;th&gt;Number of fields&lt;/th&gt;&lt;th&gt;Avg Length&lt;/th&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Long&lt;/td&gt;&lt;td&gt;Yes&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;N/A&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;String&lt;/td&gt;&lt;td&gt;No&lt;/td&gt;&lt;td&gt;3&lt;/td&gt;&lt;td&gt;1000&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;String&lt;/td&gt;&lt;td&gt;Yes&lt;/td&gt;&lt;td&gt;2&lt;/td&gt;&lt;td&gt;5000&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Integer&lt;/td&gt;&lt;td&gt;No&lt;/td&gt;&lt;td&gt;3&lt;/td&gt;&lt;td&gt;N/A&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;br /&gt;&lt;/div&gt;BinaryCalculator will generate Entries at runtime based on this description, populate them with random data, perform memory experiments and show the estimated entry size.&lt;br /&gt;&lt;br /&gt;Also, we are planning to build a lightweight plugin system to supply a custom &lt;span style="font-style: italic;"&gt;EntrySource&lt;/span&gt;, for example, your own random entry generator or a JDBC or Hibernate data source. 
Consequently, performing full-fledged capacity experiments that load real data from a database will be much easier.&lt;br /&gt;&lt;br /&gt;We hope that this tool will be quite useful for GigaSpaces implementors in the field.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3946011063058389308-7328923381976762682?l=blog.griddynamics.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog.griddynamics.com/feeds/7328923381976762682/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=3946011063058389308&amp;postID=7328923381976762682" title="2 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/7328923381976762682?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/7328923381976762682?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/griddynamics/~3/PtGo-jb9I1U/binary-calculator-project-what-is.html" title="Binary Calculator Project - What is the footprint of my GigaSpaces Entries?" 
/><author><name>Eugene</name><uri>http://www.blogger.com/profile/06887960894636399710</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="09277193071639676635" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://bp3.blogger.com/_6Xw_KFI7ZPs/R_TqJBCJyyI/AAAAAAAABKk/Onpmui9wwCc/s72-c/bincalc-0.1.0-screenshot.jpg" height="72" width="72" /><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">2</thr:total><feedburner:origLink>http://blog.griddynamics.com/2008/04/binary-calculator-project-what-is.html</feedburner:origLink></entry><entry gd:etag="W/&quot;AkAAQnY-eSp7ImA9WxZUFUg.&quot;"><id>tag:blogger.com,1999:blog-3946011063058389308.post-2998531681389602866</id><published>2008-04-01T07:12:00.000-07:00</published><updated>2008-04-07T01:59:03.851-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2008-04-07T01:59:03.851-07:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="gigaspaces" /><category scheme="http://www.blogger.com/atom/ns#" term="~Alexander Kusnetsov" /><category scheme="http://www.blogger.com/atom/ns#" term="PackRat" /><category scheme="http://www.blogger.com/atom/ns#" term="grid computing" /><category scheme="http://www.blogger.com/atom/ns#" term="grid consulting" /><category scheme="http://www.blogger.com/atom/ns#" term="openspaces.org" /><category scheme="http://www.blogger.com/atom/ns#" term="data grid" /><title>Introducing the PackRat Project</title><content type="html">Remember the legendary &lt;a href="http://en.wikipedia.org/wiki/Fallout_2"&gt;Fallout 2&lt;/a&gt; game? One of its cool features was perks. You could increase the number of action points, strength, critical strike percentage, etc. 
One of the most useful perks for those who wanted to carry as much as possible was the perk called Pack Rat.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.gamebanshee.com/fallout2/perks/images/packrat.jpg"&gt;&lt;img fix="fixed" style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://www.gamebanshee.com/fallout2/perks/images/packrat.jpg" alt="" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;But how does it relate to &lt;a href="http://en.wikipedia.org/wiki/Data_grid"&gt;Data Grids&lt;/a&gt;? The answer is very simple: it is possible to be more efficient at storing the data in the data grid by "packing" the objects. That's the goal of the &lt;a href="http://openspaces.org/display/PRT/PackRat"&gt;PackRat&lt;/a&gt; project, released yesterday on OpenSpaces.org. The initial idea came out of a customer project with &lt;a href="http://tunewiki.com/wiki/index.php/Main_Page"&gt;TuneWiki&lt;/a&gt;, a lyrics distribution site developed with &lt;a href="http://gigaspaces.com/pr_xap.html"&gt;GigaSpaces XAP 6.0&lt;/a&gt; where it was important to pack as many lyrics as possible in a space without noticeable loss of search and retrieval performance.&lt;br /&gt;&lt;br /&gt;Let's say we want to develop a hypothetical lyrics distribution service. 
First, we need a simple mechanism that packs together all entities related to a single lyric:&lt;br /&gt;&lt;pre&gt;&lt;span style="color: rgb(0, 0, 255);"&gt;import&lt;/span&gt; org.openspaces.packrat.annotations.*;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(163, 21, 21);"&gt;@CompressedClass&lt;/span&gt;(type = &lt;span style="color: rgb(163, 21, 21);"&gt;"entry"&lt;/span&gt;)&lt;br /&gt;&lt;span style="color: rgb(0, 0, 255);"&gt;public&lt;/span&gt; &lt;span style="color: rgb(0, 0, 255);"&gt;class&lt;/span&gt; Lyrics {&lt;br /&gt;  &lt;span style="color: rgb(163, 21, 21);"&gt;@IndexField&lt;/span&gt;&lt;br /&gt;  &lt;span style="color: rgb(0, 0, 255);"&gt;private&lt;/span&gt; Integer id;&lt;br /&gt;  &lt;span style="color: rgb(163, 21, 21);"&gt;@IndexField&lt;/span&gt;&lt;br /&gt;  &lt;span style="color: rgb(0, 0, 255);"&gt;private&lt;/span&gt; String title;&lt;br /&gt;  &lt;span style="color: rgb(163, 21, 21);"&gt;@IndexField&lt;/span&gt;&lt;br /&gt;  &lt;span style="color: rgb(0, 0, 255);"&gt;private&lt;/span&gt; String artist;&lt;br /&gt;  &lt;span style="color: rgb(163, 21, 21);"&gt;@CompressedField&lt;/span&gt;&lt;br /&gt;  &lt;span style="color: rgb(0, 0, 255);"&gt;private&lt;/span&gt; Integer version;&lt;br /&gt;  &lt;span style="color: rgb(163, 21, 21);"&gt;@CompressedField&lt;/span&gt;&lt;br /&gt;  &lt;span style="color: rgb(0, 0, 255);"&gt;private&lt;/span&gt; String language;&lt;br /&gt;  &lt;span style="color: rgb(163, 21, 21);"&gt;@CompressedField&lt;/span&gt;&lt;br /&gt;  &lt;span style="color: rgb(0, 0, 255);"&gt;private&lt;/span&gt; String comment;&lt;br /&gt;  &lt;span style="color: rgb(163, 21, 21);"&gt;@CompressedField&lt;/span&gt;&lt;br /&gt;  &lt;span style="color: rgb(0, 0, 255);"&gt;private&lt;/span&gt; String album;&lt;br /&gt;  &lt;span style="color: rgb(163, 21, 21);"&gt;@CompressedField&lt;/span&gt;&lt;br /&gt;  &lt;span style="color: rgb(0, 0, 255);"&gt;private&lt;/span&gt; String year;&lt;br /&gt;  &lt;span style="color: 
rgb(163, 21, 21);"&gt;@CompressedField&lt;/span&gt;&lt;br /&gt;  &lt;span style="color: rgb(0, 0, 255);"&gt;private&lt;/span&gt; String genre;&lt;br /&gt;  &lt;span style="color: rgb(163, 21, 21);"&gt;@CompressedField&lt;/span&gt;&lt;br /&gt;  &lt;span style="color: rgb(0, 0, 255);"&gt;private&lt;/span&gt; String lyrics;&lt;br /&gt;&lt;br /&gt;  &lt;span style="color: rgb(0, 128, 0);"&gt;// other stuff&lt;/span&gt;&lt;br /&gt;}&lt;/pre&gt;&lt;br /&gt;Let's say a song can be searched by artist and title (kind of a composite key). This means that all other non-searchable fields (like our &lt;code&gt;lyrics&lt;/code&gt; field) can be stored in compressed form in a &lt;code&gt;binary&lt;/code&gt; field and unpacked on demand. The definition of such an entry looks like this:&lt;br /&gt;&lt;pre&gt;&lt;span style="color: rgb(0, 0, 255);"&gt;public&lt;/span&gt; &lt;span style="color: rgb(0, 0, 255);"&gt;class&lt;/span&gt; LyricsPacked &lt;span style="color: rgb(0, 0, 255);"&gt;implements&lt;/span&gt; Entry {&lt;br /&gt;  &lt;span style="color: rgb(0, 0, 255);"&gt;private&lt;/span&gt; Integer id;&lt;br /&gt;  &lt;span style="color: rgb(0, 0, 255);"&gt;private&lt;/span&gt; String title;&lt;br /&gt;  &lt;span style="color: rgb(0, 0, 255);"&gt;private&lt;/span&gt; String artist;&lt;br /&gt;  &lt;span style="color: rgb(0, 0, 255);"&gt;private&lt;/span&gt; &lt;span style="color: rgb(0, 0, 255);"&gt;byte&lt;/span&gt;[] binary;&lt;br /&gt;&lt;br /&gt;  &lt;span style="color: rgb(0, 128, 0);"&gt;// other stuff&lt;/span&gt;&lt;br /&gt;}&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;To pack and unpack entries we need to create an instance of &lt;code&gt;Packer&lt;/code&gt;. This class implements the &lt;code&gt;Entry&lt;/code&gt; interface and has a method &lt;code&gt;pack&lt;/code&gt;, which packs a given object. 
This is how you may write a packed entry to a space:&lt;br /&gt;&lt;pre&gt;Packer packer = &lt;span style="color: rgb(0, 0, 255);"&gt;new&lt;/span&gt; Packer();&lt;br /&gt;IJSpace space = (IJSpace) SpaceFinder.find(&lt;span style="color: rgb(163, 21, 21);"&gt;"jini://*/*/mySpace"&lt;/span&gt;);&lt;br /&gt;&lt;br /&gt;Lyrics lyric = &lt;span style="color: rgb(0, 0, 255);"&gt;new&lt;/span&gt; Lyrics();&lt;br /&gt;&lt;span style="color: rgb(0, 128, 0);"&gt;// assign necessary fields&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;space.write(packer.pack(lyric), &lt;span style="color: rgb(0, 0, 255);"&gt;null&lt;/span&gt;, Lease.FOREVER);&lt;/pre&gt;&lt;br /&gt;At some point in time, you'll need to search the lyrics that are stored packed in a space. This code snippet shows how you'd do it using&lt;br /&gt;&lt;code&gt;Packer.packForTemplate(Object template)&lt;/code&gt;, which packs the template before searching for it:&lt;br /&gt;&lt;pre&gt;Lyrics templateLyrics = &lt;span style="color: rgb(0, 0, 255);"&gt;new&lt;/span&gt; Lyrics();&lt;br /&gt;Entry[] entries = space.readMultiple(packer.packForTemplate(templateLyrics), &lt;span style="color: rgb(0, 0, 255);"&gt;null&lt;/span&gt;, &lt;span style="color: rgb(0, 0, 255);"&gt;1000&lt;/span&gt;);&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(0, 0, 255);"&gt;for&lt;/span&gt; (Entry entry:entries) {&lt;br /&gt;  Lyrics lyrics = packer.unpack(entry);&lt;br /&gt;  &lt;span style="color: rgb(0, 128, 0);"&gt;// process unpacked lyrics&lt;/span&gt;&lt;br /&gt;}&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;For more examples, check the  &lt;a href="http://www.openspaces.org/display/PRT/Project+Documentation"&gt;project documentation page&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Update:&lt;/span&gt; PackRat was submitted to the &lt;a href="http://openspaces.org/display/OS/OpenSpaces+Developer+Challenge"&gt;OpenSpaces Developer Challenge&lt;/a&gt;. 
It has won the Early Bird Draw.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3946011063058389308-2998531681389602866?l=blog.griddynamics.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog.griddynamics.com/feeds/2998531681389602866/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=3946011063058389308&amp;postID=2998531681389602866" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/2998531681389602866?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/2998531681389602866?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/griddynamics/~3/mNIif6-3FfY/packrat-project.html" title="Introducing the PackRat Project" /><author><name>Alexander</name><uri>http://www.blogger.com/profile/12758684695414200209</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="08497447863300918744" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total><feedburner:origLink>http://blog.griddynamics.com/2008/01/packrat-project.html</feedburner:origLink></entry><entry gd:etag="W/&quot;A0MGQHw5fCp7ImA9WxZUEUw.&quot;"><id>tag:blogger.com,1999:blog-3946011063058389308.post-1122113902139730406</id><published>2008-03-31T20:43:00.000-07:00</published><updated>2008-04-01T23:57:01.224-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2008-04-01T23:57:01.224-07:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="open source" /><category scheme="http://www.blogger.com/atom/ns#" term="grid dynamics" /><category scheme="http://www.blogger.com/atom/ns#" term="gigaspaces" /><category scheme="http://www.blogger.com/atom/ns#" 
term="~Victoria Livschitz" /><category scheme="http://www.blogger.com/atom/ns#" term="grid computing" /><category scheme="http://www.blogger.com/atom/ns#" term="data synapse" /><category scheme="http://www.blogger.com/atom/ns#" term="grid consulting" /><category scheme="http://www.blogger.com/atom/ns#" term="openspaces.org" /><title>April Fool's Day - Big OpenSpaces.org Release Date</title><content type="html">Late last year, our friends at GigaSpaces launched an open source community, &lt;a href="http://www.openspaces.org/" rel="nofollow"&gt;OpenSpaces.org&lt;/a&gt;, with the following mission (quote from the community web site):&lt;br /&gt;&lt;br /&gt;"OpenSpaces.org is a website sponsored by &lt;a href="http://www.gigaspaces.com/" rel="nofollow"&gt;GigaSpaces Technologies&lt;/a&gt; and dedicated to creating and serving a community that develops and shares open source software for the OpenSpaces development framework."&lt;br /&gt;&lt;br /&gt;For a company like ours this was very good news. As an independent consulting firm with deep expertise in grid technologies and a whole practice dedicated to the popular GigaSpaces XAP technology, we constantly create tools, utilities and frameworks that make the GigaSpaces product better and easier to use for our customers. Once a tool is created, we want to put it in the hands of all GigaSpaces users, and ideally let the users help us - and everyone else - to refine and improve the tool. OpenSpaces.org is a perfect venue for such projects.&lt;br /&gt;&lt;br /&gt;Let's take an example - project Gigapult. If you ever had to write a bunch of scripts that start up a cluster in your Windows-based development environment, then rewrite them to work in a corporate Linux-based testing lab and then do this again for a differently configured production environment, you know how frustrating and counter-productive this is. 
We had to do something like this last summer for a customer where the production environment wasn't even known to the developers who were creating the deployment scripts. This script-porting business led to hard-to-find bugs and delays in the production rollout.&lt;br /&gt;&lt;br /&gt;Kirill Ishanov, the project lead, finally had enough and suggested an idea: &lt;span class="blog-text"&gt;let's make GigaSpaces configuration and bootstrapping work cross-platform, and in the process simplify the API and make the configuration files more maintainable and readable. After all, if everything else executes in the JVM, why should deployment logic be any different? Kirill explains his motivation in more detail in his two blog postings, &lt;a href="http://griddynamics.blogspot.com/2008/02/updated-ive-finally-moved-gigapult.html" rel="nofollow"&gt;Project Gigapult&lt;/a&gt; and &lt;a href="http://griddynamics.blogspot.com/2008/02/gigaspaces-cluster-bootstrapping-and.html" rel="nofollow"&gt;GigaSpaces Cluster Bootstrapping and Automated Testing&lt;/a&gt;&lt;/span&gt;&lt;span class="blog-text"&gt;, but this is basically how project Gigapult was born. &lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Today &lt;a href="http://www.openspaces.org/display/GPT/Gigapult" rel="nofollow"&gt;Gigapult&lt;/a&gt; is a JRuby-based domain-specific language designed to specify all aspects of grid configurations in a simple and intuitive way. For more information and to download the code, please visit the project's &lt;a href="http://www.openspaces.org/display/GPT/Gigapult" rel="nofollow"&gt;homepage&lt;/a&gt;. We've used Gigapult internally on several projects now, and the technology does make life easier.&lt;br /&gt;&lt;br /&gt;Now, along comes OpenSpaces.org and offers Gigapult a vehicle to release the technology into the hands of developers who need it most and can contribute to its enhancement going forward. 
Since the launch of OpenSpaces.org, Grid Dynamics has opened three more active projects:&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.openspaces.org/display/CVG/Convergence" rel="nofollow"&gt;Convergence&lt;/a&gt;: a pluggable architecture for interoperability between computational grids and in-memory data grids capable of data-aware scheduling, with initial bindings for GigaSpaces XAP and Data Synapse GridServer&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.openspaces.org/display/PRT/PackRat" rel="nofollow"&gt;PackRat&lt;/a&gt;: a library that helps increase the space capacity by efficiently packing the non-indexed part of entities in binary format&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.openspaces.org/display/OBC/OpenSpaces+Binary+Calculator" rel="nofollow"&gt;OpenSpaces Binary Calculator&lt;/a&gt;: a tool to accurately estimate the memory size required to store objects in cache&lt;br /&gt;&lt;br /&gt;At the launch of OpenSpaces.org, GigaSpaces announced the &lt;a href="http://www.openspaces.org/display/OS/OpenSpaces+Developer+Challenge" rel="nofollow"&gt;OpenSpaces Developer Challenge&lt;/a&gt;. Naturally, our four projects entered the competition. And since the challenge ends today, tonight was the first beta release for all four! Nothing like a little competition to encourage respect for deadlines... Despite today being April 1st, this is no joking matter. We want to win!&lt;br /&gt;&lt;br /&gt;On a more serious note, over the next few days I expect project leads to blog in more detail about their respective projects. 
Meanwhile,  please visit the projects, download the technology and give us the feedback - the good, the bad and the ugly so that we can make these tools better, for you and everyone else.&lt;br /&gt;&lt;br /&gt;And please cheer for us as we await the results of the competition!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3946011063058389308-1122113902139730406?l=blog.griddynamics.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog.griddynamics.com/feeds/1122113902139730406/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=3946011063058389308&amp;postID=1122113902139730406" title="1 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/1122113902139730406?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/1122113902139730406?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/griddynamics/~3/aozc7w6FMQ4/april-fools-day-big-openspacesorg.html" title="April Fool's Day - Big OpenSpaces.org Release Date" /><author><name>Victoria Livschitz</name><uri>http://www.blogger.com/profile/12264301035182704078</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="07316905052902410928" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">1</thr:total><feedburner:origLink>http://blog.griddynamics.com/2008/03/april-fools-day-big-openspacesorg.html</feedburner:origLink></entry><entry gd:etag="W/&quot;C0YCSXo9eyp7ImA9WxZUE00.&quot;"><id>tag:blogger.com,1999:blog-3946011063058389308.post-6133561742819170602</id><published>2008-03-16T23:43:00.000-07:00</published><updated>2008-04-04T02:26:08.463-07:00</updated><app:edited 
xmlns:app="http://www.w3.org/2007/app">2008-04-04T02:26:08.463-07:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="gigaspaces" /><category scheme="http://www.blogger.com/atom/ns#" term="~Victoria Livschitz" /><category scheme="http://www.blogger.com/atom/ns#" term="grid computing" /><category scheme="http://www.blogger.com/atom/ns#" term="grid consulting" /><category scheme="http://www.blogger.com/atom/ns#" term="convergence" /><title>Notes from MS Developer Conference for Financial Services</title><content type="html">Last week a few folks from our company attended the MS Developer Conference for Financial Services in Manhattan. The event targeted Wall Street developers who came to learn what Microsoft is doing in the area of HPC. It is exciting to see Microsoft get serious about performance and scale-out computing and have a broad plan to address the needs of this already vast and rapidly expanding space.&lt;br /&gt;&lt;br /&gt;In this blog I will not get into the details of the technologies that were showcased. Rather, I'd like to talk a little bit - selfishly - about what we were doing there, and then share one discovery - a grid company we didn't know about that has an interesting data caching solution for the .NET community.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;1. Excel-on-the-Grid&lt;/h3&gt;We were showing a demo of an Excel application acting as a client for grid-based processing. This is a powerful concept that is gaining wide adoption on Wall Street. Excel is traditionally used in financial applications for heavy-duty data aggregation, analysis and visualization, all of which require local data and processing on the desktop. 
This approach has serious limitations:&lt;br /&gt;&lt;br /&gt;- If you wanted real-time data embedded in an Excel app, how'd you deliver that data to the spreadsheet?&lt;br /&gt;- If you wanted &lt;em&gt;really&lt;/em&gt; computationally intensive spreadsheets, where'd you get the resources?&lt;br /&gt;- If you wanted to share your application with others, how'd you do that?&lt;br /&gt;&lt;br /&gt;Turns out, grid computing offers an elegant answer:&lt;br /&gt;&lt;br /&gt;Step 1: Move the data from the spreadsheet to a Data Grid&lt;br /&gt;Step 2: Move the computations from the spreadsheet to a Compute Grid&lt;br /&gt;Step 3: Instrument Excel to both push the computations to the Grid and pull the results off the Grid as appropriate&lt;br /&gt;Step 4: Share the "dumb client" easily; it's now lightweight&lt;br /&gt;&lt;br /&gt;Many grid vendors - and their Excel customers - are recognizing the business benefits and building solutions that make Excel-on-the-Grid work better, ideally someday out-of-the-box. Among such vendors are Microsoft, GigaSpaces, DataSynapse, and others. We at Grid Dynamics were asked to develop a demo that showed Microsoft Windows Compute Cluster Server 2008 as a computational engine behind Excel-on-the-Grid, backed by GigaSpaces XAP 6.0 as the in-memory data grid.&lt;br /&gt;&lt;br /&gt;The demo works as follows:&lt;br /&gt;&lt;br /&gt;1. Excel sends the job to MS WCCS&lt;br /&gt;2. The job is directed to &amp;amp; executed on the server that holds the right local XAP data cache&lt;br /&gt;3. Excel gets a notification when the job is finished and displays the results&lt;br /&gt;&lt;br /&gt;The main points of the demo are three-fold:&lt;br /&gt;&lt;br /&gt;a. Excel-on-the-Grid is COOL&lt;br /&gt;b. One can mix-and-match various compute and data grid solutions to achieve business objectives&lt;br /&gt;c. 
Data-aware job scheduling (the ability of the job scheduler to co-locate job execution with the cached data) gives a 2x - 3x performance boost to data-hungry jobs. What's especially important to point out is that the job itself is a "black box" to the grid - no changes are made to the algorithm itself, so it may be considered legacy code for all intents and purposes.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;2. ScaleOut Software&lt;/h3&gt;An interesting discovery of the conference for our team was a vendor, &lt;a href="http://www.scaleoutsoftware.com/"&gt;ScaleOut Software&lt;/a&gt;, which has a pure .NET implementation of a data grid with a seemingly good architecture and nice features.&lt;br /&gt;&lt;br /&gt;Traditionally, the Java community has been way ahead of the game on Data Grids. GigaSpaces, Tangosol and GemStone - the three big mature commercial players in the data grid space - are all Java-based technologies. Yet the applications that require data grid access are a good mix of Java and .NET these days (there is always some C/C++ and other varieties, of course). While these Java technologies support .NET clients in various ways, it makes sense for pure-play .NET caching and data grid solutions to be available on the market as well.&lt;br /&gt;&lt;br /&gt;We weren't aware of such solutions until we ran into ScaleOut Software last week. They offer a replicated data grid with a lot of nice features. They lack two big-ticket items that the leading Java caching vendors have: data partitioning and strong out-of-the-box DB persistence. Still, for application session data, and many other stateful application needs in Windows environments, I can see ScaleOut providing a good solution for developers. 
We haven't really had a chance to do anything with ScaleOut software yet, but I hope to try it out soon.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3946011063058389308-6133561742819170602?l=blog.griddynamics.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog.griddynamics.com/feeds/6133561742819170602/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=3946011063058389308&amp;postID=6133561742819170602" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/6133561742819170602?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/6133561742819170602?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/griddynamics/~3/LZlp9EXxLd8/notes-from-ms-developer-conference.html" title="Notes from MS Developer Conference for Financial Services" /><author><name>Victoria Livschitz</name><uri>http://www.blogger.com/profile/12264301035182704078</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="07316905052902410928" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total><feedburner:origLink>http://blog.griddynamics.com/2008/03/notes-from-ms-developer-conference.html</feedburner:origLink></entry><entry gd:etag="W/&quot;C0YMRXkyeCp7ImA9WxZWEks.&quot;"><id>tag:blogger.com,1999:blog-3946011063058389308.post-2211018977758265025</id><published>2008-03-10T04:12:00.000-07:00</published><updated>2008-03-11T11:46:24.790-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2008-03-11T11:46:24.790-07:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="gigaspaces" /><category scheme="http://www.blogger.com/atom/ns#" 
term="filesystems" /><category scheme="http://www.blogger.com/atom/ns#" term="~Kirill Uvaev" /><category scheme="http://www.blogger.com/atom/ns#" term="grid computing" /><category scheme="http://www.blogger.com/atom/ns#" term="convergence" /><title>DataGridFS - let your legacy code store data into IMDG</title><content type="html">One of the most discussed grid computing topics here at &lt;a href="http://www.griddynamics.com/"&gt;Grid Dynamics&lt;/a&gt; is bringing Computational Grids and In-Memory Data Grids (IMDG) together. The DataGridFS concept was born in such discussions.&lt;br /&gt;&lt;br /&gt;First of all, let me describe where it applies most efficiently. There are a lot of systems built around a Computational Grid where jobs produce files as the result of computations. These files are stored on NFS, so they are accessible locally after mounting on every node and client. &lt;a href="http://gridengine.sunsource.net/"&gt;Sun Grid Engine&lt;/a&gt; and &lt;a href="http://www.platform.com/Products/platform-lsf-family/platform-lsf/product"&gt;Platform LSF&lt;/a&gt; are examples of such Computational Grids. Imagine that such a system needs an enhancement, and that the enhancement is based on bringing an IMDG into the system. The codebase will then consist of legacy code, which saves results to files, and modern code, which uses the IMDG API both to get initial values and to store results. Moreover, these two parts have to communicate with each other somehow. For example, results calculated by the legacy code are the initial values for jobs in the new codebase. Clearly, we need a way for the new jobs to process results stored in files on a file system. There are several ways to achieve that. One is to build a daemon process that scans the results directory on the filesystem, parses the files and stores the data into the IMDG using its API. This approach has a few cons. At the very least, you have to make sure that this process is always alive, and so on. 
The second way is to use DataGridFS.&lt;br /&gt;&lt;br /&gt;DataGridFS is a filesystem that stores files transparently in the IMDG itself. So, this filesystem inherits most features of the underlying IMDG. For example, if the IMDG supports partitioning, then the filesystem automatically becomes distributed, and so on. When a job process writes a file to the filesystem, an object (or a set of objects) appears in the IMDG. This object can be accessed using the regular IMDG API or the DataGridFS API, which just wraps IMDG API calls. "What about the file content?", you may ask. "Does it still have to be parsed into the object model?" Yes, but this can be done by workers inside the IMDG, which is a more convenient and flexible way than a separate process.&lt;br /&gt;&lt;br /&gt;I've managed to code a very simple prototype to illustrate the idea. It is just a read-only filesystem built using &lt;a href="http://fuse.sourceforge.net/"&gt;FUSE&lt;/a&gt; and &lt;a href="http://sourceforge.net/projects/fuse-j"&gt;FUSE-J&lt;/a&gt;. File data is stored with &lt;a href="http://www.gigaspaces.com/pr_xap.html"&gt;GigaSpaces XAP&lt;/a&gt;. 
On the &lt;a href="http://www.griddynamicsconsulting.com/upload/blog/spacefs-experiment1.png"&gt;screenshot&lt;/a&gt; you can see space content via Space Browser, and the same content via UNIX command line  utils.&lt;br /&gt;&lt;br /&gt;&lt;span style="display: block; text-align: center"&gt;&lt;a href="http://www.griddynamicsconsulting.com/upload/blog/spacefs-experiment1.png" target="_blank"&gt;&lt;img src="http://www.griddynamicsconsulting.com/upload/blog/spacefs-experiment1.png" title="Screenshot" alt="Screenshot" height="350" width="503" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;span style="font-size:85%;"&gt;(Click to enlarge)&lt;/span&gt;&lt;br /&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3946011063058389308-2211018977758265025?l=blog.griddynamics.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog.griddynamics.com/feeds/2211018977758265025/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=3946011063058389308&amp;postID=2211018977758265025" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/2211018977758265025?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/2211018977758265025?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/griddynamics/~3/Wldsg5u6WlQ/datagridfs-let-your-legacy-code-store.html" title="DataGridFS - let your legacy code store data into IMDG" /><author><name>Kirill Uvaev</name><uri>http://www.blogger.com/profile/00360201692259660350</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="16926792326225647122" /></author><thr:total 
xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total><feedburner:origLink>http://blog.griddynamics.com/2008/03/datagridfs-let-your-legacy-code-store.html</feedburner:origLink></entry><entry gd:etag="W/&quot;C0EGQHs7fip7ImA9WxZXFUo.&quot;"><id>tag:blogger.com,1999:blog-3946011063058389308.post-8344989838492452867</id><published>2008-03-03T01:15:00.000-08:00</published><updated>2008-03-03T11:13:41.506-08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2008-03-03T11:13:41.506-08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="~Kirill Ishanov" /><category scheme="http://www.blogger.com/atom/ns#" term="deployment" /><category scheme="http://www.blogger.com/atom/ns#" term="maven 2" /><category scheme="http://www.blogger.com/atom/ns#" term="gigapult" /><category scheme="http://www.blogger.com/atom/ns#" term="testing" /><category scheme="http://www.blogger.com/atom/ns#" term="convergence" /><title>GigaSpaces Cluster Bootstrapping and Automated Testing</title><content type="html">In projects that use GigaSpaces XAP, you often need a set of functional tests that verify the interaction between your code and the GigaSpaces cluster. For example, if an application performs an initial data load, it's good to have a test that verifies that the client code really found a space and loaded all the necessary data.&lt;br /&gt;&lt;br /&gt;To implement such tests, we need to bootstrap a cluster with an appropriate configuration and run a set of tests against this cluster. 
This can be achieved with the following steps:&lt;br /&gt;&lt;ol&gt;&lt;br /&gt;  &lt;li&gt;Use a testing framework that allows running some setup code before a group of tests (&lt;a href="http://testng.org/doc"&gt;TestNG&lt;/a&gt; looks like the most appropriate choice because it allows grouping tests and has a sexy @BeforeGroups annotation)&lt;/li&gt;&lt;br /&gt;  &lt;li&gt;Put all the code that starts a cluster into the setup method (a kind of prerequisite for the tests)&lt;/li&gt;&lt;br /&gt;  &lt;li&gt;Implement a functional test that uses the standard assertion mechanism to verify the cluster's state&lt;/li&gt;&lt;br /&gt;&lt;/ol&gt;&lt;br /&gt;This approach looks very useful, but the 2nd point is very tricky because GigaSpaces (5.* and 6.0*) doesn't provide a mechanism for starting a cluster from code without black magic and dancing with a tambourine.&lt;br /&gt;&lt;br /&gt;Here is a tricky workaround (which we've used on the &lt;a href="http://www.openspaces.org/display/CVG/Convergence"&gt;Convergence&lt;/a&gt; project for testing purposes). It assumes that &lt;a href="http://maven.apache.org/"&gt;Maven&lt;/a&gt; is used as the build tool (sorry Maven folks, I know that it's a project management framework, but this definition sounds too mysterious :) ) and that the project is multi-module.&lt;br /&gt;&lt;br /&gt;The idea behind it is very simple: bootstrap the cluster via a Maven plugin before the testing phase. 
I haven't found an appropriate plugin in Maven's repository, so I wrote my own using Gigapult (check the &lt;a href="http://www.openspaces.org/display/GPT/Maven+Gigapult+Plugin+Usage+Guide"&gt;usage&lt;/a&gt; and &lt;a href="http://www.openspaces.org/display/GPT/Maven+Gigapult+Plugin+Installation+Guide"&gt;installation instructions&lt;/a&gt; on OpenSpaces.org).&lt;br /&gt;&lt;br /&gt;Such an approach allows writing the functional tests using a testing framework of your choice without putting cluster configuration and bootstrapping code into the setup methods. Moreover, it simplifies the creation of different build configurations on a &lt;a href="http://en.wikipedia.org/wiki/Continuous_integration"&gt;Continuous Integration server&lt;/a&gt;, because it turns the configuration into a simple enumeration of Maven goals.&lt;br /&gt;&lt;br /&gt;But putting cluster bootstrapping and functional tests into the build lifecycle can take much more time than an impatient developer is willing to wait, which leads to these phases being skipped. To avoid this, all time-consuming tests can be put into a separate module and run on demand using Maven profiles. 
To create such a configuration, add a submodule to the Maven project (it can be done using the &lt;a href="http://maven.apache.org/plugins/maven-archetype-plugin/"&gt;Maven Archetype Plugin&lt;/a&gt;) and add the following section to the parent pom.xml:&lt;br /&gt;&lt;br /&gt;&lt;font color="#0000ff"&gt;&amp;lt;&lt;/font&gt;&lt;font color="#0000ff"&gt;profiles&lt;/font&gt;&lt;font color="#0000ff"&gt;&amp;gt;&lt;/font&gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&lt;font color="#0000ff"&gt;&amp;lt;&lt;/font&gt;&lt;font color="#0000ff"&gt;profile&lt;/font&gt;&lt;font color="#0000ff"&gt;&amp;gt;&lt;/font&gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;font color="#0000ff"&gt;&amp;lt;&lt;/font&gt;&lt;font color="#0000ff"&gt;id&lt;/font&gt;&lt;font color="#0000ff"&gt;&amp;gt;&lt;/font&gt;functional-tests&lt;font color="#0000ff"&gt;&amp;lt;/id&amp;gt;&lt;/font&gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;font color="#0000ff"&gt;&amp;lt;&lt;/font&gt;&lt;font color="#0000ff"&gt;modules&lt;/font&gt;&lt;font color="#0000ff"&gt;&amp;gt;&lt;/font&gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;font color="#0000ff"&gt;&amp;lt;&lt;/font&gt;&lt;font color="#0000ff"&gt;module&lt;/font&gt;&lt;font color="#0000ff"&gt;&amp;gt;&lt;/font&gt;functional-tests&lt;font color="#0000ff"&gt;&amp;lt;/module&amp;gt;&lt;/font&gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;font color="#0000ff"&gt;&amp;lt;/modules&amp;gt;&lt;/font&gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&lt;font color="#0000ff"&gt;&amp;lt;/profile&amp;gt;&lt;/font&gt;&lt;br /&gt;&lt;font color="#0000ff"&gt;&amp;lt;/profiles&amp;gt;&lt;/font&gt;&lt;br /&gt;&lt;br /&gt;After that, Maven will add the sub-module called "functional-tests" only if &lt;span style="font-family: monospace;"&gt;-Pfunctional-tests&lt;/span&gt; is specified. 
Assuming that the CI server supports Maven, we can now specify goals for two build configurations:&lt;br /&gt;&lt;ol&gt;&lt;br /&gt;  &lt;li&gt;&lt;span style="font-family: monospace;"&gt;clean install&lt;/span&gt; - for running all unit tests and packaging&lt;/li&gt;&lt;br /&gt;  &lt;li&gt;&lt;span style="font-family: monospace;"&gt;gigapult:gsm gigapult:gsc gigapult:pudeploy install -Pfunctional-tests&lt;/span&gt; - for running all unit tests, packaging, bootstrapping the cluster and running functional tests (gigapult goals depend on cluster configuration)&lt;/li&gt;&lt;br /&gt;&lt;/ol&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3946011063058389308-8344989838492452867?l=blog.griddynamics.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog.griddynamics.com/feeds/8344989838492452867/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=3946011063058389308&amp;postID=8344989838492452867" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/8344989838492452867?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/8344989838492452867?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/griddynamics/~3/pzZHhdmX4nM/gigaspaces-cluster-bootstrapping-and.html" title="GigaSpaces Cluster Bootstrapping and Automated Testing" /><author><name>kylichuku</name><uri>http://www.blogger.com/profile/14008911434054389383</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="10293433558291559807" /></author><thr:total 
xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total><feedburner:origLink>http://blog.griddynamics.com/2008/02/gigaspaces-cluster-bootstrapping-and.html</feedburner:origLink></entry><entry gd:etag="W/&quot;D0MNSHwzfSp7ImA9WxZXFUg.&quot;"><id>tag:blogger.com,1999:blog-3946011063058389308.post-288334895702602849</id><published>2008-02-27T02:39:00.000-08:00</published><updated>2008-03-03T06:44:59.285-08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2008-03-03T06:44:59.285-08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="gigaspaces" /><category scheme="http://www.blogger.com/atom/ns#" term="~Victoria Livschitz" /><category scheme="http://www.blogger.com/atom/ns#" term="grid computing" /><category scheme="http://www.blogger.com/atom/ns#" term="data synapse" /><category scheme="http://www.blogger.com/atom/ns#" term="convergence" /><title>Mathematics of Grid Computing - 2nd Installment</title><content type="html">I've been thinking some more about Nikita's equations, and it occurred to me to expand the system in the following way:&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;EnterpriseGrid.Next&lt;/span&gt; = &lt;span style="color: rgb(51, 102, 255);"&gt;HPC.Next&lt;/span&gt; + &lt;span style="color: rgb(51, 102, 255);"&gt;EnterpriseApp.Next&lt;/span&gt; + &lt;span style="color: rgb(51, 102, 255);"&gt;WebApp.Next&lt;/span&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;HPC.Next&lt;/span&gt; = ComputeGrid + DataGrid + &lt;span style="color: rgb(255, 0, 0);"&gt;GlobalCheckpointing&lt;/span&gt; + &lt;span style="color: rgb(255, 0, 0);"&gt;Data-Aware Scheduling&lt;br /&gt;&lt;span style="color: rgb(0, 0, 0);"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="color: rgb(255, 0, 0);"&gt;&lt;span style="color: rgb(0, 0, 0);"&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;EnterpriseApp.Next&lt;/span&gt; = &lt;span style="color: rgb(51, 102, 255);"&gt;Scalable Stateful Services&lt;/span&gt; + 
Virtualized Resources&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="color: rgb(255, 0, 0);"&gt;&lt;span style="color: rgb(0, 0, 0);"&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;WebApp.Next&lt;/span&gt; =&lt;span style="color: rgb(51, 102, 255);"&gt; Scalable Stateful&lt;/span&gt; &amp;amp; Stateless &lt;span style="color: rgb(51, 102, 255);"&gt;Services&lt;/span&gt; + Virtualized Resources&lt;br /&gt;&lt;span style="color: rgb(51, 102, 255);"&gt;Scalable Stateful Services&lt;/span&gt; = SOA + &lt;span style="color: rgb(255, 0, 0);"&gt;Data-Aware Routing&lt;/span&gt; + &lt;span style="color: rgb(255, 0, 0);"&gt;Right-Sizable DataGrid&lt;/span&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;p style="line-height: normal; margin-top: 5pt; margin-bottom: 0pt; margin-left: 0.25in; text-indent: -0.25in; text-align: left; direction: ltr; unicode-bidi: embed; vertical-align: baseline; font-weight: bold;"&gt;&lt;br /&gt;&lt;/p&gt;&lt;span style="font-weight: bold;"&gt;Already exist on the market:&lt;/span&gt; concepts-in-black&lt;br /&gt;ComputeGrid&lt;br /&gt;DataGrid&lt;br /&gt;Virtualization&lt;br /&gt;SOA&lt;br /&gt;Ability to scale out stateless services&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;The gap:&lt;/span&gt; &lt;span style="color: rgb(51, 51, 51);"&gt;concepts-in-red&lt;/span&gt;&lt;br /&gt;&lt;span style="color: rgb(255, 0, 0);"&gt;GlobalCheckpointing&lt;/span&gt;: theoretically well understood, commercial implementations are too few&lt;br /&gt;&lt;span style="color: rgb(255, 0, 0);"&gt;Data-Aware Routing&lt;/span&gt;: supported natively by DataGrid vendors, but not by traditional routing infrastructure&lt;br /&gt;&lt;span style="color: rgb(255, 0, 0);"&gt;Data-Aware Scheduling: &lt;span style="color: rgb(51, 51, 51);"&gt;theoretically well understood, commercial implementations are too few&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="color: rgb(255, 0, 0);"&gt;Right-Sizable DataGrids&lt;/span&gt;: active area of 
R&amp;amp;D for many vendors&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;These "concepts-in-red" should be of prime importance to the community and the focus of active research, development and commercial implementations.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3946011063058389308-288334895702602849?l=blog.griddynamics.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog.griddynamics.com/feeds/288334895702602849/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=3946011063058389308&amp;postID=288334895702602849" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/288334895702602849?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/288334895702602849?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/griddynamics/~3/7s8DjCPois8/mathematics-of-grid-computing-2nd.html" title="Mathematics of Grid Computing - 2nd Installment" /><author><name>Victoria Livschitz</name><uri>http://www.blogger.com/profile/12264301035182704078</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="07316905052902410928" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total><feedburner:origLink>http://blog.griddynamics.com/2008/02/mathematics-of-grid-computing-2nd.html</feedburner:origLink></entry><entry gd:etag="W/&quot;C0UHRH49eSp7ImA9WxZQFU0.&quot;"><id>tag:blogger.com,1999:blog-3946011063058389308.post-4645754044078857703</id><published>2008-02-19T02:46:00.000-08:00</published><updated>2008-02-20T01:53:55.061-08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2008-02-20T01:53:55.061-08:00</app:edited><category 
scheme="http://www.blogger.com/atom/ns#" term="gigaspaces" /><category scheme="http://www.blogger.com/atom/ns#" term="~Victoria Livschitz" /><category scheme="http://www.blogger.com/atom/ns#" term="grid computing" /><category scheme="http://www.blogger.com/atom/ns#" term="data synapse" /><category scheme="http://www.blogger.com/atom/ns#" term="convergence" /><title>Mathematics of Grid Computing</title><content type="html">Today, I came across a recent blog entry by Nikita Ivanov of GridGain: &lt;a href="http://www.jroller.com/nivanov/entry/scale_out_on_grid_data"&gt;Scale Out on Grid = Data Partitioning + Affinity Map/Reduce&lt;/a&gt;. It is an interesting read that inspired the comments I posted under that entry, copied here:&lt;br /&gt;&lt;blockquote style="background-color: #eee; padding: 1em"&gt;So, Nikita, you propose two equations:&lt;br /&gt;&lt;br /&gt;(1) Grid Computing = Compute Grid + Data Grid&lt;br /&gt;(2) Scale out on grid = Data Partition + Affinity MapReduce&lt;br /&gt;&lt;br /&gt;Both are interesting and important statements that should be carefully analyzed. The first equation basically comes down to data-aware scheduling which (as you eloquently explain in your last week’s blog) means that the algorithm that assigns jobs to specific compute resources must take into account the distribution of data over the grid and the affinity between the job and the data partitions.&lt;br /&gt;&lt;br /&gt;Reasonable data-aware schedulers are still rare and I am very glad to see GridGain coming up with a commercial implementation. We recently built a demo that measures the performance of a “typical” job with and without data affinity. The performance is affected by a factor of 2x to 3x, simply based on data-aware routing being switched on and off. 
Clearly, this is a very important direction for grid middleware; I am convinced that data-aware, affinity-capable grid middleware will someday become mainstream.&lt;br /&gt;Let us not forget that all this concerns job-centric processing. For throughput computing that operates under a shower of real-time transactions, the equivalent concept to “data-aware scheduling” of jobs would be “data-aware routing” of these transactions. Mainstream Data Grid middleware, like &lt;a href="http://www.gigaspaces.com/"&gt;GigaSpaces&lt;/a&gt; and &lt;a href="http://www.oracle.com/technology/products/coherence/index.html"&gt;Oracle Coherence&lt;/a&gt;, has long been able to handle this scenario. Nowadays, &lt;a href="http://www.gigaspaces.com/"&gt;GigaSpaces&lt;/a&gt; is moving towards support of “data-aware scheduling” through the concept of the Processing Unit on the Service Grid.&lt;br /&gt;&lt;br /&gt;Now, your second equation raises a question. Are you talking about scaling out the data grid in a static or dynamic sort of way? In other words, is the objective to allow for an “a priori” arbitrarily large number of partitions with scalable MapReduce, or to be able to adjust the number of partitions dynamically in response to sporadic jumps in the payloads across the entire grid fabric? If it’s the former, then data partitioning with affinity is the traditional answer. If it’s the latter, well, then we need to solve a lot of hard problems for stateful, data-aware services that are outside of your equation. I see dynamic scaling of stateful services as an increasingly important area of research and commercial implementations. 
Sun’s project &lt;a href="http://hedeby.sunsource.net/"&gt;Hedeby&lt;/a&gt; is an interesting step in this direction.&lt;br /&gt;&lt;/blockquote&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3946011063058389308-4645754044078857703?l=blog.griddynamics.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog.griddynamics.com/feeds/4645754044078857703/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=3946011063058389308&amp;postID=4645754044078857703" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/4645754044078857703?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/4645754044078857703?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/griddynamics/~3/4M_6kCTLIHc/mathematics-of-grid-computing.html" title="Mathematics of Grid Computing" /><author><name>Victoria Livschitz</name><uri>http://www.blogger.com/profile/12264301035182704078</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="07316905052902410928" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total><feedburner:origLink>http://blog.griddynamics.com/2008/02/mathematics-of-grid-computing.html</feedburner:origLink></entry><entry gd:etag="W/&quot;DkcMR3c7cSp7ImA9WxZbFk8.&quot;"><id>tag:blogger.com,1999:blog-3946011063058389308.post-5551541054998709241</id><published>2008-01-25T01:06:00.000-08:00</published><updated>2008-04-19T09:54:46.909-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2008-04-19T09:54:46.909-07:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="gigaspaces" /><category scheme="http://www.blogger.com/atom/ns#" 
term="~Eugene Steinberg" /><category scheme="http://www.blogger.com/atom/ns#" term="grid computing" /><category scheme="http://www.blogger.com/atom/ns#" term="data synapse" /><category scheme="http://www.blogger.com/atom/ns#" term="convergence" /><title>Data aware routing DataSynapse GridServer and GigaSpaces XAP</title><content type="html">&lt;span class="blog-text"&gt;Let's consider a very simple data-intensive trading application that will illustrate the following discussion. Our trade application works with a large dataset of Trade objects &lt;em&gt;{id, bookId, xmlData}&lt;/em&gt;. A unit of work in this application is to evaluate all trades for a given &lt;em&gt;bookId&lt;/em&gt;. For this evaluation, the application code should fetch all the trades with that &lt;em&gt;bookId&lt;/em&gt; from the central data source and then perform some calculations on them. Sounds very simple, right?&lt;br /&gt;&lt;br /&gt;But what if we want to scale this application to huge numbers of trades in the dataset and ensure high throughput of evaluation jobs? Well, we can put our application on the computational grid, spreading our units of work among a large number of computational engines. Systems like &lt;a href="http://www.datasynapse.com/en/products/gridserver.php"&gt;DataSynapse GridServer&lt;/a&gt; allow us to easily scale computation-intensive jobs on the grid, effectively putting the power of hundreds of its engines at our disposal. This solves the problem of raw CPU power for performing calculations over the fetched dataset, but doesn't solve the problem of remote data access, which will ultimately limit the throughput of our solution.&lt;br /&gt;&lt;br /&gt;No matter how many engines we add to the processing, they will all just wait for data to arrive from the database layer. A mainstream database is bound by the need to search its disk to find the relevant data and by the time needed to move a large result set through the pipes. 
To escape this dead end we need to distribute the duties of the central database across the grid and minimize expensive disk I/O. This is where &lt;em&gt;In-Memory Data Grids (IMDG)&lt;/em&gt;, such as &lt;a href="http://www.gigaspaces.com/pr_xap.html"&gt;GigaSpaces XAP&lt;/a&gt; or &lt;a href="http://www.oracle.com/technology/products/coherence/index.html"&gt;Oracle Coherence&lt;/a&gt;, come to the rescue. They both provide partitioned, replicated, transactional, persistent application memory. We can manage really big datasets in a really timely fashion with those products. Comparing those excellent products is outside the scope of this post, so let's pick GigaSpaces for the sake of example.&lt;br /&gt;&lt;br /&gt;We can load all our trade objects into the partitioned clustered space of GigaSpaces XAP and thus distribute the data layer workload across a number of hosts. Our book evaluation task on the DS Engine node will use the GigaSpaces API to fetch all the trades by bookId from the clustered space. Thus we will get a performance gain by removing the data access bottleneck caused by a single-host disk I/O-based database. It's good, but can we do better? Read on...&lt;br /&gt;&lt;br /&gt;If trades are randomly partitioned on the cluster, getting all trades by bookId will require the GigaSpaces API to contact EVERY cluster partition in parallel to fetch all relevant trades. This adds synchronization overhead and eats network bandwidth. What can we do to remove this overhead? We can introduce &lt;em&gt;data affinity&lt;/em&gt; by ensuring that GigaSpaces routes trades to partitions according to their bookId. This way, all trades with the same bookId will be &lt;em&gt;collocated within one partition&lt;/em&gt;. The GigaSpaces API will exploit data affinity and will contact only ONE node within the cluster to fetch all the relevant trades at once. This significantly improves the scalability of our solution and saves cluster bandwidth. That's cool, but can we do even better? 
Yes, we can.&lt;br /&gt;&lt;br /&gt;With data affinity, our book evaluation task runs on DS engines and, using the GigaSpaces API, contacts a single node on the network to fetch many trades. This bulk network fetching consumes network bandwidth and may limit our throughput if we do many such fetches at once. However, we can place the IMDG cluster on the same nodes as the computational grid and employ &lt;em&gt;data aware routing&lt;/em&gt;, i.e., ensure that tasks requiring access to particular data are scheduled to the node that runs the IMDG partition with the &lt;em&gt;relevant&lt;/em&gt; data. In our example we need to make sure that the task to evaluate a particular bookId is scheduled on the node running the GigaSpaces partition that contains all the trades with that bookId. Data aware scheduling makes data fetching operations &lt;em&gt;local&lt;/em&gt; in the sense that data is fetched via loopback, not going through the network switches. This is not only faster, but also saves cluster network bandwidth.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Data aware routing demo&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;We have put together a simple demo that illustrates the value of data aware routing and the performance gains it offers. The demo is based on the trading application example we just considered and runs a constant flow of book evaluation jobs.&lt;br /&gt;&lt;br /&gt;100K trade objects are loaded into a partitioned data grid. Those trades belong to 10 different equal-size books, i.e., each book has 10K trades. The job here is to evaluate all 10 books on the DataSynapse GridServer cluster in parallel. This means that a job consists of 10 tasks and each task should fetch 10K trades from the clustered space and perform some calculation over them. For simplicity, we just sum the ids of the trades and return the result.&lt;br /&gt;&lt;br /&gt;The demo runs a constant flow of such jobs and constantly measures job completion latency. 
To illustrate the advantages of data aware routing, we introduced 3 different task scheduling modes. &lt;em&gt;“Data aware”&lt;/em&gt; mode is a mode where we perform data aware routing, ensuring local space access for the engine. &lt;em&gt;“Neutral”&lt;/em&gt; mode is a mode where we do unguided DS scheduling as GridServer sees fit. &lt;em&gt;“Anti data aware”&lt;/em&gt; mode is a mode where we deliberately violate data awareness and ensure network space access. The user can change the task scheduling mode on the fly and see the impact of the scheduling mode on performance.&lt;br /&gt;&lt;br /&gt;&lt;img src="http://www.griddynamicsconsulting.com/upload/blog/15c/15c5a86c7345f6eb4382940549b1f29a.png" title="data aware task routing" alt="" width="600" /&gt;&lt;br /&gt;&lt;br /&gt;The chart on the right side of the demo screen shows latencies of recently submitted jobs. Bars marked with "+" show the latency of jobs submitted in data aware mode, bars marked with "*" show neutral mode, and "-" show anti data aware mode. So, for this kind of job we can see a 3x performance gain when using data aware routing over anti data aware routing. Neutral routing performance is unstable because some tasks happen to land on the right node under random scheduling. 
The probability of that, however, becomes negligible as the number of nodes grows.&lt;br /&gt;&lt;br /&gt;This 3x performance gain is in good agreement with the standard remote vs. local space &lt;a href="http://www.gigaspaces.com/os_benchmarks.html#rem_tran"&gt;performance benchmark&lt;/a&gt; from the GigaSpaces site.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;What's under the hood?&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;At the heart of the data aware routing implementation there are 2 components: &lt;em&gt;Monitor&lt;/em&gt; and &lt;em&gt;DataAwareService&lt;/em&gt;.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Monitor server&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Monitor performs continuous monitoring of the GigaSpaces XAP cluster and gathers information about clustered spaces that are deployed on the grid. Monitor “knows” on which host each GS Partition is located, i.e., it knows the cluster topology.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;DataAwareService&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;DataAwareService is a thin client-side wrapper over the DataSynapse Service with a similar interface. DataAwareService is responsible for providing the correct GridServer Condition, which helps to route tasks to the proper Engines. It uses Monitor to get the IP address of the host running the relevant space partition. 
Client code just supplies the space name and routing key on service invocation and the task gets routed to the proper Engine.&lt;br /&gt;&lt;br /&gt;Below is a diagram of the individual task routing workflow:&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp2.blogger.com/_Pr6lQFN_pUg/SAojXfb5VzI/AAAAAAAAAAg/aux7iJOiVHg/s1600-h/ba296384ad47912f4400d1c357e6f88e.png"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer;" src="http://bp2.blogger.com/_Pr6lQFN_pUg/SAojXfb5VzI/AAAAAAAAAAg/aux7iJOiVHg/s320/ba296384ad47912f4400d1c357e6f88e.png" alt="" id="BLOGGER_PHOTO_ID_5191000406980384562" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;ol&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;1. The demo backend invokes DataAwareService, providing the space name and routing key.&lt;br /&gt;2. DataAwareService queries Monitor for advice about where to route the task with the given space name and routing key.&lt;br /&gt;3. Monitor returns its advice as the IP address of the host running the relevant GS partition.&lt;br /&gt;4. DataAwareService creates a Condition for GridServer to run the task on the given node and schedules it.&lt;br /&gt;5. GridServer invokes the task on one of the engines on the proper host.&lt;br /&gt;6. The task connects to the GigaSpaces partition and fetches the data.&lt;br /&gt;7-10. Engines process the data and return the results.&lt;/ol&gt;&lt;br /&gt;&lt;b&gt;Conclusion&lt;/b&gt;&lt;br /&gt;Data aware routing offers a good way to increase performance in a computational grid/IMDG combined environment. 
The routing scenario described here can be applied not only to a GigaSpaces + DataSynapse combination, but basically to any IMDG that supports data affinity (like Coherence) and to any grid engine that supports conditional routing (like SGE, LSF, etc.)&lt;br /&gt;&lt;br /&gt;We are going to put our demo on the web, so it will be available online soon.&lt;br /&gt;Stay tuned!&lt;br /&gt;&lt;br /&gt;References:&lt;ul&gt;&lt;li&gt;&lt;a href="http://www.gigaspaces.com/os_papers.html#dataAwarenessAndLowLatencyOnTheEG"&gt;Data awareness and Low Latency on The EG&lt;/a&gt; by Nati Shalom, CTO at GigaSpaces&lt;/li&gt;&lt;br /&gt;&lt;li&gt;&lt;a href="http://www.griddynamicsconsulting.com/blog/files/Convergence-VictoriaLivschitz.ppt"&gt;Bridging The Paradigms: Convergence of Compute Grids with In-Memory Data Grids&lt;/a&gt; by Victoria Livschitz, CEO at GridDynamics &lt;/li&gt;&lt;/ul&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3946011063058389308-5551541054998709241?l=blog.griddynamics.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog.griddynamics.com/feeds/5551541054998709241/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=3946011063058389308&amp;postID=5551541054998709241" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/5551541054998709241?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/5551541054998709241?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/griddynamics/~3/ykFi0CPDb04/data-aware-routing-datasynapse_14.html" title="Data aware routing DataSynapse GridServer and GigaSpaces XAP" 
/><author><name>Eugene</name><uri>http://www.blogger.com/profile/06887960894636399710</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="09277193071639676635" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://bp2.blogger.com/_Pr6lQFN_pUg/SAojXfb5VzI/AAAAAAAAAAg/aux7iJOiVHg/s72-c/ba296384ad47912f4400d1c357e6f88e.png" height="72" width="72" /><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total><feedburner:origLink>http://blog.griddynamics.com/2008/02/data-aware-routing-datasynapse_14.html</feedburner:origLink></entry><entry gd:etag="W/&quot;A0YCSXk_eSp7ImA9WxZRGUU.&quot;"><id>tag:blogger.com,1999:blog-3946011063058389308.post-4317727710212541770</id><published>2007-12-05T03:31:00.000-08:00</published><updated>2008-02-14T03:39:28.741-08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2008-02-14T03:39:28.741-08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="~Eugene Steinberg" /><category scheme="http://www.blogger.com/atom/ns#" term="grid computing" /><category scheme="http://www.blogger.com/atom/ns#" term="convergence" /><title>Bridging The Paradigms: Convergence of Compute Grids with In-Memory Data Grids</title><content type="html">&lt;span class="blog-text"&gt;Victoria Livschitz, Founder and CEO of GridDynamics Consulting, gave a presentation on bridging the two most successful cluster paradigms, Compute Grids and In-Memory Data Grids, at the IGT 2007 Conference in Israel.&lt;br /&gt;&lt;br /&gt;As compute grids become more widespread in commercial data centers, the bottlenecks in application performance move from raw processing to searching, storing and retrieving the data. 
In-Memory Data Grid (IMDG) technology solves this fundamental problem by acting as a super-efficient application accelerator, taking advantage of unused resources readily available on the grid – disk, memory, IO – to put the data in memory of the same computer that performs the calculations. The talk will explore how IMDG can be easily integrated with existing enterprise grids to create data-aware grid applications and provide application performance acceleration while improving application scalability and reliability.&lt;br /&gt;&lt;br /&gt;You can download the presentation here:&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.griddynamicsconsulting.com/blog/files/Convergence-VictoriaLivschitz.ppt" target="_blank"&gt;Bridging The Paradigms: Convergence of Compute Grids with In-Memory Data Grids&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Enjoy!&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3946011063058389308-4317727710212541770?l=blog.griddynamics.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog.griddynamics.com/feeds/4317727710212541770/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=3946011063058389308&amp;postID=4317727710212541770" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/4317727710212541770?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/4317727710212541770?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/griddynamics/~3/qov1ti72iC0/bridging-paradigms-convergence-of.html" title="Bridging The Paradigms: Convergence of Compute Grids with In-Memory Data Grids" 
/><author><name>Eugene</name><uri>http://www.blogger.com/profile/06887960894636399710</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="09277193071639676635" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total><feedburner:origLink>http://blog.griddynamics.com/2008/02/bridging-paradigms-convergence-of.html</feedburner:origLink></entry><entry gd:etag="W/&quot;CUUMSHc_fyp7ImA9WxZQFU0.&quot;"><id>tag:blogger.com,1999:blog-3946011063058389308.post-4381976090792011247</id><published>2007-11-30T03:22:00.000-08:00</published><updated>2008-02-20T02:28:09.947-08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2008-02-20T02:28:09.947-08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="~Kirill Ishanov" /><category scheme="http://www.blogger.com/atom/ns#" term="gigaspaces" /><category scheme="http://www.blogger.com/atom/ns#" term="deployment" /><category scheme="http://www.blogger.com/atom/ns#" term="gigapult" /><title>Gigapult project</title><content type="html">&lt;span class="blog-text"&gt;&lt;b&gt;Update&lt;/b&gt;&lt;br /&gt;I've finally moved the Gigapult project to OpenSpaces, so now it can be found &lt;a href="http://www.openspaces.org/display/GPT/Gigapult"&gt;here&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;GigaSpaces products (such as &lt;a href="http://gigaspaces.com/pr_xap.html"&gt;GigaSpaces eXtreme Application Platform&lt;/a&gt;) are shipped with a set of scripts for running them on different platforms. These scripts are good enough for running containers and services with the default configuration, but if you need more advanced control over the bootstrapping process, you'll have to spend some time setting up different configuration parameters. 
These parameters can be part of the space URL, special environment variables, Java VM options, etc.&lt;br /&gt;&lt;br /&gt;The standard way to override these parameters or provide additional ones is to create additional scripts containing everything needed. For example, this &lt;a href="http://pastie.caboo.se/152585"&gt;chunk of code&lt;/a&gt; was taken from the replicated data grid examples.&lt;br /&gt;&lt;br /&gt;Let's see the disadvantages of this approach.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.griddynamicsconsulting.com/upload/blog/a1e/a1ef1d822299c4b6feb1747fc77441f0.jpg"&gt;&lt;img src="http://www.griddynamicsconsulting.com/upload/blog/a1e/a1ef1d822299c4b6feb1747fc77441f0.jpg" width="250" style="float: right;margin:1em"/&gt;&lt;/a&gt;The bin directory in the GigaSpaces distribution contains a set of system-specific scripts (shell scripts for Unix and batch scripts for Windows). These scripts provide the same functionality, so if you need to create an application and then support both operating systems, you'll have to write two different versions of the scripts: one for Unix, the other for Windows. As a result, when the configuration logic becomes more complex, supporting and maintaining both versions of the scripts becomes hell, and the administrator of the end system feels like this guy from the Eiffel Tower maintenance squad.&lt;br /&gt;&lt;br /&gt;Another problem affects shell scripts only. The shell scripting language is pretty old, and different flavors of Unix use different dialects of it. Most Linux distributions come with bash pre-installed, Sun Solaris comes with ksh, etc. To work properly with all these dialects, shell scripts should be written in plain old shell, and that's a pain, because it lacks the language tools to manage some advanced tasks. The phantom menace here is that on different platforms some tricky shell functions work differently. 
For example, the toNative function from gs.sh (which is written for bash) produces different results with bash under Cygwin and Linux and with ksh under Solaris. So, to test the compatibility of these scripts you'll need to test them manually on every platform. Of course, each shell has special parameters for checking the syntax of a script, but they don't validate the semantics. This leads to incompatible scripts, and because of that incompatibility it is hard to migrate the same configuration to a different platform, because it can lead to some surprising bugs.&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align:center"&gt;&lt;a href="http://www.griddynamicsconsulting.com/upload/blog/bbd/bbd94e1debc3e805a8bd393ed2c3f4e4.jpg"&gt;&lt;img src="http://www.griddynamicsconsulting.com/upload/blog/bbd/bbd94e1debc3e805a8bd393ed2c3f4e4.jpg" width="300"/&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;The third major problem with the GigaSpaces configuration process is that there are different ways to configure things. «Where should I place this value?», «In what xml file should I modify the value of this attribute?», etc. There is no unified way to configure all of this. By the way, all of the configuration information can be represented as a set of key-value pairs, so why do I need to modify them in different places? It looks like a configuration zoo, where different animals from different ecosystems live together in a small area.&lt;br /&gt;&lt;br /&gt;Yet another problem is that there are no configuration validation tools that can confirm that all properties were set up correctly and that there are no typos or incompatible value types (for example, when the value of a property should be a decimal number but a string was provided). Sure, there are schema validation tools for XML files, but what about the space URL? 
The only way to get an error message is to run the configuration and look through the log files with Java stack traces to find out where the error occurred. The scripts are silent and cannot report a mistake before it causes problems.&lt;br /&gt;&lt;br /&gt;The last problem may sound like a caprice, but shell and batch scripts are almost unreadable. As a developer, I just want to configure my cluster, not do reverse engineering instead of concentrating on the domain. I had the feeling that I was reading hieroglyphs when I saw shell scripts for the first time.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.griddynamicsconsulting.com/upload/blog/2fe/2fef0cae40196af9017e8b6ef0a22e57.jpg"&gt;&lt;img src="http://www.griddynamicsconsulting.com/upload/blog/2fe/2fef0cae40196af9017e8b6ef0a22e57.jpg" width="250" style="float: left;margin:1em" /&gt;&lt;/a&gt; That's why the &lt;a href="http://www.griddynamicsconsulting.com/opensource/gigapult.php" &gt;Gigapult project&lt;/a&gt; appeared.&lt;br /&gt;&lt;br /&gt;Its main goal is to simplify the GigaSpaces configuration and bootstrapping process and make configuration files more maintainable and readable.&lt;br /&gt;&lt;br /&gt;To do it, we should use the Force - the JDK force :) GigaSpaces is written in Java, so the first requirement for running it is an installed JDK. The JDK provides a great cross-platform API for building applications, so we can delegate system-specific tasks, like choosing the correct classpath separator char, to it. So the JDK can be that Force. As a result, we don't need to implement 2 versions of the scripts. But Java itself is a compiled language, so it is not very handy for scripting purposes. As real Jedi we need something interpreted. Fortunately, there is a nice solution: there are plenty of dynamic interpreted languages written in 100% pure Java. 
For example, &lt;a href="http://www.jython.org/"&gt;Jython&lt;/a&gt; - an implementation of the Python programming language, &lt;a href="http://jruby.codehaus.org/"&gt;JRuby&lt;/a&gt; - an implementation of the Ruby programming language and &lt;a href="http://groovy.codehaus.org/" &gt;Groovy&lt;/a&gt; - a very young but ambitious programming language with some interesting features.&lt;br /&gt;&lt;br /&gt;OK, we can simply rewrite the existing scripts, but what about simplicity, readability and intelligibility? Of course, Python or Ruby code is more readable and clear, but it is still code written in a general-purpose programming language with its own syntax and semantics. When we export a LOOKUPGROUPS variable, we think about assigning a value to a VM option, not about configuring the cluster. But we don't want to override the value of a variable; we want to configure the container and the space. The difference is almost imperceptible to programmers, but to the fellow from support it matters.&lt;br /&gt;&lt;br /&gt;There is a &lt;a href="http://en.wikipedia.org/wiki/Sapir-Whorf_hypothesis" &gt;Sapir-Whorf hypothesis&lt;/a&gt; which describes this idea. Following this hypothesis we need to create a language to describe the configuration. Such a language is called a Domain Specific Language (or DSL), and it has been a hot topic in software development in recent years. 
There is &lt;a href="http://www.infoq.com/presentations/domain-specific-languages" target="_blank"&gt;a great presentation&lt;/a&gt; on this topic from ThoughtWorks and their chief scientist Martin Fowler.&lt;br /&gt;&lt;br /&gt;A sample configuration file can be found &lt;a href="http://pastie.caboo.se/152121"&gt;here&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;The project is now in an early beta phase and will be available as an open-source project (distributed under the &lt;a href="http://www.apache.org/licenses/LICENSE-2.0" &gt;Apache 2.0 License&lt;/a&gt;).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3946011063058389308-4381976090792011247?l=blog.griddynamics.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog.griddynamics.com/feeds/4381976090792011247/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=3946011063058389308&amp;postID=4381976090792011247" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/4381976090792011247?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/4381976090792011247?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/griddynamics/~3/veGUUafE52g/updated-ive-finally-moved-gigapult.html" title="Gigapult project" /><author><name>kylichuku</name><uri>http://www.blogger.com/profile/14008911434054389383</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="10293433558291559807" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total><feedburner:origLink>http://blog.griddynamics.com/2008/02/updated-ive-finally-moved-gigapult.html</feedburner:origLink></entry><entry 
gd:etag="W/&quot;C04NQns8eyp7ImA9WxdTEE0.&quot;"><id>tag:blogger.com,1999:blog-3946011063058389308.post-1685765388111878351</id><published>2007-11-21T04:04:00.000-08:00</published><updated>2008-05-05T08:39:53.573-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2008-05-05T08:39:53.573-07:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="gigaspaces" /><category scheme="http://www.blogger.com/atom/ns#" term="maven 2" /><category scheme="http://www.blogger.com/atom/ns#" term="~Victor Samoylov" /><title>GigaSpaces Maven Archetype</title><content type="html">&lt;span class="blog-text"&gt; As a consulting company focused on grid solutions, we often create, modify and build small and big projects based on different grid middleware technologies.&lt;br /&gt;One of the most popular technologies we use is &lt;a href="http://www.gigaspaces.com/"&gt;GigaSpaces&lt;/a&gt;.&lt;br /&gt;In day-to-day work we often need to quickly create an IDE project with a well-defined structure and all the required GigaSpaces libs configured. It is also often the case that different versions of the GigaSpaces platform must be used for testing purposes. This can quickly become a boring and annoying process. Here I will describe how to automate it using the Maven 2 build tool.&lt;br /&gt;&lt;br /&gt;There is a good Maven 2 feature which lets you define your own project structure and use it as the starting point for a project. This is a &lt;em&gt;Maven archetype&lt;/em&gt;.&lt;br /&gt;&lt;br /&gt;The simplest project that works with GigaSpaces requires only 3 jars from the GigaSpaces XAP product: &lt;code&gt;JSpaces.jar&lt;/code&gt;, &lt;code&gt;jsk-lib.jar&lt;/code&gt; and &lt;code&gt;jsk-platform.jar&lt;/code&gt;. What do we usually do to start a project? Run the IDE, create a new project, attach 3 libraries to the project, write the first class with a main method. 
Boring!&lt;br /&gt;&lt;br /&gt;To make our life easier, we developed a GigaSpaces Maven archetype which you can download &lt;a href="http://www.griddynamicsconsulting.com/blog/files/gigaspaces-simple-archetype-6.0.2.zip"&gt;here&lt;/a&gt; and use right away.&lt;br /&gt;&lt;br /&gt;Steps to install gigaspaces-simple-archetype:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Download &lt;a href="http://www.griddynamicsconsulting.com/blog/files/gigaspaces-simple-archetype-6.0.2.zip"&gt;gigaspaces-simple-archetype-6.0.2.zip&lt;/a&gt;.&lt;/li&gt;&lt;li&gt;Unpack the archive.&lt;/li&gt;&lt;li&gt;Run the "mvn install" command from the folder containing the pom.xml file.&lt;/li&gt;&lt;li&gt;Check that the installation succeeded (you will see a "BUILD SUCCESSFUL" message on the console). The simple GigaSpaces archetype is now in your local Maven repository.&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;The archetype you just installed creates a simple project structure for GigaSpaces XAP 6.0.2 (build 2002).&lt;br /&gt;&lt;br /&gt;The only thing you need to do to create a GigaSpaces project is to run the following command (you can also find a create-project script inside the bin directory of the downloaded archive):&lt;br /&gt;&lt;br /&gt;&lt;code&gt;mvn archetype:create -DarchetypeGroupId=com.gigaspaces&lt;br /&gt;-DarchetypeArtifactId=gigaspaces-simple-archetype -DarchetypeVersion=6.0.2&lt;br /&gt;-DgroupId=com.yourcompanyhere -DartifactId=your-project-name-here&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;Bingo! You now have a GigaSpaces project. You can modify its properties in the pom.xml file.&lt;br /&gt;&lt;br /&gt;To compile or package the project, use the "mvn compile" and "mvn package" commands respectively.&lt;br /&gt;&lt;br /&gt;You will most likely see messages about 3 missing artifacts; that's OK. 
They are the required GigaSpaces jars, which we need to install in the local Maven repository or in your company's dedicated Maven repository.&lt;br /&gt;&lt;br /&gt;You should download the GigaSpaces XAP 6.0.2 (build 2002) distribution and install its artifacts into the local Maven repository. The following commands will help you:&lt;br /&gt;&lt;br /&gt;&lt;code&gt;mvn install:install-file -DgroupId=com.GigaSpaces -DartifactId=jspaces -Dversion=6.0.2-2002 -Dpackaging=jar -Dfile=JSpaces.jar&lt;br /&gt;&lt;br /&gt;mvn install:install-file -DgroupId=com.GigaSpaces.jini -DartifactId=jsk-lib -Dversion=6.0.2-2002 -Dpackaging=jar -Dfile=jsk-lib.jar&lt;br /&gt;&lt;br /&gt;mvn install:install-file -DgroupId=com.GigaSpaces.jini -DartifactId=jsk-platform -Dversion=6.0.2-2002 -Dpackaging=jar -Dfile=jsk-platform.jar&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;After that, the project is ready to use. Popular IDEs (like Eclipse, NetBeans, IDEA) natively support Maven 2 today. You can also automatically generate a project for your IDE:&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Eclipse&lt;/b&gt;: &lt;code&gt;mvn eclipse:clean eclipse:eclipse&lt;/code&gt;&lt;br /&gt;&lt;b&gt;IDEA&lt;/b&gt;: &lt;code&gt;mvn idea:clean idea:idea&lt;/code&gt;&lt;br /&gt;or even create an &lt;b&gt;Ant&lt;/b&gt; build script: &lt;code&gt;mvn ant:clean ant:ant&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;Enjoy the power of Maven archetypes!&lt;br /&gt;&lt;br /&gt;&lt;b&gt;References and used products&lt;/b&gt;:&lt;br /&gt;&lt;br /&gt;&lt;em&gt;Maven 2 (version 2.0.7)&lt;/em&gt;&lt;br /&gt;&lt;a href="http://maven.apache.org/"&gt;http://maven.apache.org/&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Maven is a software project management and comprehension tool. 
Based on the concept of a project object model (POM), Maven can manage a project's build, reporting and documentation from a central piece of information.&lt;br /&gt;&lt;br /&gt;&lt;em&gt;GigaSpaces XAP 6.0.2 (build 2002)&lt;/em&gt;&lt;br /&gt;&lt;a href="http://www.gigaspaces.com/"&gt;http://www.gigaspaces.com/&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;GigaSpaces eXtreme Application Platform (XAP) redefines scalability with a holistic Space-based Architecture (SBA) approach that provides a single platform for managing the data, messaging, and business logic associated with a business transaction.&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3946011063058389308-1685765388111878351?l=blog.griddynamics.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog.griddynamics.com/feeds/1685765388111878351/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=3946011063058389308&amp;postID=1685765388111878351" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/1685765388111878351?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/1685765388111878351?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/griddynamics/~3/Y2vvn_aOL4w/gigaspaces-maven-archetype.html" title="GigaSpaces Maven Archetype" /><author><name>dynamic</name><uri>http://www.blogger.com/profile/16138478848927115670</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="02613711487131723820" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total><feedburner:origLink>http://blog.griddynamics.com/2007/11/gigaspaces-maven-archetype.html</feedburner:origLink></entry><entry 
gd:etag="W/&quot;CEcFQXc-cSp7ImA9WxZQEEo.&quot;"><id>tag:blogger.com,1999:blog-3946011063058389308.post-4764323386908869980</id><published>2007-10-31T17:08:00.000-07:00</published><updated>2008-02-15T02:40:10.959-08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2008-02-15T02:40:10.959-08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="semantic web" /><category scheme="http://www.blogger.com/atom/ns#" term="~Stan Klimoff" /><category scheme="http://www.blogger.com/atom/ns#" term="management" /><title>Semantic grids</title><content type="html">&lt;span class="blog-text"&gt;Howdy, I'm Stan, with a few quick words about the present and future of grid management.&lt;br /&gt;&lt;br /&gt;First, why do we need grid management? Well, the answer is simple: we need it for essentially the same reason we need any kind of management. When the grid is big, you can manage it as a whole. When the grid is too big, you can delegate the management to a small set of nodes. When you have several different grids, each of them too big, you're in trouble.&lt;br /&gt;&lt;br /&gt;Well, you're not in big trouble if all of your grids are homogeneous and are not partitioned vertically, so that the management issues mostly involve scaling. But what if each of your grids serves a different purpose, is managed by its own subsystem, and, what's most painful, is provided by a different vendor with few or no integration points at all?&lt;br /&gt;&lt;br /&gt;One possible solution is to build a layer on top of the service subsystems of each grid (yes, every problem in software engineering can be solved by introducing another level of indirection :) ). What's important is that this level can be totally different from everything that lies beneath it. This level can be built to manipulate terms like “business processes”, “patterns”, “policies”, “configuration” and that sort of high-level stuff.&lt;br /&gt;&lt;br /&gt;But that's not new. 
It's natural for the level sitting on top to operate with higher-level metaphors than the levels below it. How could one integrate this level with what's underneath?&lt;br /&gt;&lt;br /&gt;The idea we're working on at present is to augment the service layers of the different grids with a common interface. It's not clear, however, how common the interface should be. Isn't it just another “perfect protocol” no one bothers to support?&lt;br /&gt;&lt;br /&gt;Well, yes and no. It's a “perfect meta-protocol”, actually. It came from the W3C and is called RDF. I'm not going to dive into details here; there's enough information about it on the net. Those who are not familiar with its concepts can think of it as a super-XML that allows us to embed semantics in the document, not only the markup. (Yeah, I know that's not correct… technically.)&lt;br /&gt;&lt;br /&gt;So what do we need from a service layer of a grid to include it in our “semantic layer”? Hardly anything: just to format the data it can provide as RDF. Relational databases, REST and SOAP services, configuration files – all of that can be augmented to present RDF in almost no time.&lt;br /&gt;&lt;br /&gt;That semantic meta-layer becomes a grid itself. It has access to all of the information required to operate a grid, but has no idea what to do with it.&lt;br /&gt;&lt;br /&gt;Here's where higher-level protocols come into play. The semantic cloud is about to learn what “restart” is, how a given service is “started”, and what it means for the process to be “frozen”. The level of detail is yours to choose. The more detailed it is, the more powerful your semantic cloud is. Just don't allow it to attempt world domination :).&lt;br /&gt;&lt;br /&gt;Well, that's a large set of data to input. We want our semantic cloud to become our knowledge repository, to keep formalized knowledge of everything implicit in the heads of operators. Fortunately, we don't need to put every bit of that knowledge into the cloud. 
We can provide it with explicit knowledge, letting it infer the implicit knowledge itself, and correct it if it's wrong.&lt;br /&gt;&lt;br /&gt;(It may sound like neural networks or a genetic approach, but it's not. It's about description logic, actually.)&lt;br /&gt;&lt;br /&gt;Now your semantic cloud knows about every bit of your system. It can trace every job coming in and out of your cluster, can optimize the load balancing based on the data it's receiving in real time, can bring up a perfect copy of itself in no time in case a failure is detected, answers your questions and poses its own :). What's next?&lt;br /&gt;&lt;br /&gt;Well… there are two ways to go. One way is down the stack, replacing some of the high-level service layers of the grid with its own parts. The second way is upwards, sharing some of the data with other peer clouds, some day forming The Grid, which can redefine grid computing the same way The Internet redefined the bulletin boards.&lt;br /&gt;&lt;br /&gt;Hope you've enjoyed this essay. I'll continue later, focusing on the approaches and technologies we've selected for our semantic affairs. 
&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3946011063058389308-4764323386908869980?l=blog.griddynamics.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog.griddynamics.com/feeds/4764323386908869980/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=3946011063058389308&amp;postID=4764323386908869980" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/4764323386908869980?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/4764323386908869980?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/griddynamics/~3/KuYIQIq6a3s/semantic-grids.html" title="Semantic grids" /><author><name>Stan Klimoff</name><uri>http://www.blogger.com/profile/14586894425559474017</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="17889250551283834306" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total><feedburner:origLink>http://blog.griddynamics.com/2007/10/semantic-grids.html</feedburner:origLink></entry><entry gd:etag="W/&quot;AkYGRXk8eip7ImA9WxZRGUU.&quot;"><id>tag:blogger.com,1999:blog-3946011063058389308.post-8006781517160402651</id><published>2007-10-23T05:22:00.000-07:00</published><updated>2008-02-14T03:22:04.772-08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2008-02-14T03:22:04.772-08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="~Eugene Steinberg" /><category scheme="http://www.blogger.com/atom/ns#" term="grid computing" /><title>Fura, open source grid computing middleware – One-day impression</title><content type="html">&lt;span class="blog-text"&gt;                  Recently I 
came across one particularly interesting grid technology:&lt;br /&gt;&lt;a href="http://sourceforge.net/projects/fura/" target="_blank"&gt;Fura&lt;/a&gt;. This is an open source product of &lt;a href="http://www.gridsystems.com/" target="_blank"&gt;GridSystems&lt;/a&gt;, a company based in Spain…&lt;br /&gt;&lt;br /&gt;I have had a chance to play with it for a day.&lt;br /&gt;&lt;br /&gt;Conceptually, Fura is a collection of collaborating web services built on top of a WSDL/SOAP technology stack, and its whole architecture follows the Web Services/SOA paradigm. This approach offers apparent benefits, such as easy standards-based integration: SOAP is well-known, and in addition, a C++/Java SDK is shipped with the product. The only obvious disadvantage of using SOAP for all communications is that a TCP+HTTP+XML+SOAP communication stack introduces an unavoidable communication overhead of about 40 ms, which makes the system inappropriate for extreme transaction processing scenarios. However, if the system is used for processing long jobs (from seconds to hours), SOAP communication costs may become negligible.&lt;br /&gt;&lt;br /&gt;Installation of Fura went smoothly using its friendly text-mode installer. A GUI installer is available on platforms that support it. The only disadvantage in Fura deployment I noticed is that all Fura services currently use static IP addresses to communicate with each other, which is not very convenient in many environments. Anyway, the Fura developers are going to support DNS names in the next minor release.&lt;br /&gt;&lt;br /&gt;Fura features a very nice and modern-looking web-based GUI, which allows you to inspect and change all needed system parameters, submit jobs and access a Virtual File System, another useful Fura feature. The Virtual File System is basically a folder hierarchy on the master machine in the Fura cluster that is exposed via web services to other cluster machines. 
The VFS has full ACL support and can be used as a file sharing service by job processing agents with performance similar to FTP.&lt;br /&gt;&lt;br /&gt;Each slave host in the Fura grid runs one or several lightweight agents; each agent can do one job at a time. The number of agents defaults to the number of CPUs. Agents report a predefined set of attributes, such as CPU utilization and memory consumption, to the master scheduler service, which uses this information to schedule tasks fairly. Custom attributes are supported via assignment of keywords to hosts; e.g., some of the hosts can be tagged as “fast” or “excel” to indicate an Excel installation. Then, resource requirements can be added to the job description and the scheduler will use substring matching to find an appropriate host. Unfortunately, this model doesn't support custom attributes that require numeric matching; for example, the size of the temporary space or the number of logged-in users.&lt;br /&gt;&lt;br /&gt;Fura has a well-defined model for creating batch jobs, which offers several kinds of iterators that can iterate over indexes and filesets, thus providing an excellent integration framework for grid-enabling legacy applications. Also, with application packages stored on the VFS, Fura is able to provision required software to slave hosts before running the job. The execution subsystem offers convenient support for grabbing task output and error streams, as well as other result files, and moving them to the VFS, where they are accessible to the job submitter.&lt;br /&gt;&lt;br /&gt;Overall, Fura makes a good first impression. It has a well-defined architecture and API, and a nice web GUI. The main advantage of this system seems to be its integration capabilities, thanks to its SOAP-based architecture. Fura makes it very easy to grid-enable legacy applications and looks like it is capable of supporting the computational grid needs of small-to-medium enterprises. 
It seems that for high-end job and transaction processing, however, the benefits of its webservices architecture can quickly turn into problems with performance and scalability.&lt;br /&gt;&lt;br /&gt;Anyway, good job, GridSystems!&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3946011063058389308-8006781517160402651?l=blog.griddynamics.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog.griddynamics.com/feeds/8006781517160402651/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=3946011063058389308&amp;postID=8006781517160402651" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/8006781517160402651?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/8006781517160402651?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/griddynamics/~3/pYF3z5ZpKx8/fura-open-source-grid-computing.html" title="Fura, open source grid computing middleware – One-day impression" /><author><name>Eugene</name><uri>http://www.blogger.com/profile/06887960894636399710</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="09277193071639676635" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total><feedburner:origLink>http://blog.griddynamics.com/2007/10/fura-open-source-grid-computing.html</feedburner:origLink></entry></feed>
