<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:media="http://search.yahoo.com/mrss/" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:creativeCommons="http://backend.userland.com/creativeCommonsRssModule" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">

<channel>
	<title>business|bytes|genes|molecules</title>
	
	<link>http://mndoci.com/blog</link>
	<description>ruminations on the social and commercial potential of biotech, nanotech, infotech and computing</description>
	<lastBuildDate>Fri, 26 Jun 2009 14:32:40 +0000</lastBuildDate>
	
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<media:thumbnail url="http://mndoci.com/blog/images/hiv.jpg" /><geo:lat>47.618371</geo:lat><geo:long>-122.203083</geo:long><creativeCommons:license>http://creativecommons.org/licenses/by-nc-sa/2.0/</creativeCommons:license><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" href="http://feeds.feedburner.com/mndoci" type="application/rss+xml" /><feedburner:emailServiceId>mndoci</feedburner:emailServiceId><feedburner:feedburnerHostname>http://feedburner.google.com</feedburner:feedburnerHostname><feedburner:browserFriendly>This is an XML content feed. It is intended to be viewed in a newsreader or syndicated to another site, subject to copyright and fair use.</feedburner:browserFriendly><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com" /><item>
		<title>The big machines</title>
		<link>http://feedproxy.google.com/~r/mndoci/~3/Jum_7KEqAsY/</link>
		<comments>http://mndoci.com/blog/2009/06/24/the-big-machines/#comments</comments>
		<pubDate>Thu, 25 Jun 2009 06:09:08 +0000</pubDate>
		<dc:creator>Deepak Singh</dc:creator>
				<category><![CDATA[BioIT]]></category>

		<guid isPermaLink="false">http://mndoci.com/blog/?p=1272</guid>
		<description><![CDATA[So the latest Top 500 list is out, so why doesn&#8217;t it excite me as much as it used to?  Well, partly because many of those machines are not easily accessible, while other computing resources are within reach.  Perhaps partly because for a lot of the work I am interested in [...]]]></description>
			<content:encoded><![CDATA[<div class="zemanta-img">
<div>
<dl class="wp-caption alignleft">
<dt class="wp-caption-dt"><a href="http://en.wikipedia.org/wiki/Image:Top500_logo.png"><img src="http://upload.wikimedia.org/wikipedia/en/b/be/Top500_logo.png" alt="TOP500" title="TOP500" /></a></dt>
<dd class="wp-caption-dd zemanta-img-attribution">Image via <a href="http://en.wikipedia.org/wiki/Image:Top500_logo.png">Wikipedia</a></dd>
</dl>
</div>
</div>
<p>So the latest <a href="http://www.top500.org/" title="Home | TOP500 Supercomputing Sites">Top 500</a> list is out, so why doesn&#8217;t it excite me as much as it used to?  Well, partly because many of those machines are not easily accessible, while other computing resources are within reach.  Perhaps partly because for a lot of the work I am interested in doing, you don&#8217;t really need a machine in the top 500.  Of course, having access to a machine there allows you to address some problems you couldn&#8217;t any other way, and IMO they should only be used for such problems.</p>
<p>One of the better posts about this year&#8217;s list comes from <a href="http://twitter.com/chris_p_intel" title="Chris Peters on Twitter">Chris Peters</a> at <a class="zem_slink" href="http://www.wikinvest.com/stock/Intel_%28INTC%29" title="Intel (INTC)" rel="wikinvest">Intel</a>.  He presents a <a href="http://communities.intel.com/community/openportit/server/blog/2009/06/23/the-top-500-list-a-different-perspective;jsessionid=2622324D57973C6A2C55A270259C057E" title="The Server Room Blog:  The TOP 500 List: A Different Perspective">different perspective</a> on the list and notes some trends.  For example, the 10th-fastest machine on this year&#8217;s list delivers more <a class="zem_slink" href="http://en.wikipedia.org/wiki/FLOPS" title="FLOPS" rel="wikipedia">FLOPS</a> than all 500 machines on the 2000 list.</p>
<p>While the post has a definite Intel angle to it, Chris notes the point I made earlier.  Today, massive computing resources are much more readily available, and you have new software stacks, whether for <a href="http://www.rocksclusters.org/wordpress/" title="Rocks Clusters">clustering</a> or for <a href="http://hadoop.apache.org/core/" title="Welcome to Apache Hadoop Core!">massive data-intensive computing</a>.  Personally, I think how we consume computing and the nature of our compute codes are going to go through a transformation in the next decade, and more people are going to be doing large-scale computing and solving interesting problems.</p>
<p>Will the Top 500 list become meaningless?  Not really. There is always room for massive floating-point performance, and there are certain problems for which you just need the kind of raw horsepower that the big iron provides.  For others, we have a lot of resources that we can get our hands on.</p>
<h6 class="zemanta-related-title">Related articles by Zemanta</h6>
<ul class="zemanta-article-ul">
<li class="zemanta-article-ul-li"><a href="http://arstechnica.com/hardware/news/2009/06/top-500-list-supercomputing-is-now-an-8020-market.ars"> Top 500 list: supercomputing is now an 80/20 market </a> (arstechnica.com)</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://mndoci.com/blog/2009/06/24/the-big-machines/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://mndoci.com/blog/2009/06/24/the-big-machines/</feedburner:origLink></item>
		<item>
		<title>Freerisk – An open platform for risk modeling</title>
		<link>http://feedproxy.google.com/~r/mndoci/~3/6emp2BBvsw4/</link>
		<comments>http://mndoci.com/blog/2009/06/18/freerisk-an-open-platform-for-risk-modeling/#comments</comments>
		<pubDate>Thu, 18 Jun 2009 15:20:47 +0000</pubDate>
		<dc:creator>Deepak Singh</dc:creator>
				<category><![CDATA[Big Data]]></category>
		<category><![CDATA[Computing]]></category>
		<category><![CDATA[Software & Internet]]></category>

		<guid isPermaLink="false">http://mndoci.com/blog/?p=1268</guid>
		<description><![CDATA[I&#8217;ve been meaning to write about Freerisk.org for a while now, but only got reminded yesterday as I read the Wired article about Toby Segaran&#8217;s (and Jesper Andersen&#8217;s) new project.
Freerisk.org sucks in financial data from the SEC using the XBRL format, allows the community to add additional annotations, and then makes [...]]]></description>
			<content:encoded><![CDATA[<div class="zemanta-img">
<div>
<dl class="wp-caption alignleft">
<dt class="wp-caption-dt"><a href="http://www.flickr.com/photos/91887854@N00/2821306514"><img src="http://farm4.static.flickr.com/3134/2821306514_85ae9e9e91_m.jpg" alt="Various data from Bioinformatics Career Survey" title="Various data from Bioinformatics Career Survey" /></a></dt>
<dd class="wp-caption-dd zemanta-img-attribution">Image by <a href="http://www.flickr.com/photos/91887854@N00/2821306514">ynse</a> via Flickr</dd>
</dl>
</div>
</div>
<p>I&#8217;ve been meaning to write about <a href="http://freerisk.org/" title="freerisk">Freerisk.org</a> for a while now, but only got reminded yesterday as I read the <a href="http://www.wired.com/techbiz/people/magazine/17-06/st_alphageek" title="Risk Calculators: Finance Geeks Use Open API to Crunch Market Numbers">Wired article</a> about <a href="http://blog.kiwitobes.com/" title="kiwitobes.com">Toby Segaran&#8217;s</a> (and <a href="http://radar.oreilly.com/jespera/" title="Jesper Andersen | O'Reilly Radar">Jesper Andersen&#8217;s</a>) new project.</p>
<p>Freerisk.org sucks in financial data from the <a class="zem_slink" href="http://www.sec.gov" title="U.S. Securities and Exchange Commission" rel="homepage">SEC</a> using the <a href="http://www.xbrl.org/" title="">XBRL</a> format, allows the community to add additional annotations, and then makes that data available to standard risk analysis algorithms and, this is the best part, available for others to apply their own algorithms.  My first reaction was, this is what we want to be able to do in <a class="zem_slink" href="http://en.wikipedia.org/wiki/Bioinformatics" title="Bioinformatics" rel="wikipedia">bioinformatics</a>.  Keep the data available, add annotations, and have this sandbox in which algorithms can be applied and developed.</p>
<p>The finance geek part of it is interesting enough, but I got interested in Freerisk for the general idea, especially coming from a field where there is a lot of data publicly available but not necessarily sandboxes/platforms for analysis and testing out new algorithms, although there is a lot of intent.  From the about page of Freerisk.org:</p>
<blockquote><p>
	Freerisk is a project with the goal of making freely available the data, algorithms and tools necessary to perform risk modeling. We believe that <a class="zem_slink" href="http://en.wikipedia.org/wiki/Risk_management" title="Risk management" rel="wikipedia">risk management</a> is too important to society to be an arcane subject or competitive advantage.
</p></blockquote>
<p>You could easily replace &#8220;risk management&#8221; with biology or genomics, or something similar.</p>
<p>The pieces that Freerisk contains are even more interesting:</p>
<ul>
<li>An open repository of financial data, including financial statements for public companies</li>
<li>A standards-based <a class="zem_slink" href="http://en.wikipedia.org/wiki/Application_programming_interface" title="Application programming interface" rel="wikipedia">API</a> for querying financial data</li>
<li>A distributed method for designing and running risk models</li>
<li><a class="zem_slink" href="http://en.wikipedia.org/wiki/Open_source" title="Open source" rel="wikipedia">Open-source</a> tools for parsing and handling financial data</li>
<li>Educational materials on risk-management</li>
</ul>
<p>This is a hacker&#8217;s playground.  We need something like this in the informatics community, especially as our data volumes grow.  It&#8217;s just an ethos that we seem to lack in general, and part of that is due to the fact that we need to publish our data, but there is a broader community of analysts and developers this could appeal to.  Resources like these are needed, not just for finance, but in many other areas.  The key is to find enough interested people to contribute.  We have some aspects in the bioinformatics space, but it&#8217;s somewhat fragmented and the analytics part is the weakness at this point.</p>
]]></content:encoded>
			<wfw:commentRss>http://mndoci.com/blog/2009/06/18/freerisk-an-open-platform-for-risk-modeling/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://mndoci.com/blog/2009/06/18/freerisk-an-open-platform-for-risk-modeling/</feedburner:origLink></item>
		<item>
		<title>Hundred nanoseconds a day</title>
		<link>http://feedproxy.google.com/~r/mndoci/~3/n8F1gmyWjdQ/</link>
		<comments>http://mndoci.com/blog/2009/06/14/hundred-nanoseconds-a-day/#comments</comments>
		<pubDate>Mon, 15 Jun 2009 06:49:30 +0000</pubDate>
		<dc:creator>Deepak Singh</dc:creator>
				<category><![CDATA[Computing]]></category>
		<category><![CDATA[Life Science]]></category>
		<category><![CDATA[Modeling & Simulation]]></category>
		<category><![CDATA[D. E. Shaw & Co.]]></category>
		<category><![CDATA[Gromacs]]></category>
		<category><![CDATA[Molecular dynamics]]></category>

		<guid isPermaLink="false">http://mndoci.com/blog/?p=1263</guid>
		<description><![CDATA[100 nanoseconds a day.  100 nanoseconds a day.  100 nanoseconds a day.
That is amazing.  I used to get supercomputing time to do 100 ns simulations during my PhD and those used to last days, but that&#8217;s exactly what NAMD has achieved recently.  A recent review article by [...]]]></description>
			<content:encoded><![CDATA[<div class="zemanta-img">
<div>
<dl class="wp-caption alignleft">
<dt class="wp-caption-dt"><a href="http://www.flickr.com/photos/77581941@N00/2655295940"><img src="http://farm4.static.flickr.com/3138/2655295940_ea9e2f2b64_m.jpg" alt="Water in hydrogen bond network" title="Water in hydrogen bond network" /></a></dt>
<dd class="wp-caption-dd zemanta-img-attribution">Image by <a href="http://www.flickr.com/photos/77581941@N00/2655295940">vitroids</a> via Flickr</dd>
</dl>
</div>
</div>
<p>100 nanoseconds a day.  100 nanoseconds a day.  100 nanoseconds a day</p>
<p>That is amazing.  I used to get supercomputing time to do 100 ns simulations during my PhD and those used to last days, but that&#8217;s exactly what <a href="http://dx.doi.org/10.1529/biophysj.108.131565" title="Elsevier: Article Locator">NAMD has achieved recently</a>.  A recent review article by the folks at <a href="http://www.deshawresearch.com/" title="D.&nbsp;E.&nbsp;Shaw Research">D.E. Shaw Research</a> lays down the <a href="http://www.ncbi.nlm.nih.gov/pubmed/19361980" title="Long-timescale molecular dynamics simulations of p...[Curr Opin Struct Biol. 2009] - PubMed Result">state of protein simulations</a>.</p>
<p>To put the 100 ns in context: that simulation was done on 300 cores.  Given that you can get 1000 cores increasingly easily, that&#8217;s roughly 333 ns a day, or 1000 ns in about 3 days, assuming linear scaling.  So when <a class="zem_slink" href="http://www.deshaw.com/" title="D. E. Shaw &amp; Co." rel="homepage">D.E. Shaw</a> and co write that microsecond simulations are getting practical (increasingly feasible would be a better statement), they&#8217;re not just saying that.   I think if access to 3000 cores and these compute scales becomes commoditized (not difficult looking at the kinds of trends I am seeing), then we are in business and it is indeed practical.</p>
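<p>The back-of-the-envelope arithmetic can be sketched in a few lines of Python.  This is only an illustration under the linear-scaling assumption stated above; the function names are made up for this example, and real MD codes will scale less than perfectly.</p>

```python
# Baseline taken from the post: 100 ns/day on 300 cores.
# Perfectly linear scaling is an assumption, not a measured result.

def ns_per_day(cores, base_cores=300, base_ns_per_day=100.0):
    """Simulated nanoseconds per day if throughput scales linearly with cores."""
    return base_ns_per_day * cores / base_cores


def days_for(target_ns, cores):
    """Wall-clock days needed to simulate target_ns on the given core count."""
    return target_ns / ns_per_day(cores)


for cores in (300, 1000, 3000):
    print(f"{cores:5d} cores: {ns_per_day(cores):7.1f} ns/day, "
          f"1000 ns in {days_for(1000.0, cores):.1f} days")
```

<p>Under that assumption, 3000 cores is the point where a microsecond fits in a single day.</p>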
<p><a href="http://www.ks.uiuc.edu/Research/namd/">NAMD</a>, <a class="zem_slink" href="http://www.gromacs.org/" title="GROMACS" rel="homepage">Gromacs</a>, <a href="http://www.deshawresearch.com/resources.html">Desmond</a>.  For the first time in a long time, I really want to do MD again.  Now to make the entire MD ecosystem more practical.  I would love to see services around such codes that make it easier to run large jobs, include system preparation, and perhaps even analysis.</p>
<h6 class="zemanta-related-title">Related articles by Zemanta</h6>
<ul class="zemanta-article-ul">
<li class="zemanta-article-ul-li"><a href="http://proteome-zone.blogspot.com/2008/01/folding-time-scales-wwo-groel.html">Folding Time-Scales w/wo GroEL</a> (proteome-zone.blogspot.com)</li>
<li class="zemanta-article-ul-li"><a href="http://mndoci.com/blog/2009/03/22/thirty-years-of-biomolecular-simulation/">Thirty years of biomolecular simulation</a> (mndoci.com)</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://mndoci.com/blog/2009/06/14/hundred-nanoseconds-a-day/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		<feedburner:origLink>http://mndoci.com/blog/2009/06/14/hundred-nanoseconds-a-day/</feedburner:origLink></item>
		<item>
		<title>Write heavy file system workloads</title>
		<link>http://feedproxy.google.com/~r/mndoci/~3/9FxigN-boEI/</link>
		<comments>http://mndoci.com/blog/2009/06/13/write-heavy-file-system-workloads/#comments</comments>
		<pubDate>Sun, 14 Jun 2009 06:14:55 +0000</pubDate>
		<dc:creator>Deepak Singh</dc:creator>
				<category><![CDATA[BioIT]]></category>
		<category><![CDATA[Computing]]></category>

		<guid isPermaLink="false">http://mndoci.com/blog/?p=1259</guid>
		<description><![CDATA[In a blog post last year James Hamilton wrote about workloads in large scale network file systems.  In his summary of a study on the subject he writes

	Some of the important points that spring out for me: the percentage of random access is increasing; for those accesses that are sequential, the runs are longer; [...]]]></description>
			<content:encoded><![CDATA[<p>In a blog post last year James Hamilton wrote about <a href="http://perspectives.mvdirona.com/2008/09/28/MeasurementAndAnalysisOfLargeScaleNetworkFileSystemWorkloads.aspx" title="Perspectives - Measurement and Analysis of Large-Scale Network File System Workloads">workloads in large scale network file systems</a>.  In his summary of a <a href="http://perspectives.mvdirona.com/ct.ashx?id=a49b36de-1959-47a0-91a8-1c6273535e97&amp;url=http%3a%2f%2fwww.usenix.org%2fevent%2fusenix08%2ftech%2ffull_papers%2fleung%2fleung.pdf" title="Perspectives">study</a> on the subject he writes</p>
<blockquote><p>
	Some of the important points that spring out for me: the percentage of random access is increasing; for those accesses that are sequential, the runs are longer; file sizes are increasing, data is getting colder; file lifetimes are increasing; and client usage has very high skew.
</p></blockquote>
<p>Those patterns sound a lot like some of the patterns I have seen in the life sciences recently, especially as we have to handle increasingly large data volumes, which have varying access patterns and usage.  Some of the data challenges that people close to home have been seeing, especially significantly higher write-to-read ratios that make caching of limited use, make one realize that the scale challenges aren&#8217;t always the same as the ones you typically see on the web.  The study authors actually conclude that since <a class="zem_slink" href="http://en.wikipedia.org/wiki/Metadata" title="Metadata" rel="wikipedia">metadata</a> is accessed far more regularly, larger metadata caches are beneficial.  Again, a typical access pattern for a lot of &#8216;omics&#8217; data.</p>
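<p>To make the point about write-heavy workloads and caching concrete, here is a toy Python sketch.  It is not taken from the study: it assumes uniformly random key access, a small LRU cache, and that every write invalidates the cached copy of that key.  All names and parameters are made up for this example.</p>

```python
import random
from collections import OrderedDict


def read_hit_rate(write_fraction, n_ops=20000, n_keys=200, cache_size=50, seed=0):
    """Fraction of reads served from an LRU cache when writes invalidate entries."""
    rng = random.Random(seed)
    cache = OrderedDict()
    hits = 0
    reads = 0
    for _ in range(n_ops):
        key = rng.randrange(n_keys)
        if write_fraction > rng.random():
            cache.pop(key, None)           # a write makes any cached copy stale
            continue
        reads += 1
        if key in cache:
            hits += 1
            cache.move_to_end(key)         # mark as most recently used
        else:
            cache[key] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict the least recently used key
    return hits / reads if reads else 0.0


for fraction in (0.1, 0.5, 0.9):
    print(f"write fraction {fraction:.1f}: read hit rate {read_hit_rate(fraction):.2f}")
```

<p>In this toy model the read hit rate drops as the share of writes grows, which is the intuition behind caching being of limited use for write-heavy data.</p>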
<p>Does it make sense for us to start sharing design patterns for scale in the life sciences?  Even in the world of the web and other high scale industries, those design patterns are not well understood, but I think the challenges in the life science world are a little greater since we typically try and make do without people who understand scale and systems, with a few notable exceptions.</p>
]]></content:encoded>
			<wfw:commentRss>http://mndoci.com/blog/2009/06/13/write-heavy-file-system-workloads/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://mndoci.com/blog/2009/06/13/write-heavy-file-system-workloads/</feedburner:origLink></item>
		<item>
		<title>Supercomputing Masterclass – A request for information</title>
		<link>http://feedproxy.google.com/~r/mndoci/~3/ajvsnJzwVfI/</link>
		<comments>http://mndoci.com/blog/2009/06/13/supercomputing-masterclass-a-request-for-information/#comments</comments>
		<pubDate>Sun, 14 Jun 2009 04:51:36 +0000</pubDate>
		<dc:creator>Deepak Singh</dc:creator>
				<category><![CDATA[BioIT]]></category>
		<category><![CDATA[Computing]]></category>
		<category><![CDATA[Event]]></category>
		<category><![CDATA[Informatics]]></category>
		<category><![CDATA[Omics]]></category>

		<guid isPermaLink="false">http://mndoci.com/blog/?p=1256</guid>
		<description><![CDATA[I have been invited to give a Masterworks talk on Data Challenges in Genomics for Supercomputing 09.  I would like to dive into the details about the technical and scientific challenges of high throughput genomics, from microarrays to next gen sequencing and beyond and how we need to manage these data more efficiently. [...]]]></description>
			<content:encoded><![CDATA[<p>I have been invited to give a <a href="http://sc09.supercomputing.org/?pg=masterworks.html" title="SC09 - Masterworks">Masterworks talk on Data Challenges in Genomics</a> for <a href="http://sc09.supercomputing.org/index.php" title="SC09">Supercomputing 09</a>.  I would like to dive into the details about the technical and scientific challenges of high throughput genomics, from microarrays to next gen sequencing and beyond, and how we need to manage these data more efficiently.  While part of my talk will be about my <a href="http://aws.amazon.com/ec2">day job</a>, I want it to be informed by challenges we face today and will face tomorrow as a scientific community.  So to try and capture many of these challenges and gather facts and information I have started a <a href="http://deepaksingh.net/Talks/SC09" title="SC09 - Feeding the Curious">wiki page</a> which I have made public.  Please feel free to add to that page with ideas and topics that interest you.  To do so, you will have to log in as user <em>sc09</em> and password <em>computing</em>.  I request that you add your name to any major input.  If I can figure out an alternative authentication mechanism, I will update this post.</p>
]]></content:encoded>
			<wfw:commentRss>http://mndoci.com/blog/2009/06/13/supercomputing-masterclass-a-request-for-information/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://mndoci.com/blog/2009/06/13/supercomputing-masterclass-a-request-for-information/</feedburner:origLink></item>
		<item>
		<title>Web as platform: Linking data.  RDF stores and augmentation</title>
		<link>http://feedproxy.google.com/~r/mndoci/~3/WhqTENf6rPw/</link>
		<comments>http://mndoci.com/blog/2009/06/13/linking-data-rdf-stores-and-augmentation/#comments</comments>
		<pubDate>Sat, 13 Jun 2009 22:20:46 +0000</pubDate>
		<dc:creator>Deepak Singh</dc:creator>
				<category><![CDATA[Semantic Web]]></category>
		<category><![CDATA[Web as platform]]></category>

		<guid isPermaLink="false">http://mndoci.com/blog/?p=1252</guid>
		<description><![CDATA[Mashing up RDF data.  See how you can use the Talis Platform to augment one data set with another.]]></description>
			<content:encoded><![CDATA[<p><a class="zem_slink" href="http://en.wikipedia.org/wiki/Open_Data" title="Open Data" rel="wikipedia">Open Data</a> and <a class="zem_slink" href="http://en.wikipedia.org/wiki/Resource_Description_Framework" title="Resource Description Framework" rel="wikipedia">RDF</a> fit together like a well-fitted glove, or so some of us think.  The folks at Talis are definitely among those and usually have some cool things to talk about in this space. A <a href="http://blogs.talis.com/n2/archives/617" title="n²  » Blog Archive   » Augmenting Last.fm Data with BBC data on the Talis Platform">recent post</a> shows how you can use linked data to leverage diverse data sets.  It also shows some of the nicer aspects of the <a href="http://www.talis.com/platform" title="Talis Platform - Home">Talis Platform</a>.</p>
<p>Apparently the <a class="zem_slink" href="http://www.bbc.co.uk/" title="BBC" rel="homepage">BBC</a> recently created an <a href="http://welcomebackstage.com/2009/06/bbc-backstage-sparql-endpoint/" title="BBC Backstage SPARQL Endpoint - Backstage.bbc.co.uk">RDF store on the Talis platform</a>.  You can use that store and create another for <a class="zem_slink" href="http://last.fm" title="Last.fm" rel="homepage">last.fm</a> data.  You can then, using the appropriate endpoints, augment the last.fm data with data from the BBC store.</p>
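<p>To make the idea of augmentation concrete, here is a minimal sketch in plain Python. It is not the Talis Platform API and not the code from the Talis post. It assumes both stores describe the same artist with the same URI; the URIs, predicates and values are invented for illustration, and a real setup would fetch the triples from SPARQL endpoints rather than from in-memory lists.</p>

```python
# Hypothetical triples: every URI, predicate and value here is made up.
LASTFM_TRIPLES = [
    ("http://example.org/artist/1", "foaf:name", "Example Artist"),
    ("http://example.org/artist/1", "ex:playCount", "1200"),
]
BBC_TRIPLES = [
    ("http://example.org/artist/1", "ex:review", "http://example.org/review/42"),
    ("http://example.org/artist/2", "ex:review", "http://example.org/review/43"),
]


def augment(primary, secondary):
    """Return primary plus each secondary triple whose subject is already in primary."""
    known_subjects = {subject for (subject, _, _) in primary}
    extra = [
        triple
        for triple in secondary
        if triple[0] in known_subjects and triple not in primary
    ]
    return list(primary) + extra


merged = augment(LASTFM_TRIPLES, BBC_TRIPLES)
for triple in merged:
    print(triple)
```

<p>The join key is just a shared identifier, so nothing in the first data set has to change in order to pick up new facts from the second one.</p>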
<p>I do think we need to create similar endpoints in the life science domain (and we are), allowing data from one resource to be augmented by data from another.  These data can be repurposed and displayed in different ways.  The argument that many of us in favor of Open Data and/or <a class="zem_slink" href="http://en.wikipedia.org/wiki/Linked_Data" title="Linked Data" rel="wikipedia">Linked Data</a> make is that, given the appropriate resources, smart people can come up with smart and useful solutions.  The web is a resource: a resource for data, and a resource for making that data useful to others.  If we think about it as such, rather than just a means of displaying information, we&#8217;ll be that much more empowered.</p>
<h6 class="zemanta-related-title">Related articles by Zemanta</h6>
<ul class="zemanta-article-ul">
<li class="zemanta-article-ul-li"><a href="http://peanutbutter.wordpress.com/2009/04/25/the-semantic-web-of-life-science/"> The Semantic Web of Life Science </a> (peanutbutter.wordpress.com)</li>
<li class="zemanta-article-ul-li"><a href="http://www.readwriteweb.com/archives/linked_data_is_blooming_why_you_should_care.php"> Linked Data is Blooming: Why You Should Care </a> (readwriteweb.com)</li>
<li class="zemanta-article-ul-li"><a href="http://mndoci.com/blog/2009/03/28/talis-connected-commons-linked-open-data-repository-opens-up-shop/">Talis Connected Commons: Linked open data repository opens up shop</a> (mndoci.com)</li>
<li class="zemanta-article-ul-li"><a href="http://blogs.talis.com/nodalities/2009/04/leigh-dodds-talks-about-the-talis-connected-commons-and-linked-open-data.php"> Nodalities (Talis): Leigh Dodds talks about the Talis Connected Commons and Linked Open Data </a> (blogs.talis.com)</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://mndoci.com/blog/2009/06/13/linking-data-rdf-stores-and-augmentation/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://mndoci.com/blog/2009/06/13/linking-data-rdf-stores-and-augmentation/</feedburner:origLink></item>
		<item>
		<title>TrendingTopics.org: A reference site for data analytics in Hadoop and Hive</title>
		<link>http://feedproxy.google.com/~r/mndoci/~3/WCFA7hOvCok/</link>
		<comments>http://mndoci.com/blog/2009/06/10/trendingtopicsorg-a-reference-site-for-data-analytics-in-hadoop-and-hive/#comments</comments>
		<pubDate>Wed, 10 Jun 2009 14:28:29 +0000</pubDate>
		<dc:creator>Deepak Singh</dc:creator>
				<category><![CDATA[Big Data]]></category>
		<category><![CDATA[Computing]]></category>
		<category><![CDATA[Informatics]]></category>
		<category><![CDATA[Life Science]]></category>

		<guid isPermaLink="false">http://mndoci.com/blog/2009/06/10/trendingtopicsorg-a-reference-site-for-data-analytics-in-hadoop-and-hive/</guid>
		<description><![CDATA[In episode 21 of Coast to Coast Bio (not yet released) I talk about Hive.  For those who may not know, Hive is a data warehouse infrastructure built on top of Hadoop.
One of the most recent Amazon Public Data Sets is a sample of Wikipedia page traffic statistics by Peter Skomoroch. The full data [...]]]></description>
			<content:encoded><![CDATA[<p>In episode 21 of <a href="http://c2cbio.com">Coast to Coast Bio</a> (not yet released) I talk about <a href="http://hadoop.apache.org/hive/" title="Welcome to Hive!">Hive</a>.  For those who may not know, Hive is a <a class="zem_slink" href="http://en.wikipedia.org/wiki/Data_warehouse" title="Data warehouse" rel="wikipedia">data warehouse</a> infrastructure built on top of <a href="http://hadoop.apache.org/core/" title="Welcome to Apache Hadoop Core!">Hadoop</a>.</p>
<p>One of the most recent <a href="http://developer.amazonwebservices.com/connect/kbcategory.jspa?categoryID=243" title="Amazon Web Services Developer Community : Public Data Sets">Amazon Public Data Sets</a> is a sample of <a href="http://developer.amazonwebservices.com/connect/entry.jspa?externalID=2596&amp;ref=featured" title="Amazon Web Services Developer Community : Wikipedia Page Traffic Statistics">Wikipedia page traffic statistics</a> by <a href="http://www.datawrangling.com/" title="Data Wrangling - machine learning, datamining, algorithms, python code, and more">Peter Skomoroch</a>. The full data set powers <a href="http://www.trendingtopics.org/" title="Pages: index">trendingtopics.org</a>.</p>
<p><a href="http://www.trendingtopics.org/about" title="Pages: about">What is TrendingTopics</a>?</p>
<blockquote><p>This site was built by Data Wrangling to demonstrate how Hadoop can power a simple data driven website. The trend statistics and <a class="zem_slink" href="http://en.wikipedia.org/wiki/Time_series" title="Time series" rel="wikipedia">time series data</a> that run the site are updated periodically by launching a temporary EC2 cluster running the <a class="zem_slink" href="http://www.cloudera.com" title="Cloudera" rel="homepage">Cloudera</a> Hadoop Distribution. Our initial seed data includes the content of wikipedia and hourly article traffic logs from the wikipedia squid proxy collected by Domas Mituzas.</p></blockquote>
<p>Why do I like this so much?  Apart from the fact that it is a website for <a class="zem_slink" href="http://en.wikipedia.org/wiki/Data_visualization" title="Data visualization" rel="wikipedia">data visualization</a> and analytics, it hits a lot of points that can be of huge value to the informatics community.</p>
<ul>
<li>It uses EC2 to compute on data as needed</li>
<li>Uses Hadoop and Hive</li>
<li>It is a reference architecture, and you can find the source on <a href="http://github.com/datawrangling/trendingtopics/tree/master" title="datawrangling's trendingtopics at master - GitHub">GitHub</a> and an example dataset on <a href="http://developer.amazonwebservices.com/connect/entry.jspa?externalID=2596" title="Amazon Web Services Developer Community : Wikipedia Page Traffic Statistics">AWS public data sets</a></li>
</ul>
<p>That last bit is important.  Peter demonstrates how you can use Hadoop, Hive, <a class="zem_slink" href="http://rubyonrails.org/" title="Ruby on Rails" rel="homepage">Ruby on Rails</a> and EC2 as a data crunching and data visualization resource.  I would love to see more such sites, with biological data sets.</p>
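<p>For a feel of the kind of crunching involved, here is a small sketch of a map and reduce pass over page traffic lines, written in plain Python. It is not Peter&#8217;s code and it does not use Hadoop or Hive; the sample lines and the "project page_title view_count bytes" layout are assumptions made for illustration.</p>

```python
from collections import defaultdict

# Made-up sample lines in an assumed "project page_title view_count bytes" layout.
SAMPLE_LOG_LINES = [
    "en Hadoop 10 2048",
    "en Hive 3 512",
    "en Hadoop 25 4096",
    "de Hadoop 7 1024",
]


def map_line(line):
    """Map step: turn one log line into ((project, page), views)."""
    project, page, views, _bytes = line.split(" ")
    return (project, page), int(views)


def reduce_counts(pairs):
    """Reduce step: sum the views for each (project, page) key."""
    totals = defaultdict(int)
    for key, views in pairs:
        totals[key] += views
    return dict(totals)


totals = reduce_counts(map_line(line) for line in SAMPLE_LOG_LINES)
# Rank pages by total views, highest first.
top_pages = sorted(totals.items(), key=lambda item: item[1], reverse=True)
print(top_pages[0])
```

<p>On a cluster the same two steps would run in parallel over many log files; in Hive the whole thing would roughly correspond to a grouped sum over a table of log rows.</p>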
<p><em>Please see this <a href="http://mndoci.com/blog/about#disclaimer">disclaimer</a></em></p>
<h6 class="zemanta-related-title">Related articles by Zemanta</h6>
<ul class="zemanta-article-ul">
<li class="zemanta-article-ul-li"><a href="http://www.cloudave.com/link/amazon-releases-elastic-mapreduce-web-service"> Amazon releases Elastic MapReduce web service </a> (cloudave.com)</li>
<li class="zemanta-article-ul-li"><a href="http://www.theregister.co.uk/2009/03/16/cloudera_hadoop_launch/">Cloudera floats commercial Hadoop distro</a> (theregister.co.uk)</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://mndoci.com/blog/2009/06/10/trendingtopicsorg-a-reference-site-for-data-analytics-in-hadoop-and-hive/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://mndoci.com/blog/2009/06/10/trendingtopicsorg-a-reference-site-for-data-analytics-in-hadoop-and-hive/</feedburner:origLink></item>
		<item>
		<title>High scale design patterns (missing) in the life sciences</title>
		<link>http://feedproxy.google.com/~r/mndoci/~3/QaTDdTcj_UQ/</link>
		<comments>http://mndoci.com/blog/2009/06/10/high-scale-design-patterns-in-the-life-sciences/#comments</comments>
		<pubDate>Wed, 10 Jun 2009 07:06:01 +0000</pubDate>
		<dc:creator>Deepak Singh</dc:creator>
				<category><![CDATA[BioIT]]></category>
		<category><![CDATA[Computing]]></category>
		<category><![CDATA[Life Science]]></category>

		<guid isPermaLink="false">http://mndoci.com/blog/?p=1242</guid>
		<description><![CDATA[I&#8217;ve written about software failures in the past. As I get a better understanding of scale and architectures and talk to others about some of the core design principles of systems at scale, e.g. Recovery-Oriented Computing (also see this talk by James Hamilton), I realize how little most of us in the life science [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve <a href="http://mndoci.com/blog/2009/05/23/when-software-embraces-failure/" title="When software embraces failure : business|bytes|genes|molecules">written about software failures</a> in the past. As I get a better understanding of scale and architectures and talk to others about some of the core design principles of systems at scale, e.g. <a href="http://roc.cs.berkeley.edu/" title="The UC Berkeley/Stanford Recovery-Oriented Computing (ROC) Project">Recovery-Oriented Computing</a> (also see this <a href="http://mvdirona.com/jrh/TalksAndPapers/JamesHamilton_POA20090226.pdf" title="">talk</a> by <a href="http://perspectives.mvdirona.com/" title="Perspectives">James Hamilton</a>), I realize how little most of us in the life science world think about these design principles.  Most people in this space do not run <a class="zem_slink" href="http://en.wikipedia.org/wiki/Data_center" title="Data center" rel="wikipedia">data centers</a>, and usually have limited access to resources at massive scale.  Yet we are heading towards a data-intensive world, and a world in which we could, in principle, solve massive <a class="zem_slink" href="http://en.wikipedia.org/wiki/Optimization_%28mathematics%29" title="Optimization (mathematics)" rel="wikipedia">optimization</a> problems.</p>
<p>So the question to ask is, when will we see some of these <a class="zem_slink" href="http://en.wikipedia.org/wiki/Design_pattern_%28computer_science%29" title="Design pattern (computer science)" rel="wikipedia">design patterns</a> become the norm rather than the exception for computing in the life sciences?  When will we aggressively start thinking about data partitioning and replication, about building highly available, <a class="zem_slink" href="http://en.wikipedia.org/wiki/Fault-tolerant_system" title="Fault-tolerant system" rel="wikipedia">fault tolerant systems</a> that can do a good job of analyzing ever increasing data sets?  All of this in <a class="zem_slink" href="http://en.wikipedia.org/wiki/Service-oriented_architecture" title="Service-oriented architecture" rel="wikipedia">service oriented</a> models, with all the accompanying advantages of high scale architectures.  The kinds of things that enable <em><a href="http://friendfeed.com/themza/80b9911d/into-wonderful" title="Into The Wonderful - Matt Wood - FriendFeed">Into the Wonderful</a></em>.</p>
<p>The good news is that I get to see <a href="http://www.bleedingedgebiotech.com/blog/bioengineering/antibody-docking-on-the-amazon-cloud/" title="Antibody Docking on the Amazon Cloud at Bleeding Edge Biotech">some</a> of the <a href="http://pubs.acs.org/cen/coverstory/87/8721cover.html" title="The New Computing Pioneers | Cover Story | Chemical &amp; Engineering News">early adopters</a> in action.  I also hope to cover some of these paradigms at <a href="http://sc09.supercomputing.org">Supercomputing 09</a>.</p>
<h6 class="zemanta-related-title">Related articles by Zemanta</h6>
<ul class="zemanta-article-ul">
<li class="zemanta-article-ul-li"><a href="http://insidehpc.com/2009/04/23/fault-tolerance-at-scale-embrace-your-inner-slacker/"> Fault tolerance at scale: embrace your inner slacker </a> (insidehpc.com)</li>
<li class="zemanta-article-ul-li"><a href="http://mndoci.com/blog/2009/04/12/the-future-of-big-compute-for-big-science/"> The future of big compute for big science </a> (mndoci.com)</li>
<li class="zemanta-article-ul-li"><a href="http://insidehpc.com/2009/03/16/bio-computing-thrust-at-sc09/">Bio-computing thrust at SC09</a> (insidehpc.com)</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://mndoci.com/blog/2009/06/10/high-scale-design-patterns-in-the-life-sciences/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://mndoci.com/blog/2009/06/10/high-scale-design-patterns-in-the-life-sciences/</feedburner:origLink></item>
		<item>
		<title>Our publishing model encourages bad behavior</title>
		<link>http://feedproxy.google.com/~r/mndoci/~3/iMitgiJJNBU/</link>
		<comments>http://mndoci.com/blog/2009/06/07/our-publishing-model-encourages-bad-behavior/#comments</comments>
		<pubDate>Mon, 08 Jun 2009 05:25:33 +0000</pubDate>
		<dc:creator>Deepak Singh</dc:creator>
				<category><![CDATA[Publishing]]></category>
		<category><![CDATA[Science]]></category>

		<guid isPermaLink="false">http://mndoci.com/blog/?p=1239</guid>
		<description><![CDATA[My lack of enthusiasm for the current publish or perish model is well documented.  This thread from Friendfeed highlights one of the many problems with the current state of peer review and the publishing model and implications for tenure etc.   Everything is interconnected, and ends up leading to a system that hurts [...]]]></description>
			<content:encoded><![CDATA[<p>My lack of enthusiasm for the current <a class="zem_slink" href="http://en.wikipedia.org/wiki/Publish_or_perish" title="Publish or perish" rel="wikipedia">publish or perish</a> model is <a href="http://www.lijit.com/search/mndoci?q=publish+or+perish">well documented</a>.  <a href="http://friendfeed.com/the-life-scientists/fb15fcbc/advice-wanted-for-course-of-action-on-unethical" title="Advice wanted for course of action on unethical... - The Life Scientists - FriendFeed">This thread</a> from <a class="zem_slink" href="http://friendfeed.com" title="FriendFeed" rel="homepage">FriendFeed</a> highlights one of the many problems with the current state of peer review and the publishing model, and the implications for tenure etc.   Everything is interconnected, and ends up leading to a system that hurts good science.  It&#8217;s time for a change.  This is one model that has outlived its intended utility.</p>
]]></content:encoded>
			<wfw:commentRss>http://mndoci.com/blog/2009/06/07/our-publishing-model-encourages-bad-behavior/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		<feedburner:origLink>http://mndoci.com/blog/2009/06/07/our-publishing-model-encourages-bad-behavior/</feedburner:origLink></item>
		<item>
		<title>When Whole Genome Sequencing becomes passe</title>
		<link>http://feedproxy.google.com/~r/mndoci/~3/EXScSze48BQ/</link>
		<comments>http://mndoci.com/blog/2009/06/05/when-whole-genome-sequencing-becomes-passe/#comments</comments>
		<pubDate>Sat, 06 Jun 2009 04:37:23 +0000</pubDate>
		<dc:creator>Deepak Singh</dc:creator>
				<category><![CDATA[Life Science]]></category>
		<category><![CDATA[Omics]]></category>
		<category><![CDATA[Publishing]]></category>

		<guid isPermaLink="false">http://mndoci.com/blog/?p=1235</guid>
		<description><![CDATA[In a recent blog post at MassGenomics talking about the recently published sequence of a Korean individual, Dan Koboldt makes an interesting observation.  He notes

	This week’s publication of the genome of a Korean individual in Genome Research marks the fifth individual whole genome sequenced with massively parallel sequencing platforms.  The fact that this [...]]]></description>
			<content:encoded><![CDATA[<p>In a <a href="http://www.massgenomics.org/2009/06/whole-genome-sequencing-how-many-snps-remain.html" title="Whole Genome Sequencing: How Many SNPs Remain? | MassGenomics">recent blog post at MassGenomics</a> talking about the recently published <a href="http://genome.cshlp.org/content/early/2009/05/26/gr.092197.109" title="The first Korean genome sequence and analysis: Full genome sequencing for a socio-ethnic group  —  Genome Research">sequence of a Korean individual</a>, Dan Koboldt makes an interesting observation.  He notes</p>
<blockquote><p>
	This week’s publication of the genome of a Korean individual in Genome Research marks the fifth individual whole genome sequenced with massively parallel sequencing platforms.  The fact that this was not a <a class="zem_slink" href="http://www.nature.com/nature/index.html" title="Nature (journal)" rel="homepage">Nature</a> paper speaks as loudly as anything.  The window of time when single whole genome sequences merit high-profile publications is slowly closing.
</p></blockquote>
<p>It&#8217;s an increasing indication of the commoditization of <a class="zem_slink" href="http://en.wikipedia.org/wiki/Genomics" title="Genomics" rel="wikipedia">genomics</a> technologies, which will mean a plethora of individual genomes.  That will make the genome itself less interesting, and make the characteristics and analysis the core component of any publication.  Dan&#8217;s probably right that after the 1000 Genomes Project, whole genome sequencing papers won&#8217;t excite people that much, and that a focus on disease-related efforts will rule instead.  I know quite a few people who can&#8217;t wait for us to stop collecting and start analyzing.  The day isn&#8217;t too far away.</p>
]]></content:encoded>
			<wfw:commentRss>http://mndoci.com/blog/2009/06/05/when-whole-genome-sequencing-becomes-passe/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://mndoci.com/blog/2009/06/05/when-whole-genome-sequencing-becomes-passe/</feedburner:origLink></item>
	<media:credit role="author">Deepak Singh</media:credit><media:rating>nonadult</media:rating></channel>
</rss>
