<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:media="http://search.yahoo.com/mrss/" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:creativeCommons="http://backend.userland.com/creativeCommonsRssModule" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">

<channel>
	<title>business|bytes|genes|molecules</title>
	
	<link>http://mndoci.com</link>
	<description>ruminations on the social and commercial potential of biotech, nanotech, infotech and computing</description>
	<lastBuildDate>Sun, 12 Jun 2011 06:45:33 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	
		<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.feedburner.com/mndoci" /><feedburner:info uri="mndoci" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><media:thumbnail url="http://mndoci.com/blog/images/hiv.jpg" /><geo:lat>47.618371</geo:lat><geo:long>-122.203083</geo:long><creativeCommons:license>http://creativecommons.org/licenses/by-nc-sa/2.0/</creativeCommons:license><feedburner:emailServiceId>mndoci</feedburner:emailServiceId><feedburner:feedburnerHostname>http://feedburner.google.com</feedburner:feedburnerHostname><feedburner:browserFriendly>This is an XML content feed. It is intended to be viewed in a newsreader or syndicated to another site, subject to copyright and fair use.</feedburner:browserFriendly><item>
		<title>Startups, tenure and real-world systems</title>
		<link>http://feedproxy.google.com/~r/mndoci/~3/YfHJ5LxKF7I/</link>
		<comments>http://mndoci.com/2011/06/11/startups-tenure-and-real-world-systems/#comments</comments>
		<pubDate>Sun, 12 Jun 2011 06:44:50 +0000</pubDate>
		<dc:creator>Deepak Singh</dc:creator>
				<category><![CDATA[Business]]></category>
		<category><![CDATA[Careers]]></category>
		<category><![CDATA[Computing]]></category>

		<guid isPermaLink="false">http://mndoci.com/?p=2680</guid>
		<description><![CDATA[Earlier today I visited DBMS Musings to read Daniel Abadi&#8217;s blog post on peer review, but ended up re-reading an older post. In the post Daniel talks about why he is doing a startup pre-tenure. I am not an academic, nor have I ever started a company, although I&#8217;ve been in both environments, so I [...]]]></description>
			<content:encoded><![CDATA[<p>Earlier today I visited <em>DBMS Musings</em> to read Daniel Abadi&#8217;s <a href="http://dbmsmusings.blogspot.com/2011/05/why-sam-madden-is-wrong-about-peer.html">blog post on peer review</a>, but ended up re-reading an older post. In that post Daniel talks about <a href="http://dbmsmusings.blogspot.com/2011/03/why-im-doing-start-up-pre-tenure.html">why he is doing a startup pre-tenure</a>. I am not an academic, nor have I ever started a company, although I&#8217;ve been in both environments, so I can&#8217;t quite speak from experience here, but the post caught my attention on multiple levels. What makes this interesting is that Daniel is actively involved in getting his company off the ground himself, and the resulting leave of absence limits his ability to do the things junior faculty are expected to do, like publish. He admits that there is little precedent and that it&#8217;s a gamble. So why go on?</p>
	<p>Admittedly, this is computer science, but the key reason for doing so is to have the research make an impact and be used in real systems. It&#8217;s one reason that I&#8217;ve never really been a fan of informatics papers. The best way to evaluate a data management system is to use it. The best metric is happy users. The example he gives is the Google Bigtable paper, which has had a lot of impact, while other, similar, academic papers have languished. The reason: it had been proven in a real-world system. He presents three choices for the kind of research he does:</p>
	<ul>
	<li>You can use the resources in your lab to build a complete prototype of your idea. The challenge is that building a complete prototype is a significantly harder task than publishing the prototype.</li>
	<li>You can leave academia and go work at a company that values that kind of work, which is an option many choose, but he likes academia.</li>
	<li>You can do a startup. To a degree he likens it to academic research, except that you get paid by angels and VCs. Startups bring many other challenges, none of them trivial, and it is a very competitive space.</li>
	</ul>
	<p>So why the startup? Why does he think it&#8217;s the only way for his research to make an impact? I think to some extent it comes from having seen his graduate research commercialized successfully (not one but two projects) and put to use at scale. Reading the post and the comments, and understanding the value of computing for modern biology, I can&#8217;t help but think that there is the possibility of doing something similar in the life sciences, perhaps in network biology. Let me pose that question then. Is there any research in the life sciences that fits in here, especially for a pre-tenure prof? Do we value real systems? I still bristle at all the duplicated research, the poor design of so many biological resources, and publication for the sake of publishing. Thoughts?
</p>
]]></content:encoded>
			<wfw:commentRss>http://mndoci.com/2011/06/11/startups-tenure-and-real-world-systems/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://mndoci.com/2011/06/11/startups-tenure-and-real-world-systems/</feedburner:origLink></item>
		<item>
		<title>If you have too much data, then “good enough” is good enough</title>
		<link>http://feedproxy.google.com/~r/mndoci/~3/UPd_4dJzp1g/</link>
		<comments>http://mndoci.com/2011/06/06/if-you-have-too-much-data-then-good-enough-is-good-enough/#comments</comments>
		<pubDate>Mon, 06 Jun 2011 19:16:36 +0000</pubDate>
		<dc:creator>Deepak Singh</dc:creator>
				<category><![CDATA[Big Data]]></category>
		<category><![CDATA[Computing]]></category>
		<category><![CDATA[Informatics]]></category>
		<category><![CDATA[Software & Internet]]></category>

		<guid isPermaLink="false">http://mndoci.com/?p=2674</guid>
		<description><![CDATA[I would suggest that all my friends in the world of bioinformatics read this fabulous article by Pat Helland. Pat&#8217;s one of the leading experts in distributed transactions and knows more about databases than most of us put together. His ACM article goes into some of the tradeoffs and changes in mindset that need to be [...]]]></description>
			<content:encoded><![CDATA[<p><img src="https://img.skitch.com/20110606-txk7wmsuw1i3s13a8i3iai5ufs.jpg" /></p>
	<p>I would suggest that all my friends in the world of bioinformatics read this fabulous <a href="http://queue.acm.org/detail.cfm?id=1988603">article</a> by <a href="http://blogs.msdn.com/b/pathelland/">Pat Helland</a>.  Pat&#8217;s one of the leading experts in distributed transactions and knows more about databases than most of us put together.  His ACM article goes into some of the tradeoffs and changes in mindset that need to be made when working with data that changes, comes from different sources, and so often has ambiguity associated with it.  It also tells you a little bit about the differences between SQL and NoSQL systems when it comes to transaction semantics, and in a way that makes complete sense.</p>
	<p>Perhaps the most interesting part of the article was the section on &#8220;Mulligan stew&#8221;, where he also provides the example of building a heterogeneous catalog.  A product catalog might seem very far from a biological data source, an integrated genomics database, etc., but not so fast.  You have data from disparate sources, the data quality is not consistent, and you might have a lot of duplicates.  That is a set of problems that most people building biological information sources, or leveraging a bunch of them for their own needs, face all the time.  In all these cases we are trying to find a tenuous identity (probably less so in some biological domains) and provide information related to it.  At the first company I worked at, this was one of the core roles of the database schema, and identifying the master identity was likely a critical part of the process.  Unfortunately, while those days were the beginning of my love for data systems, my primary interest was statistical potentials and simulation, so I never dug deep.  A product description could be considered analogous to the information about a gene or protein.  In biology, inference engines can be powerful tools to reconcile different entities, and various forms are in use today, but we are probably not thinking about them as tools to gather additional information and metadata. Pat talks about provenance as well, another thing that scientists care and worry about.</p>
	<p>The reason I talk about all this is that life scientists are often quick to dismiss the data systems that have come out of the Internet, but fundamentally data is data. The folks from the Internet side of things have been hard at work on this problem, have built very robust solutions, and can teach us a lot, and we should learn.  This becomes more important with the ever-growing number of people and projects depending on multiple online data sources.  The core areas of discussion in this paper are very pertinent to what we do and should make all of us think about NoSQL and distributed systems in general.</p>
	<p><em>Interesting anecdote</em>.  This is my first lengthy blog post written on the iPad, using Papers2 and Simplenote.</p>
	<p><em>Image Credit</em>: <a href="http://www.flickr.com/photos/erikbenson/3835146379/">Erik Benson</a></p>
]]></content:encoded>
			<wfw:commentRss>http://mndoci.com/2011/06/06/if-you-have-too-much-data-then-good-enough-is-good-enough/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://mndoci.com/2011/06/06/if-you-have-too-much-data-then-good-enough-is-good-enough/</feedburner:origLink></item>
		<item>
		<title>Computing and abstractions revisited</title>
		<link>http://feedproxy.google.com/~r/mndoci/~3/DCMLe-cIKq4/</link>
		<comments>http://mndoci.com/2011/05/29/computing-and-abstractions-revisited/#comments</comments>
		<pubDate>Sun, 29 May 2011 17:05:22 +0000</pubDate>
		<dc:creator>Deepak Singh</dc:creator>
				<category><![CDATA[Computing]]></category>

		<guid isPermaLink="false">http://mndoci.com/?p=2670</guid>
		<description><![CDATA[Computing is critical. Computing is not important. Two seemingly contradictory statements, but I&#8217;ll point to Google. Their success is built on computing, lots of it. The end user doesn&#8217;t care. They get information back. They don&#8217;t have to set up clusters and queries. They just search. While that&#8217;s a gross oversimplification for science, there are [...]]]></description>
			<content:encoded><![CDATA[<p>Computing is critical.</p>
	<p>Computing is not important.</p>
	<p>Two seemingly contradictory statements, but I&#8217;ll point to Google.  Their success is built on computing, lots of it.  The end user doesn&#8217;t care.  They get information back.  They don&#8217;t have to set up clusters and queries.  They just search.  While that&#8217;s a gross oversimplification for science, there are aspects of computing productivity that just don&#8217;t make sense anymore.  I was reminded of this point by a question on Biostar about <a href="http://biostar.stackexchange.com/questions/8641/setting-up-a-mpi-environment" title="Setting up a MPI environment - BioStar">setting up MPI</a>.  Different levels of skill are required, and I might be one of the first to argue that a computational scientist should understand their tool, the computer.  But at what level?  In the end I love abstractions and productivity.  The more time you have to spend setting stuff up, the less time you have for doing useful things.  This is something that I&#8217;ve talked about before [<a href="#1">1</a>, <a href="#2">2</a>].  Note that I am not arguing for complete dumbing down, although there is value there as well, depending on whom you are targeting. This is really about enabling developer productivity, and yes, I do think of informaticians in particular as being developers.  That&#8217;s why I get excited about things like the folks at Bioteam mashing up <a href="http://blog.bioteam.net/2011/03/dude-you-got-some-chef-in-my-starcluster/" title="Dude, you got some Chef in my StarCluster!  : BioTeam Blog">Chef and Starcluster</a> and tools like <a href="https://github.com/infochimps/cluster_chef/tree/version_2">Clusterchef</a>, and why I would like to try and do more with <a href="https://github.com/datawrangling/ec2cluster">ec2cluster</a> (which I&#8217;d love to mash up with Clusterchef to remove any AMI dependencies).</p>
	<p>There are a lot of smart devs out there, and the number of tools that abstract computational complexity and fit within a developer&#8217;s workflow is only increasing, so while we aren&#8217;t necessarily where we want to be, we are getting a lot closer.</p>
	<h3>References</h3>
	<p><a name="1">1</a>. <a href="http://mndoci.com/2009/05/08/hpc-and-abstractions/" title="HPC and abstractions">HPC and abstractions</a><br />
<a name="2">2</a>. <a href="http://mndoci.com/2010/08/29/dsls-for-high-performance-computing/" title="DSLs for High Performance Computing">DSLs for High Performance Computing</a></p>
]]></content:encoded>
			<wfw:commentRss>http://mndoci.com/2011/05/29/computing-and-abstractions-revisited/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://mndoci.com/2011/05/29/computing-and-abstractions-revisited/</feedburner:origLink></item>
		<item>
		<title>Data, software, and money</title>
		<link>http://feedproxy.google.com/~r/mndoci/~3/NumKmYkZXMY/</link>
		<comments>http://mndoci.com/2011/05/28/data-software-and-money/#comments</comments>
		<pubDate>Sun, 29 May 2011 00:38:14 +0000</pubDate>
		<dc:creator>Deepak Singh</dc:creator>
				<category><![CDATA[Big Data]]></category>
		<category><![CDATA[Industry Watching]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[Software & Internet]]></category>

		<guid isPermaLink="false">http://mndoci.com/?p=2666</guid>
		<description><![CDATA[Steve O&#8217;Grady has written a blog post about a recent talk he gave at OSBC. In the post he welcomes the Age of Data. The talk covers two topics of great interest, software and data. In the context of the life sciences I have worked on both the &#8220;data as a product&#8221; side and on [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://farm4.static.flickr.com/3210/3035489052_7a57df634d_m.jpg" alt="money" /><br />
Steve O&#8217;Grady has written a blog post about a recent talk he gave at OSBC.  In the post he welcomes the <em><a title="Welcome to the Age of Data: My OSBC Talk – tecosystems" href="http://redmonk.com/sogrady/2011/05/24/the-age-of-data/">Age of Data</a></em>.  The talk covers two topics of great interest, software and data.  In the context of the life sciences I have worked on both the &#8220;data as a product&#8221; side and on the packaged software side.  He notes that none of the top &#8220;software&#8221; companies in the world are of recent vintage.  These are companies making money from selling software (a really difficult business in the sciences).  He argues that data-driven products are where the market is.  The success of Google and others is a testament to this, but in the sciences the entire model of data as product has never worked.  I would argue that this is partly because we&#8217;ve always sold the data itself rather than providing value on top of the data, one reason why <a href="http://nextbio.com">NextBio</a> has always been on my radar.</p>
	<p>I disagree with Steve on the data itself having value, and I think of Google as a pure data-driven software company, where the software framework exists to bring value to the end user from the data.  Twitter and Facebook are not going to be successful just because of the data.  Having said that, his overall point still has value.  Open source enables data and data production.  How can we derive value from the data, and is there a commercial market for it in the sciences? The part of the talk that I wanted to take away was this.  A lot of science and data production is driven by open source and open data, but are there systems we can build that help science while also building value?  It&#8217;s hard, and you can argue that it&#8217;s not the correct direction, but in the end, I also believe that financial value begets quality, and we do not have the best and the brightest working on software and data problems because there is no money to be made.</p>
	<div style="width:425px" id="__ss_8001520"> <strong style="display:block;margin:12px 0 4px"><a href="http://www.slideshare.net/sogrady/open-source-big-data-big-money-8001520" title="Open Source + Big Data = Big Money ">Open Source + Big Data = Big Money </a></strong> <iframe src="http://www.slideshare.net/slideshow/embed_code/8001520" width="425" height="355" frameborder="0" marginwidth="0" marginheight="0" scrolling="no"></iframe><br />
<div style="padding:5px 0 12px"> View more presentations from <a href="http://www.slideshare.net/sogrady">sogrady</a> </div>
 </div>
]]></content:encoded>
			<wfw:commentRss>http://mndoci.com/2011/05/28/data-software-and-money/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		<feedburner:origLink>http://mndoci.com/2011/05/28/data-software-and-money/</feedburner:origLink></item>
		<item>
		<title>Changes</title>
		<link>http://feedproxy.google.com/~r/mndoci/~3/1Hd9KipioOo/</link>
		<comments>http://mndoci.com/2011/05/05/changes/#comments</comments>
		<pubDate>Fri, 06 May 2011 03:02:36 +0000</pubDate>
		<dc:creator>Deepak Singh</dc:creator>
				<category><![CDATA[Admin]]></category>
		<category><![CDATA[Self]]></category>

		<guid isPermaLink="false">http://mndoci.com/?p=2654</guid>
		<description><![CDATA[As some of you have probably noticed via a prior tweet and a LinkedIn profile change, I have a new role at AWS, getting back to my product management roots. This gives me a chance to get up close and personal with infrastructure, thinking about the types of applications and workloads people are running, high performance [...]]]></description>
			<content:encoded><![CDATA[<p><iframe width="425" height="349" src="http://www.youtube.com/embed/v0t-xvbfX1c" frameborder="0" allowfullscreen></iframe></p>
	<p>As some of you have probably noticed via a prior tweet and a LinkedIn profile change, I have a new role at AWS, getting back to my product management roots.  This gives me a chance to get up close and personal with infrastructure, thinking about the types of applications and workloads people are running, high performance computing, etc.  I love product, and it&#8217;s good to be back.  It will definitely mean a reduction in travel and speaking engagements, but mostly it will result in my being pickier about them.  Maybe this means that <a href="http://c2cbio.com">c2cbio</a> will make a semi-comeback.  We&#8217;ll see.  In the meantime, I am excited.</p>
	<p><img src="http://img.skitch.com/20110503-qjk6327i5h4m2wcxcp5kix28ed.jpg" alt="Deepak Singh | LinkedIn" /></p>
]]></content:encoded>
			<wfw:commentRss>http://mndoci.com/2011/05/05/changes/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://mndoci.com/2011/05/05/changes/</feedburner:origLink></item>
		<item>
		<title>Getting your grep on</title>
		<link>http://feedproxy.google.com/~r/mndoci/~3/hmgKtaIscWE/</link>
		<comments>http://mndoci.com/2011/04/17/getting-your-grep-on/#comments</comments>
		<pubDate>Sun, 17 Apr 2011 17:42:34 +0000</pubDate>
		<dc:creator>Deepak Singh</dc:creator>
				<category><![CDATA[Informatics]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Software & Internet]]></category>

		<guid isPermaLink="false">http://mndoci.com/?p=2652</guid>
		<description><![CDATA[In the life science software world, we are always trying to make software &#8220;easier to use by biologists&#8221;. Over the years, I&#8217;ve decided that means that the user interface and level of functionality should enable a biologist to ask the right questions and the workflow should fit their way of thinking and interrogation. Unfortunately, too [...]]]></description>
			<content:encoded><![CDATA[	<p>In the life science software world, we are always trying to make software &#8220;easier to use by biologists&#8221;.  Over the years, I&#8217;ve decided that this means the user interface and level of functionality should enable a biologist to ask the right questions, and the workflow should fit their way of thinking and interrogation.  Unfortunately, too often that ends up being interpreted as &#8220;let us make some form of GUI which is easy to use&#8221; without really thinking it through.  At the same time, any sort of in-depth analysis and drill-down requires a certain level of skill, which no amount of &#8220;let&#8217;s make it easier to use&#8221; can replace.  Which is why <a href="http://twitter.com/mikeloukides" title="">Mike Loukides&#8217;</a> post on <em><a href="http://radar.oreilly.com/2011/04/data-hand-tools.html" title="Data hand tools - O'Reilly Radar">Data Hand Tools</a></em> made me smile.  The post highlights some of the simple tools that make our lives easier in ways that a UI rarely can.  Does everyone need to know how to use them?  Some would argue that they are not required, but I would encourage most biologists to get comfortable with some of those hand tools.  I can&#8217;t count the number of times <em>grep</em>, <em>sort</em>, etc. have come to my rescue.
So throw away the mouse and drop-down menus for a bit and have some command line fun.
</p>
<div class="feedflare">
</div><img src="http://feeds.feedburner.com/~r/mndoci/~4/hmgKtaIscWE" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://mndoci.com/2011/04/17/getting-your-grep-on/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://mndoci.com/2011/04/17/getting-your-grep-on/</feedburner:origLink></item>
		<item>
		<title>Michael Nielsen goes on tour</title>
		<link>http://feedproxy.google.com/~r/mndoci/~3/cBCPuytnkEA/</link>
		<comments>http://mndoci.com/2011/04/16/michael-nielsen-goes-on-tour/#comments</comments>
		<pubDate>Sun, 17 Apr 2011 02:24:55 +0000</pubDate>
		<dc:creator>Deepak Singh</dc:creator>
				<category><![CDATA[Open Science]]></category>

		<guid isPermaLink="false">http://mndoci.com/?p=2649</guid>
		<description><![CDATA[Michael Nielsen is on tour. For those who aren&#8217;t familiar with Michael&#8217;s thinking on open science and modern scientific publishing and communication, please read his blog and writings. There are few who can articulate the need for open science as well, and back it up with hard data to boot. When they come with Michael&#8217;s scientific [...]]]></description>
			<content:encoded><![CDATA[	<p>Michael Nielsen <a href="http://michaelnielsen.org/blog/speaking-about-open-science/" title="Speaking about open science | Michael Nielsen">is on tour</a>.  For those who aren&#8217;t familiar with Michael&#8217;s thinking on open science and modern scientific publishing and communication, please read his blog and writings.  There are few who can articulate the need for open science as well, and back it up with hard data to boot.  Coming from someone with Michael&#8217;s scientific pedigree, those arguments are even more impressive.  For more on what you can expect if Michael comes to your town, check out his recent TEDx Waterloo talk.</p>
	<p><iframe title="YouTube video player" width="500" height="311" src="http://www.youtube.com/embed/DnWocYKqvhw" frameborder="0" allowfullscreen></iframe></p>
<div class="feedflare">
</div><img src="http://feeds.feedburner.com/~r/mndoci/~4/cBCPuytnkEA" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://mndoci.com/2011/04/16/michael-nielsen-goes-on-tour/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		<feedburner:origLink>http://mndoci.com/2011/04/16/michael-nielsen-goes-on-tour/</feedburner:origLink></item>
		<item>
		<title>The data is the question</title>
		<link>http://feedproxy.google.com/~r/mndoci/~3/rSNtgf1kFnE/</link>
		<comments>http://mndoci.com/2011/04/05/the-data-is-the-question/#comments</comments>
		<pubDate>Tue, 05 Apr 2011 13:54:59 +0000</pubDate>
		<dc:creator>Deepak Singh</dc:creator>
				<category><![CDATA[Big Data]]></category>
		<category><![CDATA[Computing]]></category>
		<category><![CDATA[Informatics]]></category>

		<guid isPermaLink="false">http://mndoci.com/?p=2645</guid>
		<description><![CDATA[I have long channeled Jeff Jonas and his ideas on data finds data. His recent blog post on the data being the query extends some of those thoughts. I find this trend fascinating, although I favor the just in time data approach, since not all information needs to be acted upon instantly, but the [...]]]></description>
			<content:encoded><![CDATA[	<p><img src="http://farm1.static.flickr.com/139/327122302_bbc4a3935b.jpg" width="500" height="375" alt="Question mark in Esbjerg"></p>
	<p>I have long channeled <a href="http://jeffjonas.typepad.com/" title="Jeff Jonas">Jeff Jonas</a> and his ideas on <a href="http://jeffjonas.typepad.com/jeff_jonas/2009/07/data-finds-data.html" title="Jeff Jonas: Data Finds Data">data finds data</a>. His recent blog post on the <a href="http://jeffjonas.typepad.com/jeff_jonas/2011/04/the-data-is-the-query.html" title="Jeff Jonas: The Data is the Query">data being the query</a> extends some of those thoughts.  I find this trend fascinating, although I favor the <a href="http://mndoci.com/2010/03/27/smarter-means-being-just-in-time/" title="Smarter means being &#8220;just in time&#8221;">just in time data</a> approach, since not all information needs to be acted upon instantly, but the broader point holds.  I had a similar discussion with <a href="http://en.wikipedia.org/wiki/Richard_Durbin_(scientist)" title="Richard Durbin (scientist) - Wikipedia, the free encyclopedia">Richard Durbin</a> recently around data-first science, where we discussed collecting data and then querying it to generate hypotheses and to see how the new data impacted existing knowledge.</p>
	<p>It&#8217;s going to be interesting to see how today&#8217;s life science data systems evolve.  The data-driven approach, which I talk about a lot, is essential for modern biological research (I saw a great talk on this by <a href="http://twitter.com/jdudley" title="">Joel Dudley</a> recently): using data to drive hypothesis development, to enhance existing knowledge, and to do some deep data exploration.  In some cases, e.g. adverse event detection and other key medical situations, some of the real-time systems that Jeff talks about will have a critical role to play.  In both cases, the data is the question and likely generates even more questions, a path to follow.  Will we be able to develop systems that know how to handle the questions and harness the answers to get to some meaningful scientific discoveries and understanding?</p>
	<p>Image credit: <a href="http://www.flickr.com/photos/drachmann/327122302/" title="Question mark in Esbjerg by alexanderdrachmann, on Flickr">alexanderdrachmann</a></p>
<div class="feedflare">
</div><img src="http://feeds.feedburner.com/~r/mndoci/~4/rSNtgf1kFnE" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://mndoci.com/2011/04/05/the-data-is-the-question/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		<feedburner:origLink>http://mndoci.com/2011/04/05/the-data-is-the-question/</feedburner:origLink></item>
		<item>
		<title>Something to ponder</title>
		<link>http://feedproxy.google.com/~r/mndoci/~3/X92wTUvET3w/</link>
		<comments>http://mndoci.com/2011/03/29/something-to-ponder/#comments</comments>
		<pubDate>Tue, 29 Mar 2011 07:03:03 +0000</pubDate>
		<dc:creator>Deepak Singh</dc:creator>
				<category><![CDATA[Big Data]]></category>
		<category><![CDATA[BioIT]]></category>
		<category><![CDATA[Business]]></category>
		<category><![CDATA[Computing]]></category>
		<category><![CDATA[Software & Internet]]></category>

		<guid isPermaLink="false">http://mndoci.com/?p=2642</guid>
		<description><![CDATA[The scale of modern life science research, where scale is not just about data volume, but also about rate of change, number of users, geographic scale, etc means that resources have to look at how they provide services differently and, more importantly, funding agencies and philanthropists have to decide where to draw the line. Is [...]]]></description>
			<content:encoded><![CDATA[	<p>The scale of modern life science research, where scale is not just about data volume but also about rate of change, number of users, geographic scale, etc., means that resources have to look at how they provide services differently and, more importantly, that funding agencies and philanthropists have to decide where to draw the line.  Is this an opportunity for commercial efforts?  Is the market ready to do this, or is it willing to live with overall inefficiencies and limitations?  Is there a tiered model that would be acceptable?</p>
	<p>Recent discussions and observations of what various companies and orgs are doing lead me to believe that we need to think hard about overall efficiencies and consider the value of time.  More later.
</p>
<div class="feedflare">
</div><img src="http://feeds.feedburner.com/~r/mndoci/~4/X92wTUvET3w" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://mndoci.com/2011/03/29/something-to-ponder/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://mndoci.com/2011/03/29/something-to-ponder/</feedburner:origLink></item>
		<item>
		<title>Practical machine learning and scaling data platforms</title>
		<link>http://feedproxy.google.com/~r/mndoci/~3/3JkrNEoA2Ks/</link>
		<comments>http://mndoci.com/2011/03/05/practical-machine-learning-and-scaling-data-platforms/#comments</comments>
		<pubDate>Sat, 05 Mar 2011 19:38:59 +0000</pubDate>
		<dc:creator>Deepak Singh</dc:creator>
				<category><![CDATA[Big Data]]></category>
		<category><![CDATA[Computing]]></category>
		<category><![CDATA[Informatics]]></category>
		<category><![CDATA[Omics]]></category>

		<guid isPermaLink="false">http://mndoci.com/?p=2639</guid>
		<description><![CDATA[A couple of great posts on the Metamarkets blog recently that might be of relevance to the bioinformatics crowd. The first one, by Mike Driscoll, talks about lessons for building a petabyte data platform. Their four guiding principles Experiment often, fail fast Keep things simple to scale well Keep things modular to accommodate change Avoid [...]]]></description>
			<content:encoded><![CDATA[	<p>There were a couple of great posts on the <a href="http://metamarketsgroup.com/blog">Metamarkets blog</a> recently that might be of relevance to the bioinformatics crowd.  The first one, by <a href="http://www.dataspora.com/blog/" title="Dataspora Blog">Mike Driscoll</a>, talks about <a href="http://metamarketsgroup.com/blog/four-lessons-for-a-petabyte-platform/" title="Metamarkets Blog  &raquo; Blog Archive   &raquo; Four Lessons for Building A Petabyte Data Platform">lessons for building a petabyte data platform</a>.  Their four guiding principles:</p>
	<ul>
	<li>Experiment often, fail fast</li>
	<li>Keep things simple to scale well</li>
	<li>Keep things modular to accommodate change</li>
	<li>Avoid undifferentiated heavy lifting</li>
	</ul>
	<p>I still feel that the data systems we have in the life science domain aren&#8217;t doing enough to learn good lessons from the web world, which is embracing change, complexity and scale.  Even small teams, like the one at Metamarkets, are able to do a lot with less, thanks to the kinds of principles mentioned in the post.  One of the problems I see in informatics is a lack of appreciation for some of the skills required to build and manage these stacks, along with the availability of cheap labor.  I return to my post on <a href="http://mndoci.com/2011/02/13/abundance/" title="Abundance">abundance</a>.  It&#8217;s why I think our approach to training researchers, and how we value their time and skills, needs to be re-evaluated.  We are stuck in a rut of inefficiency because of it.</p>
	<p>The second post comes from <a href="http://www.cs.utexas.edu/~joeraii/" title="joseph reisinger">Joe Reisinger</a>. He writes about machine learning, specifically <a href="http://metamarketsgroup.com/blog/machine-learning-in-wonderland/" title="Metamarkets Blog  &raquo; Blog Archive   &raquo; Why Generic Machine Learning Fails">models for practical machine learning</a>.  I think his key point is that machine learning is not a commodity; it should remain a core strength and include humans in the loop.  This is the core of data science. It can&#8217;t just be automated and abstracted away.  You can do a lot with machines, but humans still need to figure out the hows and the whys and, in his words, &#8220;every model is a work in progress&#8221;.  I&#8217;d love to see informaticians start pushing the boundaries of statistical learning (I actually think the cheminformatics crowd does better here, or maybe I just know <a href="http://blog.rguha.net/" title="So much to do, so little time">Rajarshi</a>).  The other part I really liked was his discussion of stale data.  In genomics we have a lot of new data of all types.  How do we constantly evolve our models, at scale?  Problems to think about.</p>
	<h3>Further Reading</h3>
	<ul>
	<li><a href="http://mndoci.com/2010/06/22/the-biological-data-scientist/" title="The Biological Data Scientist">The Biological Data Scientist</a></li>
	<li><a href="http://mndoci.com/2010/07/13/data-science-roles-and-barriers/" title="Data science, roles, and barriers">Data science, roles and barriers</a></li>
	</ul>
<div class="feedflare">
</div><img src="http://feeds.feedburner.com/~r/mndoci/~4/3JkrNEoA2Ks" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://mndoci.com/2011/03/05/practical-machine-learning-and-scaling-data-platforms/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://mndoci.com/2011/03/05/practical-machine-learning-and-scaling-data-platforms/</feedburner:origLink></item>
	<media:credit role="author">Deepak Singh</media:credit><media:rating>nonadult</media:rating></channel>
</rss>

