<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">

<channel>
	<title>BioBits</title>
	
	<link>http://chrisamiller.com/science</link>
	<description>Byte-size pieces of Bioinformatics and Biology</description>
	<lastBuildDate>Wed, 01 Jun 2011 15:35:55 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.4.1</generator>
		<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.feedburner.com/Biobits" /><feedburner:info uri="biobits" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><feedburner:browserFriendly></feedburner:browserFriendly><item>
		<title>New paper and open publishing</title>
		<link>http://chrisamiller.com/science/2011/06/01/new-paper-and-open-publishing/</link>
		<comments>http://chrisamiller.com/science/2011/06/01/new-paper-and-open-publishing/#comments</comments>
		<pubDate>Wed, 01 Jun 2011 15:35:55 +0000</pubDate>
		<dc:creator>Chris</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://chrisamiller.com/science/?p=296</guid>
		<description><![CDATA[I had a paper go live last week, and you can read it here: Discovering functional modules by identifying recurrent and mutually exclusive mutational patterns in tumors The interesting thing about this journal is that it&#8217;s not only open-access, but the peer-review process is completely open. You can see the original article that we submitted, [...]]]></description>
			<content:encoded><![CDATA[<p>I had a paper go live last week, and you can read it here: <a href="http://www.biomedcentral.com/1755-8794/4/34/">Discovering functional modules by identifying recurrent and mutually exclusive mutational patterns in tumors</a></p>
<p>The interesting thing about this journal is that it&#8217;s not only open-access, but the peer-review process is completely open. You can see the original article that we submitted, the comments from the peer reviewers (and their names!) and the revisions that we made in response. </p>
<p>For non-scientists, or early grad students who have never submitted a paper, it&#8217;s an interesting look behind the curtain.</p>
<p>As someone who has published a few times, it was kind of refreshing to be able to attribute the review comments to specific authors. I think it helped in understanding where they were coming from with their criticisms and how to address them.</p>
<p>On the whole, the experience was a positive one. I feel like signing your name to a review probably makes people less likely to be dismissive (and sometimes just plain mean), and forces reviewers to justify their comments a little better.  </p>
<p>Is it the future of publishing? Only time will tell, but I&#8217;d certainly publish or review papers for such a journal again.</p>
]]></content:encoded>
			<wfw:commentRss>http://chrisamiller.com/science/2011/06/01/new-paper-and-open-publishing/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Bad Project</title>
		<link>http://chrisamiller.com/science/2011/01/21/bad-project/</link>
		<comments>http://chrisamiller.com/science/2011/01/21/bad-project/#comments</comments>
		<pubDate>Fri, 21 Jan 2011 18:22:39 +0000</pubDate>
		<dc:creator>Chris</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://chrisamiller.com/science/2011/01/21/bad-project/</guid>
		<description><![CDATA[This Lady Gaga parody brought to you by the Zheng lab here at BCM.]]></description>
			<content:encoded><![CDATA[<p><iframe title="YouTube video player" class="youtube-player" type="text/html" width="560" height="345" src="http://www.youtube.com/embed/Fl4L4M8m4d0" frameborder="0" allowFullScreen></iframe></p>
<p>This Lady Gaga parody brought to you by the Zheng lab here at BCM.</p>
]]></content:encoded>
			<wfw:commentRss>http://chrisamiller.com/science/2011/01/21/bad-project/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>PacBio Revealed</title>
		<link>http://chrisamiller.com/science/2011/01/03/pacbio-revealed/</link>
		<comments>http://chrisamiller.com/science/2011/01/03/pacbio-revealed/#comments</comments>
		<pubDate>Mon, 03 Jan 2011 21:25:47 +0000</pubDate>
		<dc:creator>Chris</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[pacbio]]></category>
		<category><![CDATA[sequencing]]></category>
		<category><![CDATA[technology]]></category>

		<guid isPermaLink="false">http://chrisamiller.com/science/?p=290</guid>
		<description><![CDATA[Oliver Elemento has done a pretty remarkable in-depth analysis of the first publicly available PacBio data. It&#8217;s all up on his blog, so jump over and read the whole thing, but here are a few of the highlights: The machine only produces about 48k reads per run. By Oliver&#8217;s reckoning, this works out to about [...]]]></description>
			<content:encoded><![CDATA[<p>Oliver Elemento has done a pretty <a href="http://oelemento.wordpress.com/2011/01/03/a-closer-look-at-the-first-pacbio-sequence-dataset/">remarkable in-depth analysis of the first publicly available PacBio data</a>. It&#8217;s all up on his blog, so jump over and read the whole thing, but here are a few of the highlights:</p>
<ul>
<li>The machine only produces about 48k reads per run. By Oliver&#8217;s reckoning, this works out to about 6,400 runs to get 10x coverage of a human genome. Ouch.</li>
<li>Single-pass sequence accuracy is remarkably low, at just over 80%. I heard rumors that PacBio had accuracy problems, but didn&#8217;t expect the error rate to be that ugly.</li>
<li>On a more positive note, read length is very high, with several runs *averaging* 2,300 bp, and overall read length averages ~850 bp.</li>
<li>Interestingly, there is a positive correlation between read length and quality. This is somewhat different from what we see from other platforms, where read length is limited by the huge drops in quality near the end of the read.</li>
</ul>
<p>The bottom line, in my mind, is that unless PacBio can solve their problems in accuracy and throughput, they&#8217;re going to be relegated to niche applications.</p>
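<p>As a rough back-of-the-envelope check on that run count (a sketch, not Oliver&#8217;s calculation): assuming a human genome of about 3.1 billion bases, which is my assumption rather than a figure from his post, the snippet below works out how many bases 10x coverage requires and how many usable bases each run would need to contribute for 6,400 runs to be enough.</p>

```ruby
# Assumed haploid human genome size in bases (roughly 3.1 Gb); not from the post.
GENOME_SIZE = 3_100_000_000
COVERAGE = 10   # target fold coverage, from the post
RUNS = 6_400    # Oliver's estimated number of runs, from the post

# Total bases that must be sequenced to reach the target coverage.
def bases_needed(genome_size, coverage)
  genome_size * coverage
end

# Bases each run would have to contribute for the quoted run count to hold.
def bases_per_run(total_bases, runs)
  total_bases / runs
end

total = bases_needed(GENOME_SIZE, COVERAGE)
puts "bases needed:  #{total}"
puts "bases per run: #{bases_per_run(total, RUNS)}"
```

<p>Under that assumption, each run would be contributing a little under 5 million usable bases.</p>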
]]></content:encoded>
			<wfw:commentRss>http://chrisamiller.com/science/2011/01/03/pacbio-revealed/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Using BioRuby to fetch citations from Pubmed</title>
		<link>http://chrisamiller.com/science/2010/12/13/using-bioruby-to-fetch-citations-from-pubmed/</link>
		<comments>http://chrisamiller.com/science/2010/12/13/using-bioruby-to-fetch-citations-from-pubmed/#comments</comments>
		<pubDate>Mon, 13 Dec 2010 06:45:12 +0000</pubDate>
		<dc:creator>Chris</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[api]]></category>
		<category><![CDATA[bioruby]]></category>
		<category><![CDATA[citation]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[ruby]]></category>

		<guid isPermaLink="false">http://chrisamiller.com/science/?p=283</guid>
		<description><![CDATA[As I write my thesis, I&#8217;m pulling together information from multiple papers that I&#8217;ve been an author on. These papers are in different journals, with wildly different citation styles. Within my thesis, though, citations need to be presented in a consistent format. Now, if I had all of these citations stored in a reference manager, [...]]]></description>
			<content:encoded><![CDATA[<p>As I write my thesis, I&#8217;m pulling together information from multiple papers that I&#8217;ve been an author on. These papers are in different journals, with wildly different citation styles. Within my thesis, though, citations need to be presented in a consistent format. </p>
<p>Now, if I had all of these citations stored in a reference manager, it would be a piece of cake to just export them all in a common format. Unfortunately, I wasn&#8217;t the first author on some of these papers, so I don&#8217;t have that data. Fortunately, with Pubmed and a little BioRuby, it&#8217;s not too difficult to get it.</p>
<p>Assuming that the paper is archived in Pubmed Central, there is a link on the right-hand side of the page titled &#8220;References for this PMC Article&#8221;.  If you click on it, then tweak the display options at the top, you can retrieve all of the PMIDs for the articles that were cited.</p>
<p>Save those as a list, one PMID per line, then use the following BioRuby code to pull down each citation in BibTeX format, for easy import into a citation manager:</p>
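<p><code><br />
#!/usr/bin/ruby</p>
<p>require "rubygems"<br />
require "bio"</p>
<p># Read one PMID per line from the file named on the command line<br />
File.foreach(ARGV[0]) { |line|<br />
&nbsp;&nbsp;id = line.strip  # drop the trailing newline<br />
&nbsp;&nbsp;next if id.empty?  # skip blank lines<br />
&nbsp;&nbsp;entry = Bio::PubMed.query(id)  # fetch the MEDLINE record for this PMID<br />
&nbsp;&nbsp;medline = Bio::MEDLINE.new(entry)<br />
&nbsp;&nbsp;reference = medline.reference<br />
&nbsp;&nbsp;puts reference.bibtex<br />
}</code></p>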
<p>I&#8217;ll spare you the story of how I started out trying to use regexen to parse through the text of the citations. I pulled together something that sort of worked, but it required a separate regex for each journal, and often returned multiple results that I had to manually disambiguate. Yuck.</p>
<p>Thanks go to <a href="http://friendfeed.com/the-life-scientists/3b3c2b06/does-anyone-know-of-reference-manager-that-can">Martin over at FriendFeed</a> for letting me know that PMC had citation info.</p>
]]></content:encoded>
			<wfw:commentRss>http://chrisamiller.com/science/2010/12/13/using-bioruby-to-fetch-citations-from-pubmed/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Bioinformatics in five years</title>
		<link>http://chrisamiller.com/science/2010/11/19/bioinformatics-in-five-years/</link>
		<comments>http://chrisamiller.com/science/2010/11/19/bioinformatics-in-five-years/#comments</comments>
		<pubDate>Fri, 19 Nov 2010 16:34:04 +0000</pubDate>
		<dc:creator>Chris</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[bioinformatics]]></category>
		<category><![CDATA[culture]]></category>
		<category><![CDATA[predilections]]></category>

		<guid isPermaLink="false">http://chrisamiller.com/science/?p=276</guid>
		<description><![CDATA[Over at BioStar, Keith asked: In five years time, how would the bioinformatics landscape be and what will probably be the main focus(es) in bioinformatics i.e the hottest areas in bioinformatics? Perhaps you&#8217;re looking for daring predictions, but I see lots of incremental progress, especially on the following fronts: If by &#8220;hottest&#8221;, you mean &#8220;number [...]]]></description>
			<content:encoded><![CDATA[<p>Over at <a href="http://biostar.stackexchange.com/questions/3782/future-directions-in-bioinformatics/">BioStar, Keith asked</a>:</p>
<blockquote><p>In five years time, how would the bioinformatics landscape be and what will probably be the main focus(es) in bioinformatics i.e the hottest areas in bioinformatics?</p></blockquote>
<p>Perhaps you&#8217;re looking for daring predictions, but I see lots of incremental progress, especially on the following fronts:</p>
<p>If by &#8220;hottest&#8221;, you mean &#8220;number of employees&#8221;, I think that there will be a large number of openings for Masters-level (or lower) bioinformatics staff. These are the folks who will handle routine munging of huge data sets at most sequencing centers. At present, a lot of that is still handled by either PhDs or grad students. As tools and standards get entrenched, though, you&#8217;ll see more and more offloaded to technical staff.</p>
<p>There&#8217;s bound to be a lot of movement in the health informatics field, building tools that can take in your personal genome sequence and spit out useful medical advice (in a format that&#8217;s useful to both patients and clinicians). This involves not only genomics skills, but also mining of the medical literature and building useful and searchable databases.</p>
<p>Though systems biology has been muted a little as the hype wears off, it&#8217;s poised to undergo a huge leap forward. With high-throughput data from tens or hundreds of thousands of cells, our models of how the cell works at a network or pathway level are only going to improve.</p>
<p>Other things that will be in demand:</p>
<p>Database and other &#8220;big data&#8221; skills &#8211; how are you going to store and access data from millions of genomes? We&#8217;re talking petabytes of information here.</p>
<p>Visualization &#8211; the larger the data gets, the less we&#8217;re able to really wrap our heads around it. A few good pictures can often tell us more than a million lines of data.</p>
<p>Truly interdisciplinary scientists. Not CS people who picked up a little bit of biology, or Bio majors who hack a little Perl. We&#8217;re going to see the first generation of scientists who have really been trained to straddle the boundary between the two. They&#8217;re going to be well-poised to not only do solid research on their own, but be the lynchpins of successful collaborations.</p>
<p>Now, if you asked me where I saw the state of genomics in 5 years, or the state of cancer research, I think I&#8217;d have much bigger, bolder predictions. I just don&#8217;t see that the basic computational and statistical skillsets that bioinformaticians use today are likely to change tremendously. They&#8217;ll just get applied to bigger data, become more parallel, and be more in demand.</p>
]]></content:encoded>
			<wfw:commentRss>http://chrisamiller.com/science/2010/11/19/bioinformatics-in-five-years/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Fun with Pubmed</title>
		<link>http://chrisamiller.com/science/2010/11/12/fun-with-pubmed/</link>
		<comments>http://chrisamiller.com/science/2010/11/12/fun-with-pubmed/#comments</comments>
		<pubDate>Fri, 12 Nov 2010 06:26:55 +0000</pubDate>
		<dc:creator>Chris</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://chrisamiller.com/science/?p=268</guid>
		<description><![CDATA[After seeing Neil&#8217;s post about increasingly novel findings in the literature, I decided to modify the code a little bit and do some searches of my own. These graphs show the number of times that each term appeared in either the title or abstract of an article listed in PubMed. I limited the results to [...]]]></description>
			<content:encoded><![CDATA[<p>After seeing <a href="http://nsaunders.wordpress.com/2010/10/30/findings-increasingly-novel-scientists-say/">Neil&#8217;s post about increasingly novel findings in the literature</a>, I decided to modify the code a little bit and do some searches of my own. These graphs show the number of times that each term appeared in either the title or abstract of an article listed in PubMed. I limited the results to the last 60 years (well, okay 59).</p>
<p>First of all, it appears that having achieved better living through chemistry, we&#8217;ve moved squarely into the age of biology.<br />
<img src="http://www.chrisamiller.com/images/pubmed/biovchem.png"></p>
<p>If I want to blend in with the crowd, I should call myself a computational biologist, not a bioinformatician:<br />
<img src="http://www.chrisamiller.com/images/pubmed/biovcomp.png"></p>
<p>Finally, it&#8217;s all composed of the same stuff, but DNA beats genome handily.<br />
<img src="http://www.chrisamiller.com/images/pubmed/genomevdna.png"></p>
<p>These were just three queries off the top of my head, and are not intended to be rigorous analyses.</p>
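<p>For anyone who wants to try similar queries, here is a minimal sketch of how per-year counts could be requested. It is not the code from Neil&#8217;s post or the script behind the graphs above; it assumes NCBI&#8217;s E-utilities esearch endpoint, and the helper name <code>count_url</code> is made up for illustration. It only builds the query URLs, so nothing is fetched.</p>

```ruby
require "uri"

# Base address of NCBI's E-utilities search endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Build a count-only query URL for one term in one publication year.
# [tiab] restricts the match to title/abstract; [dp] is the publication date.
# count_url is a hypothetical helper name, not part of any library.
def count_url(term, year)
  params = {
    db: "pubmed",
    term: "#{term}[tiab] AND #{year}[dp]",
    rettype: "count"
  }
  "#{ESEARCH}?#{URI.encode_www_form(params)}"
end

# Print one URL per year; fetching each one and reading the Count element
# of the response would give the numbers to plot.
(1951..1953).each { |year| puts count_url("biology", year) }
```

<p>Swapping in a different term, or widening the year range, is just a matter of changing the arguments. NCBI asks that scripted requests be rate-limited, so a real version would want to pause between calls.</p>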
]]></content:encoded>
			<wfw:commentRss>http://chrisamiller.com/science/2010/11/12/fun-with-pubmed/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Quick Links</title>
		<link>http://chrisamiller.com/science/2010/11/11/quick-links/</link>
		<comments>http://chrisamiller.com/science/2010/11/11/quick-links/#comments</comments>
		<pubDate>Fri, 12 Nov 2010 05:41:07 +0000</pubDate>
		<dc:creator>Chris</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://chrisamiller.com/science/?p=261</guid>
		<description><![CDATA[In a long-overdue move, the U.S. says that genes should not be eligible for patents Neil describes a clever little hack that lets you add an index.html file to a Dropbox public folder In astronomy news, This picture of Mimas is pretty jaw-dropping. I think I have a new desktop image.]]></description>
			<content:encoded><![CDATA[<ul>
<li>In a long-overdue move, the <a href="http://www.nytimes.com/2010/10/30/business/30drug.html?_r=1&#038;adxnnl=1&#038;adxnnlx=1288918838-n/MCZuT1OzGG3JUp9Pngdg">U.S. says that genes should not be eligible for patents</a>
</li>
<li>Neil describes a clever little hack that lets you <a href="http://nsaunders.wordpress.com/2010/11/09/a-quick-bash-tip-add-an-index-html-file-to-a-dropbox-public-folder/">add an index.html file to a Dropbox public folder</a></li>
<li>In astronomy news, <a href="http://blogs.discovermagazine.com/badastronomy/2010/11/08/the-moon-that-almost-wasnt/">This picture of Mimas</a> is pretty jaw-dropping. I think I have a new desktop image.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://chrisamiller.com/science/2010/11/11/quick-links/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Using Impact Factor is the lazy way out</title>
		<link>http://chrisamiller.com/science/2010/11/10/using-impact-factor-is-the-lazy-way-out/</link>
		<comments>http://chrisamiller.com/science/2010/11/10/using-impact-factor-is-the-lazy-way-out/#comments</comments>
		<pubDate>Wed, 10 Nov 2010 23:09:19 +0000</pubDate>
		<dc:creator>Chris</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://chrisamiller.com/science/?p=263</guid>
		<description><![CDATA[I help moderate our graduate program&#8217;s weekly journal club, and some of the faculty involved are proposing strict guidelines on which journals can be used. Specifically, they&#8217;d like to restrict it to only journals that have an Impact Factor of ten or higher. I think that&#8217;s a horrible idea, so I responded thusly: I oppose [...]]]></description>
			<content:encoded><![CDATA[<p>I help moderate our graduate program&#8217;s weekly journal club, and some of the faculty involved are proposing strict guidelines on which journals can be used. Specifically, they&#8217;d like to restrict it to only journals that have an Impact Factor of ten or higher.  I think that&#8217;s a horrible idea, so I responded thusly: </p>
<p>I oppose excluding papers based on impact factor, because it&#8217;s a seriously flawed metric.  As an example, the journal  &#8220;Acta Crystallographica A&#8221; has a current impact factor of 49.93.<a href="http://www.nature.com/nature/journal/v466/n7303/full/466179b.html">[1]</a> For 72 articles published in 2008, 71 garnered no more than three citations, while a single article that racked up 5,624 citations skewed the metric.</p>
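<p>To see how badly one outlier distorts an average, here is a small illustrative sketch. The per-article counts are hypothetical: I simply assume all 71 ordinary articles sit at the ceiling of three citations each. It also does not reproduce the official Impact Factor calculation, which uses a two-year window; it only compares the mean with the median for a distribution shaped like the one described.</p>

```ruby
# Hypothetical citation counts: 71 articles at the stated ceiling of three
# citations each (an assumption), plus the single outlier from the text.
citations = Array.new(71, 3) + [5624]

# Arithmetic mean, the kind of average an impact factor is built on.
def mean(values)
  values.sum.to_f / values.size
end

# Median: the middle value of the sorted list (mean of the two middle
# values when the list has an even length).
def median(values)
  sorted = values.sort
  mid = sorted.size / 2
  sorted.size.odd? ? sorted[mid].to_f : (sorted[mid - 1] + sorted[mid]) / 2.0
end

puts "mean:   #{mean(citations).round(2)}"
puts "median: #{median(citations)}"
```

<p>With these made-up numbers the mean comes out around 81 while the median stays at 3, which is the whole problem in miniature: the average describes almost none of the papers.</p>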
<p>On the other hand, a new breed of journals (like <a href="http://www.plosone.org/">PLoS One</a>) publish a huge number of papers and rely on post-publication statistics to measure impact. This inflates the denominator of the metric and leads to a low IF, even though there are certainly some fine papers in that journal.</p>
<p>As further evidence, a 2009 paper performed a Principal Component Analysis of 39 different metrics of scholarly impact.<a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0006022">[2]</a> They concluded that Impact Factor is positioned at the periphery of these rankings, which should lead us to carefully consider how much we rely upon it.</p>
<p>In short, I don&#8217;t think we as moderators should take the lazy way out and allow only so-called &#8220;top-tier&#8221; journals. Surely, out of all of us, at least one or two can spare 5 minutes each week to skim the article and judge it on its merits.</p>
]]></content:encoded>
			<wfw:commentRss>http://chrisamiller.com/science/2010/11/10/using-impact-factor-is-the-lazy-way-out/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>QOTD</title>
		<link>http://chrisamiller.com/science/2010/10/27/qotd-4/</link>
		<comments>http://chrisamiller.com/science/2010/10/27/qotd-4/#comments</comments>
		<pubDate>Thu, 28 Oct 2010 03:34:54 +0000</pubDate>
		<dc:creator>Chris</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://chrisamiller.com/science/?p=254</guid>
		<description><![CDATA[Q: Is Ozzy the first rock star to have his full genome sequenced? Conde: Yes, as far as I know. I can definitely tell you he&#8217;s the first prince of darkness to have his genome sequenced and analyzed. &#8211; Jorge Conde, in SciAm This came through my feed reader today and I had to laugh. [...]]]></description>
			<content:encoded><![CDATA[<blockquote><p>Q: Is Ozzy the first rock star to have his full genome sequenced?</p>
<p>Conde: Yes, as far as I know. I can definitely tell you he&#8217;s the first prince of darkness to have his genome sequenced and analyzed.</p>
<p>&#8211; Jorge Conde, <a href="http://www.scientificamerican.com/article.cfm?id=ozzy-osbourne-genome">in SciAm</a></p></blockquote>
<p>This came through my feed reader today and I had to laugh. As we might expect, they didn&#8217;t find all that much that was interesting in Ozzy&#8217;s genome, but it sure was great publicity for Knome and Cofactor. </p>
<p>With the announcement of the <a href="http://www.personalgenomes.org/pgp1k.html">PGP1000</a> and the recent release from <a href="http://www.nature.com/nature/journal/v467/n7319/full/nature09534.html">the 1000 Genomes Project</a>, the number of personal genomes has now easily reached several thousand. When I started graduate school 5 years ago, a single genome cost well over a million dollars and sequencing one guaranteed you the cover of Nature or Science. Now it&#8217;s just something that moderately wealthy people do on a lark.</p>
]]></content:encoded>
			<wfw:commentRss>http://chrisamiller.com/science/2010/10/27/qotd-4/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Programming Note</title>
		<link>http://chrisamiller.com/science/2010/10/26/programming-note/</link>
		<comments>http://chrisamiller.com/science/2010/10/26/programming-note/#comments</comments>
		<pubDate>Tue, 26 Oct 2010 14:54:06 +0000</pubDate>
		<dc:creator>Chris</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[codedump]]></category>

		<guid isPermaLink="false">http://chrisamiller.com/science/?p=252</guid>
		<description><![CDATA[Just a quick update to say that the codedump project is still happening. I&#8217;ve just been sidetracked for a week or two with paper revisions and some experiments that had to be done ASAP. I&#8217;m carving a little time out over the next few days to work on it, though.]]></description>
			<content:encoded><![CDATA[<p>Just a quick update to say that the <a href="http://chrisamiller.com/science/2010/09/20/dumping-my-code-the-good-the-bad-and-the-ugly/">codedump project</a> is still happening. I&#8217;ve just been sidetracked for a week or two with paper revisions and some experiments that had to be done ASAP. I&#8217;m carving a little time out over the next few days to work on it, though.</p>
]]></content:encoded>
			<wfw:commentRss>http://chrisamiller.com/science/2010/10/26/programming-note/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
