<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">

<channel>
	<title>MassGenomics</title>
	
	<link>http://massgenomics.org</link>
	<description>Medical genomics in the post-genome era</description>
	<lastBuildDate>Thu, 17 May 2012 16:09:38 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.feedburner.com/Massgenomics" /><feedburner:info uri="massgenomics" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><item>
		<title>Genomics, Open Access, and China</title>
		<link>http://feedproxy.google.com/~r/Massgenomics/~3/-UG6n73bWAg/genomics-open-access-and-china.html</link>
		<comments>http://massgenomics.org/2012/05/genomics-open-access-and-china.html#comments</comments>
		<pubDate>Thu, 17 May 2012 15:26:29 +0000</pubDate>
		<dc:creator>Dan Koboldt</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://massgenomics.org/?p=1758</guid>
		<description><![CDATA[The associate editor of the journal Genomics has resigned, stating that he can no longer work for a system that puts profit over access to research. In an article in The Guardian, Winston Hide announced his resignation from a &#8220;system that provides solid profits for the publisher while effectively denying colleagues in developing countries access to [...]]]></description>
			<content:encoded><![CDATA[<p>The associate editor of the journal <em>Genomics</em> has resigned, stating that he can no longer work for a system that puts profit over access to research. In an article in The Guardian, <strong><a title="Winston Hide Resignation in the Guardian" href="http://www.guardian.co.uk/science/blog/2012/may/16/system-profit-access-research" target="_blank">Winston Hide announced his resignation</a></strong> from a &#8220;system that provides solid profits for the publisher while effectively denying colleagues in developing countries access to research findings.&#8221;</p>
<p>The system he&#8217;s speaking of is <strong><a title="Elsevier" href="http://www.elsevier.com" target="_blank">Elsevier</a></strong>, one of the largest publishers in the world. Headquartered in Amsterdam, this company publishes around two thousand journals, in addition to 20,000 books and reference works. They work with 7,000 journal editors, 70,000 editorial board members, and over half a million authors.</p>
<h2>Research Access in Developing Nations</h2>
<p><a href="http://www.guardian.co.uk/science/blog/2012/may/16/system-profit-access-research"><img class="aligncenter size-full wp-image-1762" style="border: 1px solid black;" title="Genomics-Editor-Quote" src="http://massgenomics.org/wp-content/uploads/2012/05/Genomics-Editor-Quote.jpg" alt="Genomics Editor Winston Hide Quote" width="441" height="57" /></a></p>
<p>In his article, Hide is not openly critical of his former employer. Instead, he highlights the problem of performing research in developing countries where access to for-profit journals is scarce. Having worked for 10 years at a university in South Africa, he notes that <strong>researchers are often forced to devote big chunks of their budgets to purchasing a single subscription</strong>. When this isn&#8217;t possible, their mode of operation is the e-mail request, &#8220;Will you please send me a PDF?&#8221;</p>
<p>It is obviously a significant challenge for researchers in developing nations to perform competitive research without access to most of the literature.</p>
<h2>Elsevier&#8217;s Response to Criticism</h2>
<p>Elsevier defended its position in February this year with its <strong><a title="Elsevier Open Letter to Research Community" href="http://www.elsevier.com/wps/find/intro.cws_home/elsevieropenletter" target="_blank">open letter to the research community</a>. </strong>This was in response to an online petition that was &#8220;putting forward some serious negative judgments about Elsevier.&#8221; In their letter, the publisher responds on a few points, noting that:</p>
<ol>
<li>The cost of downloading articles has never been cheaper than it is today, approximately one-fifth of what it was 10 years ago. But what (besides oil) hasn&#8217;t seen a cost reduction thanks to the Internet?</li>
<li>Libraries are never forced to take bundled packages; they have a number of subscription options, from single-article to journal to collection. Most that opt for larger packages, Elsevier claims, do so to get bigger discounts. This is a fancy way of saying that hey, there are lots of ways that you can pay us for access.</li>
<li>While committed to the <em>principle</em> that the public should have access to the output of publicly funded research, Elsevier opposes &#8220;potentially harmful&#8221; government laws that could undermine the sustainability of the peer-review publishing system. In other words, &#8220;We agree with your idea, but don&#8217;t make us go along with it.&#8221;</li>
</ol>
<p>There are a few tidbits in Elsevier&#8217;s letter that I find interesting. First, they tout their eight open-access journals as evidence of their commitment to research access &#8211; but <strong>that&#8217;s 8 out of 2,000 journals, or 0.4%</strong>. Second, they write that &#8220;Being criticized by even one researcher, let alone all the signatories of the petition, is difficult for a company whose reason for being is to serve the research community.&#8221;</p>
<p>I thought that their reason for being was to make money.</p>
<h2>Genomics, China, and the Future</h2>
<p>On Twitter, Nick Loman (<a title="Nick Loman on Twitter" href="https://twitter.com/#!/pathogenomenick" target="_blank">@pathogenomenick</a>) quipped that the most surprising part of the article was that there&#8217;s a journal named Genomics. I admit this journal gets a bit of lenience from me because it&#8217;s the first place where my name appeared on an authorship list in the scientific literature: way back in 2005, when members of the SNP Consortium led by my former P.I., Raymond D. Miller, published a pre-HapMap <strong><a title="High-density SNP map of the human genome" href="http://www.ncbi.nlm.nih.gov/pubmed/15961272" target="_blank">high-density SNP map of the human genome</a></strong>.</p>
<p>Indeed, <em>Genomics</em> has fallen somewhat from prominence; the articles seem to have narrower interest every year. Basically, it&#8217;s the place you can publish your obscure model organism&#8217;s draft genome. Intriguingly, Hide mentions in his article that <span style="color: #800000;"><strong>the majority of manuscripts he saw at <em>Genomics</em> now come from China</strong></span>. Are these submissions that have been turned down elsewhere? Or, and I think this may be more likely, are most biomedical journals being gradually overwhelmed with submissions from the Middle Kingdom?</p>
<p>A colleague of mine said that Hide&#8217;s actions probably won&#8217;t change anything. <strong>Big companies like Elsevier are in business to make money</strong>. The concept of open-access, while a noble idea and lauded by many, has yet to really take hold in the research community. There must be some middle ground, though. It seems foolish that the U.S. and other western nations donate billions of dollars in foreign aid to the developing world, yet our massive corporations seem unable or unwilling to offer any special treatment to the struggling researchers who work there.</p>
<p>They need to take a lesson from Google: it is possible to make money without doing evil.</p>
<p><strong><span style="text-decoration: underline;">References</span></strong></p>
<p><span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&amp;rft.jtitle=Genomics&amp;rft_id=info%3Apmid%2F15961272&amp;rfr_id=info%3Asid%2Fresearchblogging.org&amp;rft.atitle=High-density+single-nucleotide+polymorphism+maps+of+the+human+genome.&amp;rft.issn=0888-7543&amp;rft.date=2005&amp;rft.volume=86&amp;rft.issue=2&amp;rft.spage=117&amp;rft.epage=26&amp;rft.artnum=&amp;rft.au=Miller+RD&amp;rft.au=Phillips+MS&amp;rft.au=Jo+I&amp;rft.au=Donaldson+MA&amp;rft.au=Studebaker+JF&amp;rft.au=Addleman+N&amp;rft.au=Alfisi+SV&amp;rft.au=Ankener+WM&amp;rft.au=Bhatti+HA&amp;rft.au=Callahan+CE&amp;rft.au=Carey+BJ&amp;rft.au=Conley+CL&amp;rft.au=Cyr+JM&amp;rft.au=Derohannessian+V&amp;rft.au=Donaldson+RA&amp;rft.au=Elosua+C&amp;rft.au=Ford+SE&amp;rft.au=Forman+AM&amp;rft.au=Gelfand+CA&amp;rft.au=Grecco+NM&amp;rft.au=Gutendorf+SM&amp;rft.au=Hock+CR&amp;rft.au=Hozza+MJ&amp;rft.au=Hur+S&amp;rft.au=In+SM&amp;rft.au=Jackson+DL&amp;rft.au=Jo+SA&amp;rft.au=Jung+SC&amp;rft.au=Kim+S&amp;rft.au=Kimm+K&amp;rft.au=Kloss+EF&amp;rft.au=Koboldt+DC&amp;rft.au=Kuebler+JM&amp;rft.au=Kuo+FS&amp;rft.au=Lathrop+JA&amp;rft.au=Lee+JK&amp;rft.au=Leis+KL&amp;rft.au=Livingston+SA&amp;rft.au=Lovins+EG&amp;rft.au=Lundy+ML&amp;rft.au=Maggan+S&amp;rft.au=Minton+M&amp;rft.au=Mockler+MA&amp;rft.au=Morris+DW&amp;rft.au=Nachtman+EP&amp;rft.au=Oh+B&amp;rft.au=Park+C&amp;rft.au=Park+CW&amp;rft.au=Pavelka+N&amp;rft.au=Perkins+AB&amp;rft.au=Restine+SL&amp;rft.au=Sachidanandam+R&amp;rft.au=Reinhart+AJ&amp;rft.au=Scott+KE&amp;rft.au=Shah+GJ&amp;rft.au=Tate+JM&amp;rft.au=Varde+SA&amp;rft.au=Walters+A&amp;rft.au=White+JR&amp;rft.au=Yoo+YK&amp;rft.au=Lee+JE&amp;rft.au=Boyce-Jacino+MT&amp;rft.au=Kwok+PY&amp;rft.au=SNP+Consortium+Allele+Frequency+Project&amp;rfe_dat=bpr3.included=1;bpr3.tags=Biology%2CGenetics%2C+Bioinformatics">Miller RD, Phillips MS, Jo I, Donaldson MA, Studebaker JF, Addleman N, Alfisi SV, Ankener WM, Bhatti HA, Callahan CE, Carey BJ, Conley CL, Cyr JM, 
Derohannessian V, Donaldson RA, Elosua C, Ford SE, Forman AM, Gelfand CA, Grecco NM, Gutendorf SM, Hock CR, Hozza MJ, Hur S, In SM, Jackson DL, Jo SA, Jung SC, Kim S, Kimm K, Kloss EF, Koboldt DC, Kuebler JM, Kuo FS, Lathrop JA, Lee JK, Leis KL, Livingston SA, Lovins EG, Lundy ML, Maggan S, Minton M, Mockler MA, Morris DW, Nachtman EP, Oh B, Park C, Park CW, Pavelka N, Perkins AB, Restine SL, Sachidanandam R, Reinhart AJ, Scott KE, Shah GJ, Tate JM, Varde SA, Walters A, White JR, Yoo YK, Lee JE, Boyce-Jacino MT, Kwok PY, &amp; SNP Consortium Allele Frequency Project (2005). High-density single-nucleotide polymorphism maps of the human genome. <span style="font-style: italic;">Genomics, 86</span> (2), 117-26 PMID: <a href="http://www.ncbi.nlm.nih.gov/pubmed/15961272" rev="review">15961272</a></span></p>
]]></content:encoded>
			<wfw:commentRss>http://massgenomics.org/2012/05/genomics-open-access-and-china.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://massgenomics.org/2012/05/genomics-open-access-and-china.html</feedburner:origLink></item>
		<item>
		<title>Integrating copy number and gene expression data in breast cancer</title>
		<link>http://feedproxy.google.com/~r/Massgenomics/~3/DWxdtT9_Xzk/integrating-copy-number-and-gene-expression-data-in-breast-cancer.html</link>
		<comments>http://massgenomics.org/2012/05/integrating-copy-number-and-gene-expression-data-in-breast-cancer.html#comments</comments>
		<pubDate>Fri, 11 May 2012 16:21:24 +0000</pubDate>
		<dc:creator>Dan Koboldt</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://massgenomics.org/?p=1749</guid>
		<description><![CDATA[A study in Nature reports the genomic and transcriptomic architecture of breast cancer from a survey of ~2,000 tumors.  These samples were collected in Canada and the UK; what makes the collection particularly valuable is that they were fresh-frozen and clinically annotated, with long-term follow-up. Patients whose tumors were ER-negative and/or lymph-node-positive had received systematic [...]]]></description>
			<content:encoded><![CDATA[<p>A study in <em>Nature</em> reports the <strong><a title="Genomic and transcriptomic architecture of breast cancer" href="http://www.nature.com/nature/journal/vaop/ncurrent/full/nature10983.html" target="_blank">genomic and transcriptomic architecture of breast cancer</a></strong> from a survey of ~2,000 tumors. These samples were collected in Canada and the UK; what makes the collection particularly valuable is that they were fresh-frozen and clinically annotated, with long-term follow-up. Patients whose tumors were ER-negative and/or lymph-node-positive had received systematic chemotherapy, ER-positive or LN-negative patients had not, and none of the patients with Her2+ tumors received Herceptin (trastuzumab). Thus, the <strong>tumors were all clinically homogeneous within subgroups</strong>, making this a great resource to study the genomic landscape of breast cancer.</p>
<h2>Breast Cancer Subtypes</h2>
<p>A quick overview of breast cancer subtypes seems appropriate here. Most breast cancers are <em>carcinomas</em>, meaning that they arise from epithelial cells. A histology review typically classifies these as originating from the milk-passage (ductal) or milk-producing glands (lobules) of the breast. Tumors can also be assigned to subgroups on the basis of gene expression: <span style="color: #008000;"><strong>a 50-gene assay called PAM50 is widely used to classify tumors as one of 4-5 &#8220;intrinsic&#8221; subtypes</strong></span>. Among the most important genes from a clinical perspective are those encoding estrogen receptor (ER), progesterone receptor (PR), and Her2 (<em>ERBB2</em>) receptor. The four most common intrinsic subtypes:</p>
<table border="1" cellspacing="0" cellpadding="2" align="center">
<tbody>
<tr>
<td><strong>Subtype</strong></td>
<td><strong>Typical ER/PR/Her2 Status</strong></td>
<td><strong>Prevalence</strong></td>
<td><strong>Notes</strong></td>
</tr>
<tr>
<td>Luminal A</td>
<td>ER+ and/or PR+, Her2-</td>
<td>42-59%</td>
<td>Most common and best prognosis</td>
</tr>
<tr>
<td>Luminal B</td>
<td>ER+ and/or PR+, Her2+</td>
<td>6-19%</td>
<td>Slightly worse prognosis</td>
</tr>
<tr>
<td>Her2-enriched</td>
<td>ER-, PR-, Her2+</td>
<td>14-20%</td>
<td>Often poor prognosis</td>
</tr>
<tr>
<td>Basal-like/Triple-negative</td>
<td>ER-, PR-, Her2-</td>
<td>7-12%</td>
<td>Often aggressive, poorer prognosis</td>
</tr>
<tr>
<td colspan="4">Source: <a title="Susan G. Komen Foundation" href="http://ww5.komen.org/BreastCancer/SubtypesofBreastCancer.html" target="_blank">Susan G. Komen Foundation</a></td>
</tr>
</tbody>
</table>
<p>There is substantial but incomplete <a title="Overlap between triple-negative and basal-like breast cancer" href="http://massgenomics.org/2012/04/the-genetic-architecture-of-triple-negative-breast-cancer.html" target="_blank">overlap between basal-like and triple-negative breast cancer.</a> Their genetic basis is not as well understood, and they typically don&#8217;t respond to targeted hormone therapies because they don&#8217;t express ER, PR, or Her2.</p>
<h2>Integrating SNP and Copy Number Data with Gene Expression</h2>
<p>In this study, the authors assessed the impact of SNPs, inherited copy number variants (CNVs), and acquired copy number alterations (CNAs) on the gene expression landscape. With the statistical power of 2,000 samples (half in a discovery set, half in a validation set), they were able to search for both cis-regulatory (variants affecting nearby genes) and trans-regulatory (variants affecting distant genes) relationships. Genome-wide analysis of variance (ANOVA) revealed that <span style="color: #ff0000;"><strong>germline SNPs/CNVs and somatic CNAs influenced &gt;39% of gene expression probes</strong></span>, roughly half acting in cis and half in trans.</p>
<ul>
<li><strong>Somatic CNAs dominated the regulatory picture</strong>, contributing to &gt;96% of significant expression associations.</li>
<li>On a gene-by-gene basis, <strong>germline SNPs rivaled CNAs</strong> in the <em>proportion</em> of expression variation they explained.</li>
<li>The contribution of inherited CNVs was minimal by comparison.</li>
</ul>
<p>Although the dominating influence of somatic CNAs is understandable, the relatively small contribution of CNVs to the expression picture is rather surprising. It&#8217;s possible that inherited regions of CNV with strong influence on gene expression are targeted for amplification/deletion by cancer cells, which might obscure their effect in an otherwise normal cell. Otherwise, it does seem to suggest that <strong>germline SNPs have a greater influence than CNVs</strong> when it comes to modulating gene expression.</p>
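<p>For readers who want a feel for the ANOVA test mentioned above, here is a minimal sketch of the textbook one-way F statistic. The post does not spell out the authors' exact model, so treat this as an illustration only: the function name and the idea of grouping one probe's expression values by copy number state (loss, neutral, gain) are assumptions of the example.</p>

```python
def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of sample groups.

    Each group is a list of expression values for one copy number state
    (a hypothetical grouping chosen for illustration). The within-group
    variance must be non-zero, otherwise the ratio is undefined.
    """
    all_values = [value for group in groups for value in group]
    grand_mean = sum(all_values) / len(all_values)

    # Variation of the group means around the grand mean.
    between = sum(
        len(group) * (sum(group) / len(group) - grand_mean) ** 2
        for group in groups
    )
    # Variation of individual values around their own group mean.
    within = sum(
        (value - sum(group) / len(group)) ** 2
        for group in groups
        for value in group
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (between / df_between) / (within / df_within)
```

<p>A large F value means that expression differs more between copy number states than it varies within them, which is the kind of signal that would flag a probe as CNA-associated.</p>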
<h2><em>Cis</em> versus <em>Trans</em> Regulation</h2>
<p>Roughly 20% of loci examined exhibited <em>cis</em>-regulatory associations between somatic CNAs and gene expression. In other words, <strong>acquired copy number alterations influence the expression of genes within them or nearby</strong>. The authors undertook a higher-resolution survey of these associations within tumor subtypes, finding known driver events, such as amplifications of <em>MYC</em>, <em>CCND1</em>, <em>ERBB2</em>, and <em>CCNE1</em> and deletions of <em>PTEN</em> and <em>MDM2</em>, as well as putative but suggestive events involving <em>MDM1, MDM4, CDK3, CDK4, PI4KB, NCOR1</em>, and others. They also highlight three apparently novel <em>cis</em>-regulatory associations that may influence breast cancer development and progression:</p>
<ol>
<li>Loss of <em>PPP2R2A</em>, a regulatory sub-unit of a complex that governs mitotic exit. Somatic mutations in another subunit of the same complex (PPP2R1A) were recently identified in clear cell ovarian cancers and endometrioid cancers.</li>
<li>Frequent deletion of <em>MTAP</em> that co-occurs with deletion of known tumor suppressors <em>CDKN2A</em> and <em>CDKN2B</em>.</li>
<li>Recurrent deletion of <em>MAP2K4</em> concomitant with outlying expression in ER-positive cases.</li>
</ol>
<p>To examine trans-regulatory events, the authors plotted matrices of CNA-expression relationships by chromosome (gene location on the Y-axis, CNA location on the X-axis). Visualized in this manner, any patterns off of the diagonal (where a CNA influences a gene on the same chromosome) indicate a trans-acting event. There was strong evidence of such patterns on chromosomes 1q, 7p, 8, 11q, 14q, 16, 17q, and 20q, all of which are the targets of frequent large-scale copy number alteration in breast cancer.</p>
<p>The &#8220;hotspots&#8221; of these <em>trans</em> associations, when grouped by pathway, highlight known targets of dysregulation in breast cancer such as <em>ERBB2</em> and <em>MYC</em>. You might notice that these two also appeared in the <em>cis</em>-regulatory association list above, and make the intuitive leap to conclude that amplifications targeting <em>ERBB2</em> (on chr17) and <em>MYC</em> (on chr8) increase the expression of these genes, which in turn drives expression changes for genes elsewhere in the genome.</p>
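<p>The chromosome-by-chromosome matrix described above is easy to mock up. The sketch below is only an illustration of the idea, not the authors' pipeline: the function names are mine, and each association is reduced to a pair of chromosome labels (CNA chromosome, gene chromosome).</p>

```python
from collections import Counter


def chromosome_matrix(associations):
    """Count associations per (CNA chromosome, gene chromosome) cell."""
    return Counter(
        (cna_chrom, gene_chrom) for cna_chrom, gene_chrom in associations
    )


def off_diagonal(matrix):
    """Keep only cells where the CNA and the gene are on different chromosomes."""
    return {
        cell: count for cell, count in matrix.items() if cell[0] != cell[1]
    }


if __name__ == "__main__":
    # Made-up pairs for illustration only; these are not data from the study.
    example = [("17", "17"), ("17", "1"), ("8", "8"), ("8", "11"), ("17", "1")]
    matrix = chromosome_matrix(example)
    print(off_diagonal(matrix))
```

<p>Cells on the diagonal correspond to same-chromosome relationships; anything that survives <code>off_diagonal</code> is a candidate <em>trans</em> association under this simplified, chromosome-level definition.</p>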
<h2>Integrative Clustering Reveals Novel Subgroups</h2>
<p>The authors next took 997 tumors in the discovery set, integrated copy number and gene expression data, and performed clustering analyses to identify subgroups of tumors with distinct features and clinical outcomes. They came up with 10 &#8220;integrative clusters&#8221;, which they replicated in the validation set (995 cases). Among these clusters are some interesting subsets:</p>
<ul>
<li>A high-risk, ER-positive subgroup with a steep mortality trajectory (bad), composed of 11q13/14 cis-acting luminal tumors that harbor other common alterations. The authors note that 11q13 contains the <em>CCND1</em> gene, frequently targeted for amplification in breast cancer. This is an important exception to the often favorable prognosis for ER+ tumors.</li>
<li>A subgroup of predominantly luminal A cases with low genomic instability that was enriched for histology types with good prognoses (e.g. lobular and tubular carcinomas).</li>
<li>Another subgroup with favorable prognosis, but containing a mixture of ER statuses and subtypes. Their common feature was a nearly flat copy number landscape. The authors note that this &#8220;CNA-devoid&#8221; subgroup is <strong>&#8220;ripe for mutational profiling.&#8221;</strong></li>
<li>A stable, mostly high-genomic-instability subgroup comprising nearly all basal-like tumors with good long-term outcomes.</li>
<li>A group of Her2-enriched and ER-positive tumors with <em>ERBB2</em> amplification. These patients were all enrolled before Herceptin (trastuzumab) became available, and had the worst disease-specific survival.</li>
</ul>
<p>These findings demonstrate how useful it is to construct a cohort, not just of many cases, but with long-term follow-up so that researchers can link the genomic architecture of tumors to the eventual death or survival of the patients.</p>
<p><span style="text-decoration: underline;"><strong>References</strong></span></p>
<p><span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&amp;rft.jtitle=Nature&amp;rft_id=info%3Adoi%2F10.1038%2Fnature10983&amp;rfr_id=info%3Asid%2Fresearchblogging.org&amp;rft.atitle=The+genomic+and+transcriptomic+architecture+of+2%2C000+breast+tumours+reveals+novel+subgroups&amp;rft.issn=0028-0836&amp;rft.date=2012&amp;rft.volume=&amp;rft.issue=&amp;rft.spage=&amp;rft.epage=&amp;rft.artnum=http%3A%2F%2Fwww.nature.com%2Fdoifinder%2F10.1038%2Fnature10983&amp;rft.au=Curtis%2C+C.&amp;rft.au=Shah%2C+S.&amp;rft.au=Chin%2C+S.&amp;rft.au=Turashvili%2C+G.&amp;rft.au=Rueda%2C+O.&amp;rft.au=Dunning%2C+M.&amp;rft.au=Speed%2C+D.&amp;rft.au=Lynch%2C+A.&amp;rft.au=Samarajiwa%2C+S.&amp;rft.au=Yuan%2C+Y.&amp;rft.au=Gr%C3%A4f%2C+S.&amp;rft.au=Ha%2C+G.&amp;rft.au=Haffari%2C+G.&amp;rft.au=Bashashati%2C+A.&amp;rft.au=Russell%2C+R.&amp;rft.au=McKinney%2C+S.&amp;rft.au=Caldas%2C+C.&amp;rft.au=Aparicio%2C+S.&amp;rft.au=Curtis%E2%80%A0%2C+C.&amp;rft.au=Shah%2C+S.&amp;rft.au=Caldas%2C+C.&amp;rft.au=Aparicio%2C+S.&amp;rft.au=Brenton%2C+J.&amp;rft.au=Ellis%2C+I.&amp;rft.au=Huntsman%2C+D.&amp;rft.au=Pinder%2C+S.&amp;rft.au=Purushotham%2C+A.&amp;rft.au=Murphy%2C+L.&amp;rft.au=Caldas%2C+C.&amp;rft.au=Aparicio%2C+S.&amp;rft.au=Caldas%2C+C.&amp;rft.au=Bardwell%2C+H.&amp;rft.au=Chin%2C+S.&amp;rft.au=Curtis%2C+C.&amp;rft.au=Ding%2C+Z.&amp;rft.au=Gr%C3%A4f%2C+S.&amp;rft.au=Jones%2C+L.&amp;rft.au=Liu%2C+B.&amp;rft.au=Lynch%2C+A.&amp;rft.au=Papatheodorou%2C+I.&amp;rft.au=Sammut%2C+S.&amp;rft.au=Wishart%2C+G.&amp;rft.au=Aparicio%2C+S.&amp;rft.au=Chia%2C+S.&amp;rft.au=Gelmon%2C+K.&amp;rft.au=Huntsman%2C+D.&amp;rft.au=McKinney%2C+S.&amp;rft.au=Speers%2C+C.&amp;rft.au=Turashvili%2C+G.&amp;rft.au=Watson%2C+P.&amp;rft.au=Ellis%2C+I.&amp;rft.au=Blamey%2C+R.&amp;rft.au=Green%2C+A.&amp;rft.au=Macmillan%2C+D.&amp;rft.au=Rakha%2C+E.&amp;rft.au=Purushotham%2C+A.&amp;rft.au=Gillett%2C+C.&amp;rft.au=Grigoriadis%2C+A.&amp;rft.au=Pinder%2C+S.&amp;rft.au=di+Rinaldis%2C+
E.&amp;rft.au=Tutt%2C+A.&amp;rft.au=Murphy%2C+L.&amp;rft.au=Parisien%2C+M.&amp;rft.au=Troup%2C+S.&amp;rft.au=Caldas%2C+C.&amp;rft.au=Chin%2C+S.&amp;rft.au=Chan%2C+D.&amp;rft.au=Fielding%2C+C.&amp;rft.au=Maia%2C+A.&amp;rft.au=McGuire%2C+S.&amp;rft.au=Osborne%2C+M.&amp;rft.au=Sayalero%2C+S.&amp;rft.au=Spiteri%2C+I.&amp;rft.au=Hadfield%2C+J.&amp;rft.au=Aparicio%2C+S.&amp;rft.au=Turashvili%2C+G.&amp;rft.au=Bell%2C+L.&amp;rft.au=Chow%2C+K.&amp;rft.au=Gale%2C+N.&amp;rft.au=Huntsman%2C+D.&amp;rft.au=Kovalik%2C+M.&amp;rft.au=Ng%2C+Y.&amp;rft.au=Prentice%2C+L.&amp;rft.au=Caldas%2C+C.&amp;rft.au=Tavar%C3%A9%2C+S.&amp;rft.au=Curtis%2C+C.&amp;rft.au=Dunning%2C+M.&amp;rft.au=Gr%C3%A4f%2C+S.&amp;rft.au=Lynch%2C+A.&amp;rft.au=Rueda%2C+O.&amp;rft.au=Russell%2C+R.&amp;rft.au=Samarajiwa%2C+S.&amp;rft.au=Speed%2C+D.&amp;rft.au=Markowetz%2C+F.&amp;rft.au=Yuan%2C+Y.&amp;rft.au=Brenton%2C+J.&amp;rft.au=Aparicio%2C+S.&amp;rft.au=Shah%2C+S.&amp;rft.au=Bashashati%2C+A.&amp;rft.au=Ha%2C+G.&amp;rft.au=Haffari%2C+G.&amp;rft.au=McKinney%2C+S.&amp;rft.au=Langer%C3%B8d%2C+A.&amp;rft.au=Green%2C+A.&amp;rft.au=Provenzano%2C+E.&amp;rft.au=Wishart%2C+G.&amp;rft.au=Pinder%2C+S.&amp;rft.au=Watson%2C+P.&amp;rft.au=Markowetz%2C+F.&amp;rft.au=Murphy%2C+L.&amp;rft.au=Ellis%2C+I.&amp;rft.au=Purushotham%2C+A.&amp;rft.au=B%C3%B8rresen-Dale%2C+A.&amp;rft.au=Brenton%2C+J.&amp;rft.au=Tavar%C3%A9%2C+S.&amp;rft.au=Caldas%2C+C.&amp;rft.au=Aparicio%2C+S.&amp;rfe_dat=bpr3.included=1;bpr3.tags=Biology%2CMedicine%2CGenetics%2C+Bioinformatics%2C+Cancer">Curtis, C., Shah, S., Chin, S., et al. (2012). The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups <span style="font-style: italic;">Nature</span> DOI: <a href="http://dx.doi.org/10.1038/nature10983" rev="review">10.1038/nature10983</a></span></p>
]]></content:encoded>
			<wfw:commentRss>http://massgenomics.org/2012/05/integrating-copy-number-and-gene-expression-data-in-breast-cancer.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://massgenomics.org/2012/05/integrating-copy-number-and-gene-expression-data-in-breast-cancer.html</feedburner:origLink></item>
		<item>
		<title>Comparison of Benchtop Sequencers</title>
		<link>http://feedproxy.google.com/~r/Massgenomics/~3/Z_oxyVUdZik/comparison-of-benchtop-sequencers.html</link>
		<comments>http://massgenomics.org/2012/04/comparison-of-benchtop-sequencers.html#comments</comments>
		<pubDate>Fri, 27 Apr 2012 16:55:22 +0000</pubDate>
		<dc:creator>Dan Koboldt</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://massgenomics.org/?p=1737</guid>
		<description><![CDATA[Working at a major genome center can skew one&#8217;s view of the scientific community. You forget, for example, that not every research lab has access to dozens of next-gen sequencers churning out data and an entire building of computing infrastructure to help analyze it. In fact, there&#8217;s a very strong market for smaller, cheaper instruments [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://massgenomics.org/wp-content/uploads/2012/04/Benchtop-Sequencers.jpg"><img class="alignright size-full wp-image-1744" title="Benchtop sequencer comparison" src="http://massgenomics.org/wp-content/uploads/2012/04/Benchtop-Sequencers.jpg" alt="Benchtop sequencer comparison" width="150" height="370" /></a>Working at a major genome center can skew one&#8217;s view of the scientific community. You forget, for example, that not every research lab has access to dozens of next-gen sequencers churning out data and an entire building of computing infrastructure to help analyze it. In fact, there&#8217;s a very strong market for smaller, cheaper instruments that meet the needs and budget requirements of a smaller lab. Three different benchtop next-gen sequencers have come on the market to address that need: the 454 GS Junior (Roche), MiSeq (Illumina), and Ion Torrent PGM (Life Technologies).</p>
<p>This week in <em>Nature Biotechnology</em>, <strong>Nick Loman</strong> and his colleagues from the University of Birmingham (UK) present a <strong><a title="Performance comparison of benchtop sequencers" href="http://www.nature.com/nbt/journal/vaop/ncurrent/full/nbt.2198.html" target="_blank">performance comparison of these three instruments</a></strong> by sequencing an <em>E. coli</em> strain linked to an outbreak of food poisoning in Germany last year. This experiment is well-suited to the comparison for two reasons:</p>
<ol>
<li>This strain of <em>E. coli</em> has already been sequenced in previous whole-genome efforts, and the key genetic structures underlying its toxicity have been extensively characterized.</li>
<li>The rapid turnaround and reduced throughput of benchtop sequencers are well suited to sequencing bacterial genomes.</li>
</ol>
<h2>Reference Assembly for E. coli O104:H4 strain 280</h2>
<div>To enable comparisons of the benchtop sequencers, the authors first generated a reference assembly for the bacterial isolate. Their sample came from a female traveler who, after traveling to Germany, developed hemolytic uremic syndrome and thrombotic thrombocytopenic purpura, which I take as fancy terms for &#8220;really bad food poisoning.&#8221; Using standard (not benchtop) Roche/454 instruments, the authors generated two datasets:</div>
<div>
<ul>
<li><strong>Long reads</strong> on the 454 FLX+ platform, with a modal read length of 812 bp and maximum read lengths &gt;1100 bp.</li>
<li><strong>Large-insert (8kb) paired-end reads</strong> on the 454 FLX platform with Titanium chemistry.</li>
</ul>
</div>
<div>The combination of these datasets resulted in a &#8220;very high quality draft genome assembly&#8221; with three scaffolds, which are essentially big blocks of assembled sequence. The largest of these corresponded to the 5.3 Mbp bacterial chromosome, while two smaller scaffolds corresponded to two large <em>E. coli</em> plasmids. There were still some gaps in these assemblies, but overall they were pretty good.</div>
<h2>Benchtop Sequencer Characteristics</h2>
<div>There are a number of differences among the three benchtop platforms in terms of instrument cost, run time, sample prep, throughput, and even (as the authors found) data quality, all of which are important considerations for the lab looking to buy one. That&#8217;s part of why this study is important; it provides a direct and somewhat unbiased comparison of these platforms in a real-world application setting.</div>
<div>You should be aware of some possible financial conflicts of interest: the first author has been a paid speaker at IonTorrent and Illumina meetings, and the senior author won his IonTorrent instrument through the European PGM Grant Program. In spite of this, I find their comparison to be very fair with regard to all three platforms.</div>
<table border="1" cellspacing="0" cellpadding="2">
<tbody>
<tr>
<td style="background: #FFCC00;"><strong>Platform</strong></td>
<td style="background: #FFCC00;"><strong>454 GS Junior</strong></td>
<td style="background: #FFCC00;"><strong>IonTorrent PGM</strong></td>
<td style="background: #FFCC00;"><strong>Illumina MiSeq</strong></td>
</tr>
<tr>
<td><strong>Instrument Cost:</strong></td>
<td>$108,000</td>
<td>$80,490</td>
<td>$125,000</td>
</tr>
<tr>
<td><strong>Sample Prep:</strong></td>
<td>Emulsion PCR</td>
<td>Emulsion PCR</td>
<td>On-instrument</td>
</tr>
<tr>
<td><strong>Run Time:</strong></td>
<td style="text-align: left;">4h</td>
<td style="text-align: left;">3h</td>
<td style="text-align: left;">27h</td>
</tr>
<tr>
<td><strong>Cost per Run:</strong></td>
<td>$1,100</td>
<td>$425 (316 chip)</td>
<td>$750</td>
</tr>
<tr>
<td><strong>Throughput/run:</strong></td>
<td>71-72 Mbp</td>
<td>260-304 Mbp</td>
<td>1,653 Mbp</td>
</tr>
<tr>
<td><strong>Avg. Read Length:</strong></td>
<td>522 bp</td>
<td>123 bp</td>
<td>2 x 150 bp</td>
</tr>
<tr>
<td><strong>Reads Aligned:</strong></td>
<td>99%</td>
<td>90%</td>
<td>99%</td>
</tr>
</tbody>
</table>
<p>Note that the MiSeq throughput was enough that the authors multiplexed some other samples on the same run; the actual dataset generated for the comparison totaled 250 Mbp.</p>
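<p>One quick way to read the table is to divide cost per run by throughput per run. The sketch below does that with the figures listed above; using the midpoint of each reported throughput range and the full-run MiSeq throughput are my assumptions, and instrument cost is ignored. It works out to roughly $15 per Mbp for the GS Junior, $1.50 for the PGM, and $0.45 for the MiSeq.</p>

```python
# Figures copied from the comparison table. The midpoint of each reported
# throughput range is used, which is an assumption of this sketch.
PLATFORMS = {
    "454 GS Junior": {"cost_per_run": 1100.0, "mbp_per_run": (71 + 72) / 2},
    "IonTorrent PGM (316 chip)": {"cost_per_run": 425.0, "mbp_per_run": (260 + 304) / 2},
    "Illumina MiSeq": {"cost_per_run": 750.0, "mbp_per_run": 1653.0},
}


def cost_per_mbp(cost_per_run, mbp_per_run):
    """Return the per-run cost in dollars for one megabase of sequence."""
    return cost_per_run / mbp_per_run


if __name__ == "__main__":
    for name, figures in PLATFORMS.items():
        dollars = cost_per_mbp(figures["cost_per_run"], figures["mbp_per_run"])
        print(f"{name}: ${dollars:.2f} per Mbp")
```

<p>This is deliberately crude: it says nothing about run time, read length, or error profile, all of which matter at least as much as raw cost for a small lab.</p>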
<h2>Base Quality Score</h2>
<p>Each instrument manufacturer has its own software algorithm to generate base qualities, so a direct comparison of these is difficult. To address this, the authors recalibrated base qualities by alignment to the reference genome. Their quality score takes into account the number of matches and mismatches between read and reference sequence, since these generally represent sequencing errors. By this metric, <span style="color: #800000;"><strong>the MiSeq produced the highest-quality reads with few mismatches and virtually no indel errors</strong></span>. There was generally good agreement between this score and the one provided by the manufacturer&#8217;s software, though the PGM slightly under-estimated base quality and the other instruments slightly over-estimated it.</p>
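<p>The general idea behind an alignment-based quality score can be sketched with the standard Phred formula, Q = -10 &times; log10(error rate). This is not the authors' recalibration code; the function name, the inputs, and the cap of Q60 are assumptions made for the example.</p>

```python
import math

# Assumed ceiling for the score; a zero error rate has no finite Phred value.
MAX_QUALITY = 60.0


def empirical_phred(matches, mismatches):
    """Convert aligned match/mismatch counts into a Phred-scaled quality.

    The error rate is taken as the fraction of aligned bases that disagree
    with the reference, treating every mismatch as a sequencing error.
    """
    total = matches + mismatches
    if total == 0:
        raise ValueError("no aligned bases to score")
    if mismatches == 0:
        return MAX_QUALITY
    quality = -10.0 * math.log10(mismatches / total)
    return min(MAX_QUALITY, quality)
```

<p>Under this definition, one mismatch in 1,000 aligned bases corresponds to Q30, and one in 100 to Q20. Comparing such an empirical value against the score the instrument reported for the same bases is what reveals over- or under-estimation.</p>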
<h2>Homopolymer Errors on 454 and Ion Torrent</h2>
<p>The 454 sequencing platform is infamous for sequencing errors associated with runs of a single base (homopolymers). Indeed, the base quality recalibration revealed 0.38 indel errors per 100 sequenced bases, or 1.74 indels per read. This issue was a concern for the IonTorrent platform as well; homopolymer-associated errors were quite obvious in the first <strong><a title="First Look at IonTorrent 316 sequencing data" href="http://massgenomics.org/2011/06/first-look-data-from-iontorrents-316-chip.html" target="_blank">public release of IonTorrent 316 chip data</a></strong> despite the spoken assurances from Jonathan Rothberg at AGBT 2010 when the question was raised by a certain blogger.</p>
<p>In the current study, homopolymer issues were again apparent on the Ion Torrent PGM platform; there were 1.5 indel errors per 100 bases, or 1.72 indels per read. Even homopolymers of 2-3 bases caused a significant number of sequencing errors. This put the PGM at a disadvantage for sequence assembly; it had large numbers of gaps in its assemblies relative to the other two platforms, likely because it could not match the accuracy of the MiSeq or the read length of the GS Junior.</p>
<h2>Comparison of De Novo Assemblies</h2>
<p>Speaking of assemblies, we must appreciate the work of Nick Loman and his co-authors in dutifully generating assemblies with four different assembly programs (MIRA, Newbler, Velvet, and CLC Assembly Cell). That&#8217;s a lot of work. Depending on how assemblies were generated, they fell into two groups:</p>
<ol>
<li>Both IonTorrent PGM datasets, single 454 GS Junior runs, and single-end (ignoring pairing) MiSeq data yielded heavily fragmented assemblies.</li>
<li>Combining both GS Junior runs, or utilizing the read pairing information in MiSeq, yielded less fragmented assemblies.</li>
</ol>
<p>None of the assemblies aligned unambiguously to cover 100% of the high-quality reference that was generated from long-read and long-insert data.  Contigs from the 454 data covered a greater proportion of the reference (96.28%) than MiSeq (96.05%) or PGM (95.4%).</p>
<h2>Choosing a Benchtop Sequencer</h2>
<p>To their credit, the authors find advantages for each benchtop sequencer. The GS Junior, though affected by homopolymer issues, yielded the longest read lengths and the greatest assembly coverage. The MiSeq offered the highest throughput and sequencing accuracy; its assemblies also generated accurate MLST (multi-locus sequence typing) profiles for the <em>E. coli</em> strain. It&#8217;s also the only platform that doesn&#8217;t require emulsion PCR as part of the sample prep. The IonTorrent is the lowest-price instrument, and offers greater flexibility in reagent costs because three different chips are available. It&#8217;s also a platform undergoing rapid development and improvement.</p>
<p><strong>There&#8217;s no clear winner among benchtop sequencers</strong> in this comparison, meaning that researchers will have to consider all of the pros and cons to make the choice that&#8217;s best for their individual needs.</p>
<p><span style="text-decoration: underline;"><strong>References </strong></span></p>
<p><span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&amp;rft.jtitle=Nature+biotechnology&amp;rft_id=info%3Apmid%2F22522955&amp;rfr_id=info%3Asid%2Fresearchblogging.org&amp;rft.atitle=Performance+comparison+of+benchtop+high-throughput+sequencing+platforms.&amp;rft.issn=1087-0156&amp;rft.date=2012&amp;rft.volume=&amp;rft.issue=&amp;rft.spage=&amp;rft.epage=&amp;rft.artnum=&amp;rft.au=Loman+NJ&amp;rft.au=Misra+RV&amp;rft.au=Dallman+TJ&amp;rft.au=Constantinidou+C&amp;rft.au=Gharbia+SE&amp;rft.au=Wain+J&amp;rft.au=Pallen+MJ&amp;rfe_dat=bpr3.included=1;bpr3.tags=Biology%2CGenetics%2C+Bioinformatics%2C+Microbiology">Loman NJ, Misra RV, Dallman TJ, Constantinidou C, Gharbia SE, Wain J, &amp; Pallen MJ (2012). Performance comparison of benchtop high-throughput sequencing platforms. <span style="font-style: italic;">Nature biotechnology</span> PMID: <a href="http://www.ncbi.nlm.nih.gov/pubmed/22522955" rev="review">22522955</a></span></p>
]]></content:encoded>
			<wfw:commentRss>http://massgenomics.org/2012/04/comparison-of-benchtop-sequencers.html/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		<feedburner:origLink>http://massgenomics.org/2012/04/comparison-of-benchtop-sequencers.html</feedburner:origLink></item>
		<item>
		<title>Grants and Jobs at the Genome Institute</title>
		<link>http://feedproxy.google.com/~r/Massgenomics/~3/Lt_HZqr1IFI/grants-and-jobs-at-the-genome-institute.html</link>
		<comments>http://massgenomics.org/2012/04/grants-and-jobs-at-the-genome-institute.html#comments</comments>
		<pubDate>Fri, 20 Apr 2012 17:42:35 +0000</pubDate>
		<dc:creator>Dan Koboldt</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.massgenomics.org/?p=1683</guid>
		<description><![CDATA[One of the world leaders in genomics and next-generation sequencing is ramping up. The Genome Institute at Washington University has had a busy year already. In January, TGI and its collaborators published three Nature papers that came online the same day: one on the genetic basis of an aggressive pediatric leukemia, another on genomic and [...]]]></description>
			<content:encoded><![CDATA[<p>One of the world leaders in genomics and next-generation sequencing is ramping up. The Genome Institute at Washington University has had a busy year already. In January, TGI and its collaborators published three <em>Nature</em> papers that came online the same day: one on the <strong><a title="Genetic basis of aggressive pediatric leukemia" href="http://www.massgenomics.org/2012/01/genetic-basis-of-an-aggressive-pediatric-leukemia.html">genetic basis of an aggressive pediatric leukemia</a></strong>, another on genomic and epigenetic <strong><a title="Genomic and epigenetic analyses of retinoblastoma" href="http://www.nature.com/nature/journal/vaop/ncurrent/full/nature10733.html">analyses of childhood retinoblastoma</a></strong>, and a third on <strong><a title="Tumor Evolution in AML" href="http://www.massgenomics.org/2012/01/a-tumor-evolved-relapsed-acute-myeloid-leukemia.html">tumor evolution in relapsed AML</a></strong>. In February, the funding period began for &#8220;A Turnkey System for High-Throughput Variant Discovery and Interpretation&#8221;, a U01 grant under which TGI will improve and share its genome analysis tools with the research community. And now, there are job openings in four different groups at the Genome Institute.</p>
<p><strong><span style="text-decoration: underline;">Contents</span></strong><br />
<strong><a title="Informatics Grant" href="#informatics-grant">Informatics Grant</a></strong><br />
<strong><a title="Job opening: human genetics staff scientist" href="#human-genetics-staff-job">Staff Scientist in Human Genetics</a></strong><br />
<strong><a title="Job opening: Medical Genomics Postdoc" href="#medical-genomics-postdoc">Postdoc Research Associate in Medical Genomics</a></strong><br />
<strong><a title="Job opening: Parasite Genomics Postdoc" href="#parasite-genomics-postdoc">Postdoc Research Associate in Parasite Genomics</a></strong><br />
<strong><a title="Job opening: Analysis Programmer" href="#analysis-programmer-job">Programmer/Analyst in Analysis Pipeline Group</a></strong></p>
<h2><a name="informatics-grant"></a>Next-Gen Sequencing Informatics Grant</h2>
<p>TGI won a four-year, $805,000 grant to develop its analysis pipelines into <a title="A turnkey system for variant discovery and interpretation" href="http://projectreporter.nih.gov/project_info_description.cfm?aid=8237076&amp;icde=11352336&amp;ddparam=&amp;ddvalue=&amp;ddsub=" target="_blank">A Turnkey system for High-Throughput Variant Discovery and Interpretation</a> (NIH project link), one of <a title="Informatics Grants in 2012" href="http://www.genomeweb.com/informatics/funding-update-nih-bioinformatics-grants-awarded-dec-7-2011-%E2%80%94-feb-9-2012" target="_blank">several informatics grants</a> reported by GenomeWeb&#8217;s <em>BioInform</em> last month. TGI has spent years developing a computational framework and innovative tools for NGS analysis, with a particular emphasis on variant discovery and annotation. The goal of the project is to make these tools available to the wider community, both individually and as part of a complete informatics solution from alignment to detection to interpretation. This &#8220;turnkey system&#8221; will be flexible and powerful enough to be adopted by experienced laboratories, and user-friendly enough to give push-button analysis capabilities to groups with little bioinformatics expertise.</p>
<p>Bottom line, anyone will be able to run &#8220;Washington University Genome Institute&#8221; analysis on their sequencing datasets with little bioinformatics expertise.</p>
<h2><a name="human-genetics-staff-job"></a>Staff Scientist in Human Genetics</h2>
<p>TGI&#8217;s Human Genetics Group is looking for a <strong><a title="Human Genetics Job Opening" href="https://jobs.wustl.edu/psp/APPLHRMS/EMPLOYEE/HRMS/c/HRS_HRAM.HRS_CE.GBL?Page=HRS_CE_JOB_DTL&amp;JobOpeningId=23423" target="_blank">statistical geneticist or biostatistician</a></strong> to work with a dedicated team of researchers investigating inherited human diseases. As you might expect for a major genome center, there are many projects from small family-based studies to large trio studies to massive studies of complex disease involving thousands of unrelated samples. We are looking for someone to help evaluate, design, and execute statistical analysis plans for sequencing projects.</p>
<p>For more details, see <strong><a title="Staff Scientist, Human Genetics Job Opening" href="https://jobs.wustl.edu/psp/APPLHRMS/EMPLOYEE/HRMS/c/HRS_HRAM.HRS_CE.GBL?Page=HRS_CE_JOB_DTL&amp;JobOpeningId=23423" target="_blank">Job Posting 23423</a></strong>.</p>
<h2><a name="medical-genomics-postdoc"></a>Postdoc Research Associate in Medical Genomics</h2>
<p>The Medical Genomics Group is looking for a postdoc research associate in the area of cancer genomics. There are many such projects here, ranging from studies of a single tumor to large-scale studies involving thousands of samples. The ultimate goal is to translate discoveries enabled by next-gen sequencing into medically actionable information. As such, the group needs an individual who can assist in the development, implementation, and application of algorithms to characterize and interpret sequence variation in the context of cancer.</p>
<p>For more information, see <strong><a title="Cancer Genomics Job Opening" href="https://jobs.wustl.edu/psp/APPLHRMS/EMPLOYEE/HRMS/c/HRS_HRAM.HRS_CE.GBL?Page=HRS_CE_JOB_DTL&amp;JobOpeningId=23424" target="_blank">Job Posting 23424</a></strong>.</p>
<h2><a name="parasite-genomics-postdoc"></a>Postdoc Research Associate in Parasite Genomics</h2>
<p>There is also a postdoc research associate position open in parasite genomics, as part of an established and successful <strong><a title="Makedonka Mitreva Group" href="http://genome.wustl.edu/people/mitreva_makedonka" target="_blank">research group</a></strong> focused on integrating ‘omics’ approaches aimed at understanding organisms at a molecular level. This group is working on comparative analysis of parasitic helminths to identify conserved and/or taxonomically restricted proteins that may prove useful as antiparasitic drugs. The responsibilities will include design, development, testing, and implementation of software applications for comparative analyses.</p>
<p>For more information, see <strong><a title="Parasite Genomics Job Opening" href="https://jobs.wustl.edu/psp/APPLHRMS/EMPLOYEE/HRMS/c/HRS_HRAM.HRS_CE.GBL?Page=HRS_CE_JOB_DTL&amp;JobOpeningId=23361" target="_blank">Job Posting 23361</a></strong>.</p>
<h2><a name="analysis-programmer-job"></a>Programmer/Analyst in Analysis Pipeline Group</h2>
<p>The analysis pipeline group has a position open for a <strong><a title="Software engineer job opening" href="https://jobs.wustl.edu/psp/APPLHRMS/EMPLOYEE/HRMS/c/HRS_HRAM.HRS_CE.GBL?Page=HRS_CE_JOB_DTL&amp;JobOpeningId=23269" target="_blank">software engineer</a></strong> in data management &amp; compression. The job will entail working on a team of 20 software engineers on next-gen sequence analysis pipelines, focusing on utilizing the information management capabilities of the analysis system to migrate data appropriately between different tiers of storage and eliminate data duplication. Responsibilities will include development, integration, and support of software tools/pipelines/databases in collaboration with data analysts.</p>
<p>For details, see <strong><a title="Programmer/Analyst Job Opening" href="https://jobs.wustl.edu/psp/APPLHRMS/EMPLOYEE/HRMS/c/HRS_HRAM.HRS_CE.GBL?Page=HRS_CE_JOB_DTL&amp;JobOpeningId=23269" target="_blank">Job Posting 23269</a></strong>.</p>
]]></content:encoded>
			<wfw:commentRss>http://massgenomics.org/2012/04/grants-and-jobs-at-the-genome-institute.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://massgenomics.org/2012/04/grants-and-jobs-at-the-genome-institute.html</feedburner:origLink></item>
		<item>
		<title>The genetic architecture of triple-negative breast cancer</title>
		<link>http://feedproxy.google.com/~r/Massgenomics/~3/XIYtDUC21UY/the-genetic-architecture-of-triple-negative-breast-cancer.html</link>
		<comments>http://massgenomics.org/2012/04/the-genetic-architecture-of-triple-negative-breast-cancer.html#comments</comments>
		<pubDate>Thu, 19 Apr 2012 14:22:17 +0000</pubDate>
		<dc:creator>Dan Koboldt</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://massgenomics.org/?p=1721</guid>
		<description><![CDATA[Triple-negative breast cancer (TNBC), a tumor type defined by its lack of estrogen receptor, progesterone receptor, and Her2 (ERBB2) amplification, accounts for 16% of breast cancers. This clinically defined tumor type overlaps substantially but not completely with &#8220;basal-like&#8221; breast cancer, a classification based upon gene expression signature. This is a highly heterogeneous disease with a [...]]]></description>
			<content:encoded><![CDATA[<p>Triple-negative breast cancer (TNBC), a tumor type defined by its lack of estrogen receptor, progesterone receptor, and Her2 (ERBB2) amplification, accounts for 16% of breast cancers. This clinically defined tumor type overlaps substantially but not completely with &#8220;basal-like&#8221; breast cancer, a classification based upon gene expression signature. This is a highly heterogeneous disease with a <strong>higher risk of recurrence</strong> in the absence of systemic therapy.</p>
<p>This month in <em>Nature</em>, researchers from BC Cancer Agency have characterized the landscape of genomic aberrations in 104 TNBC cases with a <strong>combination of whole-genome sequencing, exome sequencing, RNA-seq, and high-density SNP arrays</strong>. Using ultra-deep targeted resequencing, the authors validated ~2,500 somatic mutations and characterized their frequencies among heterogeneous clonal populations in each tumor.</p>
<div class="mceTemp">
<dl class="wp-caption alignnone" style="width: 439px;">
<dt class="wp-caption-dt"><a href="http://massgenomics.org/wp-content/uploads/2012/04/Basal-Circos-plot.jpg"><img class="size-full wp-image-1730" title="Basal TNBC Circos Plot" src="http://massgenomics.org/wp-content/uploads/2012/04/Basal-Circos-plot.jpg" alt="Genomics of Basal TNBC" width="429" height="429" /></a></dt>
</dl>
<address class="wp-caption-dd">Genomic architecture (Basal TNBC), Shah et al, Nature 2012</address>
</div>
<p>This is a complex study that&#8217;s hard to digest (the supplemental material had over 140 pages &#8211; come on, is that really necessary?) so I&#8217;ll do my best to break it down. I believe there are three highlights: frequent gene alterations in TNBC, under-representation of mutations in mRNA sequences, and a continuous distribution of mutation frequencies within tumors.</p>
<h2 style="font-size: 14px;">Genetic Alterations in Triple-Negative Breast Cancer</h2>
<p>The most frequently mutated gene should be familiar to you: <em>TP53</em>, which harbored validated somatic mutations in 62% of basal and 43% of non-basal TNBCs. Other patterns of alteration were as follows:</p>
<ul>
<li>Significantly mutated genes included <em>TP53, PIK3CA, RB1, PTEN, MYO3A</em>, and <em>GH1</em>. Here, significant means that the gene harbored more mutations than expected from background random mutation processes. The larger the gene, the more likely it is to catch a random mutation. That&#8217;s why <em>USH2A</em>, which was mutated in 9.2% of cases, was not significant (it&#8217;s a large gene).</li>
<li>Recurrent but not statistically significant mutations were observed in the nesprin genes (<em>SYNE1</em>/<em>SYNE2</em>), <em>BRCA2</em>, <em>BRAF</em>, <em>NRAS</em>, <em>ERBB2</em>, and <em>ERBB3</em>.</li>
</ul>
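<p>The point about gene size can be made concrete with a toy background model. The sketch below is an illustration only, not the test used in the paper: it assumes mutations land uniformly at a fixed rate per base per tumor and asks how surprising a given mutation count is under a Poisson distribution. The gene sizes, the background rate, the cohort size, and the function names are all hypothetical.</p>

```python
import math


def poisson_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    cdf = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - cdf)


def background_p_value(observed, coding_bp, n_tumors, rate_per_bp):
    """Chance of seeing at least `observed` mutations from background alone."""
    expected = coding_bp * n_tumors * rate_per_bp
    return poisson_tail(observed, expected)


# Hypothetical inputs: 100 tumors, 2 mutations per Mbp per tumor, and the
# same 4 observed mutations in a small gene and in a large gene.
N_TUMORS = 100
RATE = 2e-6
p_small = background_p_value(4, coding_bp=1_200, n_tumors=N_TUMORS, rate_per_bp=RATE)
p_large = background_p_value(4, coding_bp=15_000, n_tumors=N_TUMORS, rate_per_bp=RATE)

print(f"small gene: p = {p_small:.2e}")
print(f"large gene: p = {p_large:.2e}")
```

<p>Under these made-up numbers, four mutations in the small gene are very unlikely to be background, while the same four mutations in the large gene are unremarkable. Real significance tests also account for sequence context, coverage, and mutation type.</p>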
<h3 style="font-size: 13px;">Somatic Copy Number Alterations (CNAs)</h3>
<div>The patterns of somatic copy number changes, as assessed by high-density SNP array, suggest widespread segmental CNA instability:</div>
<div>
<div class="mceTemp">
<dl class="wp-caption alignnone" style="width: 395px;">
<dt class="wp-caption-dt"><a href="http://massgenomics.org/wp-content/uploads/2012/04/TNBC-CopyNumber.jpg"><img class=" wp-image-1725 " title="Somatic copy number in TNBC" src="http://massgenomics.org/wp-content/uploads/2012/04/TNBC-CopyNumber.jpg" alt="Somatic copy number in TNBC" width="385" height="182" /></a></dt>
</dl>
<address class="wp-caption-dd">Shah et al, Nature 2012 (Supp Fig 3)</address>
</div>
</div>
<div>These results are largely consistent with a separate study in the same issue that examined CNAs in 2,000 breast tumors. I&#8217;ll have to cover that another time. Some of the known CNA patterns evident above include gains of chromosomes 1q, 3q, and 8q (where <em>MYC</em> is located). Note the frequent deletions across many chromosome arms or entire chromosomes, many of which contain tumor suppressor genes (e.g. <em>TP53</em> on chromosome 17).</div>
<h2 style="font-size: 14px;">Expression of Somatic Mutations</h2>
<div>This group at BC Cancer Agency is a leader in transcriptome sequencing (RNA-Seq), which is a key component of this study. Strikingly, the authors found that just 36% of validated somatic mutations discovered in genomic DNA were present in mRNA transcripts. This number is a little deceptive, and I&#8217;ll tell you why. Supplementary figure 2 offers a summary of the expression of validated mutations across all cases with RNA-seq data:</div>
<div>
<div class="mceTemp">
<dl class="wp-caption alignnone" style="width: 298px;">
<dt class="wp-caption-dt"><a href="http://massgenomics.org/wp-content/uploads/2012/04/TNBC-Mutation-RNAseq-Expression.jpg"><img class="size-full wp-image-1726" title="TNBC-Mutation-RNAseq-Expression" src="http://massgenomics.org/wp-content/uploads/2012/04/TNBC-Mutation-RNAseq-Expression.jpg" alt="Expression of somatic mutations in TNBC" width="288" height="126" /></a></dt>
</dl>
<address class="wp-caption-dd">Mutation Expression Patterns (Shah et al, Nature 2012)</address>
</div>
</div>
<div>Notably, 23% of somatic mutations occur in genes with no observed transcripts: there&#8217;s no allelic effect of the mutation; the genes just aren&#8217;t expressed in any form. That leaves:</div>
<div>
<ul>
<li><strong>40.56% of genes where only the wild-type allele is expressed</strong>. Here, it&#8217;s possible that the mutation alters mRNA expression or stability and thus only the non-mutated allele is seen.</li>
<li><strong>31.48% where both alleles are expressed</strong>. The mutation may not affect expression, but it could still alter the translation or function of the encoded protein.</li>
<li><strong>5% where only the mutant allele is expressed</strong>. This could be due to genomic loss of the wild-type allele (LOH), mutations on the X-chromosome (one copy of which is inactivated), or even a gain-of-function mutation causing aberrant gene expression.</li>
</ul>
</div>
<div>Bottom line, just over one-third of somatic mutations in the genome are present in the transcriptome. This has important implications for clinical cancer genome sequencing: just because a druggable mutation is present doesn&#8217;t mean it&#8217;s expressed.</div>
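<p>For anyone checking the arithmetic, the 36% figure is simply the two categories in which the mutant allele shows up in RNA. A few lines of Python, using only the percentages quoted above:</p>

```python
# Percentages of validated somatic mutations by expression pattern,
# as quoted in the text above.
patterns = {
    "gene not expressed": 23.0,
    "wild-type allele only": 40.56,
    "both alleles": 31.48,
    "mutant allele only": 5.0,
}

# A mutation is seen in the transcriptome when the mutant allele is expressed,
# either alongside the wild-type allele or on its own.
mutant_expressed = patterns["both alleles"] + patterns["mutant allele only"]
total = sum(patterns.values())

print(f"mutant allele expressed: {mutant_expressed:.2f}%")
print(f"sum of all categories:   {total:.2f}%")
```

<p>The mutant-expressed share comes to 36.48%, and the four categories sum to 100.04%, which is off from 100 only because the quoted figures are rounded.</p>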
<h2 style="font-size: 14px;">Continuous Distribution of Somatic Mutations</h2>
<p>With ultra-deep targeted sequencing, it&#8217;s possible to estimate the allele frequency of a somatic mutation with high accuracy, and from that, to infer the relative proportion of tumor cells harboring that mutation. A heterozygous founder mutation, for example, would be present in virtually all tumor cells and have a mutation frequency of 50% in diploid cells. Perhaps surprisingly, the authors find that somatic mutations occur at a continuous distribution in TNBC, and this appears independent of copy number alterations and tumor cellularity.</p>
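<p>The step from allele frequency to fraction of tumor cells can be sketched with a textbook simplification. This is not the statistical model the authors used; it assumes a known tumor purity, a known copy number at the locus, and diploid normal cells, and the function and parameter names are my own.</p>

```python
def cell_fraction(vaf, purity=1.0, copies_mutated=1, total_copies=2):
    """Estimate the fraction of tumor cells carrying a mutation.

    vaf            -- variant allele frequency from deep sequencing (0 to 1)
    purity         -- fraction of cells in the sample that are tumor (0 to 1)
    copies_mutated -- mutated copies per carrying cell (1 = heterozygous)
    total_copies   -- copies of the locus per tumor cell (2 = diploid)

    Simplifying assumption: contaminating normal cells are diploid.
    """
    if not (purity > 0 and 1 >= purity):
        raise ValueError("purity must be greater than 0 and at most 1")
    # Average number of copies of the locus per cell across the whole sample.
    avg_copies = purity * total_copies + (1 - purity) * 2
    fraction = vaf * avg_copies / (purity * copies_mutated)
    # Sampling noise can push the estimate slightly above 1; cap it there.
    return min(fraction, 1.0)


founder = cell_fraction(0.50)               # heterozygous, pure diploid tumor
subclone = cell_fraction(0.10)              # same assumptions, lower frequency
impure = cell_fraction(0.20, purity=0.8)    # hypothetical 80% tumor content
print(founder, subclone, impure)
```

<p>Under these assumptions a 50% allele frequency maps to every tumor cell, as in the founder example above, while a 10% allele frequency implies a subclone making up about a fifth of the tumor.</p>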
<p>Part of this observation may be technical in nature (i.e. false negatives in mutation discovery). However, this phenomenon has been noted in other epithelial cancers, suggesting that the mutation content of cells within a single tumor may be differently shaped by biological processes and mutational mechanisms. It reinforces the notion that tumors (and TNBC in particular) are not a homogeneous mass of identical cells, but a collection of distinct sub-populations of cells evolving somewhat independently of one another. This is probably why they&#8217;re sometimes difficult to eliminate: you might destroy most of the subpopulations with therapy, but one or more minor clones could persist.</p>
<p><span style="text-decoration: underline;"><strong>References</strong></span></p>
<p><span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&amp;rft.jtitle=Nature&amp;rft_id=info%3Apmid%2F22495314&amp;rfr_id=info%3Asid%2Fresearchblogging.org&amp;rft.atitle=The+clonal+and+mutational+evolution+spectrum+of+primary+triple-negative+breast+cancers.&amp;rft.issn=0028-0836&amp;rft.date=2012&amp;rft.volume=&amp;rft.issue=&amp;rft.spage=&amp;rft.epage=&amp;rft.artnum=&amp;rft.au=Shah+SP&amp;rft.au=Roth+A&amp;rft.au=Goya+R&amp;rft.au=Oloumi+A&amp;rft.au=Ha+G&amp;rft.au=Zhao+Y&amp;rft.au=Turashvili+G&amp;rft.au=Ding+J&amp;rft.au=Tse+K&amp;rft.au=Haffari+G&amp;rft.au=Bashashati+A&amp;rft.au=Prentice+LM&amp;rft.au=Khattra+J&amp;rft.au=Burleigh+A&amp;rft.au=Yap+D&amp;rft.au=Bernard+V&amp;rft.au=McPherson+A&amp;rft.au=Shumansky+K&amp;rft.au=Crisan+A&amp;rft.au=Giuliany+R&amp;rft.au=Heravi-Moussavi+A&amp;rft.au=Rosner+J&amp;rft.au=Lai+D&amp;rft.au=Birol+I&amp;rft.au=Varhol+R&amp;rft.au=Tam+A&amp;rft.au=Dhalla+N&amp;rft.au=Zeng+T&amp;rft.au=Ma+K&amp;rft.au=Chan+SK&amp;rft.au=Griffith+M&amp;rft.au=Moradian+A&amp;rft.au=Cheng+SW&amp;rft.au=Morin+GB&amp;rft.au=Watson+P&amp;rft.au=Gelmon+K&amp;rft.au=Chia+S&amp;rft.au=Chin+SF&amp;rft.au=Curtis+C&amp;rft.au=Rueda+OM&amp;rft.au=Pharoah+PD&amp;rft.au=Damaraju+S&amp;rft.au=Mackey+J&amp;rft.au=Hoon+K&amp;rft.au=Harkins+T&amp;rft.au=Tadigotla+V&amp;rft.au=Sigaroudinia+M&amp;rft.au=Gascard+P&amp;rft.au=Tlsty+T&amp;rft.au=Costello+JF&amp;rft.au=Meyer+IM&amp;rft.au=Eaves+CJ&amp;rft.au=Wasserman+WW&amp;rft.au=Jones+S&amp;rft.au=Huntsman+D&amp;rft.au=Hirst+M&amp;rft.au=Caldas+C&amp;rft.au=Marra+MA&amp;rft.au=Aparicio+S&amp;rfe_dat=bpr3.included=1;bpr3.tags=Biology%2CMedicine%2CHealth%2CGenetics%2C+Bioinformatics%2C+Cancer%2C+Reproductive+Health">Shah SP, Roth A, Goya R, Oloumi A, Ha G, Zhao Y, Turashvili G, Ding J, Tse K, Haffari G, Bashashati A, Prentice LM, Khattra J, Burleigh A, Yap D, Bernard V, McPherson A, Shumansky K, Crisan A, Giuliany R, Heravi-Moussavi A, Rosner J, Lai D, 
Birol I, Varhol R, Tam A, Dhalla N, Zeng T, Ma K, Chan SK, Griffith M, Moradian A, Cheng SW, Morin GB, Watson P, Gelmon K, Chia S, Chin SF, Curtis C, Rueda OM, Pharoah PD, Damaraju S, Mackey J, Hoon K, Harkins T, Tadigotla V, Sigaroudinia M, Gascard P, Tlsty T, Costello JF, Meyer IM, Eaves CJ, Wasserman WW, Jones S, Huntsman D, Hirst M, Caldas C, Marra MA, &amp; Aparicio S (2012). The clonal and mutational evolution spectrum of primary triple-negative breast cancers. <span style="font-style: italic;">Nature</span> PMID: <a href="http://www.ncbi.nlm.nih.gov/pubmed/22495314" rev="review">22495314</a></span></p>
]]></content:encoded>
			<wfw:commentRss>http://massgenomics.org/2012/04/the-genetic-architecture-of-triple-negative-breast-cancer.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://massgenomics.org/2012/04/the-genetic-architecture-of-triple-negative-breast-cancer.html</feedburner:origLink></item>
		<item>
		<title>Fast, Efficient Short Read Alignment with Gaps: Bowtie 2</title>
		<link>http://feedproxy.google.com/~r/Massgenomics/~3/cz8J3y4BQH0/fast-efficient-short-read-alignment-with-gaps-bowtie-2.html</link>
		<comments>http://massgenomics.org/2012/04/fast-efficient-short-read-alignment-with-gaps-bowtie-2.html#comments</comments>
		<pubDate>Thu, 12 Apr 2012 15:33:01 +0000</pubDate>
		<dc:creator>Dan Koboldt</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.massgenomics.org/?p=1712</guid>
		<description><![CDATA[I&#8217;ve always been a fan of Bowtie, one of the first algorithms to leverage Burrows-Wheeler Transform for short read alignment. When I first encountered it in 2008, it was incredibly fast. Faster than Maq and Novoalign, two of the early popular algorithms for read mapping. Perhaps more importantly, it was ultra memory-efficient, enabling one to [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.massgenomics.org/wp-content/uploads/2012/04/Bowtie-Logo-with-DNA.jpg"><img class="alignright size-full wp-image-1718" title="Bowtie-Logo-with-DNA" src="http://www.massgenomics.org/wp-content/uploads/2012/04/Bowtie-Logo-with-DNA.jpg" alt="Bowtie Aligner" width="200" height="100" /></a>I&#8217;ve always been a fan of <strong><a title="Bowtie" href="http://bowtie-bio.sourceforge.net/index.shtml" target="_blank">Bowtie</a></strong>, one of the first algorithms to leverage Burrows-Wheeler Transform for short read alignment. When I first encountered it in 2008, it was incredibly fast. Faster than Maq and Novoalign, two of the early popular algorithms for read mapping. Perhaps more importantly, it was ultra memory-efficient, enabling one to map millions of reads on a typical desktop computer. You&#8217;d still need the technical expertise to do anything with the alignments, but hey, it was a start. I liked it enough that the first version of <strong><a title="VarScan" href="http://www.massgenomics.org/varscan">VarScan</a></strong> included support for native Bowtie alignment formats (this was before the widespread adoption of <a title="SAM/BAM Format" href="http://samtools.sourceforge.net/SAM1.pdf" target="_blank">SAM/BAM format</a>).</p>
<h2>Early Bowtie Aligner Limitations</h2>
<p>Despite these features, Bowtie had a few limitations. First, it <strong>required all reads to have the same length</strong> and had an upper read length maximum that made it essentially incompatible with Roche/454 data. This wasn&#8217;t a big problem, because there were other aligners for 454 data that could handle its moderate level of throughput.</p>
<p>Even though it was faster, Bowtie was less suitable for paired-end data than Maq because it didn&#8217;t leverage the mate pairing information to improve alignment &#8211; it simply attempted to map each read in the mate pair independently, then went back to calculate the distance between them. This was kind of a bummer, but Bowtie was still quite suitable for fragment-end data, which made up the majority of datasets in 2008.</p>
<p>Another Bowtie limitation was that it <strong>didn&#8217;t align reads with gaps</strong>. In other words, if a read contained an insertion or deletion relative to the reference sequence, Bowtie wouldn&#8217;t map it. Side note: This also would have prevented Bowtie from working on Roche/454 data (and later IonTorrent data) due to the known homopolymer-associated sequencing errors. At the time, however, everyone was still struggling with SNP detection in next-gen sequencing data, so ungapped alignments weren&#8217;t a dealbreaker.</p>
<h2>Indels and Gapped Alignments</h2>
<p>In time, though, as our capability to detect insertion/deletion variants (indels) increased &#8212; due to algorithmic developments as well as longer reads &#8212; gapped alignment became more and more important. Benjamin Langmead, the developer and first author, once mentioned to me that it was the most-requested feature for Bowtie. The demand undoubtedly continued to increase as aligners such as BWA offered similar speed and memory performance, while making efforts to align reads across gaps. In paired-end data with one read anchored, BWA will even perform a more sensitive Smith-Waterman alignment to align its mate while allowing gaps. There was also Novoalign, a commercial aligner, which seemed the most sensitive to gaps in reads according to findings by Heng Li, myself, and others.</p>
<p>Interestingly, the Pindel algorithm, which identifies indels by splitting up the unmapped mate in a read pair where only one read mapped, nicely complements this limitation. In fact, the original Bowtie software paired with Pindel seems like it would be a powerful combination for efficient read mapping with indel detection.</p>
<h2>Bowtie 2: Fast Alignment with Gaps</h2>
<p>Several subsequent releases of Bowtie addressed some of the early limitations, and continued to increase its performance. And finally, we got the gapped alignment feature we were waiting for in <strong>Bowtie 2</strong>, which was just published in <em>Nature Methods</em>.</p>
<p style="text-align: center;"><a href="http://www.nature.com/nmeth/journal/v9/n4/full/nmeth.1923.html" target="_blank"><img class="aligncenter size-full wp-image-1714" style="border: 1px solid black;" title="Bowtie-screenshot" src="http://www.massgenomics.org/wp-content/uploads/2012/04/Bowtie-screenshot.jpg" alt="Bowtie 2 aligner" width="367" height="123" /></a></p>
<p>In the publication, Langmead and Salzberg describe a sort of hybrid algorithm that allows efficient gapped alignment of short reads. It essentially has four steps to it:</p>
<ol>
<li>&#8220;Seed&#8221; substrings, which are short segments that are likely to have unique matches in the genome, are extracted from each read</li>
<li>Seeds are aligned to the reference genome in ungapped fashion using the compressed index</li>
<li>Seed placements in the genome are prioritized to find the most likely map location(s)</li>
<li>Seeds are extended into full alignments (allowing gaps) with a hardware-accelerated dynamic programming algorithm</li>
</ol>
<p>Here, Bowtie leverages the speed of its &#8220;full-text minute index&#8221; for ungapped alignment to rapidly place seed segments without gaps, and then an accelerated algorithm to do the full read alignment with gaps. According to the authors, it&#8217;s a combination that allows for high speed, sensitivity, and accuracy.</p>
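<p>The four steps can be sketched as a toy seed-and-extend aligner in Python. To be clear, this is not Bowtie 2&#8217;s implementation: a plain k-mer dictionary stands in for the FM index, simple vote counting stands in for seed prioritization, and an unaccelerated dynamic programming loop stands in for the hardware-accelerated one. All function names, parameters, and scores are hypothetical.</p>

```python
import random
from collections import Counter, defaultdict


def build_index(ref, k):
    """Map every k-mer in the reference to its start positions."""
    index = defaultdict(list)
    for i in range(len(ref) - k + 1):
        index[ref[i:i + k]].append(i)
    return index


def seed_hits(read, index, k, step):
    """Steps 1-2: extract seeds and look up exact (ungapped) matches.

    Each hit votes for the reference offset at which the read would start.
    """
    votes = Counter()
    for j in range(0, len(read) - k + 1, step):
        for pos in index.get(read[j:j + k], []):
            votes[pos - j] += 1
    return votes


def extend(read, window, match=1, mismatch=-1, gap=-2):
    """Step 4: gapped alignment of the whole read inside a reference window.

    The read must align end to end; unused window flanks cost nothing.
    """
    rows, cols = len(read) + 1, len(window) + 1
    score = [[0] * cols for _ in range(rows)]  # row 0 stays 0: free start
    for i in range(1, rows):
        score[i][0] = i * gap
        for j in range(1, cols):
            same = read[i - 1] == window[j - 1]
            diag = score[i - 1][j - 1] + (match if same else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    return max(score[-1])  # best score over all end points in the window


def align(read, ref, index, k=8, step=4, pad=5):
    """Return (window start, alignment score), or None if no seed matched."""
    votes = seed_hits(read, index, k, step)
    if not votes:
        return None
    # Step 3: prioritize the candidate location supported by the most seeds.
    offset, _ = votes.most_common(1)[0]
    start = max(0, offset - pad)
    end = min(len(ref), offset + len(read) + pad)
    return start, extend(read, ref[start:end])


# Demo on a made-up reference: the read carries a 2-base deletion.
rng = random.Random(42)
ref = "".join(rng.choice("ACGT") for _ in range(200))
read = ref[50:70] + ref[72:90]
index = build_index(ref, k=8)
result = align(read, ref, index)
print(result)
```

<p>An ungapped aligner would lose this read after the deletion, whereas here the exact seeds on either side locate the region cheaply and the gapped extension is only run on a small window around it.</p>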
<p>The ability of this new Bowtie algorithm to align with gaps will also aid RNA-Seq analysis using the TopHat package, which utilizes Bowtie as its core aligner, because the gaps that are present in mature mRNA are likely to be better handled.</p>
<p>Bottom line, even if you&#8217;re using something else to align reads right now, Bowtie might be worth a look.</p>
<p><strong>Download Bowtie 2</strong>:<strong> <a title="Bowtie 2 aligner" href="http://bowtie-bio.sourceforge.net/bowtie2/index.shtml" target="_blank">http://bowtie-bio.sourceforge.net/bowtie2/index.shtml</a></strong></p>
<p><span style="text-decoration: underline;"><strong>References</strong></span></p>
<p><span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&amp;rft.jtitle=Nature+methods&amp;rft_id=info%3Apmid%2F22388286&amp;rfr_id=info%3Asid%2Fresearchblogging.org&amp;rft.atitle=Fast+gapped-read+alignment+with+Bowtie+2.&amp;rft.issn=1548-7091&amp;rft.date=2012&amp;rft.volume=9&amp;rft.issue=4&amp;rft.spage=357&amp;rft.epage=9&amp;rft.artnum=&amp;rft.au=Langmead+B&amp;rft.au=Salzberg+SL&amp;rfe_dat=bpr3.included=1;bpr3.tags=Biology%2CComputer+Science+%2F+Engineering%2CGenetics%2C+Bioinformatics%2C+Algorithms%2C+Parallel+and+Distributed+Computing%2C+Software+Engineering">Langmead B, &amp; Salzberg SL (2012). Fast gapped-read alignment with Bowtie 2. <span style="font-style: italic;">Nature methods, 9</span> (4), 357-9 PMID: <a href="http://www.ncbi.nlm.nih.gov/pubmed/22388286" rev="review">22388286</a></span></p>
]]></content:encoded>
			<wfw:commentRss>http://massgenomics.org/2012/04/fast-efficient-short-read-alignment-with-gaps-bowtie-2.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://massgenomics.org/2012/04/fast-efficient-short-read-alignment-with-gaps-bowtie-2.html</feedburner:origLink></item>
		<item>
		<title>Genetic Evolution of Secondary AML from MDS</title>
		<link>http://feedproxy.google.com/~r/Massgenomics/~3/sTCbYx05eMY/genetic-evolution-of-secondary-aml-from-mds.html</link>
		<comments>http://massgenomics.org/2012/03/genetic-evolution-of-secondary-aml-from-mds.html#comments</comments>
		<pubDate>Wed, 14 Mar 2012 21:17:31 +0000</pubDate>
		<dc:creator>Dan Koboldt</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.massgenomics.org/?p=1687</guid>
		<description><![CDATA[Contents: Whole-genome Sequencing • Recurrent Mutations • Clonal Evolution • References Myelodysplastic syndromes (MDS) are a group of disorders of ineffective blood production and the most common cause of acquired bone marrow failure in adults. One-third of cases go on to develop secondary AML (sAML), yet there remains uncertainty among patients, insurers, and funding agencies [...]]]></description>
			<content:encoded><![CDATA[<p><strong>Contents: </strong> <strong><a title="Whole Genome Sequencing of AML" href="#mds-aml-wgs">Whole-genome Sequencing</a></strong> • <strong><a title="Recurrent mutations in MDS and AML" href="#mds-aml-recurrent-mutations">Recurrent Mutations</a></strong> • <strong><a title="Clonal Evolution from MDS to AML" href="#mds-aml-clonality">Clonal Evolution</a></strong> • <strong><a title="MDS to AML References" href="#mds-aml-references">References</a></strong><br />
Myelodysplastic syndromes (MDS) are a group of disorders of ineffective blood production and the most common cause of acquired bone marrow failure in adults. One-third of cases go on to develop secondary AML (sAML), yet there remains uncertainty among patients, insurers, and funding agencies about whether the myelodysplastic syndromes are actually cancers. A study online today at the <em>New England Journal of Medicine</em> has characterized the <strong><a title="Genetic evolution from MDS to AML" href="http://www.nejm.org/doi/full/10.1056/NEJMoa1106968?query=featured_home" target="_blank">genetic evolution from MDS to sAML</a></strong> using whole-genome sequencing.<strong> </strong></p>
<h2 style="font-size: 14px;"><a name="mds-aml-wgs"></a>Whole-genome Sequencing of sAML</h2>
<p>Matthew J. Walter and colleagues of the Washington University School of Medicine performed whole-genome sequencing of tumor samples and matched normal DNA from seven patients with secondary AML. For each subject, hundreds of somatic mutations were genotyped in sAML and MDS-stage samples to characterize the clonal architecture of each tumor. <strong>Figure 1A</strong> from the paper demonstrates the  resolution that can be obtained from deep resequencing of somatic  mutations in both sAML and MDS samples:</p>
<div>
<dl id="attachment_1688">
<dt><a href="http://www.massgenomics.org/wp-content/uploads/2012/03/Mutant-Allele-Frequency.jpg"><img title="Mutant-Allele-Frequency" src="http://www.massgenomics.org/wp-content/uploads/2012/03/Mutant-Allele-Frequency.jpg" alt="Somatic mutation frequencies in MDS and AML" width="436" height="343" /></a></dt>
<address>Credit: Walter et al, NEJM (2012)</address>
</dl>
</div>
<p>Notice the five clusters (differently colored) representing five clonal populations. In yellow  (cluster 1) are mutations present in virtually all cells of both the  MDS and the sAML sample. In orange (cluster 2) are mutations present at  low frequency in MDS but enriched in sAML. Three more clusters (red,  purple, and black) along the y-axis represent mutations that were absent  in the MDS sample but acquired during the progression to sAML. The  patterns of these mutations suggest that <strong>sAML evolved from a clonal population of MDS cells</strong> that acquired new mutations along the way.</p>
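<p>As a rough guide to reading a plot like this, the sketch below converts read counts into a variant allele frequency and then into an approximate fraction of cells carrying the mutation. It assumes a heterozygous mutation in a copy-number-neutral diploid region, where the cell fraction is roughly twice the allele frequency. The function names and read counts are hypothetical and are not taken from the paper.</p>

```python
def variant_allele_frequency(variant_reads, total_reads):
    """Fraction of reads at a site that support the mutant allele."""
    if total_reads == 0:
        raise ValueError("no reads cover this site")
    return variant_reads / total_reads


def estimated_cell_fraction(vaf):
    """Approximate fraction of cells carrying a heterozygous mutation.

    Assumes a diploid, copy-number-neutral site: each mutant cell contributes
    one mutant and one wild-type allele, so the cell fraction is about 2 x VAF.
    """
    return min(1.0, 2.0 * vaf)


# Hypothetical counts: a founding-clone mutation and a minor-subclone mutation.
for label, variant_reads, total_reads in [("founder", 450, 1000), ("subclone", 60, 1000)]:
    vaf = variant_allele_frequency(variant_reads, total_reads)
    print(label, vaf, estimated_cell_fraction(vaf))
```

<p>Under those assumptions, a mutation near 45-50% allele frequency sits in nearly every cell, while one at a few percent marks a small subclone.</p>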
<h2>Identification of Recurrently Mutated Genes</h2>
<p>In the very near future, it may become feasible and cost-effective to  perform whole-genome sequencing (WGS) on hundreds or thousands of  tumors of a certain type to exhaustively identify recurrently mutated  genes. Until then, <strong>WGS of a discovery cohort followed by extension screening in a larger cohort</strong> offers a powerful and cost-effective strategy. Two genes were already recurrently mutated in the 7 WGS cases: <em>RUNX1</em>, a known myeloid tumor suppressor, and <em>UMODL1</em>,  for which mutations were recently reported in multiple myeloma and  ovarian cancer. The authors extended their findings via targeted  screening for additional coding mutations in 200 AML cases. This enabled  the identification of 9 more recurrently mutated genes, for a total of  11.</p>
<p><strong>Recurrently Mutated Genes in MDS and sAML</strong></p>
<table border="1" cellspacing="0" cellpadding="1">
<tbody>
<tr>
<td align="center"><strong>Gene</strong></td>
<td align="center"><strong>Mutation(s)</strong></td>
</tr>
<tr>
<td><em>CDH23</em></td>
<td>1235insL</td>
</tr>
<tr>
<td><em>NPM1</em></td>
<td>W288fs</td>
</tr>
<tr>
<td><em>PTPN11</em></td>
<td>G60R</td>
</tr>
<tr>
<td><em>RUNX1</em></td>
<td>G170fs; del21q22.11</td>
</tr>
<tr>
<td><em>SMC3</em></td>
<td>e8-1 splice</td>
</tr>
<tr>
<td><em>STAG2</em></td>
<td>H738fs</td>
</tr>
<tr>
<td><em>TP53</em></td>
<td>V272M</td>
</tr>
<tr>
<td><em>U2AF1</em></td>
<td>S34F</td>
</tr>
<tr>
<td><em>UMODL1</em></td>
<td>T533P; V882M</td>
</tr>
<tr>
<td><em>WT1</em></td>
<td>D436E</td>
</tr>
<tr>
<td><em>ZSWIM4</em></td>
<td>P18A</td>
</tr>
</tbody>
</table>
<p>Notably, four of the genes (<em>CDH23, SMC3, UMODL1, </em>and<em> ZSWIM4</em>) had not previously been implicated in MDS or AML. A specific codon (34) in <em>U2AF1</em> harbored missense mutations in multiple AML tumors, suggesting a <strong>gain-of-function for the splicing factor</strong> encoded by that gene. The recurrent mutations in <em>STAG2</em>, a gene located on the X chromosome, were all protein-truncating mutations (nonsense or frameshift), suggesting that loss of function of this gene contributes to MDS and AML pathogenesis.</p>
<h2>Clonal Evolution: from MDS to AML</h2>
<p>By characterizing mutations from secondary AML tumors in the MDS  precursors for the same patient, the authors reconstructed the clonal  architecture of the disease from early to advanced stages. The findings  are summarized in Figure 2A:</p>
<div>
<dl id="attachment_1698">
<dt><a href="http://www.massgenomics.org/wp-content/uploads/2012/03/Clonal-Evolution-MDS-AML.jpg"><img title="Clonal-Evolution-MDS-AML" src="http://www.massgenomics.org/wp-content/uploads/2012/03/Clonal-Evolution-MDS-AML.jpg" alt="Clonal Evolution from MDS to AML" width="418" height="171" /></a></dt>
<address>Credit: Walter et al, NEJM (2012)</address>
</dl>
</div>
<p>In all 7 cases, the results suggest a linear model of clonal evolution, in which progression from MDS to sAML was characterized by persistence of a single founder clone (defined by ~200-700 mutations) and the outgrowth of at least one new subclone which contained dozens or hundreds of additional mutations. In other words, <strong>a single population of MDS cells underwent multiple rounds of mutation and selection</strong>, giving rise to multiple subpopulations present in full-blown secondary AML.</p>
<p>Please go <strong><a title="Clonal architecture of MDS and AML" href="http://www.nejm.org/doi/full/10.1056/NEJMoa1106968?query=featured_home" target="_blank">read this fascinating study</a></strong> at the <em>New England Journal of Medicine</em>.</p>
<p><span style="text-decoration: underline;"><strong>References</strong></span></p>
<p><span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&amp;rft.jtitle=New+England+Journal+of+Medicine&amp;rft_id=info%3A%2F&amp;rfr_id=info%3Asid%2Fresearchblogging.org&amp;rft.atitle=Clonal+architecture+of+secondary+acute+myeloid+leukemia&amp;rft.issn=1533-4406&amp;rft.date=2012&amp;rft.volume=&amp;rft.issue=&amp;rft.spage=&amp;rft.epage=&amp;rft.artnum=http%3A%2F%2Fwww.nejm.org%2F&amp;rft.au=Walter+MJ&amp;rft.au=Shen+D&amp;rft.au=Ding+L&amp;rft.au=Shao+J&amp;rft.au=Koboldt+DC&amp;rft.au=Chen+K&amp;rft.au=Larson+DE&amp;rft.au=McLellan+MD&amp;rft.au=Dooling+D&amp;rft.au=Abbott+R&amp;rft.au=Fulton+R&amp;rft.au=Magrini+V&amp;rft.au=Schmidt+H&amp;rft.au=Kalicki-Veizer+J&amp;rft.au=O%27Laughlin+M&amp;rft.au=Fan+X&amp;rft.au=Grillot+M&amp;rft.au=Witowski+S&amp;rft.au=Heath+S&amp;rft.au=Frater+JL&amp;rft.au=Eades+W&amp;rft.au=Tomasson+M&amp;rft.au=Westervelt+P&amp;rft.au=DiPersio+JF&amp;rft.au=Link+DC&amp;rft.au=Mardis+ER&amp;rft.au=Ley+TJ&amp;rft.au=Wilson+RK&amp;rft.au=Graubert+TA&amp;rfe_dat=bpr3.included=1;bpr3.tags=Biology%2CMedicine%2CGenetics%2C+Bioinformatics%2C+Cancer%2C+Evolutionary+Biology">Walter MJ, Shen D, Ding L, Shao J, Koboldt DC, Chen K, Larson DE, McLellan MD, Dooling D, Abbott R, Fulton R, Magrini V, Schmidt H, Kalicki-Veizer J, O&#8217;Laughlin M, Fan X, Grillot M, Witowski S, Heath S, Frater JL, Eades W, Tomasson M, Westervelt P, DiPersio JF, Link DC, Mardis ER, Ley TJ, Wilson RK, &amp; Graubert TA (2012). <a title="Clonal architecture of secondary acute myeloid leukemia" href="http://www.nejm.org/doi/full/10.1056/NEJMoa1106968?query=featured_home" target="_blank">Clonal architecture of secondary acute myeloid leukemia</a> <span style="font-style: italic;">New England Journal of Medicine</span></span></p>
]]></content:encoded>
			<wfw:commentRss>http://massgenomics.org/2012/03/genetic-evolution-of-secondary-aml-from-mds.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://massgenomics.org/2012/03/genetic-evolution-of-secondary-aml-from-mds.html</feedburner:origLink></item>
		<item>
		<title>5 Things to Know About SAMtools Mpileup</title>
		<link>http://feedproxy.google.com/~r/Massgenomics/~3/AUpi7OpBkYU/5-things-to-know-about-samtools-mpileup.html</link>
		<comments>http://massgenomics.org/2012/03/5-things-to-know-about-samtools-mpileup.html#comments</comments>
		<pubDate>Fri, 02 Mar 2012 16:32:31 +0000</pubDate>
		<dc:creator>Dan Koboldt</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.massgenomics.org/?p=1676</guid>
		<description><![CDATA[Next-generation sequencing instruments might be considered a disruptive technology. The incredible throughput of these machines, even 4-5 years ago, clearly mandated the development of a new generation of algorithms and data formats capable of storing, processing, and analyzing huge amounts of sequence data. One key achievement in next-generation sequencing bioinformatics was the specification of sequence [...]]]></description>
			<content:encoded><![CDATA[<p>Next-generation sequencing instruments might be considered a disruptive technology. The incredible throughput of these machines, even 4-5 years ago, clearly mandated the development of a new generation of algorithms and data formats capable of storing, processing, and analyzing huge amounts of sequence data. One key achievement in next-generation sequencing bioinformatics was the specification of <strong><a title="SAM format specification" href="http://samtools.sourceforge.net/SAM1.pdf" target="_blank">sequence alignment/map format</a></strong> (SAM) and its binary equivalent (BAM). These formats were widely adopted by a community of scientists desperate to have a common format in which to store next-gen sequencing reads and their alignments to a reference. BAM files quickly became a standard for the <a title="The Cancer Genome Atlas" href="http://cancergenome.nih.gov" target="_blank">Cancer Genome Atlas</a>, the <a title="The 1,000 Genomes Project" href="http://www.1000genomes.org" target="_blank">1,000 Genomes Project</a>, and other large-scale sequencing efforts. The formats were accompanied with a software package, <strong><a title="SAMtools" href="http://samtools.sourceforge.net" target="_blank">SAMtools</a></strong>, that is probably the most pervasive tool for next-gen sequencing in the world.</p>
<p>To aid in variant calling and other analyses, SAMtools can generate a pileup of read bases using the alignments to a reference sequence. There&#8217;s a lot you can do with pileup-like output, and indeed, SAMtools variant calling is quite popular. The actual command is <em>samtools mpileup</em>, and here are five things that you should know about it.</p>
<ol>
<li><strong>SAMtools <em>mpileup</em> has permanently replaced <em>pileup</em></strong>. Replaced as in the latter command no longer works. This is generally a good thing; mpileup can do nearly everything that pileup could, and a lot more. You still use it to generate pileup output for a single sample. However, some features have gone away, such as simple consensus calling with the <em>-c</em> parameter, and the option to output mapping qualities for each base (I think that was <em>-k</em>). Consensus calling can be done in <em>mpileup</em> with a couple of extra steps using <em>bcftools</em>; see the <a title="SAMtools mpileup and variant calling" href="http://samtools.sourceforge.net/mpileup.shtml" target="_blank">mpileup page</a> for details.</li>
<li><strong>Base alignment quality (BAQ) computation is turned on by default</strong>. BAQ is a phred-like score representing the probability that a read base is mis-aligned; it lowers the base quality score of mismatches that are near indels. This is to help rule out false positive SNP calls due to alignment artifacts near small indels. There have been recent suggestions, however, that BAQ may be too strict and cause real SNPs to be missed. Several users of the <span style="color: #ff0000;"><strong><a title="VarScan variant caller" href="http://varscan.sourceforge.net" target="_blank">VarScan variant caller </a></strong></span>have reported that its <span style="color: #800000;"><strong>read counts disagree with what is seen in IGV</strong></span>, or <a title="Somatic mutations missed due to BAQ" href="http://sourceforge.net/projects/varscan/forums/forum/1073559/topic/5061782" target="_blank">somatic mutations were missed</a> when mpileup was used instead of pileup. These issues are almost always due to BAQ&#8217;s downgrade of base qualities to 0 or 1. This adjustment can&#8217;t be seen in IGV, but it&#8217;s below VarScan&#8217;s default base quality threshold. You can disable BAQ with the <em>-B</em> parameter, or perform a more sensitive BAQ calculation with <em>-E</em>. I&#8217;ve heard that the latter option will be turned on by default in the next version of SAMtools.</li>
<li><strong>Analyze multiple samples at once</strong>. The principal feature of SAMtools <em>mpileup</em> is the ability to analyze data from multiple samples simultaneously. You do this by providing more than one BAM file. This feature is nice because it provides data across all samples on a per-position basis. The first three columns (chromosome, position, and reference base) are shared by all samples. Following those, you get three columns per BAM file indicating the read depth, bases, and base qualities for that sample at that position. The VarScan <em>mpileup2cns</em> command will take this raw input and call a genotype for each sample, as well as a consensus genotype based on the data from ALL samples. This is useful for detecting variants in low-coverage regions by leveraging data across samples. You can also use the <em>bcftools</em> pipeline for multi-sample calling.</li>
<li><strong>Rule out false positives with strand bias or poor mapping</strong>. Many groups working on variant calling in next-generation sequencing have independently converged on several key factors that influence false positive rates. The <strong><a title="VarScan 2 paper" href="http://genome.cshlp.org/content/early/2012/02/02/gr.129684.111?top=1" target="_blank">VarScan 2 paper</a></strong>, for example, describes <span style="color: #ff0000;"><strong>9 empirically-derived filtering criteria</strong></span> that we use to identify and remove artifacts. The strand representation and number of mismatches in supporting reads are two important indicators of false positives arising from systematic alignment artifacts. SAMtools <em>mpileup</em> helps users address these issues as well: the -C parameter lets you downgrade the mapping quality of reads with lots of mismatches, and the -S parameter tells SAMtools to report a per-sample strand bias p-value.</li>
<li><strong>Random position retrieval that works</strong>. One of the most powerful features of <em>mpileup</em> is that you can specify a region with <em>-r chrom:start-stop</em> and it will report pileup output for the specified position(s). The old <em>pileup</em> command had this option, but took a long time because it looked at all positions and just reported the ones within your desired region. Instead, <em>mpileup</em> leverages BAM file indexing to retrieve data quite rapidly; in my experience, it takes about 1 second to retrieve the pileup for several samples at any given position in the human genome. Multi-sample, rapid random access has lots of uses for bioinformaticians; for example, I can retrieve all bases observed in all samples at a variant of interest to look at the evidence in each sample.</li>
</ol>
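<p>To illustrate the multi-sample layout from point 3 and the base quality issue from point 2, here is a small Python sketch that splits one mpileup line into per-sample records and counts how many base qualities clear a threshold. It assumes tab-separated output with Phred+33 quality characters; the function names are hypothetical, the example line is made up, and the default threshold of 15 is my assumption about a typical caller setting rather than something read from any tool&#8217;s source.</p>

```python
from typing import List, NamedTuple, Tuple


class SamplePileup(NamedTuple):
    depth: int
    bases: str
    quals: str


def parse_mpileup_line(line: str) -> Tuple[str, int, str, List[SamplePileup]]:
    """Split one mpileup line into chrom, position, ref base, and per-sample records."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, ref = fields[0], int(fields[1]), fields[2]
    rest = fields[3:]
    if len(rest) == 0 or len(rest) % 3 != 0:
        raise ValueError("expected three columns (depth, bases, qualities) per sample")
    samples = [
        SamplePileup(int(rest[i]), rest[i + 1], rest[i + 2])
        for i in range(0, len(rest), 3)
    ]
    return chrom, pos, ref, samples


def bases_passing(quals: str, min_qual: int = 15, offset: int = 33) -> int:
    """Count quality characters whose Phred score is at least `min_qual`."""
    return sum(1 for q in quals if ord(q) - offset >= min_qual)


# Made-up line for two samples. '!' and '"' encode qualities 0 and 1, the kind
# of values described above for bases downgraded by BAQ.
line = "\t".join(["chr1", "100", "A", "3", ".,G", "II!", "2", "..", '"I'])
chrom, pos, ref, samples = parse_mpileup_line(line)
for number, sample in enumerate(samples, start=1):
    print(chrom, pos, ref, number, sample.depth, bases_passing(sample.quals))
```

<p>A sample can show a read depth of 3 here while only 2 bases survive the quality filter, which is the same kind of gap people notice between a variant caller&#8217;s read counts and what IGV displays.</p>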
<p>These features are the results of hard work by Heng Li and others who contribute to the development and improvement of SAMtools. It&#8217;s great to see a key piece of software under continued, active development, and I think most of us look forward to what the next SAMtools has in store.</p>
<p><span style="text-decoration: underline;"><strong>References</strong></span></p>
<p><span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&amp;rft.jtitle=Bioinformatics+%28Oxford%2C+England%29&amp;rft_id=info%3Apmid%2F19505943&amp;rfr_id=info%3Asid%2Fresearchblogging.org&amp;rft.atitle=The+Sequence+Alignment%2FMap+format+and+SAMtools.&amp;rft.issn=1367-4803&amp;rft.date=2009&amp;rft.volume=25&amp;rft.issue=16&amp;rft.spage=2078&amp;rft.epage=9&amp;rft.artnum=&amp;rft.au=Li+H&amp;rft.au=Handsaker+B&amp;rft.au=Wysoker+A&amp;rft.au=Fennell+T&amp;rft.au=Ruan+J&amp;rft.au=Homer+N&amp;rft.au=Marth+G&amp;rft.au=Abecasis+G&amp;rft.au=Durbin+R&amp;rft.au=1000+Genome+Project+Data+Processing+Subgroup&amp;rfe_dat=bpr3.included=1;bpr3.tags=Biology%2CComputer+Science+%2F+Engineering%2CGenetics%2C+Bioinformatics%2C+Algorithms">Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, &amp; 1000 Genome Project Data Processing Subgroup (2009). The Sequence Alignment/Map format and SAMtools. <span style="font-style: italic;">Bioinformatics (Oxford, England), 25</span> (16), 2078-9 PMID: <a rev="review" href="http://www.ncbi.nlm.nih.gov/pubmed/19505943">19505943</a></span></p>
]]></content:encoded>
			<wfw:commentRss>http://massgenomics.org/2012/03/5-things-to-know-about-samtools-mpileup.html/feed</wfw:commentRss>
		<slash:comments>4</slash:comments>
		<feedburner:origLink>http://massgenomics.org/2012/03/5-things-to-know-about-samtools-mpileup.html</feedburner:origLink></item>
		<item>
		<title>Human Genetics Challenges in an Era of Cheap Sequencing</title>
		<link>http://feedproxy.google.com/~r/Massgenomics/~3/A_k8PhiAF8c/human-genetics-challenges-in-an-era-of-cheap-sequencing.html</link>
		<comments>http://massgenomics.org/2012/02/human-genetics-challenges-in-an-era-of-cheap-sequencing.html#comments</comments>
		<pubDate>Wed, 22 Feb 2012 15:26:25 +0000</pubDate>
		<dc:creator>Dan Koboldt</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.massgenomics.org/?p=1639</guid>
		<description><![CDATA[Next-generation sequencing promises to reach unprecedented levels of throughput this year, driving down the cost of sequencing dramatically. Somewhere between the GridION, the Ion Proton, and the HiSeq2500, we may see the first single-day, $1,000-per-genome technologies in 2012. Even so, a 90% reduction in sequencing cost this year will not magically solve all medical problems, [...]]]></description>
			<content:encoded><![CDATA[<p>Next-generation sequencing promises to reach unprecedented levels of throughput this year, driving down the cost of sequencing dramatically. Somewhere between the GridION, the Ion Proton, and the HiSeq2500, <strong>we may see the first single-day, $1,000-per-genome technologies in 2012</strong>. Even so, a 90% reduction in sequencing cost this year will not magically solve all medical problems, even the ones that are clearly genetic. We are already reaching a point where getting enough sequencing coverage and finding the variants no longer present a significant problem. Instead, the field of human genetics faces three significant challenges as we enter an era of ultra-low-cost sequencing.<br />
<strong>1. <a title="Human Genetics Samples for Sequencing" href="#human-genetics-samples-for-sequencing">Obtaining Sufficient, Relevant, Consented Samples</a></strong><br />
<strong>2. <a title="Human Clinical Annotation" href="#human-genetics-clinical-annotation">Clinical Annotation of Genetic Variants</a></strong><br />
<strong>3. <a title="Human Genetics Interpretation of Complex Genomes" href="#human-genetics-interpretation-complex-genomes">Interpretation of Complex Genomes</a></strong></p>
<h2 style="font-size: 14px;"><a name="human-genetics-samples-for-sequencing"></a>Obtaining Sufficient, Relevant, Consented Samples</h2>
<p><a href="http://www.massgenomics.org/wp-content/uploads/2012/02/Human-genetics-pedigree-large.jpg"><img class="alignright size-full wp-image-1650" title="Human-genetics-pedigree-large" src="http://www.massgenomics.org/wp-content/uploads/2012/02/Human-genetics-pedigree-large.jpg" alt="Human genetics pedigree, autosomal recessive" width="201" height="220" /></a>Samples will become a major challenge. Specifically, obtaining sufficient numbers of high-quality, accurately phenotyped, properly consented samples for sequencing. I know for a fact that many, <span style="color: #800000;"><strong>many studies are not facing a bottleneck at sequencing capacity but at sample collection, consent, and banking</strong></span>. There are even internationally renowned cancer centers where banking tumor samples and patient blood samples is not a standard or required practice for oncologists. A sad reality is that, every day, people succumb to diseases such as cancer, metabolic syndromes, and heart disease where genetics undoubtedly plays a role. Those samples, if not banked, are lost to the world of science.</p>
<p>The good news is that there are many excellent cohorts out there. There are entire populations that have been catalogued, sampled, and followed up over the course of decades, with huge amounts of qualitative and quantitative clinical data. The commoditization of sequencing means that the proprietors of these cohorts will have their choice of sequencing providers. <strong>Informative samples, especially those from patients suffering from rare inherited disorders, will be in high demand</strong>. Tumor samples will be fought over by researchers, drug companies, and the treating physicians. In a world of cheap sequencing, samples are the new commodity.</p>
<h2 style="font-size: 14px;"><a name="human-genetics-clinical-annotation"></a>Clinical Annotation of Genetic Variation</h2>
<p>With long enough reads and sufficient coverage, finding mutations will no longer be a problem. The new challenge will be in assessing their functional significance and determining which have clinical relevance.</p>
<p><a href="http://www.massgenomics.org/wp-content/uploads/2012/02/Herceptin-pathway.jpg"><img class="size-full wp-image-1651 aligncenter" style="border: 1px solid black;" title="Herceptin-pathway" src="http://www.massgenomics.org/wp-content/uploads/2012/02/Herceptin-pathway.jpg" alt="" width="480" height="279" /></a></p>
<p>Imagine a breast cancer patient whose germline and tumor genomes have been sequenced to high depth. You have the full spectrum of germline/somatic mutations, copy number alterations, and structural variants. And you also have so many questions:</p>
<ul>
<li>Which of these mutations are <strong>drivers</strong>? Which are passengers?</li>
<li>What do the variants say about diagnosis or <strong>prognosis</strong>?</li>
<li>Are there any <span style="color: #800000;"><strong>clinically actionable</strong></span> mutations?</li>
<li>Have any been seen before in this tumor type, or other tumor types?</li>
<li>Are there <strong>germline susceptibility </strong>variants that predisposed this patient to developing cancer?</li>
<li>If so, should that be communicated back to the patient&#8217;s family? Can it?</li>
</ul>
<p>It is certain that <strong><a title="Whole genome sequencing and clinical annotation" href="http://www.massgenomics.org/2011/09/whole-genome-sequencing-and-clinical-annotation.html" target="_blank">clinical annotation and risk assessment</a></strong> will be more costly and time-consuming than whole-genome sequencing.</p>
<h2 style="font-size: 14px;"><a name="human-genetics-interpretation-complex-genomes"></a>Interpretation of Complex Genomes</h2>
<p>Let&#8217;s face it, people, even with thousands of samples and accurate genotype information for millions of SNPs, we&#8217;re still struggling to suss out the genetic underpinnings of most common diseases. Just last week, I heard about the whole genome sequencing of a family quartet in which the two offspring, monozygotic twins, had a neurological phenotype of likely genetic origin. Yet even after numerous fancy variant-calling and filtering approaches were applied, <strong>the researchers were unable to pinpoint a cause</strong>. We&#8217;ll undoubtedly hear dozens of stories like these as large-scale efforts to determine the genetic basis of inherited diseases (e.g. Mendelian disorders) get under way this year. Yes, with sufficient samples, precise phenotyping, and comprehensive variant detection, we will have the statistical power to detect small-effect changes associated with a given phenotype. But that&#8217;s association, not causation. <strong>High-throughput functional assays</strong> may be required to determine if a certain variant is the actual cause.</p>
<div id="attachment_1652" class="wp-caption alignnone" style="width: 430px"><a href="http://www.massgenomics.org/wp-content/uploads/2012/02/human-genome-ten-years.jpg"><img class="size-full wp-image-1652 " style="border: 1px solid black;" title="human-genome-ten-years" src="http://www.massgenomics.org/wp-content/uploads/2012/02/human-genome-ten-years.jpg" alt="Human genetics complexity" width="420" height="237" /></a><p class="wp-caption-text">Credit: National Geographic</p></div>
<p>When it comes to coding regions of the genome, we have a number of tools at our disposal to evaluate the consequences of an observed variant. <span style="color: #800000;"><strong>RNA-seq</strong></span> can tell us if the gene is expressed, and if both alleles are represented. Computational algorithms can determine the likelihood that the change is damaging to the protein. High-throughput proteomics can even assess the level of protein in the cell. We can do a lot to investigate coding variants.</p>
<p>I wish I could say the same about noncoding variation. With the recent availability of exome sequencing, we&#8217;ve all had the luxury of cherry-picking variants in coding regions because these are less numerous and easier to interpret. But the simple reality is this: <strong>the vast majority of genetic variation in humans lies outside the exons of protein-coding genes</strong>. Anecdotal examples tell us that noncoding variation is quite capable of exerting influence on a phenotype, though the effect may be quite subtle. We have a lot more to learn about noncoding DNA, and we&#8217;ll need to study up in order to correctly annotate and interpret the vast catalogue of genetic variation in human genomes.</p>
]]></content:encoded>
			<wfw:commentRss>http://massgenomics.org/2012/02/human-genetics-challenges-in-an-era-of-cheap-sequencing.html/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		<feedburner:origLink>http://massgenomics.org/2012/02/human-genetics-challenges-in-an-era-of-cheap-sequencing.html</feedburner:origLink></item>
		<item>
		<title>AGBT 2012 Last Day: Elephants in the Room</title>
		<link>http://feedproxy.google.com/~r/Massgenomics/~3/SGA5XKgACPQ/agbt-2012-last-day-elephants-in-the-room.html</link>
		<comments>http://massgenomics.org/2012/02/agbt-2012-last-day-elephants-in-the-room.html#comments</comments>
		<pubDate>Sat, 18 Feb 2012 23:42:08 +0000</pubDate>
		<dc:creator>Dan Koboldt</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.massgenomics.org/?p=1629</guid>
		<description><![CDATA[Sunset on Marco Island A comment from the current speaker (Vivian Cheung) inspired this post&#8217;s title, and it seems to me that the final day of AGBT 2012 has many elephants in the room. There&#8217;s the Roche hostile takeover, which has had relatively little chatter this week. I managed to meet Illumina CEO Jay Flatley [...]]]></description>
			<content:encoded><![CDATA[<div class="mceTemp">
<dl id="attachment_1630" class="wp-caption alignright" style="width: 256px;">
<dt class="wp-caption-dt"><a href="http://www.massgenomics.org/wp-content/uploads/2012/02/AGBT-Sunset-Narrow.jpg"><img class="size-full wp-image-1630 " style="border: 1px solid black;" title="AGBT-Sunset-Narrow" src="http://www.massgenomics.org/wp-content/uploads/2012/02/AGBT-Sunset-Narrow.jpg" alt="Sunset on Marco Island at AGBT 2012" width="246" height="439" /></a></dt>
<address class="wp-caption-dd">Sunset on Marco Island</address>
</dl>
</div>
<p>A comment from the current speaker (Vivian Cheung) inspired this post&#8217;s title, and it seems to me that the final day of <strong><a title="AGBT 2012" href="http://www.agbt.org" target="_blank">AGBT 2012</a></strong> has many elephants in the room. There&#8217;s the Roche hostile takeover, which has had relatively little chatter this week. I managed to meet <strong>Illumina CEO Jay Flatley</strong> after a talk yesterday (before Oxford&#8217;s announcement); he was polite and exuded nothing but cheerful, casual confidence.</p>
<p>The talks today have been spectacular. The morning session included <strong>Michel Georges </strong>on the genetic basis of &#8220;color-sidedness&#8221;, a coloration trait observed in Belgian blue cattle that was mapped to a duplication near the KIT gene in the bovine genome. <strong>Jesse Gray</strong> of Harvard Medical School presented his work on &#8220;steady-state&#8221; RNA-seq to decipher the kinetics of transcription and splicing. <strong>Patrick Schnable</strong> of Iowa State talked about gene loss during domestication of modern maize from its ancestor, Teosinte (tee oh sin tay) by Native Americans about 10,000 years ago.</p>
<p>Over the coffee break I met <strong>James Hadfield</strong>; he and Nick Loman are the creators of the <strong><a title="NGS Maps" href="http://omicsmaps.com/" target="_blank">next-generation sequencing maps</a></strong>, a tool for visualizing NGS installations across the world.</p>
<p>After the break was one of my favorite talks, a survey of DNA methylation in hematopoietic stem cells, lymphoid cells, and myeloid cells given by <strong>Emily Hodges</strong> of CSHL. Then <strong>James Galagan</strong> walked us through systems biology approaches to study tuberculosis, whose pathogen has the unique ability to survive inside macrophages (in the face of hypoxia and even drug exposure) and does so by eating your cholesterol!</p>
<p>In the final session, chaired by <strong>Elaine Mardis</strong>, we heard about RNA-DNA differences in B-cells (<strong>Vivian Cheung</strong>) and streaming algorithms for RNA-seq analysis from <strong>Lior Pachter</strong>, who related that someone had contacted him recently about processing 14 billion RNA-seq reads. That&#8217;s a lot.</p>
<p>The meeting is still abuzz with talk of Oxford Nanopore, which I think we can all agree is a disruptive technology, if the stock market is any indication:</p>
<p><a href="http://www.massgenomics.org/wp-content/uploads/2012/02/Seq-Stocks-AGBTweek-2012.jpg"><img class="alignnone size-medium wp-image-1633" title="Seq-Stocks-AGBTweek-2012" src="http://www.massgenomics.org/wp-content/uploads/2012/02/Seq-Stocks-AGBTweek-2012-300x117.jpg" alt="" width="300" height="117" /></a></p>
<p>My colleagues at <strong><a title="Genomes Unzipped on the MinION" href="http://www.genomesunzipped.org/2012/02/making-sequencing-simpler-with-nanopores.php" target="_blank">Genomes Unzipped</a></strong> have a thorough take on this new technology, its promise, and what it could mean for the field. For my part, I remain cautious. I grew up in St. Louis, Missouri, the &#8220;show-me&#8221; state, and I&#8217;ll be convinced the moment I hold a MinION in my hand.</p>
]]></content:encoded>
			<wfw:commentRss>http://massgenomics.org/2012/02/agbt-2012-last-day-elephants-in-the-room.html/feed</wfw:commentRss>
		<slash:comments>3</slash:comments>
		<feedburner:origLink>http://massgenomics.org/2012/02/agbt-2012-last-day-elephants-in-the-room.html</feedburner:origLink></item>
	</channel>
</rss>

