<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">

<channel>
	<title>MassGenomics</title>
	
	<link>http://www.massgenomics.org</link>
	<description>Medical genomics in the post-genome era</description>
	<lastBuildDate>Fri, 03 Feb 2012 20:08:54 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1.4</generator>
		<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.feedburner.com/Massgenomics" /><feedburner:info uri="massgenomics" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><item>
		<title>Recurrent histone alterations in pediatric brain cancer</title>
		<link>http://feedproxy.google.com/~r/Massgenomics/~3/9E9voafMgek/recurrent-histone-alterations-in-pediatric-brain-cancer.html</link>
		<comments>http://www.massgenomics.org/2012/02/recurrent-histone-alterations-in-pediatric-brain-cancer.html#comments</comments>
		<pubDate>Fri, 03 Feb 2012 06:36:37 +0000</pubDate>
		<dc:creator>Dan Koboldt</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.massgenomics.org/?p=1497</guid>
		<description><![CDATA[Brain tumors are a particularly deadly form of cancer, and the leading cause of cancer-related death in children. Two studies published this week in Nature journals applied next-generation sequencing to pediatric brain tumors, revealing a striking pattern of recurrent somatic mutations in H3F3A, a gene encoding the histone protein H3.3. These are the first unbiased [...]]]></description>
			<content:encoded><![CDATA[<p>Brain tumors are a particularly deadly form of cancer, and the leading cause of cancer-related death in children. Two studies published this week in <em>Nature</em> journals applied next-generation sequencing to pediatric brain tumors, revealing a <strong>striking pattern of recurrent somatic mutations in <em>H3F3A</em></strong>, a gene encoding the histone protein H3.3. These are the first unbiased surveys of childhood brain cancer, and also the first reports of driver mutations in histone H3.</p>
<p><strong>Contents: <a title="Brain cancer stats" href="#brain-cancer-stats">Brain Cancer</a></strong> • <strong><a title="Glioma and glioblastoma multiforme" href="#glioma-glioblastoma">Glioma and GBM</a></strong> • <strong><a title="Exome sequencing of pediatric glioblastoma" href="#exome-sequencing-pediatric-gbm">Pediatric Exome Sequencing</a></strong> • <strong><a title="Recurrent mutations in histone H3" href="#recurrent-mutations-H3F3A">H3F3A Mutations</a></strong> • <strong><a title="Whole Genome Sequencing of Pediatric Glioma" href="#genome-sequencing-pediatric-glioma">Glioma Sequencing</a></strong> •<strong><a title="Histone H3 and Cancer Pathogenesis" href="#histone-h3-pathogenesis"> Histone H3</a></strong></p>
<h2 style="font-size: 14px;"><a name="brain-cancer-stats"></a>Brain Cancer in Adults and Children</h2>
<p>About 22,000 new cases of brain and central nervous system (CNS) tumors are diagnosed each year in the United States, and <span style="color: #800000;"><strong>3,000 of those are in children</strong></span>. Both incidence and mortality are higher in Caucasians than other racial/ethnic groups.</p>
<div id="attachment_1522" class="wp-caption alignnone" style="width: 399px"><a href="http://www.massgenomics.org/wp-content/uploads/2012/02/US-Brain-Cancer-Mortality.jpg"><img class="size-full wp-image-1522" title="US-Brain-Cancer-Mortality" src="http://www.massgenomics.org/wp-content/uploads/2012/02/US-Brain-Cancer-Mortality.jpg" alt="US Brain cancer mortality" width="389" height="334" /></a><p class="wp-caption-text">Source: NCI</p></div>
<p>Across all ethnicities, men have higher incidence and mortality than women. And though brain cancers are not listed among the &#8220;common&#8221; cancer types, about $3.5 <em>billion</em> is spent annually treating them in the United States.</p>
<h2 style="font-size: 14px;"><a name="glioma-glioblastoma"></a>Glioma and Glioblastoma Multiforme</h2>
<p>Brain tumors are the most prevalent of any pediatric cancer (27%),  and certainly among the more lethal. One such cancer, <strong>pediatric diffuse  intrinsic pontine glioma</strong> (DIPG), arises almost exclusively in children  and has a long-term survival of less than 10%. Glioblastoma multiforme (GBM), an aggressive brain tumor affecting both adults and children, has dismal outcomes as well:</p>
<table border="0" cellpadding="2">
<tbody>
<tr>
<td><a href="http://www.massgenomics.org/wp-content/uploads/2012/02/glioblastoma-survival.jpg"><img class="size-full wp-image-1503 alignnone" title="glioblastoma-survival" src="http://www.massgenomics.org/wp-content/uploads/2012/02/glioblastoma-survival.jpg" alt="brain cancer survival" width="265" height="312" /></a></td>
<td>Long-term survival in GBM in the 1980s (blue), 1990s (red), and 2000s (green). Adapted from Kohler et al., J. Natl. Cancer Inst. (2011).</td>
</tr>
</tbody>
</table>
<p>Most patients die within a few years of diagnosis, despite aggressive therapy. Ted Kennedy died of it. GBM is less common in the pediatric setting, but <strong>the primary tumors that are diagnosed in children arise <em>de novo</em> and are morphologically indistinguishable from adult GBM</strong>. While the Cancer Genome Atlas consortium characterized several thousand genes in adult GBM, this disease is understudied in the pediatric setting.</p>
<h2 style="font-size: 14px;"><a name="exome-sequencing-pediatric-gbm"></a>Exome Sequencing in Pediatric Glioblastoma</h2>
<p>To better understand this disease, Schwartzentruber et al. performed exome sequencing (they call it &#8220;whole-exome&#8221; sequencing, but we know better) on <a title="Exome sequencing of pediatric glioblastoma" href="http://www.nature.com/nature/journal/vaop/ncurrent/full/nature10833.html#/affil-auth" target="_blank">48 well-characterized pediatric GBMs</a>, 6 of which had matched normal DNA allowing identification of somatic mutations. Using <strong>Illumina&#8217;s TruSeq exome kit and HiSeq 2000</strong> platform, the authors generated <strong>~13.4 Gbp</strong> of data per sample, achieving 91% of target bases covered at least 10x, with an average depth of 61x.</p>
<p><a href="http://www.massgenomics.org/wp-content/uploads/2012/02/Exome-Gbp-Generated.jpg"><img class="alignnone size-full wp-image-1528" title="Exome-Gbp-Generated" src="http://www.massgenomics.org/wp-content/uploads/2012/02/Exome-Gbp-Generated.jpg" alt="GBM exome data generated" width="551" height="266" /></a></p>
<p>The six tumors with matched normal DNA harbored <span style="color: #800000;"><strong>15 somatic mutations on average</strong></span> (range 3-31), a mutation rate lower than adult GBM but higher than another pediatric brain tumor, medulloblastoma.</p>
<h2 style="font-size: 14px;"><a name="recurrent-mutations-H3F3A"></a>Recurrent Mutations in H3F3A</h2>
<p>Four of the six cases had heterozygous mutations in the <em>H3F3A</em> gene, <strong>two at K27M and two at G34R</strong>. This gene encodes the replication-independent histone variant H3.3. To find two recurrent mutations in four unrelated cases is quite striking, particularly because both occurred in the amino-terminal tail of the protein at positions believed to undergo repressing or activating post-translational modifications. Extending the analysis to the 42 other exomes, the authors found that:</p>
<ul>
<li>31% of tumors (15/48) had mutations at residue 27 or 34 in <em>H3F3A</em>.</li>
<li>31% of tumors (15/48, including 8 of the above) had mutations in chromatin remodeling genes <em>ATRX/DAXX</em>.</li>
<li> 54% of cases had somatic TP53 mutations. Among samples with <em>H3F3A/ATRX/DAXX</em> mutations, this frequency jumped to 86%.</li>
</ul>
<h2 style="font-size: 14px;"><a name="genome-sequencing-pediatric-glioma"></a>Whole-genome Sequencing of Pediatric Glioma</h2>
<p><a href="http://www.massgenomics.org/wp-content/uploads/2010/01/st-judes-cancer-project.jpg"><img class="alignright size-full wp-image-538" title="st-judes-cancer-project" src="http://www.massgenomics.org/wp-content/uploads/2010/01/st-judes-cancer-project.jpg" alt="" width="193" height="193" /></a>Also this week, members of the St. Jude Children&#8217;s Research Hospital &#8211; Washington University Pediatric Cancer Genome Project (PCGP) described the <a title="Sequencing of pediatric glioma" href="http://www.nature.com/ng/journal/vaop/ncurrent/full/ng.1102.html" target="_blank">whole-genome sequencing of pediatric diffuse intrinsic pontine glioma</a> (DIPG), 7 cases with tumor and matched germline DNA. Writing in <em>Nature Genetics</em>, Wu et al. reported that <strong>4 of the 7 cases harbored the K27M mutation in <em>H3F3A</em></strong>. They next undertook targeted sequencing of the lysine-27 residue in all 16 genes that code for H3 isoforms in a validation cohort comprising 43 DIPGs and 36 non-brainstem glioblastomas.</p>
<p>They found striking recurrence of lysine-27 mutations in <em>H3F3A</em> or the closely related gene <em>HIST1H3B</em> (which encodes H3.1) in <span style="color: #800000;"><strong>78% of DIPGs and 22% of non-brainstem glioblastomas</strong></span>. They also identified the second recurrent mutation (Gly34Arg) in 14% of the glioblastomas, but none of the DIPGs, findings remarkably consistent with the GBM study. Screening of several other pediatric brain tumors revealed that the histone H3 mutations seem exclusive to pediatric high-grade gliomas.</p>
<h2 style="font-size: 14px;"><a name="histone-h3-pathogenesis"></a>Histone H3 and Glioma Pathogenesis</h2>
<p>To my knowledge, this is the first report of somatic mutations in histone H3. The isoforms of this histone play complex regulatory roles in several cellular processes &#8212; epigenetic regulation of gene expression, selective regulation of developmental genes, and telomere maintenance &#8212; that could contribute to tumor growth and progression if dysregulated. Indeed, the precise location of these recurrent events at two specific residues, both of which are at or near regulatory post-translational modification positions, suggests a <span style="color: #008000;"><strong>gain-of-function</strong></span>. If true, then the mutated forms of these genes may offer new targets for the surveillance and/or treatment of this lethal disease.</p>
<p><span style="text-decoration: underline;"><strong>References</strong></span><br />
<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&amp;rft.jtitle=Nature&amp;rft_id=info%3Apmid%2F22286061&amp;rfr_id=info%3Asid%2Fresearchblogging.org&amp;rft.atitle=Driver+mutations+in+histone+H3.3+and+chromatin+remodelling+genes+in+paediatric+glioblastoma.&amp;rft.issn=0028-0836&amp;rft.date=2012&amp;rft.volume=&amp;rft.issue=&amp;rft.spage=&amp;rft.epage=&amp;rft.artnum=&amp;rft.au=Schwartzentruber+J&amp;rft.au=Korshunov+A&amp;rft.au=Liu+XY&amp;rft.au=Jones+DT&amp;rft.au=Pfaff+E&amp;rft.au=Jacob+K&amp;rft.au=Sturm+D&amp;rft.au=Fontebasso+AM&amp;rft.au=Quang+DA&amp;rft.au=T%C3%B6njes+M&amp;rft.au=Hovestadt+V&amp;rft.au=Albrecht+S&amp;rft.au=Kool+M&amp;rft.au=Nantel+A&amp;rft.au=Konermann+C&amp;rft.au=Lindroth+A&amp;rft.au=J%C3%A4ger+N&amp;rft.au=Rausch+T&amp;rft.au=Ryzhova+M&amp;rft.au=Korbel+JO&amp;rft.au=Hielscher+T&amp;rft.au=Hauser+P&amp;rft.au=Garami+M&amp;rft.au=Klekner+A&amp;rft.au=Bognar+L&amp;rft.au=Ebinger+M&amp;rft.au=Schuhmann+MU&amp;rft.au=Scheurlen+W&amp;rft.au=Pekrun+A&amp;rft.au=Fr%C3%BChwald+MC&amp;rft.au=Roggendorf+W&amp;rft.au=Kramm+C&amp;rft.au=D%C3%BCrken+M&amp;rft.au=Atkinson+J&amp;rft.au=Lepage+P&amp;rft.au=Montpetit+A&amp;rft.au=Zakrzewska+M&amp;rft.au=Zakrzewski+K&amp;rft.au=Liberski+PP&amp;rft.au=Dong+Z&amp;rft.au=Siegel+P&amp;rft.au=Kulozik+AE&amp;rft.au=Zapatka+M&amp;rft.au=Guha+A&amp;rft.au=Malkin+D&amp;rft.au=Felsberg+J&amp;rft.au=Reifenberger+G&amp;rft.au=von+Deimling+A&amp;rft.au=Ichimura+K&amp;rft.au=Collins+VP&amp;rft.au=Witt+H&amp;rft.au=Milde+T&amp;rft.au=Witt+O&amp;rft.au=Zhang+C&amp;rft.au=Castelo-Branco+P&amp;rft.au=Lichter+P&amp;rft.au=Faury+D&amp;rft.au=Tabori+U&amp;rft.au=Plass+C&amp;rft.au=Majewski+J&amp;rft.au=Pfister+SM&amp;rft.au=Jabado+N&amp;rfe_dat=bpr3.included=1;bpr3.tags=Biology%2CMedicine%2CGenetics%2C+Bioinformatics%2C+Cancer">Schwartzentruber J, Korshunov A, Liu XY, et al (2012). 
Driver mutations in histone H3.3 and chromatin remodelling genes in paediatric glioblastoma. <span style="font-style: italic;">Nature</span> PMID: <a rev="review" href="http://www.ncbi.nlm.nih.gov/pubmed/22286061">22286061</a></span><br />
<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&amp;rft.jtitle=Nature+genetics&amp;rft_id=info%3Apmid%2F22286216&amp;rfr_id=info%3Asid%2Fresearchblogging.org&amp;rft.atitle=Somatic+histone+H3+alterations+in+pediatric+diffuse+intrinsic+pontine+gliomas+and+non-brainstem+glioblastomas.&amp;rft.issn=1061-4036&amp;rft.date=2012&amp;rft.volume=&amp;rft.issue=&amp;rft.spage=&amp;rft.epage=&amp;rft.artnum=&amp;rft.au=St.+Jude+Children%27s+Research+Hospital%E2%80%93Washington+University+Pediatric+Cancer+Genome+Project&amp;rft.au=Wu+G&amp;rft.au=Broniscer+A&amp;rft.au=McEachron+TA&amp;rft.au=Lu+C&amp;rft.au=Paugh+BS&amp;rft.au=Becksfort+J&amp;rft.au=Qu+C&amp;rft.au=Ding+L&amp;rft.au=Huether+R&amp;rft.au=Parker+M&amp;rft.au=Zhang+J&amp;rft.au=Gajjar+A&amp;rft.au=Dyer+MA&amp;rft.au=Mullighan+CG&amp;rft.au=Gilbertson+RJ&amp;rft.au=Mardis+ER&amp;rft.au=Wilson+RK&amp;rft.au=Downing+JR&amp;rft.au=Ellison+DW&amp;rft.au=Zhang+J&amp;rft.au=Baker+SJ&amp;rfe_dat=bpr3.included=1;bpr3.tags=Biology%2CMedicine%2CGenetics%2C+Bioinformatics%2C+Cancer">St. Jude Children&#8217;s Research Hospital–Washington University Pediatric Cancer Genome Project (2012). Somatic histone H3 alterations in pediatric diffuse intrinsic pontine gliomas and non-brainstem glioblastomas. <span style="font-style: italic;">Nature genetics</span> PMID: <a rev="review" href="http://www.ncbi.nlm.nih.gov/pubmed/22286216">22286216</a></span></p>
]]></content:encoded>
			<wfw:commentRss>http://www.massgenomics.org/2012/02/recurrent-histone-alterations-in-pediatric-brain-cancer.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.massgenomics.org/2012/02/recurrent-histone-alterations-in-pediatric-brain-cancer.html</feedburner:origLink></item>
		<item>
		<title>The Current State of dbSNP</title>
		<link>http://feedproxy.google.com/~r/Massgenomics/~3/oCY5EZZo1VU/the-current-state-of-dbsnp.html</link>
		<comments>http://www.massgenomics.org/2012/01/the-current-state-of-dbsnp.html#comments</comments>
		<pubDate>Tue, 24 Jan 2012 06:30:30 +0000</pubDate>
		<dc:creator>Dan Koboldt</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.massgenomics.org/?p=1470</guid>
		<description><![CDATA[Contents: dbSNP Growth • Build 135 Stats • Variant Composition • Function Classes • SNPs and Indels • Coding/Noncoding Tiers Less than a decade ago, the leading experts estimated that there were approximately 10 million SNPs in the human genome. Those were the early days of post-genome research, when &#8220;The SNP Consortium&#8221; was formed and [...]]]></description>
			<content:encoded><![CDATA[<p><strong>Contents: <a title="Growth of dbSNP database" href="#growth-of-dbsnp">dbSNP Growth</a></strong> • <strong><a title="Current state of dbSNP build 135" href="#current-state-dbsnp-135">Build 135 Stats</a></strong> • <strong><a title="Variants in dbSNP" href="#dbsnp-variant-composition">Variant Composition</a></strong> • <strong><a title="dbSNP variant function classes" href="#dbsnp-variant-function-class">Function Classes</a></strong> • <strong><a title="dbSNP SNPs and small indels" href="#snps-small-indels">SNPs and Indels</a></strong> • <strong><a title="SNPs and indels by annotation tier" href="#snps-indels-by-annotation-tier">Coding/Noncoding Tiers</a></strong><br />
Less than a decade ago, the leading experts estimated that there were approximately 10 million SNPs in the human genome. Those were the early days of post-genome research, when &#8220;The SNP Consortium&#8221; was formed and began BAC overlap comparisons to routinely identify and report SNPs. Believe it or not, in my old lab there were binders full of paper records documenting the evidence for each newly discovered SNP. These variants were submitted to a central repository of human sequence variation hosted at NCBI, appropriately named dbSNP.</p>
<h2><a name="growth-of-dbsnp"></a>Growth of dbSNP</h2>
<p>The database has grown substantially, already exceeding the 10 million mark by 2006:</p>
<p><a href="http://www.massgenomics.org/wp-content/uploads/2012/01/dbSNP-Growth-HapMap-1000G.jpg"><img class="alignnone size-full wp-image-1472" title="dbSNP-Growth-HapMap-1000G" src="http://www.massgenomics.org/wp-content/uploads/2012/01/dbSNP-Growth-HapMap-1000G.jpg" alt="dbSNP Growth from Sequencing" width="486" height="389" /></a></p>
<p>I highlighted some of the key driving forces of this growth that I happen to know about. These include the &#8220;BAC overlap&#8221; project of the SNP Consortium and similar SNP discovery efforts (2001-2003), the <strong><a title="HapMap Project" href="http://www.hapmap.org" target="_blank">HapMap Project</a></strong> Phases I (2003-2005) and II (2005-2007), the advent of next-generation sequencing, of course, and most recently the 1,000 Genomes Project. You probably noticed a few trends in the figure above:</p>
<ol>
<li><strong>Less-frequent dbSNP updates</strong>. In 2003-2004 when the HapMap consortium direly needed new loci, dbSNP was updating almost every month. New build releases have slowed down considerably, probably because (1) they&#8217;re less critical, and (2) it&#8217;s a much bigger job.</li>
<li>Overall, and quite obviously, there&#8217;s been a <strong>rapid increase in submissions</strong> over time, with some phases of near exponential growth.</li>
<li>The relationship between submissions (blue) and unique refSNP clusters (red). You&#8217;ll note that dbSNP gets more and more submissions, of which a <strong>shrinking fraction are truly novel loci</strong>.</li>
</ol>
<p>Still, by 2009, there were about 18 million unique SNPs, nearly twice the predicted number. And large variant discovery projects fueled by next-generation sequencing, such as the <a title="1,000 Genomes Project" href="http://www.1000genomes.org" target="_blank">1,000 Genomes Project</a>, were just ramping up.</p>
<h2><a name="current-state-dbsnp-135"></a>The Current State: dbSNP Build 135</h2>
<p>Downloading the <a title="dbSNP database" href="http://www.ncbi.nlm.nih.gov/projects/SNP/" target="_blank">dbSNP database</a> is not for the faint of heart. Even for bioinformaticians, the file formats offered (ASN1?) are somewhat intractable compared to BED files. I prefer instead to wait until the excellent team at the <a title="UCSC Genome Browser" href="http://genome.ucsc.edu" target="_blank">UCSC Genome Browser Database</a> releases their annotation tracks for dbSNP builds, which contain the necessary information in far more accessible formats. They have just done so for build 135, and I did some quick-and-dirty parsing to come up with some statistics.</p>
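<p>For the curious, here is a minimal sketch of that kind of quick-and-dirty tally, in Python. It is not the script behind the numbers in this post. It assumes a tab-delimited dump of the UCSC dbSNP track in which the twelfth column (index 11) holds the variant class, with labels such as &#8220;single&#8221; or &#8220;in-del&#8221;. Both the column position and the labels are assumptions to check against the table schema before use, and the function names are purely illustrative.</p>

```python
from collections import Counter


def count_variant_classes(lines, class_col=11):
    """Tally the variant class field across tab-delimited rows.

    class_col is an assumed 0-based column index; adjust it to match
    the actual layout of the file being parsed.
    """
    counts = Counter()
    for line in lines:
        if line.startswith("#"):
            continue  # skip header or comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) > class_col:  # ignore blank or truncated rows
            counts[fields[class_col]] += 1
    return counts


def class_percentages(counts):
    """Convert raw tallies into percentages of all counted variants."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: 100.0 * n / total for name, n in counts.items()}
```

<p>Because the function accepts any iterable of lines, an open file handle can be passed directly, so a file with tens of millions of rows is streamed rather than loaded into memory.</p>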
<h2><a name="dbsnp-variant-composition"></a>dbSNP Variant Composition</h2>
<p>You might be surprised to learn that dbSNP contains not just SNPs, but several types of DNA sequence variation:</p>
<p><a href="http://www.massgenomics.org/wp-content/uploads/2012/01/dbSNP-variant-composition.jpg"><img class="size-full wp-image-1477 alignnone" title="dbSNP-variant-composition" src="http://www.massgenomics.org/wp-content/uploads/2012/01/dbSNP-variant-composition.jpg" alt="dbSNP Variant Composition" width="292" height="236" /></a></p>
<p>In the current build there are <strong>54,212,076 unique variants</strong> with RS numbers, of which 47.8 million, or <strong><span style="color: #008000;">88%, were single nucleotide polymorphisms</span></strong>. The remainder comprises insertion-deletion variants (indels, 11%), multiple nucleotide polymorphisms (MNPs, 0.1%), as well as ~420,000 other classes (named, mixed, and microsatellite). The named variants are old-school genetic markers (e.g. DS128384). Mixed polymorphisms are messy loci where multiple variant types (e.g. DNP and indel) are seen. Microsatellites, of course, are long stretches of repetitive sequences, such as di-nucleotide or tri-nucleotide repeats, whose length varies between individuals. Among these are the 15 short tandem repeats (STRs) utilized for forensic DNA profiling in <a title="CODIS DNA database" href="http://www.fbi.gov/about-us/lab/codis" target="_blank">CODIS</a>, the FBI&#8217;s national DNA database.</p>
<h2><a name="dbsnp-variant-function-class"></a>dbSNP Function Classification</h2>
<p>Variants in dbSNP are classified by their relationship to NCBI&#8217;s view of known protein-coding genes. There are about a dozen &#8220;function class&#8221; categories, but they can be grouped together into five types of sequences:</p>
<p><a href="http://www.massgenomics.org/wp-content/uploads/2012/01/dbSNP-variant-gene-locus.jpg"><img class="alignnone size-full wp-image-1482" title="dbSNP-variant-gene-locus" src="http://www.massgenomics.org/wp-content/uploads/2012/01/dbSNP-variant-gene-locus.jpg" alt="Gene locations of dbSNP variants" width="305" height="285" /></a></p>
<p>You will note that the vast majority have a function classification of &#8220;Unknown,&#8221; suggesting that these are non-coding variants not immediately adjacent to NCBI protein-coding genes. Even for variants in or around genes, 90% are classified as intronic. If we break down the variants that are in coding regions according to dbSNP:</p>
<p><a href="http://www.massgenomics.org/wp-content/uploads/2012/01/dbSNP-coding-variants.jpg"><img class="alignnone size-full wp-image-1483" title="dbSNP-coding-variants" src="http://www.massgenomics.org/wp-content/uploads/2012/01/dbSNP-coding-variants.jpg" alt="Breakdown of dbSNP coding variants" width="249" height="276" /></a></p>
<p>You can see that the majority of coding variants (just over half a million) are classified as &#8220;missense&#8221;, meaning that they&#8217;re predicted to cause an amino acid substitution in the encoded protein. Most of the remainder are silent (synonymous), though there are also around 40,000 variants predicted to cause premature termination (nonsense) or a shift in translation frame (frameshift) in the encoded protein.</p>
<h2><a name="snps-small-indels"></a>Homing in on SNPs and Small Indels</h2>
<p>For next-generation sequencing analysis, I&#8217;m generally interested in two types of variation represented in dbSNP: SNPs and small (&lt;50 bp) indels.</p>
<p><a href="http://www.massgenomics.org/wp-content/uploads/2012/01/Tier-Space-build36.jpg"><img class="alignright size-full wp-image-1484" title="Tier-Space-build36" src="http://www.massgenomics.org/wp-content/uploads/2012/01/Tier-Space-build36.jpg" alt="" width="176" height="408" /></a>The other types are either uncommon or too large to be readily detected with short reads, and there are curated, dedicated databases that probably do a better job of representing them (e.g. <a title="Database of Genomic Variants" href="http://projects.tcag.ca/variation/" target="_blank">Database of Genomic Variants</a> for large indels and structural variants). Further, although the dbSNP functional classification is useful, we use an internal &#8220;tiering&#8221; system to represent variants according to their locations in the genome:</p>
<ul>
<li><strong>Tier 1</strong> variants affect coding sequences, including exons, splice sites, and non-coding RNA genes</li>
<li><strong>Tier 2</strong> variants occur in evolutionarily conserved or putative regulatory sequences</li>
<li><strong>Tier 3</strong> variants are in non-coding, non-conserved, unique regions of the genome</li>
<li><strong>Tier 4</strong> variants are in repetitive regions of the genome</li>
</ul>
<p>Every base in the reference sequence falls into one, and only one, tier. Build 36 (hg18) of the human reference sequence is broken down to the right. There are <span style="color: #800000;"><strong>44 megabases of &#8220;tier 1&#8221; coding sequence in the human genome</strong></span>; that&#8217;s 1.53%, straight out of the textbooks. Tier 2 comprises 248 megabases, or 8.6%, which is slightly higher than the 5% expected rate of evolutionary conservation, probably because we&#8217;re fairly inclusive with what constitutes a putative regulatory element.</p>
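<p>As a quick sanity check on those percentages, each tier size can be turned back into the total genome size it implies. The sketch below uses only the figures quoted in this post; the function name is illustrative.</p>

```python
def implied_total_mb(tier_mb, tier_percent):
    """Genome size (in Mb) implied by a tier's size and its quoted share."""
    return tier_mb / (tier_percent / 100.0)


tier1_total = implied_total_mb(44, 1.53)   # tier 1: 44 Mb quoted as 1.53%
tier2_total = implied_total_mb(248, 8.6)   # tier 2: 248 Mb quoted as 8.6%

print(f"tier 1 implies ~{tier1_total:.0f} Mb; tier 2 implies ~{tier2_total:.0f} Mb")
```

<p>Both figures imply a denominator of just under 2.9 billion bases, so the two percentages are consistent with each other to within rounding.</p>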
<h2><a name="snps-indels-by-annotation-tier"></a>Distribution of SNPs and Indels by Tier</h2>
<p>Next, we look at the distribution of dbSNP&#8217;s ~48 million SNPs and ~6 million small indels among the four tiers of genome space:</p>
<p><a href="http://www.massgenomics.org/wp-content/uploads/2012/01/dbsnp-variant-type-tier.jpg"><img class="alignnone size-full wp-image-1487" title="dbsnp-variant-type-tier" src="http://www.massgenomics.org/wp-content/uploads/2012/01/dbsnp-variant-type-tier.jpg" alt="dbSNP variants by tier" width="450" height="192" /></a></p>
<p>Strikingly, less than 10% of variants of both types fall into regions that are &#8220;interpretable&#8221; whereas the rest are in noncoding regions. The proportions of variants in tier 1 (1.3% of SNPs, 1% of indels) remain lower than the tier 1 fraction of the genome (above right), presumably due to purifying selection against changes to coding sequences. Many studies have shown this through far more careful analyses that account for ascertainment bias, population allele frequency, and other factors. It&#8217;s just fascinating to see the signature of natural selection in your basic pie chart.</p>
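<p>To put a rough number on that shortfall, one can divide the share of variants landing in tier 1 by the share of the genome that tier 1 occupies (1.53%, as quoted earlier in this post). This is only a crude back-of-the-envelope ratio under the assumption that variants would otherwise be spread uniformly across the genome; it ignores ascertainment bias, allele frequency, and regional mutation rates. The function and constant names are illustrative.</p>

```python
def depletion_ratio(variant_percent, genome_percent):
    """Share of variants in a tier divided by the tier's share of the genome."""
    return variant_percent / genome_percent


TIER1_GENOME_PERCENT = 1.53  # tier 1 share of the reference, as quoted in the post

snp_ratio = depletion_ratio(1.3, TIER1_GENOME_PERCENT)    # 1.3% of SNPs are in tier 1
indel_ratio = depletion_ratio(1.0, TIER1_GENOME_PERCENT)  # 1% of indels are in tier 1

print(f"SNPs: {snp_ratio:.2f}, indels: {indel_ratio:.2f}")
```

<p>A ratio below 1 means fewer variants than a uniform spread would predict. By this crude measure SNPs come in at roughly 0.85 and indels at roughly 0.65.</p>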
<p>I&#8217;m uncertain why the distributions in tiers 3 and 4 differ between variant types above, but there are likely a number of contributing factors. From a biological perspective, <span style="color: #008000;"><strong>indels are both less frequent and subjected to greater natural selection than SNPs</strong></span>. From a technical perspective, SNP discovery algorithms are far more mature than indel discovery algorithms, owing in part to the difficulties of detecting the latter in relatively short sequence reads. <strong>We are currently, and have always been, better at finding SNPs than indels</strong>. With luck, the &#8220;accuracy gap&#8221; between SNPs and indels will diminish as sequencing technologies and detection algorithms continue to evolve.</p>
<p><span style="text-decoration: underline;"><strong>References</strong></span></p>
<p><span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&amp;rft.jtitle=Nucleic+acids+research&amp;rft_id=info%3Apmid%2F11125122&amp;rfr_id=info%3Asid%2Fresearchblogging.org&amp;rft.atitle=dbSNP%3A+the+NCBI+database+of+genetic+variation.&amp;rft.issn=0305-1048&amp;rft.date=2001&amp;rft.volume=29&amp;rft.issue=1&amp;rft.spage=308&amp;rft.epage=11&amp;rft.artnum=&amp;rft.au=Sherry+ST&amp;rft.au=Ward+MH&amp;rft.au=Kholodov+M&amp;rft.au=Baker+J&amp;rft.au=Phan+L&amp;rft.au=Smigielski+EM&amp;rft.au=Sirotkin+K&amp;rfe_dat=bpr3.included=1;bpr3.tags=Biology%2CComputer+Science+%2F+Engineering%2CGenetics%2C+Bioinformatics%2C+Databases">Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, &amp; Sirotkin K (2001). dbSNP: the NCBI database of genetic variation. <span style="font-style: italic;">Nucleic acids research, 29</span> (1), 308-11 PMID: <a rev="review" href="http://www.ncbi.nlm.nih.gov/pubmed/11125122">11125122</a></span></p>
]]></content:encoded>
			<wfw:commentRss>http://www.massgenomics.org/2012/01/the-current-state-of-dbsnp.html/feed</wfw:commentRss>
		<slash:comments>11</slash:comments>
		<feedburner:origLink>http://www.massgenomics.org/2012/01/the-current-state-of-dbsnp.html</feedburner:origLink></item>
		<item>
		<title>Genomic Structural Variation: Methods &amp; Protocols</title>
		<link>http://feedproxy.google.com/~r/Massgenomics/~3/kNyLkq0b6HE/methods-protocols-genomic-structural-variation.html</link>
		<comments>http://www.massgenomics.org/2012/01/methods-protocols-genomic-structural-variation.html#comments</comments>
		<pubDate>Wed, 18 Jan 2012 23:18:44 +0000</pubDate>
		<dc:creator>Dan Koboldt</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.massgenomics.org/?p=1434</guid>
		<description><![CDATA[The draft human genome sequence, completed more than a decade ago, was an important starting point for understanding genetic variation in humans. Intensive efforts to characterize single-nucleotide polymorphisms (SNPs), and later the discovery of extensive copy number variation (CNV) and structural variation, have highlighted the complex and dynamic nature of our genome. Earlier this [...]]]></description>
			<content:encoded><![CDATA[<p>The draft human genome sequence, completed more than a decade ago, was an important starting point for understanding genetic variation in humans. Intensive efforts to characterize single-nucleotide polymorphisms (SNPs), and later the discovery of extensive copy number variation (CNV) and structural variation, have highlighted the complex and dynamic nature of our genome.</p>
<p><a title="Genomic Structural Variants" href="http://www.springerlink.com/content/978-1-61779-506-0/"><img class="alignright size-medium wp-image-1449" style="border: 1px solid black;" title="Genomic Structural Variants " src="http://www.massgenomics.org/wp-content/uploads/2012/01/cover-trimmed-210x300.jpg" alt="Genomic Structural Variants " width="136" height="194" /></a>Earlier this month, Springer Press published  <strong><a title="Genomic Structural Variation" href="http://www.springerlink.com/content/978-1-61779-506-0/" target="_blank">Genomic Structural Variants</a></strong>, an outstanding new volume edited by <a title="Lars Feuk" href="http://www.igp.uu.se/Research/genetics_genomics/lars_feuk/" target="_blank">Lars Feuk</a> of Uppsala University in Sweden. This book provides an in-depth description of key developments in our understanding of structural variation and its implications for human disease.</p>
<p>Most, if not all of these advances have been driven by technological  innovation. The editor writes, &#8220;Over the past decade, the introduction  of array-based technologies has revolutionized genomics and genetic  diagnostics&#8230; Now, we are on the brink of a paradigm shift in genetics  with the advent of massively parallel sequencing in research and  diagnostics.&#8221;</p>
<p>The contributors include <strong>James Lupski, Stephen Scherer, Ira Hall, Aaron Quinlan, Deanna Church, Bauke Ylstra, Richard K. Wilson</strong>, and a number of other distinguished scientists. Here are the topics covered, with my own brief summary and a link to each article.</p>
<p><strong>Genome Architecture</strong><br />
<a title="What Have Studies of Genomic Disorders Taught Us About Our Genome?" href="#genomic-disorders-human-genome"><strong>What Genomic Disorders Have Taught Us</strong></a><br />
<a title="Structural Variation in Subtelomeres" href="#structural-variation-subtelomeres"><strong>SV in Subtelomeres</strong></a><br />
<a title="The Challenges of Studying Complex and Dynamic Regions of the Human Genome" href="#complex-dynamic-regions-genome"><strong>Complex Regions of the Genome</strong></a></p>
<p><strong>Nature of Structural Variation</strong><br />
<a title="Structural Variation and Its Effect on Expression" href="#sv-effect-expression"><strong>SV Effect on Gene Expression</strong></a><br />
<a title="Population Genetic Nature of Copy Number Variation" href="#population-genetics-copy-number"><strong>CNV Population Genetics</strong></a><br />
<a title="Structural Genetic Variation in the Context of Somatic Mosaicism" href="#sv-somatic-mosaicism"><strong>SV and Somatic Mosaicism</strong></a></p>
<p><strong>Detection of SV</strong><br />
<a title="Characterizing and Interpreting Genetic Variation from Personal Genome Sequencing" href="#genetic-variation-personal-genome"><strong>Interpreting SVs in Personal Genome Sequencing</strong></a><br />
<a title="Massively Parallel Sequencing Approaches for Characterization of Structural Variation" href="#massively-parallel-sequencing-sv"><strong>Massively Parallel Sequencing Approaches for SVs</strong></a><br />
<a title="Array-Based Approaches in Prenatal Diagnosis" href="#array-prenatal-diagnosis"><strong>Array-based Prenatal Diagnosis</strong></a><br />
<a title="Detection and Interpretation of Genomic Structural Variation in Mammals" href="#genomic-structural-variation-mammals"><strong>GSV in Mammals</strong></a></p>
<p><strong>SV and CNV in Human Disease</strong><br />
<a title="Microdeletion and Microduplication Syndromes" href="#microdeletion-microduplication-syndromes"><strong>Microdeletion and Microduplication Syndromes</strong></a><br />
<a title="Structural Genomic Variation in Intellectual Disability" href="#sv-intellectual-disability"><strong>SV in Intellectual Disability</strong></a><br />
<a title="Detection and Characterization of Copy Number Variation in Autism Spectrum Disorder" href="#cnv-autism-spectrum"><strong>CNV in Autism Spectrum Disorder</strong></a><br />
<a title="Copy Number Variation and Psychiatric Disease Risk" href="#cnv-psychiatric-disease"><strong>CNV and Psychiatric Disease Risk</strong></a></p>
<p><strong>Methods and Resources</strong><br />
<a title="Online Resources for Genomic Structural Variation" href="#online-resources-genomic-structural-variation"><strong>Online SV Resources</strong></a><br />
<a title="Algorithm Implementation for CNV Discovery Using Affymetrix and Illumina SNP Array Data" href="#cnv-discovery-snp-array"><strong>SNP Array Algorithms for CNV</strong></a><br />
<a title="Targeted Screening and Validation of Copy Number Variations" href="#targeted-screening-validation-cnv"><strong>Targeted Screening of CNVs</strong></a><br />
<a title="High-Resolution Copy Number Profiling by Array CGH Using DNA Isolated from Formalin-Fixed, Paraffin-Embedded Tissues" href="#copy-number-acgh-ffpe"><strong>Array-CGH of FFPE Tissues</strong></a></p>
<h3 style="margin-bottom: 1px;"><a name="genomic-disorders-human-genome"></a>What Have Studies of Genomic Disorders Taught Us About Our Genome?</h3>
<p><strong>Alexandra D. Simmons, Claudia M. B. Carvalho and James R. Lupski</strong><br />
An overview of high-resolution analysis methods and what they&#8217;ve taught us about the architectural features, structure, and rearrangement mechanisms of the genome.   <a href="http://www.springerlink.com/content/r24447hqr0x82112/fulltext.html" target="_blank"><strong>Article</strong></a> |  <a href="http://www.springerlink.com/content/r24447hqr0x82112/fulltext.pdf" target="_blank"><strong>PDF</strong></a></p>
<h3><a name="microdeletion-microduplication-syndromes"></a>Microdeletion and Microduplication Syndromes</h3>
<p><strong>Lisenka E. L. M. Vissers and Paweł Stankiewicz</strong><br />
An overview of how microdeletions and microduplications form in the genome and the wide variety of phenotypes &#8212; including Mendelian and complex diseases &#8212; that they can cause.   <a href="http://www.springerlink.com/content/r56536442j8h1062/fulltext.html" target="_blank"><strong>Article</strong></a> |  <a href="http://www.springerlink.com/content/r56536442j8h1062/fulltext.pdf" target="_blank"><strong>PDF</strong></a></p>
<h3><a name="sv-intellectual-disability"></a>Structural Genomic Variation in Intellectual Disability</h3>
<p><strong>Rolph Pfundt and Joris A. Veltman</strong><br />
A review of detection and interpretation of copy number variations in mental retardation, with a focus on diagnostic application and interpretation.   <a href="http://www.springerlink.com/content/tl114l8k3w6617k5/fulltext.html" target="_blank"><strong>Article</strong></a> |  <a href="http://www.springerlink.com/content/tl114l8k3w6617k5/fulltext.pdf" target="_blank"><strong>PDF</strong></a></p>
<h3><a name="cnv-psychiatric-disease"></a>Copy Number Variation and Psychiatric Disease Risk</h3>
<p><strong>Rebecca J. Levy, Bin Xu, Joseph A. Gogos and Maria Karayiorgou</strong><br />
An update on the substantial progress toward understanding the role of rare CNVs in the etiology of complex psychiatric diseases, such as schizophrenia.   <a href="http://www.springerlink.com/content/j1641m815q238543/fulltext.html" target="_blank"><strong>Article</strong></a> |  <a href="http://www.springerlink.com/content/j1641m815q238543/fulltext.pdf" target="_blank"><strong>PDF</strong></a></p>
<h3><a name="cnv-autism-spectrum"></a>Detection and Characterization of Copy Number Variation in Autism Spectrum Disorder</h3>
<p><strong>Christian R. Marshall and Stephen W. Scherer</strong><br />
A description of the history of genomic structural variation in ASD and how CNV discovery has been used to pinpoint novel ASD-susceptibility loci.   <a href="http://www.springerlink.com/content/h3788l81827008t0/fulltext.html" target="_blank"><strong>Article</strong></a> |  <a href="http://www.springerlink.com/content/h3788l81827008t0/fulltext.pdf" target="_blank"><strong>PDF</strong></a></p>
<h3><a name="structural-variation-subtelomeres"></a>Structural Variation in Subtelomeres</h3>
<p><strong>M. Katharine Rudd</strong><br />
A guide to the composition and structural variation of subtelomeres, and how FISH and array technologies have been applied to characterize them.   <a href="http://www.springerlink.com/content/w634576w564q11l8/fulltext.html" target="_blank"><strong>Article</strong></a> |  <a href="http://www.springerlink.com/content/w634576w564q11l8/fulltext.pdf" target="_blank"><strong>PDF</strong></a></p>
<h3><a name="array-prenatal-diagnosis"></a>Array-Based Approaches in Prenatal Diagnosis</h3>
<p><strong>Paul D. Brady, Koenraad Devriendt, Jan Deprest and Joris R. Vermeesch</strong><br />
An overview of recent developments in the use of array CGH in the prenatal setting and a discussion of how best to implement it.   <a href="http://www.springerlink.com/content/k6m23377008g4378/fulltext.html" target="_blank"><strong>Article</strong></a> |  <a href="http://www.springerlink.com/content/k6m23377008g4378/fulltext.pdf" target="_blank"><strong>PDF</strong></a></p>
<h3><a name="sv-effect-expression"></a>Structural Variation and Its Effect on Expression</h3>
<p><strong>Louise Harewood, Evelyne Chaignat and Alexandre Reymond</strong><br />
A discussion of the profound and dramatic effect that SVs can have on the expression of genes mapping within them, nearby, and elsewhere in the genome.   <a href="http://www.springerlink.com/content/r84063755425732q/fulltext.html" target="_blank"><strong>Article</strong></a> |  <a href="http://www.springerlink.com/content/r84063755425732q/fulltext.pdf" target="_blank"><strong>PDF</strong></a></p>
<h3><a name="complex-dynamic-regions-genome"></a>The Challenges of Studying Complex and Dynamic Regions of the Human Genome</h3>
<p><strong>Edward J. Hollox</strong><br />
A review of key advances in the understanding of the variable structure of our genome, and a discussion of methods that may allow us to analyse this structure in fine detail.   <a href="http://www.springerlink.com/content/k363p0124p852k71/fulltext.html" target="_blank"><strong>Article</strong></a> |  <a href="http://www.springerlink.com/content/k363p0124p852k71/fulltext.pdf" target="_blank"><strong>PDF</strong></a></p>
<h3><a name="population-genetics-copy-number"></a>Population Genetic Nature of Copy Number Variation</h3>
<p><strong>Per Sjödin and Mattias Jakobsson</strong><br />
An update on recent progress on understanding CNVs, and discussion of population genetics, recombination, mutation, selection, and demography of these variants.   <a href="http://www.springerlink.com/content/m346012v63213378/fulltext.html" target="_blank"><strong>Article</strong></a> |  <a href="http://www.springerlink.com/content/m346012v63213378/fulltext.pdf" target="_blank"><strong>PDF</strong></a></p>
<h3><a name="genomic-structural-variation-mammals"></a>Detection and Interpretation of Genomic Structural Variation in Mammals</h3>
<p><strong>Ira M. Hall and Aaron R. Quinlan</strong><br />
A summary of the current state of knowledge of SV in mammals, and an exploration of the key biological insights that can be gained by applying NGS methods to model organisms.   <a href="http://www.springerlink.com/content/l285013156045585/fulltext.html" target="_blank"><strong>Article</strong></a> |  <a href="http://www.springerlink.com/content/l285013156045585/fulltext.pdf" target="_blank"><strong>PDF</strong></a></p>
<h3><a name="sv-somatic-mosaicism"></a>Structural Genetic Variation in the Context of Somatic Mosaicism</h3>
<p><strong>Jan P. Dumanski and Arkadiusz Piotrowski</strong><br />
A review combining evidence of structural variation in the context of somatic cells, highlighting the methodological aspects of detection, challenges, and opportunities related to this field.   <a href="http://www.springerlink.com/content/u4281p226k367378/fulltext.html" target="_blank"><strong>Article</strong></a> |  <a href="http://www.springerlink.com/content/u4281p226k367378/fulltext.pdf" target="_blank"><strong>PDF</strong></a></p>
<h3><a name="online-resources-genomic-structural-variation"></a>Online Resources for Genomic Structural Variation</h3>
<p><strong>Tam P. Sneddon and Deanna M. Church</strong><br />
A description of current structural variation online resources highlighting how major databases have addressed the challenges in capturing, storing, and displaying SV data.   <a href="http://www.springerlink.com/content/gt5236r3pp348342/fulltext.html" target="_blank"><strong>Article</strong></a> |  <a href="http://www.springerlink.com/content/gt5236r3pp348342/fulltext.pdf" target="_blank"><strong>PDF</strong></a></p>
<h3><a name="cnv-discovery-snp-array"></a>Algorithm Implementation for CNV Discovery Using Affymetrix and Illumina SNP Array Data</h3>
<p><strong>Laura Winchester and Jiannis Ragoussis</strong><br />
A review of approaches to detect SVs by SNP array intensities, the importance of quality control, and some guidelines for implementation.   <a href="http://www.springerlink.com/content/q4665102tu733161/fulltext.html" target="_blank"><strong>Article</strong></a> |  <a href="http://www.springerlink.com/content/q4665102tu733161/fulltext.pdf" target="_blank"><strong>PDF</strong></a></p>
<h3><a name="targeted-screening-validation-cnv"></a>Targeted Screening and Validation of Copy Number Variations</h3>
<p><strong>Shana Ceulemans, Karlijn van der Ven and Jurgen Del-Favero</strong><br />
A description of methods used for SV screening and validation, including FISH, qPCR, paralogue ratio test, molecular copy-number counting, multiplex PCR, and others.    <a href="http://www.springerlink.com/content/gn4452358p6v57gg/fulltext.html" target="_blank"><strong>Article</strong></a> |  <a href="http://www.springerlink.com/content/gn4452358p6v57gg/fulltext.pdf" target="_blank"><strong>PDF</strong></a></p>
<h3><a name="copy-number-acgh-ffpe"></a>High-Resolution Copy Number Profiling by Array CGH Using DNA Isolated from Formalin-Fixed, Paraffin-Embedded Tissues</h3>
<p><strong>Hendrik F. van Essen and Bauke Ylstra</strong><br />
A series of protocols tailored to array CGH of FFPE solid malignancies: from sectioning FFPE blocks to specific cynosures for pathological revision, DNA isolation, quality testing, and amplification.   <a href="http://www.springerlink.com/content/v146371177556320/fulltext.html" target="_blank"><strong>Article</strong></a> |  <a href="http://www.springerlink.com/content/v146371177556320/fulltext.pdf" target="_blank"><strong>PDF</strong></a></p>
<h3><a name="genetic-variation-personal-genome"></a>Characterizing and Interpreting Genetic Variation from Personal Genome Sequencing</h3>
<p><strong>Anna C. V. Johansson and Lars Feuk</strong><br />
An overview of whole-genome sequences completed to date and the challenge of interpreting the whole-genome sequence data both from a technical and clinical perspective.   <a href="http://www.springerlink.com/content/l5g430447p4h05p7/fulltext.html" target="_blank"><strong>Article</strong></a> |  <a href="http://www.springerlink.com/content/l5g430447p4h05p7/fulltext.pdf" target="_blank"><strong>PDF</strong></a></p>
<h3><a name="massively-parallel-sequencing-sv"></a>Massively Parallel Sequencing Approaches for Characterization of Structural Variation</h3>
<p><strong>Daniel C. Koboldt, David E. Larson, Ken Chen, Li Ding and Richard K. Wilson</strong><br />
Our own contribution to this volume is a review of methods and software applications for detecting, assembling, and characterizing SVs by next-generation sequencing.<br />
<a href="http://www.springerlink.com/content/x837j23405316526/fulltext.html" target="_blank"><strong>Article</strong></a> |  <a href="http://www.springerlink.com/content/x837j23405316526/fulltext.pdf" target="_blank"><strong>PDF</strong></a></p>
<p><span style="text-decoration: underline;"><strong>References</strong></span><br />
<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&amp;rft.jtitle=Methods+in+molecular+biology+%28Clifton%2C+N.J.%29&amp;rft_id=info%3Apmid%2F22228008&amp;rfr_id=info%3Asid%2Fresearchblogging.org&amp;rft.atitle=Copy+number+variation+and+psychiatric+disease+risk.&amp;rft.issn=1064-3745&amp;rft.date=2012&amp;rft.volume=838&amp;rft.issue=&amp;rft.spage=97&amp;rft.epage=113&amp;rft.artnum=&amp;rft.au=Levy+RJ&amp;rft.au=Xu+B&amp;rft.au=Gogos+JA&amp;rft.au=Karayiorgou+M&amp;rfe_dat=bpr3.included=1;bpr3.tags=Biology%2CMedicine%2CHealth%2CGenetics%2C+Bioinformatics%2C+Clinical+Research%2C+Psychiatry">Levy RJ, Xu B, Gogos JA, &amp; Karayiorgou M (2012). Copy number variation and psychiatric disease risk. <span style="font-style: italic;">Methods in molecular biology (Clifton, N.J.), 838</span>, 97-113 PMID: <a rev="review" href="http://www.ncbi.nlm.nih.gov/pubmed/22228008">22228008</a></span></p>
<p><span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&amp;rft.jtitle=Methods+in+molecular+biology+%28Clifton%2C+N.J.%29&amp;rft_id=info%3Apmid%2F22228021&amp;rfr_id=info%3Asid%2Fresearchblogging.org&amp;rft.atitle=Characterizing+and+interpreting+genetic+variation+from+personal+genome+sequencing.&amp;rft.issn=1064-3745&amp;rft.date=2012&amp;rft.volume=838&amp;rft.issue=&amp;rft.spage=343&amp;rft.epage=67&amp;rft.artnum=&amp;rft.au=Johansson+AC&amp;rft.au=Feuk+L&amp;rfe_dat=bpr3.included=1;bpr3.tags=Biology%2CGenetics%2C+Bioinformatics">Johansson AC, &amp; Feuk L (2012). Characterizing and interpreting genetic variation from personal genome sequencing. <span style="font-style: italic;">Methods in molecular biology (Clifton, N.J.), 838</span>, 343-67 PMID: <a rev="review" href="http://www.ncbi.nlm.nih.gov/pubmed/22228021">22228021</a></span></p>
<p><span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&amp;rft.jtitle=Methods+in+molecular+biology+%28Clifton%2C+N.J.%29&amp;rft_id=info%3Apmid%2F22228022&amp;rfr_id=info%3Asid%2Fresearchblogging.org&amp;rft.atitle=Massively+parallel+sequencing+approaches+for+characterization+of+structural+variation.&amp;rft.issn=1064-3745&amp;rft.date=2012&amp;rft.volume=838&amp;rft.issue=&amp;rft.spage=369&amp;rft.epage=84&amp;rft.artnum=&amp;rft.au=Koboldt+DC&amp;rft.au=Larson+DE&amp;rft.au=Chen+K&amp;rft.au=Ding+L&amp;rft.au=Wilson+RK&amp;rfe_dat=bpr3.included=1;bpr3.tags=Biology%2CMedicine%2CHealth%2CGenetics%2C+Bioinformatics%2C+Evolutionary+Biology%2C+Molecular+Biology%2C+Clinical+Research">Koboldt DC, Larson DE, Chen K, Ding L, &amp; Wilson RK (2012). Massively parallel sequencing approaches for characterization of structural variation. <span style="font-style: italic;">Methods in molecular biology (Clifton, N.J.), 838</span>, 369-84 PMID: <a rev="review" href="http://www.ncbi.nlm.nih.gov/pubmed/22228022">22228022</a></span></p>
<p><span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&amp;rft.jtitle=Methods+in+molecular+biology+%28Clifton%2C+N.J.%29&amp;rft_id=info%3Apmid%2F22228009&amp;rfr_id=info%3Asid%2Fresearchblogging.org&amp;rft.atitle=Detection+and+characterization+of+copy+number+variation+in+autism+spectrum+disorder.&amp;rft.issn=1064-3745&amp;rft.date=2012&amp;rft.volume=838&amp;rft.issue=&amp;rft.spage=115&amp;rft.epage=35&amp;rft.artnum=&amp;rft.au=Marshall+CR&amp;rft.au=Scherer+SW&amp;rfe_dat=bpr3.included=1;bpr3.tags=Biology%2CMedicine%2CHealth%2CGenetics%2C+Bioinformatics%2C+Developmental+Biology%2C+Clinical+Research">Marshall CR, &amp; Scherer SW (2012). Detection and characterization of copy number variation in autism spectrum disorder. <span style="font-style: italic;">Methods in molecular biology (Clifton, N.J.), 838</span>, 115-35 PMID: <a rev="review" href="http://www.ncbi.nlm.nih.gov/pubmed/22228009">22228009</a></span></p>
]]></content:encoded>
			<wfw:commentRss>http://www.massgenomics.org/2012/01/methods-protocols-genomic-structural-variation.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.massgenomics.org/2012/01/methods-protocols-genomic-structural-variation.html</feedburner:origLink></item>
		<item>
		<title>A Tumor Evolved: Relapsed Acute Myeloid Leukemia</title>
		<link>http://feedproxy.google.com/~r/Massgenomics/~3/8a4cXUJmTIU/a-tumor-evolved-relapsed-acute-myeloid-leukemia.html</link>
		<comments>http://www.massgenomics.org/2012/01/a-tumor-evolved-relapsed-acute-myeloid-leukemia.html#comments</comments>
		<pubDate>Fri, 13 Jan 2012 15:22:47 +0000</pubDate>
		<dc:creator>Dan Koboldt</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.massgenomics.org/?p=1407</guid>
		<description><![CDATA[Contents: AML Sequencing • Capture Validation • Somatic Alterations • Mutation Profile • Evolution and Clonality • Convergent on IDH2 • Carl Zimmer Article • References Acute myeloid leukemia (AML) is a cancer of myeloid blood cells, in which abnormal white blood cells accumulate in the bone marrow and interfere with normal blood cell production. [...]]]></description>
			<content:encoded><![CDATA[<p><strong>Contents: <a title="AML Whole-genome sequencing" href="#whole-genome-sequencing-relapsed-aml">AML Sequencing</a></strong> • <strong><a title="Capture sequencing validation" href="#tumor-capture-sequencing-validation">Capture Validation</a></strong> • <strong><a title="AML somatic alterations" href="#tumor-sequencing-alterations">Somatic Alterations</a></strong> • <strong><a title="Relapsed AML Mutation Profile" href="#aml-mutation-profile">Mutation Profile</a></strong> • <strong><a title="Tumor Clonality and Evolution" href="#tumor-evolution-clonality">Evolution and Clonality</a></strong> • <strong><a title="Convergent cancer evolution and IDH2" href="#cancer-convergent-evolution-idh2">Convergent on IDH2</a></strong> • <strong><a title="Carl Zimmer Article" href="#carl-zimmer-article">Carl Zimmer Article</a></strong> • <strong><a title="Tumor Evolution References" href="#tumor-evolution-references">References</a></strong><br />
Acute myeloid leukemia (AML) is a cancer of myeloid blood cells, in which abnormal white blood cells accumulate in the bone marrow and interfere with normal blood cell production. This is a highly malignant tumor affecting 13,000 adults in the United States each year; if left untreated, it progresses rapidly and leads to death within weeks or months. The standard treatment is chemotherapy: <strong>induction therapy</strong> to achieve remission, followed by <strong>consolidation therapy</strong> to eliminate any residual disease. Most of the 8,800 annual deaths in the United States are of patients who relapse with a tumor that has undergone clonal evolution at the cytogenetic level. These relapsed tumors, unlike primary AML, are resistant to chemotherapy and progress rapidly.</p>
<p><a name="whole-genome-sequencing-relapsed-aml"></a></p>
<h2 style="font-size: 16px;">Whole-genome sequencing of Relapsed AML</h2>
<p>By sequencing the complete genomes of primary tumor, relapsed tumor, and matched normal (skin) samples from 8 AML patients, our group was able to study <a title="Clonal evolution of leukemia tumors" href="http://www.nature.com/nature/journal/vaop/ncurrent/full/nature10738.html" target="_blank">clonal evolution of AML tumors</a> at the genetic level. The patients comprised 5 different French-American-British hematological subtypes; the time to relapse after initial diagnosis ranged from 235 to 961 days. One of these 8 cases was from patient UPN 933124, which we informally call &#8220;AML1&#8221; and whose primary tumor was the <strong><a title="AML1 First Cancer Genome" href="http://www.massgenomics.org/2008/11/aml-a-new-era-of-cancer-genomics.html">first cancer genome</a></strong> to be published, also by our group, back in 2008.  Whole-genome sequencing of the 8 cases (24 samples) achieved &gt;25x haploid coverage with &gt;97% of diploid alleles represented.<br />
<a name="tumor-capture-sequencing-validation"></a></p>
<h2 style="font-size: 16px;">Custom capture and deep sequencing validation</h2>
<p>One technical achievement of this project was the high-throughput, deep sequencing validation strategy. For each patient, we designed a custom hybridization capture array (Nimblegen) targeting all candidate somatic events from the primary tumor and relapse sample (median: 539 per case). This enabled <span style="color: #800000;"><strong>genome-wide validation of all variants in uniquely mapped regions</strong></span>, providing a sufficiently large set of data points for clonality analyses. Such a strategy is especially critical in AML, as tumors harbor relatively few somatic coding mutations (~21 per tumor in this study; the range is typically 5 to 50). Fragmented DNA from the primary tumor, relapse sample, and skin (normal) sample was individually hybridized with the validation probeset. Captured libraries then underwent deep sequencing on the Illumina platform, achieving a median of 590-fold coverage for each site. Such high redundancy of coverage enabled us to accurately compute allele frequency, and thereby, the fraction of tumor cells harboring each mutation.<br />
<a name="tumor-sequencing-alterations"></a></p>
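<p>To make the last step concrete, here is a minimal sketch of how an allele frequency from deep sequencing might be turned into an estimate of the fraction of cells carrying a mutation. It assumes a heterozygous mutation at a copy-neutral diploid site; the function names and the read counts are hypothetical and are not taken from the paper.</p>

```python
def variant_allele_frequency(ref_reads: int, var_reads: int) -> float:
    """Fraction of reads at a site that support the variant allele."""
    depth = ref_reads + var_reads
    if depth == 0:
        raise ValueError("no coverage at this site")
    return var_reads / depth


def fraction_of_cells(vaf: float, mutant_copies: int = 1, ploidy: int = 2) -> float:
    """Estimated fraction of sampled cells that carry the mutation.

    Assumes each mutated cell holds `mutant_copies` mutant copies out of
    `ploidy` total copies (heterozygous, copy-neutral by default).
    """
    return min(1.0, vaf * ploidy / mutant_copies)


# Hypothetical site covered by 590 reads, 118 of which support the variant.
vaf = variant_allele_frequency(ref_reads=472, var_reads=118)
cells = fraction_of_cells(vaf)
print(f"VAF = {vaf:.1%}, cells carrying the mutation = {cells:.1%}")
```

<p>With several hundred reads per site, the sampling error on each frequency is small, which is what makes this kind of estimate usable for clonality analysis.</p>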
<h2 style="font-size: 16px;">Patterns of somatic alteration</h2>
<p>We validated a total of 4,315 somatic events genome-wide. These included tier 1 (coding), tier 2 (conserved/regulatory), and tier 3 (unique noncoding) variants.</p>
<div class="mceTemp">
<dl id="attachment_1411" class="wp-caption alignnone" style="width: 411px;">
<dt class="wp-caption-dt"><a href="http://www.massgenomics.org/wp-content/uploads/2012/01/Tumor-Evolution-Mutations.jpg"><img class="size-full wp-image-1411 " title="Tumor-Evolution-Mutations" src="http://www.massgenomics.org/wp-content/uploads/2012/01/Tumor-Evolution-Mutations.jpg" alt="Somatic mutations in relapsed acute myeloid leukemia" width="401" height="298" /></a></dt>
<address class="wp-caption-dd">Validated Somatic Mutations (Credit: Ding et al, Nature 2012)</address>
</dl>
</div>
<p>As expected, tier 1 mutations comprised the smallest category and tier 3 the largest in all tumors. This generally reflects the proportion of the genome in each tier (tier 1 is just 5%), though we and other groups have observed that the mutation rates of many tumors are lower in coding regions, likely due in part to transcription-coupled DNA repair. We utilized exome data for 200 AML cases sequenced by the Cancer Genome Atlas research network to identify recurrently mutated genes. These included:</p>
<ul>
<li><strong>Known AML genes</strong>, including DNMT3A, FLT3, NPM1, IDH1, IDH2, WT1, RUNX1, PTPRT, PHF6, and ETV6</li>
<li><strong>Novel recurrently-mutated genes</strong>, including WAC, SMC3, DIS3, DDX41, and DAXX.</li>
</ul>
<p>Details on the recurrently mutated genes, and structural and functional analysis of somatic rearrangements, are provided as supplementary materials.<br />
<a name="aml-mutation-profile"></a></p>
<h2 style="font-size: 16px;">Mutational Profile of Relapsed AML</h2>
<p>AML1 (UPN 933124) exemplifies the analysis approach we applied to relapsed AML. There were 413 validated somatic events in AML1, of which 78 were relapse-specific, 5 were primary-tumor-specific, and 330 were shared between tumors. Deep sequencing validation revealed some interesting allele frequency patterns for validated mutations:</p>
<div class="mceTemp">
<dl id="attachment_1412" class="wp-caption alignnone" style="width: 606px;">
<dt class="wp-caption-dt"><a href="http://www.massgenomics.org/wp-content/uploads/2012/01/Tumor-Evolution-Clones.jpg"><img class="size-full wp-image-1412" title="Tumor-Evolution-Clones" src="http://www.massgenomics.org/wp-content/uploads/2012/01/Tumor-Evolution-Clones.jpg" alt="Tumor clones in AML" width="596" height="385" /></a></dt>
<address class="wp-caption-dd">Mutation allele frequency in AML1 (Credit: Ding et al, Nature 2012)</address>
</dl>
</div>
<p>You will note that most of the somatic events found in the primary tumor were also present in the relapse and vice-versa. Assuming that all mutations are heterozygous (which is likely), the observed allele frequencies suggest that shared mutations are present in virtually all tumor cells. The allele frequencies are higher in the primary tumor because its tumor cellularity, or purity (93.7%), was higher than that of the relapse (84.5%). Notably, we estimated that the tumor content of the &#8220;normal&#8221; skin sample was 29% due to infiltrating leukemic cells. This illustrates a key challenge in studying leukemia: most of the somatic mutations are observed at moderate allele frequency in the matched normal. A simplistic approach to somatic mutation detection, in which one simply subtracts all variants called in the normal from those called in the tumor, is poorly suited here. <strong><a title="VarScan 2 somatic mutation detection" href="http://varscan.sourceforge.net" target="_blank">VarScan 2</a></strong> and <strong><a title="SomaticSniper Mutation Detection Algorithm" href="http://gmt.genome.wustl.edu/somatic-sniper/current/" target="_blank">SomaticSniper</a></strong> are two mutation-detection algorithms developed by our group capable of addressing this problem.<br />
<a name="tumor-evolution-clonality"></a></p>
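<p>The effect of tumor content on allele frequency can be sketched with a little arithmetic. The example below assumes a heterozygous mutation present in every tumor cell at a copy-neutral site, and uses the purity estimates quoted above. The two toy classifiers and their thresholds are hypothetical; they only illustrate why plain subtraction fails here and are not how VarScan 2 or SomaticSniper work.</p>

```python
def expected_vaf(tumor_fraction: float) -> float:
    """Expected VAF of a heterozygous mutation carried by every tumor cell,
    at a copy-neutral diploid site, in a sample with this tumor content."""
    return tumor_fraction / 2


# Tumor content estimates quoted in the text.
samples = {"primary": 0.937, "relapse": 0.845, "skin": 0.29}
expected = {name: expected_vaf(fraction) for name, fraction in samples.items()}
for name, vaf in expected.items():
    print(f"{name}: expected VAF ~ {vaf:.1%}")


def naive_is_somatic(tumor_vaf: float, normal_vaf: float, call_threshold: float = 0.05) -> bool:
    """Naive subtraction: somatic only if called in tumor and absent from normal."""
    return tumor_vaf >= call_threshold and call_threshold > normal_vaf


def ratio_is_somatic(tumor_vaf: float, normal_vaf: float,
                     call_threshold: float = 0.05, min_ratio: float = 2.0) -> bool:
    """Toy alternative: somatic if the tumor VAF is well above the normal VAF."""
    return tumor_vaf >= call_threshold and tumor_vaf >= min_ratio * normal_vaf


# A true somatic mutation is lost by subtraction but kept by the ratio rule.
print(naive_is_somatic(expected["primary"], expected["skin"]))
print(ratio_is_somatic(expected["primary"], expected["skin"]))
# A germline heterozygous variant (about 50% in both samples) is rejected.
print(ratio_is_somatic(0.5, 0.5))
```

<p>Under these assumptions a shared somatic mutation would sit near 47% in the primary tumor and near 14.5% in the skin sample, which is well above any reasonable calling threshold for the normal.</p>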
<h2 style="font-size: 16px;">Tumor Clonality and Evolution</h2>
<p>A clustering analysis of mutant allele frequencies suggested that there were four clones (tumor subpopulations) in the primary tumor defined by distinct sets of mutations.</p>
<ol>
<li>Clone 1 (46.86% of primary tumor) was the founding clone.</li>
<li>Clone 2 (24.89% of primary tumor) was derived from clone 1.</li>
<li>Clone 3 (16.00% of primary tumor) was derived from clone 1.</li>
<li>Clone 4 (2.39% of primary tumor) likely arose from clone 3.</li>
</ol>
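<p>As a rough illustration of what clustering mutant allele frequencies means, the sketch below groups a made-up list of frequencies by splitting wherever two neighboring values are far apart. This is a deliberately simple stand-in and not the clustering method used in the paper; the frequencies and the <code>max_gap</code> parameter are hypothetical.</p>

```python
from statistics import mean


def cluster_by_gaps(vafs, max_gap=0.05):
    """Group allele frequencies, starting a new cluster whenever the drop
    from the previous (higher) value exceeds max_gap."""
    clusters = []
    for vaf in sorted(vafs, reverse=True):
        starts_new = not clusters or clusters[-1][-1] - vaf > max_gap
        if starts_new:
            clusters.append([vaf])
        else:
            clusters[-1].append(vaf)
    return clusters


# Hypothetical validated mutations with their allele frequencies.
vafs = [0.47, 0.46, 0.48, 0.45, 0.25, 0.24, 0.26, 0.16, 0.15, 0.17, 0.03, 0.02]
clusters = cluster_by_gaps(vafs)
for number, members in enumerate(clusters, start=1):
    print(f"cluster {number}: n={len(members)}, mean VAF={mean(members):.3f}")
```

<p>Each resulting group is a set of mutations that appear at similar frequency and therefore plausibly travel together in the same subpopulation of cells. Real data are noisier, which is why deep coverage and a large number of validated sites matter.</p>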
<div class="mceTemp">
<dl id="attachment_1414" class="wp-caption alignnone" style="width: 619px;">
<dt class="wp-caption-dt"><a href="http://www.massgenomics.org/wp-content/uploads/2012/01/Tumor-Evolution-Model.jpg"><img class="size-full wp-image-1414" title="Tumor-Evolution-Model" src="http://www.massgenomics.org/wp-content/uploads/2012/01/Tumor-Evolution-Model.jpg" alt="Model of Tumor Cell Evolution in Leukemia" width="609" height="245" /></a></dt>
<address class="wp-caption-dd">Tumor Evolution Model in AML1 (Credit: Ding et al, Nature 2012)</address>
</dl>
</div>
<p>The evidence suggests that a relatively minor subpopulation of tumor cells (clone 4) survived chemotherapy and arose to become the dominant clone at relapse. In the process, it gained additional mutations, possibly via the DNA damage induced by chemotherapy. Four other AML cases in this study were consistent with this model of tumor evolution. We also observed another model, in which the dominant clone in the primary tumor gained mutations found only at relapse:</p>
<div class="mceTemp">
<dl id="attachment_1415" class="wp-caption alignnone" style="width: 481px;">
<dt class="wp-caption-dt"><a href="http://www.massgenomics.org/wp-content/uploads/2012/01/Tumor-Evolution-Dominant-Model.jpg"><img class="size-full wp-image-1415" title="Tumor-Evolution-Dominant-Model" src="http://www.massgenomics.org/wp-content/uploads/2012/01/Tumor-Evolution-Dominant-Model.jpg" alt="Dominant clone model for tumor evolution" width="471" height="153" /></a></dt>
<address class="wp-caption-dd">Dominant clone model of tumor evolution (Credit: Ding et al, Nature 2012)</address>
</dl>
</div>
<p>Three of the cases in this study were consistent with this model of tumor evolution.<br />
<a name="cancer-convergent-evolution-idh2"></a></p>
<h2 style="font-size: 16px;">Convergent Evolution of IDH2 Mutations</h2>
<p>Here&#8217;s an interesting side-story to this study that wasn&#8217;t really discussed in the paper. Two cases harbored mutations in isocitrate dehydrogenase 2 (IDH2), a gene known to be recurrently mutated in AML, glioblastoma, and other tumors. In patient AML28 (UPN 573988), we detected a C to A substitution at chr15:88432938 (build36) causing an amino acid change (R140L) in IDH2. This mutation was also present in the relapse sample at a moderate frequency. However, we also detected a G to A mutation at chr15:88432939 (one base downstream) affecting the same arginine residue but changing it to tryptophan (R140W) instead of leucine. And these two mutations were mutually exclusive. Thus, two subclones of this patient&#8217;s tumor both acquired activating mutations of IDH2 at the same residue but by different mutations.</p>
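<p>How two adjacent genomic substitutions can hit the same arginine can be worked through in a few lines. The sketch below assumes that IDH2 lies on the minus strand and that residue 140 is encoded by CGG on the coding strand; both are assumptions made for illustration rather than details stated in this post, and the helper names are hypothetical.</p>

```python
# Only the codons needed for this illustration.
CODON_TABLE = {"CGG": "R", "CTG": "L", "TGG": "W"}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def protein_change(genomic_codon: str, offset: int, ref: str, alt: str, residue: int) -> str:
    """Apply a reference-strand substitution to a minus-strand codon and
    report the amino acid change, for example 'R140L'."""
    if genomic_codon[offset] != ref:
        raise ValueError("reference base does not match the codon")
    mutated = genomic_codon[:offset] + alt + genomic_codon[offset + 1:]
    before = CODON_TABLE[reverse_complement(genomic_codon)]
    after = CODON_TABLE[reverse_complement(mutated)]
    return f"{before}{residue}{after}"


# Assumption: the coding-strand codon CGG reads CCG on the reference strand.
GENOMIC_CODON = reverse_complement("CGG")

first = protein_change(GENOMIC_CODON, 1, "C", "A", 140)   # C to A substitution
second = protein_change(GENOMIC_CODON, 2, "G", "A", 140)  # G to A, one base downstream
print(first, second)
```

<p>Under these assumptions, the C to A change alters the second codon position and yields leucine, while the G to A change one base downstream alters the first codon position and yields tryptophan.</p>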
<h2 style="font-size: 16px;"><a name="carl-zimmer-article"></a>Zimmer Article on Discover Blogs</h2>
<p>Carl Zimmer has a <a title="Inside Darwin's Tumor by Carl Zimmer" href="http://blogs.discovermagazine.com/loom/2012/01/12/inside-darwins-tumor/" target="_blank">nice article on this study</a> on his blog at Discover Magazine. Go now, and read this paper.</p>
<p><a name="tumor-evolution-references"></a><span style="text-decoration: underline;"><strong>References</strong></span><br />
<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&amp;rft.jtitle=Nature&amp;rft_id=info%3Adoi%2F10.1038%2Fnature10738&amp;rfr_id=info%3Asid%2Fresearchblogging.org&amp;rft.atitle=Clonal+evolution+in+relapsed+acute+myeloid+leukaemia+revealed+by+whole-genome+sequencing&amp;rft.issn=0028-0836&amp;rft.date=2012&amp;rft.volume=&amp;rft.issue=&amp;rft.spage=&amp;rft.epage=&amp;rft.artnum=http%3A%2F%2Fwww.nature.com%2Fdoifinder%2F10.1038%2Fnature10738&amp;rft.au=Ding%2C+L.&amp;rft.au=Ley%2C+T.&amp;rft.au=Larson%2C+D.&amp;rft.au=Miller%2C+C.&amp;rft.au=Koboldt%2C+D.&amp;rft.au=Welch%2C+J.&amp;rft.au=Ritchey%2C+J.&amp;rft.au=Young%2C+M.&amp;rft.au=Lamprecht%2C+T.&amp;rft.au=McLellan%2C+M.&amp;rft.au=McMichael%2C+J.&amp;rft.au=Wallis%2C+J.&amp;rft.au=Lu%2C+C.&amp;rft.au=Shen%2C+D.&amp;rft.au=Harris%2C+C.&amp;rft.au=Dooling%2C+D.&amp;rft.au=Fulton%2C+R.&amp;rft.au=Fulton%2C+L.&amp;rft.au=Chen%2C+K.&amp;rft.au=Schmidt%2C+H.&amp;rft.au=Kalicki-Veizer%2C+J.&amp;rft.au=Magrini%2C+V.&amp;rft.au=Cook%2C+L.&amp;rft.au=McGrath%2C+S.&amp;rft.au=Vickery%2C+T.&amp;rft.au=Wendl%2C+M.&amp;rft.au=Heath%2C+S.&amp;rft.au=Watson%2C+M.&amp;rft.au=Link%2C+D.&amp;rft.au=Tomasson%2C+M.&amp;rft.au=Shannon%2C+W.&amp;rft.au=Payton%2C+J.&amp;rft.au=Kulkarni%2C+S.&amp;rft.au=Westervelt%2C+P.&amp;rft.au=Walter%2C+M.&amp;rft.au=Graubert%2C+T.&amp;rft.au=Mardis%2C+E.&amp;rft.au=Wilson%2C+R.&amp;rft.au=DiPersio%2C+J.&amp;rfe_dat=bpr3.included=1;bpr3.tags=Biology%2CMedicine%2CGenetics%2C+Bioinformatics%2C+Cancer%2C+Hematology%2C+Stem+Cells">Ding, L., Ley, T., Larson, D., Miller, C., Koboldt, D., Welch, J., Ritchey, J., Young, M., Lamprecht, T., McLellan, M., McMichael, J., Wallis, J., Lu, C., Shen, D., Harris, C., Dooling, D., Fulton, R., Fulton, L., Chen, K., Schmidt, H., Kalicki-Veizer, J., Magrini, V., Cook, L., McGrath, S., Vickery, T., Wendl, M., Heath, S., Watson, M., Link, D., Tomasson, M., Shannon, W., Payton, J., Kulkarni, S., Westervelt, P., 
Walter, M., Graubert, T., Mardis, E., Wilson, R., &amp; DiPersio, J. (2012). Clonal evolution in relapsed acute myeloid leukaemia revealed by whole-genome sequencing <span style="font-style: italic;">Nature</span> DOI: </span><a rev="review" href="http://dx.doi.org/10.1038/nature10738">10.1038/nature10738</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.massgenomics.org/2012/01/a-tumor-evolved-relapsed-acute-myeloid-leukemia.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.massgenomics.org/2012/01/a-tumor-evolved-relapsed-acute-myeloid-leukemia.html</feedburner:origLink></item>
		<item>
		<title>Genetic Basis of an Aggressive Pediatric Leukemia</title>
		<link>http://feedproxy.google.com/~r/Massgenomics/~3/VJP5TZ2Un3c/genetic-basis-of-an-aggressive-pediatric-leukemia.html</link>
		<comments>http://www.massgenomics.org/2012/01/genetic-basis-of-an-aggressive-pediatric-leukemia.html#comments</comments>
		<pubDate>Thu, 12 Jan 2012 06:21:00 +0000</pubDate>
		<dc:creator>Dan Koboldt</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.massgenomics.org/?p=1385</guid>
		<description><![CDATA[Contents: Early T-cell ALL • Whole-genome Sequencing • Genetic Architecture of ETP-ALL • A Stem-cell Leukemia Acute lymphoblastic leukemia (ALL) is the most common pediatric cancer, comprising two forms: B-cell ALL (85% of cases) and T-cell ALL (15% of cases). In this week&#8217;s issue of Nature, Jinghui Zhang and colleagues report the whole-genome sequencing of [...]]]></description>
			<content:encoded><![CDATA[<p><strong>Contents: <a title="T-cell Acute lymphoblastic leukemia" href="#pediatric-leukemia">Early T-cell ALL</a></strong> • <strong><a title="Whole-genome sequencing of pediatric leukemia" href="#whole-genome-sequencing-cancer">Whole-genome Sequencing</a></strong> • <strong><a title="Genetic architecture of ETP Leukemia" href="#genetic-architecture-etp-leukemia">Genetic Architecture of ETP-ALL</a></strong> • <strong><a title="Stem cell-like Leukemia" href="#stem-cell-leukemia">A Stem-cell Leukemia</a></strong></p>
<table border="0" cellspacing="0" cellpadding="2">
<tbody>
<tr>
<td>Acute lymphoblastic leukemia (ALL) is the most common pediatric cancer, comprising two forms: B-cell ALL (85% of cases) and T-cell ALL (15% of cases). In this week&#8217;s issue of Nature, Jinghui Zhang and colleagues report the whole-genome sequencing of 12 cases of early T-cell precursor acute lymphoblastic leukemia (ETP-ALL), a recently described and aggressive subtype of T-ALL whose genetic basis was unknown. This is the first major publication of the <a title="Pediatric Cancer Genome Project" href="http://www.pediatriccancergenomeproject.org/site/" target="_blank">Pediatric Cancer Genome Project</a>, a collaborative effort between St. Jude Children&#8217;s Research Hospital and Washington University in St. Louis.</td>
<td><img class="alignright" src="http://www.massgenomics.org/wp-content/uploads/2010/01/st-judes-cancer-project.jpg" alt="" width="139" height="139" /></td>
</tr>
</tbody>
</table>
<p><a name="pediatric-leukemia"></a></p>
<h3>Early T-cell precursor acute lymphoblastic leukemia</h3>
<p>ETP-ALL is associated with a high risk of treatment failure, and bears some distinct characteristics:</p>
<ol>
<li>Lack of expression of T-lineage cell surface markers CD1a and CD8</li>
<li>Weak or absent expression of CD5</li>
<li>Aberrant expression of myeloid and hematopoietic stem cell markers (such as CD13, CD33, CD34, CD117)</li>
<li>&#8220;Early&#8221; cells that can differentiate into T-cell and myeloid lineages (but not B-cell)</li>
<li>Gene expression profiles reminiscent of the mouse early T-cell precursor</li>
</ol>
<p>ETP-ALL tumors exhibit an unusually high burden of DNA copy number alterations, but no unifying genetic alterations had been identified prior to this study.<br />
<a name="whole-genome-sequencing-cancer"></a></p>
<h3>Whole-Genome Sequencing of ETP-ALL</h3>
<p>Zhang et al. performed whole-genome sequencing on tumor samples and matched normals from 12 children with ETP-ALL. Tumor samples from two WGS cases also underwent transcriptome sequencing (RNA-seq). To extend their findings, the authors assembled a recurrence cohort of 94 T-cell ALL cases (52 ETP and 42 non-ETP). Three of the ETP samples in the extension cohort also underwent exome sequencing. On average, the authors identified 1,140 somatic mutations per case, including 154 that altered protein sequence, and 12 somatic structural rearrangements. More than half of the missense mutations were predicted to be deleterious, suggesting an enrichment for driver mutations involved in leukemogenesis. Notably, 51% of the validated structural variants (SVs) had breakpoints in protein-coding genes, including several with roles in hematopoiesis or leukemogenesis.<br />
<a name="genetic-architecture-etp-leukemia"></a></p>
<h3>Genetic Architecture of ETP-ALL</h3>
<p>Mutation discovery in the WGS cohort followed by recurrence testing in the extension cohort enabled the authors to identify several genetic patterns in ETP-ALL.</p>
<table border="1" cellspacing="0" cellpadding="2" bordercolor="#000000">
<tbody>
<tr>
<td style="background-color: #ffcc00;">Lesion Type</td>
<td style="background-color: #ffcc00;">Pathway</td>
<td style="background-color: #ffcc00;">Frequency</td>
<td style="background-color: #ffcc00;">Genes</td>
</tr>
<tr>
<td>Activating mutations</td>
<td>Cytokine receptor and RAS signaling</td>
<td>67%</td>
<td>NRAS, KRAS, FLT3, IL7R, JAK3, SH2B3, BRAF</td>
</tr>
<tr>
<td>Inactivating lesions</td>
<td>Hematopoietic development</td>
<td>58%</td>
<td>GATA3, ETV6, RUNX1, IKZF1, EP300</td>
</tr>
<tr>
<td>Inactivating lesions</td>
<td>Histone modification</td>
<td>48%</td>
<td>EZH2, EED, SUZ12, SETD2, EP300</td>
</tr>
</tbody>
</table>
<p><a name="stem-cell-leukemia"></a></p>
<h3>ETP-ALL is a stem-cell leukemia</h3>
<p>A detailed comparison of gene expression signatures between ETP-ALL tumors and normal human hematopoietic progenitor cells revealed a somewhat surprising finding: ETP-ALL expression patterns were less consistent with early T-cell precursors, as might have been expected, and more similar to the expression profiles of normal hematopoietic stem cells and granulocyte macrophage precursors. They were also enriched for genes expressed in leukemic stem cells of poor-prognosis AML. The evidence from this study suggests that the genetic alterations in ETP-ALL cause &#8220;gross maturational arrest&#8221; resulting in a poorly differentiated, stem-cell-like leukemia. This observation raises the possibility that treatment regimens for AML, such as high-dose cytarabine, may be beneficial in treating this deadly malignancy.</p>
<p><span style="text-decoration: underline;"><strong>References</strong></span><br />
<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&amp;rft.jtitle=Nature&amp;rft_id=info%3Adoi%2F10.1038%2Fnature10725&amp;rfr_id=info%3Asid%2Fresearchblogging.org&amp;rft.atitle=The+genetic+basis+of+early+T-cell+precursor+acute+lymphoblastic+leukaemia&amp;rft.issn=0028-0836&amp;rft.date=2012&amp;rft.volume=481&amp;rft.issue=7380&amp;rft.spage=157&amp;rft.epage=163&amp;rft.artnum=http%3A%2F%2Fwww.nature.com%2Fdoifinder%2F10.1038%2Fnature10725&amp;rft.au=Zhang%2C+J.&amp;rft.au=Ding%2C+L.&amp;rft.au=Holmfeldt%2C+L.&amp;rft.au=Wu%2C+G.&amp;rft.au=Heatley%2C+S.&amp;rft.au=Payne-Turner%2C+D.&amp;rft.au=Easton%2C+J.&amp;rft.au=Chen%2C+X.&amp;rft.au=Wang%2C+J.&amp;rft.au=Rusch%2C+M.&amp;rft.au=Lu%2C+C.&amp;rft.au=Chen%2C+S.&amp;rft.au=Wei%2C+L.&amp;rft.au=Collins-Underwood%2C+J.&amp;rft.au=Ma%2C+J.&amp;rft.au=Roberts%2C+K.&amp;rft.au=Pounds%2C+S.&amp;rft.au=Ulyanov%2C+A.&amp;rft.au=Becksfort%2C+J.&amp;rft.au=Gupta%2C+P.&amp;rft.au=Huether%2C+R.&amp;rft.au=Kriwacki%2C+R.&amp;rft.au=Parker%2C+M.&amp;rft.au=McGoldrick%2C+D.&amp;rft.au=Zhao%2C+D.&amp;rft.au=Alford%2C+D.&amp;rft.au=Espy%2C+S.&amp;rft.au=Bobba%2C+K.&amp;rft.au=Song%2C+G.&amp;rft.au=Pei%2C+D.&amp;rft.au=Cheng%2C+C.&amp;rft.au=Roberts%2C+S.&amp;rft.au=Barbato%2C+M.&amp;rft.au=Campana%2C+D.&amp;rft.au=Coustan-Smith%2C+E.&amp;rft.au=Shurtleff%2C+S.&amp;rft.au=Raimondi%2C+S.&amp;rft.au=Kleppe%2C+M.&amp;rft.au=Cools%2C+J.&amp;rft.au=Shimano%2C+K.&amp;rft.au=Hermiston%2C+M.&amp;rft.au=Doulatov%2C+S.&amp;rft.au=Eppert%2C+K.&amp;rft.au=Laurenti%2C+E.&amp;rft.au=Notta%2C+F.&amp;rft.au=Dick%2C+J.&amp;rft.au=Basso%2C+G.&amp;rft.au=Hunger%2C+S.&amp;rft.au=Loh%2C+M.&amp;rft.au=Devidas%2C+M.&amp;rft.au=Wood%2C+B.&amp;rft.au=Winter%2C+S.&amp;rft.au=Dunsmore%2C+K.&amp;rft.au=Fulton%2C+R.&amp;rft.au=Fulton%2C+L.&amp;rft.au=Hong%2C+X.&amp;rft.au=Harris%2C+C.&amp;rft.au=Dooling%2C+D.&amp;rft.au=Ochoa%2C+K.&amp;rft.au=Johnson%2C+K.&amp;rft.au=Obenauer%2C+J.&amp;rft.au=Evans%2C+W.&am
p;rft.au=Pui%2C+C.&amp;rft.au=Naeve%2C+C.&amp;rft.au=Ley%2C+T.&amp;rft.au=Mardis%2C+E.&amp;rft.au=Wilson%2C+R.&amp;rft.au=Downing%2C+J.&amp;rft.au=Mullighan%2C+C.&amp;rfe_dat=bpr3.included=1;bpr3.tags=Biology%2CMedicine%2CGenetics%2C+Bioinformatics%2C+Cancer%2C+Hematology%2C+Immunology">Zhang, J., Ding, L., Holmfeldt, L., Wu, G., Heatley, S., Payne-Turner, D., Easton, J., Chen, X., Wang, J., Rusch, M., Lu, C., Chen, S., Wei, L., Collins-Underwood, J., Ma, J., Roberts, K., Pounds, S., Ulyanov, A., Becksfort, J., Gupta, P., Huether, R., Kriwacki, R., Parker, M., McGoldrick, D., Zhao, D., Alford, D., Espy, S., Bobba, K., Song, G., Pei, D., Cheng, C., Roberts, S., Barbato, M., Campana, D., Coustan-Smith, E., Shurtleff, S., Raimondi, S., Kleppe, M., Cools, J., Shimano, K., Hermiston, M., Doulatov, S., Eppert, K., Laurenti, E., Notta, F., Dick, J., Basso, G., Hunger, S., Loh, M., Devidas, M., Wood, B., Winter, S., Dunsmore, K., Fulton, R., Fulton, L., Hong, X., Harris, C., Dooling, D., Ochoa, K., Johnson, K., Obenauer, J., Evans, W., Pui, C., Naeve, C., Ley, T., Mardis, E., Wilson, R., Downing, J., &amp; Mullighan, C. (2012). The genetic basis of early T-cell precursor acute lymphoblastic leukaemia <span style="font-style: italic;">Nature, 481</span> (7380), 157-163 DOI: <a rev="review" href="http://dx.doi.org/10.1038/nature10725">10.1038/nature10725</a></span></p>
]]></content:encoded>
			<wfw:commentRss>http://www.massgenomics.org/2012/01/genetic-basis-of-an-aggressive-pediatric-leukemia.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.massgenomics.org/2012/01/genetic-basis-of-an-aggressive-pediatric-leukemia.html</feedburner:origLink></item>
		<item>
		<title>Cancer Genome and Exome Sequencing in 2011</title>
		<link>http://feedproxy.google.com/~r/Massgenomics/~3/TgdbJrFO3D8/cancer-genome-and-exome-sequencing-in-2011.html</link>
		<comments>http://www.massgenomics.org/2012/01/cancer-genome-and-exome-sequencing-in-2011.html#comments</comments>
		<pubDate>Thu, 05 Jan 2012 18:53:37 +0000</pubDate>
		<dc:creator>Dan Koboldt</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.massgenomics.org/?p=1327</guid>
		<description><![CDATA[As a follow-up to my previous post, Disease-causing Mutations Discovered by NGS in 2011, I&#8217;ve attempted to compile cancer genome and exome sequencing studies published last year. For this compilation, I&#8217;ve emphasized publications in which whole-genome or exome sequencing was applied to multiple tumors or cancer cell lines, with the goal of identifying significantly altered [...]]]></description>
			<content:encoded><![CDATA[<p>As a follow-up to my previous post, <a title="Disease-causing mutations discovered by NGS in 2011" href="http://www.massgenomics.org/2011/12/disease-causing-mutations-discovered-by-ngs-in-2011.html" target="_blank">Disease-causing Mutations Discovered by NGS in 2011</a>, I&#8217;ve attempted to compile <strong>cancer genome and exome sequencing studies</strong> published last year. For this compilation, I&#8217;ve emphasized publications in which whole-genome or exome sequencing was applied to multiple tumors or cancer cell lines, with the goal of identifying significantly altered genes or pathways in malignant cells. Even with these restrictions, the list of papers in 2011 was rather extensive. Links are provided to PubMed for the citations; when a &#8220;key finding&#8221; is a link, it points to a previous MassGenomics post on the study. I&#8217;ve broken the publications down into several categories:</p>
<p><a href="#"><strong>Cancer whole-genome sequencing</strong></a><br />
<a href="#"><strong>Exome sequencing of leukemia</strong></a><br />
<a href="#"><strong>Exome sequencing of carcinoma</strong></a><br />
<a href="#"><strong>Exome sequencing of melanoma</strong></a><br />
<a href="#"><strong>Pancreatic, gastric, and prostate cancer exomes</strong></a></p>
<h2>Whole-genome sequencing (WGS) studies</h2>
<p>Several whole-genome sequencing studies of cancer were published last year. The largest of these was a study from the Broad Institute, in which whole-genome sequencing or exome sequencing was applied to 38 multiple myeloma tumors.</p>
<table style="background: #AAAAAA;" border="0" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td style="margin: 0px 0px 0px 0px;" align="center">
<table style="font-size: 10px; line-height: 14px;" border="0" cellspacing="1" cellpadding="1" bordercolor="#666666">
<tbody>
<tr>
<td style="background: #FFCC00;" colspan="4" align="center"><strong>Cancer Whole-genome Sequencing</strong></td>
</tr>
<tr>
<td style="align: center; vertical-align: top; background: #FFFFFF;" width="25%"><strong>Cancer Type</strong></td>
<td style="align: center; vertical-align: top; background: #FFFFFF;" width="15%"><strong>#Samples</strong></td>
<td style="align: center; vertical-align: top; background: #FFFFFF;"><strong>Key Finding(s)</strong></td>
<td style="align: center; vertical-align: top; background: #FFFFFF;" width="20%"><strong>Publication</strong></td>
</tr>
<tr>
<td style="vertical-align: top; background: #FFFFFF;">Myelodysplastic syndrome (MDS)</td>
<td style="vertical-align: top; background: #FFFFFF;">1 WGS<br />
150 case<br />
extension</td>
<td style="vertical-align: top; background: #FFFFFF;"><a title="Recurrent mutations of U2AF1 in MDS" href="http://www.massgenomics.org/2011/12/recurrent-splicing-mutations-in-mds-and-leukemia.html" target="_blank">Recurrent mutations in the U2AF1 splicing factor</a></td>
<td style="vertical-align: top; background: #FFFFFF;"><a title="Whole Genome Sequencing of myelodysplastic syndromes" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Citation&amp;list_uids=22158538" target="_blank">Graubert et al. Nat. Genet.</a></td>
</tr>
<tr>
<td style="vertical-align: top; background: #FFFFFF;">HCV+ Hepatocellular carcinoma</td>
<td style="vertical-align: top; background: #FFFFFF;">1 case</td>
<td style="vertical-align: top; background: #FFFFFF;"><a title="Virus Associated Liver Cancer Genome" href="http://www.massgenomics.org/2011/05/a-virus-associated-liver-cancer-genome.html" target="_blank">Transcription-coupled DNA repair; several gene fusion events including BCORL1-ELF4</a></td>
<td style="vertical-align: top; background: #FFFFFF;"><a title="Exome Sequencing of Hepatocellular carcinoma" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Citation&amp;list_uids=21499249" target="_blank">Wei et al. Nat. Genet. </a></td>
</tr>
<tr>
<td style="vertical-align: top; background: #FFFFFF;">Chronic lymphocytic leukemia</td>
<td style="vertical-align: top; background: #FFFFFF;">4 cases</td>
<td style="vertical-align: top; background: #FFFFFF;"><a title="Whole genome sequencing of CLL" href="http://www.massgenomics.org/2011/06/recurrent-mutations-in-chronic-lymphocytic-leukemia.html" target="_blank">Recurrent mutations in NOTCH1, XPO1, MYD88, and KLHL6</a></td>
<td style="vertical-align: top; background: #FFFFFF;"><a title="Whole-genome Sequencing of CLL" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Citation&amp;list_uids=21642962" target="_blank">Puente et al. Nature</a></td>
</tr>
<tr>
<td style="vertical-align: top; background: #FFFFFF;">Colorectal adenocarcinoma</td>
<td style="vertical-align: top; background: #FFFFFF;">9 cases</td>
<td style="vertical-align: top; background: #FFFFFF;">Recurrent VTI1A-TCF7L2 fusions (3%)</td>
<td style="vertical-align: top; background: #FFFFFF;"><a title="Whole-genome Sequencing of Colorectal adenocarcinoma" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Citation&amp;list_uids=21892161" target="_blank">Bass et al. Nat. Genet. </a></td>
</tr>
<tr>
<td style="vertical-align: top; background: #FFFFFF;">Prostate cancer</td>
<td style="vertical-align: top; background: #FFFFFF;">7 cases</td>
<td style="vertical-align: top; background: #FFFFFF;">Rearrangements disrupting CADM2, PTEN, or MAGI2</td>
<td style="vertical-align: top; background: #FFFFFF;"><a title="Whole-genome Sequencing of Prostate Cancer" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Citation&amp;list_uids=21307934" target="_blank">Berger et al. Nature </a></td>
</tr>
<tr>
<td style="vertical-align: top; background: #FFFFFF;">Multiple myeloma</td>
<td style="vertical-align: top; background: #FFFFFF;">22 WGS,<br />
15 Exome,<br />
1 both</td>
<td style="vertical-align: top; background: #FFFFFF;">Activating mutations in BRAF (4%)</td>
<td style="vertical-align: top; background: #FFFFFF;"><a title="Whole-genome Sequencing of Multiple Myeloma" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Citation&amp;list_uids=21430775" target="_blank">Chapman et al. Nature </a></td>
</tr>
<tr>
<td style="vertical-align: top; background: #FFFFFF;">Therapy-related AML</td>
<td style="vertical-align: top; background: #FFFFFF;">1 case</td>
<td style="vertical-align: top; background: #FFFFFF;">Novel heterozygous 3-kilobase deletion affecting TP53 in germline DNA</td>
<td style="vertical-align: top; background: #FFFFFF;"><a title="Whole-genome Sequencing of tAML" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Citation&amp;list_uids=21505135" target="_blank">Link et al. JAMA </a></td>
</tr>
<tr>
<td style="vertical-align: top; background: #FFFFFF;">Non-Hodgkin lymphoma</td>
<td style="vertical-align: top; background: #FFFFFF;">14 cases<br />+113 RNA-seq</td>
<td style="vertical-align: top; background: #FFFFFF;">Frequent mutation of MLL2 (32%/89%) and MEF2B (11.4%/13.4%) in DLBCL/FL.</td>
<td style="vertical-align: top; background: #FFFFFF;"><a title="Whole-genome Sequencing of Non-hodgkin lymphoma" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Citation&amp;list_uids=21796119" target="_blank">Morin et al. Nature</a></td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<p>&nbsp;</p>
<h2>Exome sequencing of leukemia</h2>
<p>Several types of leukemia were characterized by exome sequencing, including chronic lymphocytic leukemia (CLL), acute myeloid leukemia (AML), acute monocytic leukemia (M5 AML), and pediatric acute lymphocytic leukemia (ALL).</p>
<table style="background: #AAAAAA;" border="0" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td style="margin: 0px 0px 0px 0px;" align="center">
<table style="font-size: 10px; line-height: 14px;" border="0" cellspacing="1" cellpadding="1" bordercolor="#666666">
<tbody>
<tr>
<td style="background: #FFCC00;" colspan="4" align="center"><strong>Exome Sequencing of Leukemia</strong></td>
</tr>
<tr>
<td style="align: center; vertical-align: top; background: #FFFFFF;" width="25%"><strong>Cancer Type</strong></td>
<td style="align: center; vertical-align: top; background: #FFFFFF;" width="15%"><strong>#Samples</strong></td>
<td style="align: center; vertical-align: top; background: #FFFFFF;"><strong>Key Finding(s)</strong></td>
<td style="align: center; vertical-align: top; background: #FFFFFF;" width="20%"><strong>Publication</strong></td>
</tr>
<tr>
<td style="vertical-align: top; background: #FFFFFF;">CLL</td>
<td style="vertical-align: top; background: #FFFFFF;">105 cases</td>
<td style="vertical-align: top; background: #FFFFFF;"><a title="Recurrent mutations of SF3B1 in CLL" href="http://www.massgenomics.org/2011/12/recurrent-splicing-mutations-in-mds-and-leukemia.html" target="_blank">Recurrent mutations of the splicing factor SF3B1</a></td>
<td style="vertical-align: top; background: #FFFFFF;"><a title="Exome Sequencing of CLL" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Citation&amp;list_uids=22158541" target="_blank">Quesada et al. Nat Genet 2011. </a></td>
</tr>
<tr>
<td style="vertical-align: top; background: #FFFFFF;">M5 AML</td>
<td style="vertical-align: top; background: #FFFFFF;">9 cases</td>
<td style="vertical-align: top; background: #FFFFFF;">Somatic mutations of DNMT3A</td>
<td style="vertical-align: top; background: #FFFFFF;"><a title="Exome Sequencing of acute monocytic leukemia" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Citation&amp;list_uids=21399634" target="_blank">Yan et al. Nat Genet 2011. </a></td>
</tr>
<tr>
<td style="vertical-align: top; background: #FFFFFF;">AML (normal karyotype)</td>
<td style="vertical-align: top; background: #FFFFFF;">1 exome<br />
553 case extension</td>
<td style="vertical-align: top; background: #FFFFFF;">Somatic mutations in BCOR</td>
<td style="vertical-align: top; background: #FFFFFF;"><a title="Exome Sequencing of acute myeloid leukemia" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Citation&amp;list_uids=22012066" target="_blank">Grossmann et al. Blood 2011. </a></td>
</tr>
<tr>
<td style="vertical-align: top; background: #FFFFFF;">Pediatric ALL</td>
<td style="vertical-align: top; background: #FFFFFF;">2 cases</td>
<td style="vertical-align: top; background: #FFFFFF;">4-7 somatic protein-altering mutations per case</td>
<td style="vertical-align: top; background: #FFFFFF;"><a title="Exome Sequencing of pediatric leukemia" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Citation&amp;list_uids=22094584" target="_blank">Lilljebjorn et al. Leukemia 2011. </a></td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<h2>Exome sequencing of carcinoma</h2>
<p>Exome sequencing was also applied to many solid organ tumors, particularly carcinomas. The <a title="The Cancer Genome Atlas" href="http://cancergenome.nih.gov/" target="_blank">Cancer Genome Atlas</a> (TCGA) research network published a <a title="Genome Atlas of Ovarian Cancer" href="http://www.massgenomics.org/2011/07/genome-atlas-of-ovarian-cancer.html" target="_blank">landmark study of high-grade serous ovarian carcinoma</a>, the first of several common cancers to be characterized by TCGA. A number of other TCGA efforts will likely see publication in 2012, including breast cancer, colorectal cancer, glioblastoma, and leukemia.</p>
<table style="background: #AAAAAA;" border="0" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td style="margin: 0px 0px 0px 0px;" align="center">
<table style="font-size: 10px; line-height: 14px;" border="0" cellspacing="1" cellpadding="1" bordercolor="#666666">
<tbody>
<tr>
<td style="background: #FFCC00;" colspan="4" align="center"><strong>Exome Sequencing of Carcinoma</strong></td>
</tr>
<tr>
<td style="align: center; vertical-align: top; background: #FFFFFF;" width="25%"><strong>Cancer Type</strong></td>
<td style="align: center; vertical-align: top; background: #FFFFFF;" width="15%"><strong>#Samples</strong></td>
<td style="align: center; vertical-align: top; background: #FFFFFF;"><strong>Key Finding(s)</strong></td>
<td style="align: center; vertical-align: top; background: #FFFFFF;" width="20%"><strong>Publication</strong></td>
</tr>
<tr>
<td style="vertical-align: top; background: #FFFFFF;">Serous ovarian</td>
<td style="vertical-align: top; background: #FFFFFF;">316 cases</td>
<td style="vertical-align: top; background: #FFFFFF;">Recurrent mutations in TP53, NF1, BRCA1, BRCA2, RB1 and CDK12</td>
<td style="vertical-align: top; background: #FFFFFF;"><a title="Exome Sequencing of ovarian cancer" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Citation&amp;list_uids=21720365" target="_blank">TCGA, Nature 2011. </a></td>
</tr>
<tr>
<td style="vertical-align: top; background: #FFFFFF;">Head and neck squamous cell</td>
<td style="vertical-align: top; background: #FFFFFF;">74 cases</td>
<td style="vertical-align: top; background: #FFFFFF;"><a title="Mutations in NOTCH1 by exome sequencing" href="http://www.massgenomics.org/2011/08/notch-tumor-suppression-in-hnscc.html" target="_blank">Mutations in TP53, CDKN2A, PIK3CA, HRAS, and squamous differentiation genes</a>.</td>
<td style="vertical-align: top; background: #FFFFFF;"><a title="Exome Sequencing of squamous cell carcinoma" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Citation&amp;list_uids=21798893" target="_blank">Stransky et al. Science.</a></td>
</tr>
<tr>
<td style="vertical-align: top; background: #FFFFFF;">Head and neck squamous cell</td>
<td style="vertical-align: top; background: #FFFFFF;">32 cases</td>
<td style="vertical-align: top; background: #FFFFFF;">Mutations in TP53, CDKN2A, PIK3CA, HRAS, FBXW7, and NOTCH1. <a title="Tumor suppressor NOTCH1 by exome sequencing" href="http://www.massgenomics.org/2011/08/notch-tumor-suppression-in-hnscc.html" target="_blank">Tumor-suppressor role for NOTCH1</a>.</td>
<td style="vertical-align: top; background: #FFFFFF;"><a title="Exome Sequencing of HNSCC" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Citation&amp;list_uids=21798897" target="_blank">Agrawal et al. Science 2011. </a></td>
</tr>
<tr>
<td style="vertical-align: top; background: #FFFFFF;">Renal carcinoma</td>
<td style="vertical-align: top; background: #FFFFFF;">7 cases</td>
<td style="vertical-align: top; background: #FFFFFF;">Frequent mutation of the SWI/SNF complex gene PBRM1</td>
<td style="vertical-align: top; background: #FFFFFF;"><a title="Exome Sequencing of Renal Cancer" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Citation&amp;list_uids=21248752" target="_blank">Varela et al. Nature 2011. </a></td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<h2>Exome sequencing of melanoma</h2>
<p>Melanoma was another cancer type surveyed by multiple exome sequencing studies in 2011. Interestingly, two studies of melanoma cell lines revealed <span style="color: #800000;"><strong>frequent mutations in mitogen-activated protein</strong></span> (MAP) kinase kinase [kinase] genes <em>MAP2K1, MAP2K2, MAP3K5, </em>and<em> MAP3K9.</em></p>
<table style="background: #AAAAAA;" border="0" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td style="margin: 0px 0px 0px 0px;" align="center">
<table style="font-size: 10px; line-height: 14px;" border="0" cellspacing="1" cellpadding="1" bordercolor="#666666">
<tbody>
<tr>
<td style="background: #FFCC00;" colspan="4" align="center"><strong>Exome Sequencing of Melanoma</strong></td>
</tr>
<tr>
<td style="align: center; vertical-align: top; background: #FFFFFF;" width="25%"><strong>Cancer Type</strong></td>
<td style="align: center; vertical-align: top; background: #FFFFFF;" width="15%"><strong>#Samples</strong></td>
<td style="align: center; vertical-align: top; background: #FFFFFF;"><strong>Key Finding(s)</strong></td>
<td style="align: center; vertical-align: top; background: #FFFFFF;" width="20%"><strong>Publication</strong></td>
</tr>
<tr>
<td style="vertical-align: top; background: #FFFFFF;">Melanoma</td>
<td style="vertical-align: top; background: #FFFFFF;">14 cases</td>
<td style="vertical-align: top; background: #FFFFFF;">Frequent mutations in GRIN2A</td>
<td style="vertical-align: top; background: #FFFFFF;"><a title="Exome Sequencing of Melanoma" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Citation&amp;list_uids=21499247" target="_blank">Wei et al. Nat Genet. 2011. </a></td>
</tr>
<tr>
<td style="vertical-align: top; background: #FFFFFF;">Metastatic Melanoma</td>
<td style="vertical-align: top; background: #FFFFFF;">8 cell lines</td>
<td style="vertical-align: top; background: #FFFFFF;">Mutations in MAP3K5 and MAP3K9</td>
<td style="vertical-align: top; background: #FFFFFF;"><a title="Exome Sequencing of Metastatic Melanoma" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Citation&amp;list_uids=22197930" target="_blank">Stark et al. Nat Genet. 2011 </a></td>
</tr>
<tr>
<td style="vertical-align: top; background: #FFFFFF;">Melanoma</td>
<td style="vertical-align: top; background: #FFFFFF;">7 cell lines</td>
<td style="vertical-align: top; background: #FFFFFF;">Recurring somatic MAP2K1 and MAP2K2 mutations (8%)</td>
<td style="vertical-align: top; background: #FFFFFF;"><a title="Exome Sequencing of Melanoma cell lines" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Citation&amp;list_uids=22197931" target="_blank">Nikolaev et al. Nat Genet. 2011</a></td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<h2>Pancreatic, gastric, and prostate cancer</h2>
<p>Finally, I have a category of &#8220;other cancers&#8221; with exomes published in 2011; these include pancreatic cysts and cell lines, gastric cancer tumors, and prostate cancer samples derived through mouse xenograft models.</p>
<table style="background: #AAAAAA;" border="0" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td style="margin: 0px 0px 0px 0px;" align="center">
<table style="font-size: 10px; line-height: 14px;" border="0" cellspacing="1" cellpadding="1" bordercolor="#666666">
<tbody>
<tr>
<td style="background: #FFCC00;" colspan="4" align="center"><strong>Pancreatic, Gastric, and Prostate Cancer Exomes</strong></td>
</tr>
<tr>
<td style="align: center; vertical-align: top; background: #FFFFFF;" width="25%"><strong>Cancer Type</strong></td>
<td style="align: center; vertical-align: top; background: #FFFFFF;" width="15%"><strong>#Samples</strong></td>
<td style="align: center; vertical-align: top; background: #FFFFFF;"><strong>Key Finding(s)</strong></td>
<td style="align: center; vertical-align: top; background: #FFFFFF;" width="20%"><strong>Publication</strong></td>
</tr>
<tr>
<td style="vertical-align: top; background: #FFFFFF;">Pancreatic cancer</td>
<td style="vertical-align: top; background: #FFFFFF;">15 cell lines</td>
<td style="vertical-align: top; background: #FFFFFF;">Genomic instability caused by MLH1 haploinsufficiency and complete deficiency</td>
<td style="vertical-align: top; background: #FFFFFF;"><a title="Exome Sequencing of Pancreatic cancer" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Citation&amp;list_uids=22156295" target="_blank">Wang et al. Genome Res. 2011 </a></td>
</tr>
<tr>
<td style="vertical-align: top; background: #FFFFFF;">Pancreatic neoplastic cysts</td>
<td style="vertical-align: top; background: #FFFFFF;">8 cyst resections</td>
<td style="vertical-align: top; background: #FFFFFF;">Recurrent mutations in components of ubiquitin-dependent pathways</td>
<td style="vertical-align: top; background: #FFFFFF;"><a title="Exome Sequencing of pancreatic neoplastic cysts" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Citation&amp;list_uids=22158988" target="_blank">Wu et al. PNAS 2011. </a></td>
</tr>
<tr>
<td style="vertical-align: top; background: #FFFFFF;">Gastric cancer</td>
<td style="vertical-align: top; background: #FFFFFF;">22 cases</td>
<td style="vertical-align: top; background: #FFFFFF;">Frequent mutation of ARID1A</td>
<td style="vertical-align: top; background: #FFFFFF;"><a title="Exome Sequencing of gastric cancer" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Citation&amp;list_uids=22037554" target="_blank">Wang et al. Nat Genet 2011</a></td>
</tr>
<tr>
<td style="vertical-align: top; background: #FFFFFF;">Prostate cancer</td>
<td style="vertical-align: top; background: #FFFFFF;">3 primaries<br />
16 metastases</td>
<td style="vertical-align: top; background: #FFFFFF;"><a title="Prostate cancer exomes" href="http://www.massgenomics.org/2011/11/prostate-cancer-exomes-and-sequencing-matched-normals.html" target="_blank">Recurrent alterations in TP53, DLK2, GPC6, and SDF4</a></td>
<td style="vertical-align: top; background: #FFFFFF;"><a title="Exome Sequencing of prostate cancer" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Citation&amp;list_uids=21949389" target="_blank">Kumar et al. PNAS 2011</a></td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<h2>Conclusion and Outlook</h2>
<p>Around <strong>700 individual tumors</strong> representing <strong>17 major cancer types</strong> were characterized in the studies above. That&#8217;s an astonishing number, but it&#8217;s only the tip of the iceberg. This year will see the publication of thousands or tens of thousands of cancer genomes/exomes. New genes and pathways will be identified. Mutational patterns will be revealed. Novel therapeutic targets will be suggested. I think that 2012 will be an exciting year for cancer genomics.</p>
<p><span style="text-decoration: underline;"><strong>References</strong></span><br />
<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&amp;rft.jtitle=Human+molecular+genetics&amp;rft_id=info%3Apmid%2F20843826&amp;rfr_id=info%3Asid%2Fresearchblogging.org&amp;rft.atitle=Analysis+of+next-generation+genomic+data+in+cancer%3A+accomplishments+and+challenges.&amp;rft.issn=0964-6906&amp;rft.date=2010&amp;rft.volume=19&amp;rft.issue=R2&amp;rft.spage=&amp;rft.epage=96&amp;rft.artnum=&amp;rft.au=Ding+L&amp;rft.au=Wendl+MC&amp;rft.au=Koboldt+DC&amp;rft.au=Mardis+ER&amp;rfe_dat=bpr3.included=1;bpr3.tags=Biology%2CGenetics%2C+Bioinformatics">Ding L, Wendl MC, Koboldt DC, &amp; Mardis ER (2010). <strong>Analysis of next-generation genomic data in cancer: accomplishments and challenges</strong>. <span style="font-style: italic;">Human Molecular Genetics, 19</span> (R2):R188-96. PMID: <a rev="review" href="http://www.ncbi.nlm.nih.gov/pubmed/20843826">20843826</a></span></p>
]]></content:encoded>
			<wfw:commentRss>http://www.massgenomics.org/2012/01/cancer-genome-and-exome-sequencing-in-2011.html/feed</wfw:commentRss>
		<slash:comments>6</slash:comments>
		<feedburner:origLink>http://www.massgenomics.org/2012/01/cancer-genome-and-exome-sequencing-in-2011.html</feedburner:origLink></item>
		<item>
		<title>Disease-causing Mutations Discovered by NGS in 2011</title>
		<link>http://feedproxy.google.com/~r/Massgenomics/~3/DHd5ASRRW20/disease-causing-mutations-discovered-by-ngs-in-2011.html</link>
		<comments>http://www.massgenomics.org/2011/12/disease-causing-mutations-discovered-by-ngs-in-2011.html#comments</comments>
		<pubDate>Thu, 29 Dec 2011 18:04:26 +0000</pubDate>
		<dc:creator>Dan Koboldt</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[exome]]></category>
		<category><![CDATA[genomics]]></category>
		<category><![CDATA[mutations]]></category>
		<category><![CDATA[sequencing]]></category>
		<category><![CDATA[variant detection]]></category>

		<guid isPermaLink="false">http://www.massgenomics.org/?p=1265</guid>
		<description><![CDATA[The number of human genetic diseases unraveled by next-generation sequencing skyrocketed this year. Several factors contributed to this growth, two of which were the ever-increasing throughput of sequencing instruments and widespread availability of commercial exome platforms. A number of large-scale initiatives to discover disease genes by exome sequencing, particularly for Mendelian disorders, got off the [...]]]></description>
			<content:encoded><![CDATA[<p>The number of human genetic diseases unraveled by next-generation sequencing skyrocketed this year. Several factors contributed to this growth, two of which were the ever-increasing throughput of sequencing instruments and the widespread availability of commercial exome platforms. A number of large-scale initiatives to discover disease genes by exome sequencing, particularly for Mendelian disorders, got off the ground. I&#8217;d also argue that the rapid pace of discovery was aided by a growing acceptance of sequencing as a clinical tool.</p>
<p>A PubMed search restricted to the keywords &#8220;exome&#8221; and &#8220;sequencing&#8221; and the year 2011 returned over 100 publications, of which more than 60 were studies linking genetic variation to human disease. I&#8217;ve whittled the list down to around 40 and (after consulting a medical dictionary for most) divided them into rough disease categories.</p>
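<p>For readers who want to run a comparable search programmatically, here is a minimal sketch of how such a query could be built for NCBI&#8217;s E-utilities ESearch service. The exact query behind the numbers above is not recorded in this post, so the field tags (<code>[Title/Abstract]</code>, <code>[pdat]</code>), the <code>retmax</code> value, and the helper names are illustrative assumptions, and the result counts will differ today.</p>

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_query(keywords, year):
    """Join keywords with AND and restrict to one publication year.

    Hypothetical helper: the field tags are an assumption about how
    the search was restricted, not the query used for this post.
    """
    terms = [f"{kw}[Title/Abstract]" for kw in keywords]
    terms.append(f"{year}[pdat]")
    return " AND ".join(terms)


def build_esearch_url(keywords, year, retmax=200):
    """Return an ESearch URL for PubMed; fetching it is left to the caller."""
    params = {
        "db": "pubmed",
        "term": build_pubmed_query(keywords, year),
        "retmax": retmax,
        "retmode": "json",
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


query = build_pubmed_query(["exome", "sequencing"], 2011)
url = build_esearch_url(["exome", "sequencing"], 2011)
print(query)
print(url)
```

<p>The sketch only builds the request URL and makes no network call. The PubMed IDs it would return still need the manual curation described above to separate disease-gene studies from everything else.</p>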
<h3>Developmental Disorders</h3>
<p>The largest category was what I call &#8220;developmental disorders&#8221; &#8211; mental retardation, dysplasia (abnormal growth), dyskinesia (impaired movement), and the like. There were at least 14 gene-disease associations published this year, many of them in the <em>American Journal of Human Genetics</em>.</p>
<table style="background: #AAAAAA;" border="0" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td style="margin: 0px 0px 0px 0px;" align="center">
<table style="font-size: 10px; line-height: 14px;" border="0" cellspacing="1" cellpadding="1" bordercolor="#666666">
<tbody>
<tr>
<td style="background: #FFCC00;" colspan="3" align="center"><strong>Developmental Disorders</strong></td>
</tr>
<tr>
<td style="background: #FFFFFF;">SMOC2</td>
<td style="vertical-align: top; background: #FFFFFF;">Major dental developmental defects</td>
<td style="vertical-align: top; background: #FFFFFF;"><a title="Exome major dental developmental defects" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Citation&amp;list_uids=22152679" target="_blank">Bloch-Zupan et al. AJHG.</a></td>
</tr>
<tr>
<td style="background: #FFFFFF;">SYT14</td>
<td style="vertical-align: top; background: #FFFFFF;">Spinocerebellar ataxia with psychomotor retardation</td>
<td style="vertical-align: top; background: #FFFFFF;"><a title="Exome ataxia with psychomotor" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Citation&amp;list_uids=21835308" target="_blank">Doi et al. AJHG.</a></td>
</tr>
<tr>
<td style="background: #FFFFFF;">TECR</td>
<td style="vertical-align: top; background: #FFFFFF;">Non-syndromic mental retardation</td>
<td style="vertical-align: top; background: #FFFFFF;"><a title="Exome mental retardation" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Citation&amp;list_uids=21212097" target="_blank">Caliskan et al. Hum Mol Genet.</a></td>
</tr>
<tr>
<td style="background: #FFFFFF;">PRRT2</td>
<td style="vertical-align: top; background: #FFFFFF;">Paroxysmal kinesigenic dyskinesia</td>
<td style="vertical-align: top; background: #FFFFFF;"><a title="Exome sequencing of dyskinesia" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Citation&amp;list_uids=22101681" target="_blank">Chen et al. Nat Genet.</a></td>
</tr>
<tr>
<td style="background: #FFFFFF;">SERPINF1</td>
<td style="vertical-align: top; background: #FFFFFF;">Osteogenesis imperfecta</td>
<td style="vertical-align: top; background: #FFFFFF;"><a title="Exome sequencing of osteogenesis imperfecta" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Citation&amp;list_uids=21353196" target="_blank">Becker et al. AJHG.</a></td>
</tr>
<tr>
<td style="background: #FFFFFF;">KIF22</td>
<td style="vertical-align: top; background: #FFFFFF;">Spondyloepimetaphyseal dysplasia with joint laxity</td>
<td style="vertical-align: top; background: #FFFFFF;"><a title="Exome sequencing of dysplasia" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Citation&amp;list_uids=22152677" target="_blank">Min et al. AJHG.</a></td>
</tr>
<tr>
<td style="background: #FFFFFF;">KAT6B</td>
<td style="vertical-align: top; background: #FFFFFF;">Say-Barber-Biesecker syndrome</td>
<td style="vertical-align: top; background: #FFFFFF;"><a title="Exome sequencing of Say-Barber-Biesecker syndrome" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Citation&amp;list_uids=22077973" target="_blank">Clayton-Smith et al. AJHG.</a></td>
</tr>
<tr>
<td style="background: #FFFFFF;">POP1</td>
<td style="vertical-align: top; background: #FFFFFF;">Novel skeletal dysplasia</td>
<td style="vertical-align: top; background: #FFFFFF;"><a title="Exome sequencing of novel skeletal dysplasia" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Citation&amp;list_uids=21455487" target="_blank">Glazov et al. PLoS Genet.</a></td>
</tr>
<tr>
<td style="background: #FFFFFF;">CCDC8</td>
<td style="vertical-align: top; background: #FFFFFF;">3-M syndrome</td>
<td style="vertical-align: top; background: #FFFFFF;"><a title="Exome sequencing of 3-M syndrome" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Citation&amp;list_uids=21737058" target="_blank">Hanson et al. AJHG.</a></td>
</tr>
<tr>
<td style="background: #FFFFFF;">SLCO2A1</td>
<td style="vertical-align: top; background: #FFFFFF;">Primary hypertrophic osteoarthropathy</td>
<td style="vertical-align: top; background: #FFFFFF;"><a title="Exome sequencing of osteoarthropathy" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Citation&amp;list_uids=22197487" target="_blank">Zhang et al. AJHG.</a></td>
</tr>
<tr>
<td style="background: #FFFFFF;">WDR62</td>
<td style="vertical-align: top; background: #FFFFFF;">Recurrent polymicrogyria</td>
<td style="vertical-align: top; background: #FFFFFF;"><a title="Exome sequencing of recurrent polymicrogyria" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Citation&amp;list_uids=21834044" target="_blank">Murdock et al. Am J Med Genet A.</a></td>
</tr>
<tr>
<td style="background: #FFFFFF;">FAM20A</td>
<td style="vertical-align: top; background: #FFFFFF;">Amelogenesis imperfecta</td>
<td style="vertical-align: top; background: #FFFFFF;"><a title="Exome sequencing of amelogenesis imperfecta" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Citation&amp;list_uids=21549343" target="_blank">O&#8217;Sullivan et al. AJHG.</a></td>
</tr>
<tr>
<td style="background: #FFFFFF;">SHROOM3</td>
<td style="vertical-align: top; background: #FFFFFF;">Heterotaxy</td>
<td style="vertical-align: top; background: #FFFFFF;"><a title="Exome sequencing of heterotaxy" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Citation&amp;list_uids=21936905" target="_blank">Tariq et al. Genome Biol.</a></td>
</tr>
<tr>
<td style="background: #FFFFFF;">MCT8</td>
<td style="vertical-align: top; background: #FFFFFF;">X-linked leucoencephalopathy</td>
<td style="vertical-align: top; background: #FFFFFF;"><a title="Exome sequencing of X-linked leukoencephalopathy" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Citation&amp;list_uids=21415082" target="_blank">Tsurusaki et al. J Med Genet.</a></td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<h3>Familial Cancer Syndromes</h3>
<p>Sequencing of individuals with hereditary cancer syndromes enabled the identification of some new cancer susceptibility genes. This category will undoubtedly explode in the coming year as thousands of cancer patients have their genomes or exomes sequenced.</p>
<table style="background: #AAAAAA;" border="0" cellspacing="0" cellpadding="0" width="100%">
<tbody>
<tr>
<td style="margin: 0px 0px 0px 0px;" align="center">
<table style="font-size: 10px; line-height: 14px;" border="0" cellspacing="1" cellpadding="1" width="100%" bordercolor="#666666">
<tbody>
<tr>
<td style="background: #FFCC00;" colspan="3" align="center"><strong>Hereditary Cancer Syndromes</strong></td>
</tr>
<tr>
<td style="background: #FFFFFF;">MAX</td>
<td style="vertical-align: top; background: #FFFFFF;">Hereditary pheochromocytoma</td>
<td style="vertical-align: top; background: #FFFFFF;"><a title="Exome sequencing of hereditary pheochromocytoma" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Citation&amp;list_uids=21685915" target="_blank">Comino-Mendez et al. Nat Genet.</a></td>
</tr>
<tr>
<td style="background: #FFFFFF;">RET</td>
<td style="vertical-align: top; background: #FFFFFF;">Familial medullary thyroid carcinoma</td>
<td style="vertical-align: top; background: #FFFFFF;"><a title="Exome sequencing of familial medullary thyroid carcinoma" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Citation&amp;list_uids=21655256" target="_blank">Qi et al. PLoS One.</a></td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<h3>Metabolic Disorders</h3>
<p>Next up, metabolic disorders. Interestingly, a study by Vissers and colleagues linked mutations in <em>IDH1</em> &#8212; a gene recurrently mutated in leukemia, glioblastoma, and other cancers &#8212; to &#8220;metaphyseal chondromatosis&#8221;, a rare disorder characterized by severe bone dysplasia, neurodevelopmental problems, and strongly increased excretion of D-2-hydroxyglutaric acid.</p>
<table style="background: #AAAAAA;" border="0" cellspacing="0" cellpadding="0" width="100%">
<tbody>
<tr>
<td style="margin: 0px 0px 0px 0px;" align="center">
<table style="font-size: 10px; line-height: 14px;" border="0" cellspacing="1" cellpadding="1" width="100%" bordercolor="#666666">
<tbody>
<tr>
<td style="background: #FFCC00;" colspan="3" align="center"><strong>Metabolic Disorders</strong></td>
</tr>
<tr>
<td style="background: #FFFFFF;">ACSF3</td>
<td style="vertical-align: top; background: #FFFFFF;">Combined malonic and methylmalonic aciduria</td>
<td style="vertical-align: top; background: #FFFFFF;"><a title="Exome sequencing of combined malonic aciduria" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Citation&amp;list_uids=21785126" target="_blank">Alfares et al. J Med Genet.</a> and <a title="Exome sequencing of methylmalonic aciduria" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Citation&amp;list_uids=21841779" target="_blank">Sloan et al. Nat Genet.</a></td>
</tr>
<tr>
<td style="background: #FFFFFF;">MTHFD1</td>
<td style="vertical-align: top; background: #FFFFFF;">Novel inborn error of folate metabolism</td>
<td style="vertical-align: top; background: #FFFFFF;"><a title="Exome sequencing of inborn error of folate metabolism" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Citation&amp;list_uids=21813566" target="_blank">Watkins et al. J Med Genet.</a></td>
</tr>
<tr>
<td style="background: #FFFFFF;">IDH1</td>
<td style="vertical-align: top; background: #FFFFFF;">Metaphyseal chondromatosis with aciduria</td>
<td style="vertical-align: top; background: #FFFFFF;"><a title="Exome sequencing of metaphyseal chondromatosis" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Citation&amp;list_uids=22025298" target="_blank">Vissers et al. Am J Med Genet A.</a></td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<h3>Blood and Lymphatic Deficiencies</h3>
<p>Several inherited deficiencies of the blood and lymphatic system were linked to causal mutations. What I liked about this category was that half of the publications came out in &#8220;non-genome&#8221; journals &#8212; <em>Blood</em> and <em>Haematologica</em> &#8212; indicating that medical specialists in the field recognize the importance of exome sequencing for studying such diseases (and, in some cases, are already applying it).</p>
<table style="background: #AAAAAA;" border="0" cellspacing="0" cellpadding="0" width="100%">
<tbody>
<tr>
<td style="margin: 0px 0px 0px 0px;" align="center">
<table style="font-size: 10px; line-height: 14px;" border="0" cellspacing="1" cellpadding="1" width="100%" bordercolor="#666666">
<tbody>
<tr>
<td style="background: #FFCC00;" colspan="3" align="center"><strong>Blood and Lymphatic Disorders</strong></td>
</tr>
<tr>
<td style="background: #FFFFFF;">NBEAL2</td>
<td style="vertical-align: top; background: #FFFFFF;">Gray platelet syndrome</td>
<td style="vertical-align: top; background: #FFFFFF;"><a title="Exome sequencing mutations in Gray platelet syndrome" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Citation&amp;list_uids=21765411" target="_blank">Albers et al. Nat Genet.</a></td>
</tr>
<tr>
<td style="background: #FFFFFF;">GATA2</td>
<td style="vertical-align: top; background: #FFFFFF;">Dendritic cell, monocyte, B and NK lymphoid deficiency</td>
<td style="vertical-align: top; background: #FFFFFF;"><a title="Exome sequencing of lymphoid deficiency" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Citation&amp;list_uids=21765025" target="_blank">Dickinson et al. Blood.</a></td>
</tr>
<tr>
<td style="background: #FFFFFF;">MPL</td>
<td style="vertical-align: top; background: #FFFFFF;">Familial aplastic anemia</td>
<td style="vertical-align: top; background: #FFFFFF;"><a title="Exome sequencing of familial aplastic anemia" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Citation&amp;list_uids=22180433" target="_blank">Walne et al. Haematologica.</a></td>
</tr>
<tr>
<td style="background: #FFFFFF;">GJC2</td>
<td style="vertical-align: top; background: #FFFFFF;">Primary lymphoedema</td>
<td style="vertical-align: top; background: #FFFFFF;"><a title="Exome sequencing of primary lymphoedema" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Citation&amp;list_uids=21266381" target="_blank">Ostergaard et al. J Med Genet.</a></td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<h3>Neurological Diseases</h3>
<p>Neurological disorders win the prize for making me look up the layman&#8217;s term for virtually every disorder whose causal gene was pinpointed by sequencing this year. These include such disorders as <strong>lipofuscinosis</strong> (excessive accumulation of lipopigments), <strong>paraparesis</strong> (lower limb paralysis), and <strong>dystonia</strong> (abnormal muscle tone leading to movement and posture problems).</p>
<table style="background: #AAAAAA;" border="0" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td style="margin: 0px 0px 0px 0px;" align="center">
<table style="font-size: 10px; line-height: 14px;" border="0" cellspacing="1" cellpadding="1" bordercolor="#666666">
<tbody>
<tr>
<td style="background: #FFCC00;" colspan="3" align="center"><strong>Neurodegenerative Disorders</strong></td>
</tr>
<tr>
<td style="background: #FFFFFF;">DNAJC5</td>
<td style="vertical-align: top; background: #FFFFFF;">Adult neuronal ceroid-lipofuscinosis</td>
<td style="vertical-align: top; background: #FFFFFF;"><a title="Exome sequencing of adult neuronal ceroid-lipofuscinosis" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Citation&amp;list_uids=22073189" target="_blank">Benitez et al. PLoS One.</a></td>
</tr>
<tr>
<td style="background: #FFFFFF;">KIF1A</td>
<td style="vertical-align: top; background: #FFFFFF;">Hereditary spastic paraparesis</td>
<td style="vertical-align: top; background: #FFFFFF;"><a title="Exome sequencing identifies mutations in hereditary spastic paraparesis" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Citation&amp;list_uids=21487076" target="_blank">Erlich et al. Genome Res.</a></td>
</tr>
<tr>
<td style="background: #FFFFFF;">GCDH</td>
<td style="vertical-align: top; background: #FFFFFF;">Early-onset generalized dystonia</td>
<td style="vertical-align: top; background: #FFFFFF;"><a title="Exome sequencing of progressive dystonia" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Citation&amp;list_uids=21912879" target="_blank">Marti-Masso et al. Hum Genet.</a></td>
</tr>
<tr>
<td style="background: #FFFFFF;">FA2H</td>
<td style="vertical-align: top; background: #FFFFFF;">Fatty acid hydroxylase-associated neurodegeneration</td>
<td style="vertical-align: top; background: #FFFFFF;"><a title="Exome sequencing for fatty acid hydroxylase-associated neurodegeneration" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Citation&amp;list_uids=22146942" target="_blank">Pierson et al. Eur J Hum Genet.</a></td>
</tr>
<tr>
<td style="background: #FFFFFF;">AFG3L2</td>
<td style="vertical-align: top; background: #FFFFFF;">Spastic ataxia-neuropathy syndrome</td>
<td style="vertical-align: top; background: #FFFFFF;"><a title="Exome sequencing in spastic ataxia-neuropathy syndrome" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Citation&amp;list_uids=22022284" target="_blank">Pierson et al. PLoS Genet.</a></td>
</tr>
<tr>
<td style="background: #FFFFFF;">BANF1</td>
<td style="vertical-align: top; background: #FFFFFF;">Hereditary progeroid syndrome</td>
<td style="vertical-align: top; background: #FFFFFF;"><a title="Exome sequencing of progeroid syndrome" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Citation&amp;list_uids=21549337" target="_blank">Puente et al. AJHG.</a></td>
</tr>
<tr>
<td style="background: #FFFFFF;">DYNC1H1</td>
<td style="vertical-align: top; background: #FFFFFF;">Dominant axonal Charcot-Marie-Tooth disease</td>
<td style="vertical-align: top; background: #FFFFFF;"><a title="Exome sequencing of Charcot-Marie Tooth disease" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Citation&amp;list_uids=21820100" target="_blank">Weedon et al. AJHG.</a></td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<h3>Myopathies</h3>
<p>New disease genes were identified for several muscle fiber diseases (myopathies), including cardiomyopathy (heart muscle deficiency, usually fatal) and ophthalmoplegia, in which the muscles that control eye movement are paralyzed. Interestingly, both of the mitochondrial cardiomyopathy disease genes reported (<em>MRPL3</em> and <em>AARS2</em>) encode products required for mitochondrial ribosomal function: <em>MRPL3</em> encodes a ribosomal subunit, while <em>AARS2</em> encodes a tRNA synthetase.</p>
<table style="background: #AAAAAA;" border="0" cellspacing="0" cellpadding="0" width="100%">
<tbody>
<tr>
<td style="margin: 0px 0px 0px 0px;" align="center">
<table style="font-size: 10px; line-height: 14px;" border="0" cellspacing="1" cellpadding="1" width="100%" bordercolor="#666666">
<tbody>
<tr>
<td style="background: #FFCC00;" colspan="3" align="center"><strong>Myopathies</strong></td>
</tr>
<tr>
<td style="background: #FFFFFF;">MRPL3</td>
<td style="vertical-align: top; background: #FFFFFF;">Mitochondrial cardiomyopathy</td>
<td style="vertical-align: top; background: #FFFFFF;"><a title="Exome sequencing of mitochondrial cardiomyopathy" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Citation&amp;list_uids=21786366" target="_blank">Galmiche et al. Hum Mutat.</a></td>
</tr>
<tr>
<td style="background: #FFFFFF;">AARS2</td>
<td style="vertical-align: top; background: #FFFFFF;">Infantile mitochondrial cardiomyopathy</td>
<td style="vertical-align: top; background: #FFFFFF;"><a title="Exome sequencing of infantile mitochondrial cardiomyopathy" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Citation&amp;list_uids=21549344" target="_blank">Gotz et al. AJHG.</a></td>
</tr>
<tr>
<td style="background: #FFFFFF;">RRM2B</td>
<td style="vertical-align: top; background: #FFFFFF;">Progressive external ophthalmoplegia</td>
<td style="vertical-align: top; background: #FFFFFF;"><a title="Exome sequencing of external ophthalmoplegia" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Citation&amp;list_uids=21951382" target="_blank">Takata et al. Genome Biol.</a></td>
</tr>
<tr>
<td style="background: #FFFFFF;">BAG3</td>
<td style="vertical-align: top; background: #FFFFFF;">Dilated cardiomyopathy</td>
<td style="vertical-align: top; background: #FFFFFF;"><a title="Exome sequencing of dilated cardiomyopathy" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Citation&amp;list_uids=21353195" target="_blank">Norton et al. AJHG.</a></td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<h3>Vision-loss Disorders</h3>
<p>The last disease category I&#8217;ll mention is vision-loss disorders. A number of new disease-causing genes were identified this year, mostly by exome sequencing. Two studies were particularly interesting. First, <strong>Bowne and colleagues</strong> (including myself) identified a mutation in the <em>RPE65</em> gene causing autosomal dominant retinitis pigmentosa. This gene had previously been associated only with autosomal <em>recessive</em> RP; finding that it can also act in a dominant fashion suggests previously unknown routes of disease pathogenesis and new therapeutic possibilities. Second, Shi et al. linked mutations in the <em>ZNF644</em> gene to high myopia (severe nearsightedness), a common cause of blindness. Have you ever heard of a ZNFxxx gene that actually does something? Most of the time, you look these up and it says &#8220;May be involved in transcriptional regulation.&#8221; It&#8217;s good to know that at least one of them serves a purpose, namely, keeping most of us from virtual blindness.</p>
<table style="background: #AAAAAA;" border="0" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td style="margin: 0px 0px 0px 0px;" align="center">
<table style="font-size: 10px; line-height: 14px;" border="0" cellspacing="1" cellpadding="1" bordercolor="#666666">
<tbody>
<tr>
<td style="background: #FFCC00;" colspan="3" align="center"><strong>Vision Disorders</strong></td>
</tr>
<tr>
<td style="background: #FFFFFF;">RPE65</td>
<td style="vertical-align: top; background: #FFFFFF;">Retinitis pigmentosa with choroidal involvement</td>
<td style="vertical-align: top; background: #FFFFFF;"><a title="Exome sequencing identifies RPE65 mutations in autosomal dominant retinitis pigmentosa" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Citation&amp;list_uids=21654732" target="_blank">Bowne et al. Eur J Hum Genet.</a></td>
</tr>
<tr>
<td style="background: #FFFFFF;">MAK</td>
<td style="vertical-align: top; background: #FFFFFF;">Retinitis pigmentosa</td>
<td style="vertical-align: top; background: #FFFFFF;"><a title="Exome sequencing of retinitis pigmentosa" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Citation&amp;list_uids=21835304" target="_blank">Ozgul et al. AJHG.</a></td>
</tr>
<tr>
<td style="background: #FFFFFF;">ZNF644</td>
<td style="vertical-align: top; background: #FFFFFF;">High myopia</td>
<td style="vertical-align: top; background: #FFFFFF;"><a title="Exome sequencing of high myopia" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Citation&amp;list_uids=21695231" target="_blank">Shi et al. PLoS Genet.</a></td>
</tr>
<tr>
<td style="background: #FFFFFF;">MAK</td>
<td style="vertical-align: top; background: #FFFFFF;">Retinitis pigmentosa</td>
<td style="vertical-align: top; background: #FFFFFF;"><a title="Exome sequencing mutations of retinitis pigmentosa" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Citation&amp;list_uids=21825139" target="_blank">Tucker et al. PNAS.</a></td>
</tr>
<tr>
<td style="background: #FFFFFF;">ALMS1<br />
IQCB1<br />
CNGA3<br />
MYO7A</td>
<td style="vertical-align: top; background: #FFFFFF;">Leber congenital amaurosis</td>
<td style="vertical-align: top; background: #FFFFFF;"><a title="Exome sequencing of LCA" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Citation&amp;list_uids=21901789" target="_blank">Wang et al. Hum Mutat.</a></td>
</tr>
<tr>
<td style="background: #FFFFFF;">DHDDS</td>
<td style="vertical-align: top; background: #FFFFFF;">Retinitis pigmentosa</td>
<td style="vertical-align: top; background: #FFFFFF;"><a title="Exome sequencing of retinitis pigmentosa" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Citation&amp;list_uids=21295283" target="_blank">Zuchner et al. AJHG.</a></td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<p>And there you have it: the genetic basis of dozens of inherited disorders, pinpointed by next-generation sequencing. There is simply no plausible way to deny the importance of next-generation sequencing to advancing human health and medicine. One can only imagine what we&#8217;ll know by next December, as large federally funded initiatives ramp up their efforts to systematically apply exome and whole-genome sequencing to inherited disorders.</p>
<p><span style="text-decoration: underline;"><strong>References</strong></span><br />
<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&amp;rft.jtitle=Genome+Biology&amp;rft_id=info%3Adoi%2F10.1186%2Fgb-2011-12-9-408&amp;rfr_id=info%3Asid%2Fresearchblogging.org&amp;rft.atitle=Next-generation+human+genetics&amp;rft.issn=1465-6906&amp;rft.date=2011&amp;rft.volume=12&amp;rft.issue=9&amp;rft.spage=408&amp;rft.epage=&amp;rft.artnum=http%3A%2F%2Fgenomebiology.com%2Fcontent%2F12%2F9%2F408&amp;rft.au=Shendure%2C+J.&amp;rfe_dat=bpr3.included=1;bpr3.tags=Biology%2CMedicine%2CGenetics%2C+Bioinformatics%2C+Clinical+Research%2C+Metabolism%2C+Cardiovascular%2C+Neurology">Shendure, J. (2011). Next-generation human genetics <span style="font-style: italic;">Genome Biology, 12</span> (9) DOI: <a rev="review" href="http://dx.doi.org/10.1186/gb-2011-12-9-408">10.1186/gb-2011-12-9-408</a></span></p>
]]></content:encoded>
			<wfw:commentRss>http://www.massgenomics.org/2011/12/disease-causing-mutations-discovered-by-ngs-in-2011.html/feed</wfw:commentRss>
		<slash:comments>11</slash:comments>
		<feedburner:origLink>http://www.massgenomics.org/2011/12/disease-causing-mutations-discovered-by-ngs-in-2011.html</feedburner:origLink></item>
		<item>
		<title>Recurrent splicing mutations in MDS and leukemia</title>
		<link>http://feedproxy.google.com/~r/Massgenomics/~3/yrMpT98PY50/recurrent-splicing-mutations-in-mds-and-leukemia.html</link>
		<comments>http://www.massgenomics.org/2011/12/recurrent-splicing-mutations-in-mds-and-leukemia.html#comments</comments>
		<pubDate>Thu, 15 Dec 2011 14:58:10 +0000</pubDate>
		<dc:creator>Dan Koboldt</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[cancer]]></category>
		<category><![CDATA[genome]]></category>
		<category><![CDATA[sequencing]]></category>
		<category><![CDATA[splicing]]></category>

		<guid isPermaLink="false">http://www.massgenomics.org/?p=1223</guid>
		<description><![CDATA[Myelodysplastic syndrome (MDS, also called preleukemia) is a blood disorder characterized by ineffective production of myeloid cells, or leukocytes. The disorderly and ineffective production of blood cells from stem cells in the bone marrow results in low blood counts, or cytopenias. As many as 30% of MDS cases progress to full-blown, chemotherapy-resistant secondary AML. This [...]]]></description>
			<content:encoded><![CDATA[<p>Myelodysplastic syndrome (MDS, also called preleukemia) is a blood disorder characterized by ineffective production of myeloid cells, or leukocytes. The disorderly and ineffective production of blood cells from stem cells in the bone marrow results in low blood counts, or cytopenias. As many as 30% of MDS cases progress to full-blown, chemotherapy-resistant secondary AML. This week in <em>Nature Genetics</em>, two studies report recurrent mutations in splicing-related genes in blood tumors.</p>
<div class="mceTemp">
<dl id="attachment_1230" class="wp-caption alignleft" style="width: 164px;">
<dt class="wp-caption-dt"><a href="http://www.massgenomics.org/wp-content/uploads/2011/12/MDS-neutrophil.jpg"><img class="size-full wp-image-1230 " style="border: 1px solid black;" title="MDS-neutrophil" src="http://www.massgenomics.org/wp-content/uploads/2011/12/MDS-neutrophil.jpg" alt="" width="154" height="220" /></a></dt>
<address class="wp-caption-dd">MDS Cells (Wikipedia)</address>
</dl>
</div>
<p>First, Tim Graubert and colleagues describe the <a title="Recurrent mutations in U2AF1 in MDS" href="http://www.nature.com/ng/journal/vaop/ncurrent/full/ng.1031.html" target="_blank">whole-genome sequencing of an MDS-derived secondary AML tumor</a> and a matched normal (skin) sample. They detected and validated <strong>507 somatic single-nucleotide variants in the tumor, nearly all of which (505) were detected in the MDS sample</strong>. Among these were 30 coding SNVs, of which one was a missense mutation in the <em>U2AF1</em> gene. The same codon of <em>U2AF1</em> was also mutated in two other MDS cases evaluated by WGS, highlighting it as a potential recurrently mutated gene. The authors undertook systematic sequencing <span style="color: #800000;"><strong>of <em>U2AF1</em> exons in 150 MDS cases, and found that 8.7% had mutations</strong></span> at Ser34.</p>
<h3>Characterization of Recurrent U2AF1 Mutations</h3>
<p>The authors undertook deep genomic resequencing, cDNA sequencing, and other experiments to characterize the nature of the <em>U2AF1</em> mutations, finding that:</p>
<ul>
<li>Mutant <strong>allele frequencies were at 40-50%</strong>, suggesting that the mutation was present in most or all tumor cells.</li>
<li>SNP arrays and WGS indicated <strong>no large deletions</strong> or uniparental disomy spanning the <em>U2AF1</em> locus.</li>
<li>Deep cDNA sequencing demonstrated that both <strong>wild-type and mutant alleles were expressed</strong>.</li>
<li>There were <strong>no apparent differences in the amount of <em>U2AF1</em> mRNA</strong> between wild-type and mutated samples.</li>
<li>In the 150 cases examined, <strong>no other positions in the gene were mutated</strong> (other than residue 34).</li>
</ul>
<p>Taken together, these observations suggest that <span style="color: #008000;"><strong><em>U2AF1</em> alteration was an early, initiating event</strong></span> and likely represents a gain-of-function mutation.</p>
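<p>The allele-frequency argument can be made concrete with a little arithmetic. This is only a sketch, assuming a heterozygous mutation at a diploid, copy-neutral locus (consistent with the absence of large deletions or uniparental disomy noted above); the function name and the read counts are hypothetical and do not come from the paper.</p>

```python
def mutated_cell_fraction(variant_reads, total_reads):
    """Estimate the fraction of sampled cells carrying a heterozygous mutation.

    Assumes a diploid, copy-neutral locus, so every mutated cell contributes
    one mutant allele and one wild-type allele.
    """
    if total_reads == 0:
        raise ValueError("no coverage at this site")
    variant_allele_frequency = variant_reads / total_reads
    # Each mutated cell carries the variant on one of its two copies,
    # so the cell fraction is twice the allele frequency (capped at 100%).
    return min(1.0, 2.0 * variant_allele_frequency)


# Hypothetical read counts, for illustration only
print(mutated_cell_fraction(45, 100))  # 0.9
```

<p>Under these assumptions, a mutant allele frequency of 40-50% corresponds to 80-100% of cells carrying the mutation, which is why it points to an early event.</p>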
<h3>U2AF1 and Splicing Factors</h3>
<p><em>U2AF1</em> encodes a small regulatory subunit of the U2AF splicing factor. It binds the 3′ AG splice acceptor dinucleotide of the pre-mRNA target intron, and forms a heterodimer with <em>U2AF2</em>, which binds the adjacent polypyrimidine tract. <em>U2AF1</em> is highly conserved, and loss of both copies is lethal in many species. Although it&#8217;s not known which domain of <em>U2AF1</em> binds the mRNA, <span style="color: #000080;"><strong>the Ser34 mutation occurs in a zinc-finger motif that may have RNA binding activity</strong></span>. Interestingly, in vitro reporter assays revealed that the Ser34 mutation causes an increase in splicing activity and more exon skipping relative to wild-type <em>U2AF1</em>. Further, an analysis of differentially expressed genes (by microarray) between samples with or without <em>U2AF1</em> mutations revealed that three of the top functional categories for down-regulated genes were splicing- or RNA-recognition-motif-related genes. This observation may reflect one or more compensatory mutations for the increased splicing activity of <em>U2AF1</em> mutants.</p>
<h3>Recurrent Mutation of SF3B1 in Chronic Lymphocytic Leukemia</h3>
<p>A second study in <em>Nature Genetics</em>, led by Victor Quesada and colleagues, employed <a title="Exome sequencing of CLL" href="http://www.nature.com/ng/journal/vaop/ncurrent/full/ng.1032.html" target="_blank">exome sequencing to identify recurrent mutations in chronic lymphocytic leukemia</a> (CLL), the most common form of adult leukemia in western nations. The authors sequenced the <strong>exomes of tumor samples and matched controls from 105 patients with CLL</strong>, 60 of which had mutated <em>IGHV</em> regions (a common alteration in CLL) and 45 of which did not. They reported <span style="color: #800000;"><strong>~45 somatic mutations per case</strong></span>, and observed more protein-altering mutations in IGHV-mutated samples (12.8 +/- 0.7) than non-IGHV-mutated (10.6 +/- 0.7). Comparing this study to their previous work (<a title="Recurrent Mutations in CLL by whole-genome sequencing" href="http://www.massgenomics.org/2011/06/recurrent-mutations-in-chronic-lymphocytic-leukemia.html" target="_blank">WGS of 4 CLL cases</a>), the authors identified several new recurrently-mutated genes, including:</p>
<ul>
<li><em>SF3B1</em>, a subunit of the spliceosomal U2 snRNP;</li>
<li><em>POT1</em>, a nuclear protein involved in telomere maintenance;</li>
<li><em>CHD2</em>, which regulates gene expression by modification of chromatin structure;</li>
<li><em>LRP1B</em>, which has recently been defined as a tumor suppressor in different malignancies.</li>
</ul>
<p>The authors focused on <em>SF3B1</em>, which was altered by somatic point mutations in ~10% of cases. Systematic screening of 279 cases by 3730 (capillary) sequencing revealed that <span style="color: #008000;"><strong>9.7% of CLL tumors harbored SF3B1 mutations</strong></span>, making this the most frequently mutated gene in CLL identified to date. The protein encoded by <em>SF3B1</em> is involved in the binding of the U2 snRNP to the branch point near 3&#8242; splice sites. It interacts with RNA sequences near the branch point and with at least two proteins there: the early 3′-splice-site recognition factor U2AF65 and the branch point–binding protein SF3B14.</p>
<div class="mceTemp">
<dl id="attachment_1234" class="wp-caption alignnone" style="width: 457px;">
<dt class="wp-caption-dt"><a href="http://www.massgenomics.org/wp-content/uploads/2011/12/CLL-SF3B1-mutations.png"><img class="size-full wp-image-1234 " style="border: 1px solid black;" title="CLL-SF3B1-mutations" src="http://www.massgenomics.org/wp-content/uploads/2011/12/CLL-SF3B1-mutations.png" alt="" width="447" height="180" /></a></dt>
<address class="wp-caption-dd">SF3B1 Mutations. Credit: Quesada et al., Nat. Genet., 2011</address>
</dl>
</div>
<p>RNA-seq of SF3B1-mutated cases revealed some patterns of aberrant splicing, most of which paired a <strong>known 5&#8242; donor site with a new, abnormal 3&#8242; acceptor site</strong>. An analysis of splicing target genes revealed truncated versions of <em>SLC23A2</em>, a vitamin C transporter, and <em>TCIRG1</em>, one of whose gene products is a T-cell immune regulator. Another altered gene was FOXP1, known to be dysregulated in diffuse large B-cell lymphoma; the altered transcript lacked two PEST sequences normally required for protein degradation.</p>
<h3>Role of Splicing in Tumor Development and Progression</h3>
<p>Most adult tumors harbor hundreds or thousands of somatic mutations, only a fraction of which are likely to drive development and growth. Recurrence of mutations in the same gene or pathway remains the best way to isolate these &#8220;driver&#8221; mutations from background passenger events. These two studies, and a handful of others published late this year, suggest an important role for aberrant splicing in the early development of hematologic malignancies such as MDS/sAML and CLL. What&#8217;s particularly important is that these appear to be <strong>gain-of-function mutations</strong>, which opens the door to new potential targeted therapies. It&#8217;s one step closer to personalized medicine for cancer patients, brought to you by next-generation sequencing.</p>
<p><span style="text-decoration: underline;"><strong>References</strong></span></p>
<p><span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&amp;rft.jtitle=Nature+genetics&amp;rft_id=info%3Apmid%2F22158538&amp;rfr_id=info%3Asid%2Fresearchblogging.org&amp;rft.atitle=Recurrent+mutations+in+the+U2AF1+splicing+factor+in+myelodysplastic+syndromes.&amp;rft.issn=1061-4036&amp;rft.date=2011&amp;rft.volume=&amp;rft.issue=&amp;rft.spage=&amp;rft.epage=&amp;rft.artnum=&amp;rft.au=Graubert+TA&amp;rft.au=Shen+D&amp;rft.au=Ding+L&amp;rft.au=Okeyo-Owuor+T&amp;rft.au=Lunn+CL&amp;rft.au=Shao+J&amp;rft.au=Krysiak+K&amp;rft.au=Harris+CC&amp;rft.au=Koboldt+DC&amp;rft.au=Larson+DE&amp;rft.au=McLellan+MD&amp;rft.au=Dooling+DJ&amp;rft.au=Abbott+RM&amp;rft.au=Fulton+RS&amp;rft.au=Schmidt+H&amp;rft.au=Kalicki-Veizer+J&amp;rft.au=O%27Laughlin+M&amp;rft.au=Grillot+M&amp;rft.au=Baty+J&amp;rft.au=Heath+S&amp;rft.au=Frater+JL&amp;rft.au=Nasim+T&amp;rft.au=Link+DC&amp;rft.au=Tomasson+MH&amp;rft.au=Westervelt+P&amp;rft.au=Dipersio+JF&amp;rft.au=Mardis+ER&amp;rft.au=Ley+TJ&amp;rft.au=Wilson+RK&amp;rft.au=Walter+MJ&amp;rfe_dat=bpr3.included=1;bpr3.tags=Biology%2CMedicine%2CGenetics%2C+Bioinformatics%2C+Cancer%2C+Hematology%2C+Stem+Cells">Graubert TA, Shen D, Ding L, Okeyo-Owuor T, Lunn CL, Shao J, Krysiak K, Harris CC, Koboldt DC, Larson DE, McLellan MD, Dooling DJ, Abbott RM, Fulton RS, Schmidt H, Kalicki-Veizer J, O&#8217;Laughlin M, Grillot M, Baty J, Heath S, Frater JL, Nasim T, Link DC, Tomasson MH, Westervelt P, Dipersio JF, Mardis ER, Ley TJ, Wilson RK, &amp; Walter MJ (2011). Recurrent mutations in the U2AF1 splicing factor in myelodysplastic syndromes. <span style="font-style: italic;">Nature genetics</span> PMID: <a rev="review" href="http://www.ncbi.nlm.nih.gov/pubmed/22158538">22158538</a></span></p>
<p><span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&amp;rft.jtitle=Nature+genetics&amp;rft_id=info%3Apmid%2F22158541&amp;rfr_id=info%3Asid%2Fresearchblogging.org&amp;rft.atitle=Exome+sequencing+identifies+recurrent+mutations+of+the+splicing+factor+SF3B1+gene+in+chronic+lymphocytic+leukemia.&amp;rft.issn=1061-4036&amp;rft.date=2011&amp;rft.volume=&amp;rft.issue=&amp;rft.spage=&amp;rft.epage=&amp;rft.artnum=&amp;rft.au=Quesada+V&amp;rft.au=Conde+L&amp;rft.au=Villamor+N&amp;rft.au=Ord%C3%B3%C3%B1ez+GR&amp;rft.au=Jares+P&amp;rft.au=Bassaganyas+L&amp;rft.au=Ramsay+AJ&amp;rft.au=Be%C3%A0+S&amp;rft.au=Pinyol+M&amp;rft.au=Mart%C3%ADnez-Trillos+A&amp;rft.au=L%C3%B3pez-Guerra+M&amp;rft.au=Colomer+D&amp;rft.au=Navarro+A&amp;rft.au=Baumann+T&amp;rft.au=Aymerich+M&amp;rft.au=Rozman+M&amp;rft.au=Delgado+J&amp;rft.au=Gin%C3%A9+E&amp;rft.au=Hern%C3%A1ndez+JM&amp;rft.au=Gonz%C3%A1lez-D%C3%ADaz+M&amp;rft.au=Puente+DA&amp;rft.au=Velasco+G&amp;rft.au=Freije+JM&amp;rft.au=Tub%C3%ADo+JM&amp;rft.au=Royo+R&amp;rft.au=Gelp%C3%AD+JL&amp;rft.au=Orozco+M&amp;rft.au=Pisano+DG&amp;rft.au=Zamora+J&amp;rft.au=V%C3%A1zquez+M&amp;rft.au=Valencia+A&amp;rft.au=Himmelbauer+H&amp;rft.au=Bay%C3%A9s+M&amp;rft.au=Heath+S&amp;rft.au=Gut+M&amp;rft.au=Gut+I&amp;rft.au=Estivill+X&amp;rft.au=L%C3%B3pez-Guillermo+A&amp;rft.au=Puente+XS&amp;rft.au=Campo+E&amp;rft.au=L%C3%B3pez-Ot%C3%ADn+C&amp;rfe_dat=bpr3.included=1;bpr3.tags=Biology%2CMedicine%2CGenetics%2C+Bioinformatics%2C+Cancer%2C+Hematology">Quesada V, Conde L, Villamor N, Ordóñez GR, Jares P, Bassaganyas L, Ramsay AJ, Beà S, Pinyol M, Martínez-Trillos A, López-Guerra M, Colomer D, Navarro A, Baumann T, Aymerich M, Rozman M, Delgado J, Giné E, Hernández JM, González-Díaz M, Puente DA, Velasco G, Freije JM, Tubío JM, Royo R, Gelpí JL, Orozco M, Pisano DG, Zamora J, Vázquez M, Valencia A, Himmelbauer H, Bayés M, Heath S, Gut M, Gut I, Estivill X, López-Guillermo A, Puente XS, Campo E, &amp; López-Otín C 
(2011). Exome sequencing identifies recurrent mutations of the splicing factor SF3B1 gene in chronic lymphocytic leukemia. <span style="font-style: italic;">Nature genetics</span> PMID: <a rev="review" href="http://www.ncbi.nlm.nih.gov/pubmed/22158541">22158541</a></span></p>
]]></content:encoded>
			<wfw:commentRss>http://www.massgenomics.org/2011/12/recurrent-splicing-mutations-in-mds-and-leukemia.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.massgenomics.org/2011/12/recurrent-splicing-mutations-in-mds-and-leukemia.html</feedburner:origLink></item>
		<item>
		<title>Somatic Mutation Detection in Whole Genome Sequencing Data</title>
		<link>http://feedproxy.google.com/~r/Massgenomics/~3/RKD7xcLnc3U/somatic-mutation-detection-in-whole-genome-sequencing-data.html</link>
		<comments>http://www.massgenomics.org/2011/12/somatic-mutation-detection-in-whole-genome-sequencing-data.html#comments</comments>
		<pubDate>Thu, 08 Dec 2011 23:13:02 +0000</pubDate>
		<dc:creator>Dan Koboldt</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[cancer]]></category>
		<category><![CDATA[genome]]></category>
		<category><![CDATA[mutations]]></category>
		<category><![CDATA[sequencing]]></category>
		<category><![CDATA[variant detection]]></category>

		<guid isPermaLink="false">http://www.massgenomics.org/?p=1206</guid>
		<description><![CDATA[A paper online at Bioinformatics describes our flagship algorithm for detecting somatic point mutations in whole-genome sequencing of tumor samples. This freely available software package, called SomaticSniper, performs a Bayesian comparison of the genotype likelihoods in tumor and normal samples at every [covered] position in the genome. Overview Documentation Install The study includes a detailed [...]]]></description>
			<content:encoded><![CDATA[<p>A <a title="SomaticSniper Mutation Detection" href="http://bioinformatics.oxfordjournals.org/content/early/2011/12/05/bioinformatics.btr665.abstract" target="_blank">paper online</a> at <em>Bioinformatics</em> describes our flagship algorithm for detecting somatic point mutations in whole-genome sequencing of tumor samples. This <a title="SomaticSniper Download" href="http://gmt.genome.wustl.edu/somatic-sniper/1.0.0/index.html" target="_blank">freely available software package</a>, called SomaticSniper, performs a <strong>Bayesian comparison of the genotype likelihoods</strong> in tumor and normal samples at every [covered] position in the genome.</p>
<table border="0" cellspacing="0" cellpadding="1" bgcolor="#333333">
<tbody>
<tr>
<td>
<table border="0" cellspacing="0" cellpadding="2" align="center" bgcolor="#FFFFFF">
<tbody>
<tr>
<td bgcolor="#333333"><a href="http://gmt.genome.wustl.edu/somatic-sniper/1.0.0/index.html"><img class="alignnone size-full wp-image-1210" title="Sniper-Logo" src="http://www.massgenomics.org/wp-content/uploads/2011/12/Sniper-Logo.png" alt="" width="406" height="60" /></a></td>
</tr>
<tr>
<td><a title="Somatic Sniper Overview" href="http://gmt.genome.wustl.edu/somatic-sniper/1.0.0/index.html" target="_blank">Overview</a><br />
<a title="SomaticSniper Documentation" href="http://gmt.genome.wustl.edu/somatic-sniper/1.0.0/documentation.html" target="_blank">Documentation</a><br />
<a title="Install SomaticSniper" href="http://gmt.genome.wustl.edu/somatic-sniper/1.0.0/install.html" target="_blank">Install</a></td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<p>The study includes a detailed investigation of common sources of false positive mutation calls (usually from sequencing- or alignment-related artifacts) and describes a filtering strategy to remove them from mutation callsets.</p>
<h3>Inception: First Cancer Genomes</h3>
<p>Like many bioinformatics algorithms, SomaticSniper reached publication after a long and colorful history. It began in 2008 when we <a title="First Cancer Genome Sequenced" href="http://www.massgenomics.org/2008/11/aml-a-new-era-of-cancer-genomics.html" target="_blank">sequenced the first cancer genome</a>, AML1. At the time, we were generating fragment-end, 32 bp reads on early Illumina GA instruments. It took over a hundred lanes to achieve ~30-fold coverage on each sample (tumor and normal). We were in <strong>dire need of a short read aligner</strong> that could handle this amount of data, and <a title="MAQ Aligner" href="http://maq.sourceforge.net" target="_blank">Maq</a> answered the call (see my <a title="Top Ten Maq" href="http://www.massgenomics.org/2008/08/ten-favorite-things-about-maq.html" target="_blank">Maq Top Ten</a>).</p>
<p>In addition to serving as one of the most widely used short read aligners, Maq included a probabilistic genotype calling model for detecting germline SNPs in a single genome. Dave Larson (the lead author) and others from our group developed an algorithm to compare genotype likelihoods between tumor and normal, to compute the probability that a site is not somatic given the sequence data. Putative somatic mutations receive a <span style="color: #800000;"><strong>somatic score</strong></span>, a Phred-scaled value representing the quality of the call. Here&#8217;s something interesting: during the data generation phase for AML1, as we added more sequence, the number of candidate mutations went down. This is because only a tiny fraction of variants in a tumor genome are somatic; the vast majority are germline variants also present in the normal. As better coverage was achieved, more and more variants turned out to be germline. By the end, it turned out that there were <strong>just ten somatic coding mutations</strong> in the tumor genome of AML1, a cytogenetically normal leukemia. A lot of people were flabbergasted. Ten little changes, and a woman got leukemia.</p>
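<p>For readers unfamiliar with Phred scaling, here is a minimal sketch of how a probability becomes a score. It assumes the standard Phred definition (minus ten times the base-10 logarithm of the error probability); the function name and the cap of 255 are my own illustrative choices, not taken from the SomaticSniper source.</p>

```python
import math


def somatic_score(p_not_somatic, max_score=255):
    """Phred-scale the probability that a site is NOT somatic.

    max_score is a hypothetical cap so that a probability of zero
    still maps to a finite score.
    """
    if p_not_somatic > 1 or 0 > p_not_somatic:
        raise ValueError("probability must be between 0 and 1")
    if p_not_somatic == 0:
        return max_score
    return min(max_score, round(-10 * math.log10(p_not_somatic)))


print(somatic_score(0.01))    # 20
print(somatic_score(0.0001))  # 40
```

<p>In other words, each additional 10 points of somatic score means the site is ten times less likely to be non-somatic.</p>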
<h3>More Genomes, Better Algorithm</h3>
<p>This algorithm became the core of our cancer whole-genome sequencing analysis pipeline, evolving and improving over the course of the <a title="Second cancer genome in New England Journal" href="http://www.massgenomics.org/2009/08/second-cancer-genome-in-new-england-journal.html" target="_blank">second cancer genome</a> (AML2) in the <em>New England Journal</em>, a breast cancer genome (<a title="Breast Cancer Genome" href="http://www.massgenomics.org/2009/10/first-breast-cancer-genome-in-nature.html" target="_blank">BRC1</a>), and others. It found, among others, mutations in <em>IDH1</em> and <em>DNMT3A</em> that we and others showed to be recurrent across many tumors. The algorithm&#8217;s name changed a few times, settling at last on SomaticSniper. It&#8217;s now a <strong>lean and hungry animal</strong>, capable of processing high-coverage whole-genome sequence pairs in a matter of hours.</p>
<h3>Filtering Out the Noise</h3>
<p>No matter how good the mutation caller, there are going to be some false positives. This is because you&#8217;re looking for a one-in-a-million event, a true somatic mutation. Raw SomaticSniper calls therefore undergo a series of Maq-inspired filters. Sites are retained if they meet these criteria:</p>
<ul>
<li>Covered by at least 3 reads</li>
<li>Consensus quality of at least 20</li>
<li>Called a SNP in the tumor sample with SNP quality of at least 20</li>
<li>Maximum mapping quality of at least 40</li>
<li>No high-quality predicted indel within 10 bp</li>
<li>No more than 2 other SNVs called within 10 bp</li>
</ul>
<p>Sites passing these criteria are subjected to two additional filters: a <strong>screen against germline variants from dbSNP</strong> (remove if matches position and allele of known non-cancer dbSNP) and an <strong>LOH filter</strong> (remove if normal is heterozygous and tumor homozygous for the same variant allele). Sites removed by the former are probably inherited variants under-sampled in the matched normal, while sites removed by the latter are likely due to large-scale structural changes (e.g. deletions) causing the loss of one allele. Finally, the filter-passed mutations are classified as high-confidence (HC) if the somatic score is at least 40 and the mapping quality is at least 40 (for BWA) or 70 (for Maq).</p>
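<p>The filtering cascade described above can be sketched in a few lines of Python. This is an illustration of the logic only: the <code>Site</code> record and its field names are hypothetical (they are not SomaticSniper&#8217;s actual output format), and I am assuming the high-confidence threshold applies to a single per-site mapping quality value.</p>

```python
from dataclasses import dataclass

# Mapping-quality thresholds for the high-confidence tier, per aligner.
HC_MAPQ_BY_ALIGNER = {"bwa": 40, "maq": 70}


@dataclass
class Site:
    # Hypothetical field names, for illustration only.
    depth: int
    consensus_quality: int
    tumor_snp_quality: int
    max_mapping_quality: int
    indel_within_10bp: bool
    other_snvs_within_10bp: int
    known_germline_in_dbsnp: bool
    ref_base: str
    normal_genotype: str  # two alleles, e.g. "AG"
    tumor_genotype: str
    somatic_score: int
    mapping_quality: int


def passes_basic_filters(s):
    """The six Maq-inspired criteria from the list above."""
    return (s.depth >= 3
            and s.consensus_quality >= 20
            and s.tumor_snp_quality >= 20
            and s.max_mapping_quality >= 40
            and not s.indel_within_10bp
            and 2 >= s.other_snvs_within_10bp)


def looks_like_loh(s):
    """Normal heterozygous, tumor homozygous for the same variant allele."""
    normal_het = len(set(s.normal_genotype)) == 2
    tumor_hom = len(set(s.tumor_genotype)) == 1
    allele = s.tumor_genotype[0]
    return (normal_het and tumor_hom
            and allele != s.ref_base and allele in s.normal_genotype)


def classify(s, aligner="bwa"):
    """Return 'filtered', 'pass', or 'high_confidence' for one site."""
    if not passes_basic_filters(s) or s.known_germline_in_dbsnp or looks_like_loh(s):
        return "filtered"
    if s.somatic_score >= 40 and s.mapping_quality >= HC_MAPQ_BY_ALIGNER[aligner]:
        return "high_confidence"
    return "pass"


# A made-up site: normal A/A, tumor A/G, well covered and well mapped.
site = Site(30, 40, 35, 60, False, 0, False, "A", "AA", "AG", 45, 60)
print(classify(site, "bwa"))  # high_confidence
print(classify(site, "maq"))  # pass
```

<p>The same site lands in different tiers depending on the aligner because Maq and BWA report mapping quality on different scales.</p>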
<h3>Frequent Sources of False Positives</h3>
<p>Even sites that pass the filters above are vulnerable to certain sequencing and alignment artifacts that produce false positive calls. A detailed study revealed (as many in the field know already) <span style="color: #000080;"><strong>a few common sources of false positives</strong></span>: strand bias, homopolymer sequences, paralogous reads (deriving from a paralogous region of the genome, but mapped to the wrong region, usually with three or more substitutions), and the read position of the predicted variant. The last type of artifact is something new; it turned out that variants only seen near the &#8220;effective&#8221; 3&#8242; end of reads (the start of soft-trimmed bases or the actual end of the read if untrimmed) were more likely to be false positives. This may be a combination of sequencing error, which is higher at the 3&#8242; end of reads, and alignment bias favoring mismatches over gaps near the ends of reads. In any case, false positives deriving from these common causes tend to have certain properties enabling them to be identified and removed while maintaining sensitivity for true mutations.</p>
<p>SomaticSniper adds to the growing arsenal of tools developed by our group to address the significant challenges presented by next-generation sequencing data analysis.</p>
<p><span style="text-decoration: underline;"><strong>References</strong></span></p>
<p><span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&amp;rft.jtitle=Bioinformatics&amp;rft_id=info%3A%2Fdoi%3A+10.1093%2Fbioinformatics%2Fbtr665&amp;rfr_id=info%3Asid%2Fresearchblogging.org&amp;rft.atitle=SomaticSniper%3A+Identification+of+Somatic+Point+Mutations+in+Whole+Genome+Sequencing+Data&amp;rft.issn=1460-2059&amp;rft.date=2011&amp;rft.volume=Online&amp;rft.issue=&amp;rft.spage=1&amp;rft.epage=&amp;rft.artnum=http%3A%2F%2Fbioinformatics.oxfordjournals.org%2Fcontent%2Fearly%2F2011%2F12%2F05%2Fbioinformatics.btr665.abstract&amp;rft.au=Larson%2C+DE.&amp;rft.au=Harris%2C+CC.&amp;rft.au=Chen%2C+K.&amp;rft.au=Koboldt%2C+DC.&amp;rft.au=Abbott%2C+TE.&amp;rft.au=Dooling%2C+DJ.&amp;rft.au=Ley%2C+TJ.&amp;rft.au=Mardis%2C+ER.&amp;rft.au=Wilson%2C+RK.&amp;rft.au=Ding%2C+L.&amp;rfe_dat=bpr3.included=1;bpr3.tags=Biology%2CComputer+Science+%2F+Engineering%2CGenetics%2C+Bioinformatics%2C+Algorithms">Larson, DE., Harris, CC., Chen, K., Koboldt, DC., Abbott, TE., Dooling, DJ., Ley, TJ., Mardis, ER., Wilson, RK., &amp; Ding, L. (2011). SomaticSniper: Identification of Somatic Point Mutations in Whole Genome Sequencing Data <span style="font-style: italic;">Bioinformatics, Online</span> DOI: <a rev="review" href="http://dx.doi.org/10.1093/bioinformatics/btr665">10.1093/bioinformatics/btr665</a></span></p>
]]></content:encoded>
			<wfw:commentRss>http://www.massgenomics.org/2011/12/somatic-mutation-detection-in-whole-genome-sequencing-data.html/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		<feedburner:origLink>http://www.massgenomics.org/2011/12/somatic-mutation-detection-in-whole-genome-sequencing-data.html</feedburner:origLink></item>
		<item>
		<title>Prostate cancer exomes, and sequencing matched normals</title>
		<link>http://feedproxy.google.com/~r/Massgenomics/~3/kwXHNMUm8ZQ/prostate-cancer-exomes-and-sequencing-matched-normals.html</link>
		<comments>http://www.massgenomics.org/2011/11/prostate-cancer-exomes-and-sequencing-matched-normals.html#comments</comments>
		<pubDate>Tue, 29 Nov 2011 23:03:33 +0000</pubDate>
		<dc:creator>Dan Koboldt</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[cancer]]></category>
		<category><![CDATA[genome]]></category>
		<category><![CDATA[mutations]]></category>
		<category><![CDATA[sequencing]]></category>
		<category><![CDATA[variant detection]]></category>

		<guid isPermaLink="false">http://www.massgenomics.org/?p=1197</guid>
		<description><![CDATA[A new study in PNAS from Jay Shendure&#8217;s group at the University of Washington describes exome sequencing of 23 prostate cancers. These tumors were derived from aggressive primary tumors or lethal metastases, and propagated in immunocompromised mice as xenografts. For most of the tumors, matched normal DNA was unavailable, so the authors developed a filtering [...]]]></description>
			<content:encoded><![CDATA[<p>A new study in <em>PNAS</em> from Jay Shendure&#8217;s group at the University of Washington describes <a title="Exome sequencing of prostate cancer" href="http://www.pnas.org/content/108/41/17087.full" target="_blank">exome sequencing of 23 prostate cancers</a>. These tumors were derived from aggressive primary tumors or lethal metastases, and propagated in immunocompromised mice as xenografts. For most of the tumors, matched normal DNA was unavailable, so the authors developed a filtering strategy in which the <strong>growing catalogs of human sequence variation are employed to identify and remove germline polymorphisms</strong> from the lists of tumor genetic variants. Specifically, the authors used pilot project data from the 1,000 Genomes Project, and internally-available variants from ~2,000 additional exomes they&#8217;d sequenced. For the majority of tumors, <strong>this reduced ~13,500 coding SNVs down to ~350 &#8220;nov-SNVs&#8221; per tumor</strong> (a reduction of 97.4%). The authors readily admit that these nov-SNVs comprise a mixture of:</p>
<ol>
<li>Somatic mutations that were present in the original tumor.</li>
<li>Somatic mutations that occurred during tumor propagation and evolution in the mouse model.</li>
<li>Germline variants present in the patient&#8217;s constitutional genome that are absent from public databases, presumably due to rarity (e.g. private SNPs).</li>
<li>False-positive variant calls.</li>
</ol>
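<p>At its core, the catalog-based filter described above is a set difference. Here is a minimal sketch, assuming variants are represented as (chromosome, position, reference, alternate) tuples; the function name and the toy variants are hypothetical, and a real pipeline would also need to normalize coordinates and alleles between sources.</p>

```python
def filter_known_germline(tumor_variants, *catalogs):
    """Return tumor variants absent from every germline catalog.

    Each variant is a (chrom, pos, ref, alt) tuple; each catalog is any
    iterable of such tuples (e.g. a public catalog plus in-house exomes).
    """
    known = set().union(*catalogs)
    return [v for v in tumor_variants if v not in known]


# Toy data, for illustration only
tumor = [("chr1", 100, "A", "G"), ("chr2", 200, "C", "T"), ("chr3", 300, "G", "A")]
public_catalog = {("chr1", 100, "A", "G")}
in_house_exomes = {("chr3", 300, "G", "A")}

nov_snvs = filter_known_germline(tumor, public_catalog, in_house_exomes)
print(nov_snvs)  # [('chr2', 200, 'C', 'T')]
```

<p>Note what the filter cannot do: anything that survives is merely &#8220;not previously seen,&#8221; which is why the output is a mixture of the four categories listed above rather than a clean list of somatic mutations.</p>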
<h3>Recurrently Altered Gene Filtering</h3>
<p>Given a set of mutations from multiple tumors of the same type, the logical next step was to look for genes recurrently altered in the group, since <span style="color: #000080;"><strong>recurrence offers perhaps the best evidence of genes harboring &#8220;driver&#8221; mutations</strong></span>, which confer advantages for tumor growth and progression, as opposed to &#8220;passenger&#8221; mutations which do not. The problems for this study were two-fold: First, 16 unique tumors (from unrelated individuals) is a small cohort size with correspondingly small power to identify recurrent alterations. Nothing to be done about that. Second, even looking at just 16 tumors, there were 135 genes harboring non-synonymous nov-SNVs in two or more exomes. A substantial fraction of these are undoubtedly due to <strong>rare germline variants missed by the filter</strong>, rather than recurrently mutated genes.</p>
<p>To address this, the authors excluded from consideration the 1% of all genes (not just ones mutated in this study) with the highest rate of rare germline variants in control exomes. In other words, <span style="color: #008000;"><strong>they removed genes with the highest rate of germline polymorphism</strong></span>, which I note likely includes (1) genes with high genetic diversity, and (2) genes whose sequence characteristics make them more likely to give rise to false-positive variant calls. The danger of this strategy is that, in principle, genes with high genetic diversity are more prone to mutations, and it&#8217;s quite possible that some of these are driver genes for carcinogenesis. Nevertheless, this strategy reduced the list to 104 genes altered in two or more exomes. That&#8217;s still too many to tell a story about, so another step was taken.</p>
<p>Using a <strong>control set of 1,865 exomes</strong>, the authors performed an <strong>iterative sampling</strong> (I believe this is a bootstrap) to estimate the probability that a given gene would harbor recurrent nov-SNVs that were due to germline variation. Any genes with a germline recurrence probability of 0.001 or higher were excluded from the list, which dropped it sharply down to 20 genes with nov-SNVs in two or more prostate tumors (10 of these were found in three or more).</p>
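<p>I don&#8217;t have the authors&#8217; code, but the sampling idea can be sketched roughly as follows. Everything here is an assumption on my part: the function and parameter names are hypothetical, the draw is without replacement (so strictly a subsampling rather than a bootstrap), and the number of iterations is arbitrary. The actual procedure in the paper may differ.</p>

```python
import random


def germline_recurrence_probability(carriers, n_controls, n_tumors=16,
                                    min_hits=2, iterations=10000, seed=0):
    """Estimate how often a gene looks 'recurrent' from germline variation alone.

    carriers: set of control-exome indices with a rare variant in the gene.
    Each iteration draws n_tumors control exomes and checks whether at least
    min_hits of them carry a variant in this gene.
    """
    rng = random.Random(seed)
    recurrent = 0
    for _ in range(iterations):
        drawn = rng.sample(range(n_controls), n_tumors)
        hits = sum(1 for exome in drawn if exome in carriers)
        if hits >= min_hits:
            recurrent += 1
    return recurrent / iterations


# Hypothetical gene with rare variants in 100 of 1,865 control exomes
carriers = set(range(100))
p = germline_recurrence_probability(carriers, n_controls=1865)
print(p >= 0.001)  # True means the gene would be excluded
```

<p>A gene that is polymorphic in even a few percent of control exomes will clear the 0.001 threshold easily, which helps explain why the list shrank so sharply at this step.</p>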
<p>After all of these steps were taken, <span style="color: #800000;"><strong>the top recurrent gene was <em>TP53</em></strong></span>, which was altered in 5 of 16 tumors (31.25%). No other gene had as many recurrent hits in the study. This is a vote of confidence for the approach, because <em>TP53</em> is one of the most frequently perturbed genes in many solid tumor types, including breast and ovarian cancers. Another believable recurrent gene was <em>GPC6</em>, which encodes a cell surface proteoglycan believed to act as a receptor for growth factors and other signaling molecules. Other recurrent genes highlighted in this study (<em>DLK2 </em>and <em>SDF4</em>) are less convincing. The simple fact is that we don&#8217;t know for certain which mutations are truly somatic in the primary tumor, so it&#8217;s difficult to draw strong conclusions.</p>
<h3>Direct Comparison with Matched Normals</h3>
<p>A few of the tumors did have matched normal tissue available, and the authors examined these in detail to assess the accuracy of their germline filtering approach. <strong>For three tumors, the authors had (1) mouse xenograft tumor tissue, (2) tumor tissue taken from the patient prior to metastasis, and (3) matched normal tissue.</strong> They applied exome sequencing to these to determine the set of true somatic mutations (valid mutations) in the original tumor exomes. Valid mutations were compared with the xenograft&#8217;s predicted nov-SNVs to determine the number of valid mutations detected (valid detected), the number missed (valid missed), the fraction detected (sensitivity), and the proportion of nov-SNVs that were actually false positives (either germline variants or mis-calls).</p>
<table border="1" cellspacing="0" cellpadding="2" align="center" bordercolor="#333333">
<tbody>
<tr>
<td style="text-align: center;"><strong>Tumor ID</strong></td>
<td style="text-align: center;"><strong>nov-SNVs</strong></td>
<td><strong>Valid Mutations</strong></td>
<td style="text-align: center;"><strong>Valid Detected</strong></td>
<td><strong>Valid Missed</strong></td>
<td><strong>Sensitivity</strong></td>
<td><strong>False Positives</strong></td>
</tr>
<tr>
<td style="text-align: left;">LuCap92</td>
<td style="text-align: right;">193</td>
<td style="text-align: right;">56</td>
<td style="text-align: right;">51</td>
<td style="text-align: right;">5</td>
<td style="text-align: right;">91.07%</td>
<td style="text-align: right;">73.58%</td>
</tr>
<tr>
<td>LuCap145.2</td>
<td style="text-align: right;">281</td>
<td style="text-align: right;">122</td>
<td style="text-align: right;">106</td>
<td style="text-align: right;">16</td>
<td style="text-align: right;">86.89%</td>
<td style="text-align: right;">62.28%</td>
</tr>
<tr>
<td>LuCap147*</td>
<td style="text-align: right;">2,122</td>
<td style="text-align: right;">2,045</td>
<td style="text-align: right;">1,823</td>
<td style="text-align: right;">222</td>
<td style="text-align: right;">89.14%</td>
<td style="text-align: right;">14.09%</td>
</tr>
</tbody>
</table>
<p>Note that only LuCap 92 was the same tumor tissue that was used to make the xenograft; the other two (LuCap 145.2 and LuCap 147) were neighboring metastases, and presumably closely related to the xenografted tumor. Exome sequencing and germline filtering of the xenograft enabled detection of ~89% of valid somatic mutations across all three cases. This is worrisome, because it means that <strong>11% of valid somatic mutations were removed by the germline filtering strategy</strong>. More on that later. Perhaps even more troubling is the <span style="color: #800000;"><strong>inferred false positive rate</strong></span> (fraction of nov-SNVs that are not valid somatic mutations in the tumor), <span style="color: #800000;"><strong>which was ~68%</strong></span> for LuCap 92 and LuCap 145.2.</p>
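<p>The sensitivity and false-positive columns follow directly from the counts in the table above. As a quick check, here is the arithmetic in Python; the function name is mine, and the only inputs are the nov-SNV, valid, and valid-detected counts from the table.</p>

```python
# (nov-SNVs, valid mutations, valid detected) per tumor, from the table
tumors = {
    "LuCap92": (193, 56, 51),
    "LuCap145.2": (281, 122, 106),
    "LuCap147": (2122, 2045, 1823),
}


def metrics(nov_snvs, valid, detected):
    """Return (sensitivity, fraction of nov-SNVs that are not valid mutations)."""
    sensitivity = detected / valid
    false_positive_fraction = (nov_snvs - detected) / nov_snvs
    return sensitivity, false_positive_fraction


for name, counts in tumors.items():
    sens, fp = metrics(*counts)
    print(f"{name}: sensitivity {sens:.2%}, false positives {fp:.2%}")
# LuCap92: sensitivity 91.07%, false positives 73.58%
# LuCap145.2: sensitivity 86.89%, false positives 62.28%
# LuCap147: sensitivity 89.14%, false positives 14.09%

# Pooled sensitivity across all three tumors (about 89%)
total_valid = sum(valid for _, valid, _ in tumors.values())
total_detected = sum(detected for _, _, detected in tumors.values())
print(f"pooled sensitivity {total_detected / total_valid:.2%}")
```

<p>Averaging the false-positive fractions of the two non-hypermutated tumors gives the ~68% figure quoted above.</p>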
<p>LuCap 147 is notable in that it was one of three <strong>&#8220;hypermutated&#8221; prostate cancer tumors, with 10-fold the number of nov-SNVs</strong>. It also had a lower false-positive rate because there were so many valid somatic mutations to detect. There was no distinctive feature to explain the high number of mutations in hypermutated tumors, though the pattern suggests an acquired defect in DNA repair machinery. As only 15% of tumors had this mutation phenotype, the low false positive rate is an outlier. For most tumors, two thirds of the nov-SNVs obtained by the filtering approach are not valid somatic mutations.</p>
<h3>Reasons to Always Sequence the Matched Normal</h3>
<p>I have heard it said that sometime in the near future, our catalogs of human genetic variation will be complete enough that we won&#8217;t need to sequence matched normal tissue when studying cancer samples. The authors of this study claim that their results give credence to that notion. I respectfully disagree. True, the germline filtering strategy provided a 150-fold enrichment for valid somatic mutations. However, <strong>more than half of the final set of nov-SNVs were false positives</strong> (not somatic), <strong>and 11% of valid somatic mutations were inadvertently removed</strong>. I give you, then, my reasons why I believe we should always sequence the matched normal:</p>
<ol>
<li><strong>Public databases are not as good as you think</strong>. In this study, curated catalogs of sequence variants from known sources (the authors themselves, and the 1000 Genomes Project) overlapped with 11% of valid somatic mutations, causing their removal. A filter based on the latest dbSNP is even more dangerous because, as some of us have recently discovered, <span style="color: #800000;"><strong>dbSNP contains a lot of somatic (not inherited) mutations</strong></span>. This is because certain cancer projects have submitted their somatic mutation callsets to dbSNP, and these have been accepted. Also, given the low barrier to entry, one should be aware that a lot of dbSNP entries are experimental false positives. Both kinds of entry can overlap with mutations in a tumor genome and cause them to be dismissed as germline variants.</li>
<li><strong>Non-SNV alterations are not amenable to filtering</strong>. Tumor genomes acquire insertions, deletions, structural variants, and copy number alterations, some of which may activate oncogenes or disrupt tumor suppressors. Let&#8217;s be honest: <span style="color: #008000;"><strong>the databases of non-SNV variants in germline form are woefully incomplete</strong></span>. Unlike SNVs, the coordinates and alleles of larger variants are ambiguous, which makes comparisons to existing variant catalogs very difficult. There are also other types of genetic changes in a tumor, such as loss of heterozygosity (LOH), that will be missed when you don&#8217;t know the normal genotype.</li>
<li><strong>True somatic mutations are exceptionally rare</strong> compared to germline variants. Inherited sequence variants occur at a rate of one per 500-1000 base pairs. In contrast, for most tumors, somatic mutations occur at a rate of one per <em>million</em> base pairs. Let&#8217;s say you have 20,000 coding variants in a tumor and 98% of those are in dbSNP. That leaves 400 private SNPs that filtering won&#8217;t remove, whereas most solid tumors harbor fewer than 100 somatic coding mutations. In this realistic scenario, at best one out of every five post-filtered variants is a somatic mutation.</li>
<li><strong>Sequencing is cheap, but mistakes are not.</strong> Not long ago, you could argue that sequencing matched normals was too costly to be done systematically, even if they were available. That&#8217;s no longer the case. A single HiSeq lane gives you enough sequence for two exomes. Why not eliminate the largest source of false-positive mutations &#8211; the constitutional genome &#8211; by sequencing it as well? It will give you better predictions, and if you go on to validate candidate mutations (as you certainly should), it will probably end up saving you money. Trust me, it&#8217;s far better to sequence tumor-normal pairs together, at the same time, same exome platform, ideally same instrument run, to minimize batch effects between them.</li>
</ol>
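<p>The arithmetic behind the third reason can be sketched in a few lines. This is only an illustration of the hypothetical scenario described in the list, with an invented function name. It assumes the 20,000 coding variants are all germline, that the 100 somatic mutations come on top of them, and that the filter removes no somatic mutations, which the 11% loss reported above shows is optimistic.</p>

```python
def somatic_fraction(coding_variants, fraction_in_dbsnp, somatic_mutations):
    """Estimate how many post-filter variants are truly somatic.

    Simplifying assumptions (for illustration only): coding_variants counts
    germline variants, and the database filter removes no somatic mutations.
    Returns (private germline variants left after filtering, somatic share).
    """
    private_germline = round(coding_variants * (1 - fraction_in_dbsnp))
    remaining = private_germline + somatic_mutations
    return private_germline, somatic_mutations / remaining


if __name__ == "__main__":
    private, share = somatic_fraction(20000, 0.98, 100)
    print(f"{private} private germline variants remain after filtering")
    print(f"{share:.0%} of post-filter variants are somatic")
```

<p>With the numbers from the list, 400 private germline variants remain and the somatic share is 20%, or one in five. Because most solid tumors harbor fewer than 100 somatic coding mutations, that one-in-five figure is an upper bound for this scenario.</p>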
<h3>Availability of Matched Normals</h3>
<p>Of course, sequencing a matched normal sample requires that such material is available. I recognize that this is not always the case. Some of the better-studied cancer cell lines, for example, were made from the tumors of long-dead cancer patients. For less common cancer types, <strong>many of the available samples will be frozen or FFPE samples, and getting a matched normal won&#8217;t be possible</strong>. However, if matched normal tissue is available, I&#8217;d argue that it should be sequenced under the same protocols as the tumor sample. And when you find those germline variants, don&#8217;t forget to submit them to dbSNP.</p>
<p><span style="text-decoration: underline;"><strong>References</strong></span></p>
<p><span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&amp;rft.jtitle=Proceedings+of+the+National+Academy+of+Sciences+of+the+United+States+of+America&amp;rft_id=info%3Apmid%2F21949389&amp;rfr_id=info%3Asid%2Fresearchblogging.org&amp;rft.atitle=Exome+sequencing+identifies+a+spectrum+of+mutation+frequencies+in+advanced+and+lethal+prostate+cancers.&amp;rft.issn=0027-8424&amp;rft.date=2011&amp;rft.volume=108&amp;rft.issue=41&amp;rft.spage=17087&amp;rft.epage=92&amp;rft.artnum=&amp;rft.au=Kumar+A&amp;rft.au=White+TA&amp;rft.au=MacKenzie+AP&amp;rft.au=Clegg+N&amp;rft.au=Lee+C&amp;rft.au=Dumpit+RF&amp;rft.au=Coleman+I&amp;rft.au=Ng+SB&amp;rft.au=Salipante+SJ&amp;rft.au=Rieder+MJ&amp;rft.au=Nickerson+DA&amp;rft.au=Corey+E&amp;rft.au=Lange+PH&amp;rft.au=Morrissey+C&amp;rft.au=Vessella+RL&amp;rft.au=Nelson+PS&amp;rft.au=Shendure+J&amp;rfe_dat=bpr3.included=1;bpr3.tags=Biology%2CMedicine%2CGenetics%2C+Bioinformatics%2C+Cancer%2C+Endocrinology">Kumar A, White TA, MacKenzie AP, Clegg N, Lee C, Dumpit RF, Coleman I, Ng SB, Salipante SJ, Rieder MJ, Nickerson DA, Corey E, Lange PH, Morrissey C, Vessella RL, Nelson PS, &amp; Shendure J (2011). Exome sequencing identifies a spectrum of mutation frequencies in advanced and lethal prostate cancers. <span style="font-style: italic;">Proceedings of the National Academy of Sciences of the United States of America, 108</span> (41), 17087-92 PMID: <a rev="review" href="http://www.ncbi.nlm.nih.gov/pubmed/21949389">21949389</a></span></p>
]]></content:encoded>
			<wfw:commentRss>http://www.massgenomics.org/2011/11/prostate-cancer-exomes-and-sequencing-matched-normals.html/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		<feedburner:origLink>http://www.massgenomics.org/2011/11/prostate-cancer-exomes-and-sequencing-matched-normals.html</feedburner:origLink></item>
	</channel>
</rss>

