<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/atom10full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><feed xmlns="http://www.w3.org/2005/Atom" xmlns:openSearch="http://a9.com/-/spec/opensearchrss/1.0/" xmlns:georss="http://www.georss.org/georss"><id>tag:blogger.com,1999:blog-36768584</id><updated>2009-07-09T08:35:54.183-04:00</updated><title type="text">Omics! Omics!</title><subtitle type="html">A computational biologist's personal views on new technologies &amp; publications on genomics &amp; proteomics and their impact on drug discovery</subtitle><link rel="http://schemas.google.com/g/2005#feed" type="application/atom+xml" href="http://omicsomics.blogspot.com/feeds/posts/default" /><link rel="alternate" type="text/html" href="http://omicsomics.blogspot.com/" /><link rel="next" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default?start-index=26&amp;max-results=25" /><author><name>Keith Robison</name><uri>http://www.blogger.com/profile/04765318239070312590</uri><email>noreply@blogger.com</email></author><generator version="7.00" uri="http://www.blogger.com">Blogger</generator><openSearch:totalResults>276</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>25</openSearch:itemsPerPage><link rel="self" href="http://feeds.feedburner.com/OmicsOmics" type="application/atom+xml" /><entry><id>tag:blogger.com,1999:blog-36768584.post-4817535674843798272</id><published>2009-07-01T23:55:00.004-04:00</published><updated>2009-07-02T00:21:40.709-04:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="expression profiling" /><title type="text">Gene Expression from A-Z</title><content type="html">I was playing with the data from &lt;a 
href="http://www.ncbi.nlm.nih.gov/pubmed/18516045?ordinalpos=13&amp;itool=EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.Pubmed_DefaultReportPanel.Pubmed_RVDocSum"&gt;an early RNA-Seq paper&lt;/a&gt; just to have a general idea of what such data looks like and to check out some favorite genes.  It was also an exercise in learning the latest Spotfire -- I had Spotfire back at MLNM but it's been over 2 years and a completely new interface was rolled out.&lt;br /&gt;&lt;br /&gt;An easy way to find favorite genes and compare them across the three tissues (brain, liver, muscle) is to set up a trellis plot with expression as the y-axis and the gene name as the x-axis, and then use the filtering tools to find my genes.  Of course, it's hard to avoid looking at the overall plot -- and picking out some fortuitous patterns.&lt;br /&gt;&lt;br /&gt;What immediately jumps out are the three semi-blank vertical zones (you can spot a fourth, very thin one convincingly in the original; it's only vaguely there in the PNG shown here).  What are these?  Take a guess before reading below.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_N2AOZejgjyA/SkwxMVt3YDI/AAAAAAAAADc/HQdkGC8g0ZQ/s1600-h/AlphabeticalExpression.png"&gt;&lt;img style="float:left; margin:0 10px 10px 0;cursor:pointer; cursor:hand;width: 400px; height: 292px;" src="http://3.bp.blogspot.com/_N2AOZejgjyA/SkwxMVt3YDI/AAAAAAAAADc/HQdkGC8g0ZQ/s200/AlphabeticalExpression.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5353708145094385714" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;The big one is all genes starting with "Olf" -- the olfactory receptors.  This is a large subfamily of type I G-protein coupled receptors (GPCRs) whose discovery &lt;a href="http://nobelprize.org/nobel_prizes/medicine/laureates/2004/press.html"&gt;netted a Nobel Prize&lt;/a&gt;.  
In general, these are expressed solely in the olfactory epithelium, but a little more on that later.&lt;br /&gt;&lt;br /&gt;The thin line to the left of it has genes starting with Mirn -- microRNAs, which this particular sequencing effort wasn't very tuned for.  The next one to the left has genes starting with Ig -- immunoglobulin genes.  Since B-cells are not one of the samples, low expression there is no shocker.  The very thin line to the right of the Olf cluster, which you might not see, has genes that all start with Vr1 -- the vomeronasal receptors, another set of specialized GPCRs involved in pheromone recognition.&lt;br /&gt;&lt;br /&gt;Of course, especially having an interactive display, you can find other patterns.  A block of genes starting with Mrp has very similar, high expression in all three tissues -- the mitochondrial ribosomal proteins. A clump enriched for names starting with Psm shows a similar pattern -- the proteasome subunits.&lt;br /&gt;&lt;br /&gt;I don't recommend spending a lot of time doing this analysis -- the visual cortex is too good at picking up patterns &amp; clearly gene names were not picked to make this a great way to find biology.  But it is mildly fascinating.&lt;br /&gt;&lt;br /&gt;One further note.  While the Olf cluster has a lot of low expression, it isn't devoid of expression (below; ignore the sides as I'm still learning quite how to set the boundaries precisely in Spotfire).  Furthermore, some of the same genes are seen in all three samples.  Now, this could be erroneous due to improper fragment mapping or some other transcriptionally active gene that overlaps these, but I think we should also be open to the idea that some of the olfactory receptors may have been co-opted for other purposes.  After all, if there is a battery of diverse proteins with a spectacular range and sensitivity for different compounds, why wouldn't some be used for something other than exploring the environment?  
&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_N2AOZejgjyA/Skw0wJiZp5I/AAAAAAAAADk/vShRS0CJOJQ/s1600-h/Olfr.png"&gt;&lt;img style="float:left; margin:0 10px 10px 0;cursor:pointer; cursor:hand;width: 400px; height: 292px;" src="http://4.bp.blogspot.com/_N2AOZejgjyA/Skw0wJiZp5I/AAAAAAAAADk/vShRS0CJOJQ/s200/Olfr.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5353712058835249042" /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/36768584-4817535674843798272?l=omicsomics.blogspot.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://omicsomics.blogspot.com/feeds/4817535674843798272/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=36768584&amp;postID=4817535674843798272" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/4817535674843798272" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/4817535674843798272" /><link rel="alternate" type="text/html" href="http://omicsomics.blogspot.com/2009/07/gene-expression-from-z.html" title="Gene Expression from A-Z" /><author><name>Keith Robison</name><uri>http://www.blogger.com/profile/04765318239070312590</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd="http://schemas.google.com/g/2005" name="OpenSocialUserId" value="08368724497474381730" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://3.bp.blogspot.com/_N2AOZejgjyA/SkwxMVt3YDI/AAAAAAAAADc/HQdkGC8g0ZQ/s72-c/AlphabeticalExpression.png" height="72" width="72" /><thr:total 
xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-36768584.post-6450950435985988053</id><published>2009-06-29T23:38:00.002-04:00</published><updated>2009-06-30T00:02:22.154-04:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="genome sequencing" /><title type="text">Lox: The last genome for electrophoretic Sanger?</title><content type="html">Amongst the news last week is a bit of a surprise: the salmon genome project &lt;a href="http://www.genomeweb.com/sequencing/consortium-seeks-sanger-or-equivalent-multi-million-dollar-salmon-sequencing-con"&gt;is choosing Sanger sequencing for the first phase of the project&lt;/a&gt;.  Alas, one needs a premium subscription to In Sequence, which I lack, so I can't read the full article.  But, &lt;a href="http://www.ncbi.nlm.nih.gov/pubmed/18755037?ordinalpos=6&amp;itool=EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.Pubmed_DefaultReportPanel.Pubmed_RVDocSum"&gt;the group has published (open access) a pilot study on some BACs&lt;/a&gt;, which concluded that 454 sequencing couldn't resolve a bunch of the sequence, and so shorter read technologies are presumably ruled out as well.  A goal of the project is a high quality reference sequence to serve as a benchmark for related fish, demanding very high quality.&lt;br /&gt;&lt;br /&gt;This announcement is a jolt for anyone who has concluded that Sanger has been largely put to pasture, confined to niches such as verifying clones and low-throughput projects.  Despite the gaudy throughput of the next-gen sequencers, read length remains a problem.  However, that hasn't stopped de novo assembly projects such as panda from apparently proceeding forward.  Apparently salmon is even nastier when it comes to repeats.&lt;br /&gt;&lt;br /&gt;Still playing the armchair next-gen sequencer (for the moment!), it is an interesting gedanken experiment.  
Suppose you had a rough genome you really, really wanted to sequence and get a high-quality reference sequence.  On the one hand, Sanger sequencing is very well proven.  However, it is also more expensive per base than the newer technologies.  Furthermore, Sanger is pretty much a mature technology, with little investment in further improvement.  This is in contrast to next gen platforms, which are being pushed harder and harder both by the manufacturers as well as the more adventurous users.  This includes novel sequencing protocols to address difficult DNA, such as the &lt;a href="http://www.ncbi.nlm.nih.gov/pubmed/18941527?ordinalpos=2&amp;itool=EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.Pubmed_DefaultReportPanel.Pubmed_RVDocSum"&gt;recently published Long March technique&lt;/a&gt; (which I'm still fully wrapping my head around) that generates nested libraries for next-gen sequencing using a serial Type IIS digestion scheme.  Complete Genomics has some trick for inserting multiple priming sites per circular DNA template.  Plus, &lt;a href="http://www.sciencemag.org/cgi/content/abstract/1162986?ijkey=jDCYbOratwXCc&amp;keytype=ref&amp;siteid=sci"&gt;Pacific Biosciences has demonstrated really long reads in a next gen platform&lt;/a&gt; -- but demonstrating is different than having it in production.&lt;br /&gt;&lt;br /&gt;So it boils down to the key question: do you spend your resources on the tried-and-true, but potentially pricey approach or try to bet that emerging techniques and technologies can deliver the goods soon enough.  Put another way, how critical is a high quality reference sequence?  Perhaps it would be better to generate very piecemeal drafts of multiple species now and then go for finishing the genomes when the new technologies come on line.  But what experiments dependent on that high quality reference would be put off a few years?  
And what if the new technologies don't deliver, in which case you must fall back on Sanger and be quite a bit behind schedule. &lt;br /&gt;&lt;br /&gt;It's not an easy call.  Will salmon be the last Sanger genome?  It all depends on whether the new approaches and platforms can really deliver -- and someone is daring enough to try them on a really challenging genome.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/36768584-6450950435985988053?l=omicsomics.blogspot.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://omicsomics.blogspot.com/feeds/6450950435985988053/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=36768584&amp;postID=6450950435985988053" title="3 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/6450950435985988053" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/6450950435985988053" /><link rel="alternate" type="text/html" href="http://omicsomics.blogspot.com/2009/06/lox-last-genome-for-electrophoretic.html" title="Lox: The last genome for electrophoretic Sanger?" 
/><author><name>Keith Robison</name><uri>http://www.blogger.com/profile/04765318239070312590</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd="http://schemas.google.com/g/2005" name="OpenSocialUserId" value="08368724497474381730" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-36768584.post-2523803750334992408</id><published>2009-06-21T19:55:00.006-04:00</published><updated>2009-06-21T21:11:04.909-04:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="genome sequencing" /><category scheme="http://www.blogger.com/atom/ns#" term="cancer" /><title type="text">Cancer Genome Sequencing--A (Pessimistic) Interim Analysis</title><content type="html">The current issue of Cancer Research carries a very brief (3 pages, with one page mostly tables &amp; figures) &lt;a href="http://cancerres.aacrjournals.org/cgi/content/abstract/69/12/4948"&gt;review of the first pulse of cancer genome sequencing papers&lt;/a&gt; (sub required to read article).  While sub-titled 'An Interim Analysis', perhaps a better subtitle would be 'A Uniformly Negative Analysis'.&lt;br /&gt;&lt;br /&gt;A full-press cancer genomics project has been a controversial drive, with many bemoaning the huge amount of resources devoted it and believing other avenues would be better suited for enhancing our ability to help cancer patients.  But it has gone forward, and a spate of papers over the last year have reported the early results.&lt;br /&gt;&lt;br /&gt;The initial papers have covered 4 of the big cancers in terms of incidence and mortality (lung, breast, colorectal and pancreatic) as well as glioblastoma and leukemia.  Different studies have taken different tacks.  In leukemia, we have the &lt;a href="http://cancerres.aacrjournals.org/cgi/external_ref?access_num=18987736&amp;link_type=MED"&gt;first parallel complete sequencing of a patient and their tumor&lt;/a&gt;.  
Papers in breast, colorectal (together covered in two papers &lt;a href="http://cancerres.aacrjournals.org/cgi/ijlink?linkType=ABST&amp;journalCode=sci&amp;resid=318/5853/1108"&gt;here&lt;/a&gt; and &lt;a href="http://cancerres.aacrjournals.org/cgi/ijlink?linkType=ABST&amp;journalCode=sci&amp;resid=314/5797/268"&gt;here&lt;/a&gt;), &lt;a href="http://cancerres.aacrjournals.org/cgi/ijlink?linkType=ABST&amp;journalCode=sci&amp;resid=321/5897/1801"&gt;pancreatic&lt;/a&gt; and &lt;a href="http://cancerres.aacrjournals.org/cgi/ijlink?linkType=ABST&amp;journalCode=sci&amp;resid=321/5897/1807"&gt;glioblastoma&lt;/a&gt; looked at huge numbers of coding exons in small numbers of patients (11 patients x 18.2K genes for breast and colorectal; 21 patients x 20.6K genes for glioblastoma; 24 patients x 20.6K genes for pancreatic).  A &lt;a href="http://cancerres.aacrjournals.org/cgi/external_ref?access_num=18948947&amp;link_type=MED"&gt;lung paper&lt;/a&gt; and the other &lt;a href="http://cancerres.aacrjournals.org/cgi/external_ref?access_num=18772890&amp;link_type=MED"&gt;glioblastoma&lt;/a&gt; paper looked at ~600 genes, but in larger numbers of patients (188 in lung and 91 in glioblastoma).&lt;br /&gt;&lt;br /&gt;Personally, I would take a more nuanced view of the results.  I think it is hard to argue that these papers have had a shortage of fireworks; there have been some important observations made, which curiously the Cancer Research review ignores completely.  In the lung study (which I have studied most closely) these include important exclusion and cooperativity relationships between mutations and a number of novel, druggable candidate driver genes (protein kinases) not previously suspected in lung cancer.  
In the many-genes, few-patients glioblastoma study, it was the identification of a mutational hotspot in isocitrate dehydrogenase 1 (later found to be present, though less frequently mutated, in &lt;a href="http://www.ncbi.nlm.nih.gov/pubmed/19228619?ordinalpos=5&amp;itool=EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.Pubmed_DefaultReportPanel.Pubmed_RVDocSum"&gt;isocitrate dehydrogenase 2&lt;/a&gt;).&lt;br /&gt;&lt;br /&gt;Of course, one thing which is changing rapidly is the cost of doing these studies.  Most of these papers used conventional PCR amplification and Sanger sequencing, which I would lowball at $1/well (very lowball, but Sandra Porter caught some serious flak suggesting [as I have] a number much higher than this for the sequencing part, and I don't have the accounting experience to argue -- but I do know people who calculated it at Codon and this would be a very low estimate) -- so those studies looking at nearly every coding exon were at least a quarter million per patient (those 20+K genes explode out to about a quarter million exons).  Clearly this isn't how things will tend to be done going forward; Illumina will now blow away genomes for $48K each and other companies are now quoting even lower.  This is still well in excess of the per patient estimate for the very focused studies, and I believe these (particularly the lung study) demonstrate the value of lots of patients, since this started to give the numbers required to look at interactions between mutations.&lt;br /&gt;&lt;br /&gt;One of the reasons the Cancer Research authors aren't terribly pleased with the progress is clear: they feel the experiments aren't the correct ones.  
But whereas some of the flak I had seen directed at the cancer genome sequence concept was instead promoting more functional approaches (such as RNAi library screening), what these authors want (or at least set as the minimum bar for interesting) is cancer genome screening on an almost monomaniacal scale: thousands if not millions of individual cells from the same tumor!  Clearly this would be fascinating, as there is plenty of evidence that tumors are a motley collection of genetically variant cells (but clonal -- all the tumor cells have the same ancestor, but they also are all sloppy DNA copyists).  And, as they note, no DNA sequencing technology here now or on the immediate horizon has any shot at a project of this scale.&lt;br /&gt;&lt;br /&gt;While I do believe this would be interesting, I'm not as certain it would be informative for patient care.  Since many of these mutations are under very little selection, the spectrum of observed mutations is likely to be enormous.  And there is already a horrendous backlog of characterizing mutations seen in the studies to date (though there has been &lt;a href="http://www.ncbi.nlm.nih.gov/pubmed/19359588?ordinalpos=5&amp;itool=EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.Pubmed_DefaultReportPanel.Pubmed_RVDocSum"&gt;a paper already functionally characterizing the isocitrate dehydrogenase mutations&lt;/a&gt;).&lt;br /&gt;&lt;br /&gt;What is particularly strange about this view is that a more reasonable intermediate step would be to look at those cells that do escape the primary tumor (most of the cancer genome papers so far have focused on primary tumors, though the IDH mutations are primarily found in secondary glioblastomas) -- sequence the metastases.  
Ideally, this would mean finding multiple patients willing to consent to their genome, their primary's genome, and multiple metastases' genomes being sequenced -- the latter quite likely coming from autopsies (otherwise it is a lot of painful biopsying without much hope of helping the patient, an ethically questionable activity).  Or, in leukemias one could more easily resequence after each relapse.  Such studies would be doable technically and not cost ridiculous (though clearly not chump change either). &lt;br /&gt;&lt;br /&gt;There's also the open question as to whether the real fireworks will come from sequencing less studied cancers, such as the recent success in &lt;a href="http://search.nejm.org/search?p=R&amp;srid=S9%2d3&amp;lbc=nejm&amp;w=Granulosa%20Cell&amp;url=http%3a%2f%2fcontent%2enejm%2eorg%2fcgi%2fcontent%2fshort%2fNEJMoa0902542%3fresourcetype%3dHWCIT&amp;rk=3&amp;uid=329532472&amp;sid=2&amp;ts=subs&amp;rsc=yRISaicMlUCFHutm&amp;method=and&amp;isort=score&amp;start%5fyear=1999&amp;start%5fmonth=6&amp;rurl=yes&amp;nurl=http%3a%2f%2fcontent%2enejm%2eorg%2fcgi%2fcontent%2fabstract%2fNEJMoa0902542%3fresourcetype%3dHWCIT"&gt;using transcriptome sequencing to identify the probable causative mutation in a rare type of ovarian cancer&lt;/a&gt; (see also the &lt;a href="http://content.nejm.org/cgi/content/extract/NEJMe0903433v1"&gt;News and Views piece&lt;/a&gt;).  Perhaps we've mined the rich ore out of some of these veins, and it is the less worked seams which will yield fine genomic insights.  
Time will tell.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/36768584-2523803750334992408?l=omicsomics.blogspot.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://omicsomics.blogspot.com/feeds/2523803750334992408/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=36768584&amp;postID=2523803750334992408" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/2523803750334992408" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/2523803750334992408" /><link rel="alternate" type="text/html" href="http://omicsomics.blogspot.com/2009/06/cancer-genome-sequencing-pessimistic.html" title="Cancer Genome Sequencing--A (Pessimistic) Interim Analysis" /><author><name>Keith Robison</name><uri>http://www.blogger.com/profile/04765318239070312590</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd="http://schemas.google.com/g/2005" name="OpenSocialUserId" value="08368724497474381730" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-36768584.post-7579848659213147454</id><published>2009-05-31T21:45:00.004-04:00</published><updated>2009-05-31T22:19:04.194-04:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="bioinformatics" /><category scheme="http://www.blogger.com/atom/ns#" term="DNA sequencing" /><title type="text">Teasing small insertion/deletion events from next-gen data</title><content type="html">My interest in next-generation sequencing is well on the way to shifting from hobby to work-central, which is exciting.  
So I'm now really paying attention to the literature on the subject.&lt;br /&gt;&lt;br /&gt;One of the interesting uses for next-generation sequencing is identifying insertion or deletion alleles (indels) in genomes, particularly the human genome.  Of course, the best way to do this is to do a lot of sequencing, compare the sequence reads against a reference genome, and identify specific insertions or deletions in the reads.  However, this is generally going to require a full genome run &amp; a certain amount of luck, especially in a diploid organism as you might not sample both alleles enough to see a heterozygous indel. A cancer genome might be even worse: these often have many more than two copies of the DNA at a given position and potentially there could be more than two different versions.  In any case, full genome runs are in the ballpark of $50K, so if you really want to look at a lot of genomes a more efficient strategy is needed.&lt;br /&gt;&lt;br /&gt;The most common approach is to sequence both ends of a DNA molecule and then compare the predicted distance between those ends with the distance on the reference genome.  If you know the distribution of lengths that the sequence library has, then you can spot cases where the length on the reference is very different.  In effect, you've lengthened (but made less precise) your ruler for measuring indels, and so you need many fewer measurements to find them.&lt;br /&gt;&lt;br /&gt;One aside: in a recent &lt;a href="http://w.on24.com/r.htm?e=141156&amp;s=1&amp;k=DA5CA29CA60EF656406214AADD886CE7"&gt;Cancer Genomics webinar&lt;/a&gt; I watched a distinction was made between "mate pairs" and "paired ends" -- except now I forget which they assigned to which label (and am too lazy/time strapped to watch the webinar right now).  
In short, one is the case of sequencing both ends of a standardly prepared next-generation library, and the other involves snipping the middle out of a very large fragment to create the next-gen sequencing target.  Here I was prepared to go pedantic and I'm caught napping!&lt;br /&gt;&lt;br /&gt;Of course, that is &lt;span style="font-style:italic;"&gt;if&lt;/span&gt; you know the distribution of DNA insert sizes.  While you might have an estimate from the way the library is prepared, an obvious extension would be to infer the library's distribution from the actual data.  An even more clever approach would be to use this distribution to pick out candidates in which the paired end sequences lie well within the distribution, but are consistently shifted relative to that distribution.&lt;br /&gt;&lt;br /&gt;A paper fresh out of Nature Methods (subscription required &amp; no abstract) incorporates precisely these ideas into a program called &lt;a href="http://www.nature.com/nmeth/journal/vaop/ncurrent/full/nmeth.f.256.html"&gt;MoDIL&lt;/a&gt;.  The program also explicitly models heterozygosity, allowing it to find heterozygous indels.&lt;br /&gt;&lt;br /&gt;In performance analysis on actual human shotgun sequence, the MoDIL paper claims 95+% sensitivity for detecting indels of &gt;=20bp.  For the library used, this is detecting a 10% length difference (insert size mean: 208; stdev: 13).  The &lt;a href="http://www.nature.com/nmeth/journal/vaop/ncurrent/extref/nmeth.f.256-S1.pdf"&gt;supplementary materials&lt;/a&gt; also look at the ability to detect heterozygous deletions of various sizes as a function of genome coverage (the actual sequencing data used had 120X clone coverage, meaning the average nucleotide in the genome would be found in 120 DNA fragments in the sequencing run).  
Dropping the coverage by a factor of 3 would be expected to still pick up most indels of &gt;=40bp.&lt;br /&gt;&lt;br /&gt;&lt;span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&amp;rft.jtitle=Nature+Methods&amp;rft_id=info%3Adoi%2F10.1038%2Fnmeth.f.256&amp;rfr_id=info%3Asid%2Fresearchblogging.org&amp;rft.atitle=MoDIL%3A+detecting+small+indels+from+clone-end+sequencing+with+mixtures+of+distributions&amp;rft.issn=1548-7091&amp;rft.date=2009&amp;rft.volume=&amp;rft.issue=&amp;rft.spage=0&amp;rft.epage=0&amp;rft.artnum=http%3A%2F%2Fwww.nature.com%2Fdoifinder%2F10.1038%2Fnmeth.f.256&amp;rft.au=Lee%2C+S.&amp;rft.au=Hormozdiari%2C+F.&amp;rft.au=Alkan%2C+C.&amp;rft.au=Brudno%2C+M.&amp;rfe_dat=bpr3.included=1;bpr3.tags=Biology%2CComputer+Science%2CBioinformatics%2C+Computational+Biology%2C+Genetics+%2C+Algorithms"&gt;Lee, S., Hormozdiari, F., Alkan, C., &amp; Brudno, M. (2009). MoDIL: detecting small indels from clone-end sequencing with mixtures of distributions &lt;span style="font-style: italic;"&gt;Nature Methods&lt;/span&gt; DOI: &lt;a rev="review" href="http://dx.doi.org/10.1038/nmeth.f.256"&gt;10.1038/nmeth.f.256&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="float: left; padding: 5px;"&gt;&lt;a href="http://www.researchblogging.org"&gt;&lt;img alt="ResearchBlogging.org" src="http://www.researchblogging.org/public/citation_icons/rb2_large_gray.png" style="border:0;"/&gt;&lt;/a&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/36768584-7579848659213147454?l=omicsomics.blogspot.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://omicsomics.blogspot.com/feeds/7579848659213147454/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=36768584&amp;postID=7579848659213147454" title="2 Comments" /><link rel="edit" 
type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/7579848659213147454" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/7579848659213147454" /><link rel="alternate" type="text/html" href="http://omicsomics.blogspot.com/2009/05/teasing-small-insertiondeletion-events.html" title="Teasing small insertion/deletion events from next-gen data" /><author><name>Keith Robison</name><uri>http://www.blogger.com/profile/04765318239070312590</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd="http://schemas.google.com/g/2005" name="OpenSocialUserId" value="08368724497474381730" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-36768584.post-6086957171242502429</id><published>2009-05-25T08:13:00.003-04:00</published><updated>2009-05-25T08:50:39.622-04:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="cancer" /><title type="text">Pondering tumor suppressors</title><content type="html">Now that I'm back in the cancer field full-time, I spend a lot of that time pondering the mysteries of the disease.  Despite an explosion of knowledge about the disease during my lifetime, we truly don't understand how it works.  In many ways we're still at the stage of the old story of seven blind men, not having figured out the elephant in front of us.&lt;br /&gt;&lt;br /&gt;Sometimes when genes acquire mutations this moves a cell on the road to cancer.  Such genes fall into two general categories.  Oncogenes acquire activating mutations or are amplified and then play an active role in cancer.  Tumor suppressors lead to disease when they are inactivated by mutations.  A handful of genes have a very murky status, seemingly able to play both roles.&lt;br /&gt;&lt;br /&gt;Many tumor suppressors were discovered through rare hereditary syndromes characterized by tumors.  
For example, &lt;a href="http://www.ncbi.nlm.nih.gov/sites/entrez?Db=gene&amp;Cmd=ShowDetailView&amp;TermToSearch=5925&amp;ordinalpos=4&amp;itool=EntrezSystem2.PEntrez.Gene.Gene_ResultsPanel.Gene_RVDocSum"&gt;RB1&lt;/a&gt; is the retinoblastoma gene; inactivation of this gene in the retina leads to horrific tumors of the eye.  &lt;a href="http://www.ncbi.nlm.nih.gov/sites/entrez?Db=gene&amp;Cmd=ShowDetailView&amp;TermToSearch=4763&amp;ordinalpos=3&amp;itool=EntrezSystem2.PEntrez.Gene.Gene_ResultsPanel.Gene_RVDocSum"&gt;NF1&lt;/a&gt; is the neurofibromatosis gene; inactivation leads to benign tumors from nerves.  Perhaps the best known in the popular space are &lt;a href="http://www.ncbi.nlm.nih.gov/sites/entrez?Db=gene&amp;Cmd=ShowDetailView&amp;TermToSearch=672&amp;ordinalpos=2&amp;itool=EntrezSystem2.PEntrez.Gene.Gene_ResultsPanel.Gene_RVDocSum"&gt;BRCA1&lt;/a&gt; and &lt;a href="http://www.ncbi.nlm.nih.gov/sites/entrez?Db=gene&amp;Cmd=ShowDetailView&amp;TermToSearch=675&amp;ordinalpos=2&amp;itool=EntrezSystem2.PEntrez.Gene.Gene_ResultsPanel.Gene_RVDocSum"&gt;BRCA2&lt;/a&gt;, which greatly raise the risk of breast and ovarian cancer.&lt;br /&gt;&lt;br /&gt;A great mystery for many such genes is the tissue specificity of the tumor syndrome.  In each of the genes mentioned above, the tumor syndrome appears to be very specific to a tissue type, yet in each of these cases the genes involved have been shown to be parts of cellular machinery used by every cell.  Why does a failure of a general part manifest itself so specifically?&lt;br /&gt;&lt;br /&gt;As we dig deeper into the genes and cancer, some of these distinctions do start smudging.  BRCA1 mutations, for example, do also raise the risk of pancreatic cancer -- but not nearly to the same extent as for breast cancer.  If we look not at known hereditary links to cancer but the genes mutated in any cancer, we see these same players showing up.  
For example,  &lt;a href="http://www.sanger.ac.uk/perl/genetics/CGP/cosmic?action=bygene&amp;ln=RB1&amp;start=1&amp;end=1864&amp;coords=AA:AA"&gt;RB1 is frequently mutated in a variety of cancers, including lung cancers&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Here's an interesting further bit to ponder.  BRCA1 and BRCA2 are &lt;a href="http://ccr.coriell.org/Sections/Collections/NIGMS/fanc_pathway.aspx?PgId=253"&gt;in a pathway together&lt;/a&gt;, so it is not surprising that mutating either one would have a similar effect.  But again, mutations in other members of the pathway lead to other genetic disorders with different spectra of cancers.&lt;br /&gt;&lt;br /&gt;Now a new bit of the puzzle that continues the puzzling.  One of the physical partners of BRCA1 is BARD1.  A lot of effort has gone into finding variants in BARD1  and attempting to demonstrate their relevance to breast cancer risk.  While many variants have been found in BARD1, the linkage to breast cancer is weak if it exists at all.  But &lt;a href="http://www.nature.com/ng/journal/vaop/ncurrent/abs/ng.374.html"&gt;a new paper now links germline variation in BARD1 to the risk of aggressive neuroblastomas&lt;/a&gt;.  &lt;br /&gt;&lt;br /&gt;The one clear thread in this is that continuing to cross-reference these known tumor suppressors and their partners (such as &lt;a href="http://www.sciencemag.org/cgi/content/abstract/324/5924/217"&gt;this recent report on PALB2&lt;/a&gt;, a physical partner of BRCA2 with links now to breast and pancreatic cancer) with emerging genetic information will yield fruit.  There are probably many more such associations to be found and perhaps additional proteins in these pathways to be uncovered.  But when will we finally conceptualize the elephant?  
That remains to be seen.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/36768584-6086957171242502429?l=omicsomics.blogspot.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://omicsomics.blogspot.com/feeds/6086957171242502429/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=36768584&amp;postID=6086957171242502429" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/6086957171242502429" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/6086957171242502429" /><link rel="alternate" type="text/html" href="http://omicsomics.blogspot.com/2009/05/pondering-tumor-suppressors.html" title="Pondering tumor suppressors" /><author><name>Keith Robison</name><uri>http://www.blogger.com/profile/04765318239070312590</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd="http://schemas.google.com/g/2005" name="OpenSocialUserId" value="08368724497474381730" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-36768584.post-7458145479677101570</id><published>2009-05-19T21:05:00.003-04:00</published><updated>2009-05-19T21:40:22.504-04:00</updated><title type="text">is Wolfram Alpha good for anything???</title><content type="html">The much heralded web tool &lt;a href="http://www.wolframalpha.com/"&gt;Wolfram Alpha&lt;/a&gt; debuted yesterday -- and I completely forgot about it.  But today a coworker asked me about it &amp; I kicked into full-blown test mode.  Count me as underwhelmed.&lt;br /&gt;&lt;br /&gt;Now, one of the things which it is supposed to excel at is collecting information or doing calculations.  To be glib: it's not a search tool, but a find tool.  
I've thrown a bunch of queries at it, and have yet to find something really cool.&lt;br /&gt;&lt;br /&gt;My first queries were complete duds.  Asking for the fastest train time between New York and Chicago usually elicits the "I don't understand you" message, though one wording yielded a flight time from New York to Chicago and another, which I've lost, gave me a time to a town in Europe called Train.  &lt;br /&gt;&lt;br /&gt;If you plug in a human gene name, the result is a sort of simplified Entrez gene name query.  In some ways it is nice, but in others I found it less than fulfilling.  Plug in &lt;a href="http://www46.wolframalpha.com/input/?i=KRAS+gene"&gt;KRAS&lt;/a&gt; and you get an overview of KRAS's genetic structure, but nothing about the fact that certain mutations in this gene are oncogenic.  Don't put "gene" in the query and it guesses you mean some airport, though it does suggest the gene as an alternate option. Similarly, if you plug in EGFR, it's disappointing that it doesn't mention any of the important chemotherapeutics which target this.  &lt;br /&gt;&lt;br /&gt;Calculating things is supposed to be its forte, so I tried a bunch.  The first few didn't work well (e.g. how many carbon atoms in human chromosome X), but I do now know where I can convert from millimeters to furlongs.  So useful!  Or even better, convert 60mph to angstroms per nanosecond -- how did I ever live without this?  &lt;br /&gt;&lt;br /&gt;One side complaint: Wolfram Alpha seems to be a nearly closed universe.  Occasionally it will link out to Wikipedia on the side, but most of the facts it presents are dead ends.  So if you think it's wrong, such as below, there's no obvious way to figure out how it figured out what it told you.&lt;br /&gt;&lt;br /&gt;Similarly, it could stand to explain itself a bit more.  
I asked it to opine on the most important classification question in the world, and after several attempts "taxonomy of panda" (won't work with "pandas") I get the message "Assuming Ailuropoda melanoleuca | Use Ailurus fulgens instead" -- but nowhere does it give a common name or picture for either of these critters.  Curiously, Wolfram Alpha puts "Ailurus fulgens" (the red panda) in with bears, where it definitely doesn't belong.  I hadn't kept up with their taxonomy; according to both NCBI &amp; Wikipedia they're now their own branch of carnivores and not in the Raccoon family.&lt;br /&gt;&lt;br /&gt;The front page suggests typing in dates. Just putting in a day and month with no year was particularly useless, but other things I put in had curious results.  September 11th, 2001 notes that the World Trade Center was destroyed, along with the death of one of the terrorists. December 7th, 1941 yields the attack on Pearl Harbor.  &lt;br /&gt;&lt;br /&gt;But can you &lt;i&gt;believe&lt;/i&gt; that the only significant event it can remember for July 20th, 1969 is the birth of a minor TV actor Josh Holloway?  That &lt;a href="http://en.wikipedia.org/wiki/Apollo_11"&gt;most glorious day in human technological achievement&lt;/a&gt; and it can only find some face-of-the-moment?  
AIIGGGHH!!!!!!!!!!!!!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/36768584-7458145479677101570?l=omicsomics.blogspot.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://omicsomics.blogspot.com/feeds/7458145479677101570/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=36768584&amp;postID=7458145479677101570" title="3 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/7458145479677101570" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/7458145479677101570" /><link rel="alternate" type="text/html" href="http://omicsomics.blogspot.com/2009/05/is-wolfram-alpha-good-for-anything.html" title="is Wolfram Alpha good for anything???" /><author><name>Keith Robison</name><uri>http://www.blogger.com/profile/04765318239070312590</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd="http://schemas.google.com/g/2005" name="OpenSocialUserId" value="08368724497474381730" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-36768584.post-5992375862815093177</id><published>2009-05-11T20:36:00.002-04:00</published><updated>2009-05-11T23:04:31.572-04:00</updated><title type="text">Gene Tests Don't Blow Up!</title><content type="html">Today's Globe has &lt;a href="http://www.boston.com/news/science/articles/2009/05/11/do_it_yourself_genetic_sleuthing/?comments=all&amp;plckCurrentPage=0"&gt;a profile of the do-it-yourself genetic testing experiment&lt;/a&gt; that my former colleague Kay Aull is performing.  Among the people quoted is yours truly.&lt;br /&gt;&lt;br /&gt;Okay, it's really cool.  
I did once get &lt;a href="http://www.newsweek.com/id/108878"&gt;a mention with several sentences in Newsweek&lt;/a&gt; (with a very distressed Mickey Mouse on the cover) but this time I got several column inches.  However, after I gave the phone interview I came down with a small case of the worries.  What if I was misquoted?  Worse, what if I was correctly quoted but pulled a Watson?  Luckily, what made it in fails to induce embarrassment, though there are bits which I wish hadn't been left out.&lt;br /&gt;&lt;br /&gt;The article is well worth reading (though it may become a pay article overnight; I forget the current policy).  With luck the wire services &amp; aggregators will pick up on it.&lt;br /&gt;&lt;br /&gt;I think anyone interested in genetic testing, DIY-bio, or just science in general should skim the comments thread.  There's a lot there to be worried about.&lt;br /&gt;&lt;br /&gt;First, a running theme is a worry that Kay will blow up her block or such.  Multiple posters raise it, many claiming to work in labs.  Now, as Kay's comment (which is nice and level-headed, as I would have expected) points out, she's not using anything liable to do anything like that.  For the level of ethanol precipitation she's doing, a fifth of vodka would last quite a long time (an interesting experiment; I remember the Russians are said to have built lasers with the stuff).  &lt;br /&gt;&lt;br /&gt;A second class of fear is other sorts of toxins, primarily the spectre of ethidium bromide (a known carcinogen) as a DNA stain.  There are other, much safer stains, and it turns out that's what Kay is using.&lt;br /&gt;&lt;br /&gt;Another general negative sentiment is that perhaps the city or her landlord should (or might) shut this down.  I'm no lawyer, but this certainly wasn't obviously prohibited by any of my lease agreements.  Putting household cleaners in the public's hands (or solvents in the form of nail polish or paint removers) scares me far more than a little PCR.  
&lt;br /&gt;&lt;br /&gt;One more sentiment worth noting: that this sort of thing should be done only in an official laboratory and that Kay shouldn't do this without getting a masters or Ph.D. first.  I suspect that these posters aren't aware that many of the same techniques are available in the toy section of any Target or Wal-Mart.  True, none of those offer PCR -- but they easily could.  PCR can be run without any special gear, though it would be awfully tedious.  They are probably also unaware of modern scientists who worked without Ph.D.s (e.g. Nobelist &lt;a href="http://en.wikipedia.org/wiki/Gertrude_Elion"&gt;Gertrude Elion&lt;/a&gt;) or in home labs (e.g. Nobelist &lt;a href="http://en.wikipedia.org/wiki/Rita_Levi-Montalcini"&gt;Rita Levi-Montalcini&lt;/a&gt;).&lt;br /&gt;&lt;br /&gt;On the other end of things, some of the positive posters are a bit worrisome.  One makes the quite apropos comparison of this to having a home darkroom, but gets their chemicals confused -- while the &lt;span style="font-style:italic;"&gt;stop&lt;/span&gt; solution is indeed just acetic acid, the fixer is not "drinkable but dull" but rather cyanide-based (cyanide is a great remover of silver, which is the job of the fixer).  &lt;br /&gt;&lt;br /&gt;There are also a number of posters who suggest that this information might be used against her by an insurance company or that it would be illegal to withhold it from same.  Whether this would be prohibited by &lt;a href="http://en.wikipedia.org/wiki/Genetic_Information_Nondiscrimination_Act"&gt;GINA&lt;/a&gt; isn't considered; I'm guessing the posters aren't familiar with it.  
Another poster relishes the idea that&lt;br /&gt;&lt;blockquote&gt;Perhaps she objects to the greed of her peers at Harvard who are charging people for the opportunity to get similar bio data - See &lt;a href="http://www.genomeweb.com/blog/round-100"&gt;http://www.genomeweb.com/blog/round-100&lt;/a&gt;.&lt;/blockquote&gt;  -- which is bizarre, given that the very GenomeWeb article mentions that these tests are free to participants!&lt;br /&gt;&lt;br /&gt;Regardless of how poorly informed or quick to leap to conclusions some of these folks are, this is indeed the landscape of public opinion, at least as plumbed by response to this article.  It would suggest that there is a lot of educating to do &amp; that it will be an uphill battle.  To a lot of people, science means formal labs and formal training and labs mean dangerous chemicals that might explode.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/36768584-5992375862815093177?l=omicsomics.blogspot.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://omicsomics.blogspot.com/feeds/5992375862815093177/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=36768584&amp;postID=5992375862815093177" title="10 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/5992375862815093177" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/5992375862815093177" /><link rel="alternate" type="text/html" href="http://omicsomics.blogspot.com/2009/05/gene-tests-dont-blow-up.html" title="Gene Tests Don't Blow Up!" 
/><author><name>Keith Robison</name><uri>http://www.blogger.com/profile/04765318239070312590</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd="http://schemas.google.com/g/2005" name="OpenSocialUserId" value="08368724497474381730" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">10</thr:total></entry><entry><id>tag:blogger.com,1999:blog-36768584.post-6487040168858577300</id><published>2009-05-03T22:35:00.002-04:00</published><updated>2009-05-03T22:43:57.340-04:00</updated><title type="text">The New Gig</title><content type="html">I've always been a fan of the space program and I like movies, so when a movie astronaut speaks I listen.  Since Beyond Genomics changed its name to BG Medicine, I can only interpret the advice as directing me to Infinity Pharmaceuticals.  &lt;br /&gt;&lt;br /&gt;Seriously, tomorrow I start at Infinity.  Infinity has a number of anti-cancer programs which it is exciting to be joining.  Of course, having drugs in the clinic can be a rocky ride; the day I agreed to go was the day a clinical trial was halted, and Infinity's stock fell 30% (or does somebody on Wall Street just not like me?)&lt;br /&gt;&lt;br /&gt;Strange but true story: The day of my interview, a new Netflix disc was scheduled to arrive.  The title: Infinity.  Spooky!&lt;br /&gt;&lt;br /&gt;As far as this space goes, there will probably be some subtle shifts.  I'm probably a little too careful about not posting directly around where I'm working, but that is my habit and so areas such as cancer genomics may see less action.  Infinity, as mentioned above, is public &amp; so one must follow certain rules.&lt;br /&gt;&lt;br /&gt;On the other hand, that still leaves a lot of biology to comment on.  I probably will mine more of synthetic biology, a lot of genomics/proteomics/younameitomics and evolution.  Computational stuff I'm working on -- plus some old interests that were lit anew during my time out.  
Plus some of my learnings from that time, where I set up and then dismantled a trans-Pacific consulting empire (yep! often had to cross Pacific Street to go from one client to another).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/36768584-6487040168858577300?l=omicsomics.blogspot.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://omicsomics.blogspot.com/feeds/6487040168858577300/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=36768584&amp;postID=6487040168858577300" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/6487040168858577300" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/6487040168858577300" /><link rel="alternate" type="text/html" href="http://omicsomics.blogspot.com/2009/05/new-gig.html" title="The New Gig" /><author><name>Keith Robison</name><uri>http://www.blogger.com/profile/04765318239070312590</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd="http://schemas.google.com/g/2005" name="OpenSocialUserId" value="08368724497474381730" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-36768584.post-3094370410428166861</id><published>2009-04-21T20:45:00.003-04:00</published><updated>2009-04-21T21:22:29.005-04:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="synthetic biology" /><title type="text">Is Codon Optimization Bunk?</title><content type="html">There is a very interesting paper in Science from a week ago which hearkens back to my gene synthesis days at Codon.  
But first, some background.&lt;br /&gt;&lt;br /&gt;The genetic code (at first approximation) uses 64 codons to encode 21 different signals; hence there are some choices as to which codon to use.  Amino acids and stop can have 1,2,3,4 or 6 codons in the standard scheme of things.  But, those codons are rarely used with equal frequency.  Leucine, for example, has 6 codons and some are rarely used and others often.  Which codons are preferred and disfavored, and the degree to which this is true, depends on the organism.  In the extreme, a codon can actually go so out of favor it goes extinct &amp; can no longer be used, and sometimes it is later reassigned to something else; hence some of the more tidy codes in certain organisms.&lt;br /&gt;&lt;br /&gt;A further observation is that the more favored codons correspond to more abundant tRNAs and less favored ones to less abundant tRNAs.  Furthermore, highly expressed genes are often rich in favored codons and lowly expressed ones much more likely to use rare ones.  To complete the picture, in organisms such as E.coli there are genes which don't seem to follow the usual pattern -- and these are often associated with mobile elements and phage or have other suggestions that they may be recent acquisitions from another species.  &lt;br /&gt;&lt;br /&gt;A practical application of this is to codon optimize genes.  If you are having a gene built to express a protein in a foreign host, then it would seem apropos to adjust the codon usage to the local dialect, which usually still leaves plenty of room to accommodate other wishes (such as avoiding the recognition sites for specific restriction enzymes).  There are at least four major schemes for doing this, with different gene synthesis vendors preferring one or the other&lt;br /&gt;&lt;ul&gt;&lt;br /&gt;&lt;li&gt;CAI Maximization.  CAI is a measure of usage of preferred codons; this strategy tries to maximize the statistic by using the most preferred codons.  
Logic: if these are the most preferred codons, and highly expressed genes are rich in them, why not do the same?&lt;/li&gt;&lt;br /&gt;&lt;li&gt;Codon sampling.  This strategy (which is what Codon Devices offered) samples from a set of codons with probabilities proportional to their usage in the organism, after first zeroing out the very rare codons and renormalizing the table.  Logic: avoid the rare ones, but don't hammer the better ones either; balance is always good&lt;/li&gt;&lt;br /&gt;&lt;li&gt;Dicodon optimization.  In addition to codons showing preferences, there's also a pattern by which adjacent codons pair slightly non-randomly.  One particular example: very rare codons are very unlikely to be followed by another very rare codon. Logic: even better approach to "when in Rome..." than either of the two above&lt;/li&gt;&lt;br /&gt;&lt;li&gt;Codon frequency matching.  Roughly, this means look at the native mRNA and its uses of codons and ape this in the target species; a codon which is rare in the native should be replaced with one rare in the target.  Logic: some rare codons may just help fold things properly&lt;/li&gt;&lt;br /&gt;&lt;/ul&gt;&lt;br /&gt;A related strategy worth mentioning is the use of special expression strains which express extra copies of the rare tRNAs.&lt;br /&gt;&lt;br /&gt;There is a lot of literature on codon optimization, and most of it suffers from the same flaw.  Most papers describe taking one ORF, re-synthesizing it with a particular optimization scheme, and then comparing the two.  One problem with this is the small N and the potential for publication bias (do people publish less frequently when this fails to work?).  Furthermore, it could well be that the resynthesized design changed something else, and the codon optimization is really unimportant.  
A few papers deviate from this plan &amp; there has been a hint from the structural genomics community of surveying their data (as they often codon optimized), but systematic studies aren't common.&lt;br /&gt;&lt;br /&gt;Now in Science comes the sort of paper that starts to be systematic&lt;br /&gt;&lt;blockquote&gt;&lt;br /&gt;&lt;a href="http://www.sciencemag.org/cgi/content/abstract/324/5924/255"&gt;Coding-Sequence Determinants of Gene Expression in Escherichia coli&lt;/a&gt;&lt;br /&gt;    Grzegorz Kudla, Andrew W. Murray, David Tollervey, and Joshua B. Plotkin&lt;br /&gt;    Science 10 April 2009: 255-258. &lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;&lt;br /&gt;In short, they generated a library of GFP variants in which the particular codon used was varied randomly and then expressed these from a standard sort of expression vector in E.coli.  The summary of their results is that codon usage didn't correlate with GFP brightness (expression), but that the key factor is avoidance of secondary structure near the beginning of the ORF.&lt;br /&gt;&lt;br /&gt;It's a good approach, but the question is how general the result is.  Is GFP a special protein in some way?  Why do the rare tRNA-expressing strains sometimes help with protein expression?  And most importantly, does this apply broadly or is it specific to E.coli and relatives?&lt;br /&gt;&lt;br /&gt;This last point is important in the context of certain projects.  E.coli and Saccharomyces have their codon preferences, but if you want to see an extreme preference, look at Streptomyces and its kin.  These are important producers of antibiotics and other natural product medications, and it turns out that the codon usage table is easy to remember: just use G or C in the 3rd position.  In one species I looked at, around 95% of all codons followed that rule.  &lt;br /&gt;&lt;br /&gt;This has the effect of making the G+C content of the entire ORF quite high, which engenders further problems.  
High G+C DNA can be difficult to assemble (or amplify) via PCR and it sequences badly.  Furthermore, such a limited choice of codons means that anything resembling a repeat at the protein level will create a repeat at the DNA level, and even very short repeats can be problematic for gene synthesis.  Long runs of G's can also be problematic for oligonucleotide synthesizers (or so I've been told).  From a company's perspective, this is also a problem because customers don't really care about it and don't understand why you price some genes higher than others.&lt;br /&gt;&lt;br /&gt;So, would the same strategy work in Streptomyces?  If so, one could avoid synthesizing hyper-G+C genes and go with more balanced ones, reducing costs and the time to produce the genes.  But, someone would need to make the leap and repeat the Kudla et al. strategy in some of these target organisms.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/36768584-3094370410428166861?l=omicsomics.blogspot.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://omicsomics.blogspot.com/feeds/3094370410428166861/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=36768584&amp;postID=3094370410428166861" title="3 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/3094370410428166861" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/3094370410428166861" /><link rel="alternate" type="text/html" href="http://omicsomics.blogspot.com/2009/04/is-codon-optimization-bunk.html" title="Is Codon Optimization Bunk?" 
/><author><name>Keith Robison</name><uri>http://www.blogger.com/profile/04765318239070312590</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd="http://schemas.google.com/g/2005" name="OpenSocialUserId" value="08368724497474381730" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-36768584.post-6311758029146409572</id><published>2009-04-15T23:44:00.005-04:00</published><updated>2009-04-16T00:01:32.955-04:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="DNA sequencing" /><title type="text">Sequencing's getting so cheap...</title><content type="html">Here's a decidedly odd gedanken experiment which illustrates what next-gen sequencing is doing to the cost.&lt;br /&gt;&lt;br /&gt;A common way of deriving the complete sequence of a large clone is shotgun sequencing -- the clone is fragmented randomly into lots of little fragments.  With conventional (Sanger) sequencing these fragments are cloned, clones are picked and each clone sequenced.  By using a universal primer (or more likely primer pair; one read from each end), a lot of data can be generated cheaply.&lt;br /&gt;&lt;br /&gt;If you search online for DNA sequencing, a common advertised cost is $3.50 per Sanger read.  This probably doesn't include clone picking or library construction, but we'll ignore that.  Read lengths vary, but to keep the math simple let's say we average 500 nucleotide reads, which from my experience is not unreasonable, though very good operations will routinely get longer reads.  &lt;br /&gt;&lt;br /&gt;So, at that price and read length it's $7.00 per kilobase of raw data.  For shotgunning, collecting 10X-20X coverage is quite common and likely to give a reasonable final assembly, though higher is always better.  
At 10X coverage, that means for each 1Kb of original clone we'll spend $70.00.&lt;br /&gt;&lt;br /&gt;Suppose we have an old cosmid -- which is about 50Kb of DNA including the vector.  So shotgun sequencing it with Sanger sequencing, if building &amp; picking the library were free, would cost around $5250 for 15X coverage (50Kb x 15 x $7.00 per Kb).  Pretty cheap, right?&lt;br /&gt;&lt;br /&gt;Except, for a measly &lt;a href="http://omicsomics.blogspot.com/2009/03/next-level-in-genomics-term-papers.html"&gt;$4700 you can have next gen sequencing of it&lt;/a&gt; (and that actually includes library construction costs).  680Mb of next gen sequencing -- or well over 10,000X coverage of a 50Kb cosmid (680Mb / 50Kb is about 13,600).  Indeed, if you left the E.coli host DNA in you'd still have well in excess of 100X coverage of E.coli plus your cosmid.  So if you had multiple cosmids, you could actually get them sequenced for the same price, assuming you can distinguish them at the end (or they just assemble together anyway)!&lt;br /&gt;&lt;br /&gt;Sequencing so cheap you can theoretically afford 99% contamination!  Yikes!&lt;br /&gt;&lt;br /&gt;Of course, it's unlikely you'd really want to be so profligate. Rather than resequence E.coli, you could pack a lot of inserts in.  
But it does underline why Sanger sequencing is quickly being relegated to a few niches (for example, when you need to screen clones in synthetic biology projects) &amp; the price of used capillary sequencers is reputed to be going south of $30K.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/36768584-6311758029146409572?l=omicsomics.blogspot.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://omicsomics.blogspot.com/feeds/6311758029146409572/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=36768584&amp;postID=6311758029146409572" title="4 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/6311758029146409572" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/6311758029146409572" /><link rel="alternate" type="text/html" href="http://omicsomics.blogspot.com/2009/04/sequencings-getting-so-cheap.html" title="Sequencing's getting so cheap..." /><author><name>Keith Robison</name><uri>http://www.blogger.com/profile/04765318239070312590</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd="http://schemas.google.com/g/2005" name="OpenSocialUserId" value="08368724497474381730" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-36768584.post-4772135182182682290</id><published>2009-04-05T20:24:00.002-04:00</published><updated>2009-04-05T20:36:07.650-04:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="cancer" /><title type="text">Two Myeloma Patients</title><content type="html">TNG and I closed out the ski season a week ago.  It's some great time together, but it also ends up being at times a bit of a solitary activity, leaving lots of time to think.  
Sometimes it's when he's in a lesson, but in general skiing is contemplative for me.  It needs to be; if I think too hard about my technique I end up crashing spectacularly.  I guess when it comes to skiing, I'm a &lt;a href="http://www.bopsecrets.org/gateway/passages/chuang-tzu.htm"&gt;Taoist&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Ideally, I'm thinking about beautiful scenery or admiring TNG's developing technique.  But other thoughts invariably intrude, and more than a few times I find myself pondering multiple myeloma, as on a ski trip last year I met the second myeloma patient I ever knew.&lt;br /&gt;&lt;br /&gt;For the last several years at Millennium, myeloma occupied a lot of my time.  Because myeloma was the first disease where Millennium found success, this was natural.  It was also two-pronged.  One goal was to better understand Velcade in myeloma to further develop the drug in that disease, such as going for first line treatment.  But it was also seen as an important opportunity to learn how the drug works, so that intelligent decisions could be made about other cancers.&lt;br /&gt;&lt;br /&gt;At quarterly company meetings there were often myeloma patients onstage to tell their story.  One that particularly stuck in my mind was an oncology nurse who developed the disease, tried Velcade and almost immediately switched to something else; she experienced the full brunt of peripheral neuropathy while on Velcade and could not tolerate it.  In some ways this seems like a curious choice to inspire your troops, but it did exactly that.  We had done good things, but needed to do better.  And most people came out of those meetings pretty charged up.&lt;br /&gt;&lt;br /&gt;However, these were big presentations on stage, not face-to-face meetings.  Even though I occasionally got to rub shoulders with some of the clinical giants of the field, I never met any patients.  
Not surprising, but somewhat noteworthy.&lt;br /&gt;&lt;br /&gt;Last year we were away in New Hampshire for a ski weekend &amp; I struck up a conversation with a group in the lobby.  Somehow, it arose that one of their number had cancer, and I couldn't help but ask what sort &amp; it turned out it was myeloma.  As is common, someone who should have been enjoying their golden years was instead faced with this dread disease.&lt;br /&gt;&lt;br /&gt;Myeloma arises in most, if not all, cases when a DNA rearrangement occurs within a cell which creates antibodies.  Certain rearrangements are necessary for the correct creation of antibodies; these alterations lie at the heart of the system for creating a wide array of antibodies to defend against a wide array of invaders.  But sometimes the cut-and-paste glues the wrong two things together, and that can drive a myeloma.  Myeloma shows up most commonly late in life. Perhaps this is because the switching machinery loses its edge as life goes on, or perhaps it is just that eventually the wrong number comes up on the immunologic dice.&lt;br /&gt;&lt;br /&gt;My chance meeting in that lobby was particularly poignant as it had not been long before that I had met my first myeloma patient, and that was no random stranger.  Every year growing up the family would travel west to see my grandparents in Kentucky, and in one direction or the other we would stop by my aunt and uncle in Ohio.  My cousins are much older than I, so it was often just my aunt &amp; uncle and my family.  With no children to play with, I didn't play a lot of board games there.  But I had a lot of fun, as my uncle took me to the Reds or his garden patch or to see a train.  He'd murder me in croquet.  He took me to the print shop at his high school &amp; showed me how to print up a bunch of notepads.  
In later years, I'd feel humble after failing to explain to him what I did for a living, realizing I had slipped deep into the land of jargon.  And he'd try to convince me that no bumpkin from Avon could have written those plays; much more likely they came from the Earl of Oxford.&lt;br /&gt;&lt;br /&gt;Eventually, I flew the nest and I no longer saw them on an annual schedule, but he never missed a family wedding and I even made it to one family reunion.  I'd avidly read his Christmas letter to catch up with the rest of the clan.  Of course, you couldn't believe everything in it, as he was a notorious prankster.  Yes, those birthday checks with the crazy name were real ("&lt;a href="http://www.53.com/"&gt;Fifth Third Bank"&lt;/a&gt; -- who's going to believe that?), but he had not been truthful about his WW2 service -- the Army probably doesn't even have dedicated mess kit repair units.  No, he actually was a decorated signalman.  Only once did he tell a story that didn't happen stateside; it is a source of more than a little guilt for me that I can't remember any details.  It wasn't that I wasn't listening, but somehow it didn't stick.&lt;br /&gt;&lt;br /&gt;So it really hit home when I found out that this great man, who had given so much to me and others (he was recorded weekly reading for the blind), had been diagnosed with myeloma.  It seemed a bit ironic that now that I had a strong personal motivation, I was no longer working in the field.  But I did have a long phone chat with him &amp; tried to be useful, though he had been well briefed by his doctor and there wasn't a lot for me to do.  I mentioned things like stem cell transplants, and he remarked that he was eighty-four, and while he wasn't going to give up there were limits to what he would do; life quality was important.&lt;br /&gt;&lt;br /&gt;A goal of modern oncology is to have a patient die with their disease, not of their disease.  I do not know how to score this case.  
About a month and a half before our ski trip a cerebral hemorrhage felled my uncle.  Was this myeloma's fault?  Thalidomide's?  Or a not unlikely result for an elderly American in generally good shape?  We cannot cheat death forever, and something must end life.  On the other hand, in no way could myeloma be given a free pass -- it certainly gave him undeserved misery near the end.&lt;br /&gt;&lt;br /&gt;About a month and a half after the ski trip, I attended a very nice memorial service for him, where dozens of his former students turned out to testify how he had changed their lives.  We learned things we never knew about him (he played the tuba?) and remembered the good times.&lt;br /&gt;&lt;br /&gt;Whenever I think about myeloma now, I can't help but remember him.  I also remember that patient I met in the hotel, and sometimes I still can feel the wetness of his parting friendly gesture on my hand.  I didn't ask what medication he was on, but I can assume it wasn't Velcade or Revlimid.  Might he have been on thalidomide?  
If so, do standard poodles need to go through &lt;a href="http://www.thalomid.com/steps_program.aspx"&gt;STEPS&lt;/a&gt;?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/36768584-4772135182182682290?l=omicsomics.blogspot.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://omicsomics.blogspot.com/feeds/4772135182182682290/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=36768584&amp;postID=4772135182182682290" title="1 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/4772135182182682290" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/4772135182182682290" /><link rel="alternate" type="text/html" href="http://omicsomics.blogspot.com/2009/04/two-myeloma-patients.html" title="Two Myeloma Patients" /><author><name>Keith Robison</name><uri>http://www.blogger.com/profile/04765318239070312590</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd="http://schemas.google.com/g/2005" name="OpenSocialUserId" value="08368724497474381730" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-36768584.post-4495207175073899827</id><published>2009-04-04T23:04:00.002-04:00</published><updated>2009-04-04T23:25:38.150-04:00</updated><title type="text">Too Many Closings</title><content type="html">These are dire economic times, with the signs all around. In the town where I live, several stores have closed in the town center -- and my favorite imported goodies store has frighteningly bare shelves &amp; a nearly empty cheese cooler. 
I fear the worst.&lt;br /&gt;&lt;br /&gt;On a much bigger scale, &lt;a href="http://www.boston.com/news/local/massachusetts/articles/2009/04/04/times_co_threatens_to_shut_globe_seeks_20m_in_cuts_from_unions"&gt;today's Boston Globe carried a headline&lt;/a&gt; that the same newspaper may close unless significant labor concessions are made by its unions, confirming &lt;a href="http://www.time.com/time/business/article/0,8599,1883785,00.html"&gt;previous speculation&lt;/a&gt; that the Globe was hemorrhaging money from its owner, the New York Times. This week marked &lt;a href="v"&gt;another round of cutbacks in the newsroom&lt;/a&gt;, and it seems about every 6 months or so another redesign occurs to attempt to hide (and cope with) the shrinking number of pages.&lt;br /&gt;&lt;br /&gt;One of those redesigns has been the elimination of a separate business section, with instead the business section contained within the Metro section -- they are physically, but not logically, merged. And as many may know, yesterday's had &lt;a href="http://www.boston.com/yourtown/waltham/articles/2009/04/03/codon_devices_closing_as_financing_dwindles"&gt;an obituary for my recent employer&lt;/a&gt;, which also noted the recent or imminent demise of several other biotechs.&lt;br /&gt;&lt;br /&gt;It should be noted that the Globe seems a tad slow on the news. Codon, of course, &lt;a href="http://omicsomics.blogspot.com/2009/03/tga-codon.html"&gt;unloaded the majority of its staff&lt;/a&gt; two weeks ago. Okay, nobody squealed loudly. It is a bit more striking that the Globe article stated that the ultimatum to the unions had been delivered Thursday -- how could nobody at the newspaper have been tipped off to &lt;i&gt;that&lt;/i&gt;!&lt;br /&gt;&lt;br /&gt;The possibility of losing the Globe is very sad to me, as I truly have newspaper in my blood. 
No, I don't mean my family has a history of careers in the industry (though we do seem to dabble in it); I mean I've been reading the newspaper since I can remember, so I've certainly assimilated a good deal of it into my cellular structures! I too dabbled in the industry, delivering for one paper (which ended operations shortly after I quit) and doing high school sports photography and reporting for two others (one of which also appears to be bust). I also edited my high school's newspaper, so I can take a tiny claim to once being an ink-stained wretch (is it possible to be stained with bits?). All through college and beyond, I've always had a subscription to the daily paper. While various deficiencies in local delivery have in recent years tested my loyalty, I still subscribe. Perhaps not much longer -- but not by my choice.&lt;br /&gt;&lt;br /&gt;I do want to allay any concerns that this particular enterprise might be headed for a similar fate. Fear not, dear readers! While revenue has stayed completely flat, in these times that must be considered an accomplishment. The Omics! Omics! balance sheet remains out of the red -- as it always has. 
And just think -- any future revenue would mean infinite revenue growth!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/36768584-4495207175073899827?l=omicsomics.blogspot.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://omicsomics.blogspot.com/feeds/4495207175073899827/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=36768584&amp;postID=4495207175073899827" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/4495207175073899827" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/4495207175073899827" /><link rel="alternate" type="text/html" href="http://omicsomics.blogspot.com/2009/04/too-many-closings.html" title="Too Many Closings" /><author><name>Keith Robison</name><uri>http://www.blogger.com/profile/04765318239070312590</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd="http://schemas.google.com/g/2005" name="OpenSocialUserId" value="08368724497474381730" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-36768584.post-7400480106648327571</id><published>2009-04-01T00:01:00.001-04:00</published><updated>2009-04-01T00:01:49.026-04:00</updated><title type="text">New DNA Service Makes Dates -- Via Dogs!</title><content type="html">Ever notice how a couple sometimes resemble the family pet?  A new startup company believes that this is the secret to dating success, and that DNA typing is a way to guarantee romantic bliss.&lt;br /&gt;&lt;br /&gt;Date My Dog's DNA will test both you and your dog's DNA and then apply proprietary computer algorithms to find your perfect match.  
Dogless individuals can also be typed, though they will only be matched with someone who has registered their dog in the service.  &lt;br /&gt;&lt;br /&gt;Why should this work?  According to President and CEO Jack Russell, our choice of dog is driven by fundamental personality traits.  By examining the DNA, traits can be matched between dog and human.  "While it is useful for purebred canines, the real power comes with mixed breeds, as you may not realize which tendencies you are keying in to", says Russell.  "Just imagine", he continues, "all the painful breakups due to date-dog incompatibilities; we believe we can prevent most of these".&lt;br /&gt;&lt;br /&gt;Can the technology be put to other uses?  Vice President for Marketing K. Charles Cavalier suggests that once pre-conception DNA screening becomes routine, they plan to move into this area.  Would this be eugenics hidden behind a wagging tail?  Replies Cavalier: "We think each couple will choose very differently.  For example, if you have two border collies you might enjoy a bright but hyperactive child.  On the other hand, if you have a bloodhound you might prefer a quiet, contemplative child who likes to observe the world."  Continues Cavalier "We think parent-child bonding is critical to a child's mental and social development.  
You've already bonded with your dog; why not leverage that bond into a better one with your child?"&lt;br /&gt;&lt;br /&gt;Seed funding for the company has been provided by the Kaltnassnase Fund.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/36768584-7400480106648327571?l=omicsomics.blogspot.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://omicsomics.blogspot.com/feeds/7400480106648327571/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=36768584&amp;postID=7400480106648327571" title="2 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/7400480106648327571" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/7400480106648327571" /><link rel="alternate" type="text/html" href="http://omicsomics.blogspot.com/2009/04/new-dna-service-makes-dates-via-dogs.html" title="New DNA Service Makes Dates -- Via Dogs!" /><author><name>Keith Robison</name><uri>http://www.blogger.com/profile/04765318239070312590</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd="http://schemas.google.com/g/2005" name="OpenSocialUserId" value="08368724497474381730" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-36768584.post-991990548109900772</id><published>2009-03-24T22:41:00.007-04:00</published><updated>2009-03-26T09:48:06.516-04:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="enzyme engineering" /><title type="text">Codon's Type IIS Meganuclease</title><content type="html">When I joined Codon Devices, I swore I would not use this space to shamelessly tout any results from the company.  It turned out my resolve was never tested.  
It's not that there weren't interesting results being generated in the company, but that in one way or another they never became public.  Some results were never meant to be public, but were within collaborations, whereas some intended to be public got held up by one snag or another.&lt;br /&gt;&lt;br /&gt;Perhaps the universe does like to play subtle jokes on us.  Now that I'm out, so is the &lt;a href="http://nar.oxfordjournals.org/cgi/content/full/gkp182v1"&gt;first publication from the company&lt;/a&gt;, describing the engineering of a Type IIS restriction enzyme with a very large recognition sequence.&lt;br /&gt;&lt;br /&gt;Type IIS restriction endonucleases are handy for many purposes, but particularly for gene construction techniques.  Whereas most restriction enzymes recognize and cut at the same site, Type IIS enzymes recognize a specific site but then cut a precise distance away (or cut at perhaps two different offsets; note &lt;a href="http://science-mag.aaas.org/cgi/data/1117389/DC1/1"&gt;Fig 2 of this reference&lt;/a&gt;).  This is handy because it allows one to design two pieces to come together (via the sticky overhangs generated by the enzyme) but without the recognition sequence in the final product.  Hence, Type IIS enzymes can allow virtually any sequence to be built.&lt;br /&gt;&lt;br /&gt;The catch, of course, is that it is challenging to build in this fashion a sequence which itself contains the Type IIS recognition sequence.  Ideally, these sequences would be very long and hence unlikely to appear by chance.  Unfortunately, the known Type IIS enzymes almost all have 5 or 6 basepair long recognition sequences, which are not terribly rare once you get in the multiple kilobase range, and are certainly not rare if you want to build chromosome-sized DNA.  &lt;br /&gt;&lt;br /&gt;So the goal of a number of efforts has been to build a Type IIS restriction enzyme which has a very long recognition sequence.  
Enzymes called homing endonucleases have huge recognition sequences, with effective lengths of 12 or more basepairs (the actual lengths are greater, but there are also some positions which are not fully fixed to a particular nucleotide -- hence the term effective length).  The advance of Lippow et al is that a new level of precision was obtained in the cutting sites, a level of precision compatible with gene engineering.&lt;br /&gt;&lt;br /&gt;In a sense, the problem is analogous to that of a K9 unit.  The handler has a potentially vicious dog which she would like to apply precisely.  Give the dog too short a leash and you can't deploy its teeth; give it too long a leash and the teeth may sink into places other than where you want them to.&lt;br /&gt;&lt;br /&gt;So what Lippow et al did is build different protein linkers to tie the DNA recognition domain (handler) to the cleavage domain (dog) from the Type IIS enzyme FokI.  By run-off Sanger sequencing, in which the polymerase is allowed to extend to the end of a DNA strand, they showed that cutting is precise, particularly for one of the specific enzymes generated.  The dog, alas, is not under complete control; some random off-site cutting is observed.  But it is a step forward.&lt;br /&gt;&lt;br /&gt;One last hitch: to be particularly useful, one really needs at least two Type IIS meganucleases, and ideally many.  Alas, this paper provides only one -- but it is a roadmap to building more, as there are a number of other homing endonucleases which could be potentially used for recognition modules.  
Alternatively, a number of papers have generated Sce-I variants with different recognition specificities, so introducing these mutations into the CdnI enzyme reported here should allow a new set of Type IIS meganuclease specificities.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/36768584-991990548109900772?l=omicsomics.blogspot.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://omicsomics.blogspot.com/feeds/991990548109900772/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=36768584&amp;postID=991990548109900772" title="4 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/991990548109900772" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/991990548109900772" /><link rel="alternate" type="text/html" href="http://omicsomics.blogspot.com/2009/03/codons-type-iis-meganuclease.html" title="Codon's Type IIS Meganuclease" /><author><name>Keith Robison</name><uri>http://www.blogger.com/profile/04765318239070312590</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd="http://schemas.google.com/g/2005" name="OpenSocialUserId" value="08368724497474381730" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-36768584.post-2539769105318882912</id><published>2009-03-23T09:22:00.002-04:00</published><updated>2009-03-23T10:26:28.338-04:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="cancer" /><title type="text">JAK2 haplotype promotes JAK2 mutation</title><content type="html">An interesting trio (&lt;a href="http://www.nature.com/ng/journal/vaop/ncurrent/abs/ng.342.html"&gt;Kilpivaara et al&lt;/a&gt;, &lt;a 
href="http://www.nature.com/ng/journal/vaop/ncurrent/abs/ng.334.html"&gt;Jones et al&lt;/a&gt;, &lt;a href="http://www.nature.com/ng/journal/vaop/ncurrent/abs/ng.341.html"&gt;Olcaydu et al&lt;/a&gt;) of abstracts from the Nature Genetics Advance Online Publications site (alas, I don't have fulltext access without traipsing in to MIT or Harvard to use the library, but more on that soon).&lt;br /&gt;&lt;br /&gt;JAK2 (Janus Kinase 2) is a protein kinase important in hematopoietic cell function, and a particular mutation was shown several years ago to result in several distinct but related myeloproliferative disorders.  &lt;br /&gt;&lt;br /&gt;In these papers, particular haplotypes (given only the abstracts, it's impossible to determine if there is complete agreement on which ones) lead to a higher risk of the disease-causing V617F mutation.  What is quite striking is that the mutation occurs in cis to the haplotype, that is to say the same chromosome with the haplotype tends to be the one bearing the mutation.&lt;br /&gt;&lt;br /&gt;The explanation favored by the papers appears to be that the haplotype somehow creates a favorable DNA context for causing the mutation.  If the mutations showed up in trans (on the other chromosome) just as often, one might contemplate a mechanism whereby the haplotype somehow increases the selective advantage of V617F -- perhaps, for example, by causing incorrect JAK2 expression.&lt;br /&gt;&lt;br /&gt;It will be fascinating to see this story play out -- of what DNA mutational or repair mechanism does the haplotype shift the balance?  And, now that this is precedented, you can be sure there will be a lot of searching for other examples.  A quick screen would be to look for mutational haplotypes which contain known oncogenic mutations, and then go screening somatic samples for those haplotypes.  
Of course, with sequencing getting so cheap, the not too distant future will have lots of paired somatic and tumor &lt;em&gt;complete genomes&lt;/em&gt; to compare.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/36768584-2539769105318882912?l=omicsomics.blogspot.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://omicsomics.blogspot.com/feeds/2539769105318882912/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=36768584&amp;postID=2539769105318882912" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/2539769105318882912" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/2539769105318882912" /><link rel="alternate" type="text/html" href="http://omicsomics.blogspot.com/2009/03/jak2-haplotype-promotes-jak2-mutation.html" title="JAK2 haplotype promotes JAK2 mutation" /><author><name>Keith Robison</name><uri>http://www.blogger.com/profile/04765318239070312590</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd="http://schemas.google.com/g/2005" name="OpenSocialUserId" value="08368724497474381730" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-36768584.post-1732230616431578635</id><published>2009-03-20T11:17:00.003-04:00</published><updated>2009-03-20T11:39:13.224-04:00</updated><title type="text">TGA Codon</title><content type="html">&lt;a href="http://4.bp.blogspot.com/_N2AOZejgjyA/ScO4jU2tShI/AAAAAAAAAC8/ROEJ6TFyFMY/s1600-h/DSC01931rot.JPG"&gt;&lt;img style="float:left; margin:0 10px 10px 0;cursor:pointer; cursor:hand;width: 150px; height: 200px;" src="http://4.bp.blogspot.com/_N2AOZejgjyA/ScO4jU2tShI/AAAAAAAAAC8/ROEJ6TFyFMY/s200/DSC01931rot.JPG" border="0" 
alt=""id="BLOGGER_PHOTO_ID_5315294902260419090" /&gt;&lt;/a&gt;&lt;br /&gt;Well, after a few successful readthroughs I've been hit with my career's &lt;a href="http://en.wikipedia.org/wiki/Release_factor"&gt;Release Factor&lt;/a&gt; again.&lt;br /&gt;&lt;br /&gt;Looking on the bright side, this will give me time &amp; focus to write here and to tackle two invited articles.  &lt;br /&gt;&lt;br /&gt;I'm also entertaining short-term consulting gigs in the Boston area (or, with travel expenses included, in cities with resident &lt;a href="http://en.wikipedia.org/wiki/Panda"&gt;Ailuropoda melanoleuca&lt;/a&gt; :-)  But that's just a stop-gap; what I'd really like is a permanent position to again tackle interesting scientific questions in the interface between biology and computing.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/36768584-1732230616431578635?l=omicsomics.blogspot.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://omicsomics.blogspot.com/feeds/1732230616431578635/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=36768584&amp;postID=1732230616431578635" title="1 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/1732230616431578635" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/1732230616431578635" /><link rel="alternate" type="text/html" href="http://omicsomics.blogspot.com/2009/03/tga-codon.html" title="TGA Codon" /><author><name>Keith Robison</name><uri>http://www.blogger.com/profile/04765318239070312590</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd="http://schemas.google.com/g/2005" name="OpenSocialUserId" value="08368724497474381730" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" 
url="http://4.bp.blogspot.com/_N2AOZejgjyA/ScO4jU2tShI/AAAAAAAAAC8/ROEJ6TFyFMY/s72-c/DSC01931rot.JPG" height="72" width="72" /><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-36768584.post-7255834632703129190</id><published>2009-03-18T22:02:00.004-04:00</published><updated>2009-03-18T22:38:46.179-04:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="K12 education" /><title type="text">One helix to teach them all, and in the taxonomy bind them?</title><content type="html">I originally saw this last summer in some free tourist guide, and neglected to write on it, but a little &lt;a href="http://209.85.173.132/search?q=cache:Nf-5jSqm8i8J:www.alleghany.k12.va.us/TRT_elem/Ppts/PptGames/Smart5thgrader.ppt"&gt;googling verified my memory&lt;/a&gt;.  There is a game show on one of the channels now called "Are you smarter than a 5th grader", in which adults go up against 5th graders in a quiz show format, with the questions supposedly representative of that sample of elementary school.  When I saw this particular item, my eyes rolled at first but then I pondered some more -- and realized that while I'd probably stick to my original position, it is a bit more nuanced than my first reaction.&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;Name 3 of the 5 kingdoms.&lt;/blockquote&gt;&lt;br /&gt;&lt;br /&gt;Okay, this was enough to generate an autonomic response.  Back in high school we probably spent a good chunk of a class going over various Kingdom proposals.  I don't have that textbook, but one of a similar stratum would be my freshman bio textbook, Biological Science by Keeton &amp; Gould, 4th Edition.  K&amp;G (p.1019) outlines &lt;span style="font-style:italic;"&gt;eight&lt;/span&gt; different kingdom systems, ranging from 2 to 8 kingdoms.&lt;br /&gt;&lt;br /&gt;Now, of course, one must ask what exactly is a kingdom? 
Ideally a kingdom would consist of a bunch of organisms with a common theme (which wouldn't be simply the lack of all the themes of other kingdoms), all organisms with that theme would be in the kingdom, and no extant organism outside that kingdom would trace its ancestry to a member of that kingdom.  At least, off the cuff, that is the definition I would give.&lt;br /&gt;&lt;br /&gt;So which one induced a reflex?  It is the five kingdom system: Plant, Fungi, Protists, Animals &amp; Monera, which it turns out is the one Keeton &amp; Gould used for organizing their survey of the living world.&lt;br /&gt;&lt;br /&gt;Now, it isn't an awful system, particularly back in the late '80s when I had it.  Monera are all the single-celled thingies which lack a nucleus.  Eukaryotes are what we know best, so they are subdivided into single celled (Protists), multi-cellular with cell walls &amp; photosynthesis (Plants), multi-cellular, with cell walls but never photosynthetic (Fungi) and multi-cellular with no walls (Animals).&lt;br /&gt;&lt;br /&gt;In that era, issues with these groupings were certainly recognized and taught.  Yeasts clearly were related to Fungi, so they went there despite unicellularity.  Some plants lack photosynthesis (e.g. &lt;a href="http://en.wikipedia.org/wiki/Dodder"&gt;dodder&lt;/a&gt;), but clearly this is a late loss and they belong in Plants.  Protists is a handy way to lasso all sorts of traditional problems such as Euglena, which both photosynthesizes and moves.&lt;br /&gt;&lt;br /&gt;But, what was just emerging when I was taught these things, but is now quite evident, is that the non-nucleated world is really two worlds, Eubacteria and Archaea.  While they both have many similarities (such as mostly circular chromosomes), they are very, very different in other fundamental cellular processes, such as RNA transcription.  
Plus, now we have DNA &amp; RNA phylogenetic methods which show them to have diverged &lt;span style="font-style:italic;"&gt;very&lt;/span&gt; long ago. &lt;br /&gt;&lt;br /&gt;There are other issues DNA methods have illuminated.  Protists are not an evolutionarily coherent group but are instead a mishmash of various lineages ("&lt;a href="http://en.wikipedia.org/wiki/Polyphyletic"&gt;polyphyletic&lt;/a&gt;").  Eukaryotes as a whole don't fit a simple tree lineage, due to multiple endosymbiont captures resulting in organelles such as mitochondria and chloroplasts (and perhaps more). &lt;br /&gt;&lt;br /&gt;Which raises the question: what should we be teaching 5th graders?  My reflex reaction is that we shouldn't teach them things they'll need to unlearn later, and the Monera kingdom concept is just not a very good one in the light of molecular phylogenies.  But, what my further pondering brought up is one goal of science education is to teach students the methods of science rather than just rote facts.  Given a microscope or some photographs, it is pretty easy to teach a young student how to classify organisms into the 5 kingdom system.  Trying to explain why archaea and eubacteria should be in different groups isn't so easy.  Okay, a lot of archaea have pretty weird lifestyles (insanely low pH, even more insanely high heavy metal content, boiling water, etc), but not all do.  Just being strange to us isn't really a useful way to categorize.  &lt;br /&gt;&lt;br /&gt;On the other hand, perhaps at least the notion of molecular classification can be introduced early.  Granted, it's an N of 1, but I've successfully shown that you can teach the &lt;a href="http://omicsomics.blogspot.com/2008/07/blog-post.html"&gt;concept to a 3rd grader&lt;/a&gt;.  It's also something which can be easy to diagram out &amp; count -- with (obviously!) only a subset of informative positions.  
And in the end, wouldn't that be the best science lesson of all -- that things which look superficially alike may have an underlying, nearly hidden great difference?&lt;br /&gt;&lt;br /&gt;Of course, the hardest part of any change is getting change.  It appears that a generation of science teachers have been taught the 5 kingdom system, and so will need to be updated.  Numerous textbooks probably also encapsulate this archaic (but not archean! :-) concept.  Probably the hardest to change will be those statewide curriculum standards or standardized tests which contain these phylogenetic fossils.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/36768584-7255834632703129190?l=omicsomics.blogspot.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://omicsomics.blogspot.com/feeds/7255834632703129190/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=36768584&amp;postID=7255834632703129190" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/7255834632703129190" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/7255834632703129190" /><link rel="alternate" type="text/html" href="http://omicsomics.blogspot.com/2009/03/one-helix-to-teach-them-all-and-in.html" title="One helix to teach them all, and in the taxonomy bind them?" 
/><author><name>Keith Robison</name><uri>http://www.blogger.com/profile/04765318239070312590</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd="http://schemas.google.com/g/2005" name="OpenSocialUserId" value="08368724497474381730" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-36768584.post-3148201274367990468</id><published>2009-03-08T23:47:00.005-04:00</published><updated>2009-03-08T23:56:13.410-04:00</updated><title type="text">The next level in genomics term papers</title><content type="html">I've been intrigued for a few months now since hearing about a St. Louis company called &lt;a href="http://cofactorgenomics.com/"&gt;Cofactor Genomics&lt;/a&gt;.  Right on their front webpage they advertise they will generate &amp; assemble 680Mb of sequence (from an Illumina machine) for the paltry sum of $4.7K.&lt;br /&gt;&lt;br /&gt;Wow!  That would fit on my credit card when I was a graduate student (though it would have been a few months stipend). 680Mb is 100+X coverage of an E.coli-class genome, or about 50X coverage of Saccharomyces.  It's even well over 0.5X coverage of an awful lot of interesting eukaryotes.&lt;br /&gt;&lt;br /&gt;As an aside, I feel obligated to stress that I don't have any personal stake in, or direct relationship with, Cofactor Genomics.  I also have no experience with them or any of their competitors.  It's just the ease of accessing their pricing matrix makes them easy to talk about. &lt;br /&gt;&lt;br /&gt;At those prices, the idea of doing my own personal genome project can't be easily shooed away.  Not a Personal Genome Project -- I worry I'd develop &lt;a href="http://judson.blogs.nytimes.com/2009/03/03/guest-column-genome-mania/?ref=opinion"&gt;genomania&lt;/a&gt; -- but some small genome sequenced on my whim.  
There's probably still not a shortage of interesting genomes in species I could easily &amp; safely grow up with some forbearance of my shop's management or at a friendly academic. There must be some left; there are even some industrially-interesting E.coli strains that seem to lack public sequences.  However, even if it wouldn't violate my town's zoning laws to do it in my basement, neither growing biological samples nor the $5K budget would fly with my spouse.&lt;br /&gt;&lt;br /&gt;So I'll float a different idea.  My only wish is that anyone who tries it post back here, and if you're already doing the same thing I invite your response as well.  If I can't do it, why not some class?&lt;br /&gt;&lt;br /&gt;Now $5K isn't chicken feed.  I'm sure that is far beyond the typical budget for lab experiments in a college class, let alone a high school.  Maybe a donor could step in, but these days that's a particularly tough challenge to find.  But suppose the cost were spread over a lot of students?&lt;br /&gt;&lt;br /&gt;One scenario would be for a very large university to make this the project for an entire class.  A really huge state school I would guess could have 500+ students a year taking first-year biology.  Now we're talking less than $10/student -- perhaps still a significant hit (what is a typical per student budget for such a course?).  Each student would get about 1/500th of the genome as their very own research project. &lt;br /&gt;&lt;br /&gt;At a smaller school, could a genome project become a departmental initiative?  A bioinformatics class could set up the analysis pipeline &amp; develop reporting tools.  Biochemistry class could map the ORFs to the known biochemical pathways and identify both missing pathways and predicted novel (to the species) enzyme activities.  Genetics classes could focus on operon structure or identifying possible regions recently transferred horizontally from another species.  
Evolution classes could tackle that, or build a bazillion gene trees.  It's a bit of a stretch to work this into a human physiology curriculum, though a comparative look at how another biological system manages homeostasis isn't completely absurd.&lt;br /&gt;&lt;br /&gt;Of course, when it comes time to publish it will be a very long author list!&lt;br /&gt;&lt;br /&gt;I think I've heard of a genome project being run as an undergraduate effort, but I'm guessing a lot of that involved doing the actual sequencing.  While there's merit to that, these days even with free labor, large-scale Sanger sequencing isn't cost competitive.  Perhaps some departments have one of the next-gen machines &amp; are willing to let some undergraduates play with them -- but I'm guessing that's pretty rare (like a &lt;a href="http://rebase.neb.com/rebase/enz/NotI.html"&gt;NotI&lt;/a&gt; site in an AT-rich genome).&lt;br /&gt;&lt;br /&gt;Will sequencing costs ever crash low enough that someone will sequence a genome for a grade school science fair project?  
I'm not holding my breath, but I certainly wouldn't rule it out.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/36768584-3148201274367990468?l=omicsomics.blogspot.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://omicsomics.blogspot.com/feeds/3148201274367990468/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=36768584&amp;postID=3148201274367990468" title="7 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/3148201274367990468" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/3148201274367990468" /><link rel="alternate" type="text/html" href="http://omicsomics.blogspot.com/2009/03/next-level-in-genomics-term-papers.html" title="The next level in genomics term papers" /><author><name>Keith Robison</name><uri>http://www.blogger.com/profile/04765318239070312590</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd="http://schemas.google.com/g/2005" name="OpenSocialUserId" value="08368724497474381730" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">7</thr:total></entry><entry><id>tag:blogger.com,1999:blog-36768584.post-3137297009639192093</id><published>2009-03-03T19:40:00.003-05:00</published><updated>2009-03-03T20:04:21.740-05:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="cancer" /><title type="text">MGH To Mutation-Type All Cancer Patients</title><content type="html">Today's Boston Globe &lt;a href="http://www.boston.com/news/local/massachusetts/articles/2009/03/03/mgh_to_use_genetics_to_personalize_cancer_care/"&gt;carried a front page item&lt;/a&gt; that Massachusetts General Hospital is planning to screen all cancer patients for a battery of about 110 common cancer mutations in 13 genes.  
MGH is apparently the first hospital to go this in depth on every patient.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;This is an exciting push forward into personalized medicine, and it makes sense for a teaching hospital such as MGH to leap into the void.  This sort of typing makes intuitive sense, but (as the article states) its clinical value remains to be proven.  A few patients, such as one profiled in the piece, will have radical changes in treatment which benefit the patient -- in the example a woman had a relatively rare kinase fusion (to EML4-ALK) for which an investigational drug was available -- and she responded spectacularly.  But for many patients, the mutations found won't change care because there isn't a known way to target their mutation spectrum.&lt;br /&gt;&lt;br /&gt;But, the huge value will be longer term as MGH builds a database of mutations and responses to treatment -- such a database will almost certainly provide new ideas for treatment, ideas which a research-focused hospital will be willing &amp; able to try out.  As more mutations are linked to cancer outcomes and screening costs come down, surely the panel will be expanded.  MGH is also presumably planning to screen patients on both initial diagnosis and after relapses, so an increasingly rich database of mutations appearing during cancer progression will emerge.&lt;br /&gt;&lt;br /&gt;It will be interesting to see how many other hospitals here -- and elsewhere -- follow.  Boston has a small herd of top-notch hospitals and most (if not all) have significant cancer centers (with one, Dana Farber, completely focused on the subject).  Ideally the results of many such screens could be pooled into one or more common databases, with of course the need to protect patient confidentiality.&lt;br /&gt;&lt;br /&gt;One barrier may be cost.  The Globe article pegs it at $2000, and states it is unclear if insurers will pay -- in the past they have demanded proof of clinical value.  
While that isn't an indefensible position, it would be in their self-interest to chip in -- perhaps a prorated amount.  First, it's lousy PR to not pay for diagnostics that are likely to work (and the drumbeat for single-payer is pretty much constant in the same paper).  Second, the tests are likely to provide useful information some fraction of the time -- and in those cases may provide cost savings.  MGH is apparently considering eating the cost or asking the patients to kick some in.&lt;br /&gt;&lt;br /&gt;MGH may also be setting the price point for such services.  $2K isn't far from the $4K that Complete Genomics claims it will be able to run a complete genome in the not-too-distant-future.  $2K probably is in the ballpark already for sequencing off capture arrays.  &lt;br /&gt;&lt;br /&gt;Of course, budgets for diagnostics aren't infinite.  Will such initiatives be knocking elbows with other genomics-driven diagnostics, such as the existing array-based assays (e.g. &lt;a href="http://www.genomichealth.com/OncotypeDX/Index.aspx?SId=33"&gt;OncotypeDX&lt;/a&gt;, &lt;a href="http://row.agendia.com/en/cupprint.html"&gt;CupPrint&lt;/a&gt;)?  Will greater value come from methylation profiling or other assays which evaluate markers not available to current sequencing technologies?  
Time will tell.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/36768584-3137297009639192093?l=omicsomics.blogspot.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://omicsomics.blogspot.com/feeds/3137297009639192093/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=36768584&amp;postID=3137297009639192093" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/3137297009639192093" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/3137297009639192093" /><link rel="alternate" type="text/html" href="http://omicsomics.blogspot.com/2009/03/mgh-to-mutation-type-all-cancer.html" title="MGH To Mutation-Type All Cancer Patients" /><author><name>Keith Robison</name><uri>http://www.blogger.com/profile/04765318239070312590</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd="http://schemas.google.com/g/2005" name="OpenSocialUserId" value="08368724497474381730" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-36768584.post-9125451870677913540</id><published>2009-02-28T21:31:00.002-05:00</published><updated>2009-02-28T22:27:24.354-05:00</updated><title type="text">Time to deal with the IRS (Internal Reaction Service)</title><content type="html">It's that time of year again -- when those of us in the U.S. must deal with numbered forms and lettered schedules.  In this light, I wish to share a recent piece of correspondence:&lt;br /&gt;&lt;br /&gt;Dear Dr. Robison:&lt;br /&gt;&lt;br /&gt;After great difficulty (must your handwriting be so atrocious?) I have reviewed the accounts at your business enterprise.  
I regret to inform you that two of your accounts, with ATP Corp and NAD(P)H Ltd, are grossly out of balance.  While you are running a deficit with the former and a surplus with the latter, as we have discussed previously these separate accounts cannot be merged.  Your enterprise is doomed to failure (and I think it goes without saying that some sort of Madoffian scheme will not be countenanced by me).  You must bring these into balance or your enterprise would fail, never mind the horror of trying to explain this in an audit.&lt;br /&gt;&lt;br /&gt;I realize I am not qualified to comment on the technical aspects of your effort.  However, may I suggest you get out of the lab more and get some fresh air?  Perhaps some oxygen would stimulate your activity in a most productive way?&lt;br /&gt;&lt;br /&gt;Sincerely,&lt;br /&gt;&lt;br /&gt;Colin Escherich, C.P.A.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/36768584-9125451870677913540?l=omicsomics.blogspot.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://omicsomics.blogspot.com/feeds/9125451870677913540/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=36768584&amp;postID=9125451870677913540" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/9125451870677913540" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/9125451870677913540" /><link rel="alternate" type="text/html" href="http://omicsomics.blogspot.com/2009/02/time-to-deal-with-irs-internal-reaction.html" title="Time to deal with the IRS (Internal Reaction Service)" /><author><name>Keith Robison</name><uri>http://www.blogger.com/profile/04765318239070312590</uri><email>noreply@blogger.com</email><gd:extendedProperty 
xmlns:gd="http://schemas.google.com/g/2005" name="OpenSocialUserId" value="08368724497474381730" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-36768584.post-731225454318240166</id><published>2009-02-08T21:10:00.002-05:00</published><updated>2009-02-08T21:35:36.990-05:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="genome sequencing" /><title type="text">Any Genome Sequence You Want, As Long As It's Human</title><content type="html">It's been interesting reading dispatches coming from bloggers &lt;a href="http://www.massgenomics.org/"&gt;Dan Kobolt&lt;/a&gt; and &lt;a href="http://scienceblogs.com/geneticfuture/"&gt;Daniel MacArthur&lt;/a&gt; who are attending the Marco Island conference, the big yearly confab on bleeding edge sequencing technology. How have I resisted this conference for so long, especially with the &lt;a href="http://scienceblogs.com/geneticfuture/2009/02/what_a_difference_two_days_mak.php"&gt;climate draw&lt;/a&gt;???&lt;br /&gt;&lt;br /&gt;One company that is again receiving a lot of attention is &lt;a href="http://www.completegenomics.com/"&gt;Complete Genomics&lt;/a&gt;, which is proposing to build a set of sequencing centers to sequence human genomes at $5K a pop.  What is striking is that their business model is to sequence &lt;span style="font-style:italic;"&gt;only&lt;/span&gt; human genomes and nothing else, which particularly surprised &lt;a href="http://scienceblogs.com/geneticfuture/2009/02/game_on_sequencing_companies_d.php"&gt;Daniel MacArthur at Genetic Futures&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;As a biologist and someone fascinated with all genomes, such a policy is not a welcome thought.  
But, as someone who has worked in an industrial high-throughput production facility, I think I can reverse engineer the logic pretty well (I have no connections to or inside information from the company).&lt;br /&gt;&lt;br /&gt;Why would you want to do this?  Simplicity.  By focusing on only a single genome, all sorts of simplifications are created.  Complexity costs significant money &amp; time, and it is often what seems trivial that ends up being very costly.  Just allowing a second genome in the door creates all sorts of additional work on the software side, and if that second source requires different sample prep that's an additional headache on the lab side.&lt;br /&gt;&lt;br /&gt;Having only one genome kicking around also creates some interesting opportunities for quality control both for each sample and for the whole factory (which is what they are talking about building: a sequencing factory).  One genome means only one reference sequence to compare against &amp; one set of pathological problems for their assembly algorithm to be fortified against.  One genome also means that if you see another genome in your data, you know something is wrong -- and if you see the same one genome repeatedly you may have a factory-wide problem.&lt;br /&gt;&lt;br /&gt;"Any color you want so long as it is black" got Ford to the top of the U.S. automotive heap, but it didn't keep them there -- I believe that GM's offering colors helped push them into first. So will the market support Complete's vision?  I think it can.  &lt;br /&gt;&lt;br /&gt;Complete is apparently talking about running a million genomes per year.  At $5K each, that would be &lt;span style="font-weight:bold;"&gt;$5 billion&lt;/span&gt;, some serious cash flow.  I don't know if they've estimated the market correctly, but it doesn't seem ridiculous.  
If a large fraction of the world's wealthy decide to sequence their genomes (and their children's too) and if sequencing tumors becomes semi-routine, a few million human genomes a year doesn't seem totally ridiculous.  Of course, Complete would have to fight with all the other players for a share.&lt;br /&gt;&lt;br /&gt;That implies a question: what comparable markets are they giving up?  I'd love to see broader "zoonomics", where we go through the living world sequencing everything,  but that's all going to be grant funded.  Smaller genomes may also be completely mismatched with this sort of technology -- without some sort of multiplexing (complexity!).  Similarly, it's not easy to see some big commercial market for metagenomics -- it will remain fascinating &amp; there's no end to the ecological niches to explore, but who in the private sector is going to pony up major money for it?  Oncogenic mouse models will supply lots of tumors for sequencing, but again probably not a big private sector activity.&lt;br /&gt;&lt;br /&gt;The one area I can almost envision is sequencing valuable livestock or agricultural lines to understand their complete makeup.  If this were done not only for parentals but for offspring in breeding programs, then perhaps a big market would be generated.  But, is it really worth sequencing to completion or will some cheaper technology for skimming the surface suffice?  
If there is a market, then a logical business direction for Complete might be to do a joint venture or spinout focusing on alternate genomes -- but either the prize would need to be big or the one genome business model failing for that to be worth diverting attention.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/36768584-731225454318240166?l=omicsomics.blogspot.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://omicsomics.blogspot.com/feeds/731225454318240166/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=36768584&amp;postID=731225454318240166" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/731225454318240166" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/731225454318240166" /><link rel="alternate" type="text/html" href="http://omicsomics.blogspot.com/2009/02/any.html" title="Any Genome Sequence You Want, As Long As It's Human" /><author><name>Keith Robison</name><uri>http://www.blogger.com/profile/04765318239070312590</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd="http://schemas.google.com/g/2005" name="OpenSocialUserId" value="08368724497474381730" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-36768584.post-2430604561813068417</id><published>2009-02-04T22:17:00.004-05:00</published><updated>2009-02-04T22:41:19.740-05:00</updated><title type="text">Trading off an argument from Scrubs</title><content type="html">I was watching &lt;a href="http://abc.go.com/primetime/scrubs/index?pn=index"&gt;Scrubs (My New Role)&lt;/a&gt; last night &amp; there was an exchange that I think should be a discussion point for everyone involved in medicine, 
though it wasn't the point the script writers really hammered on.&lt;br /&gt;&lt;br /&gt;The setup is that a nurse was trying to get a doctor to change the antibiotic for a patient.  The nurse's argument was that azithromycin required once daily dosing and would free her up for doing other things, whereas the doctor's selection of clindamycin meant 4 times daily dosing.  The doctor replied in a condescending way that she had gone to med school, the nurse hadn't, and therefore the script would stand as written.&lt;br /&gt;&lt;br /&gt;Now, the theme of the episode was this sort of professional interaction -- where someone higher on the professional totem pole disrespects someone lower.  An important issue, to be sure.  But I think, especially in these days when we are more than ever concerned about the cost of healthcare &amp; how to deliver effective healthcare economically, the specific argument deserves more attention.&lt;br /&gt;&lt;br /&gt;Now, I'll confess I haven't gone to med school &amp; I have no particular expertise in antibiotics, other than practical experience.  For example, my wife is allergic to a huge number of them, TNG broke out with Augmentin, doxycycline gives me a stomachache if I try to take it on an empty stomach &amp; penicillin is mostly excreted, not metabolized &amp; you'll notice this in the bathroom once it has cleared the infection from your nasal passages.  But I can't reasonably discuss azithromycin vs clindamycin on actual facts, so I'll use them as proxies for some hypotheticals.&lt;br /&gt;&lt;br /&gt;Suppose, for example, that there was absolutely no clinical difference between the two.  They both had the same spectrum of treatable bacteria, the same risk of similar side effects, no contraindications in this patient and both had the same cost.  Then clearly the nurse is right and the doctor wrong, as that once-a-day dosing frees a valuable resource (the nurse).  
In other words, under these conditions the drug choice for a patient is neutral for that patient but has important ramifications for other patients at the hospital.  &lt;br /&gt;&lt;br /&gt;But what about the less clear cases?  For example, suppose all of the above conditions were met &lt;span style="font-style:italic;"&gt;except&lt;/span&gt; equal cost; the once daily med is significantly more expensive (e.g. azithromycin before it went off patent).  On the one hand, my argument still holds unless it is a huge cost difference -- several minutes of a nurse's time is worth quite a bit (like most hospitals, the one on Scrubs is portrayed as being cash strapped &amp; short on nurses).  However, that more convenient drug costs real money, whereas the nurse's saving is in opportunity cost: an accountant browsing the budget is likely to see the one but not the other even if both are real.&lt;br /&gt;&lt;br /&gt;Now let's muddy the water further.  Suppose the two drugs are clinically not precisely comparable but similar -- imagine if clindamycin is slightly broader spectrum or has a slightly lower risk of side effects.  Now it becomes a really sticky wicket -- what additional risk to this patient is acceptable in order to reduce the risks to other patients (due to getting better nursing care)?&lt;br /&gt;&lt;br /&gt;That last one is the sort that really is troublesome.  We never like explicitly to risk one person to help multiple others, but we are often less troubled when we do it implicitly.  I won't claim to be an ethics expert, so I'll leave it at that.  But I think these scenarios embody real situations which will be faced, such as sometimes an expensive drug is better than a cheaper one &amp;  (not to say this is always or even often true, just that it isn't always false).  
Or more generally: health care reform will be complex, because health care is complex.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/36768584-2430604561813068417?l=omicsomics.blogspot.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://omicsomics.blogspot.com/feeds/2430604561813068417/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=36768584&amp;postID=2430604561813068417" title="1 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/2430604561813068417" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/2430604561813068417" /><link rel="alternate" type="text/html" href="http://omicsomics.blogspot.com/2009/02/trading-off-argument-from-scrubs.html" title="Trading off an argument from Scrubs" /><author><name>Keith Robison</name><uri>http://www.blogger.com/profile/04765318239070312590</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd="http://schemas.google.com/g/2005" name="OpenSocialUserId" value="08368724497474381730" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-36768584.post-2587703827129055502</id><published>2009-02-04T00:08:00.002-05:00</published><updated>2009-02-04T00:25:57.690-05:00</updated><title type="text">Bacteria can mobilize a fifth column</title><content type="html">I recently had to deal with a bacterial upper respiratory infection.  Something to ponder about such problems is that not only did the little nasty have to gain a foothold on my immune system, but it also had to elbow a lot of other bacteria out of the way.  
After all, my respiratory tract is open to the air and is far from sterile; there is a whole ecosystem of bugs which generally get along with me.  For an infection to take hold, either one of the regular residents has to go bad or the newcomers must steal some space.&lt;br /&gt;&lt;br /&gt;A recent &lt;a href="http://www.pnas.org/content/106/4/1234"&gt;abstract in PNAS&lt;/a&gt; (alas, not an open access paper) provides a fascinating window on how that elbowing takes place.  Staphylococcus aureus (aka the home front) is a standard resident of the respiratory tract (which, of course, can be nasty on its own if it gets through the skin) which Streptococcus pneumoniae (charming moniker! aka the invaders) must push aside.  It turns out that one weapon the invaders use is hydrogen peroxide (H2O2), a staple of many home medicine cabinets -- though not mine growing up; Dad still favors tincture of iodine (curiously, cuts &amp; scrapes often went unreported!).&lt;br /&gt;&lt;br /&gt;Okay, that seems straightforward.  Well, except the question of why the invaders themselves don't suffer some blowback.   But it actually gets more interesting, because it turns out the H2O2 dose is sub-lethal.  Huh?  The invaders come in with flame throwers but set them to warm &amp; cozy?&lt;br /&gt;&lt;br /&gt;But sub-lethal doesn't mean physiologically irrelevant.  The dose is enough for the home front to worry, as H2O2 can cause all sorts of damage.  Indeed, the dose is strong enough to set off the SOS system, a DNA damage response.  &lt;br /&gt;&lt;br /&gt;The SOS system has an interesting side angle.  Many bacteria carry dormant viruses, better known as lysogenic phage, within their genome.  These viral genomes are integrated within their hosts' DNA and generally keep quiet, getting a free replication ride every time their host divides.  
However, that free ride isn't much good if your host dies with you in it, so these phage listen to the SOS response -- and when they hear it they go into their lytic phase, pumping out lots of virus and generally killing their host on the way out.&lt;br /&gt;&lt;br /&gt;So now we have a picture: spook the home front enough that a fifth column of phage rises within and destroys them.  Nifty.&lt;br /&gt;&lt;br /&gt;Except, we're back to the blowback problem -- unless the invaders are also free of lysogenic phage they're going to have the same problem.  However, it turns out that H2O2 does &lt;span style="font-style:italic;"&gt;not&lt;/span&gt; activate the SOS response in the invaders, because they apparently are resistant to H2O2's DNA-damaging effects.  &lt;br /&gt;&lt;br /&gt;Understanding that resistance is a next area for work.  Potentially, disabling it would offer an interesting antibiotic angle -- an antibiotic that was specific for the invaders by letting them blow themselves up.  
That's a big stretch (and the economics of antibiotic development are horrendous -- hence very few companies try it or stay in it) so don't hold your breath (or cough) waiting for it -- but it is a fun aspect to ponder.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/36768584-2587703827129055502?l=omicsomics.blogspot.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://omicsomics.blogspot.com/feeds/2587703827129055502/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=36768584&amp;postID=2587703827129055502" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/2587703827129055502" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/2587703827129055502" /><link rel="alternate" type="text/html" href="http://omicsomics.blogspot.com/2009/02/bacteria-can-mobilize-fifth-column.html" title="Bacteria can mobilize a fifth column" /><author><name>Keith Robison</name><uri>http://www.blogger.com/profile/04765318239070312590</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd="http://schemas.google.com/g/2005" name="OpenSocialUserId" value="08368724497474381730" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-36768584.post-9051727053361981021</id><published>2009-02-02T21:59:00.005-05:00</published><updated>2009-02-03T13:04:00.344-05:00</updated><title type="text">A Fatally Flawed Paper</title><content type="html">I like to review manuscripts but don't do so very often.  When I started this blog I thought I might often use it to play "If I had been the reviewer", but I haven't done that much.  
However, a paper came to my attention that I can't stop thinking about until I tackle it here.&lt;br /&gt;&lt;br /&gt;As an aside, I find papers I review to fall into three categories.  The first are very solid papers that I can find little to comment on; I might make a suggestion or two (often about data visualization), but if the core is solid there isn't much for the reviewer to do.  The second category is the most frustrating: when I feel the paper is on the edges of my expertise &amp; I start to question whether I should have agreed to review it (which is done after seeing an abstract).  The third category is the one I can really dig into: seriously flawed papers.  I think one of my reviews of a paper was approaching the length of the manuscript; the paper was badly flawed but there was a thread of substance that with a lot of work could be turned into something decent.&lt;br /&gt;&lt;br /&gt;Anyway, I noticed this paper in the BioMedCentral Table of Contents extract which is emailed to me weekly.&lt;br /&gt;&lt;blockquote&gt;&lt;a href="http://www.ncbi.nlm.nih.gov/entrez/utils/fref.fcgi?PrId=3196&amp;itool=AbstractPlus-def&amp;uid=19105813&amp;db=pubmed&amp;url=http://www.biomedcentral.com/1471-2164/9/625"&gt;Comparative kinomics of human and chimpanzee reveals unique kinship and functional diversity generated by new domain combinations.&lt;/a&gt;&lt;/blockquote&gt;  Now, back at MLNM I had for a while specialized in protein kinases, so it is a field of some interest.  I hadn't kept up with the status of the chimpanzee genome sequencing, but there is a longstanding familial interest in this species so that was another angle of interest.&lt;br /&gt;&lt;br /&gt;Sometimes when there has been some accident, a review of the circumstances leading up to it will reveal many opportunities for recognizing that a bad situation had been set up: the engineer ignored a stop signal or the dispatcher should have noticed the switch was set incorrectly.  
This paper, particularly one of its centerpiece findings, has that feel to it: there were many warning flags that something was amiss, but unfortunately the authors &lt;span style="font-style:italic;"&gt;and&lt;/span&gt; the reviewers failed to see them.&lt;br /&gt;&lt;br /&gt;When I first planned this critique, I was going to detail several examples.  However, that would lead to a &lt;span style="font-style:italic;"&gt;very&lt;/span&gt; long post, so I will pick a few examples and claim that they are representative.  If anyone wishes to challenge that claim, I'll flesh out some more.  Also, I feel the first example is particularly apropos because it is a bit of a centerpiece; it gets a lot of space (including a special figure) in the text.&lt;br /&gt;&lt;br /&gt;It was this bit of text that caused me to raise my eyebrows as far as they could go (I wish I could do the Spock single-eyebrow raise, but I can't).  The bolding is mine to emphasize the big surprises.&lt;br /&gt;&lt;blockquote&gt;For  example,  a  chimpanzee  kinase classified as casein  kinase  1  (ENSPTRP00000001150) on the &lt;span style="font-weight:bold;"&gt;basis of  significant  sequence  similarity (31%)&lt;/span&gt;  of  the  catalytic  domain  and  excellent  e-value  (2e-16) with  the  casein  kinase  1  from  human. However this chimp kinase has a POLO BOX tethered to the kinase catalytic domain. &lt;br /&gt;&lt;br /&gt;Thus this chimp kinase  represents  a  &lt;span style="font-weight:bold;"&gt;hybrid  CK1_POLO kinase&lt;/span&gt;.  Interestingly  ENSEMBL  reports  that ENSPTRP00000001150  has  a  high  similarity with  the  human  kinase ENSP00000361275. However, according to our classification protocol ENSP00000361275 is classified as a POLO kinase on the basis of 52% sequence identity with classical POLO kinases and excellent e-value of e-112. 
Figure 1 shows the dendrogram of the CK1 sub-family of kinases and it highlights the significant divergence of chimp homologue from its counterparts in other organisms&lt;/blockquote&gt;&lt;br /&gt;&lt;br /&gt;The first huge surprise is to find a kinase with so little sequence identity to its closest human counterpart.  The DNA identity of human and chimp is routinely cited as being in the high 90s percent (how exactly you calculate it affects the final value) and they are our closest relatives.  Finding a human-mouse ortholog identity as low as 31% would be stunning; for human-chimp it would be indescribably surprising.  The second huge surprise is the claim of a hybrid Polo-CK1 kinase.  The Polo box is a domain which recognizes phosphorylated peptides and is important in the activation of &amp; substrate recognition by Polo kinases.  It is the signature of the Polo subfamily and has not been reported on any other protein.  The third surprise is in the dendrogram; it is claimed that this kinase has an affinity to CK1-type kinases, but in their rooted dendrogram (source of rooting not explained, a serious error) this kinase is an outgroup to all of the other presented kinases!  Without some true outgroups (ideally representatives of other key families), how can we tell what it is most similar to? &lt;br /&gt;&lt;br /&gt;Now, one of my strong criticisms of this paper is that it relies too much on Ensembl-derived sequences and annotation.  Ensembl is a great system &amp; I have high respect for it, but it is also trying to do the very complex job of integrating a lot of other data with genomic sequences of varying quality and we are not scientists if we fully trust it to always be correct.  It is much better to have a more definitive reference point; why rely on someone's hand-sketched map if you have a USGS topographic section available?  And for a solid anchor database, it is hard to beat the RefSeq human protein dataset.  
So, we take the sequence from their figure for this ORF&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&gt;3|Chimp|ENSPTRP00000001150&lt;br /&gt;SLAHIWKARHTLLEPEVRYYLRQILSGLKYLHQRGILHRDLKLGNFFITENMELKVGDF&lt;br /&gt;GLAARLEPPEQRKKTICGTPNYVAPEVLLRQGHGPEADVWSLGCVMYTLLCGSPPFETA&lt;br /&gt;DLKETYRCIKQVHYTLPASLSLPARQLLAAILRASPRDRPSIDQILRHDFFTKGYTPDR&lt;br /&gt;LPISSCVTVPDLTPPNPARSLFAKVTKSLFGRKKKKSKNHAQESDEVSGLVSGLMRTSV&lt;br /&gt;GHQDARPEAPAASGPAPVSLVETAPEDSSPRGTLASSGDGFEEGLTVATVVESALCALR&lt;br /&gt;NCVAFMPPAEQNPAPLAQPEPLVWVSKWVDYGGDLPSVEEVEVPAPPLLLQWVKTDQAL&lt;br /&gt;LMLFSDGTVQVNFYGDHTKLILSGWEPLLVTFVARNRSACTYLASHLRQLGCSPDLRQRLRYALRLLRDRSPA&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;and our top hit is &lt;br /&gt;&lt;pre&gt;&lt;br /&gt; GENE ID: 1263 PLK3 | polo-like kinase 3 (Drosophila) [Homo sapiens]&lt;br /&gt;(Over 10 PubMed links)&lt;br /&gt;&lt;br /&gt; Score =  663 bits (1710),  Expect = 0.0, Method: Compositional matrix adjust.&lt;br /&gt; Identities = 328/362 (90%), Positives = 335/362 (92%), Gaps = 14/362 (3%)&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;That resolves all these questions: it's a straightforward ortholog of PLK3 (which explains the Polo boxes), not some noteworthy hybrid and the sequence identity is 90+% -- and that score is dropped a lot by some iffy regions like this&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;Query  301  MPPAEQNPAPLAQPEPLVWVSKWVDYGGDLPSVEEVEVPAPPLLLQWVKTDQALLMLFSD  360&lt;br /&gt;            MPPAEQNPAPLAQPEPLVWVSKWVDY                    +  + + + +LF+D&lt;br /&gt;Sbjct  445  MPPAEQNPAPLAQPEPLVWVSKWVDYSNKFG-------------FGYQLSSRRVAVLFND  491&lt;br /&gt;&lt;br /&gt;Query  361  GT  362&lt;br /&gt;            GT&lt;br /&gt;Sbjct  492  GT  493&lt;br /&gt;&lt;br /&gt;&lt;br /&gt; Score =  176 bits (446),  Expect = 1e-43, Method: Compositional matrix adjust.&lt;br /&gt; Identities = 83/91 (91%), Positives = 87/91 (95%), Gaps = 0/91 (0%)&lt;br /&gt;&lt;br /&gt;Query  319  WVSKWVDYGGDLPSVEEVEVPAPPLLLQWVKTDQALLMLFSDGTVQVNFYGDHTKLILSG  378&lt;br /&gt;            ++ + +  
GGDLPSVEEVEVPAPPLLLQWVKTDQALLMLFSDGTVQVNFYGDHTKLILSG&lt;br /&gt;Sbjct  538  YMEQHLMKGGDLPSVEEVEVPAPPLLLQWVKTDQALLMLFSDGTVQVNFYGDHTKLILSG  597&lt;br /&gt;&lt;br /&gt;Query  379  WEPLLVTFVARNRSACTYLASHLRQLGCSPD  409&lt;br /&gt;            WEPLLVTFVARNRSACTYLASHLRQLGCSPD&lt;br /&gt;Sbjct  598  WEPLLVTFVARNRSACTYLASHLRQLGCSPD  628&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;What's going on there?  Well, most likely this is underlining the draft nature of the chimpanzee genome.  I checked with TBLASTN, and there aren't ESTs around this region -- the chimp PLK3 is pretty much a pure gene prediction model -- a tough problem that has been tackled well but never perfectly.  Plus, the underlying genomic data is, well, draft quality.  Another TBLASTN search revealed that although this Ensembl prediction is from the middle of a large contig, the N-terminus of human PLK3 has a great match on another contig -- but from the same chromosome.&lt;br /&gt;&lt;br /&gt;Okay, maybe that's a fluke.  So here's another chimp kinase highlighted in the text&lt;br /&gt;&lt;blockquote&gt;&lt;br /&gt;A  protein  (ENSPTRP00000000076),  classified  under  PKC  subfamily,  is  composed  of  a  PB1 domain  followed  by  the  protein  kinase  domain  which  is  followed  by  a  protein  kinase  C  terminal domain  (Figure 3a1). The PB1 domain  is present  in many eukaryotic cytoplasmic  signalling proteins and is responsible, although not systematically, in the formation of PB1 dimers [25]. It thus serves as a molecular recognition module. This architecture is known so far only in an atypical PKC of Phallusia mammilata,  a  sea  squirt.  Our  analysis  identified  two  chimpanzee  PKCs  and  a  human  PKC  with  a similar architecture,  in which a phorbol esters/diacylglycerol  binding domain  is  inserted between  the PB1 and the protein kinase domain. 
The presence of the phorbol esters/diacylglycerol binding domain in  combination  with  the  protein  kinase  and  a  PKC  terminal  domain  indicates  that  it  is  probably responsible for the recruitment of diacylglycerol, which in turns might be involved in activation of the kinase.  The  deletion  of  this  domain  in  chimpanzee  PKC  (ENSPTRP00000000076)  implies  that  the recruitment of diacylglycerol might be achieved by an external interacting module. &lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;&lt;br /&gt;Again, the first thing to do is to search the ORF &lt;br /&gt;&lt;blockquote&gt;&lt;br /&gt;&gt;1|Chimp|ENSPTRP00000000076&lt;br /&gt;MPSRTGPKMEGSGGRVRLKAHYGGDIFITSVDAATTFEELCEEVRDMCRLHQQHPL&lt;br /&gt;TLKWVDSEGDPCTVSSQMELEEAFRLARQCRDEGLIIHVFPSTPEQPGLPCPGEDK&lt;br /&gt;SIYRRGARRWRKLYCANGHLFQAKRFNRDSVMPSQEPPVDDKNEDADLPSEETDGI&lt;br /&gt;AYISSSRKHDSIKDDSEDLKPVIDGMDGIKISQGLGLQDFDLIRVIGRGSYAKVLL&lt;br /&gt;VRLKKNDQIYAMKVVKKELVHDDETTSRLFLVIEYVNGGDLMFHMQRQRKLPEEHA&lt;br /&gt;RFYAAEICIALNFLHERGIIYRDLKLDNVLLDADGHIKLTDYGMCKEGLGPGDTTS&lt;br /&gt;TFCGTPNYIAPEILRGEEYGFSVDWWALGVLMFEMMAGRSPFDIITDNPDMNTEDY&lt;br /&gt;LFQVILEKPIRIPRFLSVKASHVLKGFLNKDPKERLGCRPQTGFSDIKSHAFFRSI&lt;br /&gt;DWDLLEKKQALPPFQPQITDDYGLDNFDTQFTSEPVQLTPDDEDAIKRIDQSEFEG&lt;br /&gt;FEYINPLLLSTEESV&lt;br /&gt;&lt;/blockquote&gt;against human RefSeq to get our bearings.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt; GENE ID: 5590 PRKCZ | protein kinase C, zeta [Homo sapiens]&lt;br /&gt;(Over 100 PubMed links)&lt;br /&gt;&lt;br /&gt; Score = 1031 bits (2665),  Expect = 0.0, Method: Compositional matrix adjust.&lt;br /&gt; Identities = 518/592 (87%), Positives = 518/592 (87%), Gaps = 73/592 (12%)&lt;br /&gt;&lt;br /&gt;Query  1    MPSRTGPKMEGSGGRVRLKAHYGGDIFITSVDAATTFEELCEEVRDMCRLHQQHPLTLKW  60&lt;br /&gt;            MPSRTGPKMEGSGGRVRLKAHYGGDIFITSVDAATTFEELCEEVRDMCRLHQQHPLTLKW&lt;br /&gt;Sbjct  1    MPSRTGPKMEGSGGRVRLKAHYGGDIFITSVDAATTFEELCEEVRDMCRLHQQHPLTLKW  60&lt;br /&gt;&lt;br /&gt;Query  61   
VDSEGDPCTVSSQMELEEAFRLARQCRDEGLIIHVFPSTPEQPGLPCPGEDKSIYRRGAR  120&lt;br /&gt;            VDSEGDPCTVSSQMELEEAFRLARQCRDEGLIIHVFPSTPEQPGLPCPGEDKSIYRRGAR&lt;br /&gt;Sbjct  61   VDSEGDPCTVSSQMELEEAFRLARQCRDEGLIIHVFPSTPEQPGLPCPGEDKSIYRRGAR  120&lt;br /&gt;&lt;br /&gt;Query  121  RWRKLYCANGHLFQAKRFNR----------------------------------------  140&lt;br /&gt;            RWRKLY ANGHLFQAKRFNR                                        &lt;br /&gt;Sbjct  121  RWRKLYRANGHLFQAKRFNRRAYCGQCSERIWGLARQGYRCINCKLLVHKRCHGLVPLTC  180&lt;br /&gt;&lt;br /&gt;Query  141  ----DSVMPSQEPPVDDKNEDADLPSEETDGIAYISSSRKHDSIKDDSEDLKPVIDGMDG  196&lt;br /&gt;                DSVMPSQEPPVDDKNEDADLPSEETDGIAYISSSRKHDSIKDDSEDLKPVIDGMDG&lt;br /&gt;Sbjct  181  RKHMDSVMPSQEPPVDDKNEDADLPSEETDGIAYISSSRKHDSIKDDSEDLKPVIDGMDG  240&lt;br /&gt;&lt;br /&gt;Query  197  IKISQGLGLQDFDLIRVIGRGSYAKVLLVRLKKNDQIYAMKVVKKELVHDDE--------  248&lt;br /&gt;            IKISQGLGLQDFDLIRVIGRGSYAKVLLVRLKKNDQIYAMKVVKKELVHDDE        &lt;br /&gt;Sbjct  241  IKISQGLGLQDFDLIRVIGRGSYAKVLLVRLKKNDQIYAMKVVKKELVHDDEDIDWVQTE  300&lt;br /&gt;&lt;br /&gt;Query  249  ---------------------TTSRLFLVIEYVNGGDLMFHMQRQRKLPEEHARFYAAEI  287&lt;br /&gt;                                 TTSRLFLVIEYVNGGDLMFHMQRQRKLPEEHARFYAAEI&lt;br /&gt;Sbjct  301  KHVFEQASSNPFLVGLHSCFQTTSRLFLVIEYVNGGDLMFHMQRQRKLPEEHARFYAAEI  360&lt;br /&gt;&lt;br /&gt;Query  288  CIALNFLHERGIIYRDLKLDNVLLDADGHIKLTDYGMCKEGLGPGDTTSTFCGTPNYIAP  347&lt;br /&gt;            CIALNFLHERGIIYRDLKLDNVLLDADGHIKLTDYGMCKEGLGPGDTTSTFCGTPNYIAP&lt;br /&gt;Sbjct  361  CIALNFLHERGIIYRDLKLDNVLLDADGHIKLTDYGMCKEGLGPGDTTSTFCGTPNYIAP  420&lt;br /&gt;&lt;br /&gt;Query  348  EILRGEEYGFSVDWWALGVLMFEMMAGRSPFDIITDNPDMNTEDYLFQVILEKPIRIPRF  407&lt;br /&gt;            EILRGEEYGFSVDWWALGVLMFEMMAGRSPFDIITDNPDMNTEDYLFQVILEKPIRIPRF&lt;br /&gt;Sbjct  421  EILRGEEYGFSVDWWALGVLMFEMMAGRSPFDIITDNPDMNTEDYLFQVILEKPIRIPRF  480&lt;br /&gt;&lt;br /&gt;Query  408  LSVKASHVLKGFLNKDPKERLGCRPQTGFSDIKSHAFFRSIDWDLLEKKQALPPFQPQIT  467&lt;br /&gt;     
       LSVKASHVLKGFLNKDPKERLGCRPQTGFSDIKSHAFFRSIDWDLLEKKQALPPFQPQIT&lt;br /&gt;Sbjct  481  LSVKASHVLKGFLNKDPKERLGCRPQTGFSDIKSHAFFRSIDWDLLEKKQALPPFQPQIT  540&lt;br /&gt;&lt;br /&gt;Query  468  DDYGLDNFDTQFTSEPVQLTPDDEDAIKRIDQSEFEGFEYINPLLLSTEESV  519&lt;br /&gt;            DDYGLDNFDTQFTSEPVQLTPDDEDAIKRIDQSEFEGFEYINPLLLSTEESV&lt;br /&gt;Sbjct  541  DDYGLDNFDTQFTSEPVQLTPDDEDAIKRIDQSEFEGFEYINPLLLSTEESV  592&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Okay, so the human protein has been described previously: it is human protein kinase C zeta.  Have the PB1 domain in PKCzeta and its implications been previously discussed?  A quick PubMed search turned up &lt;span style="font-style:italic;"&gt;two&lt;/span&gt; papers from earlier this decade (in &lt;a href="http://www.sciencedirect.com/science?_ob=MImg&amp;_imagekey=B6WSR-495V6N6-6-G&amp;_cdi=7053&amp;_user=10&amp;_coverDate=07%2F31%2F2003&amp;_sk=%23TOC%237053%232003%23999879998%23575349%23FLA%23display%23Volume_12,_Issue_1,_Pages_1-270_(July_2003)%23tagged%23Volume%23first%3D12%23Issue%23first%3D1%23Pages%23first%3D1%23last%3D270%23date%23(July_2003)%23&amp;view=c&amp;_gw=y&amp;wchp=dGLbVzz-zSkWA&amp;md5=f69b067f3fc27a8f8550c1d40f45c69a&amp;ie=/sdarticle.pdf"&gt;Molecular Cell&lt;/a&gt;  &amp; &lt;a href="http://www.ncbi.nlm.nih.gov/entrez/utils/fref.fcgi?PrId=3051&amp;itool=AbstractPlus-def&amp;uid=12813044&amp;db=pubmed&amp;url=http://www.jbc.org/cgi/pmidlookup?view=long&amp;pmid=12813044"&gt;JBC&lt;/a&gt;) which actually demonstrated the dimerization potential of the PKCzeta PB1 domain.  So a PKC with a PB1 domain is neither novel nor noteworthy.  What about the missing diacylglycerol-binding domain (that first big gap) in the chimp kinase? That could be interesting, so let's see whether we can find any EST evidence to support it. 
Alas, the only EST evidence refutes it and identifies the gap as spurious(and both of these ESTs were deposited in October 2007 and the paper submitted in March 2008, so they are not an unfair criticism)&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&gt;dbj|DC524857.1|  DC524857 chimpanzee brain cDNA library PflB Pan troglodytes verus &lt;br /&gt;cDNA clone PflB8010 5', mRNA sequence.&lt;br /&gt;Length=404&lt;br /&gt;&lt;br /&gt; Score =  108 bits (270),  Expect(2) = 3e-31, Method: Composition-based stats.&lt;br /&gt; Identities = 57/102 (55%), Positives = 58/102 (56%), Gaps = 44/102 (43%)&lt;br /&gt; Frame = +2&lt;br /&gt;&lt;br /&gt;Query  112  KSIYRRGARRWRKLYCANGHLFQAKRFNR-------------------------------  140&lt;br /&gt;            +SIYRRGARRWRKLYCANGHLFQAKRFNR                               &lt;br /&gt;Sbjct  38   ESIYRRGARRWRKLYCANGHLFQAKRFNRRAYCGQCSERIWGLARQGYRCINCKLLVHKR  217&lt;br /&gt;&lt;br /&gt;Query  141  -------------DSVMPSQEPPVDDKNEDADLPSEETDGIA  169&lt;br /&gt;                         DSVMPSQEPPVDDKNEDADLPSEETDGIA&lt;br /&gt;Sbjct  218  CHGLVPLTCRKHMDSVMPSQEPPVDDKNEDADLPSEETDGIA  343&lt;br /&gt;&lt;br /&gt;&lt;br /&gt; Score = 42.7 bits (99),  Expect(2) = 3e-31, Method: Compositional matrix adjust.&lt;br /&gt; Identities = 20/22 (90%), Positives = 21/22 (95%), Gaps = 0/22 (0%)&lt;br /&gt; Frame = +3&lt;br /&gt;&lt;br /&gt;Query  168  IAYISSSRKHDSIKDDSEDLKP  189&lt;br /&gt;            + YISSSRKHDSIKDDSEDLKP&lt;br /&gt;Sbjct  339  LLYISSSRKHDSIKDDSEDLKP  404&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&gt;dbj|DC519886.1|  DC519886 chimpanzee brain cDNA library PccB Pan troglodytes verus &lt;br /&gt;cDNA clone PccB0482 5', mRNA sequence.&lt;br /&gt;Length=612&lt;br /&gt;&lt;br /&gt; Score =  114 bits (284),  Expect = 3e-26, Method: Compositional matrix adjust.&lt;br /&gt; Identities = 63/107 (58%), Positives = 63/107 (58%), Gaps = 44/107 (41%)&lt;br /&gt; Frame = +2&lt;br /&gt;&lt;br /&gt;Query  113  SIYRRGARRWRKLYCANGHLFQAKRFNR--------------------------------  140&lt;br 
/&gt;            SIYRRGARRWRKLYCANGHLFQAKRFNR                                &lt;br /&gt;Sbjct  290  SIYRRGARRWRKLYCANGHLFQAKRFNRRAYCGQCSERIWGLARQGYRCINCKLLVHKRC  469&lt;br /&gt;&lt;br /&gt;Query  141  ------------DSVMPSQEPPVDDKNEDADLPSEETDGIAYISSSR  175&lt;br /&gt;                        DSVMPSQEPPVDDKNEDADLPSEETDGIAYISSSR&lt;br /&gt;Sbjct  470  HGLVPLTCRKHMDSVMPSQEPPVDDKNEDADLPSEETDGIAYISSSR  610&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;I checked in detail one more note (about the chimp protein ENSPTRP00000001185 and its human ortholog) about a domain architecture claimed to be unique to human &amp; chimp due to a missing domain.  Again, the RefSeq protein search revealed that the chimp protein is nearly identical to a known human kinase (MARK2) albeit greatly truncated -- and the missing domain is beyond the truncation point.&lt;br /&gt;&lt;br /&gt;I haven't checked every kinase in the paper, but seeing the same classes of mistakes repeatedly doesn't give much hope.  Comparing the human &amp; chimp kinomes (or any other well-defined subset of genes) is a worthwhile enterprise -- so long as it is kept in mind that the chimp genome is a very rough draft and all appropriate computational controls are used.  This paper, unfortunately, shows no awareness of either of these principles.&lt;br /&gt;&lt;br /&gt;What irks me most about this sort of paper is that it gives all of us a bit of a black eye.  Someone who saw the abstract &amp; got excited would be in for a big letdown.  It's hard enough to earn the respect of bench biologists without it being tossed away with poorly done analyses.&lt;br /&gt;&lt;br /&gt;So what are the positive lessons to be learned?  Here are a few tips&lt;br /&gt;&lt;ol&gt;&lt;br /&gt;&lt;li&gt;Always try to find meaningful biological names for your sequences.  
Use them in your figures &amp; search for them in the literature like a bloodhound.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;Always check genomic predictions against EST &amp; cDNA databases.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;Always try to root your phylogenetic trees, unless you have a really good reason not to do so.  And, if your tree is rooted, you &lt;span style="font-weight:bold;"&gt;must&lt;/span&gt; explain how you rooted it.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;If your results sound amazing, take a deep breath &amp; think of several tests that could debunk them.  Then do those tests.  If your results survive, go to bed &amp; think of another batch of tests. &lt;/li&gt;&lt;br /&gt;&lt;/ol&gt;&lt;br /&gt;&lt;br /&gt;P.S. One way to put reviewers in a bad mood is to not supply your sequences.  The supplementary materials for this paper do not have all of the ORFs; I pulled some out from their alignments with a custom script.  Elsewhere via Google I found a collection linked to the work -- but with the whole predicted chimp proteome in it!  Very unwieldy &amp; slow to download!&lt;br /&gt;P.P.S. For anyone interested in exploring further, here are the other sequences from the alignment in additional file 3.  
The number in the header was added by my script to indicate which alignment within that file the sequence was taken from.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&gt;2|Chimp|ENSPTRP00000019171&lt;br /&gt;MSAEVRLRRLQQLVLDPGFLGLEPLLDLLLGVHQELGASELAQDKYVADFLQWAEPIVVRL&lt;br /&gt;KEVRLQRDDFEILKVIGRGAFSEVAVVKMKQTGQVYAMKIMNKWDMLKRGEVSCFREERDV&lt;br /&gt;LVNGDRRWITQLHFAFQDENYLYLVMEYYVGGDLLTLLSKFGERIPAEMARFYLAEIVMAI&lt;br /&gt;DSVHRLGYVHRDIKPDNILLDRCGHIRLADFGSCLKLRADGTVRSLVAVGTPDYLSPEILQ&lt;br /&gt;AVGGGPGTGSYGPECDWWALGVFAYEMFYGQTPFYADSTAETYGKIVHYKEHLSLPLVDEG&lt;br /&gt;VPEEARDFIQRLLCPPETRLGRGGAGDFRTHPFFFGLDWDGLRDSVPPFTPDFEGATDTCN&lt;br /&gt;FDLVEDGLTAMVSGGGETLSDIREGAPLGVHLPFVGYSYSCMALRDSEVPGPTPMELEAEQ&lt;br /&gt;LLEPHVQAPSLEPSVSPQDETAEVAVPAAVPAAEAEAEVTLRELQEALEEEVLTRQSLSRE&lt;br /&gt;MEAIRTDNQNFASQLREAEARNRDLEAHVRQLQERMELLQAEGATAVTGVPSPRATDPPSH&lt;br /&gt;VPWPGLSXALSLLLFAVVLSRAAALGCLGLVAPAGXLXAVWRRPGAARAPX&lt;br /&gt;&gt;4|Chimp|ENSPTRP00000011569&lt;br /&gt;MSDVAIVKEGWLHKRGEYIKTWRPRYFLLKNDGTFIGYKERPQDVDQREAPLNNFSVAQCQ&lt;br /&gt;LMKTERPRPNTFIIRCLQWTTVIERTFHVETPEEREEWTTAIQTVADGLKKQEEEEMDFRS&lt;br /&gt;GSPSDNSGAEEMEVSLAKPKHRVTMNEFEYLKLLGKGTFGKVILVKEKATGRYYAMKILKK&lt;br /&gt;EVIVAKDEVAHTLTENRVLQNSRHPFLTALKYSFQTHDRLCFVMEYANGGELFFHLSRERV&lt;br /&gt;FSEDRARFYGAEIVSALDYLHSEKNVVYRDLKLENLMLDKDGHIKITDFGLCKEGIKDGAT&lt;br /&gt;MKTFCGTSEYLAPRLSPPFKPQVTSETDTRYFDEEFTAQMITITPP&lt;br /&gt;DQDDSMECVDSERRPHFPQFSYSASGTA&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/36768584-9051727053361981021?l=omicsomics.blogspot.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://omicsomics.blogspot.com/feeds/9051727053361981021/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=36768584&amp;postID=9051727053361981021" title="7 Comments" /><link rel="edit" type="application/atom+xml" 
href="http://www.blogger.com/feeds/36768584/posts/default/9051727053361981021" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/9051727053361981021" /><link rel="alternate" type="text/html" href="http://omicsomics.blogspot.com/2009/02/fatally-flawed-paper.html" title="A Fatally Flawed Paper" /><author><name>Keith Robison</name><uri>http://www.blogger.com/profile/04765318239070312590</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd="http://schemas.google.com/g/2005" name="OpenSocialUserId" value="08368724497474381730" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">7</thr:total></entry><entry><id>tag:blogger.com,1999:blog-36768584.post-7691439435758684941</id><published>2009-01-28T21:58:00.004-05:00</published><updated>2009-01-28T22:22:30.605-05:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="space" /><title type="text">Remembering the 27th, 28th &amp; 1st</title><content type="html">When I was a junior in high school, on a day much like today, I wanted to stay home and watch TV a bit, so I was hoping the wintry weather would generate a snow day.  I didn't often wish for this, as my childhood love of snow had subsided substantially (though I would sometimes ski through my yard), but on this day I wanted to be home.  Winter and the superintendent, however, did not cooperate and we had only a delayed opening, and hooky was out of the question in my family so off I went.&lt;br /&gt;&lt;br /&gt;And so I was sitting in Mr. Schmidt's chemistry class that morning.  He was a nice man, but that class did very little to prepare me for a life on the periphery of chemistry, except that he did an excellent job of outlining the early 20th century revolution in chemistry &amp; physics.  I do not remember what he was talking about that morning when Mrs. Kurtz, the Biology II teacher, came in and commented on a news event.  
We all nodded, given that we expected the news -- but then she restated herself, as we had not really heard her, and Mr. Schmidt got out the TV in his closet and I found myself watching TV that morning -- exactly what I had hoped to watch on a snow day but also nothing I had ever imagined or could have remotely hoped to watch.  For that restatement was: "No, the space shuttle &lt;span style="font-style:italic;"&gt;blew&lt;/span&gt; up!".&lt;br /&gt;&lt;br /&gt;When my boy was three, we set out one weekend to take him to the Boston Children's Museum, a wonderful place for a child of that age to explore and run around and have fun.  As a bonus, we would ride the subway there and oh how he loves to ride trains.  It was again a winter day and I drove the usual route to Boston &amp; there is a spot on I-93 where you come out of the relatively untouched beauty of the Middlesex Fells and the skyline of Boston suddenly appears.  It was in that spot that I heard the report on the radio whose meaning became instantly clear, and I semi-silently cried "No!" -- an extended loss of radio contact with a space shuttle could not ever end happily.&lt;br /&gt;&lt;br /&gt;We are in the midst of that grim week of anniversaries for NASA; yesterday marked the 42nd anniversary of the Apollo 1 fire, today the 23rd anniversary of the loss of Challenger and Sunday is the 6th anniversary of the loss of Columbia.  Only one of those events has any obvious connection to this time of year.  &lt;br /&gt;&lt;br /&gt;For as long as I can remember the space program has had an outsized influence on my imagination.  My career path did not take me in a good direction to go to space, but I still think about it almost daily.  In some ways these three disasters are completely removed from what I do, but in other ways they are not.  
I do subscribe to &lt;a href="http://www.asktog.com/books/challengerExerpt.html"&gt;Edward Tufte's argument&lt;/a&gt; that poor data visualization helped enable the Challenger disaster, and while my plots do not carry such weighty implications I still must be ready in case they ever do.  All three of these were hardware failures, and I do software, but software failures have caused unmanned probes to be lost and manned missions to go awry.&lt;br /&gt;&lt;br /&gt;But above all else, it is important to remember those who pushed the limits and did not return.  We must remember who they were and why they died, as they died doing important things and they died because humans make mistakes.  Grissom, White &amp; Chaffee were doomed by a design from which escape was impossible and fire likely.  Smith, Scobee, McNair, Onizuka, McAuliffe, Jarvis &amp; Resnik died when a machine was run far outside its normal operating regime.  Brown, Husband, Clark, Chawla, Anderson, McCool and Ramon died from a design which was not well matched to the materials used to construct it. &lt;br /&gt;&lt;br /&gt;We recently learned some more details of the Columbia accident: how the astronauts never realized that disaster was approaching them, but how pilot McCool worked calmly to deal with systematic failure just before it killed him.  
I wish I could have such coolness under stress.</content><link rel="replies" type="application/atom+xml" href="http://omicsomics.blogspot.com/feeds/7691439435758684941/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=36768584&amp;postID=7691439435758684941" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/7691439435758684941" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/7691439435758684941" /><link rel="alternate" type="text/html" href="http://omicsomics.blogspot.com/2009/01/remembering-27th-28th-1st.html" title="Remembering the 27th, 28th &amp; 1st" /><author><name>Keith Robison</name><uri>http://www.blogger.com/profile/04765318239070312590</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd="http://schemas.google.com/g/2005" name="OpenSocialUserId" value="08368724497474381730" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total></entry></feed>
