<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/atom10full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><feed xmlns="http://www.w3.org/2005/Atom" xmlns:openSearch="http://a9.com/-/spec/opensearchrss/1.0/" xmlns:georss="http://www.georss.org/georss"><id>tag:blogger.com,1999:blog-36768584</id><updated>2009-10-30T06:47:20.983-04:00</updated><title type="text">Omics! Omics!</title><subtitle type="html">A computational biologist's personal views on new technologies &amp; publications on genomics &amp; proteomics and their impact on drug discovery</subtitle><link rel="http://schemas.google.com/g/2005#feed" type="application/atom+xml" href="http://omicsomics.blogspot.com/feeds/posts/default" /><link rel="alternate" type="text/html" href="http://omicsomics.blogspot.com/" /><link rel="hub" href="http://pubsubhubbub.appspot.com/" /><link rel="next" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default?start-index=26&amp;max-results=25" /><author><name>Keith Robison</name><uri>http://www.blogger.com/profile/04765318239070312590</uri><email>noreply@blogger.com</email></author><generator version="7.00" uri="http://www.blogger.com">Blogger</generator><openSearch:totalResults>312</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>25</openSearch:itemsPerPage><link rel="self" href="http://feeds.feedburner.com/OmicsOmics" type="application/atom+xml" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com" /><entry><id>tag:blogger.com,1999:blog-36768584.post-9182146484383837394</id><published>2009-10-29T22:53:00.002-04:00</published><updated>2009-10-29T23:39:25.386-04:00</updated><title type="text">My Most Expensive Paper</title><content type="html">Genome Research has a &lt;a href="http://genome.cshlp.org/content/early/2009/10/28/gr.095976.109.abstract"&gt;paper 
detailing the Mammalian Gene Collection (MGC)&lt;/a&gt;, and if you look way down on the long author list (which includes Francis Collins!) you'll see my name there along with two Codon Devices colleagues.  This paper cost me a lot -- nothing in legal tender, but a heck of a lot of blood, sweat &amp; tears.&lt;br /&gt;&lt;br /&gt;The MGC is an attempt to have every human &amp; mouse protein coding sequence (plus more than a few rat) available as an expression clone, with native sequence.  Most of the genes were cloned from cDNA libraries, but coding sequences which couldn't be found that way were farmed out to a number of synthetic biology companies.  Codon decided to take on a particularly challenging tranche of mostly really long ORFs, hoping to demonstrate our proficiency in this difficult task.&lt;br /&gt;&lt;br /&gt;At the start, the attitude was "can-do".  When it appeared we couldn't parse some targets into our construction scheme, I devised a new algorithm that captured a few more (&lt;a href="http://omicsomics.blogspot.com/2007/08/personal-breakthrough.html"&gt;which I blogged about cryptically&lt;/a&gt;).  It was going to be a huge order which would fill our production pipeline in an expansive new facility we had recently moved into, replacing a charming but cramped historic structure.  A new system for tracking constructs through the facility was about to be rolled out that would let us finally track progress across the pipeline without a human manager constantly looking over each plasmid's shoulder.  The delivery schedule for MGC was going to be aggressive but would show our chops.  We were going to conquer the world!&lt;br /&gt;&lt;br /&gt;Alas, almost as soon as we started (and had sunk huge amounts of cash into oligos) we discovered ourselves in a small wicker container which was growing very hot.  Suddenly, nothing was working in the production facility.  
A combination of problems, some related to the move (a key instrument incorrectly recalibrated) and another problem whose source was never quite nailed down, forced a complete halt to all production activity for several months -- which soon meant that MGC was going to be the only trusty source of revenue -- if we could get MGC to release us from our now utterly undoable delivery schedule.&lt;br /&gt;&lt;br /&gt;Eventually, we fixed the old problems &amp; got new processes in place and pushed a bunch of production forward.  We delivered a decent first chunk of constructs to MGC, demonstrating that we were for real (but still with much to deliver).  Personnel were swiped from the other piece of the business (protein engineering) to push work forward.  More and more staff came in on weekends to keep things constantly moving.&lt;br /&gt;&lt;br /&gt;Even so, trouble still was a constant theme.  Most of the MGC project consisted of large constructs, which were built by a hierarchical strategy.  That meant the first key task was to build all the parts -- and some parts just didn't want to be built.  We had two processes for building "leaves", and both underwent major revisions and on-the-fly process testing.  We also started screening more and more plasmids by sequencing, sometimes catching a single correct clone in a mountain of botched ones (but running up a higher and higher capillary sequencing bill).  Sometimes we'd get almost-right pieces, which could be fixed by site-directed mutagenesis -- yet another unplanned cost in reagents &amp; skilled labor.  I experimented with partial redesigns of some builds -- but with the constraint of not ordering more costly oligos.  Each of these pulled in a few more constructs, a few more delivered -- and a frustrating pile of still unbuilt targets. &lt;br /&gt;&lt;br /&gt;Even when we had all the parts built, the assembly of them to the next stage was failing at alarming rates -- usually by being almost right.  
Yet more redesigns requiring fast dancing by the informatics staff to support.  More constructs pushed through.  More weekend shifts.&lt;br /&gt;&lt;br /&gt;In the end, when Codon shut down its gene synthesis business -- about 10 months after starting the MGC project -- we delivered a large fraction of our assignment -- but not all of it.  For a few constructs we delivered partial sequences for partial credit.  It felt good to deliver -- and awful to not deliver.  &lt;br /&gt;&lt;br /&gt;Now, given all that I've described (and more I've left out), I can't help but feel a bit guilty about that author list.  It was decided at some higher level that the author list would not be several miles long, and so some sort of cut had to be made.  Easily 50 Codon employees played some role in the project, and certainly there were more than a dozen for whom it occupied a majority of their attention.  An argument could have been easily made for at least that many Codon authors.  But, the decision was made that the three of us who had most shared the project management aspect would go on the paper.  In my case, I had ended up the main traffic cop, deciding which pieces needed to be tried again through the main pipeline and which should be directed to the scientist with magic hands.  For me, authorship is a small token for the many nights I ran SQL queries at midnight to find out what had succeeded and what had failed in sequencing -- and then checked again at 6 in the morning before heading off to work.  Even on weekends, I'd be hitting the database in the morning &amp; night to find out what needed redirecting -- and then using SQL inserts to redirect them.  I realized I was on the brink of madness when I was sneaking in queries on a family ski weekend.&lt;br /&gt;&lt;br /&gt;Perhaps after such a checkered experience it is natural to question the whole endeavor.  The MGC effort means that researchers who want to express a mammalian protein from a native coding sequence can do so.  
But how much of what we built will actually get used?  Was it really necessary to build the native coding sequence -- which often gave us headaches in the builds from repeats &amp; GC-rich regions (or, as we belatedly discovered, certain short runs of G could foul us up)?  MGC is a great resource, but the goal of a complete catalog of mammalian genes wasn't realized -- some genes still aren't available from MGC or any of the commercial human gene collections.&lt;br /&gt;&lt;br /&gt;MGC also torture-tested Codon's construction processes, and the original ones failed badly.  Our in-progress revisions fared much better, but still did not succeed as frequently as they should have.  When we could troubleshoot things, we could ascribe certain failures to almost every conceivable source -- bad enzymes, a bad oligo well, failure to follow procedures, laboratory mix-ups, etc.  But an awful lot could not be pinned to any cause, despite investigation, suggesting that we simply did not understand our system well enough to use it in a high-throughput production environment.&lt;br /&gt;&lt;br /&gt;I do know one thing: while I hope to stay where I am for a very long time, should I ever be looking for a job again I will avoid a production facility.  Some gene synthesis projects were worse than MGC in terms of demanding customers with tight timelines (which is no knock on the customers; now I'm that customer!), but even with MGC I found it's just not the right match for me.  It's no fun to burn so much effort on just getting something through the system so that somebody else can do the cool biology.  I don't ever want to be in a situation where I'm on vacation and thinking about which things are stalled in the line.  Some people thrive in the environment; I found it draining. &lt;br /&gt;&lt;br /&gt;But, there is something to be said for the experience.  I learned a lot which can be transferred to other settings.  
That which doesn't kill us makes us stronger -- MGC must have made me Superman.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/36768584-9182146484383837394?l=omicsomics.blogspot.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://omicsomics.blogspot.com/feeds/9182146484383837394/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=36768584&amp;postID=9182146484383837394" title="1 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/9182146484383837394" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/9182146484383837394" /><link rel="alternate" type="text/html" href="http://omicsomics.blogspot.com/2009/10/my-most-expensive-paper.html" title="My Most Expensive Paper" /><author><name>Keith Robison</name><uri>http://www.blogger.com/profile/04765318239070312590</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd="http://schemas.google.com/g/2005" name="OpenSocialUserId" value="08368724497474381730" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-36768584.post-4735991589714223046</id><published>2009-10-26T00:39:00.002-04:00</published><updated>2009-10-26T00:44:02.662-04:00</updated><title type="text">DTC CNVs?</title><content type="html">Curiosity question: do the current DTC genomics companies report out copy number variations (CNVs) to their customers?  Are any of their technologies unable to read these?  Clearly Knome (or Illumina, which isn't DTC but sort of competing with them) should be able to get this info from the shotgun sequencing.  But what about the array-based companies such as Navigenics &amp; 23andMe?  
My impression is that any high density SNP array data can be mined for copy number info, but perhaps there are caveats or restrictions on that.  &lt;br /&gt;&lt;br /&gt;&lt;br /&gt;It would seem that with CNVs so hot in the literature and a number of complex diseases being associated to them, this would be something the DTC companies would jump at.  But have they?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/36768584-4735991589714223046?l=omicsomics.blogspot.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://omicsomics.blogspot.com/feeds/4735991589714223046/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=36768584&amp;postID=4735991589714223046" title="2 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/4735991589714223046" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/4735991589714223046" /><link rel="alternate" type="text/html" href="http://omicsomics.blogspot.com/2009/10/dtc-cnvs.html" title="DTC CNVs?" 
/><author><name>Keith Robison</name><uri>http://www.blogger.com/profile/04765318239070312590</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd="http://schemas.google.com/g/2005" name="OpenSocialUserId" value="08368724497474381730" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-36768584.post-4624918914570119248</id><published>2009-10-24T22:53:00.003-04:00</published><updated>2009-10-24T23:12:28.954-04:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="conferences" /><title type="text">Now where did I misplace that genome segment of mine?</title><content type="html">One of the many interesting ASHG tidbits from the Twitter feed is a &lt;a href="http://twitter.com/suganthibala/statuses/5130226737"&gt;comment from "suganthibala"&lt;/a&gt; which I'll quote in full&lt;br /&gt;&lt;blockquote&gt;On average we each are missing 123 kb. homozygously. An incomplete genome is the norm. What a goofy species we are.&lt;/blockquote&gt;.&lt;br /&gt;&lt;br /&gt;I'm horribly remiss in tracking the CNV literature, but this comment makes me wonder whether this is atypical at all.  How extensively has this been profiled in other vertebrate species and how do other species look in terms of the typical amount of genome missing?  I found two papers for dogs, &lt;a href="http://www.ncbi.nlm.nih.gov/pubmed/19015322"&gt;one of which features a former lab mate as senior author&lt;/a&gt; and the &lt;a href="http://www.ncbi.nlm.nih.gov/entrez/utils/fref.fcgi?PrId=3051&amp;itool=AbstractPlus-def&amp;uid=19129542&amp;nlmid=9518021&amp;db=pubmed&amp;url=http://www.genome.org/cgi/pmidlookup?view=long&amp;pmid=19129542"&gt;other one&lt;/a&gt; has Evan Eichler in the author list.  
Some work has clearly been done in &lt;a href="http://www.ncbi.nlm.nih.gov/sites/entrez?term=17965714%2017206864%2018032724%2017989247[uid]%20AND%20pubmed%20pmc%20local[sb]%20AND%20loprovpmc[sb]&amp;log$=pmcad6_more"&gt;mouse as well&lt;/a&gt;. &lt;br /&gt;&lt;br /&gt;Presumably there is some data for Drosophila, but how extensive?  Are folks going through their collections of D. melanogaster collected from all over the world and looking for structural variation?  With a second gen sequencer, this would be straightforward to do -- though a lot of libraries would need to be prepped!  Many flies could be packed into one lane of Illumina data, so this would take some barcoding.  Even cheaper might be to do it on a Polonator (reputed to cost about $500 in consumables per run, not including library prep).&lt;br /&gt;&lt;br /&gt;Attacking this by paired-end/mate-pair NGS rather than arrays (which have been the workhorse so far) would enable detecting balanced rearrangements, which arrays are blind to (though there is another &lt;a href="http://sciencepond.com/search/Eichler"&gt;tweeted item&lt;/a&gt; in which Eichler states "Folks you can't get this kind of information from nextgen sequencing; you need old-fashioned capillaries" -- I'd love to hear the background on that).  That leads to another proto-thought: will the study of structural variation lead to better resolution of the conundrum of speciation and changes in chromosome structure -- i.e. 
it's easy to see how such rearrangements could lead to reproductive isolation but not easy to see how they could be sufficiently non-isolating to allow for enough founders.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/36768584-4624918914570119248?l=omicsomics.blogspot.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://omicsomics.blogspot.com/feeds/4624918914570119248/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=36768584&amp;postID=4624918914570119248" title="1 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/4624918914570119248" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/4624918914570119248" /><link rel="alternate" type="text/html" href="http://omicsomics.blogspot.com/2009/10/now-where-did-i-misplace-that-genome.html" title="Now where did I misplace that genome segment of mine?" /><author><name>Keith Robison</name><uri>http://www.blogger.com/profile/04765318239070312590</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd="http://schemas.google.com/g/2005" name="OpenSocialUserId" value="08368724497474381730" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-36768584.post-2585524258580019545</id><published>2009-10-24T00:06:00.004-04:00</published><updated>2009-10-24T00:13:46.195-04:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="conferences" /><title type="text">ASHG Tweets: Minor Fix or Slow Torture?</title><content type="html">Okay, I'll admit it: I've been ignoring Twitter.  It doesn't help that I never really learned to text (I might have sent one in my life).  
Maybe if I ever get a phone with a real keyboard, but even then I'm not sure.  Live blogging from meetings seemed a bit interesting -- but in those tiny packets?  I even came up with a great post on Twitter -- alas a few days after the first of April, when it would have been appropriate.&lt;br /&gt;&lt;br /&gt;But now I've gotten myself hooked on the Twitter feed coming from attendees at the &lt;a href="http://twitter.com/#search?q=%23ASHG"&gt;American Society for Human Genetics&lt;/a&gt;.  It's an interesting mix -- some well established bloggers, lots of folks I don't know plus various vendors hawking their booths or off-conference tours and such.  Plus, you don't even need a Twitter account!&lt;br /&gt;&lt;br /&gt;The only real problem is it's really making me wish I was there.  I've never been to Hawaii, despite a nearly lifelong interest in going.  And such a cool meeting!  But, you can't go to every meeting unless you're a journalist or event organizer (or sales rep!), so I had to stay home and get work done.  &lt;br /&gt;&lt;br /&gt;I suspect I'm hooked &amp; will be repeating this exercise whenever I miss good conferences.  Who knows? 
Maybe I'll catch the Twitter bug yet!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/36768584-2585524258580019545?l=omicsomics.blogspot.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://omicsomics.blogspot.com/feeds/2585524258580019545/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=36768584&amp;postID=2585524258580019545" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/2585524258580019545" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/2585524258580019545" /><link rel="alternate" type="text/html" href="http://omicsomics.blogspot.com/2009/10/ashg-tweets-minor-fix-or-slow-torture.html" title="ASHG Tweets: Minor Fix or Slow Torture?" /><author><name>Keith Robison</name><uri>http://www.blogger.com/profile/04765318239070312590</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd="http://schemas.google.com/g/2005" name="OpenSocialUserId" value="08368724497474381730" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-36768584.post-2618026474784290662</id><published>2009-10-22T22:33:00.002-04:00</published><updated>2009-10-22T23:34:06.981-04:00</updated><title type="text">Physical Maps IV: Twilight of the Clones?</title><content type="html">I've been completely slacking on completing my self-imposed series on how second generation sequencing (I'm finally trying to kick the "next gen" term) might reshape the physical mapping of genomes.  It hasn't been that my brain has been ignoring the topic, but somehow I've not extracted the thoughts through my fingertips.  
And I've figured out part of the reason for my reticence -- my next installment was supposed to cover BACs and other clone-based maps, and I'm increasingly thinking these aren't going to be around much longer.&lt;br /&gt;&lt;br /&gt;Amongst the many ideas I turned over was how to adapt BACs to the second generation world.  BACs are very large segments -- often a few hundred kilobases -- cloned into low copy (generally single copy) vectors in E. coli. &lt;br /&gt;&lt;br /&gt;One approach would be to simply sequence the BACs.  One key challenge is that a single BAC is poorly matched to a second generation sequencer; even a single lane of a sequencer is gross overkill.  So good high-throughput multiplex library methods are needed.  Even so, there will be a pretty constant tax of resequencing the BAC vector and the inevitable contaminating host DNA in the prep.  That's probably going to run about 10% wastage -- not unbearable but certainly not pretty.&lt;br /&gt;&lt;br /&gt;Another type of approach is end-sequencing.  For this you really need long reads, so 454 is probably the only second generation machine suitable.  But, you need to smash down the BAC clone to something suitable for emulsion PCR.  I did see something in Biotechniques on a vectorette PCR to accomplish this, so it may be a semi-solved problem.  &lt;br /&gt;&lt;br /&gt;A complementary approach is to landmark the BACs, that is to identify a set of distinctive features which can be used to determine which BACs overlap. At the Providence conference one of the posters discussed getting 454 reads from defined restriction sites within a BAC. &lt;br /&gt;&lt;br /&gt;But, any of these approaches still requires picking the individual BACs and prepping DNA from them and performing these reactions.  
While converting to 454 might reduce the bill for the sequence generation, all that picking &amp; prepping is still going to be expensive.&lt;br /&gt;&lt;br /&gt;BACs' baby cousins are fosmids, which are essentially the same vector concept but designed to be packaged into lambda phage.  Fosmids carry approximately 40Kb of DNA.  I've already seen ads from Roche/454 claiming that their 20Kb mate pair libraries obviate the need for fosmids.  While 20Kb is only half the span, many issues that fosmids solve are short enough to be fixed by a 20Kb span, and the 454 approach enables getting lots of them.&lt;br /&gt;&lt;br /&gt;This is all well and good, but perhaps it's time to look just a little bit further ahead.  Third generation technologies are getting close to reality (those who have early access to Pacific Biosciences machines might claim they are reality).  Some of the nanopore systems detailed in Rhode Island are clearly far away from being able to generate sequences you would believe.  However, physical mapping is a much less demanding application than trying to generate a consensus sequence or identify variants.  Plenty of times in my early career it was possible using BLAST to take amazingly awful EST sequences and successfully map them against known cDNAs.  &lt;br /&gt;&lt;br /&gt;Now, I don't have any inside information on any third generation systems.  But, I'm pretty sure I saw a claim that Pacific Biosciences has gotten reads close to 20Kb. Now, this could have been a "magic read" where all the stars were aligned.  But imagine for a moment if this technology can routinely hit such lengths (or even longer) -- albeit with quality that makes it unusable for true sequencing but sufficient for aligning to islands of sequence in a genome assembly.  
If such a technology could generate sufficient numbers of such reads in reasonable time, the 454 20Kb paired libraries could start looking like buggy whips.&lt;br /&gt;&lt;br /&gt;Taking this logic even further, suppose one of the nanopore technologies could really scan very long DNAs, perhaps 100Kb or more.  Perhaps the quality is terrible, but again, as long as it's just good enough.  For example, suppose the error rate was 15%, or roughly a phred 8 score.  AWFUL!  But, in a sequence of 10,000 (standing for the size of a fair-sized sequence island in an assembly) you'd expect to find nearly 3 runs of 50 correct bases (0.85^50 is about 0.0003 per starting position, times roughly 10,000 starting positions).  Clearly some clever algorithmics would be required (especially since with nanopores you don't know which direction the DNA is traversing the pore), but this would suggest that some pretty rotten sequencing could be used to order sequence islands along long reads.  &lt;br /&gt;&lt;br /&gt;Yet another variant on this line of thinking would be to use nanopores to read defined sequence landmarks from very long fragments.  Once you have an initial assembly, a set of unique sequences can be selected for synthesis on microarrays.  While PCR is required to amplify those oligos, it also offers an opportunity to subdivide the huge pool.  Furthermore, with sufficiently long oligos on the chip one could even have multiple universal primer targets per oligo, enabling a given landmark to be easily placed in multiple orthogonal pools.  With an optical nanopore reading strategy, 4 or more color-coded pools could be hybridized simultaneously and read.  Multiple colors might be used for more elaborate coding of sequence islands -- i.e. one island might be encoded with a series of flashing lights, much like some lighthouses.  Again, clever algorithmics would be needed to design such probe strategies.  &lt;br /&gt;&lt;br /&gt;How far away would such ideas be?  Someone more knowledgeable about the particular technologies could guess better than I could.  
But, it would certainly be worth exploring, at least on paper, for anyone wanting to show that nanopores are close to prime time.  While really low quality reads or just landmarking molecules might not seem exciting, it would offer a chance to get the technology into routine operation -- and from such routine operation comes continuous improvement.  In other words, the way to push nanopores into routine sequencing might be by carefully picking something other than sequence -- but making sure that it is a path to sequencing and not a detour.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/36768584-2618026474784290662?l=omicsomics.blogspot.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://omicsomics.blogspot.com/feeds/2618026474784290662/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=36768584&amp;postID=2618026474784290662" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/2618026474784290662" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/2618026474784290662" /><link rel="alternate" type="text/html" href="http://omicsomics.blogspot.com/2009/10/physical-maps-iv-twilight-of-clones.html" title="Physical Maps IV: Twilight of the Clones?" 
/><author><name>Keith Robison</name><uri>http://www.blogger.com/profile/04765318239070312590</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd="http://schemas.google.com/g/2005" name="OpenSocialUserId" value="08368724497474381730" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-36768584.post-2319834782608363012</id><published>2009-10-14T23:17:00.002-04:00</published><updated>2009-10-14T23:57:27.397-04:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="genome sequencing" /><title type="text">Why I'm Not Crazy About The Term "Exome Sequencing"</title><content type="html">I find myself worrying sometimes that I worry too much about the words I use -- and worry some of the rest of the time that I don't worry enough.  What can seem like the right words at one time might seem wrong some other time.  The terms "&lt;a href="http://www.genomeweb.com/search/google?cx=001523166877881412738:giguif19v6c&amp;cof=FORID:11&amp;query=killer+app&amp;op=Search&amp;form_build_id=form-e20537b5c2c535c4b4d00720841ac9e4&amp;form_token=910030c86f3d9349f727ec5cac30cb24&amp;form_id=google_cse_results_searchbox_form#978"&gt;killer app&lt;/a&gt;" are thrown around a lot in the tech space, but would you really want to hear it used about sequencing a genome if you were the patient whose DNA was under scrutiny?&lt;br /&gt;&lt;br /&gt;One term that sees a lot of traction these days is "exome sequencing".  I listened in on a free &lt;a href="http://w.on24.com/r.htm?e=165328&amp;s=1&amp;k=672F81556EC6B80FF26A54B43FE5554B"&gt;Science magazine webinar&lt;/a&gt; today on the topic, and the presentations were all worthwhile.  
The focus was on the Nimblegen capture technology (Roche/Nimblegen/454 sponsored the webinar), though other technologies were touched on.&lt;br /&gt;&lt;br /&gt;By "exome sequencing" what is generally meant is to capture &amp; sequence the exons in the human genome in order to find variants of interest.  Exons have the advantage of being much more interpretable than non-coding sequences; we have some degree of theory (though quite incomplete) which enables prioritizing these variants.  The approach also has the advantage of being significantly cheaper at the moment than whole genome sequencing (one speaker estimated $20K per exome).  So what's the problem?&lt;br /&gt;&lt;br /&gt;My concern is that the term "exome sequencing" is taken a bit too literally.  Now, it is true that these approaches catch a bit of surrounding DNA due to library construction and the targeting approaches cover splice junctions, but what about some of the other important sequences?  According to my poll of practitioners of this art, their targets are entirely exons (confession: N=1 for the poll).  &lt;br /&gt;&lt;br /&gt;I don't have a general theory for analyzing non-coding variants, but conversely there are quite a few well annotated non-coding regions of functional significance.  An obvious case is promoters.  Annotation of human promoters and enhancers and other transcriptional doodads is an ongoing process, but some have been well characterized.  In particular, the promoters for many drug metabolizing enzymes have been scrutinized because these may have significant effects on how much of the enzyme is synthesized and therefore drug metabolism. &lt;br /&gt;&lt;br /&gt;Partly coloring my concern is the fact that exome sequencing kits are becoming standardized; at least two are on the market currently.  Hence, the design shortcomings of today might influence a lot of studies.  
Clearly sequencing every last candidate promoter or enhancer would tend to defeat the advantages of exome sequencing, but I believe a reasonable shortlist of important elements could be rapidly identified.&lt;br /&gt;&lt;br /&gt;My own professional interest area, cancer genomics, adds some additional twists.  At least one major cancer genome effort (at the Broad) is using exome sequencing.  On the one hand, it is true that there are relatively few recurrent, focused non-coding alterations documented in cancer.  However, few is not none.  For example, in lung cancer the c-Met oncogene has been documented to be &lt;a href="http://cancerres.aacrjournals.org/cgi/content/full/63/19/6272"&gt;activated by mutations within an intron&lt;/a&gt;; these mutations cause skipping of an exon encoding an inhibitory domain.  Some of these alterations are about 50 nucleotides away from the nearest splice junction -- a distance that is likely to result in low or no coverage using the &lt;a href="http://www.ncbi.nlm.nih.gov/entrez/utils/fref.fcgi?PrId=3494&amp;itool=AbstractPlus-nondef&amp;uid=19182786&amp;nlmid=9604648&amp;db=pubmed&amp;url=http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pubmed&amp;pubmedid=19182786"&gt;Broad's in-solution capture technology&lt;/a&gt; (confession #2: I haven't verified this with data from that system).  &lt;br /&gt;&lt;br /&gt;The drug metabolizing enzyme promoters I mentioned before are a bit greyer for cancer genomics.  On the one hand, one is generally primarily interested in what somatic mutations have occurred in the tumor.  On the other hand, the norm in cancer genomics is tending towards applying the same approach to normal (cheek swab or lymphocyte) DNA from the patient, and why not get the DME promoters too?  
After all, these variants may have &lt;a href="http://www.ncbi.nlm.nih.gov/pubmed/19823875?ordinalpos=1&amp;itool=EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.Pubmed_DefaultReportPanel.Pubmed_RVDocSum"&gt;influenced the activity of therapeutic agents or even development of the disease&lt;/a&gt;.  Just as some somatic mutations seem to cluster enigmatically with patient characteristics, perhaps some somatic mutations will correlate with germline variants which contributed to disease initiation.&lt;br /&gt;&lt;br /&gt;Whatever my worries, they should be time-limited.  Exome sequencing products will be under extreme pricing pressure from whole genome sequencing.  The $20K cited (probably using 454 sequencing) is already potentially matched by one vendor (Complete Genomics).  Now, in general the cost of capture will probably be a relatively small contributor compared to the cost of data generation, so exome sequencing will ride much of the same cost curve as the rest of the industry.  But, it probably is $1-3K for whole exome capture due to the multiple chips required and the labor investment (anyone have a better estimate?).  If whole mammalian genome sequencing really can be pushed down into the $5K range, then mammalian exome sequencing will not offer a huge cost advantage if any.  
I'd guess interest in mammalian exome sequencing will peak in a year or two, so maybe I should stop worrying and learn to love the hyb.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/36768584-2319834782608363012?l=omicsomics.blogspot.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://omicsomics.blogspot.com/feeds/2319834782608363012/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=36768584&amp;postID=2319834782608363012" title="2 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/2319834782608363012" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/2319834782608363012" /><link rel="alternate" type="text/html" href="http://omicsomics.blogspot.com/2009/10/why-im-not-crazy-about-term-exome.html" title="Why I'm Not Crazy About The Term &quot;Exome Sequencing&quot;" /><author><name>Keith Robison</name><uri>http://www.blogger.com/profile/04765318239070312590</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd="http://schemas.google.com/g/2005" name="OpenSocialUserId" value="08368724497474381730" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-36768584.post-5739116942368040164</id><published>2009-10-09T14:32:00.007-04:00</published><updated>2009-10-09T14:39:19.756-04:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="administration" /><title type="text">Bad blog!  Bad, bad, bad blog!</title><content type="html">Thanks to Dan Koboldt from Mass Genomics, I've discovered that another blog (the &lt;a href="http://medicalcenterinfo.com/"&gt;Oregon Personal Injury Law Blog&lt;/a&gt;) had copied my breast cancer genome piece.  
Actually, it appears that since it started this summer it may have copied every one of my posts here at &lt;a href="http://omicsomics.blogspot.com"&gt;Omics! Omics!&lt;/a&gt; without any attribution or apparent linking back.  I've left a comment (which is moderated) protesting this.&lt;br /&gt;&lt;br /&gt;Curiously, the author of this blog (I assume it has one) doesn't seem to have left any identifying information or contact info, so for the moment the comments section is my only way of communicating.  Perhaps this is some sort of weird RSS-driven bug; that's the only charitable explanation I can contemplate.  But it is strange -- most of these have no possible link to personal injury -- or can PNAS sue me for complaining about their RSS feed?&lt;br /&gt;&lt;br /&gt;We'll see if the author fixes this, or at least replies with something along the lines of "head down, ears flat &amp; tail between the legs".  &lt;br /&gt;&lt;br /&gt;Just to double-check the RSS hypothesis, I'm actually going to explicitly sign this one -- Keith Robison from &lt;a href="http://omicsomics.blogspot.com"&gt;Omics! 
Omics!&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/36768584-5739116942368040164?l=omicsomics.blogspot.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://omicsomics.blogspot.com/feeds/5739116942368040164/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=36768584&amp;postID=5739116942368040164" title="2 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/5739116942368040164" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/5739116942368040164" /><link rel="alternate" type="text/html" href="http://omicsomics.blogspot.com/2009/10/bad-blog-bad-bad-bad-blog.html" title="Bad blog!  Bad, bad, bad blog!" /><author><name>Keith Robison</name><uri>http://www.blogger.com/profile/04765318239070312590</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd="http://schemas.google.com/g/2005" name="OpenSocialUserId" value="08368724497474381730" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-36768584.post-603880617736730152</id><published>2009-10-09T00:06:00.004-04:00</published><updated>2009-10-09T00:53:09.757-04:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="proteomics" /><category scheme="http://www.blogger.com/atom/ns#" term="metagenomics" /><title type="text">Nano Anglerfish Snag Orphan Enzymes</title><content type="html">The new Science has an extremely impressive paper tackling the problem of orphan enzymes.  Due primarily to Watson-Crick basepairing, our ability to sequence nucleic acids has shot far past our ability to characterize the proteins they may encode.  
If I want to measure an RNA's expression, I can generate an assay almost overnight by designing specific real-time PCR (aka RT-PCR aka TaqMan) probes.  If I want to analyze any specific protein's expression, it generally involves a lot of teeth gnashing &amp; frustration.  If you're lucky, there is a good antibody for it -- but most times there is either no antibody or one of unknown (and probably poor) character.  Mass spec based methods continue to improve, but still don't have an "analyze any protein in any biological sample anytime" character (yet?).&lt;br /&gt;&lt;br /&gt;One result of this is that there are a lot of ORFs of unknown function in any sequenced genome.  Bioinformatic approaches can make guesses for many of these and those guesses are often around enzymatic activity, but a bioinformatic prediction is not proof and the predictions are often quite vague (such as "hydrolase").  Structural genomics efforts sometimes pull in additional proteins whose sequence didn't resemble anything of known function, but whose structure has enzymatic characteristics such as nucleotide binding pockets.  One or two such structures have been de-orphaned by virtual screening, but these are a rarity.&lt;br /&gt;&lt;br /&gt;Attempts have been made at high-throughput screening of enzyme activities.  For example, several efforts have been published in which cloned libraries of proteins from a proteome were screened for enzyme activity.  While these produced initial papers, they've never seemed to really catch fire.&lt;br /&gt;&lt;br /&gt;The new paper is audacious in providing an approach to detecting enzyme activities and subsequently identifying the responsible proteins, all from protein extracts.  The key trick is an array of golden nano anglerfish -- well, that's how I imagine it.  Like an anglerfish, the gold nanoparticles dangle their chemical baits off long spacers (poly-A, of all things!).  
In reverse of an anglerfish, the bait complex glows &lt;span style="font-style:italic;"&gt;after&lt;/span&gt; it has been taken by its prey, with a clever unquenching mechanism activating the fluorophore and marking that a reaction took place.  But the real kicker is that like an anglerfish, the nanoparticles seize their prey!  Some clever chemistry around a bound cobalt ion (which I won't claim to understand) results in linking the enzyme to the nanoparticle, from which it can be cleaved, trypsinized and identified by mass spectrometry.  1676 known metabolites and 807 other compounds of interest were immobilized in this fashion.  &lt;br /&gt;&lt;br /&gt;As one test, the researchers separately applied extracts of the bacteria Pseudomonas putida and Streptomyces coelicolor to arrays.  Results were in quite strong agreement with the existing bioinformatic annotations of these organisms, in that the P. putida extract's pattern of metabolized and not metabolized substrates strongly coincided with what the informatics would predict and the same was true for S. coelicolor (with a P&lt;5.77^-177 for the latter!). But, agreement was not perfect -- each species catalyzed additional reactions on the array which were absent from the databases.  By identifying the bound proteins, numerous assignments were made which were either novel or significant refinements of the prior annotation. Out of 191 proteins identified in the P. putida set, 31 hypothetical proteins were assigned function, 47 proteins were assigned a different function and the previously ascribed function was confirmed for the remaining 113 proteins.&lt;br /&gt;&lt;br /&gt;Further work was done with environmental samples.  However, given the low protein abundance from such samples, these were converted into libraries cloned into E. coli and then the extracts from these E. coli strains analyzed.  
Untransformed E.coli was used to estimate the backgrounds to subtract -- I must confess a certain disappointment that the paper doesn't report any novel activities for E.coli, though it isn't clear that they checked for them (but how could you not!).  The samples came from three extreme environments -- one from a hot, heavy metal rich acidic pool, one from oil-contaminated seawater and a third from a deep sea hypersaline anoxic region.  From each sample a plethora of enzyme activities were discovered.&lt;br /&gt;&lt;br /&gt;Of course, there are limits to this approach.  The tethering mechanism may interfere with some enzymes acting on their substrates.  It may, therefore, be desirable to place some compounds multiple times on the array but with the linker attached at different points.  It is unlikely we know all possible metabolites (particularly for strange bugs from strange places), so some enzymes can't be deorphaned this way.  And sensitivity issues may challenge finding some enzyme activities if very few copies of the enzyme are present.&lt;br /&gt;&lt;br /&gt;On the other hand, as long as these issues are kept in mind this is an unprecedented &amp; amazing haul of enzyme annotations.  Application of this method to industrially important fungi &amp; yeasts is another important area, and certainly only the bare surface of the bacterial world was scratched in this paper.  Arrays with additional unnatural -- but industrially interesting -- substrates are hinted at in the paper.  Finally, given the reawakened interest in small molecule metabolism in higher organisms &amp; their diseases (such as cancer), application of this method to human samples can't be far behind.  
&lt;br /&gt;&lt;br /&gt;&lt;span style="float: left; padding: 5px;"&gt;&lt;a href="http://www.researchblogging.org"&gt;&lt;img alt="ResearchBlogging.org" src="http://www.researchblogging.org/public/citation_icons/rb2_large_gray.png" style="border:0;"/&gt;&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&amp;rft.jtitle=Science&amp;rft_id=info%3A%2F10.1126%2Fscience.1174094&amp;rfr_id=info%3Asid%2Fresearchblogging.org&amp;rft.atitle=Reactome+array%3A+Forging+a+link+between+metabolome+and+genome&amp;rft.issn=&amp;rft.date=2009&amp;rft.volume=326&amp;rft.issue=5950&amp;rft.spage=252&amp;rft.epage=257&amp;rft.artnum=http%3A%2F%2Fwww.sciencemag.org%2Fcgi%2Fcontent%2Fabstract%2F326%2F5950%2F252&amp;rft.au=Ana+Beloqui&amp;rft.au=Mar%C3%ADa-Eugenia+Guazzaroni&amp;rft.au=Florencio+Pazos&amp;rft.au=Jos%C3%A9+M.+Vieites&amp;rft.au=Marta+Godoy&amp;rft.au=Olga+V.+Golyshina%2C&amp;rft.au=Tatyana+N.+Chernikova&amp;rft.au=Agnes+Waliczek&amp;rft.au=Rafael+Silva-Rocha&amp;rft.au=Yamal+Al-ramahi&amp;rft.au=Violetta+La+Cono&amp;rft.au=Carmen+Mendez&amp;rft.au=Jos%C3%A9+A.+Salas&amp;rft.au=Roberto+Solano&amp;rft.au=Michail+M.+Yakimov&amp;rft.au=Kenneth+N.+Timmis&amp;rft.au=Peter+N.+Golyshin&amp;rft.au=Manuel+Ferrer&amp;rfe_dat=bpr3.included=1;bpr3.tags=Biology%2CChemistry%2CBiotechnology%2C+Biochemistry%2C+Bioinformatics%2C+Microbiology+%2C+Chemical+Biology%2C+Biochemistry%2C+Biological+Chemistry"&gt;Ana Beloqui, María-Eugenia Guazzaroni, Florencio Pazos, José M. Vieites, Marta Godoy, Olga V. Golyshina,, Tatyana N. Chernikova, Agnes Waliczek, Rafael Silva-Rocha, Yamal Al-ramahi, Violetta La Cono, Carmen Mendez, José A. Salas, Roberto Solano, Michail M. Yakimov, Kenneth N. Timmis, Peter N. Golyshin, &amp; Manuel Ferrer (2009). 
Reactome array: Forging a link between metabolome and genome &lt;span style="font-style: italic;"&gt;Science, 326&lt;/span&gt; (5950), 252-257 : &lt;a rev="review" href="http://www.sciencemag.org/cgi/content/abstract/326/5950/252"&gt;10.1126/science.1174094&lt;/a&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/36768584-603880617736730152?l=omicsomics.blogspot.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://omicsomics.blogspot.com/feeds/603880617736730152/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=36768584&amp;postID=603880617736730152" title="3 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/603880617736730152" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/603880617736730152" /><link rel="alternate" type="text/html" href="http://omicsomics.blogspot.com/2009/10/nano-anglerfish-snag-orphan-enzymes.html" title="Nano Anglerfish Snag Orphan Enzymes" /><author><name>Keith Robison</name><uri>http://www.blogger.com/profile/04765318239070312590</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd="http://schemas.google.com/g/2005" name="OpenSocialUserId" value="08368724497474381730" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-36768584.post-7615799363377654638</id><published>2009-10-07T23:37:00.002-04:00</published><updated>2009-10-08T00:29:56.426-04:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="cancer" /><title type="text">The genomic history of a breast cancer revealed</title><content type="html">Today's Nature contains a great paper which is one more step forward for cancer genomics.  
Using Illumina sequencing, a group in British Columbia sequenced both the genome and transcriptome of a metastatic lobular (estrogen receptor positive) breast cancer.  Furthermore, they searched a sample of the original tumor for mutations found in the genome+transcriptome screen in order to identify those that may have been present early vs. those which were acquired later.&lt;br /&gt;&lt;br /&gt;From the combined genome sequence and RNA-Seq data they found 1456 non-synonymous changes, which were then trimmed to 1178 after removing pseudogenes and HLA sequences.  1120 of these could be re-assayed by Sanger sequencing of PCR amplicons from both normal DNA and the metastatic samples -- 437 of these were confirmed.  Most of these (405) were found in the normal sample.  Of the 32 remaining, 2 were found only in the RNA-Seq data, a point to be addressed later below.  Strikingly, none of the mutated genes were found in the previous whole-exome sequencing (by PCR+Sanger) of breast cancer, though those samples were of a different subtype (estrogen receptor negative).&lt;br /&gt;&lt;br /&gt;There are a bunch of cool tidbits in the paper, which I'm sure I won't give full justice to here but I'll do my best.  For example, several other papers using RNA-Seq on solid cancers have identified fusion proteins, but in this paper none of the fusion genes suggested by the original sequencing came through their validation process.  Most of the coding regions with non-synonymous mutations have not been seen to be mutated before in breast cancer, though ERBB2 (HER2, the target of Herceptin) is in the list along with PALB2, a gene which when mutated predisposes individuals to several cancers (and is also associated with BRCA2).  
The algorithm (SNVMix) used for SNP identification &amp; frequency estimation is a good example of an &lt;a href="http://omicsomics.blogspot.com/2007/10/scientific-easter-eggs.html"&gt;easter egg&lt;/a&gt;, a supplementary item that could easily be its own paper.&lt;br /&gt;&lt;br /&gt;One great little story is HAUS3.  This was found to have a truncating stop codon mutation and the data suggests that the mutation is homozygous (but at normal copy number) in the tumor.  A further screen of 192 additional breast cancers (112 lobular and 80 ductal) for several of the mutations found no copies of the same hits seen in this sample, but two more truncating mutations in HAUS3 were found (along with 3 more variations in ERBB2 within the kinase domain, a hotspot for cancer mutations).  HAUS3 is particularly interesting because until about a year ago it was just C4orf15, an anonymous ORF on chromosome 4.  Several papers have recently described a complex ("augmin") which plays a role in genome stability, and HAUS3 is a component of this complex.  This starts smelling like a tumor suppressor (truncating mutations seen repeatedly; truncating mutation homozygous in tumor; protein involved in a function often crippled in cancer), and I'll bet HAUS3 will be showing up in some functional studies in the not too distant future.&lt;br /&gt;&lt;br /&gt;Resequencing of the primary tumor was performed using amplicons targeting the mutations found in the metastatic tumor.  These amplicons were small enough to be spanned directly by paired-end Illumina reads, obviating the need for library construction (a trick which has shown up in some other papers).  By using Illumina sequencing for this step, the frequency of the mutation in the sample could be estimated.  It is also worth noting that the primary tumor sample was a Formalin Fixed Paraffin Embedded slide, a way to preserve histology which is notoriously harsh on biomolecules and prone to sequencing artifacts. 
Appropriate precautions were taken, such as sequencing two different PCR amplifications from two different DNA extractions.  The sequencing of the primary tumor suggests that only 10 of the mutations were present there, with only 4 of these showing a frequency consistent with being present in the primary clone and the others probably being minor components.  This is another important filter to suggest which genes are candidates for being involved in early tumorigenesis and which are more likely late players (or simply passengers).&lt;br /&gt;&lt;br /&gt;One more cool bit I parked above: the 2 variants seen only in the RNA-Seq library.  This suggested RNA editing; consistent with this, an RNA editase (ADAR) was found to be highly represented in the RNA-Seq data.  Two genes (COG3 and SRP9) showed high frequency editing.  RNA editing is beginning to be recognized as a widespread phenomenon in mammals (e.g. the nice work by Jin Billy Li in the Church lab); the possibility that cancers can hijack this for nefarious purposes should be an interesting avenue to explore.  COG3 is a Golgi protein &amp; &lt;a href="http://www.nature.com/nature/journal/v459/n7250/full/nature08109.html"&gt;links of the Golgi to cancer&lt;/a&gt; are starting to be teased out.  SRP9 is part of the signal recognition particle involved in protein translocation into the ER -- which of course feeds the Golgi.  Quite possibly this is coincidental, but it certainly rates investigating.&lt;br /&gt;&lt;br /&gt;One final thought: the next year will probably be filled with a lot of similar papers.  Cancer genomics is &lt;a href="http://www.genomeweb.com/cancer-genome-atlas-gets-275m-funding-stimulus-nci-and-nhgri"&gt;gearing up in a huge way&lt;/a&gt;, with &lt;a href="http://www.massgenomics.org/2009/08/wucgi-washu-cancer-genomics-initiative.html"&gt;Wash U alone planning 150 genomes&lt;/a&gt; well before a year from now.  
It seems unlikely that those 150 genomes will end up as 150 distinct papers and more so it will be a challenge to do the level of follow-up in this paper on such a grand scale.  A real challenge to the experimental community -- and the funding establishment -- is converting the tantalizing observations which will come pouring out of these studies into validated biological findings.  With a little luck, biotech &amp; pharma companies (such as my employer) will be able to convert those findings into new clinical options for doctors and patients.  &lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="float: left; padding: 5px;"&gt;&lt;a href="http://www.researchblogging.org"&gt;&lt;img alt="ResearchBlogging.org" src="http://www.researchblogging.org/public/citation_icons/rb2_large_gray.png" style="border:0;"/&gt;&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&amp;rft.jtitle=Nature&amp;rft_id=info%3A%2F10.1038%2Fnature08489&amp;rfr_id=info%3Asid%2Fresearchblogging.org&amp;rft.atitle=Mutational+evolution+in+a+lobular+breast+tumor+profiled+at+single+nucleotide+resolution&amp;rft.issn=&amp;rft.date=2009&amp;rft.volume=461&amp;rft.issue=&amp;rft.spage=809&amp;rft.epage=813&amp;rft.artnum=http%3A%2F%2Fwww.nature.com%2Fnature%2Fjournal%2Fv461%2Fn7265%2Fabs%2Fnature08489.html&amp;rft.au=Sohrab+P.+Shah&amp;rft.au=Ryan+D.+Morin&amp;rft.au=Jaswinder+Khattra&amp;rft.au=Leah+Prentice&amp;rft.au=Trevor+Pugh&amp;rft.au=Angela+Burleigh&amp;rft.au=Allen+Delaney&amp;rft.au=Karen+Gelmon&amp;rft.au=Ryan+Guliany&amp;rft.au=Janine+Senz&amp;rft.au=Christian+Steidl&amp;rft.au=Robert+A.+Holt&amp;rft.au=Steven+Jones&amp;rft.au=Mark+Sun&amp;rft.au=Gillian+Leung&amp;rft.au=Richard+Moore&amp;rft.au=Tesa+Severson&amp;rft.au=Greg+A.+Taylor&amp;rft.au=Andrew+E.+Teschendorff&amp;rft.au=Kane+Tse&amp;rft.au=Gulisa+Turashvili&amp;rft.au=Richard+Varhol&amp;rft.au=Ren%C3%A9+L.+Warren&amp;rft.au=Peter+Watson&amp;rft.au=
Yongjun+Zhao&amp;rft.au=Carlos+Caldas&amp;rft.au=David+Huntsman&amp;rft.au=Martin+Hirst&amp;rft.au=Marco+A.+Marra&amp;rft.au=Samuel+Aparicio&amp;rfe_dat=bpr3.included=1;bpr3.tags=Biology%2CCancer%2C+Genetics"&gt;Sohrab P. Shah, Ryan D. Morin, Jaswinder Khattra, Leah Prentice, Trevor Pugh, Angela Burleigh, Allen Delaney, Karen Gelmon, Ryan Guliany, Janine Senz, Christian Steidl, Robert A. Holt, Steven Jones, Mark Sun, Gillian Leung, Richard Moore, Tesa Severson, Greg A. Taylor, Andrew E. Teschendorff, Kane Tse, Gulisa Turashvili, Richard Varhol, René L. Warren, Peter Watson, Yongjun Zhao, Carlos Caldas, David Huntsman, Martin Hirst, Marco A. Marra, &amp; Samuel Aparicio (2009). Mutational evolution in a lobular breast tumor profiled at single nucleotide resolution &lt;span style="font-style: italic;"&gt;Nature, 461&lt;/span&gt;, 809-813 : &lt;a rev="review" href="10.1038/nature08489"&gt;10.1038/nature08489&lt;/a&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/36768584-7615799363377654638?l=omicsomics.blogspot.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://omicsomics.blogspot.com/feeds/7615799363377654638/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=36768584&amp;postID=7615799363377654638" title="7 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/7615799363377654638" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/7615799363377654638" /><link rel="alternate" type="text/html" href="http://omicsomics.blogspot.com/2009/10/genomic-history-of-breast-cancer.html" title="The genomic history of a breast cancer revealed" /><author><name>Keith 
Robison</name><uri>http://www.blogger.com/profile/04765318239070312590</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd="http://schemas.google.com/g/2005" name="OpenSocialUserId" value="08368724497474381730" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">7</thr:total></entry><entry><id>tag:blogger.com,1999:blog-36768584.post-8434385511641647816</id><published>2009-10-06T23:46:00.002-04:00</published><updated>2009-10-07T00:06:29.833-04:00</updated><title type="text">Diagramming the Atari Pathway</title><content type="html">Okay, it was an outside speaker at work who planted this seed in my brain, and now I can't shake the image -- but perhaps by writing this I will (but also perhaps I will infect my loyal readers with it).&lt;br /&gt;&lt;br /&gt;The stated observation was that some biological pathway diagrams "look like Space Invaders".  Now, I hold such games dear to my heart -- they were quite the rage in our neighborhood growing up, though we didn't own one &amp; I was never very good.  Nowadays one can buy replicas which play many of the old games -- except the entire system fits inside the replica of the old joysticks.  My hardware-oriented brother loves to point out all the interesting workarounds which are now fossilized in these players -- such as limits on the number of moving graphics ("sprites") which could occupy a scan line.&lt;br /&gt;&lt;br /&gt;But which video game seems to be the model for some of these diagrams?  Space Invaders is an obvious candidate (or one of the knockoffs or follow-ons such as Galaga), but my old favorite Centipede (or its successor Millipede) is even closer -- they even had spiders trying to spin webs.&lt;br /&gt;&lt;br /&gt;It would be a pretty funny visual joke -- saved for precisely the right time (the wrong time could be disaster!) -- to have a pathway display morph into a game.  
The transcription factors start moving about and crashing into the kinases which in turn blast away at the receptors.  &lt;br /&gt;&lt;br /&gt;Versions of the reverse have sometimes occupied my mind -- what if we could make scientific programs more game-like?  The notion I most commonly ponder is a flight simulator for protein structures.  Even that could be taken to another level -- your X-wing is flying down a canyon of the giant structure, ready to unleash a boronic warhead to destroy the evil proteasomic death star!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/36768584-8434385511641647816?l=omicsomics.blogspot.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://omicsomics.blogspot.com/feeds/8434385511641647816/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=36768584&amp;postID=8434385511641647816" title="1 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/8434385511641647816" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/8434385511641647816" /><link rel="alternate" type="text/html" href="http://omicsomics.blogspot.com/2009/10/diagramming-atari-pathway.html" title="Diagramming the Atari Pathway" /><author><name>Keith Robison</name><uri>http://www.blogger.com/profile/04765318239070312590</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd="http://schemas.google.com/g/2005" name="OpenSocialUserId" value="08368724497474381730" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-36768584.post-8371831931300523784</id><published>2009-10-06T00:04:00.002-04:00</published><updated>2009-10-06T00:10:18.218-04:00</updated><title type="text">Why does PNAS clip their RSS 
feeds?</title><content type="html">Okay, minor pet peeve.  I've pretty much switched over to using Outlook as an RSS reader to keep up with journals of interest.  I still get a few ToC by email, but the RSS mechanism has lots of advantages.  First, I'm in Outlook all the time, so it's a natural place.  Second, I can leave behind copies of the papers of interest, with all the tools in Outlook for moving them or tagging them &amp; such.  One minor annoyance is you can't (at least as far as I can tell) force a scan of the RSS feeds.  Sure, mostly this is obsessive or time-killing, but when you have intermittent net access it's really handy.&lt;br /&gt;&lt;br /&gt;But there is one big difference in the feeds.  Most ToC feeds send out one entry per article and that entry contains the title, authors &amp; abstract.  But PNAS sends out only the authors, title &amp; a very short head end of the abstract.  Aaaaarrrrrgggghhhh!   Lost is much of the ability to vet my level of interest in an article plus the additional keywords which would enhance searching for it.&lt;br /&gt;&lt;br /&gt;I realize PNAS is already busy with torquing their acceptance channels, but could someone who knows someone there in power please get them to fix this?!!?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/36768584-8371831931300523784?l=omicsomics.blogspot.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://omicsomics.blogspot.com/feeds/8371831931300523784/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=36768584&amp;postID=8371831931300523784" title="2 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/8371831931300523784" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/8371831931300523784" /><link 
rel="alternate" type="text/html" href="http://omicsomics.blogspot.com/2009/10/why-does-pnas-clip-their-rss-feeds.html" title="Why does PNAS clip their RSS feeds?" /><author><name>Keith Robison</name><uri>http://www.blogger.com/profile/04765318239070312590</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd="http://schemas.google.com/g/2005" name="OpenSocialUserId" value="08368724497474381730" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-36768584.post-8780865675423049126</id><published>2009-10-01T22:19:00.003-04:00</published><updated>2009-10-01T22:49:52.648-04:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="conferences" /><title type="text">Pondering Polonators</title><content type="html">&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_N2AOZejgjyA/SsVmR4RoSoI/AAAAAAAAAEU/AotDROtd4rc/s1600-h/DSC01984.JPG"&gt;&lt;img style="float:left; margin:0 10px 10px 0;cursor:pointer; cursor:hand;width: 400px; height: 300px;" src="http://3.bp.blogspot.com/_N2AOZejgjyA/SsVmR4RoSoI/AAAAAAAAAEU/AotDROtd4rc/s400/DSC01984.JPG" border="0" alt=""id="BLOGGER_PHOTO_ID_5387824986568804994" /&gt;&lt;/a&gt;&lt;br /&gt;Standing next to the Polonator like a proud relative is Kevin McCarthy, who leads the Polonator effort at Dover Systems.  I had remembered him giving permission to photograph it at the first day of the Providence meeting &amp; brought my camera along the second day.  When I mentioned it was for my blog, Kevin leaped into the frame.  All in good fun!&lt;br /&gt;&lt;br /&gt;The Polonator is an intriguing gadget.  No other next-gen sequencer can be had for under $200K -- or about 1/2 to 1/4 the price of any of the other instruments.  
But it's no tinfoil-and-paperclip contraption -- not only does it look very solid &amp; professional, with everything laid out neatly in the cabinet, but in one small test it was quite robust.  Kevin had it running mock sequencing cycles and he said "if you put your hand on the stage".  I thought he was being hypothetical, but then he politely insisted I do just that. Clearly he wasn't worried about anything going wrong (and somehow I was convinced my hand would emerge unscathed!).  &lt;a href="http://omicsomics.blogspot.com/2009/09/chi-next-gen-conference-day-1.html"&gt;In his talk&lt;/a&gt;, Kevin pointed out the various vibration isolation schemes engineered in -- you need not tiptoe past it during operation, despite the fact that it is doing some amazingly high-precision imaging.&lt;br /&gt;&lt;br /&gt;The truly intriguing angle on the Polonator is that it is a completely open architecture.  If you want to play around with different chemistries, go ahead (but please respect appropriate licenses!).  I'm guessing you could probably run any of the existing amplification-based chemistries on it (again, licenses might be an issue) -- presumably with a loss of performance.  Of course, with 454 you need continuous watching of a small bit of the flowcell, so the machine isn't ideal.  But that isn't the point -- you could use this as a general hardware &amp; software chassis to experiment.  I &lt;a href="http://omicsomics.blogspot.com/2009/09/chi-next-gen-conference-day-2.html"&gt;speculated previously that some new sequencing-by-synthesis chemistries&lt;/a&gt; could be run on the Polonator, and on further reflection I'm wondering if the optical-based nanopore scheme could be prototyped on a Polonator.  
NAR published earlier this year &lt;a href="http://nar.oxfordjournals.org/cgi/content/full/37/1/e5?maxtoshow=&amp;HITS=10&amp;hits=10&amp;RESULTFORMAT=&amp;fulltext=cyclic&amp;searchid=1&amp;FIRSTINDEX=0&amp;resourcetype=HWCIT"&gt;another proposed chemistry&lt;/a&gt; that would seem Polonator-friendly.  &lt;br /&gt;&lt;br /&gt;If you wish to reprogram the fluidics, go ahead!  If you wish to image only in 1 color (the default chemistry requires 4), that's programmable.  &lt;span style="font-style:italic;"&gt;Everything&lt;/span&gt; is programmable.&lt;br /&gt;&lt;br /&gt;That's pretty enticing from a techie angle, but it's also a pretty risky business strategy.  Generally such an expensive gadget is either paid for with a hefty markup up front and/or a hefty premium on reagents.  But, while standard reagent kits are on the way, there's nothing proprietary about them.  Anyone can whip up their own.  Just like the hardware &amp; software, the wetware is all open as well.&lt;br /&gt;&lt;br /&gt;There's also the issue of the current chemistry, which appears to be the original Church lab sequencing-by-ligation scheme.  That means a bunch of sample prep steps and very short reads -- &lt;a href="http://www.polonator.org/protocols/polony.aspx"&gt;26 nucleotides of tag&lt;/a&gt;.  The tags are derived from the original sequence in a predictable way but which isn't quite like getting two simple paired-end or mate-pair reads.  That may be a barrier to many software toolsmiths including Polonator in their code, though perhaps with wide acceptance that would happen.  But, with 10Gbases of data after 80 hours of running, it may attract some attention!&lt;br /&gt;&lt;br /&gt;I'm trying to figure out how I would use one if I had one.  In the abstract sense, polony sequencing has already been shown quite capable of sequencing bacterial genomes.  Also, Complete Genomics' chemistry generates reads in the same ballpark and they are tackling human.  
But would I have the courage to try that?  Certainly in my current professional situation it would be going out a bit on a limb.  Plus, even at under $200K it really needs to be kept busy to look like a good buy.  Does almost make me wish I was back in graduate school, as that is the time to experiment with such cool toys!&lt;br /&gt;&lt;br /&gt;On the other hand, I do have some notions of what I might try out on one.  Not enough notions to be able to justify buying one, but certainly if I could rent some time on one at a reasonable price I'd jump at the notion.  With luck, a service provider or two will decide to offer Polonating as a service.  Or, perhaps someone who has bought one might be interested in collaborating on some interesting clinically-relevant projects?  If so, leave me a comment here (which I won't make visible) &amp; we can talk!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/36768584-8780865675423049126?l=omicsomics.blogspot.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://omicsomics.blogspot.com/feeds/8780865675423049126/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=36768584&amp;postID=8780865675423049126" title="2 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/8780865675423049126" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/8780865675423049126" /><link rel="alternate" type="text/html" href="http://omicsomics.blogspot.com/2009/10/pondering-polonators.html" title="Pondering Polonators" /><author><name>Keith Robison</name><uri>http://www.blogger.com/profile/04765318239070312590</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd="http://schemas.google.com/g/2005" name="OpenSocialUserId" 
value="08368724497474381730" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://3.bp.blogspot.com/_N2AOZejgjyA/SsVmR4RoSoI/AAAAAAAAAEU/AotDROtd4rc/s72-c/DSC01984.JPG" height="72" width="72" /><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-36768584.post-7910595058025295165</id><published>2009-09-28T22:31:00.003-04:00</published><updated>2009-09-28T23:08:21.108-04:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="structural biology" /><category scheme="http://www.blogger.com/atom/ns#" term="evolution" /><title type="text">Locking in new functions</title><content type="html">The September 24th Nature came in the mail today and as always with this journal (otherwise I wouldn't pay for it!) is full of interesting stuff.   One paper of particular interest is a cool merger of evolution, computational biology, structural biology and protein engineering.&lt;br /&gt;&lt;br /&gt;An interesting question in evolution is to what degree are changes reversible.  In the simplest case, of purely neutral characteristics, the answer would seem to be largely that they are.  However, even a purely neutral change will have a certain probability of reverting.  For example, since transversions (mutation of a pyrimidine to a purine or vice versa) are less common than transitions (purine-&gt;purine or pyrimidine-&gt;pyrimidine), a C-&gt;G mutation (transversion) is less likely to return to C than a C-&gt;T (transition).  Similarly, if a C is methylated but that methylation serves no purpose, the methylation will favor conversion to a T, but the T has no such biochemical slanting to mutate to a C.  But even these will be small changes.&lt;br /&gt;&lt;br /&gt;But throw in some function, and the question gets more complicated.  The question that this paper addresses is a specific receptor, the glucocorticoid receptor. 
A previous paper by the group showed that the inferred ancestral form was promiscuous: it primarily bound some related steroids but did have some affinity for glucocorticoids.  This ancestral form existed in the last common ancestor of cartilaginous and bony fishes but by the time of the last common ancestor for bony fishes and tetrapods (such as us) it had fixed a specificity for corticosteroids.  These inferred ancestral receptors are referred to respectively as AncGR1 and AncGR2.&lt;br /&gt;&lt;br /&gt;While there are 37 amino acid replacements between AncGR1 and AncGR2, it takes only two of these (group X) to switch the preference of AncGR1 to corticosteroids.  The change is accomplished by substantially swinging a helix (helix 7) to a new position in the ligand binding pocket.  Only three more substitutions (group Y) enforce specificity for corticosteroids; make all 5 of these changes and you convert a promiscuous receptor with weak activity towards corticosteroids to one activated only by them.  But the interesting kicker is you can't make this second set of specificity-locking mutations until 2 other mutations (group Z) are made.  The issue is that the first two X mutations cause a significant structural shift which is not entirely stable; without the stability of the group Z pair of mutations the group Y specificity trio can't be tolerated.&lt;br /&gt;&lt;br /&gt;But there's another kicker.  If you engineer the AncGR2 protein back to having the ancestral states for groups X, Y and Z, the resulting protein is non-functional for any ligand.  Something is going on somewhere in those other 30 changes.  Some further phylogenetic filtering suggested 6 strong candidates, and the authors solved the X-ray structure of the AncGR2 ligand binding domain (though it turns out the prior homology model of this structure was apparently almost dead on).  Five of the candidates (group W) turn out to either be in or to contact that swung helix 7.  
The structure of AncGR1 had been previously solved and a comparison of the AncGR1 and AncGR2 structures showed that the ancestral (AncGR1) forms at these 5 positions stabilize the ancestral position of helix 7 and the derived (AncGR2) amino acids at these positions actually clash with the AncGR1 positioning of helix 7.  Synthesis of AncGR2 with the ancestral amino acids at groups X, Y, Z and W yielded a receptor whose specificity is very like that of AncGR1.  One group W substitution had a strong enough effect that it could confer the ancestral phenotype even without the other group W changes but some of the other group W changes could be made only in pairs to show an effect.  Finally, the authors made receptors with the ancestral state for combinations of x, y and z mutations (e.g. Xyz -- AncGR2 for X but AncGR1-like at y and z) and found that any combination with xW is non-functional.  AncGR2 with ancestral amino acids at x,y,z &amp; w is not as good a receptor as AncGR1 -- suggesting that at least some of the remaining 25 positions contribute.&lt;br /&gt;&lt;br /&gt;So, this is a well-detailed case where evolutionary change eventually blocked the route back to the start.  A receptor which made the group X changes could still bind the original ligands but that would be lost once the group Y changes were layered on.  Group Y changes were probably preceded by group Z changes which would have made reversion to the original binding specificity unlikely -- and the group W mutations really nail shut the door.&lt;br /&gt;&lt;br /&gt;This particular system was a single polypeptide chain.  But it is not difficult to see how the concept could extend to other biological systems.  Co-evolution of interacting proteins, such as a protein and its receptor, or modification of a developmental system could similarly proceed in a stepwise fashion that ultimately prevents retreat.  
We are a bit lucky in this case that the evolutionary traces are all preserved where we can find them; it is not difficult to imagine a scenario where part of the ancestral form is lost from all extant lineages and therefore invisible to our current vision.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="float: left; padding: 5px;"&gt;&lt;a href="http://www.researchblogging.org"&gt;&lt;img alt="ResearchBlogging.org" src="http://www.researchblogging.org/public/citation_icons/rb2_large_gray.png" style="border:0;"/&gt;&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&amp;rft.jtitle=Nature&amp;rft_id=info%3Apmid%2F19779450&amp;rfr_id=info%3Asid%2Fresearchblogging.org&amp;rft.atitle=An+epistatic+ratchet+constrains+the+direction+of+glucocorticoid+receptor+evolution.&amp;rft.issn=0028-0836&amp;rft.date=2009&amp;rft.volume=461&amp;rft.issue=7263&amp;rft.spage=515&amp;rft.epage=9&amp;rft.artnum=&amp;rft.au=Bridgham+JT&amp;rft.au=Ortlund+EA&amp;rft.au=Thornton+JW&amp;rfe_dat=bpr3.included=1;bpr3.tags=Biology%2CStructural+Biology%2C+Evolutionary+Biology%2C+Computational+Biology"&gt;Bridgham JT, Ortlund EA, &amp; Thornton JW (2009). An epistatic ratchet constrains the direction of glucocorticoid receptor evolution. 
&lt;span style="font-style: italic;"&gt;Nature, 461&lt;/span&gt; (7263), 515-9 PMID: &lt;a rev="review" href="http://www.ncbi.nlm.nih.gov/pubmed/19779450"&gt;19779450&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;I'll probably add to my spam issues by pointing this out, but this&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/36768584-7910595058025295165?l=omicsomics.blogspot.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://omicsomics.blogspot.com/feeds/7910595058025295165/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=36768584&amp;postID=7910595058025295165" title="2 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/7910595058025295165" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/7910595058025295165" /><link rel="alternate" type="text/html" href="http://omicsomics.blogspot.com/2009/09/locking-in-new-functions.html" title="Locking in new functions" /><author><name>Keith Robison</name><uri>http://www.blogger.com/profile/04765318239070312590</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd="http://schemas.google.com/g/2005" name="OpenSocialUserId" value="08368724497474381730" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-36768584.post-958044851312071593</id><published>2009-09-25T20:20:00.005-04:00</published><updated>2009-09-25T21:21:06.224-04:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="evolution" /><category scheme="http://www.blogger.com/atom/ns#" term="controversies" /><title type="text">How many genomes did I just squash?</title><content type="html">Yesterday was a good day for catching up on the literature; not only did I 
finally get around to the IL28B papers I blogged about yesterday, but I also took a run through the genome fusion paper which is being seen as the fitting marker of the end of the "Communicated by" mechanism of PNAS (sample coverage by &lt;a href="http://pipeline.corante.com/archives/2009/09/23/pnas_shuts_a_door.php"&gt;In The Pipeline&lt;/a&gt; and &lt;a href="http://www.sciencemag.org/cgi/content/full/325/5947/1486-b"&gt;Science&lt;/a&gt;, though the latter requires a subscription).&lt;br /&gt;&lt;br /&gt;The paper, by Donald Williamson and communicated by &lt;a href="http://en.wikipedia.org/wiki/Lynn_Margulis"&gt;Lynn Margulis&lt;/a&gt;, takes the position that  " in animals that metamorphose, the basic types of larvae originated as adults of different lineages, i.e., larvae were transferred when, through hybridization, their genomes were acquired by distantly related animals".  This is a whopper of a proposal and definitely interesting.&lt;br /&gt;&lt;br /&gt;Margulis is famous for proposing the endosymbiont hypothesis to explain mitochondria and chloroplasts and other organelles.  The gist of it is that some ancestral eukaryote took in a guest species and in the long run integrated it fully into its operations so that the two could not be separated.  An important observation which this explained is the fact that mitochondria and chloroplasts have their own genomes, which encode (almost?) entirely for proteins and RNAs used in these structures.  However, their genomes do not encode many of the proteins required -- indeed in metazoans such as ourselves only a tiny pittance of genes are encoded by the mitochondrial genome.  A further observation which fits into this framework is the curious case of Cyanophora paradoxa, a photosynthetic organism whose chloroplast-like structure is surrounded by a rudimentary cell wall.&lt;br /&gt;&lt;br /&gt;When I was an undergraduate, there was still significant controversy on the validity of the endosymbiont hypothesis.  
I remember this well, as I wrote a term paper on the subject. What really nailed it down was the careful comparison of gene trees in the cases where the same function is required both in the organelle and in the cytoplasm and both are nuclear encoded.  In the vast majority of these cases, the two are evolutionarily distant from one another and in the case of chloroplasts the gene whose protein goes to the chloroplast looks more like homologs in cyanobacteria and the copy producing cytoplasmic protein looks more like homologs in non-photosynthetic eukaryotes.  There are some fascinating exceptions, such as cases in which one gene does double duty -- via (for example) alternative splicing or promoters including or excluding the chloroplast targeting sequences.&lt;br /&gt;&lt;br /&gt;Margulis and others have tried to extend this notion to other systems.  There are definitely other successes -- unicellular organisms which appear to carry three genomes &amp; the always-challenging-to-classify Euglena, which appears to be a genome fusion.  But there have also been some prominent non-successes, such as the eukaryotic flagellum/cilium.  Also when I was an undergraduate a &lt;a href="http://linkinghub.elsevier.com/retrieve/pii/0092-8674(89)90875-1"&gt;Cell paper&lt;/a&gt; made a big splash claiming to find a chromosome associated with the basal body, the organelle associated with flagellum synthesis.  However, this work was never repeated and the &lt;a href="http://www.sciencemag.org/cgi/content/full/318/5848/245"&gt;publication of the Chlamydomonas genome&lt;/a&gt; failed to find such a chromosome. &lt;br /&gt;&lt;br /&gt;After reading the paper at hand, I'm both confused and disappointed.  The confusion is embarrassing, but the paper goes into a lot of detail on taxonomy and gross development of which I'm horribly ignorant.  But, conversely the disappointment comes from what I do understand and how cursorily that is treated.  
And since it is the stuff I understand which is the route Williamson proposes to test his hypothesis, that is a big let down.&lt;br /&gt;&lt;br /&gt;A key part that I do understand (minus a few terms I hadn't encountered before), with my emphasis:&lt;br /&gt;&lt;blockquote&gt;Many corollaries of my hypothesis are testable. If insects acquired larvae by hybrid transfer, the total base pairs of DNA of exopterygote insects that lack larvae will be smaller than those of endopterygote (holometabolous) species that have both larvae and pupae. &lt;span style="font-style:italic;"&gt;Genome sequences are known for the fruitfly, Drosophila melanogaster, the honeybee, Apis mellifera, the malarial mosquito, Anopheles gambiae, the red flour beetle, Tribolium castaneum, and the silkworm, Bombyx mori: holometabolous species, with marked metamorphoses.&lt;/span&gt; &lt;span style="font-weight:bold;"&gt;I predict that an earwigfly (Mercoptera Meropeidae), an earwig (Dermaptera), a cockroach (Dictyoptera), or a locust (Orthoptera) will have not necessarily fewer chromosomes but will have fewer base pairs of protein-coding chromosomal DNA than have these holometabolans. Also the genome of an onychophoran that resembles extant species will be found in insects with caterpillar or maggot-like larvae.&lt;/span&gt; Onychophoran genomes will be smaller than those of holometabolous insects. Urochordates, comprising tunicates and larvaceans, present a comparable case. Larvaceans are tadpoles throughout life. Garstang  regarded larvaceans as persistent&lt;br /&gt;tunicate larvae, and, if so, their genomes would resemble those of tunicates. But if larvaceans provided the evolutionary source of marine tadpole larvae, their genomes would be smaller and included in those of adult tunicates. 
The genome of the larvacean Oikopleura dioica is about one-third that of the tunicate Ciona intestinalis, consistent with my thesis&lt;/blockquote&gt;&lt;br /&gt;&lt;br /&gt;Williamson is obviously not an expert on genomics, but Margulis should have known better and pushed him to improve this section.  In the "communicated by" path, the academy member can basically hand-pick the reviewers and is supposed to act as an editor would.  &lt;br /&gt;&lt;br /&gt;The first problem is a rather naive view of genome size and evolution.  Genome sizes vary all over the map even within related species; Fugu to salmon is several fold as is fruit fly to malaria vector.  The latter pair is particularly relevant since these are both dipteran insects, and therefore in the same bin by Williamson's standard (as stated in the quoted text).  Now, that is overall genome size; if you restrict to protein coding regions these pairs are more similar, which leaves some wiggle room.  But, by the same token the &lt;a href="http://www.sciencemag.org/cgi/pmidlookup?view=long&amp;pmid=11752568"&gt;Oikopleura&lt;/a&gt; and &lt;a href="http://www.sciencemag.org/cgi/pmidlookup?view=long&amp;pmid=12481130"&gt;Ciona genomes&lt;/a&gt; contain about the same number of genes (~15-16K).&lt;br /&gt;&lt;br /&gt;But furthermore, his hypothesis should be quite testable &lt;span style="font-style:italic;"&gt;right now&lt;/span&gt;, at least in a basic form.  If a genome fusion occurred, then genes active in larval stages and genes active in the adult should show different gene trees if they are homologs.  Given that there is a lot of data to annotate which Drosophila genes are active when, this should be a practical exercise.  While I leave this as an exercise for the student, I would point out that it is already known that in Drosophila many proteins are active in both phases.  This can probably also be tallied in some fashion.  
I'm guessing that the fraction of genes shared between stages will be quite large, which would not be very supportive of the fusion hypothesis.&lt;br /&gt;&lt;br /&gt;Should a paper like this get into a journal such as PNAS?  Given what I've written above, I think not, simply on its demerits.  On the other hand, crazy hypotheses do need a place to go because they are sometimes the right hypotheses -- Margulis's formulation of endosymbiont hypothesis had very tough sledding on its path to the textbooks.  However, in the modern world there is a place for odd speculations and journeying outside your expertise.  It's called a blog!&lt;br /&gt;&lt;span style="float: left; padding: 5px;"&gt;&lt;a href="http://www.researchblogging.org"&gt;&lt;img alt="ResearchBlogging.org" src="http://www.researchblogging.org/public/citation_icons/rb2_large_gray.png" style="border:0;"/&gt;&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&amp;rft.jtitle=Proceedings+of+the+National+Academy+of+Sciences+of+the+United+States+of+America&amp;rft_id=info%3Apmid%2F19717430&amp;rfr_id=info%3Asid%2Fresearchblogging.org&amp;rft.atitle=Caterpillars+evolved+from+onychophorans+by+hybridogenesis.&amp;rft.issn=0027-8424&amp;rft.date=2009&amp;rft.volume=&amp;rft.issue=&amp;rft.spage=&amp;rft.epage=&amp;rft.artnum=&amp;rft.au=Williamson+DI&amp;rfe_dat=bpr3.included=1;bpr3.tags=Biology%2CEvolutionary+Biology%2C+Computational+Biology"&gt;Williamson DI (2009). Caterpillars evolved from onychophorans by hybridogenesis. 
&lt;span style="font-style: italic;"&gt;Proceedings of the National Academy of Sciences of the United States of America&lt;/span&gt; PMID: &lt;a rev="review" href="http://www.ncbi.nlm.nih.gov/pubmed/19717430"&gt;19717430&lt;/a&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/36768584-958044851312071593?l=omicsomics.blogspot.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://omicsomics.blogspot.com/feeds/958044851312071593/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=36768584&amp;postID=958044851312071593" title="1 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/958044851312071593" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/958044851312071593" /><link rel="alternate" type="text/html" href="http://omicsomics.blogspot.com/2009/09/how-many-genomes-did-i-just-squash.html" title="How many genomes did I just squash?" 
/><author><name>Keith Robison</name><uri>http://www.blogger.com/profile/04765318239070312590</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd="http://schemas.google.com/g/2005" name="OpenSocialUserId" value="08368724497474381730" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-36768584.post-7527382912728523948</id><published>2009-09-24T19:25:00.009-04:00</published><updated>2009-09-25T10:06:03.548-04:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="GWAS" /><title type="text">Unwarranted pessimism on IL28A/B &amp; HCV?</title><content type="html">I finally got around to reading the Nature News &amp; Views article by Iadonato and Katze summarizing and opining on the recent quartet of papers linking genetic variation around IL28B and the response to standard therapy for Hepatitis C Virus.  The N&amp;V has at least one glaring flaw and also (IMHO) goes down the cliched route of concluding that the result will be clinically useless.&lt;br /&gt;&lt;br /&gt;The four GWAS studies found the same cluster of SNPs around IL28B, nicely cross-validating the studies.  One curious statement in the N&amp;V is&lt;br /&gt;&lt;blockquote&gt;Although all of the identified variants in the three studies lie in or near the IL28B gene, none of them has an obvious effect on the function of this gene, which encodes interferon-λ3, a growth factor with similarities to the interferon-α preparations used as treatment.&lt;/blockquote&gt;&lt;br /&gt;&lt;br /&gt;Two of the papers provide direct evidence as to at least one effect of these SNPs; &lt;a href="http://dx.doi.org/10.1038/ng.447"&gt;one&lt;/a&gt; showed that the SNPs are linked to the expression of both IL28B and the nearby related gene IL28A; &lt;a href="http://dx.doi.org/10.1038/ng.449"&gt;the other&lt;/a&gt; looked only at IL28B.  
Lower expression of these loci was correlated with the genotype with worse prognosis.&lt;br /&gt;&lt;br /&gt;The N&amp;V goes on with some boilerplate pessimism about GWAS studies' impact on medicine&lt;br /&gt;&lt;blockquote&gt;The question remains, however, as to how readily these and other observations from GWAS can be translated into meaningful changes in patient care. The field of human genetics has described many associations between specific mutations and medically important outcomes, but rarely have these observations resulted in new therapies to treat disease or in major shifts in existing treatments. This failure is exemplified by the lack of clinical benefit that followed the cloning in 1989 of the gene responsible for cystic fibrosis — the first example of the use of molecular genetics to discover the cause of an otherwise poorly understood condition. Although some progress has been made in treating patients with cystic fibrosis, in the ensuing 20 years neither of the two newly approved drugs for this condition were developed using knowledge of the gene mutations that cause it. Apart from a few well-characterized beneficial mutations (for example, those resulting in resistance to HIV infection), genetics has been an inefficient tool for drug discovery.&lt;br /&gt;&lt;br /&gt;So although these findings raise the tantalizing prospect of a more personalized approach to treating HCV by tailoring treatment to patients who are most likely to benefit, the reality is more sobering. Diagnostic testing to identify likely responders to interferon may be a future possibility, but clinical decision-making will be clouded by the fact that the effect of the advantageous variant is not absolute — not all carriers of the variant clear the virus, nor do all patients lacking the variant fail to benefit from treatment. Furthermore, there is currently no alternative to interferon therapy for the HCV-infected population.&lt;/blockquote&gt;  
They also pile on with graphs showing the exponential growth of Genbank and dbSNP vs. the flat numbers for INDs (new drugs into trials) and NMEs (new approvals). &lt;br /&gt;&lt;br /&gt;Of course, I could respond with the boilerplate response (found in at least one of the papers) that patients with the "poor response" genotype could be steered towards alternative therapies as they become available.  And indeed, new HCV therapies are in the pipeline, perhaps most prominently a compound under development by Vertex.  Understanding if these variants affect response to the new compounds now becomes an important research question.  &lt;br /&gt;&lt;br /&gt;But, it's also stunning that the N&amp;V authors didn't suggest a rather obvious approach suggested by these papers.  Not only do patients with the "high expression" genotype respond better to therapy, but this genotype also predicts spontaneous clearance of the virus.  Furthermore, these loci encode secreted immune factors.  So to me at least, this can be viewed as a classic protein replacement therapy candidate -- a subset of patients produce too little of a natural protein (or two natural proteins) and providing them with recombinant protein might provide therapeutic benefit.  I suspect that whatever companies hold patent claims on IL28A &amp; IL28B are contemplating just such a strategy.  This is also in stark contrast to cystic fibrosis, where the affected protein is damaged rather than underexpressed and is a membrane protein not a secreted protein.  By focusing on the general difficulty of converting genetic information to therapy rather than the specific circumstances of these papers, the N&amp;V authors completely blew it.&lt;br /&gt;&lt;br /&gt;IL28A &amp; IL28B loci produce proteins classified as interferons and it is another interferon (alpha) which is a key part of the standard therapy.  A more extreme version (or a bit of the flip side) of the protein shortage theory would posit that the sum of the interferons is important for response -- and perhaps also for side effects.  
If this were the case, then increasing the dose of alpha interferon in the "low expression" genotype (or better yet, actually typing patients' white cells for expression of these proteins) might be a reasonable clinical approach.  Given that interferon alpha is already approved, this is the sort of clinical experimentation that goes on all the time.&lt;br /&gt;&lt;br /&gt;Yet another angle suggested by the "IL28A/B deficiency hypothesis" is that a viable therapeutic discovery approach is to find compounds which increase expression of IL28A and/or IL28B in leukocytes.  This has been a successful strategy for generating new therapeutic hypotheses in oncology.  Better yet, hints may already exist -- some enterprising student should search the Broad's Connectivity Map or other databases of expression data for cell lines treated with compounds to identify compounds which upregulate IL28A/B transcripts.  A hit in such a search or a broader screen of already approved compounds could potentially rapidly lead to clinical experiments.&lt;br /&gt;&lt;br /&gt;The one time I had an opportunity to write an N&amp;V (as a grad student) I got writer's block and missed the boat.  It will always irk me.  
But, perhaps it's better to blow a chance silently rather than write such an awful, unimaginative one which stuck to stock genomics negativity rather than creatively exploring the topic at hand.&lt;br /&gt;&lt;br /&gt;&lt;span style="float: left; padding: 5px;"&gt;&lt;a href="http://www.researchblogging.org"&gt;&lt;img alt="ResearchBlogging.org" src="http://www.researchblogging.org/public/citation_icons/rb2_large_gray.png" style="border:0;"/&gt;&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&amp;rft.jtitle=Nature&amp;rft_id=info%3Apmid%2F19759611&amp;rfr_id=info%3Asid%2Fresearchblogging.org&amp;rft.atitle=Genomics%3A+Hepatitis+C+virus+gets+personal.&amp;rft.issn=0028-0836&amp;rft.date=2009&amp;rft.volume=461&amp;rft.issue=7262&amp;rft.spage=357&amp;rft.epage=8&amp;rft.artnum=&amp;rft.au=Iadonato+SP&amp;rft.au=Katze+MG&amp;rfe_dat=bpr3.included=1;bpr3.tags="&gt;Iadonato SP, &amp; Katze MG (2009). Genomics: Hepatitis C virus gets personal. 
&lt;span style="font-style: italic;"&gt;Nature, 461&lt;/span&gt; (7262), 357-8 PMID: &lt;a rev="review" href="http://www.ncbi.nlm.nih.gov/pubmed/19759611"&gt;19759611&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&amp;rft.jtitle=Nature&amp;rft_id=info%3Adoi%2F10.1038%2Fnature08309&amp;rfr_id=info%3Asid%2Fresearchblogging.org&amp;rft.atitle=Genetic+variation+in+IL28B+predicts+hepatitis+C+treatment-induced+viral+clearance&amp;rft.issn=0028-0836&amp;rft.date=2009&amp;rft.volume=461&amp;rft.issue=7262&amp;rft.spage=399&amp;rft.epage=401&amp;rft.artnum=http%3A%2F%2Fwww.nature.com%2Fdoifinder%2F10.1038%2Fnature08309&amp;rft.au=Ge%2C+D.&amp;rft.au=Fellay%2C+J.&amp;rft.au=Thompson%2C+A.&amp;rft.au=Simon%2C+J.&amp;rft.au=Shianna%2C+K.&amp;rft.au=Urban%2C+T.&amp;rft.au=Heinzen%2C+E.&amp;rft.au=Qiu%2C+P.&amp;rft.au=Bertelsen%2C+A.&amp;rft.au=Muir%2C+A.&amp;rft.au=Sulkowski%2C+M.&amp;rft.au=McHutchison%2C+J.&amp;rft.au=Goldstein%2C+D.&amp;rfe_dat=bpr3.included=1;bpr3.tags="&gt;Ge, D., Fellay, J., Thompson, A., Simon, J., Shianna, K., Urban, T., Heinzen, E., Qiu, P., Bertelsen, A., Muir, A., Sulkowski, M., McHutchison, J., &amp; Goldstein, D. (2009). 
Genetic variation in IL28B predicts hepatitis C treatment-induced viral clearance &lt;span style="font-style: italic;"&gt;Nature, 461&lt;/span&gt; (7262), 399-401 DOI: &lt;a rev="review" href="http://dx.doi.org/10.1038/nature08309"&gt;10.1038/nature08309&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&amp;rft.jtitle=Nature&amp;rft_id=info%3Apmid%2F19759533&amp;rfr_id=info%3Asid%2Fresearchblogging.org&amp;rft.atitle=Genetic+variation+in+IL28B+and+spontaneous+clearance+of+hepatitis+C+virus.&amp;rft.issn=0028-0836&amp;rft.date=2009&amp;rft.volume=&amp;rft.issue=&amp;rft.spage=&amp;rft.epage=&amp;rft.artnum=&amp;rft.au=Thomas+DL&amp;rft.au=Thio+CL&amp;rft.au=Martin+MP&amp;rft.au=Qi+Y&amp;rft.au=Ge+D&amp;rft.au=O%27huigin+C&amp;rft.au=Kidd+J&amp;rft.au=Kidd+K&amp;rft.au=Khakoo+SI&amp;rft.au=Alexander+G&amp;rft.au=Goedert+JJ&amp;rft.au=Kirk+GD&amp;rft.au=Donfield+SM&amp;rft.au=Rosen+HR&amp;rft.au=Tobler+LH&amp;rft.au=Busch+MP&amp;rft.au=McHutchison+JG&amp;rft.au=Goldstein+DB&amp;rft.au=Carrington+M&amp;rfe_dat=bpr3.included=1;bpr3.tags="&gt;Thomas DL, Thio CL, Martin MP, Qi Y, Ge D, O'huigin C, Kidd J, Kidd K, Khakoo SI, Alexander G, Goedert JJ, Kirk GD, Donfield SM, Rosen HR, Tobler LH, Busch MP, McHutchison JG, Goldstein DB, &amp; Carrington M (2009). Genetic variation in IL28B and spontaneous clearance of hepatitis C virus. 
&lt;span style="font-style: italic;"&gt;Nature&lt;/span&gt; PMID: &lt;a rev="review" href="http://www.ncbi.nlm.nih.gov/pubmed/19759533"&gt;19759533&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&amp;rft.jtitle=Nature+genetics&amp;rft_id=info%3Apmid%2F19749758&amp;rfr_id=info%3Asid%2Fresearchblogging.org&amp;rft.atitle=IL28B+is+associated+with+response+to+chronic+hepatitis+C+interferon-alpha+and+ribavirin+therapy.&amp;rft.issn=1061-4036&amp;rft.date=2009&amp;rft.volume=&amp;rft.issue=&amp;rft.spage=&amp;rft.epage=&amp;rft.artnum=&amp;rft.au=the+Hepatitis+C+Study&amp;rft.au=Suppiah+V&amp;rft.au=Moldovan+M&amp;rft.au=Ahlenstiel+G&amp;rft.au=Berg+T&amp;rft.au=Weltman+M&amp;rft.au=Abate+ML&amp;rft.au=Bassendine+M&amp;rft.au=Spengler+U&amp;rft.au=Dore+GJ&amp;rft.au=Powell+E&amp;rft.au=Riordan+S&amp;rft.au=Sheridan+D&amp;rft.au=Smedile+A&amp;rft.au=Fragomeli+V&amp;rft.au=M%C3%BCller+T&amp;rft.au=Bahlo+M&amp;rft.au=Stewart+GJ&amp;rft.au=Booth+DR&amp;rft.au=George+J&amp;rfe_dat=bpr3.included=1;bpr3.tags="&gt;the Hepatitis C Study, Suppiah V, Moldovan M, Ahlenstiel G, Berg T, Weltman M, Abate ML, Bassendine M, Spengler U, Dore GJ, Powell E, Riordan S, Sheridan D, Smedile A, Fragomeli V, Müller T, Bahlo M, Stewart GJ, Booth DR, &amp; George J (2009). IL28B is associated with response to chronic hepatitis C interferon-alpha and ribavirin therapy. 
&lt;span style="font-style: italic;"&gt;Nature genetics&lt;/span&gt; PMID: &lt;a rev="review" href="http://www.ncbi.nlm.nih.gov/pubmed/19749758"&gt;19749758&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&amp;rft.jtitle=Nature+genetics&amp;rft_id=info%3Apmid%2F19749757&amp;rfr_id=info%3Asid%2Fresearchblogging.org&amp;rft.atitle=Genome-wide+association+of+IL28B+with+response+to+pegylated+interferon-alpha+and+ribavirin+therapy+for+chronic+hepatitis+C.&amp;rft.issn=1061-4036&amp;rft.date=2009&amp;rft.volume=&amp;rft.issue=&amp;rft.spage=&amp;rft.epage=&amp;rft.artnum=&amp;rft.au=Tanaka+Y&amp;rft.au=Nishida+N&amp;rft.au=Sugiyama+M&amp;rft.au=Kurosaki+M&amp;rft.au=Matsuura+K&amp;rft.au=Sakamoto+N&amp;rft.au=Nakagawa+M&amp;rft.au=Korenaga+M&amp;rft.au=Hino+K&amp;rft.au=Hige+S&amp;rft.au=Ito+Y&amp;rft.au=Mita+E&amp;rft.au=Tanaka+E&amp;rft.au=Mochida+S&amp;rft.au=Murawaki+Y&amp;rft.au=Honda+M&amp;rft.au=Sakai+A&amp;rft.au=Hiasa+Y&amp;rft.au=Nishiguchi+S&amp;rft.au=Koike+A&amp;rft.au=Sakaida+I&amp;rft.au=Imamura+M&amp;rft.au=Ito+K&amp;rft.au=Yano+K&amp;rft.au=Masaki+N&amp;rft.au=Sugauchi+F&amp;rft.au=Izumi+N&amp;rft.au=Tokunaga+K&amp;rft.au=Mizokami+M&amp;rfe_dat=bpr3.included=1;bpr3.tags="&gt;Tanaka Y, Nishida N, Sugiyama M, Kurosaki M, Matsuura K, Sakamoto N, Nakagawa M, Korenaga M, Hino K, Hige S, Ito Y, Mita E, Tanaka E, Mochida S, Murawaki Y, Honda M, Sakai A, Hiasa Y, Nishiguchi S, Koike A, Sakaida I, Imamura M, Ito K, Yano K, Masaki N, Sugauchi F, Izumi N, Tokunaga K, &amp; Mizokami M (2009). Genome-wide association of IL28B with response to pegylated interferon-alpha and ribavirin therapy for chronic hepatitis C. 
&lt;span style="font-style: italic;"&gt;Nature genetics&lt;/span&gt; PMID: &lt;a rev="review" href="http://www.ncbi.nlm.nih.gov/pubmed/19749757"&gt;19749757&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;(ugh: had a serious typo in the title on first posting; now fixed &amp; revised)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/36768584-7527382912728523948?l=omicsomics.blogspot.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://omicsomics.blogspot.com/feeds/7527382912728523948/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=36768584&amp;postID=7527382912728523948" title="1 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/7527382912728523948" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/7527382912728523948" /><link rel="alternate" type="text/html" href="http://omicsomics.blogspot.com/2009/09/unwarranted-pessimism-on-il26b-hcv.html" title="Unwarranted pessimism on IL28A/B &amp; HCV?" 
/><author><name>Keith Robison</name><uri>http://www.blogger.com/profile/04765318239070312590</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd="http://schemas.google.com/g/2005" name="OpenSocialUserId" value="08368724497474381730" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-36768584.post-6359271542002463498</id><published>2009-09-23T22:27:00.002-04:00</published><updated>2009-09-23T23:12:21.530-04:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="conferences" /><title type="text">CHI Next-Gen Conference, Day 3 (final)</title><content type="html">Final day of conference, with some serious fatigue setting in (my hotel room was too close to, and faced, a highway. Doh!)&lt;br /&gt;&lt;br /&gt;Discovered that I was indeed getting a reputation.  Two people I met today asked about my recurrent interest in FFPE (Formalin Fixed, Paraffin Embedded) -- which is how most of the nucleic acids I want to work with are stored.  FFPE is notoriously difficult for molecular studies, with the informational macromolecules having been chemically and physically abused in the fixation process, but it is also famously stable, preserving histological features for years.&lt;br /&gt;&lt;br /&gt;RainDance sponsored the breakfast &amp; announced that their maximum primer library size has gone up to 20K.  To back up, RainDance uses microfluidics to create libraries of very tiny (single digit picoliter) droplets in which each droplet contains a primer pair.  The precise volume control &amp; normalization of the concentrations means that each primer droplet contains about the same number of oligos, which allows the PCR in each droplet to be run to completion -- meaning that efficient PCRs and inefficient ones in theory both end up with the same number of product molecules.  
Another set of droplets is created which contain your template DNA, and these are cleverly merged &amp; the whole emulsion cycled.  Break up the emulsion &amp; you have lots of PCR amplicons ready to go into a fragmentation protocol.  Their movies of droplets marching around, splitting, merging, etc. are dangerously mesmerizing!&lt;br /&gt;&lt;br /&gt;Jin Billy Li of the Church group reviewed all the really cool stuff they've done using padlock probes (and confirmed that IP conflicts are indefinitely delaying any commercialization of these).  A padlock probe is a long DNA which primes on both sides of a targeted region.  Filling the gap between them &amp; ligating yields a circle, which can be purified away from any uncircularized DNA and then amplified with universal primers.  This turns the multiplex PCR problem into a very diverse set of uniplex PCRs.  Various tweaks have substantially improved uniformity, though there is still room for improvement (but the same is true for the hybridization approaches).&lt;br /&gt;&lt;br /&gt;Nicolas Bergman presented data on transcriptomic complexity in B. anthracis. I think most of this is published, but I hadn't seen it.  A very striking result is that an awful lot (~88%) of transcripts in a supposedly uniform culture are present at much less than 1 copy per cell.  He mentioned that small numbers of spores are seen in log cultures, and this might explain it.  Also showed that many unannotated genes -- including some that had been truly UNannotated (originally annotated but then removed from the catalogs) -- are clearly transcribed.  Operon structures could be worked out, with 90% matching computational predictions -- and in the ~30 tested experimentally by RT-PCR there was 100% concordance.  &lt;br /&gt;&lt;br /&gt;Epicentre gave an overview of their clever system for fragmenting DNA upstream of either 454 or Illumina.  By hijacking a transposase in a clever way, they not only break up the DNA but add on defined sequences.  
For 454 you then jam on the 454 primers &amp; just get stuck reading 19nt of transposase each time; for Illumina you must use custom sequencing primers.&lt;br /&gt;&lt;br /&gt;Eric Wommack &amp; Shawn Polson of University of Delaware (Go Hens!) described work on metagenomics of bacteriophages in seawater.  Here's a stunning estimate: if you lined all the world's phages end-to-end, they would stretch 60 &lt;span style="font-style:italic;"&gt;light years&lt;/span&gt;.  Also striking is the high level of bacteriophage-driven turnover of oceanic bacteria -- in about 1/2 to 2 days there is 100% turnover.  This is a huge churn of the biochemical space. &lt;br /&gt;&lt;br /&gt;Stacey Gabriel gave an update on the Broad's Cancer Genomics effort.  Some whole genomes (25 tumor+normal pairs so far) and a lot of exonic sequencing.  So far, not a lot of lightning though -- in one study the only thing popping out so far is p53, which is disappointing.  Using the Agilent system (developed at the Broad), they can scan 20K genes in 1/2 an Illumina run, with 82% of their targeted sequences covered by at least 14 reads.&lt;br /&gt;&lt;br /&gt;Matthew Ferber at the Mayo described trying to replace Sanger assays for inherited disorders with 454 and Illumina based approaches.  He underscored that this isn't for research -- these are actual diagnostic tests used to determine treatments, such as prophylactic removal of the colon if inherited colon cancer is likely.  Capture of the targets on the Nimblegen chips was done and the recovered DNA split to do 454 &amp; Illumina sequencing in parallel. The two next gen approaches came close -- but neither found enough of the mutations to be relied on.  Also, some targets are just not recoverable by array capture and would need to be backstopped by something else. 
One caveat: older technology was used in both cases, so it may be that with longer read lengths on both platforms the higher coverage &amp; higher mapping confidence needed would be obtained.  On the other hand, some of the mutations were picked to be difficult for the platforms (small indel for Illumina, homopolymer run of &gt;20 for 454) and might remain problems even with more coverage.  PCR amplification in place of chip capture is another approach that might improve coverage and get some targets missed by the chip (this is certainly a claim RainDance made in their presentation).&lt;br /&gt;&lt;br /&gt;The last talk I took notes on was by Michael Zody on signatures of domestication in chickens.  If I had organized things, this would have been just before or after the phage talk!  Alas, while the Rhode Island Red was amongst the lines sequenced (apropos the location) Blue Hens were missing -- how could that be?  Seriously, the basic design was to sequence pools of DNA from either various domestic chicken lines or the Red Jungle Fowl (representing pre-domestication chicken).  Some of these lines were commercial egg layer strains and others commercial broiler (meat) strains.  He commented that this level of specialization occurred very recently (forgot to write down when, but I think it was around a century ago).  Two other strains are interesting as they have been selected for about 50 years for one to be very heavy and the other lean -- apparently the heavy line will eat itself silly and the other nearly starves itself.  One SOLiD slide on each of the 10 pools was used to call out SNPs and various strategies were used to filter out errors in the new data as well as variation due to errors in the reference sequence (in some cases, even typing the reference DNA to demonstrate the need for correction).  
Reduced heterozygosity was seen around BCDO2, which gives modern chickens their yellow skin (positive control) and also a bunch of other loci -- but those are still under wraps.  They also looked for deletions in exons which appear to have been fixed in various lines, and found 1284 which are fixed in one or more domestic lines relative to the Red Jungle Fowl.  One interesting one (which is present in the Red Jungle Fowl at low frequency) has gone homozygous (I think; my notes here show fatigue) in the high growth line but is either absent or heterozygous in the low growth line (terrible notes!).  It's a 19kb deletion that clips out exons 2-5 (based on the human homolog; there isn't a good transcript sequenced for chicken) and RT-PCR confirms the gene is expressed in the hypothalamus, which has been previously implicated in controlling feeding behavior.&lt;br /&gt;&lt;br /&gt;I took almost no notes on the last talk, looking at dietary influences on gut microbiome (and also, regrettably, had to leave early to make sure I made school night) but it did feature some more "extreme genomics" -- microbiome studies on Burmese pythons!&lt;br /&gt;&lt;br /&gt;One last thought: sequencing techs represented were either here-and-now (the players you can actually buy) or pretty-distant-future; absent were PacBio and Oxford Nanopore and the host of other companies (save NABSys) announced in the last 3-4 years in this space.  Have the others just disappeared quietly or are they in stealth mode? 
It's hard to imagine the conference would have deliberately snubbed them, which would be a third possibility.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/36768584-6359271542002463498?l=omicsomics.blogspot.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://omicsomics.blogspot.com/feeds/6359271542002463498/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=36768584&amp;postID=6359271542002463498" title="1 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/6359271542002463498" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/6359271542002463498" /><link rel="alternate" type="text/html" href="http://omicsomics.blogspot.com/2009/09/chi-next-gen-conference-day-3-final.html" title="CHI Next-Gen Conference, Day 3 (final)" /><author><name>Keith Robison</name><uri>http://www.blogger.com/profile/04765318239070312590</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd="http://schemas.google.com/g/2005" name="OpenSocialUserId" value="08368724497474381730" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-36768584.post-4802940967423838292</id><published>2009-09-22T21:34:00.004-04:00</published><updated>2009-09-22T22:16:22.196-04:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="conferences" /><title type="text">CHI Next-Gen Conference, Day 2</title><content type="html">I'll confess that in the morning I took notes on only one talk, but the afternoon got back into gear.&lt;br /&gt;&lt;br /&gt;The morning talk was by John Quackenbush over at Dana Farber Cancer Institute and covered a wide range of topics.  
Some was focused on various database approaches to tracking clinical samples but a lot of the talk was on microarrays.  He described a new database his group has curated from the cancer microarray literature called GeneSigDb.  He also described some work on inferring networks from such data &amp; how it is very difficult to do with no prior knowledge, but with a little bit of network information entered in, a lot of other interactions fall out which are known to be real.  He also noted that if you look at the signatures collected in GeneSigDb, most human genes are in at least one -- suggesting either cancer affects a lot of genes (probable) and/or a lot of the microarray studies are noisy (certainly!).  I did a similar curation at MLNM (whose results were donated to Science Commons when the group dissolved, though I think it never quite emerged from there) &amp; saw the same pattern.  I'd lean heavily on "bad microarray studies", as far too many studies on similar diseases come up with disjoint results, whereas there are a few patterns which show up in far too many results (suggesting, for example, that they are signatures of handling cells not signatures of disease).  He also described some cool work initiated in another group but followed up by his group of looking at trajectories of gene expression during the forced differentiation of a cell line.  Using two agents that cause the same final differentiated state (DMSO &amp; all-trans retinoic acid), the trajectories are quite different even with the same final state.  Some talk at the end of attractors &amp; such.&lt;br /&gt;&lt;br /&gt;In the afternoon I slipped over to the "other conference" -- in theory there are two conferences with some joint sessions &amp; a common vendor/poster area, but in reality there isn't much reason to hew to one or the other &amp; good-sounding talks are split between them.  I did, alas, accidentally stick myself with a lunch ticket for a talk on storage -- bleah!  
But, the afternoon was filled with talks on "next next" generation approaches, and despite (or perhaps because of, as the schedule had been cramped) two cancellations, it was a great session.&lt;br /&gt;&lt;br /&gt;All but one of the talks at least mentioned nanopore approaches, which have been thought about for close to two decades now.  Most of these had some flavor of science fiction to them in my mind, though I'll freely admit the possibility that this reflects more the limitations of my experience than wild claims by the speakers.&lt;br /&gt;&lt;br /&gt;One point of (again, genteel) contention between the speakers was around readout technology, with one camp arguing that electrical methods are the way to go, because that is the most semiconductor-like (there is a bit of a cult worship of the semiconductor industry evident at the meeting).  Another faction (well, one speaker) argues that optics is better because it can be more naturally multiplexed.  Another speaker had no multiplexing in his talk, but that will be covered below.&lt;br /&gt;&lt;br /&gt;Based on the cluster of questioners (including myself) afterwards, the NABSys talk by John Oliver had some of the strongest buzz.  The speaker showed no data from actual reads and was circumspect about a lot of details, but some important ones emerged (at least for me; perhaps I'm the last to know).  Their general scheme is to fragment DNA to ~150Kb (well, that's the plan -- so far they go only to 50Kb) and create 384 such pools of single-stranded DNA.  Each pool is probed with a set of short (6-10 nt) oligonucleotide probes.  Passing a DNA through a machined pore creates a distinct electrical signal for an aligned probe vs. a single stranded region.  
You can't tell which probe just rode through, but the claim is that by designing the pools carefully and comparing fingerprints you can infer a complete "map" and ultimately a sequence, with some classes of sequence which can't be resolved completely (such as long simple repeats).  While no actual data was shown, in conversation the speaker indicated that they could do physical mapping right now, which I doubt is a big market but would be scientifically very valuable (and yes, I will get back to &lt;a href="http://researchblogging.org/post/gotourl/id/135627"&gt;my series on physical maps&lt;/a&gt; &amp; finish it up soon).&lt;br /&gt;&lt;br /&gt;Oliver did have a neat trick for downplaying the existing players.  It is his contention that any system that can't generate 10^20 bases per year isn't going to be a serious player in medical genomics.  This huge figure is arrived at by multiplying the number of cancer patients in the developed world by 100 samples each and 20X coverage.  The claim is that any existing player would need 10^8 sequencers to do this (Illumina is approaching 10^3 and SOLiD 10^2).  I'm not sure I buy this argument -- there may be value in collecting so many samples per patient, but good luck doing it!  It's also not clear that the marginal gain from the 11th sample is really very much (just to pick an arbitrary number).  Shave a factor of 10 off there &amp; increase the current platforms by a factor of 10 and, well, you're down to 10^6 sequencers.  Hmm, that's still a lot.  Anyway, only if the cost gets down to 10s of dollars could national health systems afford any such extravagance.  &lt;br /&gt;&lt;br /&gt;Another speaker, Derek Stein of Brown University (whose campus I stumbled on today whilst trying to go from my distant hotel to the conference on foot) gave an interesting talk on trying to marry nanopores to mass spec.  
The general concept is to run the DNA through the pore, break off each nucleobase on the other side &amp; slurp that into the mass spec for readout.  It's pretty amazing -- on one side of the membrane a liquid and on the other a vacuum!  It's just beginning and a next step is to prove that each nucleotide gives a distinct signal. Of course, one possible benefit of this readout is that covalent epigenetic modifications will probably be directly readable -- unless, of course, the modified base has a mass too close to one of the other bases.  &lt;br /&gt;&lt;br /&gt;Another nanoporist, Amit Meller at Boston University, is back in the optical camp. The general idea here is for the nanopore to strip off probes from a specially modified template.  The probes make a rapid fluorescent flash -- they are "molecular beacons" which are inactive when hybridized to template, become unquenched when they come off but then immediately fold onto themselves and quench again.  Meller was the only nanopore artist to actually show a read -- 10nt!!!  One quirk of the system is that a cyclic TypeIIS digestion &amp; ligation process is used to substitute each base in the original template with 2 bases to give more room for the beacon probes.  He seemed to think read lengths of 900 will be very doable and much longer possible.&lt;br /&gt;&lt;br /&gt;One other nanopore talk was from Peiming Zhang at Arizona State, who is tackling the readout problem by having some clever molecular probes to interrogate the DNA after it exits the nanopore.  He also touched on sequencing-by-hybridization &amp; using atomic microscopy to try to read DNA.&lt;br /&gt;&lt;br /&gt;The one non-nanopore talk is one where I'm still wrestling with my reaction.  Xiaohua Huang at UCSC described creating a system that marries some of the best features of 454 with some of the features of the other sequencing-by-synthesis systems.  
His talk helped crystallize in my mind why 454 has such long read lengths but also is a laggard in density space.  He attributed the long reads to the fact that 454 uses natural nucleotides rather than the various reversible terminator schemes. But, since pyrosequencing is real-time you get fast reads but the camera must always watch every bead on the plate.  In contrast, the other systems can scan the camera across their flowcells, enabling one camera to image many more targets -- but the terminators don't always reverse successfully.  His solution is to use 90% natural nucleotides and 10% labeled nucleotides -- but &lt;span style="font-style:italic;"&gt;no&lt;/span&gt; terminators.  After reading one nucleotide, the labels are stripped (he mentioned photobleaching, photolabile tags and chemical removal as all options he is working with) and the next nucleotide flowed in.  It will have the same trouble with long mononucleotide repeats as 454 -- but also should have very long read lengths.  He puts 1B beads on his plates -- and has some clever magnetic and electric field approaches to jiggle the beads around so that nearly every well gets a bead.  In theory I think you could run his system on the Polonator, but he actually built his own instrument.&lt;br /&gt;&lt;br /&gt;If I had to rate the approaches by which is most likely to start generating real sequence data, I'd vote for Huang -- but is that simply because it seems more conservative?  NABSys talks like they are close to being able to do physical maps -- but will that be a dangerous detour?  Or simply too financially uninteresting to attract their attention?  The optically probed nanopores actually showed read data -- but what will the errors look like?  Will the template expansion system cause new errors? &lt;br /&gt;&lt;br /&gt;One minor peeve: pretty much universally, simulations look too much like real data and need more of a scarlet S on them.  
On the other hand, I probably should have a scarlet B on my forehead, since I've only once warned someone that I blog.  One movie today of DNA traversing a nanopore looked very real, but was mentioned later to be simulated.  Various other plots were not explained to be simulations until near the end of the presentation of that slide.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/36768584-4802940967423838292?l=omicsomics.blogspot.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://omicsomics.blogspot.com/feeds/4802940967423838292/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=36768584&amp;postID=4802940967423838292" title="1 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/4802940967423838292" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/4802940967423838292" /><link rel="alternate" type="text/html" href="http://omicsomics.blogspot.com/2009/09/chi-next-gen-conference-day-2.html" title="CHI Next-Gen Conference, Day 2" /><author><name>Keith Robison</name><uri>http://www.blogger.com/profile/04765318239070312590</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd="http://schemas.google.com/g/2005" name="OpenSocialUserId" value="08368724497474381730" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-36768584.post-3676414067556160966</id><published>2009-09-21T22:46:00.002-04:00</published><updated>2009-09-21T23:16:30.231-04:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="conferences" /><title type="text">CHI Next-Gen Conference, Day 1</title><content type="html">Interesting set of talks today.  
I never did explicitly check on the blogging policy, but given that the session chair kidded a speaker that I would be blogging her live, it wouldn't seem to be a problem.  I would honor a ban (particularly since blogging is a bit hard to hide after the fact!), but quite a few folks were photographing slides despite an admonition not to (one person was clearly worried neither about being caught nor being courteous nor being clever; he had his flash on, which is clearly useless for projected images!).&lt;br /&gt;&lt;br /&gt;The morning talks ended up as just a trio.  The best of the three was Robert Cook-Deegan's talk "So my genome costs less than my bike, what's the big deal?".  He obviously has more expensive tastes in bicycles than I do -- or knows a really cheap genome shop!  He covered a lot of the ground around what sort of regulatory model will encompass personal genome sequencing.  The U.S. weakly and Germany strongly have gone with the model that genome sequencing should be treated like a diagnostic with M.D.s as the absolute gatekeeper (a position which is rather vocally promoted by certain bloggers).  Cook-Deegan pointed out something that increasingly worries me, which is that this locks genome sequencing into a very expensive cost model which doesn't improve with scale; you are locking in some very pricey labor that will only increase in price.  Cook-Deegan also felt that M.D.s were being picked as the gatekeeper primarily because they are who the regulators are comfortable with historically, not because they are particularly well-trained for the job.&lt;br /&gt;&lt;br /&gt;Jonathan Rothberg gave an entertaining talk on his various ventures, which built up to Ion Torrent, but where the audience expected the crescendo there was instead a request for audience questions.  Ion Torrent seems to be a company (Joule is another) which is still trying to be in the public eye without releasing any key information.  
Understandable, but frustrating.&lt;br /&gt;&lt;br /&gt;Henry Erlich gave a nice presentation on using PCR amplification and 454 sequencing to do HLA typing for transplantation.  All sorts of advantages to 454 over Sanger here, but cost will probably remain an issue and definitely corral this into very large centers (one 454 run, with multiplexing, can type ~20 samples).&lt;br /&gt;&lt;br /&gt;Lunch was given over to IT stuff.  CycleComputing presented their bioinformatics-friendly gateway to Amazon's cloud computing stuff (plus some benchmarking).  I'll confess to checking email during the presentation on compressing data on servers; far too IT for me.&lt;br /&gt;&lt;br /&gt;The afternoon was devoted to a series of presentations by the 6 next gen sequencing platforms with some flavor of being here-and-now: 454, SOLiD, GA2, Helicos, Dover (Polonator) &amp; Complete Genomics.  Actually, that was an interesting theme running through some talks, with Illumina saying "we're now gen, not next gen" whereas Complete Genomics calls themselves "third gen".  The talks were all genteel but contained pokes at each other.  &lt;br /&gt;&lt;br /&gt;For example, 454 trumpeted a comparison of two unpublished cucumber genome sequences, one by Illumina+Sanger and one by 454.  The 454 16X assembly had a contig N50 of 87Kb vs. 9Kb for a 50X Illumina assembly (no mention made of the amount of paired end data in either, I think -- though now I'm not sure).  454 also declared they've had one perfect read 997 bases long, though they were open that commercial runs near this are far in the future.  &lt;br /&gt;&lt;br /&gt;The SOLiD speaker emphasized all the different applications of their technology, using a published graphic that later turned out to have been commissioned by Helicos.  Illumina's speaker emphasized the simpler sample prep over emulsion PCR systems (i.e. 454, SOLiD &amp; Polonator).  
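&lt;br /&gt;&lt;br /&gt;For anyone unfamiliar with the contig N50 statistic quoted in the cucumber comparison above: sort the contigs from longest to shortest, and the N50 is the length of the contig at which the running total first reaches half of all the assembled bases.  A minimal Python sketch of that calculation follows; the function name and the toy contig lengths are purely illustrative assumptions, not data from either assembly.

```python
def contig_n50(lengths):
    """Return the N50 of a set of contig lengths.

    Contigs are sorted longest first; the N50 is the length of the
    contig at which the running total first reaches half of the
    total assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("need at least one contig length")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy contig lengths in kb (hypothetical, not from either assembly).
toy_contigs = [40, 30, 20, 10]
print(contig_n50(toy_contigs))  # 30, because 40 + 30 = 70 reaches half of 100
```

Because the statistic is weighted by bases rather than by contig count, a handful of long contigs can lift it substantially, which is why it is the usual headline number when assemblies are compared.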
&lt;br /&gt;&lt;br /&gt;Helicos promised even simpler sample prep and offered tantalizing hints of good stuff to come -- such as my nemesis of sequencing from FFPE slides.  Helicos did detail their paired-end protocol, which is very clever (after reading a bunch of sequence, a set of timed extensions with all 4 nucleotides gives jumps of various distributions which are then followed by more reads).  Clearly this will only work with single molecule sequencing, at least in that form (must ponder how to either improve this or get it to work on an Illumina-style platform).   Helicos also tantalized with a bunch of data from different applications, suggesting that some more publications from this platform are imminent.&lt;br /&gt;&lt;br /&gt;Danaher's talk was mostly on details of the instrument, which is the only one actually at the conference &amp; is running.  Always fascinated by moving machines, I watched it for a while -- and it demos very nicely, with the stage moving &amp; illuminator flashing &amp; filter wheel spinning.  Polonator has very short reads compared to the other platforms, but is promising very low cost which could make it a contender.  &lt;br /&gt;&lt;br /&gt;Finally, Complete showed off their sequencing center approach.  One striking fact is that their read lengths are actually extremely short -- but they extract a quartet of paired short reads.  Clearly their recently announced delivery of genomes has improved their credibility &amp; they also detailed some very neat medical genetics results which are presumably going to hit the journals very soon -- in which case they will have complete lab cred.  
It was pointed out in the discussion panel &amp; in several talks that human sequencing is not the whole world, but even their competitors did not violently object (and therefore seemed to grudgingly acquiesce) that Complete may grab the lion's share of the human genome sequencing market, with the other players going after non-human sequencing or human areas like FFPE or transcriptome sequence where Complete isn't positioning themselves.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/36768584-3676414067556160966?l=omicsomics.blogspot.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://omicsomics.blogspot.com/feeds/3676414067556160966/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=36768584&amp;postID=3676414067556160966" title="9 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/3676414067556160966" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/3676414067556160966" /><link rel="alternate" type="text/html" href="http://omicsomics.blogspot.com/2009/09/chi-next-gen-conference-day-1.html" title="CHI Next-Gen Conference, Day 1" /><author><name>Keith Robison</name><uri>http://www.blogger.com/profile/04765318239070312590</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd="http://schemas.google.com/g/2005" name="OpenSocialUserId" value="08368724497474381730" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">9</thr:total></entry><entry><id>tag:blogger.com,1999:blog-36768584.post-7389320890283374381</id><published>2009-09-20T23:51:00.003-04:00</published><updated>2009-09-20T23:53:53.414-04:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="conferences" /><title type="text">Next-Generation Sequencing 
Conference, Providence RI, Day 0</title><content type="html">I'm going to be at Cambridge HealthTech's Next-Gen Sequencing Conference in exotic Providence RI for the next few days.  &lt;br /&gt;&lt;br /&gt;I need to check on their blogging policy, though I'm guessing it is lenient.  I probably won't try to write from within sessions but may try to skim some highlights at the end of the day.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/36768584-7389320890283374381?l=omicsomics.blogspot.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://omicsomics.blogspot.com/feeds/7389320890283374381/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=36768584&amp;postID=7389320890283374381" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/7389320890283374381" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/7389320890283374381" /><link rel="alternate" type="text/html" href="http://omicsomics.blogspot.com/2009/09/next.html" title="Next-Generation Sequencing Conference, Providence RI, Day 0" /><author><name>Keith Robison</name><uri>http://www.blogger.com/profile/04765318239070312590</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd="http://schemas.google.com/g/2005" name="OpenSocialUserId" value="08368724497474381730" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-36768584.post-7744231304888735860</id><published>2009-09-17T00:16:00.003-04:00</published><updated>2009-09-17T00:26:37.866-04:00</updated><title type="text">A genome too far?</title><content type="html">I've crossed fruit flies, tomatoes and yeast, but I would clearly draw the line before taking on this 
project.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.biomedcentral.com/1471-2164/10/339"&gt;A genetic linkage map for the saltwater crocodile (Crocodylus porosus). BMC Genomics. 2009 Jul 29;10:339&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Actually, the &lt;a href="http://en.wikipedia.org/wiki/Crocodylus_porosus"&gt;Wikipedia entry for the species&lt;/a&gt; is a bit reassuring.  Don't believe those ridiculous stories of 21 footers -- the largest measured individual was &lt;span style="font-style:italic;"&gt;only&lt;/span&gt; 20 feet.  And only one or two fatal attacks occur each year, and they make headlines in the papers.  I'll sleep like a rock tonight!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/36768584-7744231304888735860?l=omicsomics.blogspot.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://omicsomics.blogspot.com/feeds/7744231304888735860/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=36768584&amp;postID=7744231304888735860" title="1 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/7744231304888735860" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/7744231304888735860" /><link rel="alternate" type="text/html" href="http://omicsomics.blogspot.com/2009/09/genome-too-far.html" title="A genome too far?" 
/><author><name>Keith Robison</name><uri>http://www.blogger.com/profile/04765318239070312590</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd="http://schemas.google.com/g/2005" name="OpenSocialUserId" value="08368724497474381730" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-36768584.post-4526697593420599264</id><published>2009-09-15T22:10:00.004-04:00</published><updated>2009-09-15T22:47:42.724-04:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="synthetic biology" /><title type="text">Industrial Protein Production: Further Thoughts</title><content type="html">A question raised by a commenter on &lt;a href="http://omicsomics.blogspot.com/2009/09/codon-optimization-is-not-bunk.html"&gt;yesterday's piece about codon optimization&lt;/a&gt; is how critical is this for the typical molecular biologist?  I think for the typical bench biologist who is expressing small numbers of distinct proteins each year, perhaps the answer is "more critical than you think, but not project threatening".  That is, if you are expressing few proteins only rarely will you encounter show-stopping expression problems.  That said, with enough molecular biologists expressing enough proteins, some of them will have awful problems expressing some protein of critical import.&lt;br /&gt;&lt;br /&gt;But, consider another situation: the high-throughput protein production lab.  These can be found in many contexts.  Perhaps the proteins are in a structural proteomics pipeline or similar large scale structure determination effort.  Perhaps the proteins are to feed into high-throughput screens.  Perhaps they are themselves the products for customers or are going into a protein array or similar multi-protein product.  Or perhaps you are trying to express multiple proteins simultaneously to build some interesting new biological circuit.   
&lt;br /&gt;&lt;br /&gt;Now, in some cases a few proteins expressing poorly isn't a big deal.  The numbers for the project have a certain amount of attrition baked in, or for something like structural proteomics you can let some other protein which did express jump ahead in the queue.  However, even with this the extra time and expense of troubleshooting the problem proteins, which can (as suggested by the commenter) be as simple as running multiple batches or can be as complex as screening multiple expression systems and strains, is time and effort that must be accounted for.  However, sometimes the protein will be on a critical path and that extra time messes up someone's project plan.  Perhaps the protein is the actual human target of your drug or the critical homolog for a structure study.  Another nightmare scenario is that the statistics don't average out; for some project you're faced with a jackpot of poor expressors.&lt;br /&gt;&lt;br /&gt;This in the end is the huge advantage of predictability; the rarer the unusual events, the smoother a high-throughput pipeline runs and the more reliable its output.  So, from this point of view the advantage of the new codon optimization work is not necessarily that you can get huge amounts of proteins, but rather that the unpredictability is ironed out.&lt;br /&gt;&lt;br /&gt;But suppose you wanted to go further?  Given the enormous space of useful &amp; interesting proteins to express, there will probably be some that become the outliers to the new process.  How could you go further?  &lt;br /&gt;&lt;br /&gt;One approach would be to further tune the tRNA system of E.coli (or any other expression host).  For example, there are already special E.coli strains which express some of the extremely disfavored E.coli tRNAs, and these seem to help expression when you can't codon optimize.  In theory, it should be possible to create an E.coli with completely balanced tRNA expression.  
One approach to this would be to analyze the promoters of the weak tRNAs and try to rev them up, mutagenizing them en masse with the &lt;a href="http://dx.doi.org/10.1038/nature08187"&gt;MAGE technology&lt;/a&gt; published by the Church lab.&lt;br /&gt;&lt;br /&gt;What else could you do?  Expression strains carry all sorts of interesting mutations, often in things such as proteases which can chew up your protein product.  There are, of course, all sorts of other standard cloning host mutations enhancing the stability of cloned inserts or providing other useful features.  Other important modifications include such things as tightly controlled phage RNA polymerases locked into the host genome.&lt;br /&gt;&lt;br /&gt;Another approach is the one commercialized by &lt;a href="http://www.scarabgenomics.com/"&gt;Scarab Genomics&lt;/a&gt; in which large chunks of E.coli have been tossed out.  The logic behind this is that many of these deleted regions contain genetic elements which may interfere with stable cloning or genetic expression.  &lt;br /&gt;&lt;br /&gt;One challenge to the protein engineer or expressionist, however, is getting all the features they want in a single host strain.  One strain may have desirable features X and Y but another Z.  What is really needed is the technology to make any desirable combination of mutations and additions quickly and easily.  The MAGE approach is one step in this direction but only addresses making small edits to a region.  &lt;br /&gt;&lt;br /&gt;One interesting use of MAGE would be to attempt to further optimize E.coli for high-level protein production.  One approach would be to design a strain which already had some of the desired features.  A further set of useful edits would be designed for the MAGE system.  For a readout, I think GFP fused to something interesting would do -- but a set of such fusions would need to be ready to go.  
This is so evolved strains can quickly be counter-screened to assess how general an effect on protein production they have.  If some of these tester plasmids had "poor" codon optimization schemes, then this would allow the tRNA improvement scheme described above to be implemented.  Furthermore, it would be useful to have some of these tester constructs in compatible plasmid systems, so that two different test proteins (perhaps fused to different color variants of GFP) could be maintained simultaneously.  This would be an even better way to initially screen for generality, and would provide the opportunity to perform the mirror-image screen for mutations which degrade foreign protein overexpression.&lt;br /&gt;&lt;br /&gt;What would be targeted and how?  The MAGE paper shows that ribosome binding sites can be a very productive way to tune expression, and so a simple approach would be for each targeted gene to have some strong RBS and weak RBS mutagenic oligos designed.  For proteins thought to be very useful, MAGE oligos to tweak their promoters upwards would also be included.  For proteins thought to be deleterious, complete nulls could be included via stop-codon introducing oligos.  As far as the genes to target, the list could be quite large but would certainly include tRNAs, tRNA synthetases, all of the enzymes involved in the creation or consumption of amino acids, amino acid transporters.  The RpoS gene and its targets, which are involved in the response to starvation, are clear candidates as well.  Ideally one would target every gene, but that isn't quite in the scope of feasibility yet.&lt;br /&gt;&lt;br /&gt;The screen then is to mutagenize via MAGE and select either dual-high (both reporters enhanced in brightness) or dual-low expressors (both reduced in brightness) by cell sorting.  After secondary screens, the evolved strains would be fully sequenced to identify the mutations introduced both by design and by chance.  
Dual-high screens would pull out mutations that enhance expression whereas dual-low would pull out the opposite.  Ideally these would be complementary -- genes knocked down in one would have enhancing mutations in the other.&lt;br /&gt;&lt;br /&gt;Some of the mutations, particularly spontaneous ones, might be "trivial" in that they simply affect copy number of the expression plasmid.  However, even these might be new insights into E.coli biology.   And if multiple strains emerged with distinct mutations, a new round of MAGE could be used to attempt to combine them and determine if there are additive effects (or interferences).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/36768584-4526697593420599264?l=omicsomics.blogspot.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://omicsomics.blogspot.com/feeds/4526697593420599264/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=36768584&amp;postID=4526697593420599264" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/4526697593420599264" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/4526697593420599264" /><link rel="alternate" type="text/html" href="http://omicsomics.blogspot.com/2009/09/industrial-protein-production-further.html" title="Industrial Protein Production: Further Thoughts" /><author><name>Keith Robison</name><uri>http://www.blogger.com/profile/04765318239070312590</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd="http://schemas.google.com/g/2005" name="OpenSocialUserId" value="08368724497474381730" /></author><thr:total 
xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-36768584.post-1945588234847412821</id><published>2009-09-14T22:56:00.006-04:00</published><updated>2009-09-15T19:33:57.979-04:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="synthetic biology" /><title type="text">Codon Optimization is Not Bunk?</title><content type="html">&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.plosone.org/article/fetchObject.action?uri=info%3Adoi%2F10.1371%2Fjournal.pone.0007002.g003&amp;representation=PNG_M"&gt;&lt;img style="float:left; margin:0 10px 10px 0;cursor:pointer; cursor:hand;width: 408px; height: 600px;" src="http://www.plosone.org/article/fetchObject.action?uri=info%3Adoi%2F10.1371%2Fjournal.pone.0007002.g003&amp;representation=PNG_M" border="0" alt="" /&gt;&lt;/a&gt;&lt;br /&gt;In a previous post I asked "&lt;a href="http://www.google.com/url?sa=t&amp;source=web&amp;ct=res&amp;cd=1&amp;url=http%3A%2F%2Fomicsomics.blogspot.com%2F2009%2F04%2Fis-codon-optimization-bunk.html&amp;ei=QwKvSpDGLtHklAeewey7Bg&amp;usg=AFQjCNGRO80hcVdtmWkjqN0Enl4i_lHc8w&amp;sig2=rKNxjpfpfeVrgCJd2b-AYw"&gt;Is Codon Optimization Bunk?&lt;/a&gt;", reflecting on a paper which showed that the typical rules for codon optimization appeared not to be highly predictive of the expression of GFP constructs.  A &lt;a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0007002"&gt;paper released in PLoS One&lt;/a&gt; sheds new light on this question.&lt;br /&gt;&lt;br /&gt;A quick review.  To a first approximation, the genetic code consists of 3 nucleotide units called codons; there are 64 possible codons.  Twenty amino acids plus stop are specified by these codons (again, 1st approximation).  So, either a lot of codons are never used or at least some codons mean the same thing.  
In the coding system used by the vast majority of organisms, two amino acids are encoded with a single codon whereas all the others have 2, 3, 4 or 6 codons apiece (and stop gets 3).  For amino acids with 2, 3 or 4 codons, it is the third position that makes the difference; for the three that have 6, there is one block of 4 which follows this pattern and one set of two which also differ from each other in the third position. For two of the amino acids with 6 codons (Leu and Arg), the two groups are next to each other so that you can think of the change between the blocks as a change in the first position; Ser is very strange in that its two blocks of codons are not terribly like each other.  For amino acids with two codons, the 3rd position is either a purine (A,G) or pyrimidine (C,T).  For a given amino acid, these codons are not used equally by a given organism; the pattern of bias in codon usage is quite distinct for an organism and its close cousins and this has wide effects on the genome (and vice versa). For example, in some Streptomyces I have the codon bias pattern pretty much memorized: use G or C in the third position and you'll almost always pick a frequent codon; use A or T and you've picked a rarity.  Some other organisms skew much the reverse; they like A or T in the third position.&lt;br /&gt;&lt;br /&gt;Furthermore, within a species the genes can even be divided further.  In E.coli, for example, there are roughly three classes of genes each with a distinctive codon usage signature.  One class is rich in important proteins which the cell probably needs a lot of, the second class seems to have many proteins which see only a soupcon of expression and the third class is rich in proteins likely to have been recently acquired from other species.&lt;br /&gt;&lt;br /&gt;So, it was natural to infer that this mattered for protein expression, particularly if you try to express a protein from one species in another.  Some species seemed to care more than others.  
E.coli has a reputation for being finicky and had one of the best studied systems.  Not only did changing the codon usage over to a more E.coli-like scheme seem to help some proteins, but truly rare codons (used less than 5% of the time, though that is an arbitrary threshold) could cause all sorts of trouble.  &lt;br /&gt;&lt;br /&gt;However, the question remained &lt;span style="font-style:italic;"&gt;how&lt;/span&gt; to optimize.  Given all those interchangeable codons, a synthetic gene could have many permutations. Several major camps emerged with many variants, particularly amongst the gene synthesis companies.  One school of thought said "maximize, maximize, maximize" -- pick the most frequently used codons in the target species.  A second school said "context matters" -- and went to maximize the codon pair usage.  A third school said "match the source!", meaning make the codon usage of the new coding sequence in the new species resemble the codon usage of the old coding region in the old species.  This hedged for possible requirements for rare codons to ensure proper folding.  Yet another school (which I belonged to) urged "balance", and chose to make the new coding region resemble a "typical" target species gene by sampling the codons based on their frequencies, throwing out the truly rare ones.  A logic here is that hammering the same codon -- and thereby the same tRNA -- over and over would make that codon as good as rare.&lt;br /&gt;&lt;br /&gt;The new work has a few crumbs for many of these camps, but only a few; it suggests much was wrong with each -- or perhaps, the same thing was wrong with each.  The problem is that even with these systems some proteins just didn't express well, leaving everyone scratching their heads.  
The GFP work seemed to suggest that the effects of codon usage were unpredictable if present, and in any case other factors, such as secondary structure near the ribosome, were what counted.&lt;br /&gt;&lt;br /&gt;What the new work did is synthesize a modest number (40) of versions of two very different proteins (a single-chain antibody and an enzyme), each version specifying the same protein sequence but with a different set of codons.  Within each type of protein, the expression varied over &lt;span style="font-style:italic;"&gt;two logs&lt;/span&gt;; clearly something matters.  Furthermore, they divided some of the best and worst expressors into thirds and made chimaeras, head of good and tail of bad (and vice versa).  Some chimaeras seemed to have expression resembling their parent for the head end but others seemed to inherit from the tail end parent.  So the GFP-based "ribosome binding site neighborhood secondary structure matters" hypothesis did not fare well with these tests.&lt;br /&gt;&lt;br /&gt;After some computational slicing-and-dicing, what they did come up with is that codon usage matters.  The twist is that it isn't matching the best used codons (CAI) that's important, as shown in the figure at the top which I'm fair-using.  The codons that matter aren't necessarily the most used codons, but when cross-referenced with some data on which codons are most sensitive to starvation conditions the jackpot lights come on.  When you use these as your guide, as shown below, the predictive ability is quite striking.  In retrospect, this makes total sense: expressing a single protein at very high levels is probably going to deplete a number of amino acids.  Indeed, this was the logic of the sampling approach. 
But, I don't believe any proponent of that approach ever predicted this.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.plosone.org/article/fetchObject.action?uri=info%3Adoi%2F10.1371%2Fjournal.pone.0007002.g002&amp;representation=PNG_M"&gt;&lt;img style="float:left; margin:0 10px 10px 0;cursor:pointer; cursor:hand;width: 214px; height: 600px;" src="http://www.plosone.org/article/fetchObject.action?uri=info%3Adoi%2F10.1371%2Fjournal.pone.0007002.g002&amp;representation=PNG_M" border="0" alt="" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Furthermore, not only does this work on the training set but new coding regions were prepared to test the model, and these new versions had expression levels consistent with the new model.&lt;br /&gt;&lt;br /&gt;What of secondary structure near the ribosome?  In some of the single-chain antibody constructs an effect could be seen, but it appears the codon usage effect is dominant.  In conversations with the authors (more on this below), they mentioned that GFP is easy to code with secondary structure near the ribosome binding site; this is just an interesting interaction of the genetic code with the amino acid sequence of GFP.  Since it is easy in this case to stumble on secondary structure, that effect shows up in that dataset. &lt;br /&gt;&lt;br /&gt;This is all very interesting, but it is also practical.  On the pure biology side, it does suggest that studying starvation is applicable to studying high level protein expression, which should enable further studies on this important problem.  On the protein expression side, it suggests a new approach to optimizing expression of synthetic constructs.  A catch however: this work was run by DNA2.0 and they have filed for patents and at least some of these patents have issued (e.g. 
&lt;a href="http://www.patentstorm.us/patents/7561972/fulltext.html"&gt;US 7561972&lt;/a&gt; and &lt;a href="http://www.patentstorm.us/patents/7561973/fulltext.html"&gt;US 7561973&lt;/a&gt;).  I mention this only to note that it is so and to give some starting points for reading further; clearly I have neither the expertise nor responsibility to interpret the legal meaning of patents.&lt;br /&gt;&lt;br /&gt;Which brings us to one final note: this paper represents my first embargo!  A representative of DNA2.0 contacted me back when my "bunk" post was written to mention that this work was going to emerge, and finally last week the curtain was lifted.  Obviously they know how to keep a geek in suspense!  They sent me the manuscript and engaged in a teleconference with the only proviso being that I continued to keep silent until the paper issued.  I'm not sure I would have caught this paper otherwise, so I'm glad they alerted me; though clearly both the paper and this post are not bad press for DNA2.0.  Good luck to them!  
Now that I'm on the other side of the fence, I'll buy my synthetic genes from anyone with a good price and a good design rationale.&lt;br /&gt;&lt;br /&gt;&lt;span style="float: left; padding: 5px;"&gt;&lt;a href="http://www.researchblogging.org"&gt;&lt;img alt="ResearchBlogging.org" src="http://www.researchblogging.org/public/citation_icons/rb2_large_gray.png" style="border:0;"/&gt;&lt;/a&gt;&lt;/span&gt;&lt;span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&amp;rft.jtitle=PLoS+One&amp;rft_id=info%3A%2F10.1371%2Fjournal.pone.0007002&amp;rfr_id=info%3Asid%2Fresearchblogging.org&amp;rft.atitle=Design+Parameters+to+Control+Synthetic+Gene+Expression+in+Escherichia+coli&amp;rft.issn=&amp;rft.date=2009&amp;rft.volume=4&amp;rft.issue=9&amp;rft.spage=0&amp;rft.epage=&amp;rft.artnum=http%3A%2F%2Fwww.plosone.org%2Farticle%2Finfo%253Adoi%252F10.1371%252Fjournal.pone.0007002&amp;rft.au=Mark+Welch%2C+Sridhar+Govindarajan%2C+Jon+E.+Ness%2C+Alan+Villalobos%2C+Austin+Gurney%2C+Jeremy+Minshull1%2C+Claes+Gustafsson&amp;rfe_dat=bpr3.included=1;bpr3.tags=Biology%2CComputational+Biology%2C+Genetics+%2C+Microbiology+%2C+Molecular+Biology"&gt;Mark Welch, Sridhar Govindarajan, Jon E. Ness, Alan Villalobos, Austin Gurney, Jeremy Minshull1, Claes Gustafsson (2009). 
Design Parameters to Control Synthetic Gene Expression in Escherichia coli &lt;span style="font-style: italic;"&gt;PLoS One, 4&lt;/span&gt; (9) : &lt;a rev="review" href="10.1371/journal.pone.0007002"&gt;10.1371/journal.pone.0007002&lt;/a&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/36768584-1945588234847412821?l=omicsomics.blogspot.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://omicsomics.blogspot.com/feeds/1945588234847412821/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=36768584&amp;postID=1945588234847412821" title="7 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/1945588234847412821" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/1945588234847412821" /><link rel="alternate" type="text/html" href="http://omicsomics.blogspot.com/2009/09/codon-optimization-is-not-bunk.html" title="Codon Optimization is Not Bunk?" 
/><author><name>Keith Robison</name><uri>http://www.blogger.com/profile/04765318239070312590</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd="http://schemas.google.com/g/2005" name="OpenSocialUserId" value="08368724497474381730" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">7</thr:total></entry><entry><id>tag:blogger.com,1999:blog-36768584.post-1831031150968912497</id><published>2009-09-09T23:08:00.003-04:00</published><updated>2009-09-10T00:09:10.858-04:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="genome sequencing" /><category scheme="http://www.blogger.com/atom/ns#" term="gardening" /><title type="text">A Blight's Genome</title><content type="html">&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_N2AOZejgjyA/SqhzJbnnfyI/AAAAAAAAAEM/Meh86ShKwNY/s1600-h/2009TomatoBlight.JPG"&gt;&lt;img style="float:left; margin:0 10px 10px 0;cursor:pointer; cursor:hand;width: 300px; height: 400px;" src="http://3.bp.blogspot.com/_N2AOZejgjyA/SqhzJbnnfyI/AAAAAAAAAEM/Meh86ShKwNY/s400/2009TomatoBlight.JPG" border="0" alt=""id="BLOGGER_PHOTO_ID_5379676360763997986" /&gt;&lt;/a&gt;&lt;br /&gt;Normally this time of year I would be watching the weather forecasts checking for the dreaded early frost which slays tomato plants, often followed by weeks of mild weather that could have permitted further growth.  Alas, this year that will clearly not be the case.  A wet growing season and commercial stock contaminated with spores has led to an epidemic of late blight, and my tomato plants (as shown, with the night photography accentuating the horror) are being slaughtered.  This weekend I'll clear the whole mess out &amp; for the next few years plant somewhere else.  &lt;br /&gt;&lt;br /&gt;Late blight is particularly horrid as it attacks both foliage and fruits -- many tomato diseases simply kill the foliage.  
What looked like promising green tomatoes a week ago are now disgusting brown blobs.  &lt;br /&gt;&lt;br /&gt;Late blight is caused by Phytophthora infestans, a fungus-like organism.  An even more devastating historical manifestation of this ogre was the Great Irish Potato Famine, and it remains a scourge of potato farmers.  Given my current difficulties with it, I was quite excited to see the publication of the &lt;a href="http://www.nature.com/nature/journal/vaop/ncurrent/abs/nature08358.html"&gt;Phytophthora infestans genome sequence (by Sanger) in Nature&lt;/a&gt; today.&lt;br /&gt;&lt;br /&gt;A sizable chunk of the paper is devoted to the general structure of the genome, which tops out at 240Mb. Two related plant pathogens, P.sojae (soybean root rot) and P.ramorum (sudden oak death), come in at only 95Mb and 65Mb respectively.  What accounts for the increase?  While the genome does not seem to be duplicated as a whole, a number of expanded gene families implicated in plant pathogenesis have been found.&lt;br /&gt;&lt;br /&gt;Also in great numbers are transposons.  About a third of the genome is Gypsy-type retrotransposons.  Several other classes of transposons are also present.  In the end, just over a quarter (26%) of the genome is non-repetitive.  While these transposons do not themselves appear to contain phytopathological genes, their presence appears to be driving expansion of some key families of such genes.  Comparison of genomic scaffolds with the other two sequenced Phytophthora species shows striking overall conservation of the order of conserved genes, but with local rearrangements and expansion of the zones between conserved genes (Figure 1 plus S18 and S19).  Continuing evolutionary activity in this space is shown by the fact that some of these genes are apparently inactivated but have only small numbers of mutations, suggesting very recent conversion to pseudogenes.  
A transposon polymorphism was also found -- an insertion in one haplotype which is absent in another (&lt;a href="http://www.nature.com/nature/journal/vaop/ncurrent/extref/nature08358-s2.pdf"&gt;figure S9&lt;/a&gt;).&lt;br /&gt;&lt;br /&gt;A curious additional effect is shown off in two-D plots of 5' vs. 3' intergenic length (Figure 2).  Overall this distribution is a huge blob, but some of the pathogenesis gene classes are clustered in the quadrant where both intergenic regions are large -- conversely, many of the core genes are clustered in the "both small" quadrant.  &lt;a href="http://www.nature.com/nature/journal/vaop/ncurrent/extref/nature08358-s2.pdf"&gt;Supplemental Figure S2&lt;/a&gt; shows rather strikingly how splayed the distribution is for P.infestans -- other genomes show much tighter distributions but P.infestans seems to have quite a few intergenic regions at about every possible scale.&lt;br /&gt;&lt;br /&gt;The &lt;a href="http://www.nature.com/news/2009/090909/full/news.2009.893.html"&gt;news item accompanying the paper&lt;/a&gt; puts some perspective on all this: P.infestans is armed with lots of anti-plant weapons which enable it to evolve evasions of plant resistance mechanisms.  A quoted plant scientist offers a glum perspective:&lt;br /&gt;&lt;blockquote&gt;After taking 15 years to incorporate this resistance in a cultivar, it would take Phytophthora infestans only a couple of years to defeat it.&lt;/blockquote&gt;Chemical control of P.infestans &lt;a href="http://plant-disease.ippc.orst.edu/disease.cfm?RecordID=1084"&gt;reportedly works only before the infection is apparent&lt;/a&gt; and probably involves stuff I'd rather not play with.  &lt;br /&gt;&lt;br /&gt;A quick side trip to Wikipedia finds that the genus is a pack of blights.  Indeed, &lt;a href="http://en.wikipedia.org/wiki/Phytophthora"&gt;Phytophthora is coined from the Greek for "plant destruction"&lt;/a&gt;.  
Other horticultural curses from this genus include alder root rot, rhododendron root rot, cinnamon root rot (not of cinnamon, but rather various woody plants) and fruit rots in a wide variety of useful &amp; yummy fruits including strawberries, cucumbers and coconuts.  What an ugly family tree!&lt;br /&gt;&lt;br /&gt;The Wikipedia entry also sheds light on why these awfuls are referred to as "fungus-like".  While they have a life cycle and some morphological similarities to fungi, their cell walls are mostly cellulose and molecular phylogenetics places them closer to plants than to fungi.&lt;br /&gt;&lt;br /&gt;So, the P.infestans genome sequence sheds light on how this pathogen can shift its attacks quickly.  Unfortunately, as with human genomic medicine, it will take a long time to figure out how to outsmart these assaults, particularly in a manner practical and safe for commercial growers and home gardeners alike.&lt;br /&gt;&lt;br /&gt;&lt;span style="float: left; padding: 5px;"&gt;&lt;a href="http://www.researchblogging.org"&gt;&lt;img alt="ResearchBlogging.org" src="http://www.researchblogging.org/public/citation_icons/rb2_large_gray.png" style="border:0;"/&gt;&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&amp;rft.jtitle=Nature&amp;rft_id=info%3A%2F10.1038%2Fnature08358&amp;rfr_id=info%3Asid%2Fresearchblogging.org&amp;rft.atitle=Genome+sequence+and+analysis+of+the+Irish+potato+famine+pathogen+Phytophthora+infestans&amp;rft.issn=&amp;rft.date=2009&amp;rft.volume=&amp;rft.issue=&amp;rft.spage=&amp;rft.epage=&amp;rft.artnum=http%3A%2F%2Fwww.nature.com%2Fnature%2Fjournal%2Fvaop%2Fncurrent%2Fabs%2Fnature08358.html&amp;rft.au=BJ+Haas+et+al&amp;rfe_dat=bpr3.included=1;bpr3.tags=Biology%2CAgriculture%2C+Genetics+%2C+Microbiology+%2C+Bioinformatics"&gt;BJ Haas et al (2009). 
Genome sequence and analysis of the Irish potato famine pathogen Phytophthora infestans &lt;span style="font-style: italic;"&gt;Nature&lt;/span&gt; : &lt;a rev="review" href="10.1038/nature08358"&gt;10.1038/nature08358&lt;/a&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/36768584-1831031150968912497?l=omicsomics.blogspot.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://omicsomics.blogspot.com/feeds/1831031150968912497/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=36768584&amp;postID=1831031150968912497" title="2 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/1831031150968912497" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/1831031150968912497" /><link rel="alternate" type="text/html" href="http://omicsomics.blogspot.com/2009/09/blights-genome.html" title="A Blight's Genome" /><author><name>Keith Robison</name><uri>http://www.blogger.com/profile/04765318239070312590</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd="http://schemas.google.com/g/2005" name="OpenSocialUserId" value="08368724497474381730" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://3.bp.blogspot.com/_N2AOZejgjyA/SqhzJbnnfyI/AAAAAAAAAEM/Meh86ShKwNY/s72-c/2009TomatoBlight.JPG" height="72" width="72" /><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-36768584.post-8100634563987972125</id><published>2009-09-08T23:01:00.003-04:00</published><updated>2009-09-08T23:39:22.064-04:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="genome sequencing" /><title type="text">Next-generation Physical Maps III: HAPPy Maps</title><content 
type="html">A second paper which triggered my current &lt;a href="http://omicsomics.blogspot.com/2009/09/physical-maps-part-i-of-series.html"&gt;physical map madness&lt;/a&gt; is a piece (open access!) &lt;a href="http://www.biolsci.org/v05p0298.htm"&gt;arguing for the adaptation of HAPPY mapping to next-gen sequencing&lt;/a&gt;.  This is intriguing in part because I see (and have) a need for cheap &amp; facile access to the underlying technologies but also because I think there are some interesting computational problems (not touched on in the paper, as I will elaborate below) and some additional uses for the general approach.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.mrc-lmb.cam.ac.uk/happy/HappyGroup/happy.html"&gt;HAPPy mapping&lt;/a&gt; is a method developed by Paul Dear for physical mapping.  The basic notion is that large DNA fragments (the size range determining several important parameters of the map; &lt;a href="http://www.mrc-lmb.cam.ac.uk/happy/HappyGroup/methods/markerorder.html"&gt;the maximum range between two markers is about 10 times the minimum resolution&lt;/a&gt;) are randomly gathered in pools which each contain approximately one half of a genome equivalent.  Practical issues limit the maximum fragment size to about 1Mb; any larger and they can't be handled in vitro. By typing markers across these pools, a map can be generated.  If two markers are on different chromosomes or are farther apart than the DNA fragment size, then there will be no correlation between them.  On the other hand, two markers which are very close together on a chromosome will tend to show up together in a pool. Traditionally, HAPPy pools have been &lt;a href="http://www.mrc-lmb.cam.ac.uk/happy/HappyGroup/methods/type.html"&gt;typed by PCR assays&lt;/a&gt; designed to known sequences.  
One beauty of HAPPy mapping is that it is nearly universal; if you can extract high molecular weight DNA from an organism then a HAPPy map should be possible.&lt;br /&gt;&lt;br /&gt;The next-gen version of this proposed by the authors would make HAPPy pools as before but then type them by sequence sampling the pools.  Given that a HAPPy pool contains many orders of magnitude less DNA than current next-gen library protocols require, they propose using whole-genome amplification to boost the DNA.  Then each pool would be converted to a bar-coded sequencing library.  The final typing would be performed by incorporating these reads into a shotgun assembly and then scoring each contig as present or absent in a pool.  Elegant!&lt;br /&gt;&lt;br /&gt;When would this mapping occur?  One suggestion is to first generate a rough assembly using standard shotgun sequencing, as this improves the estimate of the genome size which in turn enables the HAPPy pools to be optimally constructed so that any given fragment will be in 50% of the pools.  Alternatively, if a good estimate of the genome size is known the HAPPy pools could potentially be the source of all of the shotgun data (this is hinted at).&lt;br /&gt;&lt;br /&gt;One possible variation to this approach would be to replace bar-coded libraries and WGA with Helicos sequencing, which can theoretically work on very small amounts of DNA.  Fragmenting such tiny amounts would be one challenge to be overcome, and of course the Helicos generates much shorter, lower-quality reads than the other platforms.  
But, since these reads are primarily for building a physical map (or, in sequence terms driving to larger supercontigs), that may not be fatal.&lt;br /&gt;&lt;br /&gt;If going with one of the other next-gen platforms (as noted in the &lt;a href="http://omicsomics.blogspot.com/2009/09/physical-maps-ii-reading-signposts.html"&gt;previous post in this series&lt;/a&gt;, perhaps microarrays make sense as a readout), there is the question of input DNA.  For example, mammalian genomes range in size from &lt;a href="http://www.genomesize.com/statistics.php?stats=mammals"&gt;1.73pg to 8.40pg&lt;/a&gt;.  A lot of next-gen library protocols seem to call for more like 1-10ug of DNA, or about 6 logs more.  The HAPPy paper's authors suggest whole-genome amplification, which is reasonable but could potentially introduce bias.  In particular, it could be problematic to allow reads from amplified DNA to be the primary or even a major source of reads for the assembly.  &lt;a href="http://omicsomics.blogspot.com/2009/09/physical-maps-ii-reading-signposts.html"&gt;As I've noted before&lt;/a&gt;, other approaches such as molecular inversion probes might be useful for low amounts, but have not been demonstrated to my knowledge with picograms of input DNA. However, today I stumbled on two papers, one from the &lt;a href="http://www.ncbi.nlm.nih.gov/entrez/utils/fref.fcgi?PrId=3051&amp;itool=AbstractPlus-def&amp;uid=18084031&amp;nlmid=0411011&amp;db=pubmed&amp;url=http://nar.oxfordjournals.org/cgi/pmidlookup?view=long&amp;pmid=18084031"&gt;Max Planck Institute&lt;/a&gt; and &lt;a href="http://www.ncbi.nlm.nih.gov/entrez/utils/fref.fcgi?PrId=3196&amp;itool=AbstractPlus-def&amp;uid=19298667&amp;nlmid=100965258&amp;db=pubmed&amp;url=http://www.biomedcentral.com/1471-2164/10/116"&gt;one from Stanford&lt;/a&gt;, which use digital PCR to quantitate next-gen libraries and assert that this can assist in successfully preparing libraries from tiny amounts of DNA.  
It may also be possible to deal with this issue by attempting more than the required number of libraries, determining which built successfully by digital PCR and then pooling a sufficient number of successful libraries.&lt;br /&gt;&lt;br /&gt;The desirable number of HAPPY libraries and the desired sequencing depth for each library are two topics not covered well in the paper, which is unfortunate. The number of libraries presumably affects both resolution and confidence in the map.  Pretty much the entire coverage of this is the tail of one paragraph &lt;blockquote&gt; In the Illumina/Solexa system, DNA can be randomly sheared and amplified with primers that contain a 3 bp barcode. Using current instruments, reagents, and protocols, one Solexa "lane" generates ~120 Mb in ~3 million reads of ~40 bp. When each Solexa lane is multiplexed with 12 barcodes, for example, it will provide on average, ~10 Mb of sequence in ~250,000 reads for each sample. At this level of multiplexing, one Solexa instrument "run" (7 lanes plus control) would allow tag sequencing of 84 HAPPY samples. This means, one can finish 192 HAPPY samples in a maximum of three runs. New-generation sequencing combined with the barcode technique will produce innumerous amounts of sequences for assembly.&lt;br /&gt;&lt;/blockquote&gt;  &lt;br /&gt;&lt;br /&gt;Changing the marker system from direct testing of sequence-tagged sites by PCR to sequencing-based sampling has an important implication &lt;a href="http://omicsomics.blogspot.com/2009/09/physical-maps-ii-reading-signposts.html"&gt;as discussed in the last post&lt;/a&gt;.  If your PCR is working well, then if a pool contains a target it will come up positive.  But with the sequencing, there is a very real chance of not detecting a marker present in a pool.  This probability will depend on the size of the target -- very large contigs will have very little chance of being missed, but as contigs go smaller their probability of being missed goes up.  
Furthermore, actual size won't be as important as the effective size: the amount of sequence which can be reliably aligned.  In other words, two contigs might be the same length, but if one has a higher repeat count that contig will be less easily detectable. These parameters in turn can be estimated from the actual data.  &lt;br /&gt;&lt;br /&gt;The actual size of the pool is a critical parameter as well.  So, the sampling depth (for a given haploid genome size) will determine&lt;br /&gt;&lt;br /&gt;In any case, the problem of false negatives must be addressed.  One approach is to only map contigs which are unlikely to have ever been missed.  However, that means losing the ability to map smaller contigs.  Presumably there are clever computational approaches to either impute missing data or simply deal with it.&lt;br /&gt;&lt;br /&gt;It should also be noted that HAPPy maps, like many physical mapping techniques, are likely to yield long-range haplotype information.  Hence, even after sequencing one individual the approach will retain utility.  Indeed, this seems to be the tack that Complete Genomics is taking to obtain this information for human genomes, though they call it &lt;a href="http://scienceblogs.com/geneticfuture/2009/02/complete_genomics_some_questio.php"&gt;Long Fragment Reads&lt;/a&gt;.  It is worth noting that the haplotyping application has one clear difference from straight HAPPy mapping.  In HAPPy mapping, the optimal pool size is one in which any given genome fragment is expected to appear in half the pools, which means &lt;a href="http://www.mrc-lmb.cam.ac.uk/happy/HappyGroup/methods/panel/panel.html"&gt;pools of about 0.7X genome&lt;/a&gt;.  But for haplotyping (and for trying to count copy numbers and similar structural issues), it is desirable to have the pools much smaller, as this information can only be obtained if a given region of the genome is haploid in that pool.  
Ideally, this would mean each fragment in its own pool (library), but realistically this will mean as small a pool size as one can make and still cover the whole genome in the targeted number of pools.  Genomes of higher ploidies, such as many crops which are tetraploid, hexaploid or even octaploid, would probably require more pools with lower genomic fractions in order to resolve haplotypes.&lt;br /&gt;&lt;br /&gt;In conclusion, HAPPy mapping comes close to my personal ideal of a purely in vitro mapping system which looks like a clever means of preparing next-gen libraries.  The minimum and maximum distances resolvable are about 10-fold apart, so more than one set of HAPPy libraries is likely to be desirable for an organism.  Typically this is two sizes, since the maximum fragment size is around 1Mb (and may be smaller from organisms with difficult-to-extract DNA).  A key problem to resolve is that HAPPy pools contain single-digit picograms of DNA.  Amplification is a potential solution but may introduce bias; clever library preparation (or screening) may be another approach.  An open problem is the best depth of coverage of the multiplexed HAPPy next-gen libraries.  
HAPPy can be used both for physical mapping and long-range haplotyping, though the fraction of genome in a pool will differ for these different applications.&lt;br /&gt;&lt;br /&gt;&lt;span style="float: left; padding: 5px;"&gt;&lt;a href="http://www.researchblogging.org"&gt;&lt;img alt="ResearchBlogging.org" src="http://www.researchblogging.org/public/citation_icons/rb2_large_gray.png" style="border:0;"/&gt;&lt;/a&gt;&lt;/span&gt;&lt;span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&amp;rft.jtitle=International+journal+of+biological+sciences&amp;rft_id=info%3Apmid%2F19381348&amp;rfr_id=info%3Asid%2Fresearchblogging.org&amp;rft.atitle=Old+can+be+new+again%3A+HAPPY+whole+genome+sequencing%2C+mapping+and+assembly.&amp;rft.issn=&amp;rft.date=2009&amp;rft.volume=5&amp;rft.issue=4&amp;rft.spage=298&amp;rft.epage=303&amp;rft.artnum=&amp;rft.au=Jiang+Z&amp;rft.au=Rokhsar+DS&amp;rft.au=Harland+RM&amp;rfe_dat=bpr3.included=1;bpr3.tags=Biology%2CGenetics"&gt;Jiang Z, Rokhsar DS, &amp; Harland RM (2009). Old can be new again: HAPPY whole genome sequencing, mapping and assembly. 
&lt;span style="font-style: italic;"&gt;International journal of biological sciences, 5&lt;/span&gt; (4), 298-303 PMID: &lt;a rev="review" href="http://www.ncbi.nlm.nih.gov/pubmed/19381348"&gt;19381348&lt;/a&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/36768584-8100634563987972125?l=omicsomics.blogspot.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://omicsomics.blogspot.com/feeds/8100634563987972125/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=36768584&amp;postID=8100634563987972125" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/8100634563987972125" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/8100634563987972125" /><link rel="alternate" type="text/html" href="http://omicsomics.blogspot.com/2009/09/next-generation-physical-maps-iii-happy.html" title="Next-generation Physical Maps III: HAPPy Maps" /><author><name>Keith Robison</name><uri>http://www.blogger.com/profile/04765318239070312590</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd="http://schemas.google.com/g/2005" name="OpenSocialUserId" value="08368724497474381730" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-36768584.post-674529787791211141</id><published>2009-09-07T21:33:00.002-04:00</published><updated>2009-09-07T21:39:12.074-04:00</updated><title type="text">Farewell to Summer</title><content type="html">&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_N2AOZejgjyA/SqW1ggQPVdI/AAAAAAAAAEE/KUrYSlv8iBI/s1600-h/DSC01973.JPG"&gt;&lt;img style="float:left; margin:0 10px 10px 0;cursor:pointer; 
cursor:hand;width: 400px; height: 300px;" src="http://1.bp.blogspot.com/_N2AOZejgjyA/SqW1ggQPVdI/AAAAAAAAAEE/KUrYSlv8iBI/s400/DSC01973.JPG" border="0" alt=""id="BLOGGER_PHOTO_ID_5378904899982415314" /&gt;&lt;/a&gt;&lt;br /&gt;The autumnal equinox is still a few weeks away, but today marks the traditional end of the summer vacation season.  How does a genomics geek dress for the beach?  I present the KR's Career Collection, with sun protection from Infinity, a lovely Codon Devices beach bag and a towel from Millennium, with gorgeous Crane's Beach in Ipswich lending the background.  This collection was more spontaneous than planned, so I didn't dig for one of my remaining Harvard T-shirts or the Centocor frisbee that's stashed somewhere.&lt;br /&gt;&lt;br /&gt;I haven't yet worked for an Iguana Pharmaceuticals, so perhaps that is a portent of my future.  And while I am quite happy in my job, if someone were to start a bioinformatics shop on St. Thomas, I would have to listen to the pitch...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/36768584-674529787791211141?l=omicsomics.blogspot.com'/&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://omicsomics.blogspot.com/feeds/674529787791211141/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="https://www.blogger.com/comment.g?blogID=36768584&amp;postID=674529787791211141" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/674529787791211141" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/36768584/posts/default/674529787791211141" /><link rel="alternate" type="text/html" href="http://omicsomics.blogspot.com/2009/09/farewell-to-summer.html" title="Farewell to Summer" /><author><name>Keith 
Robison</name><uri>http://www.blogger.com/profile/04765318239070312590</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd="http://schemas.google.com/g/2005" name="OpenSocialUserId" value="08368724497474381730" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://1.bp.blogspot.com/_N2AOZejgjyA/SqW1ggQPVdI/AAAAAAAAAEE/KUrYSlv8iBI/s72-c/DSC01973.JPG" height="72" width="72" /><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total></entry></feed>
