<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/atom10full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><feed xmlns="http://www.w3.org/2005/Atom" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0">
 
 <title>b.atcg.us</title>
 <link href="http://b.atcg.us/" />
 
 <updated>2012-01-07T13:49:43-08:00</updated>
 <id>http://b.atcg.us/</id>
 <author>
   <name>brant faircloth</name>
   <email>faircloth+junk@gmail.com</email>
 </author>

 
 <atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/atom+xml" href="http://feeds.feedburner.com/atcg" /><feedburner:info uri="atcg" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><entry>
   <title>Python multiprocessing - multiple producers, single consumer</title>
   <link href="http://feedproxy.google.com/~r/atcg/~3/63ThX9C_UEg/multiprocessing-with-multiple-producers-and-a-single-consumer.html" />
   <published>2011-11-05T10:58:59-07:00</published>
   <updated>2011-11-05T10:58:59-07:00</updated>
   <id>http://b.atcg.us/blog/2011/11/05/multiprocessing-with-multiple-producers-and-a-single-consumer.html</id>
   <content type="html">&lt;p&gt;I'm working on &lt;a href="http://www.github.com/faircloth-lab/demuxipy/"&gt;some
code&lt;/a&gt; that I initially
wrote to write results to a &lt;a href="http://www.mysql.com/"&gt;mysql&lt;/a&gt; database.  I
chose mysql because it supports concurrent writes, and this program
processes data in parallel, with each process writing its results to
the database when a given task is complete.  This was a lazy way of
getting the data stored relatively quickly and easily (let's forget
about the overhead of any given process for a moment).  That said,
having mysql as a dependency of your code is a bummer, particularly for
folks with one-off tasks who don't have the time or patience to install
and configure mysqld.  Additionally, portability of mysql data, while
relatively easy using dumpfiles, is not as useful as several other
options.&lt;/p&gt;

&lt;p&gt;Because I'm working in &lt;a href="http://www.python.org/"&gt;Python&lt;/a&gt;, I can also use
the exceptional &lt;a href="http://docs.python.org/library/sqlite3.html"&gt;sqlite3&lt;/a&gt;
module.  It offers most of the database functions that I want/need, it's
generally available on any platform, it is open source, and databases
are portable between machines.&lt;/p&gt;

&lt;p&gt;However, sqlite3 does not support concurrent writes - meaning that I
need a general way to process my data in parallel while writing my
results to a database using a single process.  This basically translates
to needing a multiprocessing model having multiple producers and a
single consumer - something with few examples available on the
interwebs.&lt;/p&gt;

&lt;p&gt;So, after a bit of playing around, here's some test code that does just
that:&lt;/p&gt;

&lt;script src="https://gist.github.com/1255715.js"&gt; &lt;/script&gt;

&lt;img src="http://feeds.feedburner.com/~r/atcg/~4/63ThX9C_UEg" height="1" width="1"/&gt;</content>
 <feedburner:origLink>http://b.atcg.us/blog/2011/11/05/multiprocessing-with-multiple-producers-and-a-single-consumer.html</feedburner:origLink></entry>
 
 <entry>
   <title>(Relatively) Easily get coverage for velvet assemblies</title>
   <link href="http://feedproxy.google.com/~r/atcg/~3/6EVcEqJUZWs/relatively-easily-get-coverage-for-velvet-assemblies.html" />
   <published>2011-07-02T11:15:59-07:00</published>
   <updated>2011-07-02T11:15:59-07:00</updated>
   <id>http://b.atcg.us/blog/2011/07/02/relatively-easily-get-coverage-for-velvet-assemblies.html</id>
   <content type="html">&lt;p&gt;You can get kmer coverage from contigs assembled by velvet by &lt;a href="http://seqanswers.com/forums/showthread.php?t=6887"&gt;parsing the kmer value from the output fasta header&lt;/a&gt;, but sometimes I want "actual" coverage for contigs or coverage across a specific subset of contigs.&lt;/p&gt;

&lt;p&gt;Here is a way to do this relatively painlessly (requires that you first download and build the amos tools).&lt;/p&gt;

&lt;script src="https://gist.github.com/1061484.js"&gt; &lt;/script&gt;



&lt;img src="http://feeds.feedburner.com/~r/atcg/~4/6EVcEqJUZWs" height="1" width="1"/&gt;</content>
 <feedburner:origLink>http://b.atcg.us/blog/2011/07/02/relatively-easily-get-coverage-for-velvet-assemblies.html</feedburner:origLink></entry>
 
 <entry>
   <title>beast in the cloud</title>
   <link href="http://feedproxy.google.com/~r/atcg/~3/wb-IdKF38Q4/beast-in-the-cloud.html" />
   <published>2011-04-09T12:58:34-07:00</published>
   <updated>2011-04-09T12:58:34-07:00</updated>
   <id>http://b.atcg.us/blog/2011/04/09/beast-in-the-cloud.html</id>
   <content type="html">&lt;p&gt;We've been running &lt;a href="http://beast.bio.ed.ac.uk/Main_Page"&gt;beast&lt;/a&gt; and &lt;a href="http://mrbayes.csit.fsu.edu/"&gt;mrbayes&lt;/a&gt; on several data sets lately, generally using &lt;a href="http://aws.amazon.com/ec2/"&gt;ec2&lt;/a&gt; to help us run multiple analyses simultaneously.  Along those lines, I was interested in getting beast (using the &lt;a href="http://code.google.com/p/beagle-lib/"&gt;beagle-lib&lt;/a&gt;) running on ec2, to take advantage of their &lt;a href="http://aws.amazon.com/ec2/hpc-applications/"&gt;GPU HPC&lt;/a&gt; options (what a load of acronyms!).&lt;/p&gt;

&lt;p&gt;Anyway, here are steps to get BEAST running on ec2. Also, expect something a little more formal in the (hopefully) near future describing this option, and even easier means of implementation.&lt;/p&gt;

&lt;script src="https://gist.github.com/835693.js"&gt; &lt;/script&gt;



&lt;img src="http://feeds.feedburner.com/~r/atcg/~4/wb-IdKF38Q4" height="1" width="1"/&gt;</content>
 <feedburner:origLink>http://b.atcg.us/blog/2011/04/09/beast-in-the-cloud.html</feedburner:origLink></entry>
 
 <entry>
   <title>454 assembly on ec2</title>
   <link href="http://feedproxy.google.com/~r/atcg/~3/AGUBYEpakd4/454-assembly-on-ec2.html" />
   <published>2011-02-11T15:59:51-08:00</published>
   <updated>2011-02-11T15:59:51-08:00</updated>
   <id>http://b.atcg.us/blog/2011/02/11/454-assembly-on-ec2.html</id>
   <content type="html">&lt;p&gt;I recently came across a situation in which I needed to assemble some reads from a critter that we gathered from a couple of &lt;a href="http://www.454.com/"&gt;454&lt;/a&gt; sequencing runs.  Unfortunately, the assembly of these runs using &lt;a href="http://www.454.com/products-solutions/analysis-tools/gs-de-novo-assembler.asp"&gt;gsAssembler 2.5.3&lt;/a&gt; required a bit more RAM than available on our local workstation running a Fedora image in VMware.&lt;/p&gt;

&lt;p&gt;So, I decided to assemble the reads using &lt;a href="http://aws.amazon.com/ec2/"&gt;Amazon EC2&lt;/a&gt;.  The process was relatively painless once I found an OS image that would boot properly with the instance type I chose (m2.2xlarge; see &lt;a href="http://aws.amazon.com/ec2/instance-types/"&gt;here&lt;/a&gt; for a list of instance types).&lt;/p&gt;

&lt;p&gt;Here are the steps to follow using a 64-bit Amazon Linux image:&lt;/p&gt;

&lt;script src="https://gist.github.com/823271.js"&gt; &lt;/script&gt;



&lt;img src="http://feeds.feedburner.com/~r/atcg/~4/AGUBYEpakd4" height="1" width="1"/&gt;</content>
 <feedburner:origLink>http://b.atcg.us/blog/2011/02/11/454-assembly-on-ec2.html</feedburner:origLink></entry>
 
 <entry>
   <title>Getting taxonomy information from NCBI</title>
   <link href="http://feedproxy.google.com/~r/atcg/~3/Ivg0qFZcqvs/getting-taxonomy-information-from-ncbi.html" />
   <published>2011-01-29T11:01:51-08:00</published>
   <updated>2011-01-29T11:01:51-08:00</updated>
   <id>http://b.atcg.us/blog/2011/01/29/getting-taxonomy-information-from-ncbi.html</id>
   <content type="html">&lt;p&gt;Sometimes you need to get &lt;a href="http://www.ncbi.nlm.nih.gov/Taxonomy/"&gt;taxonomy&lt;/a&gt; information from NCBI, assuming that you know a particular species name.  If you only are working with one species, then this is not very hard.  When it comes to working with multiple species, however, attempting such a task using the web-frontend would be painful.  Luckily, you can use the NCBI &lt;a href="http://eutils.ncbi.nlm.nih.gov/"&gt;eutils&lt;/a&gt; to make the process somewhat easier, particularly when combined with &lt;a href="http://biopython.org/wiki/Biopython"&gt;BioPython&lt;/a&gt;:&lt;/p&gt;

&lt;script src="https://gist.github.com/802094.js"&gt; &lt;/script&gt;


&lt;p&gt;Several things to remember here:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;you must add &lt;em&gt;your&lt;/em&gt; email address as the &lt;code&gt;Entrez.email&lt;/code&gt; attribute&lt;/li&gt;
&lt;li&gt;you should not request information for more than 100 species during the work-day (see &lt;a href="http://eutils.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html#UserSystemRequirements"&gt;eutils restrictions&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;you will need to make changes to get this to work with your particular input&lt;/li&gt;
&lt;/ul&gt;

&lt;img src="http://feeds.feedburner.com/~r/atcg/~4/Ivg0qFZcqvs" height="1" width="1"/&gt;</content>
 <feedburner:origLink>http://b.atcg.us/blog/2011/01/29/getting-taxonomy-information-from-ncbi.html</feedburner:origLink></entry>
 
 <entry>
   <title>sphinx + github with no submodules</title>
   <link href="http://feedproxy.google.com/~r/atcg/~3/VWbrajWRIKM/sphinx-github-with-no-submodules.html" />
   <published>2011-01-22T18:59:50-08:00</published>
   <updated>2011-01-22T18:59:50-08:00</updated>
   <id>http://b.atcg.us/blog/2011/01/22/sphinx-github-with-no-submodules.html</id>
   <content type="html">&lt;p&gt;Previously, I detailed how I was &lt;a href="../../../2010/02/10/sphinx-documentation-and-github.html"&gt;using git submodules&lt;/a&gt; and &lt;a href="http://pages.github.com/"&gt;gh-pages&lt;/a&gt; at github to host the html versions of my documentation generated by &lt;a href="http://sphinx.pocoo.org/"&gt;sphinx&lt;/a&gt;.  Basically, the problem is that using git submodules for this is a real pain (for me, at least).  You've always got to remember to sync up the submodule(s) just so, or things go horribly wrong and stop tracking each other.&lt;/p&gt;

&lt;p&gt;Because I can never remember the correct order of what needs to be committed when, I searched for other ways.  And searched.  And searched.  Finally, I ginned up something that I think I like.  It basically uses a directory, within your project but ignored by &lt;code&gt;.gitignore&lt;/code&gt;, to keep track of your gh-pages content.  Because the &lt;code&gt;.git&lt;/code&gt; directory tracks what you need to track, you can run &lt;code&gt;make html&lt;/code&gt; for sphinx with impunity, and you can push your changes to gh-pages when you want - no need to remember what goes when and where.&lt;/p&gt;

&lt;p&gt;Finally, you can track the raw &lt;a href="http://docutils.sourceforge.net/rst.html"&gt;rst&lt;/a&gt; of your doc files in the regular source tree, just like you should.&lt;/p&gt;

&lt;script src="https://gist.github.com/791759.js"&gt; &lt;/script&gt;



&lt;img src="http://feeds.feedburner.com/~r/atcg/~4/VWbrajWRIKM" height="1" width="1"/&gt;</content>
 <feedburner:origLink>http://b.atcg.us/blog/2011/01/22/sphinx-github-with-no-submodules.html</feedburner:origLink></entry>
 
 <entry>
   <title>casting a numpy array of strings to int</title>
   <link href="http://feedproxy.google.com/~r/atcg/~3/acN0Jqdw3Cc/casting-a-numpy-array-of-strings-to-int.html" />
   <published>2010-12-13T09:27:59-08:00</published>
   <updated>2010-12-13T09:27:59-08:00</updated>
   <id>http://b.atcg.us/blog/2010/12/13/casting-a-numpy-array-of-strings-to-int.html</id>
   <content type="html">&lt;p&gt;Sometimes you need to create an array from a string, and then you need to cast the array (which is of string type) into something more useful like &lt;code&gt;int&lt;/code&gt; - for example when reading PHRED quality scores from a file.  You can do this several ways, often using a list comprehension, perhaps like so:&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre&gt;&lt;code class="python"&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;numpy&lt;/span&gt;

&lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;&amp;#39;40 40 40 40 40&amp;#39;&lt;/span&gt;
&lt;span class="n"&gt;sl&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rstrip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;&amp;#39; &amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;si&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;elem&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;elem&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;sl&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;sa&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;si&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;


&lt;p&gt;But, this is kludgy and there is a more efficient way:&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre&gt;&lt;code class="python"&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;numpy&lt;/span&gt;

&lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;&amp;#39;40 40 40 40 40&amp;#39;&lt;/span&gt;
&lt;span class="n"&gt;sa&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rstrip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;&amp;#39; &amp;#39;&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;astype&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;



&lt;img src="http://feeds.feedburner.com/~r/atcg/~4/acN0Jqdw3Cc" height="1" width="1"/&gt;</content>
 <feedburner:origLink>http://b.atcg.us/blog/2010/12/13/casting-a-numpy-array-of-strings-to-int.html</feedburner:origLink></entry>
 
 <entry>
   <title>chunking a fasta file, part 2</title>
   <link href="http://feedproxy.google.com/~r/atcg/~3/Hi4OcCDPc_U/chunking-a-fasta-file-part-2.html" />
   <published>2010-12-11T15:45:45-08:00</published>
   <updated>2010-12-11T15:45:45-08:00</updated>
   <id>http://b.atcg.us/blog/2010/12/11/chunking-a-fasta-file-part-2.html</id>
   <content type="html">&lt;p&gt;Well, it took me more time than I had planned to get around to wrapping &lt;a href="http://b.atcg.us/blog/2010/10/03/chunking-a-fasta-file-part-1.html"&gt;this&lt;/a&gt; up... but, it is what it is.&lt;/p&gt;

&lt;p&gt;I have completed some code that will use a single process or multiple processes to split a fasta or fastq file into a requested number of subunits.  I have yet to test the speed of the code relative to something like &lt;code&gt;split&lt;/code&gt;, but my guess is that it's rather fast, particularly for large files.&lt;/p&gt;

&lt;p&gt;I provide a use case at the bottom of the file.&lt;/p&gt;

&lt;script src="https://gist.github.com/737708.js"&gt; &lt;/script&gt;



&lt;img src="http://feeds.feedburner.com/~r/atcg/~4/Hi4OcCDPc_U" height="1" width="1"/&gt;</content>
 <feedburner:origLink>http://b.atcg.us/blog/2010/12/11/chunking-a-fasta-file-part-2.html</feedburner:origLink></entry>
 
 <entry>
   <title>vmware fusion and the cli</title>
   <link href="http://feedproxy.google.com/~r/atcg/~3/Bq3bJr-wOIE/vmware-fusion-and-the-cli.html" />
   <published>2010-12-11T00:00:00-08:00</published>
   <updated>2010-12-11T00:00:00-08:00</updated>
   <id>http://b.atcg.us/blog/2010/12/11/vmware-fusion-and-the-cli.html</id>
   <content type="html">&lt;p&gt;Today, I needed to shutdown an instance running on &lt;a href="http://www.vmware.com/products/fusion/"&gt;VMware Fusion&lt;/a&gt; so i could free up some resources.  Normally, this is an easy task because I'm sitting at the machine in question.  But, I ran out of my office the other day and forgot to shutdown the instance.  So, i needed to quit it remotely.  Digging into things a bit, I found &lt;code&gt;vmrun&lt;/code&gt;, which is the CLI to VMware Fusion.  Running &lt;code&gt;vmrun&lt;/code&gt; without any options show how to make it work, and shutting down a VM is as simple as:&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre&gt;&lt;code class="bash"&gt;/Library/Application&lt;span class="se"&gt;\ &lt;/span&gt;Support/VMware&lt;span class="se"&gt;\ &lt;/span&gt;Fusion/vmrun stop &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="s2"&gt;&amp;quot;/Volumes/Data2/VM/Fedora 64-bit.vmwarevm/Fedora 64-bit.vmx&amp;quot;&lt;/span&gt; soft
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;



&lt;img src="http://feeds.feedburner.com/~r/atcg/~4/Bq3bJr-wOIE" height="1" width="1"/&gt;</content>
 <feedburner:origLink>http://b.atcg.us/blog/2010/12/11/vmware-fusion-and-the-cli.html</feedburner:origLink></entry>
 
 <entry>
   <title>a plethora of sequence tags</title>
   <link href="http://feedproxy.google.com/~r/atcg/~3/gzcgJUo0C10/a-plethora-of-sequence-tags.html" />
   <published>2010-11-20T00:00:00-08:00</published>
   <updated>2010-11-20T00:00:00-08:00</updated>
   <id>http://b.atcg.us/blog/2010/11/20/a-plethora-of-sequence-tags.html</id>
   <content type="html">&lt;p&gt;Sequence tags can be attached to DNA reads of interest to let you &lt;a href="http://dx.doi.org/10.1093/nar/gkm566"&gt;track&lt;/a&gt; &lt;a href="http://dx.doi.org/10.1371/journal.pone.0000197"&gt;different&lt;/a&gt; &lt;a href="http://dx.doi.org/10.1038/NMETH.1184"&gt;pools&lt;/a&gt; of reads following a second generation sequencing run.  The best way to generate tags for these reads is a matter of some debate:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;you can "just make them"&lt;/li&gt;
&lt;li&gt;you can generate them with a &lt;a href="http://bioinf.eva.mpg.de/multiplex/"&gt;program&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;you can use &lt;a href="http://en.wikipedia.org/wiki/Hamming_distance"&gt;Hamming&lt;/a&gt; &lt;a href="http://www.lee.eng.uerj.br/~gil/redesII/hamming.pdf"&gt;distance&lt;/a&gt; tags&lt;/li&gt;
&lt;li&gt;you can use &lt;a href="http://en.wikipedia.org/wiki/Levenshtein_distance"&gt;Levenshtein&lt;/a&gt; &lt;a href="http://sascha.geekheim.de/wp-content/uploads/2006/04/levenshtein.pdf"&gt;distance&lt;/a&gt; tags&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;The problem with "just making" the tags is that sequencing errors can inadvertently turn one tag into another if there is, say, an erroneous substitution of a base into the tag portion of the sequence read.  Error correcting tags using &lt;a href="http://dx.doi.org/10.1038/NMETH.1184"&gt;Hamming distance&lt;/a&gt; attempt to counter this effect, but are only robust to substitution errors, which &lt;a href="http://dx.doi.org/10.1109/CIBCB.2009.4925705"&gt;can be problematic&lt;/a&gt;.  Levenshtein distance tags are robust to insertion, deletion, and substitution error, but it is often hard to find available sets of Levenshtein distance sequence tags.&lt;/p&gt;

&lt;p&gt;With all of that in mind, I offer several sets of Levenshtein distance sequence tags.  These tags range from 4 to 10 nt and edit distance 3 to 9.  The 10nt tags are somewhat slow to create (70 or 80 hours on a multicore machine), so you might as well just use these rather than generate a set, &lt;em&gt;de novo&lt;/em&gt;.  If you would like to check the tags, to ensure they are of the appropriate distance, you &lt;a href="https://github.com/BadDNA/edittag/blob/master/bin/validate_edit_metric_tags.py"&gt;can&lt;/a&gt;.&lt;/p&gt;
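
&lt;p&gt;To illustrate the difference between the two distances, and what "checking the tags" amounts to, here is a small self-contained sketch.  It is not the edittag code: the &lt;code&gt;hamming&lt;/code&gt;, &lt;code&gt;levenshtein&lt;/code&gt;, and &lt;code&gt;min_pairwise_distance&lt;/code&gt; functions are written here purely for illustration, using the textbook dynamic-programming recurrence for edit distance.&lt;/p&gt;

```python
from itertools import combinations


def hamming(a, b):
    """Count mismatched positions between two equal-length tags."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,             # deletion
                current[j - 1] + 1,          # insertion
                previous[j - 1] + (x != y),  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def min_pairwise_distance(tags):
    """Smallest Levenshtein distance between any two tags in a set."""
    return min(levenshtein(a, b) for a, b in combinations(tags, 2))
```

&lt;p&gt;For example, &lt;code&gt;ACGTAC&lt;/code&gt; and &lt;code&gt;CGTACA&lt;/code&gt; differ at every position, so their Hamming distance is 6, yet one deletion at the front plus one insertion at the end turns one into the other, so their Levenshtein distance is only 2.  That is why a tag set that looks well separated by Hamming distance can still collide after an indel, and why a set is only as good as its &lt;code&gt;min_pairwise_distance&lt;/code&gt;.&lt;/p&gt;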

&lt;p&gt;For those interested in the nitty-gritty details, see the &lt;a href="https://github.com/BadDNA/edittag/blob/master/bin/design_edit_metric_tags.py"&gt;code&lt;/a&gt;, which is one program within &lt;a href="https://github.com/baddna/edittag/"&gt;edittag&lt;/a&gt;.  Now, here are the tags:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/downloads/BadDNA/edittag/edit_metric_tags.xls.zip"&gt;sequence tag xls file&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/downloads/BadDNA/edittag/edit_metric_tags.txt"&gt;sequence tag csv file&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;img src="http://feeds.feedburner.com/~r/atcg/~4/gzcgJUo0C10" height="1" width="1"/&gt;</content>
 <feedburner:origLink>http://b.atcg.us/blog/2010/11/20/a-plethora-of-sequence-tags.html</feedburner:origLink></entry>
 
 <entry>
   <title>extending the nextera indexing set</title>
   <link href="http://feedproxy.google.com/~r/atcg/~3/XobIew26-qI/extending-the-nextera-indexing-set.html" />
   <published>2010-11-09T00:00:00-08:00</published>
   <updated>2010-11-09T00:00:00-08:00</updated>
   <id>http://b.atcg.us/blog/2010/11/09/extending-the-nextera-indexing-set.html</id>
   <content type="html">&lt;p&gt;&lt;strong&gt;UPDATE&lt;/strong&gt;: For those of you unwilling to just jump at these, we have validated all of these extended indices and have not found problems with any of the full set of 24, across 6 lanes of multiplexed Illumina sequencing.  You may also be interested in &lt;a href="http://b.atcg.us/blog/2010/11/20/a-plethora-of-sequence-tags.html"&gt;edittag&lt;/a&gt;, which provides a more generic solution to the barcode generation problem.&lt;/p&gt;

&lt;p&gt;The &lt;a href="http://www.epibio.com/item.asp?ID=566"&gt;Epicentre Nextera&lt;/a&gt; kit comes with aliquots of barcodes that are compatible with the Illumina multiplex system.  However, they only send you adapters with 12 different indices:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;#Index  Name
ATCACG  IDX_1
CGATGT  IDX_2
TTAGGC  IDX_3
TGACCA  IDX_4
ACAGTG  IDX_5
GCCAAT  IDX_6
CAGATC  IDX_7
ACTTGA  IDX_8
GATCAG  IDX_9
TAGCTT  IDX_10
GGCTAC  IDX_11
CTTGTA  IDX_12
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;For whatever reason (HiSeq, anyone?), we may want to run more than 12 indexed samples in a single sequencing run.  Thus, we want to extend the set of 12 adapters to support a larger number of samples.  Skipping to the results, here are the indices/barcodes that extend the set of 12:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="/media/txt/extended_nextera_tags_6nt_ed2_barcodes.txt"&gt;Extended Nextera Tags (6nt, edit distance = 2)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;Now, as you'll see below, the barcode IDX adapters in the default set are edit distance 2 apart from one another.  I don't typically feel comfortable until barcode adapters are edit distance ≥ 3 from one another - so, if we want to extend the Nextera set (using &lt;em&gt;most&lt;/em&gt; of the adapters provided), we can &lt;strong&gt;DROP IDX 1 and IDX 8 from the set of adapters that came with the kit&lt;/strong&gt;, and then design additional adapters from the barcode sequences in the following (see below for slightly more detail):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="/media/txt/extended_nextera_tags_6nt_ed3_barcodes.txt"&gt;Extended Nextera Tags (6nt, edit distance = 3)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;I didn't have tons of time to spend on this, so what you can see over at &lt;a href="https://github.com/"&gt;github&lt;/a&gt; is slightly ugly.  But, it is what it is.  Also, I posted the Methods over on github to catch any changes, but the &lt;code&gt;&amp;lt;script&amp;gt;&lt;/code&gt; formatting that you get (notice below) is not quite WYSIWYG.&lt;/p&gt;

&lt;script src="https://gist.github.com/670033.js"&gt; &lt;/script&gt;

&lt;img src="http://feeds.feedburner.com/~r/atcg/~4/XobIew26-qI" height="1" width="1"/&gt;</content>
 <feedburner:origLink>http://b.atcg.us/blog/2010/11/09/extending-the-nextera-indexing-set.html</feedburner:origLink></entry>
 
 <entry>
   <title>an alternative method to run colony2</title>
   <link href="http://feedproxy.google.com/~r/atcg/~3/naA0vwF0kS0/an-alternative-method-to-run-colony2.html" />
   <published>2010-10-20T00:00:00-07:00</published>
   <updated>2010-10-20T00:00:00-07:00</updated>
   <id>http://b.atcg.us/blog/2010/10/20/an-alternative-method-to-run-colony2.html</id>
   <content type="html">&lt;p&gt;For parentage/sibship inference, I've started using &lt;a href="http://www.zsl.org/science/research/software/colony,1154,AR.html"&gt;Colony2&lt;/a&gt; and
&lt;a href="http://cran.r-project.org/web/packages/MasterBayes/index.html"&gt;MasterBayes&lt;/a&gt; in place of the venerable &lt;a href="http://www.fieldgenetics.com/pages/aboutCervus_Overview.jsp"&gt;Cervus&lt;/a&gt;.  However, one (of two*) things that is annoying about Colony2 are the scripts for running the program
that are available on "alternative" operating systems (e.g. not Windows).
These scripts are provided as part of the &lt;a href="http://r-forge.r-project.org/projects/rcolony/"&gt;rcolony&lt;/a&gt; package.&lt;/p&gt;

&lt;p&gt;Because the script to run colony through &lt;a href="http://www.r-project.org/"&gt;R&lt;/a&gt; (aka &lt;code&gt;run.colony()&lt;/code&gt;) was
causing me problems and because I'd rather do pretty much anything than hack on
R code, I wrote some &lt;a href="http://www.python.org"&gt;Python&lt;/a&gt; code to accomplish a function similar to
&lt;code&gt;run.colony()&lt;/code&gt; but that operates outside of R.  Here goes:&lt;/p&gt;

&lt;script src="http://gist.github.com/637381.js"&gt; &lt;/script&gt;


&lt;p&gt;* The second thing that can be a downer when using Colony on alternative
operating systems is the formatting of the input file.  I've got something in
the works, there, too... it's just not at all on the front burner.&lt;/p&gt;
&lt;img src="http://feeds.feedburner.com/~r/atcg/~4/naA0vwF0kS0" height="1" width="1"/&gt;</content>
 <feedburner:origLink>http://b.atcg.us/blog/2010/10/20/an-alternative-method-to-run-colony2.html</feedburner:origLink></entry>
 
 <entry>
   <title>compiling pslCDNAFilter on osx</title>
   <link href="http://feedproxy.google.com/~r/atcg/~3/rtME55h0LJE/compiling-pslcdnafilter-on-osx.html" />
   <published>2010-10-18T00:00:00-07:00</published>
   <updated>2010-10-18T00:00:00-07:00</updated>
   <id>http://b.atcg.us/blog/2010/10/18/compiling-pslcdnafilter-on-osx.html</id>
   <content type="html">&lt;p&gt;I need to filter some &lt;code&gt;psl&lt;/code&gt; output from &lt;a href="http://www.ncbi.nlm.nih.gov/pubmed/11932250"&gt;blat&lt;/a&gt;, and I'd like to do it using
the &lt;code&gt;pslCDNAFilter&lt;/code&gt; that's provided as part of the &lt;a href="http://genome-source.cse.ucsc.edu/gitweb/?p=kent.git;a=summary"&gt;Kent Source tree&lt;/a&gt;.
Although the Kent Source typically behaves well after you set your environment
variables appropriately, this little bit was not behaving well.  So, I undertook a horrific hack job to get it to compile (this basically requires altering the &lt;code&gt;makefile&lt;/code&gt; and physically placing some text into &lt;code&gt;pslCDNAFilter.c&lt;/code&gt;):&lt;/p&gt;

&lt;script src="http://gist.github.com/633694.js"&gt; &lt;/script&gt;

&lt;img src="http://feeds.feedburner.com/~r/atcg/~4/rtME55h0LJE" height="1" width="1"/&gt;</content>
 <feedburner:origLink>http://b.atcg.us/blog/2010/10/18/compiling-pslcdnafilter-on-osx.html</feedburner:origLink></entry>
 
 <entry>
   <title>installing mira on osx</title>
   <link href="http://feedproxy.google.com/~r/atcg/~3/-RUMaljjeTs/installing-mira-on-osx.html" />
   <published>2010-10-16T00:00:00-07:00</published>
   <updated>2010-10-16T00:00:00-07:00</updated>
   <id>http://b.atcg.us/blog/2010/10/16/installing-mira-on-osx.html</id>
   <content type="html">&lt;p&gt;I need to assemble some 454 &lt;a href="http://en.wikipedia.org/wiki/RNA-Seq"&gt;RNA-seq&lt;/a&gt;
data.  I've evaluated a couple of assembly methods, and I'd like to give
&lt;a href="http://mira-assembler.sourceforge.net/"&gt;mira&lt;/a&gt; a shot.  The output of
&lt;code&gt;./configure&lt;/code&gt; indicates that this build can be problematic
on &lt;a href="http://en.wikipedia.org/wiki/Osx"&gt;osx&lt;/a&gt;, so here's what i did:&lt;/p&gt;

&lt;script src="http://gist.github.com/630205.js"&gt; &lt;/script&gt;



&lt;img src="http://feeds.feedburner.com/~r/atcg/~4/-RUMaljjeTs" height="1" width="1"/&gt;</content>
 <feedburner:origLink>http://b.atcg.us/blog/2010/10/16/installing-mira-on-osx.html</feedburner:origLink></entry>
 
 <entry>
   <title>mysql date and time strings to datetime values</title>
   <link href="http://feedproxy.google.com/~r/atcg/~3/7vnPmpp2PiI/mysql-date-and-time-strings-to-datetime-values.html" />
   <published>2010-10-14T00:00:00-07:00</published>
   <updated>2010-10-14T00:00:00-07:00</updated>
   <id>http://b.atcg.us/blog/2010/10/14/mysql-date-and-time-strings-to-datetime-values.html</id>
   <content type="html">&lt;p&gt;Getting dates and times into MySQL can be a total pain.  Sometimes, it's easier
to just import the date values into a varchar column and do the same with time
values - and &lt;strong&gt;then&lt;/strong&gt; convert them to a datetime type, which is far more handy
than separate date and time columns (since you can derive both from a datetime
column).&lt;/p&gt;

&lt;p&gt;So, in my case, I have this setup:&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre&gt;&lt;code class="sql"&gt;&lt;span class="n"&gt;mysql&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="n"&gt;usdi&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;capture_date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;all_bands_temp&lt;/span&gt; 
    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;order&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt; &lt;span class="n"&gt;RAND&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;limit&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="c1"&gt;-----------+--------------+------+&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;usdi&lt;/span&gt;      &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;capture_date&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="c1"&gt;-----------+--------------+------+&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;154187574&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;98&lt;/span&gt;       &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;1312&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; 
&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;139107796&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;14&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;90&lt;/span&gt;      &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;0820&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; 
&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;136141001&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;85&lt;/span&gt;       &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;1209&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; 
&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;187140505&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;31&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;05&lt;/span&gt;      &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;0750&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; 
&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;190134590&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;06&lt;/span&gt;       &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;0000&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; 
&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;190134700&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;06&lt;/span&gt;      &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;0000&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; 
&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;139105075&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;28&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;88&lt;/span&gt;      &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;1135&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; 
&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;139107358&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;11&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;89&lt;/span&gt;      &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;1017&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; 
&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;190134254&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;29&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;04&lt;/span&gt;      &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;0000&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; 
&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;154189988&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;03&lt;/span&gt;      &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;1322&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; 
&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="c1"&gt;-----------+--------------+------+&lt;/span&gt;
&lt;span class="mi"&gt;10&lt;/span&gt; &lt;span class="k"&gt;rows&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="k"&gt;set&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt; &lt;span class="n"&gt;sec&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;


&lt;p&gt;To convert these into a datetime value, all I need to do is:&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre&gt;&lt;code class="sql"&gt;&lt;span class="n"&gt;mysql&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="n"&gt;usdi&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;capture_date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;str_to_date&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;concat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;capture_date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39; &amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;%m/%d/%y %k%i&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;str_to_datetime&lt;/span&gt;
    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;all_bands_temp&lt;/span&gt; &lt;span class="k"&gt;order&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt; &lt;span class="n"&gt;RAND&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;limit&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="c1"&gt;-----------+--------------+------+---------------------+&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;usdi&lt;/span&gt;      &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;capture_date&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;str_to_datetime&lt;/span&gt;     &lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="c1"&gt;-----------+--------------+------+---------------------+&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;139106995&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;91&lt;/span&gt;      &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;0707&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;1991&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;06&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt; &lt;span class="mi"&gt;07&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;07&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;00&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; 
&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;149143944&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;95&lt;/span&gt;       &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;1212&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;1995&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;06&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;02&lt;/span&gt; &lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;00&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; 
&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;139107786&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;90&lt;/span&gt;      &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;1730&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;1990&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;05&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt; &lt;span class="mi"&gt;17&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;00&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; 
&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;147164369&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;24&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;91&lt;/span&gt;      &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;1030&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;1991&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;05&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;24&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;00&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; 
&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;149143925&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;28&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;95&lt;/span&gt;      &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;1140&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;1995&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;05&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;28&lt;/span&gt; &lt;span class="mi"&gt;11&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;40&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;00&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; 
&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;154189201&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;02&lt;/span&gt;       &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;1338&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;2002&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;05&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;05&lt;/span&gt; &lt;span class="mi"&gt;13&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;38&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;00&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; 
&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;123156987&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;         &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;                &lt;span class="o"&gt;|&lt;/span&gt; 
&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;190134380&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;11&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;04&lt;/span&gt;      &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;0000&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;2004&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;06&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;11&lt;/span&gt; &lt;span class="mi"&gt;00&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;00&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;00&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; 
&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;139109340&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;87&lt;/span&gt;      &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;0830&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;1987&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;07&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt; &lt;span class="mi"&gt;08&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;00&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; 
&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;154189917&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;03&lt;/span&gt;       &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;1332&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;2003&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;05&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;03&lt;/span&gt; &lt;span class="mi"&gt;13&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;00&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; 
&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="c1"&gt;-----------+--------------+------+---------------------+&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;


&lt;p&gt;Now that we know that, all we need to do is insert these values into a datetime
column...&lt;/p&gt;
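
&lt;p&gt;If you'd rather sanity-check the format string before touching the table, here is a minimal Python sketch of the same conversion. It assumes the strings look like the rows above; &lt;code&gt;to_datetime&lt;/code&gt; is a made-up helper name, and &lt;code&gt;%H%M&lt;/code&gt; is Python's counterpart to the &lt;code&gt;%k%i&lt;/code&gt; used in the query:&lt;/p&gt;

```python
from datetime import datetime


def to_datetime(capture_date, time):
    """Combine 'm/d/yy' and 'HHMM' strings; None in gives None out (like NULL)."""
    if capture_date is None or time is None:
        return None
    # same idea as concat(capture_date, ' ', time) followed by str_to_date()
    return datetime.strptime(capture_date + ' ' + time, '%m/%d/%y %H%M')


# a few rows shaped like the query output above (hypothetical sample, not the table)
rows = [('5/7/98', '1312'), ('5/14/90', '0820'), (None, None)]
converted = [to_datetime(d, t) for d, t in rows]
for value in converted:
    print(value)
```

&lt;p&gt;Two-digit years are resolved by Python's own pivot rules, so it is worth spot-checking a few old records against what mysql returns.&lt;/p&gt;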
</content>
 <feedburner:origLink>http://b.atcg.us/blog/2010/10/14/mysql-date-and-time-strings-to-datetime-values.html</feedburner:origLink></entry>
 
 <entry>
   <title>two sixty two eighty</title>
   <link href="http://feedproxy.google.com/~r/atcg/~3/MX1ctoQTLis/two-sixty-two-eighty.html" />
   <published>2010-10-11T00:00:00-07:00</published>
   <updated>2010-10-11T00:00:00-07:00</updated>
   <id>http://b.atcg.us/blog/2010/10/11/two-sixty-two-eighty.html</id>
   <content type="html">&lt;p&gt;We have a &lt;a href="http://nanodrop.com/"&gt;nanodrop&lt;/a&gt; in our lab.  It serves us extremely well, and i really appreciate it - i was initially "broken in" on an older spectrophotometer (Beckman DU-800) that basically required you to sacrifice half your sample to get a so-so measure of DNA concentration.&lt;/p&gt;

&lt;p&gt;Because we have a nanodrop in our lab, people often ask me about &lt;a href="http://en.wikipedia.org/wiki/Nucleic_acids_analysis"&gt;260/280&lt;/a&gt; ratios.  Well, more commonly, they ask me about 260/280 ratios when things go &lt;a href="http://media.photobucket.com/image/nuclear%20bomb/glamgalz/funzug/imgs/misc/nuclear_bomb_expo_01.jpg"&gt;poorly&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;First, i should state that the &lt;strong&gt;gold standard&lt;/strong&gt; for whether or not a DNA extraction is "good" (i.e., assuming you are doing PCR) is whether or not you can get it to amplify... with or without &lt;a href="http://en.wikipedia.org/wiki/Chelex_100"&gt;trickeration&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Second, for those still interested in 260/280 ratios and the presence of contaminants, i offer this handy little publication (from nanodrop/thermofisher), which i'll just refer to as &lt;a href="http://www.nanodrop.com/Library/T009-NanoDrop%201000-&amp;amp;-NanoDrop%208000-Nucleic-Acid-Purity-Ratios.pdf"&gt;number nine&lt;/a&gt;.&lt;/p&gt;
</content>
 <feedburner:origLink>http://b.atcg.us/blog/2010/10/11/two-sixty-two-eighty.html</feedburner:origLink></entry>
 
 <entry>
   <title>starting a jekyll post from the CLI</title>
   <link href="http://feedproxy.google.com/~r/atcg/~3/x56h_NNTJTw/starting-a-jekyll-post-from-the-cli.html" />
   <published>2010-10-03T00:00:00-07:00</published>
   <updated>2010-10-03T00:00:00-07:00</updated>
   <id>http://b.atcg.us/blog/2010/10/03/starting-a-jekyll-post-from-the-cli.html</id>
   <content type="html">&lt;p&gt;Starting a &lt;a href="http://github.com/mojombo/jekyll"&gt;jekyll&lt;/a&gt; post is one of the minor annoyances of using jekyll (otherwise there are few).  However, you can certainly script the redundant task.  There are several scripts out there, but few worked for me without throwing an error or two.  So, I forked one script, added pieces of another, and the outcome is below:&lt;/p&gt;

&lt;script src="http://gist.github.com/608362.js"&gt; &lt;/script&gt;
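
&lt;p&gt;For a feel of what such a script has to do, here is a minimal Python sketch. It is my own illustration and not the contents of the gist. It assumes posts live in &lt;code&gt;_posts&lt;/code&gt;, are named &lt;code&gt;YYYY-MM-DD-title.markdown&lt;/code&gt;, and use a layout called &lt;code&gt;post&lt;/code&gt;; &lt;code&gt;start_post&lt;/code&gt; is a hypothetical name:&lt;/p&gt;

```python
import datetime
import os
import re
import shutil
import tempfile


def start_post(title, posts_dir='_posts', today=None):
    """Create posts_dir/YYYY-MM-DD-slug.markdown holding minimal front matter."""
    today = today or datetime.date.today()
    # lowercase the title and collapse anything that isn't a-z or 0-9 into '-'
    slug = re.sub(r'[^a-z0-9]+', '-', title.lower()).strip('-')
    name = '{0}-{1}.markdown'.format(today.isoformat(), slug)
    path = os.path.join(posts_dir, name)
    os.makedirs(posts_dir, exist_ok=True)
    if os.path.exists(path):
        raise IOError('refusing to overwrite ' + path)
    with open(path, 'w') as post:
        # 'layout: post' is an assumption; use whatever layout your site defines
        post.write('---\nlayout: post\ntitle: "{0}"\n---\n\n'.format(title))
    return path


# demo in a throwaway directory so nothing real gets touched
demo_dir = tempfile.mkdtemp()
created = start_post('starting a jekyll post from the CLI',
                     posts_dir=os.path.join(demo_dir, '_posts'),
                     today=datetime.date(2010, 10, 3))
with open(created) as post:
    front_matter = post.read()
shutil.rmtree(demo_dir)
print(os.path.basename(created))
```

&lt;p&gt;Titles containing double quotes would need escaping before they go into the front matter; the sketch skips that.&lt;/p&gt;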

</content>
 <feedburner:origLink>http://b.atcg.us/blog/2010/10/03/starting-a-jekyll-post-from-the-cli.html</feedburner:origLink></entry>
 
 <entry>
   <title>chunking a fasta file, part 1</title>
   <link href="http://feedproxy.google.com/~r/atcg/~3/VzbmwirT7I0/chunking-a-fasta-file-part-1.html" />
   <published>2010-10-03T00:00:00-07:00</published>
   <updated>2010-10-03T00:00:00-07:00</updated>
   <id>http://b.atcg.us/blog/2010/10/03/chunking-a-fasta-file-part-1.html</id>
   <content type="html">&lt;p&gt;I've been thinking for a little while about how best to chunk up a gigantic fasta file for distribution across several machines (more on that later).  There are obviously several ways to do this - one of which would just be to sequentially read &lt;code&gt;x&lt;/code&gt; number of fasta entries (say, 10,000 with some &lt;a href="http://www.biopython.org/wiki/SeqIO"&gt;fasta&lt;/a&gt; &lt;a href="http://pypi.python.org/pypi/pyfasta/"&gt;parser&lt;/a&gt;) and split them off into a file.  Then, rinse and repeat.&lt;/p&gt;

&lt;p&gt;However, when we're talking about larger files (let's assume something in the GB range), we may actually want to generate a tuple of start and stop positions within the file (such that each block contains ~1 MB of data), and then use these positions to spread the file-splitting duties across more than one thread/processor.  Each process grabs a set of coordinates, extracts that part of the file, and writes the piece to an outfile.  Let's assume we're splitting something like the following:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;1. this is my first line\n
2. this is my second line\n
3. this is my third line\n
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;For this sort of data, we can take the approach discussed in the &lt;a href="http://effbot.org/zone/wide-finder.htm"&gt;widefinder&lt;/a&gt; benchmarks.  Namely, we can write some code to seek through the file in 1 MB chunks, make sure the line we're on is actually a line, and record the start and end positions that split the file into 1 MB chunks/blocks.&lt;/p&gt;
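
&lt;p&gt;A rough Python sketch of that idea follows. It is my own illustration rather than code from the widefinder benchmarks, and &lt;code&gt;line_chunks&lt;/code&gt; and &lt;code&gt;read_chunk&lt;/code&gt; are hypothetical names. The demo uses a tiny block size so the three-line example above actually gets split:&lt;/p&gt;

```python
import os
import tempfile


def line_chunks(path, size=1024 * 1024):
    """Yield (start, stop) byte offsets of roughly `size` bytes ending on a newline."""
    file_size = os.path.getsize(path)
    with open(path, 'rb') as handle:
        start = 0
        while file_size > start:
            # jump ahead by one block, then finish the line we landed in
            handle.seek(min(start + size, file_size))
            handle.readline()
            stop = handle.tell()
            yield start, stop
            start = stop


def read_chunk(path, start, stop):
    """Return the bytes between two offsets; a worker process could do this."""
    with open(path, 'rb') as handle:
        handle.seek(start)
        return handle.read(stop - start)


# demo on a tiny file with a tiny block size
original = (b'this is my first line\n'
            b'this is my second line\n'
            b'this is my third line\n')
with tempfile.NamedTemporaryFile('wb', suffix='.txt', delete=False) as tmp:
    tmp.write(original)
chunks = list(line_chunks(tmp.name, size=10))
pieces = [read_chunk(tmp.name, start, stop) for start, stop in chunks]
os.remove(tmp.name)
print(chunks)
```

&lt;p&gt;Because each chunk is just a pair of offsets, handing the pairs to several worker processes is cheap; only the workers ever read the data itself.&lt;/p&gt;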

&lt;p&gt;However, fasta (and similar files like fastq) present an annoying challenge, by their very design... fasta files look like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;gt;fasta-sequence-number-one
ATTTACCCTTTATTTGTCCAGTTGACGTTTTACTTTTATAGATAAATGTTTTTAAGAAGT
TATTGGGCGTTGCTACGAGTGATTGGTAAATACCTTATTGTTTTACTATGTCATGAAGTG
TGACGTACGTGTCATCCTATTTAAAACTTGTCAGTTGAATGTATCTGCATTCTTGGAGTT
&amp;gt;fasta-sequence-number-two
TGACGTACGTGTCATCCTATTTAAAACTTGTCAGTTGAATGTATCTGCATTCTTGGAGTT
ATTTACCCTTTATTTGTCCAGTTGACGTTTTACTTTTATAGATAAATGTTTTTAAGAAGT
TATTGGGCGTTGCTACGAGTGATTGGTAAATACCTTATTGTTTTACTATGTCATGAAGTG
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;What should be apparent, now, is the crux of the problem... the header line of a fasta file is followed by one, two, or many additional lines of sequence data - meaning that the header line (which begins with '&gt;') and the trailing lines (which are not prepended with a symbol) are a &lt;em&gt;unit&lt;/em&gt; that goes together.  In other words, this fasta unit could be 2 lines long (1 header line; 1 sequence line) or 20,000 lines long (1 header line and 19,999 sequence lines).  Compounding the formatting nightmare is the additional fact that various fasta-formatting schemes will split the sequence portion of the read across an arbitrary length of characters, to make everything &lt;em&gt;pretty&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Thus, the technique used in the widefinder benchmarks - assuming lines of a file are independent while splitting files into chunks of arbitrary size - can cause grave problems when applied to DNA sequence data in fasta format: using this method, we could inadvertently separate (1) the header from its sequence, (2) the header and line one from lines 2...n, (3) the header and lines 2-3 from lines 4...n, etc., etc.&lt;/p&gt;

&lt;p&gt;So, what we need is a method to chunk fasta files that is fast (ideally faster than sequential read-write operations); efficient (chunking the file into smaller pieces); and accurate (never splitting a fasta record mid-record).&lt;/p&gt;

&lt;p&gt;With that in mind, I offer up fasta_chunker.py, the first of a series of steps that will read a fasta file, returning an iterator of start and offset positions for chunks that are &lt;em&gt;n&lt;/em&gt; (where default = 1) megabytes in size:&lt;/p&gt;

&lt;script src="http://gist.github.com/609035.js"&gt; &lt;/script&gt;
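
&lt;p&gt;To make the record-boundary idea concrete, here is a minimal Python sketch of one way to do it. This is my own illustration and not the contents of fasta_chunker.py (that lives in the gist above). It assumes the file starts with a header line and that '&gt;' only ever begins a line when that line is a header; &lt;code&gt;fasta_chunks&lt;/code&gt; is a hypothetical name:&lt;/p&gt;

```python
import os
import tempfile


def fasta_chunks(path, size=1024 * 1024):
    """Yield (start, stop) byte offsets that always break just before a header."""
    file_size = os.path.getsize(path)
    with open(path, 'rb') as handle:
        start = 0
        while file_size > start:
            handle.seek(min(start + size, file_size))
            handle.readline()  # finish the (possibly partial) line we landed in
            stop = handle.tell()
            line = handle.readline()
            # keep walking until the next line is a header (or the file ends)
            while line and not line.startswith(b'>'):
                stop = handle.tell()
                line = handle.readline()
            yield start, stop
            start = stop


# demo: two short records and a block size far smaller than one record
fasta = (b'>fasta-sequence-number-one\nATTTACCC\nTATTGGGC\n'
         b'>fasta-sequence-number-two\nTGACGTAC\nATTTACCC\n')
with tempfile.NamedTemporaryFile('wb', suffix='.fa', delete=False) as tmp:
    tmp.write(fasta)
chunks = list(fasta_chunks(tmp.name, size=10))
pieces = []
with open(tmp.name, 'rb') as handle:
    for start, stop in chunks:
        handle.seek(start)
        pieces.append(handle.read(stop - start))
os.remove(tmp.name)
print(chunks)
```

&lt;p&gt;Chunks come out at least &lt;em&gt;n&lt;/em&gt; bytes long rather than exactly &lt;em&gt;n&lt;/em&gt;, since each one is stretched to the next header; a single huge record simply becomes one huge chunk.&lt;/p&gt;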



</content>
 <feedburner:origLink>http://b.atcg.us/blog/2010/10/03/chunking-a-fasta-file-part-1.html</feedburner:origLink></entry>
 
 <entry>
   <title>bulk sequence renaming with biopython</title>
   <link href="http://feedproxy.google.com/~r/atcg/~3/FOiqW6gRRho/rename-bulk-sequence.html" />
   <published>2010-08-31T00:00:00-07:00</published>
   <updated>2010-08-31T00:00:00-07:00</updated>
   <id>http://b.atcg.us/blog/2010/08/31/rename-bulk-sequence.html</id>
   <content type="html">&lt;p&gt;Sometimes the sequences in large, genome-size files are not named the way you
want them to be.  Because the files are large, it's easier to rename the records
programmatically.  Here is one way to do that, using BioPython.&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre&gt;&lt;code class="python"&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;Bio&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SeqIO&lt;/span&gt;

&lt;span class="c"&gt;# create a dict to hold our new GI:names mapping, which looks like so, in &lt;/span&gt;
&lt;span class="c"&gt;# this case (from sheep)&lt;/span&gt;
&lt;span class="c"&gt;#&lt;/span&gt;
&lt;span class="c"&gt;# gi|289623201|gb|CM000885.1|   299839927   chr1&lt;/span&gt;
&lt;span class="c"&gt;# gi|289623190|gb|CM000894.1|   94216033     chr10&lt;/span&gt;
&lt;span class="c"&gt;# gi|289623189|gb|CM000895.1|   67137890     chr11&lt;/span&gt;
&lt;span class="c"&gt;# gi|289623188|gb|CM000896.1|   86457535     chr12&lt;/span&gt;
&lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

&lt;span class="c"&gt;# read in this file, mapping each GI identifier (column 1) to its new name (column 3)&lt;/span&gt;
&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nb"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;&amp;#39;oviAri1 copy.info&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;info_file&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;info_file&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;line_split&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;&amp;#39;&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;&amp;#39;&lt;/span&gt;&lt;span class="se"&gt;\t&lt;/span&gt;&lt;span class="s"&gt;&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;line_split&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;line_split&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c"&gt;# open a file for the output&lt;/span&gt;
&lt;span class="n"&gt;output_file&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;&amp;#39;new_oviAri1_seq.fa&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s"&gt;&amp;#39;w&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# create an iterable to hold the new data&lt;/span&gt;
&lt;span class="n"&gt;new_seq&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

&lt;span class="c"&gt;# iterate over seq, updating the name.  This is going to give us something like:&lt;/span&gt;
&lt;span class="c"&gt;#&lt;/span&gt;
&lt;span class="c"&gt;# &amp;gt;chr1&lt;/span&gt;
&lt;span class="c"&gt;# &lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;SeqIO&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;&amp;#39;oviAri1.fa&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;&amp;#39;fasta&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;new_record_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;new_record_name&lt;/span&gt;
    &lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;&amp;#39;&amp;#39;&lt;/span&gt;
    &lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;&amp;#39;&amp;#39;&lt;/span&gt;
    &lt;span class="n"&gt;new_seq&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# write the whole thing out, then close the output file&lt;/span&gt;
&lt;span class="n"&gt;SeqIO&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;new_seq&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;output_file&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;&amp;#39;fasta&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;output_file&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;



</content>
 <feedburner:origLink>http://b.atcg.us/blog/2010/08/31/rename-bulk-sequence.html</feedburner:origLink></entry>
 
 <entry>
   <title>restriction batches and biopython</title>
   <link href="http://feedproxy.google.com/~r/atcg/~3/N_j8dOastEU/virtual-restriction-batch-digest.html" />
   <published>2010-07-20T00:00:00-07:00</published>
   <updated>2010-07-20T00:00:00-07:00</updated>
   <id>http://b.atcg.us/blog/2010/07/20/virtual-restriction-batch-digest.html</id>
   <content type="html">&lt;p&gt;Sometimes, you need to get an idea of the sizes of fragments you are likely to get from a sequence cut with a &lt;em&gt;batch&lt;/em&gt; of restriction enzymes.  Here's how you would do that virtually.&lt;/p&gt;

&lt;p&gt;This might be useful when you have no idea of the likely size of resulting fragments, but you'd like to have some idea - before you order a lot of enzymes.&lt;/p&gt;

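&lt;div class="highlight"&gt;&lt;pre&gt;&lt;code class="python"&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;Bio&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Restriction&lt;/span&gt;

&lt;span class="c"&gt;# subset the record (seq is assumed to be a SeqRecord loaded earlier, e.g. with SeqIO)&lt;/span&gt;
&lt;span class="n"&gt;test&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;seq&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;10000&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;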

&lt;span class="c"&gt;# take a look at the common restriction enzymes&lt;/span&gt;
&lt;span class="n"&gt;Restriction&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CommOnly&lt;/span&gt;

&lt;span class="c"&gt;# do a restriction batch analysis on our test sequence with ALL common&lt;/span&gt;
&lt;span class="c"&gt;# enzymes&lt;/span&gt;
&lt;span class="n"&gt;Ana&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Restriction&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Analysis&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Restriction&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CommOnly&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;test&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;seq&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;linear&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# look at the enzymes that cut (BLUNT) the test sequence in more than&lt;/span&gt;
&lt;span class="c"&gt;# 10 spots; call blunt() once and reuse the resulting dict&lt;/span&gt;
&lt;span class="n"&gt;blunt_cuts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Ana&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;blunt&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;blunt_cuts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;blunt_cuts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;blunt_cuts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;])])&lt;/span&gt;

&lt;span class="c"&gt;# sort by the 2nd element of the list&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]):&lt;/span&gt;
    &lt;span class="k"&gt;print&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    
&lt;span class="c"&gt;# or you can do this:&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;operator&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;itemgetter&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;itemgetter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)):&lt;/span&gt;
    &lt;span class="k"&gt;print&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;



&lt;img src="http://feeds.feedburner.com/~r/atcg/~4/N_j8dOastEU" height="1" width="1"/&gt;</content>
 <feedburner:origLink>http://b.atcg.us/blog/2010/07/20/virtual-restriction-batch-digest.html</feedburner:origLink></entry>
 
 <entry>
   <title>virtual restriction digests with biopython</title>
   <link href="http://feedproxy.google.com/~r/atcg/~3/K-ZjCDKqDDA/virtual-restriction-digest.html" />
   <published>2010-07-16T00:00:00-07:00</published>
   <updated>2010-07-16T00:00:00-07:00</updated>
   <id>http://b.atcg.us/blog/2010/07/16/virtual-restriction-digest.html</id>
   <content type="html">&lt;p&gt;The other day, I needed to get an approximate idea of the size of fragments
returned from a particular restriction digest.  Generally, this sort of
info is available on the intertubes, but I was not finding terribly much.&lt;/p&gt;

&lt;p&gt;So, I decided to do a quick, virtual restriction digest using Python, Biopython,
and numpy for some summary stats (in this case I "digested" chr1 of &lt;em&gt;Arabidopsis thaliana&lt;/em&gt;):&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre&gt;&lt;code class="python"&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;Bio&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SeqIO&lt;/span&gt;
&lt;span class="n"&gt;seq&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;SeqIO&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;read&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;&amp;#39;chr1.fas&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s"&gt;&amp;#39;rU&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="s"&gt;&amp;#39;fasta&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;seq&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="mi"&gt;30427671&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;Bio&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Restriction&lt;/span&gt;
&lt;span class="n"&gt;Restriction&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;HindIII&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;seq&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;seq&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;sites&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Restriction&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;HindIII&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;seq&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;seq&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# get the number of fragments produced by cutting&lt;/span&gt;
&lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sites&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="mi"&gt;16963&lt;/span&gt;

&lt;span class="c"&gt;# sort the sites by position (just to be sure)&lt;/span&gt;
&lt;span class="n"&gt;sites&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sort&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c"&gt;# get the distance between sites&lt;/span&gt;
&lt;span class="n"&gt;dist&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;sites&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;sites&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;xrange&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sites&lt;/span&gt;&lt;span class="p"&gt;))]&lt;/span&gt;

&lt;span class="c"&gt;# generate some summary stats&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;numpy&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;math&lt;/span&gt;
&lt;span class="n"&gt;dist_a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dist&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# mean fragment size&lt;/span&gt;
&lt;span class="n"&gt;numpy&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dist_a&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="mf"&gt;1793.6787525056006&lt;/span&gt;

&lt;span class="c"&gt;# 95 % Confidence invterval around mean&lt;/span&gt;
&lt;span class="mf"&gt;1.96&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dist_a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ddof&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;math&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sqrt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dist_a&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="mf"&gt;33.572399955020472&lt;/span&gt;

&lt;span class="c"&gt;# get number of fragments btw. 100 and 500 bp - this could likely use some #&lt;/span&gt;
&lt;span class="c"&gt;# cleanup, but I was in a hurry&lt;/span&gt;
&lt;span class="n"&gt;less_than_five_bool&lt;/span&gt;    &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dist_a&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;
&lt;span class="n"&gt;less_than_five&lt;/span&gt;         &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dist_a&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;less_than_five_bool&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;greater_than_one_bool&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;less_than_five&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;
&lt;span class="n"&gt;one_to_five_hundred_bp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;less_than_five&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;greater_than_one_bool&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
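
&lt;p&gt;One caveat: the snippet above measures only the distances between neighbouring cut sites, so the two end fragments of a linear sequence are left out. Here is a rough sketch of one way to include them and count fragments in a size window, in plain Python with no numpy. The function names and the cut positions are made up for illustration, and the exact coordinate convention of the reported cut positions is ignored:&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre&gt;&lt;code class="python"&gt;
```python
def fragment_lengths(sites, seq_len):
    """Return fragment lengths for a linear sequence cut at the given positions."""
    # Treat both sequence ends as boundaries so the first and last
    # fragments are counted along with the site-to-site ones.
    bounds = [0] + sorted(sites) + [seq_len]
    return [b - a for a, b in zip(bounds, bounds[1:])]


def count_in_window(lengths, low, high):
    """Count integer lengths strictly between low and high."""
    # Range membership does the strict bounds check for integers.
    window = range(low + 1, high)
    return sum(1 for n in lengths if n in window)


# Hypothetical cut positions on a 2000 bp sequence, for illustration only.
lengths = fragment_lengths([150, 400, 1200, 1250], 2000)
print(lengths)                             # [150, 250, 800, 50, 750]
print(count_in_window(lengths, 100, 500))  # 2
```
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;With real data you would pass the list returned by the enzyme search and the length of the sequence instead of the toy values.&lt;/p&gt;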



&lt;img src="http://feeds.feedburner.com/~r/atcg/~4/K-ZjCDKqDDA" height="1" width="1"/&gt;</content>
 <feedburner:origLink>http://b.atcg.us/blog/2010/07/16/virtual-restriction-digest.html</feedburner:origLink></entry>
 
 <entry>
   <title>removing a file/directory permanently from git</title>
   <link href="http://feedproxy.google.com/~r/atcg/~3/0JIub3bvAQQ/remove-from-git.html" />
   <published>2010-06-29T00:00:00-07:00</published>
   <updated>2010-06-29T00:00:00-07:00</updated>
   <id>http://b.atcg.us/blog/2010/06/29/remove-from-git.html</id>
   <content type="html">&lt;p&gt;I'm not entirely sure how many times I've needed this or how many times I've dug it up only to forget it later, but here is how to permanently remove a file or directory from a git repository.  This comes by way of &lt;a href="http://dalibornasevic.com/posts/2-permanently-remove-files-and-folders-from-a-git-repository"&gt;Dalibor Nasevic's site&lt;/a&gt;.&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre&gt;&lt;code class="bash"&gt;&lt;span class="c"&gt;# blackhole the content&lt;/span&gt;
git filter-branch --tree-filter &lt;span class="s1"&gt;&amp;#39;rm -rf my/folder&amp;#39;&lt;/span&gt; HEAD

&lt;span class="c"&gt;# force the update&lt;/span&gt;
git push origin master --force
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;


&lt;p&gt;If you like to read up on these things, you can find more information in &lt;a href="http://progit.org/book/ch6-4.html"&gt;Chapter 6&lt;/a&gt; of the &lt;a href="http://progit.org/"&gt;ProGit&lt;/a&gt; book.&lt;/p&gt;
&lt;img src="http://feeds.feedburner.com/~r/atcg/~4/0JIub3bvAQQ" height="1" width="1"/&gt;</content>
 <feedburner:origLink>http://b.atcg.us/blog/2010/06/29/remove-from-git.html</feedburner:origLink></entry>
 
 <entry>
   <title>py2app hell, the second</title>
   <link href="http://feedproxy.google.com/~r/atcg/~3/eVC3nhI-o1Y/py2app-hell-the-second.html" />
   <published>2010-06-14T00:00:00-07:00</published>
   <updated>2010-06-14T00:00:00-07:00</updated>
   <id>http://b.atcg.us/blog/2010/06/14/py2app-hell-the-second.html</id>
   <content type="html">&lt;p&gt;I have this funny relationship with py2app.  I like what it produces but it is a real hassle to get working.  Anyway, the basic situation is this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;i needed an application bundle for &lt;a href="http://msatcommander.googlecode.com"&gt;msatcommander&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;you cannot, technically, include the Apple-compiled python in an application bundle&lt;/li&gt;
&lt;li&gt;this means you need to get one from somewhere else (e.g. roll-your-own)&lt;/li&gt;
&lt;li&gt;pyqt4 requires a framework build of python (unless you hack, hack, hack)&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;Other interesting tidbits:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;you need to keep the architectures of everything produced the same.  i chose to stick with &lt;code&gt;x86_64&lt;/code&gt; and &lt;code&gt;i386&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;the prepackaged Qt is built for 3 architectures. This will cause you problems.&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;Therefore:&lt;/p&gt;

&lt;script src="http://gist.github.com/437957.js"&gt;&lt;/script&gt;

&lt;img src="http://feeds.feedburner.com/~r/atcg/~4/eVC3nhI-o1Y" height="1" width="1"/&gt;</content>
 <feedburner:origLink>http://b.atcg.us/blog/2010/06/14/py2app-hell-the-second.html</feedburner:origLink></entry>
 
 <entry>
   <title>matplotlib hell, too</title>
   <link href="http://feedproxy.google.com/~r/atcg/~3/wkslKrG0fKg/matplotlib-hell.html" />
   <published>2010-06-14T00:00:00-07:00</published>
   <updated>2010-06-14T00:00:00-07:00</updated>
   <id>http://b.atcg.us/blog/2010/06/14/matplotlib-hell.html</id>
   <content type="html">&lt;p&gt;For another thing I'm working on, I needed matplotlib to compile (this is on a slightly difference machine than that of the previous post).  Bottom line is this:  I needed matplotlib built for &lt;code&gt;x86_64&lt;/code&gt; and &lt;code&gt;i386&lt;/code&gt; and without too much other mumbo-jumbo.  I wanted to build this in a virtualenv, because that is how i like things.&lt;/p&gt;

&lt;p&gt;Needless to say, matplotlib building is pretty awful (see &lt;a href="http://ambiguousbase.com/post/444830454/matplotlib-is-a-compilation-nightmare-but"&gt;previous post&lt;/a&gt;).  It sort of reminds me of py2app.&lt;/p&gt;

&lt;p&gt;Of course, all of that being said, part of the problem is due to Apple switching architectures while maintaining backwards-compatible support (at least up to OSX 10.5).  Anyway, I was finally able to get a working matplotlib (with help from the stackoverflow post &lt;a href="http://stackoverflow.com/questions/1477144/compile-matplotlib-for-python-on-snow-leopard"&gt;here&lt;/a&gt;), using the following OSX-specific makefile (note: I installed freetype and libpng from source into &lt;code&gt;/usr/local/&lt;/code&gt;):&lt;/p&gt;

&lt;script src="http://gist.github.com/437978.js"&gt;&lt;/script&gt;


&lt;p&gt;And the following edits to the &lt;code&gt;./Makefile&lt;/code&gt; that comes with matplotlib (not making these edits builds some shared objects for &lt;code&gt;ppc&lt;/code&gt; and &lt;code&gt;i386&lt;/code&gt;, which causes problems):&lt;/p&gt;

&lt;script src="http://gist.github.com/437975.js"&gt;&lt;/script&gt;


&lt;p&gt;Finally, I built matplotlib with:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;make -f osx.make mpl_build
make -f osx.make mpl_install
&lt;/code&gt;&lt;/pre&gt;
&lt;img src="http://feeds.feedburner.com/~r/atcg/~4/wkslKrG0fKg" height="1" width="1"/&gt;</content>
 <feedburner:origLink>http://b.atcg.us/blog/2010/06/14/matplotlib-hell.html</feedburner:origLink></entry>
 
 <entry>
   <title>revisiting regex</title>
   <link href="http://feedproxy.google.com/~r/atcg/~3/y4GjJ-3dYKI/revisiting-regex.html" />
   <published>2010-04-13T00:00:00-07:00</published>
   <updated>2010-04-13T00:00:00-07:00</updated>
   <id>http://b.atcg.us/blog/2010/04/13/revisiting-regex.html</id>
   <content type="html">&lt;p&gt;To follow up on a &lt;a href="http://ambiguousbase.tumblr.com/post/444839066/of-parsers-regex-and-python"&gt;previous post&lt;/a&gt;, Google has today released &lt;a href="http://code.google.com/p/re2/"&gt;re2&lt;/a&gt;, noting the release in a &lt;a href="http://google-opensource.blogspot.com/2010/03/re2-principled-approach-to-regular.html"&gt;blog post&lt;/a&gt;.  From the &lt;a href="http://code.google.com/p/re2/"&gt;re2&lt;/a&gt; website:&lt;/p&gt;

&lt;blockquote&gt;&lt;p&gt;RE2 is a fast, safe, thread-friendly alternative to backtracking
regular expression engines like those used in PCRE, Perl, and
Python. It is a C++ library.&lt;/p&gt;&lt;/blockquote&gt;
&lt;img src="http://feeds.feedburner.com/~r/atcg/~4/y4GjJ-3dYKI" height="1" width="1"/&gt;</content>
 <feedburner:origLink>http://b.atcg.us/blog/2010/04/13/revisiting-regex.html</feedburner:origLink></entry>
 
 <entry>
   <title>py2app hell, the first</title>
   <link href="http://feedproxy.google.com/~r/atcg/~3/oK1_4_94yDc/py2app-hell-the-first.html" />
   <published>2010-04-13T00:00:00-07:00</published>
   <updated>2010-04-13T00:00:00-07:00</updated>
   <id>http://b.atcg.us/blog/2010/04/13/py2app-hell-the-first.html</id>
   <content type="html">&lt;p&gt;I ran into a problem with a program that I work on where &lt;a href="http://www.wxpython.org/"&gt;wxPython&lt;/a&gt; wasn't cutting it any longer, for various reasons (notably their current lack of support for 64-bit archs - see &lt;a href="http://trac.wxwidgets.org/ticket/11160"&gt;here&lt;/a&gt;).  I wasn't really in love with wxPython, anyway, so I decided to ditch it in favor of &lt;a href="http://qt.nokia.com/downloads"&gt;Qt4&lt;/a&gt;, in combination with &lt;a href="http://www.riverbankcomputing.co.uk/software/pyqt/intro"&gt;PyQt4&lt;/a&gt; (which depends on sip for its C++ bindings).&lt;/p&gt;

&lt;p&gt;While the general process of rewriting my GUI was pretty straightforward (given the decent tools in the Qt SDK + pyuic from PyQt4), I figured that packaging up the release was going to be painful.  Call me crazy, but I expected the shift to 64-bit in Snow Leopard to cause some issues with &lt;a href="http://svn.pythonmac.org/py2app/py2app/trunk/doc/index.html"&gt;py2app&lt;/a&gt; building bundles.  It generally turns out that I was correct (don't get me wrong, I like py2app, it's just touchy).&lt;/p&gt;

&lt;p&gt;I searched around for specific instructions on how to fix lots of the things that were broken.  These included what I think was an endianness error, a typo in py2app's subversion code, mach-o build mismatches between sip-PyQt4-PyQt, and failure of py2app to add a directory to the bundle you build (i think that's it).  Of most utility in my problem solving, I came across &lt;a href="http://www.mail-archive.com/pythonmac-sig@python.org/msg09615.html"&gt;this&lt;/a&gt; thread which details many of the additions made (in the subversion trunk) to get py2app working, so thank you &lt;a href="http://maparent.ca/"&gt;Marc-Antoine Parent&lt;/a&gt;.  You are awesome.  This &lt;a href="http://www.rkblog.rk.edu.pl/w/p/building-mac-os-x-applications-py2app/"&gt;post&lt;/a&gt; by &lt;a href="http://twitter.com/riklaunim"&gt;Piotr Maliński&lt;/a&gt; was also helpful.&lt;/p&gt;

&lt;p&gt;Long story short, for the moment, on Snow Leopard (10.6.2) with the system python, you need to do the following to make things work:&lt;/p&gt;

&lt;script src="http://gist.github.com/607966.js"&gt; &lt;/script&gt;



&lt;img src="http://feeds.feedburner.com/~r/atcg/~4/oK1_4_94yDc" height="1" width="1"/&gt;</content>
 <feedburner:origLink>http://b.atcg.us/blog/2010/04/13/py2app-hell-the-first.html</feedburner:origLink></entry>
 
 <entry>
   <title>sphinx documentation and github</title>
   <link href="http://feedproxy.google.com/~r/atcg/~3/zP9toWfrHTk/sphinx-documentation-and-github.html" />
   <published>2010-02-10T00:00:00-08:00</published>
   <updated>2010-02-10T00:00:00-08:00</updated>
   <id>http://b.atcg.us/blog/2010/02/10/sphinx-documentation-and-github.html</id>
   <content type="html">&lt;p&gt;So, there's tools out there to get your documentation up on github (&lt;a href="http://github.com/dinoboff/github-tools"&gt;github-tools&lt;/a&gt;, &lt;a href="http://github.com/michaeljones/sphinx-to-github"&gt;sphinx-to-github&lt;/a&gt;), but neither totally jive with how I want to do things.  So, I went searching long and hard for a way to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;keep the document-generating source in &lt;code&gt;master&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;place the generated html in &lt;code&gt;gh-pages&lt;/code&gt;, so it's served as static html&lt;/li&gt;
&lt;li&gt;have everything just "work"&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;Luckily, I came across this post which gets you most of the way there (but assumes you're using jekyll, which isn't the case for me/us).&lt;/p&gt;

&lt;p&gt;So…&lt;/p&gt;

&lt;script src="http://gist.github.com/301301.js?file=gistfile1.sh"&gt;&lt;/script&gt;



&lt;img src="http://feeds.feedburner.com/~r/atcg/~4/zP9toWfrHTk" height="1" width="1"/&gt;</content>
 <feedburner:origLink>http://b.atcg.us/blog/2010/02/10/sphinx-documentation-and-github.html</feedburner:origLink></entry>
 
 <entry>
   <title>climm and osx</title>
   <link href="http://feedproxy.google.com/~r/atcg/~3/xrbaVd0C9dE/climm-and-osx.html" />
   <published>2010-01-31T00:00:00-08:00</published>
   <updated>2010-01-31T00:00:00-08:00</updated>
   <id>http://b.atcg.us/blog/2010/01/31/climm-and-osx.html</id>
   <content type="html">&lt;p&gt;I've been looking for a CLI chat program that works pretty well.  After digging around for some time, I came across &lt;a href="http://www.climm.org/"&gt;climm&lt;/a&gt;.  climm has a lot of dependencies, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;tcl&lt;/li&gt;
&lt;li&gt;iksemel&lt;/li&gt;
&lt;li&gt;libgcrypt&lt;/li&gt;
&lt;li&gt;openssl&lt;/li&gt;
&lt;li&gt;pkgconfig&lt;/li&gt;
&lt;li&gt;gloox&lt;/li&gt;
&lt;li&gt;gnutls&lt;/li&gt;
&lt;li&gt;libotr&lt;/li&gt;
&lt;li&gt;autoconf&lt;/li&gt;
&lt;li&gt;automake&lt;/li&gt;
&lt;li&gt;libtool&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;You can get most of these from &lt;a href="http://www.macports.org/"&gt;macports&lt;/a&gt; (if you install libgcrypt, you should pull down pretty much everything).  You will still need to &lt;code&gt;port install libotr&lt;/code&gt; and you will also need to download the source for gloox (1.0) as the version at macports fails to build.&lt;/p&gt;

&lt;p&gt;Once all of that is done, get the source for climm, patch the source of configure.ac with the patch referenced below (&lt;a href="http://gist.github.com/331172"&gt;climm-r2826.patch&lt;/a&gt;) and &lt;code&gt;rm configure&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;You also need to patch  &lt;code&gt;climm-0.7/src/io/io_dns.c&lt;/code&gt; replacing&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;#include &amp;lt;arpa/nameser.h&amp;gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;with&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;#include &amp;lt;arpa/nameser_compat.h&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;See &lt;a href="http://gist.github.com/331171"&gt;climm-nameser_compat.patch&lt;/a&gt;.  Now, you want to&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;make clean
./configure
make
sudo make install
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Follow the instructions from the wizard.  climm initially segfaulted after finishing the wizard, but appears to be "fine" on a restart.&lt;/p&gt;

&lt;p&gt;Update:  I should have mentioned that these changes were made to the 0.7.0 source distribution.&lt;/p&gt;
&lt;img src="http://feeds.feedburner.com/~r/atcg/~4/xrbaVd0C9dE" height="1" width="1"/&gt;</content>
 <feedburner:origLink>http://b.atcg.us/blog/2010/01/31/climm-and-osx.html</feedburner:origLink></entry>
 
 <entry>
   <title>moving folder from an existing git repo</title>
   <link href="http://feedproxy.google.com/~r/atcg/~3/CfZo0KixsVQ/move-folder-from-existing-git-repo.html" />
   <published>2010-01-16T00:00:00-08:00</published>
   <updated>2010-01-16T00:00:00-08:00</updated>
   <id>http://b.atcg.us/blog/2010/01/16/move-folder-from-existing-git-repo.html</id>
   <content type="html">&lt;p&gt;Today, I needed to move some code from another project into it's own repo. I really wanted to keep the history of the changes to the code, during its development, so I went a googlin'. Here's what I found:&lt;/p&gt;

&lt;p&gt;&lt;a href="http://bit.ly/7fUfBy"&gt;http://bit.ly/7fUfBy&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Worked like a charm.&lt;/p&gt;
&lt;img src="http://feeds.feedburner.com/~r/atcg/~4/CfZo0KixsVQ" height="1" width="1"/&gt;</content>
 <feedburner:origLink>http://b.atcg.us/blog/2010/01/16/move-folder-from-existing-git-repo.html</feedburner:origLink></entry>
 
 <entry>
   <title>another mini next-gen sequencer</title>
   <link href="http://feedproxy.google.com/~r/atcg/~3/gLPdH9qJ-wA/mini-genome-analyzer.html" />
   <published>2010-01-15T00:00:00-08:00</published>
   <updated>2010-01-15T00:00:00-08:00</updated>
   <id>http://b.atcg.us/blog/2010/01/15/mini-genome-analyzer.html</id>
   <content type="html">&lt;p&gt;&lt;a href="http://www.illumina.com/systems/genome_analyzer.ilmn"&gt;&lt;img src="/media/img/2010/mini-genome-analyzer.jpg" alt="mini illumina genome analyzer" /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Not to be outdone by &lt;a href="http://www.454.com/"&gt;Roche&lt;/a&gt;, &lt;a href="http://www.illumina.com/"&gt;Illumina&lt;/a&gt; has announced a new version of its Genome Analyzer:  the &lt;a href="http://www.illumina.com/"&gt;IIe&lt;/a&gt;.  The IIe's specs are generally pretty reasonable (18-20 Gb in about 10 days @ 2x100).  Cost will be approximately $250,000.&lt;/p&gt;

&lt;p&gt;Illumina also announced a big brother to the &lt;a href="http://www.illumina.com/systems/genome_analyzer_iix.ilmn"&gt;IIx&lt;/a&gt;, the &lt;a href="http://www.illumina.com/systems/hiseq_2000.ilmn"&gt;HiSeq2000&lt;/a&gt;, which kicks things up a notch to 150-200 Gb (of 2x100) in about 8 days.&lt;/p&gt;

&lt;p&gt;Wow.&lt;/p&gt;
&lt;img src="http://feeds.feedburner.com/~r/atcg/~4/gLPdH9qJ-wA" height="1" width="1"/&gt;</content>
 <feedburner:origLink>http://b.atcg.us/blog/2010/01/15/mini-genome-analyzer.html</feedburner:origLink></entry>
 
 <entry>
   <title>pianobar - a CLI for pandora</title>
   <link href="http://feedproxy.google.com/~r/atcg/~3/9DtokPrrIu8/pianobar-a-cli-for-pandora.html" />
   <published>2009-12-31T00:00:00-08:00</published>
   <updated>2009-12-31T00:00:00-08:00</updated>
   <id>http://b.atcg.us/blog/2009/12/31/pianobar-a-cli-for-pandora.html</id>
   <content type="html">&lt;p&gt;Just ran across this - &lt;a href="http://github.com/PromyLOPh/pianobar"&gt;Pianobar&lt;/a&gt; - which is a CLI for &lt;a href="http://www.pandora.com/"&gt;Pandora&lt;/a&gt;.  This builds pretty easily on the mac.  Following (with a modification) instructions from &lt;a href="http://www.reddit.com/r/linux/comments/a3snu/pianobar_is_a_command_line_client_for_pandora_no/c0fupws"&gt;this&lt;/a&gt; thread over at &lt;a href="http://www.reddit.com"&gt;reddit.com&lt;/a&gt;:&lt;/p&gt;

&lt;p&gt;&lt;del&gt;Install &lt;a href="http://www.macports.org/"&gt;MacPorts&lt;/a&gt; or make sure it's squared away with this &lt;a href="http://gist.github.com/331169"&gt;gist&lt;/a&gt;&lt;/del&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Update:  it's fantastic!&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Update 2:  just install it using &lt;a href="6"&gt;homebrew&lt;/a&gt;:&lt;/strong&gt;&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;brew install pianobar
&lt;/code&gt;&lt;/pre&gt;
&lt;img src="http://feeds.feedburner.com/~r/atcg/~4/9DtokPrrIu8" height="1" width="1"/&gt;</content>
 <feedburner:origLink>http://b.atcg.us/blog/2009/12/31/pianobar-a-cli-for-pandora.html</feedburner:origLink></entry>
 
 <entry>
   <title>temporarily fix wxPython on osx 10.6</title>
   <link href="http://feedproxy.google.com/~r/atcg/~3/P5ulAINrY8o/temporarily-fix-wxpython-on-osx-106.html" />
   <published>2009-12-23T00:00:00-08:00</published>
   <updated>2009-12-23T00:00:00-08:00</updated>
   <id>http://b.atcg.us/blog/2009/12/23/temporarily-fix-wxpython-on-osx-106.html</id>
   <content type="html">&lt;p&gt;Just run this in the terminal to change to 32-bit Python for the session or set your environment variable in .bash_profile.  Once you want to change back, just erase it.  No fiddling around with defaults write com.blah.blah.blah:&lt;/p&gt;

&lt;script src="http://gist.github.com/262879.js?file=gistfile1.sh"&gt;&lt;/script&gt;

&lt;img src="http://feeds.feedburner.com/~r/atcg/~4/P5ulAINrY8o" height="1" width="1"/&gt;</content>
 <feedburner:origLink>http://b.atcg.us/blog/2009/12/23/temporarily-fix-wxpython-on-osx-106.html</feedburner:origLink></entry>
 
 <entry>
   <title>python multiprocessing.JoinableQueue() example</title>
   <link href="http://feedproxy.google.com/~r/atcg/~3/6cnbMmHelJE/python-multiprocessingjoinablequeue-example.html" />
   <published>2009-12-19T00:00:00-08:00</published>
   <updated>2009-12-19T00:00:00-08:00</updated>
   <id>http://b.atcg.us/blog/2009/12/19/python-multiprocessingjoinablequeue-example.html</id>
   <content type="html">&lt;p&gt;After a lot of pain and suffering trying to debug what was going wrong that appeared to be a stuck process, it simply boiled down to not closing my Queues:&lt;/p&gt;

&lt;script src="http://gist.github.com/260331.js?file=gistfile1.py"&gt;&lt;/script&gt;


&lt;p&gt;Here's the full module's documentation:  http://docs.python.org/library/multiprocessing.html&lt;/p&gt;
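
&lt;p&gt;For reference, here is a minimal sketch of the general pattern. It is not the gist above, just an illustration with made-up function names. It assumes a Unix-like system where the "fork" start method is available. Every item pulled off the queue gets a matching &lt;code&gt;task_done()&lt;/code&gt;, each worker gets its own &lt;code&gt;None&lt;/code&gt; sentinel, and the queues are closed once everything has been collected:&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre&gt;&lt;code class="python"&gt;
```python
import multiprocessing


def worker(tasks, results):
    # Pull items until the None sentinel arrives. Every get() needs a
    # matching task_done(), otherwise tasks.join() in the parent blocks.
    while True:
        item = tasks.get()
        if item is None:
            tasks.task_done()
            break
        results.put(item * item)
        tasks.task_done()


def square_all(numbers, n_workers=2):
    # Assumption: the "fork" start method exists on this platform.
    ctx = multiprocessing.get_context("fork")
    tasks = ctx.JoinableQueue()
    results = ctx.Queue()
    procs = [ctx.Process(target=worker, args=(tasks, results))
             for _ in range(n_workers)]
    for p in procs:
        p.start()
    for n in numbers:
        tasks.put(n)
    for _ in procs:
        tasks.put(None)  # one sentinel per worker
    tasks.join()  # returns once every put() has a matching task_done()
    # Drain the results before joining the workers.
    out = sorted(results.get() for _ in numbers)
    for p in procs:
        p.join()
    tasks.close()
    results.close()
    return out


if __name__ == "__main__":
    print(square_all([1, 2, 3, 4]))
```
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Draining the result queue before joining the workers matters here: a process that still has items buffered on a queue may not exit until they are consumed.&lt;/p&gt;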
&lt;img src="http://feeds.feedburner.com/~r/atcg/~4/6cnbMmHelJE" height="1" width="1"/&gt;</content>
 <feedburner:origLink>http://b.atcg.us/blog/2009/12/19/python-multiprocessingjoinablequeue-example.html</feedburner:origLink></entry>
 
 <entry>
   <title>of parsers, regex and python...</title>
   <link href="http://feedproxy.google.com/~r/atcg/~3/F5VUsDGLUQY/of-parsers-regex-and-python.html" />
   <published>2009-12-11T00:00:00-08:00</published>
   <updated>2009-12-11T00:00:00-08:00</updated>
   <id>http://b.atcg.us/blog/2009/12/11/of-parsers-regex-and-python.html</id>
   <content type="html">&lt;p&gt;After re-reading the excellent &lt;a href="http://swtch.com/%7Ersc/regexp/regexp1.html"&gt;Regular expression matching can be simple and fast&lt;/a&gt; by &lt;a href="http://swtch.com/~rsc/"&gt;Russ Cox&lt;/a&gt; and noticing a &lt;a href="http://www.reddit.com"&gt;reddit.com&lt;/a&gt; comment related to parsing expression grammars, I went on a google and came across the following handy summary of python parsers:&lt;/p&gt;

&lt;p&gt;&lt;a href="http://nedbatchelder.com/text/python-parsers.html"&gt;Python parsing tools&lt;/a&gt;&lt;/p&gt;
&lt;img src="http://feeds.feedburner.com/~r/atcg/~4/F5VUsDGLUQY" height="1" width="1"/&gt;</content>
 <feedburner:origLink>http://b.atcg.us/blog/2009/12/11/of-parsers-regex-and-python.html</feedburner:origLink></entry>
 
 <entry>
   <title>add google calendar items from launchbar</title>
   <link href="http://feedproxy.google.com/~r/atcg/~3/-UuhCG1iVWY/add-google-calendar-items-from-launchbar.html" />
   <published>2009-12-11T00:00:00-08:00</published>
   <updated>2009-12-11T00:00:00-08:00</updated>
   <id>http://b.atcg.us/blog/2009/12/11/add-google-calendar-items-from-launchbar.html</id>
   <content type="html">&lt;p&gt;I came across a post on Launchbar's &lt;a href="http://forums.obdev.at/viewforum.php?f=24"&gt;forum&lt;/a&gt;, discussing a means of &lt;a href="http://forums.obdev.at/viewtopic.php?f=24&amp;amp;t=1591"&gt;adding entries to your Google Calendar via Launchbar&lt;/a&gt;. I've made a small simplification to the script that runs the terminal command in the background (with less error checking, of course). Follow the instructions up to the script presented in the forum post (install &lt;a href="http://code.google.com/p/gdata-python-client/"&gt;GData&lt;/a&gt; and &lt;a href="http://www.labix.org/python-dateutil"&gt;python-dateutil&lt;/a&gt;, &lt;a href="http://code.google.com/p/gcalcli/downloads/list"&gt;download&lt;/a&gt; and place the gcalcli script somewhere) and copy the following into AppleScript Editor:&lt;/p&gt;

&lt;script src="http://gist.github.com/254567.js?file=gcal.scpt"&gt;&lt;/script&gt;



&lt;p&gt;Then edit the first line to point to the gcalcli binary and save the script to ~/Library/Application Support/LaunchBar/Actions/.&lt;/p&gt;

&lt;p&gt;I may eventually add some other modifications, although AppleScript is not my bag.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Update&lt;/strong&gt;:  Google Calendar &lt;a href="http://www.google.com/support/calendar/bin/answer.py?hl=en&amp;amp;answer=36604#text"&gt;Quick Add Grammar&lt;/a&gt;&lt;/p&gt;
&lt;img src="http://feeds.feedburner.com/~r/atcg/~4/-UuhCG1iVWY" height="1" width="1"/&gt;</content>
 <feedburner:origLink>http://b.atcg.us/blog/2009/12/11/add-google-calendar-items-from-launchbar.html</feedburner:origLink></entry>
 
 <entry>
   <title>sql histograms</title>
   <link href="http://feedproxy.google.com/~r/atcg/~3/ABqsmmvyV8k/sql-histograms.html" />
   <published>2009-12-09T00:00:00-08:00</published>
   <updated>2009-12-09T00:00:00-08:00</updated>
   <id>http://b.atcg.us/blog/2009/12/09/sql-histograms.html</id>
   <content type="html">&lt;script src="http://gist.github.com/331137.js?file=gistfile1.sql"&gt;&lt;/script&gt;


&lt;p&gt;via Michael Stassen @ http://lists.mysql.com/mysql/155540&lt;/p&gt;
&lt;img src="http://feeds.feedburner.com/~r/atcg/~4/ABqsmmvyV8k" height="1" width="1"/&gt;</content>
 <feedburner:origLink>http://b.atcg.us/blog/2009/12/09/sql-histograms.html</feedburner:origLink></entry>
 
 <entry>
   <title>rotate labels in ggplot2</title>
   <link href="http://feedproxy.google.com/~r/atcg/~3/AYw-Kd9RW_8/rotate-labels-in-ggplot2.html" />
   <published>2009-12-09T00:00:00-08:00</published>
   <updated>2009-12-09T00:00:00-08:00</updated>
   <id>http://b.atcg.us/blog/2009/12/09/rotate-labels-in-ggplot2.html</id>
   <content type="html">&lt;p&gt;Rotating labels in &lt;a href="http://had.co.nz/ggplot2/"&gt;ggplot2&lt;/a&gt; can be a mystery.  Maybe this helps:&lt;/p&gt;

&lt;script src="http://gist.github.com/331135.js?file=gistfile1.txt"&gt;&lt;/script&gt;

&lt;img src="http://feeds.feedburner.com/~r/atcg/~4/AYw-Kd9RW_8" height="1" width="1"/&gt;</content>
 <feedburner:origLink>http://b.atcg.us/blog/2009/12/09/rotate-labels-in-ggplot2.html</feedburner:origLink></entry>
 
 <entry>
   <title>median and mode in mysql</title>
   <link href="http://feedproxy.google.com/~r/atcg/~3/uUgWAz6ZvJM/median-and-mode-in-mysql.html" />
   <published>2009-12-09T00:00:00-08:00</published>
   <updated>2009-12-09T00:00:00-08:00</updated>
   <id>http://b.atcg.us/blog/2009/12/09/median-and-mode-in-mysql.html</id>
   <content type="html">&lt;script src="http://gist.github.com/253908.js?file=mysql_mode_and_median.sql"&gt;&lt;/script&gt;

&lt;img src="http://feeds.feedburner.com/~r/atcg/~4/uUgWAz6ZvJM" height="1" width="1"/&gt;</content>
 <feedburner:origLink>http://b.atcg.us/blog/2009/12/09/median-and-mode-in-mysql.html</feedburner:origLink></entry>
 
 <entry>
   <title>it's just so cute... Roche 454's gsjunior</title>
   <link href="http://feedproxy.google.com/~r/atcg/~3/X8A1GHvjpc0/its-just-so-cute-roche-454s-gsjunior.html" />
   <published>2009-12-09T00:00:00-08:00</published>
   <updated>2009-12-09T00:00:00-08:00</updated>
   <id>http://b.atcg.us/blog/2009/12/09/its-just-so-cute-roche-454s-gsjunior.html</id>
   <content type="html">&lt;p&gt;&lt;a href="http://www.gsjunior.com/"&gt;&lt;img src="/media/img/2009/mini-454.jpg" alt="the roche-454 gsjunior" /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I may be behind the times a bit, but I’ve just run across the &lt;a href="http://www.gsjunior.com"&gt;cutest little circa $100,000 (? my guess ?) DNA sequencer&lt;/a&gt; in the land (&lt;a href="http://www.454.com/about-454/news/index.asp?display=detail&amp;amp;id=137"&gt;Press Release&lt;/a&gt;.)&lt;/p&gt;

&lt;p&gt;You can see some opinion on the machine over at &lt;a href="http://pathogenomics.bham.ac.uk/blog/tag/gs-junior/"&gt;Pathogenomics&lt;/a&gt;. All points presented are certainly true:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;library construction needs to be easier&lt;/li&gt;
&lt;li&gt;reagents need to be less expensive&lt;/li&gt;
&lt;li&gt;etc.&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;One thing that’s not mentioned, where I think this machine will be quite useful, is in the methods development and/or amplicon/sequence capture spaces - essentially those areas in which we are either trying to or succeeding at changing the initial focus of 454 sequencing from vertical (depth of coverage) to horizontal (breadth of coverage).&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;A smaller platform for testing new sequencing techniques would be super handy - particularly if/when library prep is simplified and costs are reduced. Overall, small-run efficiency should be greater as you’re using the entirety of the plate for each run (i.e., you won’t lose reads from areas of the plate covered by a gasket) and costs should be lower as a result of increased efficiencies and direct reductions in time and reagent prices. I guess that I am sort of thinking of it as a &lt;a href="http://en.wikipedia.org/wiki/Rapid_application_development"&gt;rapid application development&lt;/a&gt; tool, but within the genomics/genetics sphere. The assumption would be that once the method is working, you scale up to the “full” 454 run.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Within the amplicon/sequence capture sphere, the machine should be of sufficient size for daily to weekly runs and likely of sufficient size to track the workflow of several lab employees. Assuming that the gsjunior will return roughly 80,000 reads (from the 100k spec.), we get a theoretical coverage of 10X across 1000 loci from 8 individuals. Similarly, if we expand to 96 individuals, we’re going to get around 10X coverage of 80 loci (or at 8X we can get roughly 100 loci per 96 individuals).&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
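
&lt;p&gt;As a rough check on the arithmetic above, here is a minimal Python sketch. It assumes reads are spread evenly across loci and individuals (which real runs won’t do), and the function names &lt;code&gt;mean_coverage&lt;/code&gt; and &lt;code&gt;max_loci&lt;/code&gt; are made up for illustration:&lt;/p&gt;

```python
def mean_coverage(total_reads, n_loci, n_individuals):
    """Mean reads per locus per individual, assuming an even spread of reads."""
    return total_reads / float(n_loci * n_individuals)


def max_loci(total_reads, n_individuals, target_coverage):
    """Largest number of loci that still reaches the target mean coverage."""
    return int(total_reads // (n_individuals * target_coverage))


# Assumed usable reads per run (the 80,000 figure from the text above).
reads = 80000

print(mean_coverage(reads, 1000, 8))  # 1000 loci in 8 individuals
print(max_loci(reads, 96, 10))        # loci per 96 individuals at 10X
print(max_loci(reads, 96, 8))         # loci per 96 individuals at 8X
```

&lt;p&gt;With 80,000 reads this gives 10X for 1000 loci in 8 individuals, and a ceiling of 83 loci at 10X or 104 loci at 8X for 96 individuals, which is where the rough figures of 80 and 100 loci come from.&lt;/p&gt;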

&lt;img src="http://feeds.feedburner.com/~r/atcg/~4/X8A1GHvjpc0" height="1" width="1"/&gt;</content>
 <feedburner:origLink>http://b.atcg.us/blog/2009/12/09/its-just-so-cute-roche-454s-gsjunior.html</feedburner:origLink></entry>
 
 
</feed>

