<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:creativeCommons="http://backend.userland.com/creativeCommonsRssModule" version="2.0"><channel><title>Bracing against the wind</title><link>http://www.documentroot.com/</link><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.feedburner.com/BracingAgainstTheWind" /><description>Lo! In the east against the blazing shadow of a rising sun, there come four men striding in unison, each step a journey, against the terrible winds.</description><language>en</language><managingEditor>noreply@blogger.com (Erik)</managingEditor><lastBuildDate>Fri, 09 Mar 2012 22:57:48 PST</lastBuildDate><generator>Blogger http://www.blogger.com</generator><openSearch:totalResults xmlns:openSearch="http://a9.com/-/spec/opensearchrss/1.0/">512</openSearch:totalResults><openSearch:startIndex xmlns:openSearch="http://a9.com/-/spec/opensearchrss/1.0/">1</openSearch:startIndex><openSearch:itemsPerPage xmlns:openSearch="http://a9.com/-/spec/opensearchrss/1.0/">25</openSearch:itemsPerPage><feedburner:info xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" uri="bracingagainstthewind" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><geo:lat>36.015992</geo:lat><geo:long>-78.959119</geo:long><creativeCommons:license>http://creativecommons.org/licenses/by/2.0/</creativeCommons:license><feedburner:emailServiceId xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0">BracingAgainstTheWind</feedburner:emailServiceId><feedburner:feedburnerHostname xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0">http://feedburner.google.com</feedburner:feedburnerHostname><item><title>gethostbyname command 
line</title><link>http://www.documentroot.com/2012/02/gethostbyname-command-line.html</link><author>noreply@blogger.com (Erik Aronesty)</author><pubDate>Fri, 09 Mar 2012 12:55:42 PST</pubDate><guid isPermaLink="false">tag:blogger.com,1999:blog-3570590.post-4765921414605580527</guid><description>Pasteable script below.   I can't believe this doesn't exist.  No command line tool to get a host name using the resolver on linux?  &lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;#!/usr/bin/perl&lt;br /&gt;use Socket;&lt;br /&gt;&lt;br /&gt;$host = shift @ARGV;&lt;br /&gt;die("usage: gethostbyname hostname\n") unless(defined($host));&lt;br /&gt;&lt;br /&gt;$packed_ip = gethostbyname($host);&lt;br /&gt;&lt;br /&gt;if (defined $packed_ip) {&lt;br /&gt;    $ip_address = inet_ntoa($packed_ip);&lt;br /&gt;    print "$ip_address\n";&lt;br /&gt;    exit 0&lt;br /&gt;} else {&lt;br /&gt;    warn "$host not found\n";&lt;br /&gt;    exit 1&lt;br /&gt;}&lt;br /&gt;&lt;/pre&gt;</description></item><item><title>Google Scholar Bookmarklet</title><link>http://www.documentroot.com/2012/02/google-scholar-bookmarklet.html</link><author>noreply@blogger.com (Erik Aronesty)</author><pubDate>Fri, 09 Mar 2012 22:43:44 PST</pubDate><guid isPermaLink="false">tag:blogger.com,1999:blog-3570590.post-3414879693061699215</guid><description>&lt;a href="javascript:var%20regex=new%20RegExp('[\?&amp;]q=([^&amp;#]*)');var%20results=regex.exec(window.location.href);if(results){q=decodeURIComponent(results[1].replace(/\+/g,'%20'));}else{{q=document.getSelection();for(i=0;i&amp;lt;frames.length;i++){q=frames[i].document.getSelection();if(q)break;}}if(!q)void(q=prompt('Keywords:',''));}if(q){location.href='http://scholar.google.com/scholar?client=googlet&amp;q='+escape(q)}"&gt;This link&lt;/a&gt; is a bookmarklet which takes the current query string in google, OR the current selection (if there's no q= parameter), and searches google scholar.   
Now that Google has removed Scholar from the dropdown (sad), this is necessary for me to work the way I've been working.</description></item><item><title>As predicted</title><link>http://www.documentroot.com/2012/02/as-predicted.html</link><author>noreply@blogger.com (Erik Aronesty)</author><pubDate>Fri, 09 Mar 2012 22:57:48 PST</pubDate><guid isPermaLink="false">tag:blogger.com,1999:blog-3570590.post-6034714657976872466</guid><description>As predicted 6 months ago, and again last week, Oxford Nanopore is shaking up the industry.   PacBio, Complete Genomics, Life Technologies and others are all seeing their stocks drop 5% on the news that sequencing a genome just got 10x cheaper.</description></item><item><title>Church-Turing thesis and strings</title><link>http://www.documentroot.com/2012/01/church-turing-thesis-and-strings.html</link><author>noreply@blogger.com (Erik Aronesty)</author><pubDate>Mon, 30 Jan 2012 13:08:25 PST</pubDate><guid isPermaLink="false">tag:blogger.com,1999:blog-3570590.post-5281456973113200363</guid><description>My empirically determined corollary to Turing's computability thesis is that any problem, no matter how interesting it seems on the surface, can be reduced to a deja-vu inducing set of string handling, vector sorting and hash lookups.&lt;br /&gt;&lt;br /&gt;And as new features are added, the problem of writing out and reading files slowly becomes most of what the program does - until the program resembles an actual Turing machine: ploddingly scribbling and reading things from an infinite tape which, because of flooding in Thailand, is infinitely expensive.</description></item><item><title>Embed Tab Size Info in Source Text</title><link>http://www.documentroot.com/2012/01/embed-tab-size-info-in-source-text.html</link><author>noreply@blogger.com (Erik Aronesty)</author><pubDate>Mon, 30 Jan 2012 13:14:03 PST</pubDate><guid isPermaLink="false">tag:blogger.com,1999:blog-3570590.post-2062794125586771407</guid><description>IDEA: a 
"universal tab stop signature" line you can add to any text file in which tabs may be found and need to be rendered in an editor.&lt;br /&gt;&lt;br /&gt;The sig must be in the first or last 128 bytes of a file (so reading the sig is faster).&lt;br /&gt;&lt;br /&gt;The syntax could be something like: open paren or bracket, % or # sign, open paren or bracket, the word "tab" then a ':', then anything except a closing paren or bracket, then a closing paren or bracket, a % or # sign, and another closing paren or bracket.  As a regex: [(\[][%#][(\[]tab:[^)\]]*[)\]][%#][)\]]&lt;br /&gt;&lt;br /&gt;Examples:&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;C++:&lt;br /&gt;// (%(tab:4)%)&lt;br /&gt;&lt;br /&gt;Perl:&lt;br /&gt;# (%(tab:4)%)&lt;br /&gt;&lt;br /&gt;HTML (for source editing only):&lt;br /&gt;&amp;lt;!-- (%(tab:4)%) --&amp;gt;&lt;br /&gt;&lt;br /&gt;LISP (no parens... too confusing):&lt;br /&gt;; [%[tab:4]%]&lt;br /&gt;&lt;br /&gt;SMX (no % signs or parens):&lt;br /&gt;%null([#[tab:4]#])&lt;br /&gt;&lt;/blockquote&gt;</description></item><item><title>Uncovering significant miRNA species with RNA-seq</title><link>http://www.documentroot.com/2011/12/uncovering-significant-mirna-species.html</link><author>noreply@blogger.com (Erik)</author><pubDate>Fri, 23 Dec 2011 14:08:41 PST</pubDate><guid isPermaLink="false">tag:blogger.com,1999:blog-3570590.post-1907826065375740498</guid><description>Note to self: An easy way to uncover significant miRNA species with Illumina's RNA-seq data is to run counts on the small RNA between two cell types, then toss out anything that appears ribosomal, filter for some minimum threshold and run a negative binomial distribution test on them like edgeR or DESeq.   
The resulting adjusted p-values can illuminate new miRNA species.</description></item><item><title>Xenograft &amp; Contamination Filter</title><link>http://www.documentroot.com/2011/12/xenograft-contamination-filter.html</link><author>noreply@blogger.com (Erik Aronesty)</author><pubDate>Mon, 19 Dec 2011 07:18:05 PST</pubDate><guid isPermaLink="false">tag:blogger.com,1999:blog-3570590.post-5472900380731870085</guid><description>&lt;a href="http://ea-utils.googlecode.com/svn/trunk/clipper/bam-filter.cpp"&gt;Useful code&lt;/a&gt; that uses libbamtools to filter out alignments to another genome.  Tested with xenograft and bacterial contamination removal...works great.</description></item><item><title>Connecticut Martin</title><link>http://www.documentroot.com/2011/10/connecticut-martin.html</link><author>noreply@blogger.com (Erik Aronesty)</author><pubDate>Fri, 21 Oct 2011 01:46:16 PDT</pubDate><guid isPermaLink="false">tag:blogger.com,1999:blog-3570590.post-6995791957463666223</guid><description>Martin Hedge paused to wipe his glasses with the sleeve of his shirt.  The crowd at the coffee shop had gotten louder.&lt;br /&gt;&lt;br /&gt;It had never been enough.  Even though idle wealthy housewives had long since taken up the pen and written libraries of wisdom and humour, and the girls hanging out at the tattoo parlors had discovered the zen of automotive engineering.  Even though the unemployed had long ago sought out physics texts and wrangled secrets of a new energy density theory, it still hadn't been enough.   The question remained... 
what were we working &lt;i&gt;for&lt;/i&gt;?&lt;br /&gt;&lt;br /&gt;The crowd at the coffee shop, from here, seemed angry, but as Martin walked closer he could see familiar faces, sternly triumphant.</description></item><item><title>Non "allelic" variation - thinking out loud</title><link>http://www.documentroot.com/2011/10/non-allelic-variation-thinking-out-loud.html</link><author>noreply@blogger.com (Erik Aronesty)</author><pubDate>Thu, 22 Dec 2011 20:37:13 PST</pubDate><guid isPermaLink="false">tag:blogger.com,1999:blog-3570590.post-901333950279015802</guid><description>&lt;div&gt;(Later Note: I wrote this before I knew what it's called.   The term most people use is "Somatic Mosaicism".   Apparently this is a pretty well researched topic... so I can go back to all the biologists that looked at me like I was crazy and tell them .... hmm.)&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Here's a link to a good article on the topic:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="http://www.sciencedirect.com/science/article/pii/S0168952511000436"&gt;http://www.sciencedirect.com/science/article/pii/S0168952511000436&lt;/a&gt; &lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Original rant below...&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;Much of genetics is concerned with "alleles" and "variations".   That is, an organism is assumed to be comprised of "one kind of dna".  That DNA has 1) inherited alleles from its parents, and 2)  de-novo alleles (also called somatic mutations).  The 1000 genomes project estimates these at about 30 or so per person.  This is probably a very conservative number considering the nature of the 1000 genomes project.  IE: all of these mutations are whole-organism, detectable, validated mutations. 
&lt;a href="http://www.jbc.org/content/268/32/23762.abstract?ijkey=413bddbbe32d8d87d540c2e8f02a5177ef9d522b&amp;amp;keytype2=tf_ipsecsha"&gt;E-coli error rate estimates&lt;/a&gt; would put the range at 30-300, and this &lt;a href="http://www.jbc.org/content/279/17/16895.full#ref-4"&gt;might be a better estimate&lt;/a&gt; because of how the study was done.   Human blastoma cells have had error estimates accurately measured at &lt;b&gt;10 times&lt;/b&gt; that rate.  Individual organs may be less sensitive to immune response correction.&lt;br /&gt;&lt;br /&gt;But let's assume 30 is our number.   It's nice and small.  And it's good to have a lower-bound.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;That is only the set of variants that went into the "first cell" (fertilized egg) of an organism.&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;When that egg divides, half the organism has another, different, set of mutations.  So 100% of the organism has 30 de-novo mutations and 50% of the organism will have *another* 30 de-novo mutations (30 new ones in that 50%, plus 30 original).&lt;br /&gt;&lt;br /&gt;But wait, there's more.   When those 2 cells divide in half again, you now get 60 new mutations, 30 from each cell.   These 60 will be detectable at the "25%" level ... IE: 25% of the final organism will have them. &lt;br /&gt;&lt;br /&gt;High-throughput sequencing can readily detect variation at the "1%" admixture level.   That is, it can commonly detect variation when as little as 1% of the cells have that variation. &lt;br /&gt;&lt;br /&gt;So how much variation can we expect, based on a low de-novo mutation rate, detectable at the 1% level?&lt;br /&gt;&lt;br /&gt;100%-&amp;gt;30, 50% -&amp;gt;30, 25% -&amp;gt;60, 12.5% -&amp;gt;120, 6.25% -&amp;gt;240, 3.12% -&amp;gt;480, 1.6% -&amp;gt; 960&lt;br /&gt;&lt;br /&gt;So we can expect about 1000 de-novo variants in a healthy individual, or 32 times the mutation rate.   
But what if the somatic mutation rate is higher, say, &lt;a href="http://cancerres.aacrjournals.org/content/65/18/8111.short"&gt;3000 variants per replication&lt;/a&gt;?   This may be the case in some organ development. &lt;br /&gt;&lt;br /&gt;Then, at the 1% level, that would be 96000 non-inherited detectable variants.  I would call that my "upper bound".   In real pileup data... I see around 30% "non-allelic" variation.  So if, say, you've got 15000 SNPs (&lt;a href="http://www.pnas.org/content/106/13/5008.full"&gt;a reasonable number&lt;/a&gt;), we would expect 5000 "background" SNPs.....putting the mutation rate at "156.25"  (5000/32).   That's smack in the middle of the E-coli based estimate.&lt;br /&gt;&lt;br /&gt;&lt;i&gt;Lots of variant callers filter these out.... but I'm interested in them... I think they may be a lot more important than people think.&lt;/i&gt;</description></item><item><title>Multiplayer TD</title><link>http://www.documentroot.com/2011/09/multiplayer-td.html</link><author>noreply@blogger.com (Erik)</author><pubDate>Fri, 30 Sep 2011 20:24:47 PDT</pubDate><guid isPermaLink="false">tag:blogger.com,1999:blog-3570590.post-1582916321021161649</guid><description>Looks fun.  Seems like it would be cool to play against other players... but they are rarely there.   The developer needs to link-in to Armor or Kongregate or Facebook or some other PVP network.&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;There's probably a market for that... 
build a PVP portal so that lesser known games that are still fun can have a base of players.&lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;/div&gt;</description></item><item><title>Getting perl readline to work in Ubuntu</title><link>http://www.documentroot.com/2011/09/getting-perl-readline-to-work-in-ubuntu.html</link><author>noreply@blogger.com (Erik)</author><pubDate>Fri, 30 Sep 2011 11:21:22 PDT</pubDate><guid isPermaLink="false">tag:blogger.com,1999:blog-3570590.post-2032921855078416364</guid><description>I use perl's debugger and psh (simple perl shell).    When the up arrow doesn't work, I always forget how to fix it.  Hopefully by posting the solution here, it will be easier for me to find:&lt;p&gt;&lt;span style="font-family:courier,fixed-width"&gt;apt-get install libterm-readline-gnu-perl&lt;/span&gt;&lt;br /&gt;&lt;/p&gt;</description></item><item><title>sam-stats</title><link>http://www.documentroot.com/2011/09/sam-stats.html</link><author>noreply@blogger.com (Erik Aronesty)</author><pubDate>Wed, 28 Sep 2011 13:06:46 PDT</pubDate><guid isPermaLink="false">tag:blogger.com,1999:blog-3570590.post-2295882107037935136</guid><description>Put up a useful tool that produces &lt;a href="http://code.google.com/p/ea-utils/wiki/SamStats"&gt;summary statistics from SAM or BAM&lt;/a&gt; files.  TODO: rewrite in C so it's fast.</description></item><item><title>Should we try to contact ET?</title><link>http://www.documentroot.com/2011/08/should-we-try-to-contact-et.html</link><author>noreply@blogger.com (Erik Aronesty)</author><pubDate>Tue, 23 Aug 2011 15:01:16 PDT</pubDate><guid isPermaLink="false">tag:blogger.com,1999:blog-3570590.post-273754590976121731</guid><description>Would it be safe to meet an intelligent alien lifeform from another planet?
&lt;br /&gt;
&lt;br /&gt;Let's assume an ET (extraterrestrial, alien) is more advanced than we are.  Let's assume, again, because it's the only experience we have, that we are a good example of an intelligent life form. 
&lt;br /&gt;
&lt;br /&gt;Have other species benefited from their contact with humans?   Species that are useful to us, like cows and dogs have proliferated in population, but are controlled/used by us.   Species that feed on our leavings, like roaches and pigeons do well.
&lt;br /&gt;
&lt;br /&gt;But, given that we're intelligent, and we'd like to think that aliens would regard us as so, we may want to look only at "higher" mammals, with comparable intelligence.  Primates certainly don't do well when humans show up.  In fact, it seems we are particularly brutal when dealing with them.   Large mammals were nearly driven extinct from North America after human contact.  Some very intelligent species, like right whales, with rich communication systems and highly social habits, were driven to near extinction.   
&lt;br /&gt;
&lt;br /&gt;OK, but maybe we're talking about "modern man".  Maybe we've escaped that brutal past.   
&lt;br /&gt;
&lt;br /&gt;We still round up dolphins to kill them.  And, to this day, very little effort has been spent in attempting to understand the language and society of the "alien species" we share our home with.  Any time even a modicum of effort has been spent it's been met with "shocking" revelations about how other species have complex grammars, notions of fairness, etc.  And still most people in the world refuse to believe that other species can feel pain the way humans do - largely (IMO)  as a way of justifying abuse and mistreatment.
&lt;br /&gt;
&lt;br /&gt;So.... I would expect to be treated at &lt;span style="font-style:italic;"&gt;least as well &lt;/span&gt;as we have treated others.  Accordingly, someone should *shut down* the SETI program.  ASAP.
&lt;br /&gt;</description></item><item><title>Oracle fails to crush open source</title><link>http://www.documentroot.com/2011/08/oracle-fails-to-crush-open-source.html</link><author>noreply@blogger.com (Erik Aronesty)</author><pubDate>Fri, 05 Aug 2011 01:24:46 PDT</pubDate><guid isPermaLink="false">tag:blogger.com,1999:blog-3570590.post-6714659218304935285</guid><description>Oracle has repeatedly attempted to acquire its way into the open source movement, each time charging and licensing technologies it had no hand in developing.  Berkeley DB, Sun Grid Engine, Open Office, MySQL, and Java (JRE/JDK), are now all Oracle licensed technologies.  &lt;br /&gt;&lt;br /&gt;What is admirable is that they make no pretense of being "friends" of open source.  They stopped releasing new versions of SGE, and transitioned to a closed-source system immediately.  Similar moves are happening in other products at different speeds... with a "what the market will bear" approach.&lt;br /&gt;&lt;br /&gt;It took me 6 months to move all my BDB code to SQLite and switch SGE stuff to condor.  I never touched MySQL because of the InnoDB creepiness and I'm glad the decision has been vindicated... Postgres was the obvious choice.   Oracle's falsely named "Open Office" has thankfully been forked to become the truly open "Libre Office".&lt;br /&gt;&lt;br /&gt;The only thing left is Java.  And there's nowhere near an adequate replacement.  Open JDK is limited to a handful of operating systems.   And Java itself may have some questionable licensing - Oracle may be able to seize the whole thing by the fistful.&lt;br /&gt;&lt;br /&gt;I've despised Java from the beginning, mostly for its poor architecture and the way it encourages bad coding (not as bad as python).   
Watching Oracle roll in and step all over it is, for me, merely another "I told you so" moment.</description></item><item><title>View the internet as if you are Google</title><link>http://www.documentroot.com/2011/07/view-internet-as-if-you-are-google.html</link><author>noreply@blogger.com (Erik Aronesty)</author><pubDate>Fri, 29 Jul 2011 08:37:20 PDT</pubDate><guid isPermaLink="false">tag:blogger.com,1999:blog-3570590.post-1915357002791648560</guid><description>Many companies present a different face to search engines than they do to users.  It's sometimes interesting to see the internet as if you are a search engine:&lt;br /&gt;&lt;br /&gt;To change the Chrome user agent string, add the --user-agent option switch to the chrome.exe program shortcut:&lt;br /&gt;&lt;br /&gt;chrome.exe --user-agent="Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"</description></item><item><title>Correlation Heat Map</title><link>http://www.documentroot.com/2011/06/correlation-heat-map.html</link><author>noreply@blogger.com (Erik Aronesty)</author><pubDate>Wed, 22 Jun 2011 09:08:30 PDT</pubDate><guid isPermaLink="false">tag:blogger.com,1999:blog-3570590.post-6615972956942730607</guid><description>&lt;a href="http://www1.mooncostumes.com/bio/CNCOR_map.png"&gt;&lt;img width=50% height=50% src="http://www1.mooncostumes.com/bio/CNCOR_map.png" alt="Copy Number Correlation Heat Map" border="0"&gt;&lt;/a&gt;</description></item><item><title>Illumina Quality Score 'B'</title><link>http://www.documentroot.com/2011/06/illumina-quality-score-b.html</link><author>noreply@blogger.com (Erik)</author><pubDate>Fri, 10 Jun 2011 12:52:20 PDT</pubDate><guid isPermaLink="false">tag:blogger.com,1999:blog-3570590.post-8956190155935947632</guid><description>When analyzing about 100 PhiX control lanes, by aligning to PhiX and calling any reference mismatch an "error", the quality score 'B' performs higher than 'C' on average.   
The 'B' score is approximately 3 Phred scores lower than the mean score for the whole run.    I think the 'B' score represents the equivalent of "I don't know" on the Phred scale.   For programs that rely on quality scores to indicate actual likelihoods of error, the 'B' score should be treated as a special case.&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;This is in addition to the known issue that reported Phred scores have higher observed error rates overall, especially at the top end of the scale (Q-40's perform like Q-37's should, etc.)&lt;/div&gt;</description></item><item><title>Lightweight Job Queuing Engine and GitHub</title><link>http://www.documentroot.com/2011/05/lightweight-job-queuing-engine.html</link><author>noreply@blogger.com (Erik Aronesty)</author><pubDate>Sat, 14 May 2011 20:15:29 PDT</pubDate><guid isPermaLink="false">tag:blogger.com,1999:blog-3570590.post-4461023282487684699</guid><description>I wrote a job queuing daemon.  It sits on a head node and any number of compute nodes.  Good with EC2.  Replacement for condor, OGE (formerly SGE), torque, etc.  So far, it's good about crash recovery, very good about latency.  Other job queuing systems seem to be sooo slow with latency, you never want to use them.   And they all spend so much time worrying about copying i/o files all over the place, when all I ever use is NFS anyway.&lt;br /&gt;&lt;br /&gt;&lt;a href="https://github.com/earonesty/grun"&gt;https://github.com/earonesty/grun&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Written because those other systems are administratively difficult, and have more features than I need.   Some day I may choose to go back to condor, or SGE, but for now, it's nice to have a smaller, faster, easier one.&lt;br /&gt;&lt;br /&gt;I'm trying github instead of google code.  The developers and gangla got me to try it out.  I think I &lt;span style="font-style:italic;"&gt;might &lt;/span&gt;like it.  
Not sure why it makes you "add" and "commit" separately.&lt;br /&gt;&lt;br /&gt;I based the socket system off of ppcgid, since it worked so well in the past.</description></item><item><title>Dealing with FASTQ's</title><link>http://www.documentroot.com/2011/04/dealing-with-fastqs.html</link><author>noreply@blogger.com (Erik Aronesty)</author><pubDate>Sun, 01 May 2011 17:37:15 PDT</pubDate><guid isPermaLink="false">tag:blogger.com,1999:blog-3570590.post-3673876754729423353</guid><description>I posted some code for dealing with FASTQ's.   The tools are all written in C++, and each one replaces a tool that's between 3 and 50 times slower.   The scoring metric they use for "anchored sequence alignment" has been improved.   Also, they contain lots of "do what I mean" features, allowing the user to focus on the results, and not on the mechanics of processing sequencing data.&lt;br /&gt;&lt;br /&gt;The essential tasks they accomplish are sequence quality filtering, adapter/primer removal, demultiplexing and paired-end joining (stitching).&lt;br /&gt;&lt;br /&gt;They have to be compiled, but they are each "monolithic" ... and require no libraries, headers, etc.   I tested fastq-join on Linux and Windows, so I know it compiles easily.   The other 2 tools have only been tested on Linux.&lt;br /&gt;&lt;br /&gt;See: &lt;a href="https://code.google.com/p/ea-utils/"&gt;https://code.google.com/p/ea-utils/&lt;/a&gt;</description></item><item><title>New aligner</title><link>http://www.documentroot.com/2011/03/new-aligner.html</link><author>noreply@blogger.com (Erik Aronesty)</author><pubDate>Tue, 15 Mar 2011 10:03:31 PDT</pubDate><guid isPermaLink="false">tag:blogger.com,1999:blog-3570590.post-5021568727285809923</guid><description>I'm working on a new aligner that doesn't use a BTREE.  It's fun.  
Not hard, and finally working.</description></item><item><title>F-Test for Comparing Variability</title><link>http://www.documentroot.com/2011/03/f-test-for-comparing-variability.html</link><author>noreply@blogger.com (Erik Aronesty)</author><pubDate>Sun, 06 Mar 2011 02:43:15 PST</pubDate><guid isPermaLink="false">tag:blogger.com,1999:blog-3570590.post-6746150553377147858</guid><description>I left my textbook at home, and had to search for a while to find &lt;a href='http://faculty.washington.edu/tabrooks/599.course/Ftest.html'&gt;a decent description of how to use an F-test&lt;/a&gt; (even though it's somewhat controversial), so I'm posting it for future reference.   Wikipedia stuff on statistics appears to be written by people who forget it's supposed to be human-readable.</description></item><item><title>GATCGGAAGAGCGGTTCAGCAGGAATG</title><link>http://www.documentroot.com/2011/02/gatcggaagagcggttcagcaggaatg.html</link><author>noreply@blogger.com (Erik Aronesty)</author><pubDate>Wed, 23 Feb 2011 11:40:58 PST</pubDate><guid isPermaLink="false">tag:blogger.com,1999:blog-3570590.post-8528185836464444347</guid><description>I knew it was some kind of primer, adapter or barcode - but I wasn't sure which, whether to trim, or if the barcode came before or after.&lt;br /&gt;&lt;br /&gt;BLAS*T searches turned up nothing, but Joel answered, within a second, "it's an adapter sequence". He said "I just Googled it".&lt;br /&gt;&lt;br /&gt;Reminder to self... Google searching DNA sequences works (NOTE: Bing didn't).</description></item><item><title>Approximate Line Count for Very Large Files</title><link>http://www.documentroot.com/2011/02/approximate-line-count-for-very-large.html</link><author>noreply@blogger.com (Erik Aronesty)</author><pubDate>Thu, 10 Feb 2011 03:21:58 PST</pubDate><guid isPermaLink="false">tag:blogger.com,1999:blog-3570590.post-7846171068918047613</guid><description>When dealing with very large files, the unix tool "wc" can be extremely slow.   
The alternative, byte size, is often not what I want to look at, especially when trying to estimate the number of reads in a fastq file.&lt;br /&gt;&lt;br /&gt;A good estimate (2 sig figs) is, 90% of the time, what I need.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.documentroot.com/alc"&gt;alc&lt;/a&gt; is my "approximate" line count tool.   It counts the number of lines in a file, just like wc, except it only "samples" the file in a series of segments.   By seeking and reading 200K from a dozen places in the file, rather than reading the whole thing,  I get a good representative sample, and an accurate-enough count.</description></item><item><title>A True Teflon Alternative</title><link>http://www.documentroot.com/2011/02/true-teflon-alternative.html</link><author>noreply@blogger.com (Erik Aronesty)</author><pubDate>Fri, 04 Feb 2011 18:29:18 PST</pubDate><guid isPermaLink="false">tag:blogger.com,1999:blog-3570590.post-3409266496276030760</guid><description>We went looking for a nonstick solution that wasn't coated in Teflon or PTFE, or whatever people try to sell it as.   Apparently it causes &lt;a href="http://www.epa.gov/opptintr/pfoa/"&gt;developmental disorders in laboratory animals (click for EPA advisory)&lt;/a&gt;.  I can only wonder what it does to human kids.&lt;br /&gt;&lt;br /&gt;The first thing we found was &lt;a href="http://www.naturalnews.com/021059.html"&gt;Swiss Diamond... which is a &lt;span style="font-style: italic;"&gt;complete scam&lt;/span&gt;&lt;/a&gt;.   Glad to have avoided that before making an expensive mistake, I scoured reviews and sites and we found the &lt;a href="http://www.amazon.com/Bialetti-Aeternum-Saute-Pan-12-Inch/dp/B00448H23W"&gt;Aeternum&lt;/a&gt; and &lt;a href="http://www.amazon.com/Frying-Ozeri-Ceramic-Non-Stick-Coating/dp/B004AMUELO"&gt;Green Earth&lt;/a&gt; line of ceramic coated pans (like porcelain, it seems).  &lt;br /&gt;&lt;br /&gt;I bought the Aeternum and it works exactly as well as advertised.   
Nonstick, and not made with creepy chemicals.  If it lasts or breaks, I'll report it here.</description></item><item><title>Uncuddled Else</title><link>http://www.documentroot.com/2010/12/uncuddled-else.html</link><author>noreply@blogger.com (Erik Aronesty)</author><pubDate>Sun, 19 Dec 2010 12:19:10 PST</pubDate><guid isPermaLink="false">tag:blogger.com,1999:blog-3570590.post-5997705868089782387</guid><description>I don't care how cute the name is.   It wastes precious vertical space while adding nothing to readability.   Worse yet, the uncuddled else subtracts from the implication that it is a part of the prior "if" statement... which it is.</description></item></channel></rss>

