<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0"><channel><title>Bioinformatics Zen</title><link>http://www.bioinformaticszen.com</link><language>en</language><copyright>Creative Commons Attribution 3.0 Unported</copyright><managingEditor>mail@michaelbarton.me.uk (Michael Barton)</managingEditor><lastBuildDate>Sat, 28 Feb 2009 16:00:00 PST</lastBuildDate><description></description><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" href="http://feeds.feedburner.com/BioinformaticsZen" type="application/rss+xml" /><item><title>how to use a database</title><link>http://feedproxy.google.com/~r/BioinformaticsZen/~3/II-Qzl7LVnU/how_to_use_a_database</link><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Michael Barton</dc:creator><pubDate>Sat, 28 Feb 2009 16:00:00 PST</pubDate><guid isPermaLink="false">http://www.bioinformaticszen.com/video/how_to_use_a_database</guid><description>&lt;p&gt;This video illustrates a minimal approach to using a database as the foundation of a bioinformatics project. Only two files are used: a database.rb file, and a Rakefile. Object relational mapping is provided by using the &lt;a href="http://datamapper.rubyforge.org/dm-core/"&gt;DataMapper&lt;/a&gt; gem. The &lt;a href="http://www.bioruby.org/"&gt;BioRuby&lt;/a&gt; gem is used to parse a FASTA file of &lt;em&gt;S. cerevisiae&lt;/em&gt; proteins. Examples of using data validations and experimental logging are also given.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The text will be more legible in the &lt;a href="http://vimeo.com/3387564"&gt;high definition version&lt;/a&gt;.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;object width="601" height="338"&gt;&lt;param name="allowfullscreen" value="true" /&gt;&lt;param name="allowscriptaccess" value="always" /&gt;&lt;param name="movie" value="http://vimeo.com/moogaloop.swf?clip_id=3387564&amp;amp;amp;server=vimeo.com&amp;amp;amp;show_title=1&amp;amp;amp;show_byline=1&amp;amp;amp;show_portrait=0&amp;amp;amp;color=ff9933&amp;amp;amp;fullscreen=1" /&gt;&lt;embed src="http://vimeo.com/moogaloop.swf?clip_id=3387564&amp;amp;amp;server=vimeo.com&amp;amp;amp;show_title=1&amp;amp;amp;show_byline=1&amp;amp;amp;show_portrait=0&amp;amp;amp;color=ff9933&amp;amp;amp;fullscreen=1" type="application/x-shockwave-flash" allowfullscreen="true" allowscriptaccess="always" width="601" height="338"&gt;&lt;/embed&gt;&lt;/object&gt;&lt;br /&gt;&lt;a href="http://vimeo.com/3387564"&gt;Example organisation of a bioinformatics experiment&lt;/a&gt; from &lt;a href="http://vimeo.com/user1359407"&gt;Michael Barton&lt;/a&gt; on &lt;a href="http://vimeo.com"&gt;Vimeo&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;Code&lt;/h3&gt;

&lt;script src="http://gist.github.com/60407.js"&gt;&lt;/script&gt;&lt;div class="feedflare"&gt;
&lt;/div&gt;</description><feedburner:origLink>http://www.bioinformaticszen.com/video/how_to_use_a_database</feedburner:origLink></item><item><title>using a database</title><link>http://feedproxy.google.com/~r/BioinformaticsZen/~3/1_tmljbcPz0/using_a_database</link><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Michael Barton</dc:creator><pubDate>Thu, 19 Feb 2009 16:00:00 PST</pubDate><guid isPermaLink="false">http://www.bioinformaticszen.com/software/using_a_database</guid><description>&lt;p&gt;My best recommendation for any computational biologist is to learn to use a relational database with a corresponding object relational mapping system (ORM). This sounds complicated, but doesn't have to be. In bioinformatics, data are distributed as files. Supplementary data are available from journal websites, and a file is easy to attach to an email. The use of data files in programming, however, should be limited wherever possible. A bioinformatics project should instead be built using a database.&lt;/p&gt;

&lt;p&gt;Using a database allows all data to be accessed in the same way, whether in a script, at the command line, or through third-party database software. Databases are fast and optimised for searching and joining datasets. Joins between two sets of data that would be difficult when merging two files are made much easier using database relationships.&lt;/p&gt;

&lt;p&gt;A simple database workflow first loads all data into the database. Each file usually becomes a table in the database, where each file row is a table row. Analytical scripts make database calls to pull and join different data sets together. Adding indices to a database further increases the speed at which joins are made and data searched.&lt;/p&gt;

&lt;p&gt;In contrast, using files as the base of the project results in errors when file paths change. Scripts need rewriting if a file format is altered. If the data file has a missing bracket or comma, the resulting script will throw an exception and break. The worst thing about using flat files, though, is that they must be parsed and joined at the start of each script. This is repetitive and leads to code duplication across scripts.&lt;/p&gt;

&lt;h3&gt;SQL is hard&lt;/h3&gt;

&lt;p&gt;What I haven't mentioned is that learning to use a database takes time. Understanding how to structure tables and the language to join them together requires effort. Furthermore, writing SQL join statements in scripts requires attaching strings together to create the SQL query, which is complex, hard to maintain, and produces ugly code.&lt;/p&gt;
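
&lt;p&gt;As a rough sketch of what that string assembly looks like in Ruby, the snippet below builds a join query by hand. The table and column names (genes, expression, gene_id, level) are hypothetical and chosen only for illustration, and no database is contacted.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
```ruby
# Build a SQL join by gluing strings together.
# Table and column names are hypothetical; nothing is sent to a database.
def expression_query(organism, minimum_level)
  query  = "SELECT genes.name, expression.level "
  query += "FROM genes "
  query += "INNER JOIN expression ON expression.gene_id = genes.id "
  query += "WHERE genes.organism = '" + organism + "' "
  query += "AND expression.level >= " + minimum_level.to_s
  query
end

puts expression_query("S. cerevisiae", 2.5)
```
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Every extra condition adds another fragment that has to be quoted and spaced correctly, which is where the maintenance cost comes from.&lt;/p&gt;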

&lt;p&gt;Using object relational mapping (ORM) makes using a database easy and code simpler to write. The phrase "object relational mapping" is jargon for what allows database tables and rows to be treated as in-code variables. Instead of creating verbose SQL statements or reading to the required line in a file, the required data are called in the familiar programming syntax of the language you are used to. This combines the best of efficient data storage, with the language you are skilled in.&lt;/p&gt;&lt;div class="feedflare"&gt;
&lt;/div&gt;</description><feedburner:origLink>http://www.bioinformaticszen.com/software/using_a_database</feedburner:origLink></item><item><title>scripting</title><link>http://feedproxy.google.com/~r/BioinformaticsZen/~3/viTVCIChycQ/scripting</link><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Michael Barton</dc:creator><pubDate>Thu, 19 Feb 2009 16:00:00 PST</pubDate><guid isPermaLink="false">http://www.bioinformaticszen.com/software/scripting</guid><description>&lt;p&gt;Scripts differentiate computational research from software production. A script is a file of code with a specific purpose such as running a BLAST search on the &lt;em&gt;E. coli&lt;/em&gt; genome. Contrast this with much larger programs designed to manage a variety of inputs and commands. A bioinformatician uses scripts as research tools in the same way a laboratory biologist uses a pipette. In software development, scripting supplements the designing of a software product. The focus is the finished product, and scripts are there to make source code management or unit testing easier. Since scripts receive comparatively less attention as a part of software design, is there a best practice for using scripts?&lt;/p&gt;

&lt;h3&gt;Managing dependency&lt;/h3&gt;

&lt;p&gt;Scripts are often required to run in a specific order. One script produces a result which is the input to the next script. This means the second script is dependent on the first. Dependency in software equates to increased complexity and requires more work to maintain a project. For example, if there is an undetected bug in one script, mistakes are propagated as the next scripts are run. Or if one script in a series is missed, and the output files of a previous iteration still remain, then datasets are mixed between workflow repetitions, resulting in unexpected side effects.&lt;/p&gt;

&lt;p&gt;Removing the dependencies between workflow steps is difficult. Build tools such as &lt;a href="http://rake.rubyforge.org/"&gt;Rake&lt;/a&gt;, &lt;a href="http://ant.apache.org/"&gt;Ant&lt;/a&gt;, and &lt;a href="http://www.gnu.org/software/make/"&gt;make&lt;/a&gt; allow dependencies between scripted steps to be formalised: the required previous steps are automatically run first. This is useful to force the requirement that all previous results are deleted beforehand, &lt;a href="http://github.com/jandot/biorake/tree/master"&gt;or that all rows in the database are refreshed&lt;/a&gt;, or even that the entire analysis is repeated from scratch. &lt;a href="http://www.capify.org/"&gt;Capistrano&lt;/a&gt; is a variant where build files can be used to automate repetitive tasks across multiple remote computers. All of this can be managed using single command line calls.&lt;/p&gt;
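
&lt;p&gt;A minimal sketch of formalised dependencies, using Rake from a plain Ruby script. The three task names are hypothetical, and each task only records that it ran rather than doing real work. In an actual Rakefile the require line and the final invoke call would be dropped and the workflow started with "rake analyse" on the command line.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
```ruby
require 'rake'

# Hypothetical three-step workflow; each task only records that it ran.
steps_run = []

task :clean do
  steps_run.push(:clean)      # e.g. delete results from the previous run
end

task :load => :clean do
  steps_run.push(:load)       # e.g. load fresh data
end

task :analyse => :load do
  steps_run.push(:analyse)    # e.g. run the analysis itself
end

# Asking for the last step runs the earlier steps first, in order.
Rake::Task[:analyse].invoke
puts steps_run.inspect
```
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Because the order lives in the task definitions, no step can be skipped by accident when the analysis is repeated.&lt;/p&gt;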

&lt;h3&gt;Light and fluffy&lt;/h3&gt;

&lt;p&gt;Light and simple scripts are easier to maintain. To simplify, a script reads in a set of input data (such as a protein sequence), analyses the data (formatdb on a sequence database followed by BLAST), and then returns the results (prints them to the command line). Using this simplification, a script only needs to know what data is coming in, how to analyse the data, and how to return it.&lt;/p&gt;

&lt;p&gt;Scripts can be made lighter by reducing the amount of analytical code. Instead of writing the code to call and parse BLAST, use existing code such as in &lt;a href="http://www.bioperl.org/wiki/Main_Page"&gt;BioPerl&lt;/a&gt;. If the code you need doesn't exist anywhere else, consider writing it as a separate library which can be shared across all your scripts. A script that reads in the data, calls an external library, then prints the results will be smaller and simpler to understand. Contrast this with a script that reads in data, formats the data, has several lines of code to interpret and massage the results, then writes output.&lt;/p&gt;
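
&lt;p&gt;As an illustration of that shape, here is a small sketch in Ruby. The SequenceStats module is a hypothetical stand-in for a shared library, and GC content is used only because it is short enough to show in full.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
```ruby
# Stand-in for a shared library; in a real project this module would live
# in its own file and be loaded with require. The name is hypothetical.
module SequenceStats
  # Fraction of G and C nucleotides in a DNA sequence.
  def self.gc_content(dna_sequence)
    return 0.0 if dna_sequence.empty?
    dna_sequence.upcase.count('GC') / dna_sequence.length.to_f
  end
end

# The script itself: read the input, call the library, print the result.
dna_sequence = 'ATGCGCTA'                                # input
gc_fraction  = SequenceStats.gc_content(dna_sequence)    # analysis
puts gc_fraction                                         # output
```
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The script part is three lines long; everything that could hide a bug sits in the library, where it can be tested once and reused.&lt;/p&gt;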

&lt;p&gt;Keeping scripts light and simple, and formalising dependencies, makes script-based projects easier to manage, maintain, and repeat.&lt;/p&gt;&lt;div class="feedflare"&gt;
&lt;/div&gt;</description><feedburner:origLink>http://www.bioinformaticszen.com/software/scripting</feedburner:origLink></item><item><title>why write good software</title><link>http://feedproxy.google.com/~r/BioinformaticsZen/~3/451onGYHJvQ/why_write_good_software</link><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Michael Barton</dc:creator><pubDate>Mon, 02 Feb 2009 16:00:00 PST</pubDate><guid isPermaLink="false">http://www.bioinformaticszen.com/software/why_write_good_software</guid><description>&lt;p&gt;Bioinformatics is far from commercial software development. A bioinformatician's goal is developing novel scientific research or tools. A software developer is judged on delivering software that people will pay to use. A biologist, whether they use Perl, a pipette or both, is evaluated on their publication record.&lt;/p&gt;

&lt;p&gt;In bioinformatics, or any science using a computer, software development is a lesser priority than generating new data. Statistical tests for significance outweigh software testing for reliability. A series of Python scripts for interpreting ChIP-chip data are a bioinformatician's tools; what is important is the publishable prediction of binding sites.&lt;/p&gt;

&lt;p&gt;Compare this with commercial software development, for example development of an online hotel booking system. The developer talks to the hotel to understand the job. A good developer keeps regular meetings with the hotel, to update the project based on the customer's requirements. The developer maintains the code using common development practices: &lt;a href="http://en.wikipedia.org/wiki/Unit_testing"&gt;unit testing&lt;/a&gt;, &lt;a href="http://en.wikipedia.org/wiki/Build_Automation"&gt;automated building&lt;/a&gt;, and &lt;a href="http://en.wikipedia.org/wiki/Revision_control"&gt;source control&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The situation in bioinformatics is different; a hypothesis is made, implemented, and tested. There are no best practices. Methods of research range from a directory of BLAST results with an Excel spreadsheet, to a full application stack with a database backend, revision control, and unit tests. The choice depends on the bioinformatician's knowledge and experience.&lt;/p&gt;

&lt;p&gt;Is good software important for bioinformatics? Either end of the above scenario is rare, and a middle approach is a set of flat files, Perl scripts to parse out required rows, with R scripts to plot the results. If the tools work does the method matter?&lt;/p&gt;

&lt;p&gt;Receiving peer review on a manuscript is comparable to getting feedback on delivering a product to a customer. Instead of new feature requests, changes are required as new analyses or the addition of a new data set. Feedback from reviewers is only received after months of work, when the software is developed and mature. The same principles that apply for commercial software can apply for scientific software. Investing 10% extra time in developing versatile and maintainable code saves time later when large changes are required.  Using version control is a safety net for making changes to existing code. Unit testing ensures fewer bugs. Automated building makes execution of linear tasks easier. A database enables easier manipulation of large complex data.&lt;/p&gt;&lt;div class="feedflare"&gt;
&lt;/div&gt;</description><feedburner:origLink>http://www.bioinformaticszen.com/software/why_write_good_software</feedburner:origLink></item><item><title>reuse, contribute, create</title><link>http://feedproxy.google.com/~r/BioinformaticsZen/~3/FADBzlQC4RU/reuse,_contribute,_create</link><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Michael Barton</dc:creator><pubDate>Sun, 25 Jan 2009 16:00:00 PST</pubDate><guid isPermaLink="false">http://www.bioinformaticszen.com/software/reuse,_contribute,_create</guid><description>&lt;h3&gt;Use existing code&lt;/h3&gt;

&lt;p&gt;A quick way to get something done is using code that someone else has already written. Many languages have a bioinformatics-specific library, such as &lt;a href="http://www.bioperl.org/wiki/Main_Page"&gt;BioPerl&lt;/a&gt;, &lt;a href="http://biojava.org/wiki/Main_Page"&gt;BioJava&lt;/a&gt;, &lt;a href="http://www.biopython.org/wiki/Documentation"&gt;BioPython&lt;/a&gt;, or &lt;a href="http://bioruby.open-bio.org/"&gt;BioRuby&lt;/a&gt;. These libraries have functions for many common tasks, such as reading FASTA files, or parsing BLAST results.&lt;/p&gt;

&lt;p&gt;One reason to use an existing library over writing your own code is saving time. These libraries are also mature and tested, which means the chance of a bug is much lower. If you're unable to do something in particular and can't find an answer in the documentation, asking a question on the mailing list will usually result in a suggestion of where to look.&lt;/p&gt;

&lt;h3&gt;Contribute to existing code&lt;/h3&gt;

&lt;p&gt;The more specific your requirement the less likely an existing solution. In this case you'll need to create the necessary fix yourself. After coding something up, being a generous person you will want to contribute the code to a bioinformatics library. This might mean a little work, but by contributing you can save other people time with the same problem.&lt;/p&gt;

&lt;p&gt;Contributing code first requires getting the library source code using whatever version control system (VCS) the code is managed with. This can be difficult if you've never used a VCS before, but is a good chance to learn. Once you've got the library you'll need to add your own code, as well as some documentation, and usually a few tests. After this you'll need to send your update back via the VCS, or submit a patch to the mailing list.&lt;/p&gt;

&lt;h3&gt;Create new code&lt;/h3&gt;

&lt;p&gt;Creating a new library should be a last resort, but sometimes the function you want doesn't fit with any existing libraries. Why is creating a new library a last resort? Because it takes more work than adding to an existing library. Having said that, creating a new library does have benefits, as packaging your code makes it easier to maintain and use across your own projects. Taking the time to share your code with other people also makes you a good person.&lt;/p&gt;

&lt;p&gt;Smaller and simpler is better when creating a new library. The simpler the library, the easier it is for others to use, and for you to maintain. Use a version control system to keep track of changes to the code; &lt;a href="http://git-scm.com/"&gt;git&lt;/a&gt; is a good choice. Document the code, and create some web pages highlighting how the library is used. Develop unit tests so that you can make sure the functionality remains the same whenever the code changes. Make the code open source so that anyone else can contribute. Finally, host the library somewhere so that people can get access. There are usually specific resources for each language, for instance in Ruby there is &lt;a href="http://rubyforge.org/"&gt;Rubyforge&lt;/a&gt; or &lt;a href="https://github.com/"&gt;Github&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This sounds like a lot of work, which is why simpler and lighter libraries are easier to maintain. The process of creating a new library is rewarding in itself, but also has benefits. Other people may like your library and decide to contribute or fix any bugs. Therefore if you use the library regularly yourself, the investment in creating and maintaining a library will feed back into your own work.&lt;/p&gt;&lt;div class="feedflare"&gt;
&lt;/div&gt;</description><feedburner:origLink>http://www.bioinformaticszen.com/software/reuse,_contribute,_create</feedburner:origLink></item><item><title>writing good code</title><link>http://feedproxy.google.com/~r/BioinformaticsZen/~3/klmha67-R0I/writing_good_code</link><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Michael Barton</dc:creator><pubDate>Sat, 24 Jan 2009 16:00:00 PST</pubDate><guid isPermaLink="false">http://www.bioinformaticszen.com/software/writing_good_code</guid><description>&lt;p&gt;Writing good code makes life easier. If there's a common theme in bioinformatics, it is this: you will write a script, move on to something else, then return to the script in a few months' or years' time and try to remember how it works. The more clearly the code is originally written, the easier it is to remember how it works. Here is a quote: &lt;a href="http://www.artima.com/intv/dry.html"&gt;"All programming is maintenance programming, because you are rarely writing original code"&lt;/a&gt;. This means that most of your time will be spent fixing and improving code, rather than writing fresh. Writing code is personal, and discussing what makes good code is controversial. But I'm going to do it anyway and describe what I think are a few basic principles that can help to make code easier to maintain.&lt;/p&gt;

&lt;h3&gt;Be too descriptive&lt;/h3&gt;

&lt;p&gt;I think code should err on the side of being too descriptive, rather than being too concise. I mean that code should be loud and expressive about its purpose. An example is choosing variable names.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&lt;pre&gt;
&lt;span class="comment"&gt;# Concise&lt;/span&gt;
&lt;span class="ident"&gt;seq&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;File&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;read&lt;/span&gt;&lt;span class="punct"&gt;('&lt;/span&gt;&lt;span class="string"&gt;gene.fasta&lt;/span&gt;&lt;span class="punct"&gt;')&lt;/span&gt;

&lt;span class="comment"&gt;# Descriptive&lt;/span&gt;
&lt;span class="ident"&gt;fasta_gene_sequence&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;File&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;read&lt;/span&gt;&lt;span class="punct"&gt;('&lt;/span&gt;&lt;span class="string"&gt;gene.fasta&lt;/span&gt;&lt;span class="punct"&gt;')&lt;/span&gt;
&lt;/pre&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The second example is longer, but leaves no doubt as to what the variable contains. The same can be applied to method names. The more specific a method name, the easier it is to remember what the method does and what it returns.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&lt;pre&gt;
&lt;span class="comment"&gt;# Concise&lt;/span&gt;
&lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;get_seq&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;file&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
  &lt;span class="comment"&gt;# ...&lt;/span&gt;
&lt;span class="keyword"&gt;end&lt;/span&gt;

&lt;span class="comment"&gt;# Descriptive&lt;/span&gt;
&lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;read_fasta_from&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;file&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
  &lt;span class="comment"&gt;# ...&lt;/span&gt;
&lt;span class="keyword"&gt;end&lt;/span&gt;
&lt;/pre&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Next are magic numbers: numbers that appear in code, but have no explanation of their meaning. These can be particularly annoying if you can't remember why you used the number and there is no other reference to it.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&lt;pre&gt;
&lt;span class="comment"&gt;# Three, it's the magic number&lt;/span&gt;
&lt;span class="ident"&gt;dna_sequence&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;scan&lt;/span&gt;&lt;span class="punct"&gt;(/&lt;/span&gt;&lt;span class="regex"&gt;.{3}&lt;/span&gt;&lt;span class="punct"&gt;/)&lt;/span&gt;

&lt;span class="comment"&gt;# Descriptive&lt;/span&gt;
&lt;span class="ident"&gt;nucleotides_per_codon&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="number"&gt;3&lt;/span&gt;
&lt;span class="ident"&gt;dna_sequence&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;scan&lt;/span&gt;&lt;span class="punct"&gt;(/&lt;/span&gt;&lt;span class="regex"&gt;.{#{nucleotides_per_codon}}&lt;/span&gt;&lt;span class="punct"&gt;/)&lt;/span&gt;
&lt;/pre&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Comments never hurt either, as long as they are correct. Incorrect comments are generally not considered useful. Comments are especially useful when the meaning of the code is not obvious, but too much commenting can make code harder to read.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&lt;pre&gt;
&lt;span class="comment"&gt;# Why the chop?&lt;/span&gt;
&lt;span class="ident"&gt;protein&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="ident"&gt;dna_sequence&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;translate&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;chop&lt;/span&gt;

&lt;span class="comment"&gt;# Some of wikipedia in here...&lt;/span&gt;
&lt;span class="comment"&gt;# In the genetic code, a stop codon (or termination &lt;/span&gt;
&lt;span class="comment"&gt;# codon) is a nucleotide triplet within messenger RNA&lt;/span&gt;
&lt;span class="comment"&gt;# that signals a termination of translation. Proteins &lt;/span&gt;
&lt;span class="comment"&gt;# are unique sequences of amino acids, and most &lt;/span&gt;
&lt;span class="comment"&gt;# codons in messenger RNA correspond to the addition&lt;/span&gt;
&lt;span class="comment"&gt;# of an amino acid to a growing protein chain, stop&lt;/span&gt;
&lt;span class="comment"&gt;# codons signal the termination of this process,&lt;/span&gt;
&lt;span class="comment"&gt;# releasing the amino acid chain.&lt;/span&gt;
&lt;span class="comment"&gt;# Here I am removing the stop codon after translation&lt;/span&gt;
&lt;span class="ident"&gt;protein&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="ident"&gt;dna_sequence&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;translate&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;chop&lt;/span&gt;

&lt;span class="comment"&gt;# Remove the stop codon after translating&lt;/span&gt;
&lt;span class="ident"&gt;protein&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="ident"&gt;dna_sequence&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;translate&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;chop&lt;/span&gt;
&lt;/pre&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Try to follow the indentation guidelines for the language you're writing in. Indentation makes code easier to read for you and anyone you share the code with.&lt;/p&gt;

&lt;h3&gt;DRY&lt;/h3&gt;

&lt;p&gt;DRY means don't repeat yourself. Code for a single function should exist in a single place. When code needs fixing or maintaining, it only needs to be changed once, in the one place that it resides. In the short term it's tempting to copy and paste to save time, but this will be time consuming in the long term when debugging.&lt;/p&gt;

&lt;p&gt;For example, a common function such as system-specific BLAST settings, used across a variety of scripts, can be kept in a single file. This can then be called by any script when required. By moving all the common code to a single file, if the BLAST settings change, this is done in just one place.&lt;/p&gt;
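
&lt;p&gt;A minimal sketch of that idea in Ruby. The BlastSettings module name, the database path, and the e-value are all hypothetical; the flags are those of the legacy blastall program (-p program, -d database, -e expectation value, -i query file). The command is only assembled here, not executed.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
```ruby
# Shared settings kept in one place (for example a blast_settings.rb file
# that every script loads). The values below are hypothetical.
module BlastSettings
  PROGRAM  = 'blastp'
  DATABASE = 'data/yeast_proteins'
  E_VALUE  = '1e-5'

  # Argument list for the legacy blastall program.
  def self.command_for(query_file)
    ['blastall', '-p', PROGRAM, '-d', DATABASE, '-e', E_VALUE, '-i', query_file]
  end
end

# Any script builds its command the same way and never repeats the settings.
puts BlastSettings.command_for('gene.fasta').join(' ')
```
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;If the database moves or the e-value threshold changes, only the constants in this one module need editing.&lt;/p&gt;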

&lt;h3&gt;Books and frameworks&lt;/h3&gt;

&lt;p&gt;When I used Java, Joshua Bloch's &lt;a href="http://java.sun.com/docs/books/effective/"&gt;Effective Java&lt;/a&gt; book helped me learn a great deal about how to programme well. When learning Ruby I found the &lt;a href="http://rubyhacker.com/"&gt;Ruby Way&lt;/a&gt; book had many useful examples of how to write in Ruby. I would guess that for any popular programming language there is a respected book that illustrates the best practices in the language. These are not the most useful if you're just starting to learn the language, but as you get more confident they are great for helping to write better, more maintainable code.&lt;/p&gt;

&lt;p&gt;In addition to good books, examples of the best practices in a language can be found in popular open source libraries. &lt;a href="http://rubyonrails.org/"&gt;Rails&lt;/a&gt; is a Ruby framework for creating dynamic websites. Knowing Rails will come in handy if I ever need to create an interactive website, but practising with Rails also gives an opinionated view of the best way to organise a Ruby project, from people who are experienced in creating them.&lt;/p&gt;

&lt;h3&gt;Further reading&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="http://seanskti.wordpress.com/2006/10/08/six-easy-tips-for-more-maintainable-code/"&gt;How to write maintainable code&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://www.joelonsoftware.com/articles/fog0000000043.html"&gt;Twelve steps to better code&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://particletree.com/features/successful-strategies-for-commenting-code/"&gt;Strategies for commenting code&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://sourcemaking.com/antipatterns/software-development-antipatterns"&gt;Common types of bad design&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://www.freevbcode.com/ShowCode.Asp?ID=2547"&gt;How to write unmaintainable code&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;div class="feedflare"&gt;
&lt;/div&gt;</description><feedburner:origLink>http://www.bioinformaticszen.com/software/writing_good_code</feedburner:origLink></item><item><title>git</title><link>http://feedproxy.google.com/~r/BioinformaticsZen/~3/w2qBntmcpNQ/git</link><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Michael Barton</dc:creator><pubDate>Sat, 10 Jan 2009 16:00:00 PST</pubDate><guid isPermaLink="false">http://www.bioinformaticszen.com/tools/git</guid><description>&lt;h3&gt;About&lt;/h3&gt;

&lt;p&gt;Git is a version control system, or VCS for short. A VCS helps you manage your code by saving changes as versions in a repository. Each version of any file can be retrieved by rolling back the changes to the required version. At the most basic, a VCS allows you the freedom to experiment and actively break the code you're working on, because the last working version can be reverted to with a single command. Version control is used in software development, and in bioinformatics is useful for keeping track of the scripts and libraries you use in development. Version control using an external server is also a good way to back up code.&lt;/p&gt;

&lt;h3&gt;Features&lt;/h3&gt;

&lt;p&gt;As a VCS git is clean and minimal, working out of a single .git directory in the root of your project. If you want to remove the project from version control, delete the .git directory and all git files are gone. Git-managed repositories are small, using compression to store the differences between versions. Git is fast at storing the latest version of your code; even on a large repository, it is almost instantaneous. Git repositories are simple to create, and don't necessarily require an external server to begin tracking versions. If you do use an external git server, pushing and pulling to the server is also very fast. Another feature of git allows you to create branches within your code repository. Branching means copying the code as a duplicate branch of the main "master" branch. The duplicate branch can be modified, committed to and then compared with the original branch. If you are happy with the changes in the new branch you can merge them back into the original master branch. Another option is to leave the alternate branch to work on later, since switching back to the master branch will restore the previous state before branching. In this way, using branches is a simple and lightweight way to develop or experiment with new features.&lt;/p&gt;

&lt;h3&gt;Collaboration&lt;/h3&gt;

&lt;p&gt;Git is useful for collaborating on shared source code repositories. The collaborative development of the Linux kernel is the reason git was &lt;a href="http://en.wikipedia.org/wiki/Git_(software)#Early_history"&gt;created by Linus Torvalds&lt;/a&gt;. A key feature of git is that it is distributed. You are not bound to working from a single source server. I have my copy of the repository and you have yours. If I like the changes you are making, I can pull your repository into a branch of my own. I can then test the changes you've made before merging them into my master branch. If I only want a subset of the changes you've made, I can use the git cherry-pick command to apply only the commits I want. The website &lt;a href="http://www.github.com"&gt;github.com&lt;/a&gt; adds a collaborative aspect to developing software with git. GitHub acts as a git server but also highlights the social links of branches between developers. Other developers' git repositories can be viewed and downloaded, but also forked into your own GitHub space. This fork acts as a copy of the original repository, with the relationship between the two repositories maintained. GitHub tracks the commits, merges and branches between repositories, which can be viewed, compared, or visualised as a network.&lt;/p&gt;

&lt;h3&gt;Getting started&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="http://git-scm.com/"&gt;The git website&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://www.gitcasts.com/posts/railsconf-git-talk"&gt;Video introduction to git&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://github.com/blog/120-new-to-git"&gt;Links for git beginners on Github&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://mendicantbug.com/2008/11/30/10-reasons-to-use-git-for-research/"&gt;Using git for research&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://www-cs-students.stanford.edu/~blynn/gitmagic/"&gt;Extensive git guide&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;div class="feedflare"&gt;
&lt;a href="http://feeds.feedburner.com/~f/BioinformaticsZen?a=33mvxsxy"&gt;&lt;img src="http://feeds.feedburner.com/~f/BioinformaticsZen?d=41" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~f/BioinformaticsZen?a=kFPzavcC"&gt;&lt;img src="http://feeds.feedburner.com/~f/BioinformaticsZen?i=kFPzavcC" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;</description><feedburner:origLink>http://www.bioinformaticszen.com/tools/git</feedburner:origLink></item><item><title>vim</title><link>http://feedproxy.google.com/~r/BioinformaticsZen/~3/JjvtyD7syuk/vim</link><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Michael Barton</dc:creator><pubDate>Tue, 23 Dec 2008 16:00:00 PST</pubDate><guid isPermaLink="false">http://www.bioinformaticszen.com/tools/vim</guid><description>&lt;h3&gt;About&lt;/h3&gt;

&lt;p&gt;Doing bioinformatics, I only use two tools: the keyboard and the mouse. Since I use these tools all day, every day, I want to use them efficiently. My opinion is that using the keyboard as much as possible, and the mouse as little as possible, is the best way to work at a computer. As a simple example, knowing the keyboard shortcut for a given command means that I can execute the command quickly without breaking the flow of typing. The text editor Vim takes this to another level and is entirely keyboard driven. There are no drop-down menus and everything is performed using the keyboard; there's no reason for my hands to leave the keyboard. The large number of commands in Vim means a fair amount of practice is required before you can use it fluently. I think this practice is a great investment though, as being able to use Vim intuitively makes you work faster and more efficiently. This is because Vim offers a huge range of functionality, available through just a few quickly remembered keystrokes.&lt;/p&gt;

&lt;h3&gt;Features&lt;/h3&gt;

&lt;p&gt;As Vim is entirely text-based, without the pretty interface of modern editors, my original opinion was that it was an archaic hangover from the early days of Unix. Vim is around twenty years old, but is still a sophisticated text editor with a large range of functions. This large range of functionality means a steep learning curve, but the extensive help documentation is an eloquent and gentle introduction. The shell command "vimtutor" and the Vim command ":help" are the places to get started with Vim. Vim's greatest feature is how easy it is to move, edit and manipulate text. This can sound trivial, but reordering paragraphs, which takes minutes with a mouse, takes just seconds using the keyboard with Vim. This applies to any type of text file, such as Ruby or LaTeX source code, which is what I spend most of my time editing. Another Vim feature is its registers, which act as super-charged clipboards. Not only can text be stored for pasting, but sequences of commands can be stored as well. Stored commands can then be replayed by calling the register, which eliminates the need to perform repetitive actions by hand. Typing ":help usr_10" is a good place to start learning about using registers for commands, as well as other ways of making large changes quickly.&lt;/p&gt;

&lt;h3&gt;Customise&lt;/h3&gt;

&lt;p&gt;One reason given for not using Vim is that it doesn't have the code-oriented features of integrated development environments (IDEs) such as &lt;a href="http://www.eclipse.org/"&gt;Eclipse&lt;/a&gt; and &lt;a href="http://www.netbeans.org/"&gt;NetBeans&lt;/a&gt;. But it does. Vim is easy to customise, which has led to a community of developers creating a large number of plugins. &lt;a href="http://www.vim.org/scripts/script.php?script_id=1318"&gt;Code snippets&lt;/a&gt;, &lt;a href="http://www.vim.org/scripts/script.php?script_id=1658"&gt;project drawers&lt;/a&gt;, and &lt;a href="http://www.vim.org/scripts/script.php?script_id=1984"&gt;fuzzy file finding&lt;/a&gt; are just some examples of plugins aimed at using Vim as an IDE. Whatever &lt;a href="http://vim-latex.sourceforge.net/"&gt;language&lt;/a&gt;, &lt;a href="http://www.infynity.spodzone.com/vim/HTML/"&gt;file type&lt;/a&gt;, or &lt;a href="http://www.vim.org/scripts/script.php?script_id=1567"&gt;framework&lt;/a&gt; you use, someone has probably written a Vim plugin for it. This gives Vim the functionality to rival any IDE.&lt;/p&gt;&lt;div class="feedflare"&gt;
&lt;a href="http://feeds.feedburner.com/~f/BioinformaticsZen?a=se6Nsp3q"&gt;&lt;img src="http://feeds.feedburner.com/~f/BioinformaticsZen?d=41" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~f/BioinformaticsZen?a=MPdnxSw0"&gt;&lt;img src="http://feeds.feedburner.com/~f/BioinformaticsZen?i=MPdnxSw0" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;</description><feedburner:origLink>http://www.bioinformaticszen.com/tools/vim</feedburner:origLink></item><item><title>latex</title><link>http://feedproxy.google.com/~r/BioinformaticsZen/~3/Xza6BlhPOwg/latex</link><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Michael Barton</dc:creator><pubDate>Mon, 22 Dec 2008 16:00:00 PST</pubDate><guid isPermaLink="false">http://www.bioinformaticszen.com/tools/latex</guid><description>&lt;h3&gt;About&lt;/h3&gt;

&lt;p&gt;LaTeX (pronounced lay-tek) is a document creation system aimed at scientific and technical documents. LaTeX documents are written in plain text, using markup to describe which parts should be sections, tables or pictures. The LaTeX system parses the markup and formats the text to produce DVI, PostScript or PDF output. As LaTeX is entirely text based, the content can be tracked using a version control system. Plain text files mean that you can work on a document with your favourite editor, and that the files can also be manipulated at the command line using Unix tools. The syntax of LaTeX markup will take an hour or two of practice to learn, but the advantage of creating documents from a marked-up source is that the results are consistent and reproducible, which isn't always the case for graphical document editors.&lt;/p&gt;

&lt;h3&gt;Features&lt;/h3&gt;

&lt;p&gt;The main reason for using LaTeX is that it allows you to work on the content of the document, not the formatting. With a graphical editor you format the text as you type, but with LaTeX you only have to add the markup to the document and LaTeX takes care of the rest. This can save a lot of time with large documents. The basic features of LaTeX include automatic generation of tables of contents and lists of figures, and automatic numbering of sections, tables and figures. BibTeX is a companion to LaTeX that makes it simple to organise and add citations. Citations are added to documents using a simple "cite" command in the text, without the need for third party software. One of the benefits of creating documents using LaTeX is that the formatting it produces is the result of best practices in typography and document presentation, which means LaTeX documents look better than average.&lt;/p&gt;

&lt;h3&gt;Templates and Plugins&lt;/h3&gt;

&lt;p&gt;LaTeX is free software and available for most operating systems. There is a large LaTeX community which develops themes and modules that can be added to LaTeX documents. Many journals also provide LaTeX templates in which papers can be submitted. There are templates available for writing a &lt;a href="http://bit.ly/lBZs"&gt;thesis or dissertation&lt;/a&gt;, and there is likely a specific template that follows your own institution's guidelines. There are many useful third party plugins for adding extras to a document. Examples include &lt;a href="http://www.ctan.org/tex-archive/macros/latex/contrib/booktabs/"&gt;beautiful formatting of tables&lt;/a&gt;, &lt;a href="http://www.ctan.org/tex-archive/macros/latex/contrib/subfig/"&gt;grouping figures into subfigures&lt;/a&gt;, &lt;a href="http://www.ctan.org/tex-archive/macros/latex/contrib/subfig/"&gt;replacing text inside figures&lt;/a&gt; and even &lt;a href="http://www.stat.uni-muenchen.de/~leisch/Sweave/"&gt;a framework for including R code inside a LaTeX document&lt;/a&gt;.&lt;/p&gt;&lt;div class="feedflare"&gt;
&lt;a href="http://feeds.feedburner.com/~f/BioinformaticsZen?a=4kZlvXBW"&gt;&lt;img src="http://feeds.feedburner.com/~f/BioinformaticsZen?d=41" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~f/BioinformaticsZen?a=EbIqz9kw"&gt;&lt;img src="http://feeds.feedburner.com/~f/BioinformaticsZen?i=EbIqz9kw" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;</description><feedburner:origLink>http://www.bioinformaticszen.com/tools/latex</feedburner:origLink></item><item><title>about</title><link>http://feedproxy.google.com/~r/BioinformaticsZen/~3/e7BhVkHx1kc/about</link><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Michael Barton</dc:creator><pubDate>Mon, 22 Dec 2008 16:00:00 PST</pubDate><guid isPermaLink="false">http://www.bioinformaticszen.com/about/about</guid><description>&lt;p&gt;The aim of this website is to describe the best practices in the field of bioinformatics. I used to write at this web address as a blog, but as my PhD took up more of my time I had less time to write. So instead I've converted this blog into an online book that anyone can read, share and contribute to.&lt;/p&gt;

&lt;h3&gt;Sharing&lt;/h3&gt;

&lt;p&gt;All content on this site is free to distribute under a &lt;a href="http://creativecommons.org/licenses/by/3.0/"&gt;Creative Commons&lt;/a&gt; license, and you are encouraged to take and share the material with anyone who you think may find it useful. For example, material can be used verbatim in the teaching material for a bioinformatics degree or course.&lt;/p&gt;&lt;div class="feedflare"&gt;
&lt;a href="http://feeds.feedburner.com/~f/BioinformaticsZen?a=m71VcKeT"&gt;&lt;img src="http://feeds.feedburner.com/~f/BioinformaticsZen?d=41" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~f/BioinformaticsZen?a=KAIuY6jF"&gt;&lt;img src="http://feeds.feedburner.com/~f/BioinformaticsZen?i=KAIuY6jF" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;</description><feedburner:origLink>http://www.bioinformaticszen.com/about/about</feedburner:origLink></item><item><title>contribute</title><link>http://feedproxy.google.com/~r/BioinformaticsZen/~3/5ncdhZGU9e8/contribute</link><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Michael Barton</dc:creator><pubDate>Mon, 22 Dec 2008 16:00:00 PST</pubDate><guid isPermaLink="false">http://www.bioinformaticszen.com/about/contribute</guid><description>&lt;p&gt;Suggestions to Bioinformatics Zen are welcome and gratefully received. The single requirement is that any contributions you make become part of Bioinformatics Zen to be distributed freely under a &lt;a href="http://creativecommons.org/licenses/by/3.0/"&gt;Creative Commons license&lt;/a&gt;. Any contributions are automatically acknowledged on the site.&lt;/p&gt;

&lt;p&gt;The easiest way to make a contribution is to sign up for a free &lt;a href="https://github.com/"&gt;GitHub&lt;/a&gt; account, and fork the &lt;a href="http://github.com/michaelbarton/bioinformatics-zen-content/tree/master"&gt;Bioinformatics Zen content repository&lt;/a&gt;. You can make any edits using the online GitHub editor; the content should be written in the simple &lt;a href="http://daringfireball.net/projects/markdown/"&gt;markdown syntax&lt;/a&gt;. Once you have made your changes, click the "Pull Request" button and I'll be notified of your contribution.&lt;/p&gt;

&lt;p&gt;If you feel adventurous or prefer not to use GitHub, contributions will be accepted from any publicly accessible git repository. Please keep commits to one change per file for the purposes of cherry picking and draft management.&lt;/p&gt;&lt;div class="feedflare"&gt;
&lt;a href="http://feeds.feedburner.com/~f/BioinformaticsZen?a=HiW9A84L"&gt;&lt;img src="http://feeds.feedburner.com/~f/BioinformaticsZen?d=41" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~f/BioinformaticsZen?a=4VPFhtPN"&gt;&lt;img src="http://feeds.feedburner.com/~f/BioinformaticsZen?i=4VPFhtPN" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;</description><feedburner:origLink>http://www.bioinformaticszen.com/about/contribute</feedburner:origLink></item></channel></rss>
