TrinotateWeb in a Docker container

Andrew Perry — Thu, 28 Jun 2018 02:17:19 +0000

TrinotateWeb shows some reports from Trinotate. I know very little about it (please don’t ask me how to run Trinotate or interpret your results), but I wanted to serve up the reports. To make the services we provide to end-users a little more portable and reproducible, we tend to wrap them up as Docker containers. Even if we don’t actually ever move the images/containers between hosts, the Dockerfile acts as ‘runnable documentation’ on how a key part of the service is set up.

We do a similar thing for private instances of SequenceServer when researchers want a convenient interface to BLAST search some of their private (hopefully eventually open !) sequence databases.

The container here is self-contained with the data baked in. You may not want this, but an immutable container containing the analysis is what we wanted.

The code lives here: github.com/MonashBioinformaticsPlatform/bio-service-containers/

Requires:

TrinotateAnno.sqlite – the database generated via Trinotate
lighttpd.conf (provided) – preconfigured, don’t edit.

FROM debian:buster-slim

ENV TRINOTATE_HOME=/app/Trinotate
ENV TRINOTATE_VERSION=3.1.1

WORKDIR /app

RUN apt-get -y update && \
    apt-get install -y lighttpd libhtml-template-perl libdbd-sqlite3-perl && \
    rm -rf /var/lib/apt/lists/*

# Not required, using Debian packages instead
# RUN apt-get install -y cpanminus build-essential
# RUN cpanm -i DBI && \
#     cpanm -i HTML && \
#     cpanm -i HTML::Template && \
#     cpanm -i DBD::SQLite

ADD https://github.com/Trinotate/Trinotate/archive/Trinotate-v${TRINOTATE_VERSION}.tar.gz Trinotate-v${TRINOTATE_VERSION}.tar.gz
RUN tar xvzf Trinotate-v${TRINOTATE_VERSION}.tar.gz && \
    rm Trinotate-v${TRINOTATE_VERSION}.tar.gz && \
    mv Trinotate-Trinotate-v${TRINOTATE_VERSION} Trinotate

COPY TrinotateAnno.sqlite /data/TrinotateAnno.sqlite 
COPY lighttpd.conf /app/lighttpd.conf

RUN chown -R www-data:www-data /app

EXPOSE 80

ENTRYPOINT ["lighttpd", "-D", "-f", "/app/lighttpd.conf"]

“Production”

On port 4569.

docker run --name DatasetName_Trinotate --restart=always -it -d -p 4569:80 trinotate:dataset_name

Use Apache to forward (proxy) to the container for a nice URL (eg /apps/trinotate/DatasetName), behind .htaccess.

Option: external data and config in a host directory

With a few small edits to the Dockerfile (comment out the Trinotate download and sqlite db COPY), you can instead use an external copy of Trinotate and a database on the host.
You might want this for data that is going to be in flux for a while, before baking it permanently in a container (?).

docker run --name DatasetName_Trinotate --rm -it -d -p 4569:80 -v $(pwd):/app -v /home/perry/bin/Trinotate-Trinotate-v3.1.1/:/app/Trinotate -v $(pwd)/TrinotateAnno.sqlite:/data/TrinotateAnno.sqlite trinotate:dataset_name

Apache config

Use this to forward incoming requests to /apps/trinotate/DatasetName/ -> the port on the Docker container (4569), with a custom htaccess file for Basic Auth.

    # /apps/trinotate/DatasetName
    <Location /apps/trinotate/DatasetName>
      Order deny,allow
      Allow from all
      Authtype Basic
      Authname "Restricted Content"
      AuthUserFile /etc/apache2/htaccess/DatasetName
      Require valid-user
    </Location>

    RewriteEngine on

    # For TrinotateWeb inside a Docker container - absolute URLs mean /css and /js links break
    # when proxied, unless we use this RewriteCond trick detecting referrer. 
    RewriteCond "%{HTTP_REFERER}" ".*bioinformatics.erc.monash.edu(?:.au)?/apps/trinotate/DatasetName/.*" [NV]
    RewriteRule ^/css/(.*)$ "http://localhost:4569/css/$1" [P]
    RewriteCond "%{HTTP_REFERER}" ".*bioinformatics.erc.monash.edu(?:.au)?/apps/trinotate/DatasetName/.*" [NV]
    RewriteRule ^/js/(.*)$ "http://localhost:4569/js/$1" [P]
    RewriteRule ^/apps/trinotate/DatasetName$ /apps/trinotate/DatasetName/ [R]
    RewriteRule ^/apps/trinotate/DatasetName/(.*)$ "http://localhost:4569/$1" [P]
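
The referrer trick is easier to follow when written out as ordinary code. Below is a rough Python sketch of the decision the rules above make. It is not how Apache evaluates them internally, and the function name `route` and the "pass" action are made up purely for illustration; the backend port, URL prefix and hostname pattern are copied from the config.

```python
import re

BACKEND = "http://localhost:4569"
PREFIX = "/apps/trinotate/DatasetName"

# Same pattern as the RewriteCond lines (dots left unescaped, as in the original).
REFERER_RE = re.compile(
    r".*bioinformatics.erc.monash.edu(?:.au)?/apps/trinotate/DatasetName/.*"
)


def route(path, referer=""):
    """Return (action, target) for a request path, mimicking the rewrite rules.

    Hypothetical helper for illustration only.
    """
    # /css and /js are absolute URLs inside TrinotateWeb, so they arrive
    # without the /apps/... prefix. Proxy them only if the referring page
    # was the TrinotateWeb app itself.
    for static in ("/css/", "/js/"):
        if path.startswith(static) and REFERER_RE.search(referer):
            return ("proxy", BACKEND + path)

    # Missing trailing slash: redirect to the canonical URL.
    if path == PREFIX:
        return ("redirect", PREFIX + "/")

    # Everything under the prefix is proxied to the container root.
    if path.startswith(PREFIX + "/"):
        return ("proxy", BACKEND + "/" + path[len(PREFIX) + 1:])

    # Not one of ours: leave the request alone.
    return ("pass", path)
```

The weak point is visible here: a stylesheet request only reaches the container if the browser sends a Referer header, so clients that strip referrers will see an unstyled page.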

TrinotateWeb makes requests to https://canvasxpress.org/ – as of 28-Jun-2018 the HTTPS certificate for that site has expired. Users should visit https://canvasxpress.org/ first and accept the insecure certificate so that icons in TrinotateWeb load correctly.

mPartsRegistry : small update

Andrew Perry — Mon, 30 Jul 2012 05:24:30 +0000

I just made a small update to mPartsRegistry, the mobile interface I wrote to make browsing the Registry of Standard Biological Parts a little easier on smartphones.

This update adds a “Random Part” button – it’s mostly just so people who want to play with it without actually knowing a part ID can get some instant gratification. This is in addition to the quiet update I made a few months ago to replace jQTouch with jQuery Mobile, since jQTouch development stagnated for a while and never really properly supported most mobile browsers.

Wiider postmortem

Andrew Perry — Sat, 28 Jan 2012 03:00:54 +0000

I always intended to write this postmortem earlier … now three years after development ceased, I’m finally getting around to it. Warning – retrospective rambling ahead.

In mid 2007, Nintendo released the Opera-powered browser for their Wii gaming console which they called the Internet Channel. For many people, including myself, this was the first time they had been able to use “Internet on the TV”. Because of the typical viewing distance, low resolution for CRT-based televisions, and the unique navigation interface using the Wiimote, many web sites were functional but not particularly comfortable to use. Many sites targeted at desktop PCs were too complex and heavyweight for the Internet Channel, and fonts were often so small that cumbersome zooming and scrolling were required. I felt this was a good opportunity to write a Wii-browser specific app – in particular, I wanted a news reader that was comfortable to use in a lounge room setting, controlled via the Wiimote.

I started the Wiider project around Dec 2007, as the successor to a Wii-specific news aggregator service I had set up called WiiRSS. The last SVN commit for Wiider was in Dec 2008.

The goal of the Wiider project was to create a web-based news feed reader optimized for the Nintendo Wii Internet Channel. Features included:

Wii-friendly user interface – large TV friendly fonts, simple navigation
Cookie-less view-only access for a personal feed list (via ?key=xxx, bookmarked on the Wii once you’ve logged in)
Wiimote navigation controls, beyond what the browser provides
Painless image zooming (eg Lightbox)
RSS and ATOM feed support
Easy feed discovery using the Google Feed API

While I’ve since retired the project, I felt it would be good to document some of the insights I gained as a result of developing it.

Choosing the domain name

Every cool web two-point-oh-ish app needs a cool domain name. It’s a given. Here’s the list of available domains I brainstormed at the time (good, bad and cheesy):

wiigregator.com
newswiider.com
wii-feeds.com, feedwii.com, feedmywii.com, feedthewii.com
wiilovenews.com, wiinoos.com
atomicwii.com
wiireadr.com
wiiwiire.com
(wiifeeds.com, wiifeeder.com and wiireader.com were taken)

Ultimately, I settled on wiider.com. The term ‘wiider’ had already been used around the tech press as a pun in reference to a (now defunct) Wii friendly interface to Google Reader, released by Google. There were advantages and disadvantages to this – I felt it was more of an advantage since people might feel compelled to search for ‘wiider’ and find my app. I wasn’t too worried about competition from Google’s offering, since my implementation was far better for its purpose.

Choice of programming language & web app framework

I prefer Python. I dabbled briefly with Ruby (on Rails), but ultimately still prefer Python. However, at the time when I began this project, there was no clear “one Python framework to rule them all”. Django was one good option, Turbogears was another. Ruby on Rails was exciting (if not hyped), but was a bit of a moving target and its application server was very crashy. I chose to use the Turbogears 1.0.x framework for Wiider, since I liked the philosophy of bringing together well tested components rather than reinventing the wheel (eg CherryPy as the web request handler, SQLObject as the ORM). While I now appreciate the Django templating language, at the time I also preferred the approach of Turbogears’ default template package, Kid (now largely superseded by Genshi), which maintains valid XML templates and allows arbitrary inline Python code if required (MVC separation be damned). Turbogears did the job just fine, but it turned out to be a fast moving project in flux. The 2.0.x release made enough key changes that migration from 1.0.x to 2.0.x wasn’t trivial, such that I was essentially stuck using 1.0.x for Wiider while the core Turbogears development focus moved on to 2.0.x and beyond. Not a showstopper, but a little annoying.

If I were to start the project again today, I’d very likely use Django, since in my opinion it now provides a better battle tested and stable base with a larger library of useful optional modules. Alternatively, these days there are enough Python microframeworks (eg, webapp2, Flask) that can be easily coupled with a templating language and an ORM (everyone seems to like SQLAlchemy) that you can easily roll your own preferred web app environment with a few “pip install” commands, and any of these would have been appropriate for a small project like Wiider.

What worked well

You could read feeds on your TV, via the Wii, lounging on your couch. I used it personally in some kind of beta state, on and off, for about 12 months, reading feeds of interest over my morning coffee. Feeds were ‘auto-managing’ – no read/unread, just the latest stuff, based on a per feed setting for how old items were allowed to be. You could add feeds directly via URL, or via Google’s feed search service embedded in the page and styled to look like it belonged there. I’d tried to design the app to scale – the database model only stored a feed with a particular URL once, even if multiple users subscribed to it.
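
To illustrate the “store each feed once” idea, here is a minimal sketch using the standard library’s sqlite3 module rather than the SQLObject models Wiider actually used. The table names, column names and the `subscribe` helper are all invented for this example, and attaching the maximum item age to the subscription (rather than to the feed itself) is an assumption.

```python
import sqlite3

# Hypothetical schema: one row per feed URL, many subscriptions pointing at it.
SCHEMA = """
CREATE TABLE feeds (
    id  INTEGER PRIMARY KEY,
    url TEXT NOT NULL UNIQUE
);
CREATE TABLE subscriptions (
    user         TEXT    NOT NULL,
    feed_id      INTEGER NOT NULL REFERENCES feeds(id),
    max_age_days INTEGER NOT NULL,
    PRIMARY KEY (user, feed_id)
);
"""


def subscribe(conn, user, url, max_age_days=7):
    """Subscribe a user to a feed, creating the feed row only if it is new."""
    # The UNIQUE constraint plus OR IGNORE means a second subscriber
    # reuses the existing feed row instead of duplicating it.
    conn.execute("INSERT OR IGNORE INTO feeds (url) VALUES (?)", (url,))
    feed_id = conn.execute(
        "SELECT id FROM feeds WHERE url = ?", (url,)
    ).fetchone()[0]
    conn.execute(
        "INSERT OR REPLACE INTO subscriptions (user, feed_id, max_age_days) "
        "VALUES (?, ?, ?)",
        (user, feed_id, max_age_days),
    )
    return feed_id


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
first = subscribe(conn, "alice", "http://example.org/feed.xml")
second = subscribe(conn, "bob", "http://example.org/feed.xml", max_age_days=2)
```

Both calls return the same feed id, so the server only ever has to fetch and store that feed once however many people subscribe to it.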

I learned a lot. About handling ATOM and RSS feeds. About ‘small screen devices’ and modern Python web development. Optimizing web pages for comfortable reading on the Wii presents similar challenges to producing mobile sites for smartphones and tablets – skills that I’ve used on other projects since. Also, I honed skills in mundane things like keeping good project documentation in a wiki (which I, maybe strangely, enjoy) and using a bug tracker (all via Trac).

Zero login. One reason why desktop-targeted feed readers were cumbersome on the Wii was the difficulty of logging in using the on screen keyboard. It could be done, but it was slow and awkward, and since the Internet Channel clears its cookies between restarts, you have to log back in every time you use the app. I solved this problem with Wiider by providing a ‘secret URL’ that would allow users to view their feeds without logging in. The user was prompted to bookmark this page for future use. To add or delete feeds, they would still need to log in, but typically I expected that people would add feeds using their desktop computer, and use the Wii only for reading. This of course meant that feed lists and content were not guaranteed 100% private, since the secret URL could be sniffed on the open network or inadvertently shared; users were warned of this danger. I believe it was a reasonable trade off between usability, privacy and security.
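
For anyone wanting to do something similar today, a secret URL of this kind could be generated roughly as follows. This is a sketch using Python’s `secrets` module (which did not exist when Wiider was written), not the original Wiider code. Only the `?key=` parameter comes from the real app; the function names and the example base URL are hypothetical.

```python
import secrets
from urllib.parse import urlencode


def new_view_key(nbytes=24):
    """Generate an unguessable, URL-safe key for read-only access."""
    return secrets.token_urlsafe(nbytes)


def view_url(base_url, key):
    """Build the bookmarkable read-only URL, e.g. http://example.org/read?key=..."""
    return base_url + "?" + urlencode({"key": key})


def key_matches(supplied, stored):
    """Compare keys in constant time to avoid leaking information via timing."""
    return secrets.compare_digest(supplied.encode("utf-8"), stored.encode("utf-8"))
```

The key should grant view-only access and nothing else, so that a leaked bookmark exposes a reading list but not the account.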

Wiimote controls. The Opera browser maps Wiimote button presses to Javascript keycodes – this allowed mapping of the D-pad left and right buttons to “scroll page up” and “scroll page down” and the ‘1’ button to “scroll to bottom” functions. This made navigation far easier than pointing and dragging to scroll large distances.

What didn’t work

I’m no web designer. It was functional, but could have been prettier. It did however have some smooth show/hide transitions courtesy of jQuery.

Refreshing feeds sometimes failed. Feeds were fetched on demand by the server when the page loaded – this isn’t a good way to do things since it often gave long page load times, and timeouts sometimes occurred. It’s not quite as bad as it sounds, though – ETag and Last-Modified headers were respected, so updates only occurred when required. I had no interest in DoS’ing feed providers. Feed updates should have been decoupled from the page rendering via cron or a task queue – that’s something I would have added if the project continued.
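
For reference, respecting ETag and Last-Modified amounts to a conditional GET. The sketch below uses only the Python standard library and is not the code Wiider ran; the function names are made up, and in a better design it would be called from a cron job or task queue rather than during page rendering.

```python
import urllib.error
import urllib.request


def conditional_headers(etag=None, last_modified=None):
    """Build request headers from the validators saved on the previous fetch."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def fetch_if_changed(url, etag=None, last_modified=None, timeout=10):
    """Return (body, etag, last_modified); body is None if the feed is unchanged."""
    request = urllib.request.Request(
        url, headers=conditional_headers(etag, last_modified)
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return (
                response.read(),
                response.headers.get("ETag"),
                response.headers.get("Last-Modified"),
            )
    except urllib.error.HTTPError as err:
        # urllib reports "304 Not Modified" as an HTTPError; treat it as
        # "nothing new" and keep the validators we already had.
        if err.code == 304:
            return (None, etag, last_modified)
        raise
```

Storing the returned ETag and Last-Modified values alongside each feed lets the next fetch skip the download entirely when nothing has changed.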

Unicode. I never quite cracked a few lingering Unicode bugs. Somewhere between the feed parsing with BeautifulSoup, the database ORM provider, the template engine and the web application server, something was munging the Unicode. I learnt everything I never wanted to know about character encodings, but never quite managed to fix it.

Boot time and startup time matter. The goal was to have a news reader where you could sit down, turn on the TV and read the latest feeds quickly, without too much messing about. This is typically the appeal of a games console over a fully fledged desktop PC – startup speed, reliability and simplicity. Personally, for me, the Internet Channel cannot provide that in its current form. To launch Wiider on your Wii, you needed to:

1. Turn on the TV, turn on your Wii.
2. Press ‘A’ while waiting for Nintendo’s unskippable health and safety message to disappear.
3. Launch the Internet Channel. Wait a little.
4. Go to bookmarks, launch Wiider, wait a little.

While this may still sound simple, in practice I felt it was still too much time and too much work for a user wanting instant gratification. There are two key waiting times (2 and 3) and one extraneous interaction (3) that I had no control over due to limitations/features of the device. If Nintendo had allowed the health and safety message to be disabled, or immediately skippable, and also allowed bookmarks to web pages to appear as ‘Channels’ on the front page of the Wii, I believe the launch time and simplicity needed for ‘instant gratification’ would have been achieved. I’d anticipated that Nintendo would continue to develop the Wii as a lightweight browsing appliance, however a feature allowing bookmarks on the home screen never appeared, and I’m pretty sure it never will given that the Wii will be considered ‘obsolete’ next year after the release of the Wii U. In fact, Nintendo have recently started to give indications that the Wii U will make better use of the browser and ‘apps’, so it will be interesting to see what they do with it – Wiider may rise from the ashes as WiiderU.

I’m certainly not blaming Nintendo for the ultimate retirement of my little web app. While Nintendo’s lack of interest in developing the Internet Channel to meet its full potential limited the utility of my project, it’s not that they are at fault. They do games, and they do them well. History has shown that they rarely deviate from this formula – facilitating easier access to the wilds of the open web is just not in their nature, even if it’s within their reach. I took a known risk, scratching my own itch and investing some time in a project while making some guesses about where they might head with it. My guesses turned out to be wrong.

Since I never felt the application was ready for casual use, I never really promoted Wiider publicly. I did have a single random signup by a user who must have stumbled upon it and added some feeds. Not really sure how they found it.

These days I prefer the immediacy of my smartphone or tablet for reading feeds over firing up the Wii. On the TV, things like Google TV are emerging, and no doubt many more people have home theater PCs that would run Google Reader or Feedly in a usable fashion. Many people prefer reading ‘feeds’ via Twitter/Facebook/Google+/Reddit link streams. I think Wiider’s niche has narrowed. Ultimately, since I couldn’t see myself using it anymore, I decided to retire the project. But it’s nice to have something ‘finished’, even if it was never really complete.

Here’s an export of the documentation wiki and the source code (tailored to run on WebFaction), for posterity: wiider_source_postmortem.zip

Running a local JABAWS server for Jalview on Ubuntu (11.04 Natty)

Andrew Perry — Fri, 14 Oct 2011 03:52:52 +0000

The excellent Jalview sequence alignment visualization and editing tool has the ability to send a set of sequences to a multiple sequence alignment web service (“JABAWS”) and receive the results in a new alignment window. This is really convenient when you are doing lots of sequence analysis, and Geoff Barton’s group at the University of Dundee provide a JABAWS server that Jalview will use by default.

But maybe the Dundee server is down. Or maybe you think your local machine will do things faster. Or maybe you work on über secret sequences in some Faraday cage bunker with no permanent network connection. In each of these cases, you may want to run your own local JABAWS server and use that instead. In this case, read on.

Download the JABAWS war file (direct link here).

Install Apache Tomcat and the management interface:

sudo apt-get install tomcat6 tomcat6-admin

As root, edit the /etc/tomcat6/tomcat-users.xml file to enable admin access.

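Between the <tomcat-users> tags, add:

<role rolename="manager"/>
<user username="admin" password="s3cret" roles="manager"/>

where ‘s3cret’ is a secret password for the user ‘admin’.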

Go to http://localhost:8080/manager/html/ and login as ‘admin’ and the password you set.

Under “WAR file to deploy”, click on the “Choose File” button, and select the jaba.war file you downloaded.

Now you need to set the permissions of the Muscle/Mafft/Clustal etc binaries that come packaged with JABAWS. Type the following commands:

cd /var/lib/tomcat6/webapps/jaba/binaries/src

sudo chmod +x setexecflag.sh

sudo ./setexecflag.sh

This should do it .. in Jalview, go to Preferences, and under the “Web Services” tab add a new service URL “http://localhost:8080/jaba” (no quotes, no trailing slash). Now when you load an alignment, your local JABAWS server should appear under the “Web Service->JABAWS Alignment” menu.

For the record .. I tried this under the version of Jetty packaged with Ubuntu 11.04, but I couldn’t get it to work so I gave up and just did it with Tomcat as per the JABAWS documentation.

Links:

This HOWTO is an Ubuntu specific regurgitation of the docs below.

http://www.compbio.dundee.ac.uk/jabaws/manual_qs_war.html

https://help.ubuntu.com/10.04/serverguide/C/tomcat.html

A mobile interface to the Registry of Standard Biological Parts

Andrew Perry — Sun, 24 Oct 2010 08:37:47 +0000

Recently I developed a simple mobile interface to the Registry of Standard Biological Parts – the database that is currently the focal point for parts-based synthetic biology. I’ve called this mobile interface mPartsRegistry and I thought it would be worth outlining its features and sharing some notes about the project, in case someone else finds it useful.

mPartsRegistry is a simple interface to the Registry of Standard Biological Parts aimed at mobile smartphone browsers. It’s powered by the Parts Registry API, which provides a simple RESTful interface to key metadata about parts in the database. It features:

A simple interface tailored for mobile WebKit browsers (Android browser, mobile Safari, probably others). Web-based, zero-installation required.
Basic search of the Registry by part name.
“Favorite parts” to locally bookmark parts on your device.
Provides basic metadata associated with parts, including size, description, authors, DNA sequence, categories and availability.
Freely available and recyclable source code, released under the MIT License (fork it on GitHub).

The idea for a mobile interface to the Registry came out of a moment in the wet lab, where I was supervising the Monash iGEM team, and someone asked “How many basepairs is that part again ?”. I’ve found most ideas for smartphone apps in the lab a little contrived; nothing more than an excuse to jump on the Android or iOS app bandwagon, with limited practical utility. This was a situation where I could genuinely see a use for a simple mobile interface to look up some reference information, so I thought I’d create it.

The goal is not to completely replicate the functionality of the Registry (at this stage the API would not allow that anyhow), but to provide a simple mobile-friendly interface to quickly look up important data about Biobrick(tm) parts in a laboratory setting, where accessing a desktop computer is often less convenient. In this context, you generally know the part name (eg B0034) that is written on a tube, but would like to quickly look up some details.

The project consists of two main parts: the web frontend, built using jQTouch and Django templates hosted on Google App Engine, and the parser backend (partsregistry.py) that deals with directly querying the Registry API.

The application uses BeautifulSoup on the server side to parse the XML served by the Registry’s API. This parser may be useful as a generic Python interface to the Registry API for other projects, although it is not yet feature complete. Why parse the XML on the server rather than the client ? The Registry API does not offer JSONP callbacks, making direct client-to-API queries by a web app served from another domain tricky (Same Origin Policy, yadda yadda). While this probably could have been done in straight clientside Javascript if I’d used some type of cross-domain AJAX hack, parsing on the server side also opens the possibility in the future to ‘value-add’ to the data in some way, potentially incorporating extra data not served directly by the Registry API, before it’s sent to the client.
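
As a rough idea of what the server-side parsing involves, here is a small sketch. It uses the standard library’s ElementTree instead of BeautifulSoup so that it runs without extra packages, and it is not the code in partsregistry.py. The XML below is a made-up sample, and the element names (`part`, `part_name`, `part_short_desc`, `seq_data`) are assumptions about the API’s output that should be checked against a real response.

```python
import xml.etree.ElementTree as ET

# Invented sample document; the real Registry API response may differ.
SAMPLE_XML = """<rsbpml>
  <part_list>
    <part>
      <part_name>BBa_X0000</part_name>
      <part_short_desc>Example part (not a real Registry entry)</part_short_desc>
      <sequences>
        <seq_data>atgc
atgc</seq_data>
      </sequences>
    </part>
  </part_list>
</rsbpml>"""


def parse_part(xml_text):
    """Extract a few fields for the first part in the document, or None."""
    root = ET.fromstring(xml_text)
    part = root.find(".//part")
    if part is None:
        return None
    # Sequence data may be wrapped across lines, so strip all whitespace.
    sequence = "".join((part.findtext(".//seq_data") or "").split())
    return {
        "name": part.findtext("part_name"),
        "description": part.findtext("part_short_desc"),
        "sequence": sequence,
        "length_bp": len(sequence),
    }


info = parse_part(SAMPLE_XML)
```

A dictionary like `info` can be handed straight to a template, which is also the natural place to merge in any extra data not served by the Registry.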

Google App Engine works as a cheap hosting solution for a low traffic app like this, which is likely to stay within the free quotas. Also, GAE supports Python, and I like Python. jQTouch makes for a reasonable cross-platform mobile web interface, since it is optimized for WebKit-based browsers. While officially jQTouch supports iPhone/iPod Touch and doesn’t have official Android support, in my hands it works well enough on Android (and in fact displayed some minor bugs on Mobile Safari that were not evident on Android). Typically when using jQTouch you are expected to load multiple ‘pages’ as separate div sections, lumped into a single HTML document. jQTouch then does the Javascript+CSS magic to render fast page switching, while actually working within a single HTML document. Since the main action of this app is to ‘search’, we don’t know in advance what the results page will be, so this nice feature of jQTouch is barely used.

Searching for the same part all the time can get annoying, so mPartsRegistry provides a simple ‘bookmarking’ feature where a list of favorite parts can be managed and stored on the device. This is implemented via HTML5 localStorage – if there was demand then this could easily be turned into server side storage, but I doubt it’s necessary. In the future, it might make sense to pre-cache the metadata for any of these “favorite parts” so that the fast page switching features in jQTouch can be used to full advantage.

Currently, the interface does not show information about sequence features, subparts and twins, however I plan to implement these at some point. The Registry API currently does not provide information about samples, literature references or lab groups, but once these are enabled I plan to support this metadata within mPartsRegistry too.

Okay, that’s all kids .. and remember .. take off your gloves before using your smartphone in the lab !

Stack Exchange sites for science

Andrew Perry — Wed, 12 May 2010 05:33:48 +0000

Recently I’ve noticed the emergence of several Stack Overflow-style sites for science-related questions and answers. For those unfamiliar with Stack Overflow – it’s a question and answer ‘forum’ for computer programmers that keeps the signal-to-noise ratio very high through a carefully refined reputation system. Late last year the creators of Stack Overflow launched a hosted service called Stack Exchange, which allows anyone to start their own “Stack Overflow” based around any topic.

http://www.flickr.com/photos/alicebartlett/ / CC BY-NC 2.0

The service was a little pricey ($129+/month), and I suspect this is one reason why a few open source clones inspired by Stack Overflow also exist. Since then, Stack Exchange sites (or clones) have proliferated – and those working as scientists (or those interested in science) haven’t been neglected. Here are my favorites:

MajorGroove.org pitches itself as a ‘forum for biologists’, which it is, although most of the content currently focuses on X-ray crystallography and associated techniques. It is currently in ‘bootstrap mode’, which means that reputation requirements are a little less strict until the userbase and site activity have grown to a critical size. Is there even a need for a Stack Exchange forum for biological crystallography ? Macromolecular crystallography already has a single, central, de facto standard forum – the CCP4BB mailing list. While it may be antiquated by Web2.0 standards, CCP4BB works well for a lot of people, and there is a huge amount of useful and important information buried in its archives. For many crystallographers, it seems CCP4BB would only be extracted from their “cold dead hands”. Despite this, I think the Stack Overflow format will be very beneficial for people new to the field. As a side note – I discovered MajorGroove via Graeme Winter’s XIA2 blog right around the time when I was considering kickstarting a “Stack Overflow for crystallography”. At the moment it seems that a small userbase of crystallographers is already established on MajorGroove and there would be no purpose for another near identical forum. Even if questions about other techniques in the biosciences start to dilute out the structural biology, one click on the ‘crystallography’ tag or the ‘ccp4’ tag, and you can get straight to the good stuff. (In fact this feature was deemed useful enough by Google that they decided to bless the ‘android’ tag on Stack Overflow as the official Android Q&A forum.)
NMRWiki Q&A (http://qa.nmrwiki.org/) is a StackExchange-clone for magnetic resonance, mostly focused on NMR, but also open to EPR/ESR and MRI users. It’s not actually running on the StackExchange platform, but uses the open source OSQA / CNPROG clone, built on top of Django. As far as I know, there is no “CCP4BB for NMR”, which makes the NMRWiki Q&A site potentially even more valuable to structural biologists than its crystallography centric cousin, MajorGroove. Back when I was doing my PhD using protein NMR spectroscopy as my primary technique, there were very few good resources like this online – I do less NMR these days, but you can bet that I’ll be using the NMRWiki Q&A and its associated wiki to refresh my memory and catch up on new methodological developments in the future.
BioStar (http://biostar.stackexchange.com/), a StackExchange for bioinformatics, computational genomics and systems biology questions and answers. This one is busier and better established than the above mentioned forums, probably by virtue of the fact that bioinformaticians spend more time in front of the computer than your average molecular biologist or structural biologist.
And, for a bit of fun: Skeptic Exchange (http://exchange.bristolskeptics.co.uk/), which covers rational questions and answers to various topics including pseudoscience, faith healing, the supernatural and alternative medicine.

Want more ? There are a bunch of science related StackExchanges listed under “Science” here: http://meta.stackexchange.com/questions/4/list-of-stackexchange-sites .. and digging back through the FriendFeed archives I see Matt Todd initiated a concise listing (which if I’d seen, I probably never would have started this post).

And now, the latest* news: Stack Exchange 2.0 will be ‘free’. It looks like they are trying to structure the new Stack Exchange ecosystem a bit like the Usenet hierarchy (comp.*, rec.* etc), with a fairly involved discussion, proposal and acceptance process for new sites – it’s unclear yet whether this approach is going to work out better than just open sourcing the whole shebang, but time will tell. My guess is that BioStar, MajorGroove and probably even an incarnation of NMRWiki Q&A will eventually become part of this formalized ecosystem.

On one hand making StackExchange sites free to run is great – it lowers the barrier to entry to allow many more sites to emerge and operate. On the other hand, as we have seen with the acquisition of FriendFeed by Facebook, not having a clear revenue stream can ultimately leave communities (such as The Life Scientists) without any certainty in a site’s future, potentially impacting growth and participation. Personally I’m much more inclined to invest time in a site if it is something like Wikipedia, where I know my contributions are very likely to live on, in some form, for decades (centuries ?) to come. Ideally the archives of these new Stack Exchange sites could become useful online resources for decades to come – but with a single company at the helm and a “Web 2.0 business model”, continued operation for even a decade seems unlikely. The one saving grace: all content on the new Stack Exchange sites will be licensed under a Creative Commons license – so if Stack Exchange itself is acquired and shut down, we will always be able to preemptively leech the archives and provide them online elsewhere. Maybe it’s strange that I’m already thinking about archiving the new Stack Exchange upon its demise before it’s even begun, but I think it’s important to take the long term view with our data and recorded wisdom. Unlike in 1994, when GeoCities (R.I.P) was started, teh Internets is no longer a fad – the hard disks connected to it are fast becoming the sum of all accessible human knowledge, so we’d better make sure we can retain the good bits for a little longer than 10 years.

* – as all too common these days .. I’m a little behind the curve on this one. I meant to finish this post a month ago, but with a busy time pre-holiday, then the actual holiday, a month has gone by.

The Great Australian Internet Blackout WordPress Plugin

Andrew Perry — Fri, 22 Jan 2010 00:28:49 +0000

Normally I stick to posts about science and technology on this blog. Like most Australians, I vote in elections, try to remain informed, but otherwise stay away from getting involved in politics. However, occasionally certain things become important enough issues that they need to be advertised more widely.

As you may know, the Australian Federal Government is attempting to censor the Internet within Australia by forcing ISPs to block a list of websites. This proposed internet filter will not be optional; it will affect all Australians, and the blocklist will be compiled by a small group of people. The list of blocked sites will remain secret, so the Australian public will find it difficult to determine if this power is being abused. It will not prevent the spread of illegal material, which is typically shared via peer-to-peer networks that will not be blocked by the internet filter. If it is not already self evident why this approach to internet censorship is ineffective, a waste of resources and a potential threat to the freedom of information flow required for a healthy democracy, you can read more at the Great Australian Internet Blackout site and the Electronic Frontiers Australia site.

The Great Australian Internet Blackout is a combined online and offline demonstration against this imposed online censorship. For one week – January 25-29th – Aussie websites will “black out” to inform an even wider audience about the threat of imposed censorship.

This is what it looks like right now. I'm guessing that on January 25th something exciting (or educational) will appear inside that popup box !

I’ve created a simple WordPress plugin that makes it a little easier to participate in the demonstration and spread the word. It uses the ‘blackout.js’ script written by John Ferlito to display a popup box that tells the user about the Great Australian Internet Blackout, while “blacking out” (significantly darkening) your website in the background. Once the user closes the box things go back to normal – it uses cookies so they only see the popup once.

Download the Internet Blackout WordPress Plugin

(version 0.9, md5: 16522abb4d492f445a4c5ffccd845c73 )
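If you want to check that the archive you downloaded matches the checksum above, something like the following minimal Python sketch should do it. The function names are my own, and the archive filename in the usage note is only a placeholder (use whatever name your download actually has).

```python
import hashlib

# Checksum quoted above for version 0.9 of the plugin archive.
EXPECTED_MD5 = "16522abb4d492f445a4c5ffccd845c73"


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_MD5):
    """True if the file's MD5 digest equals the expected checksum."""
    return md5_of_file(path) == expected.lower()
```

For example, `matches_expected("internet-blackout.zip")` (placeholder filename) returns True only when the download is byte-for-byte the file the checksum was computed from.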

{{{

git rm path-of-file-to-kill

}}}

Install it as you would any other simple WordPress plugin – eg, unzip the archive in your wp-content/plugins/ directory on the server. Also, online demonstrations are all well and good, but that shouldn’t be where it ends. Finish the installation by contacting your Member of Parliament.

This is my first WordPress plugin, so it may be sub-optimal (or even contain bugs !). I’ve put the Internet Blackout plugin source on Github so that programmer-types can fix it, if need be.

2009 – the posts that never made it

Andrew Perry — Sat, 02 Jan 2010 11:57:23 +0000

So, people tell me 2009 ended recently. Apparently there were fireworks and stuff. This blog has seen very little action during 2009, despite my various good intentions for a blog ‘reboot’ (à la Pawel).

Like many of my online friends, I blame FriendFeed. I find commenting on a FriendFeed post a much more productive way of having a conversation around some new development sweeping the web than writing a dedicated blog post. Still, despite this being my “year of FriendFeed”, I started writing a few blog posts / articles / essays this year which never made it out of the Drafts folder. There is a positive side to unpublished drafts – they serve to nicely organize some thoughts, even if they are ultimately never shared. Anyhow, it’s time to clean them out and move on – and as part of that process – here are the highlights of my posts that never were.

“Why biohacking cannot come of age”

I wrote quite a long essay around the time that synthetic biology was getting lots of press, and just before DIYbio appeared on the scene (as a side note: the name “DIYbio” is PR genius – taking the ‘hacking’ out of biohacking to help avoid misinterpretation by the mass media was a smart move). The opening of this defunct post pretty much sums up its contention:

“A healthy biohacking ecosystem requires the participation of hobbyists, and will fail to flourish in the same way ‘Information Technology’ and ‘The Internet’ have flourished if participants remain confined to academic and commercial labs.”.

The old Silicon Valley example (myth?) of the two guys, both called Steve, launching technology from their garage was cited. I then went on to state the obvious – current regulatory frameworks surrounding recombinant DNA and genetic modification make most serious pursuits by hobbyists acting alone legally dubious. Ultimately, I chickened out and decided it was better left unpublished, but a highly modified version may emerge one day. Key links:

The case of Professor Steven Kurtz
“The bio-security framework is going to collapse. — Drew Endy“
Good chemistry kits are hard to buy these days
The “precautionary principle”

IceCondor – continuous location tracking

Around the end of 2008 when I was momentarily in employment limbo, I began to write an Android mobile geolocation app and started playing with Don Park’s IceCondor. I decided to highlight it with a blog post, but never got around to ultimately publishing it. Essentially, IceCondor is/was a location sharing app, but unlike BrightKite, FireEagle, Google Latitude, Foursquare (& Twitter, these days), IceCondor does continuous location tracking. For example, your GPS location can be shared every 30 seconds via 3G on your Android device (although high frequency updates eat the battery quickly, so lower frequency updates are more practical). IceCondor (initially) didn’t include any privacy settings – all locations were openly shared online, with individuals identifiable via their OpenID. As far as I could tell, the only two individuals that gave it any significant use were Don Park, and myself. My main point for writing about IceCondor was to argue that wilfully sharing your location in realtime and opting out of some privacy may actually be safer than not sharing your location. I believe that for most people there is more chance of being randomly mugged than actively stalked, so letting people know where you are is a Good Thing(tm). Don has since changed the focus of IceCondor (at least in the version on the Android Market) to be a simple GeoRSS reader. I get the impression that he is working on other things these days, but the original software and its potential uses are pretty cool – it lives on at GitHub, and I notice he has been poking at it again recently.

(Re)-discovering Pymol

I get a little sad thinking about this particular post. I’d planned to write about some lesser known functions of Pymol that I had recently discovered (namely the -p, -R and -G commandline options), but never got round to investigating them thoroughly enough to warrant a blog post. Some time after starting the draft and then leaving it to languish, the author of Pymol, Warren DeLano, tragically passed away at the age of 37. I never met Warren, but I was a grateful user of his amazing software, and I wish his family well over what must have been a difficult festive season without him.

Protein sequence clustering tools

I planned to write an article comparing protein sequence clustering tools. I still might, but here is the unannotated list so far:

CLANS
Blastclust
CD-HIT
MCL / TribeMCL ( http://micans.org/mcl/ )
an excellent list of sequence clustering tools on Wikipedia

Spam as an indicator of social network success ?

Surely there are already multiple essays on this topic by social media and internet culture enthusiasts. I’ve only searched briefly. The idea for this post was stimulated by some advertising that was sent to me via my delicious inbox (On an unrelated note: 2009 was the year I moved to Diigo for social bookmarking). This spam wasn’t as indiscriminate as the usual “enlarge your whatever” you expect by email, but some fairly niche advertising for cheminformatics software … while probably not spam in the strictest sense, it was nonetheless “spammish” in nature since numerous others were also targeted (via delicious “for:” tags). Neil Saunders also noted that he had seen some spam on Slideshare. Key ideas:

Is spam an indicator of social network self-sustainability, ‘viral growth’ or ‘critical mass’ ?
or is it an indicator that ‘stationary phase’, the slowing of growth, has begun ?
Just as “the network interprets censorship as damage and routes around it“, does spam “interpret small networks as inviable, and avoid them” ?
How does this relate to the cost / reward – ie. cost of spamming vs. potential audience – see Economics of Spam.

Synthetic biology 4.0: reflections on the state of play

This is one I’d totally forgotten about until now, from late 2008, written shortly after I’d attended the Synthetic Biology 4.0 conference in Hong Kong. It contained the picture below, along with lots of opinion.

Gartner’s hype cycle

On re-reading it, I’ve decided to make some final changes and retro-publish it anyway. It’s not the most coherent article I’ve ever written, and some of my opinions have probably changed in the last 12 months, but I couldn’t bring myself to just trash it.

More thoughts on Biopython from a non-contributing shoegazer

This post was a little bit of a rant/analysis that probably better belongs on the Biopython development mailing list. It was started by Chris Lasher lamenting that academic researchers are rarely encouraged to work on tools like Biopython, and continued by summarizing various people’s ideas on why Bioperl still remains in dominant usage over Biopython. My main conclusion (if there was one) was that the Biopython team over the years has tended to do a good job of maintaining a high standard of quality by deprecating unused, undocumented and unit test-less code … but sometimes perfect has been the enemy of good. Plus, Bioperl had a head start.

The Golden ratio in molecular biology ?

This one has been sitting in Drafts since 2007. I really should just dump it, but the idea still appeals to me. The Golden ratio does appear in nature at the macroscopic level, so why not at the micro- or nano- scale ?

Here’s a choice quote from my notes that may explain why I haven’t yet finished this post:

I think one difficulty in searching for this type of stuff is that the Golden ratio is popular with those into “numerical mysticism”, so if PubMed gives you naught, you have to wade through a lot of kooky pseudoscience in the Google hits before you find the “real science”.

Maybe it will see the light of day in 2010, you never know.

Computation in a single cell … how many logic gates would fit ?

Well … you tell me

A proposal for encouraging user contributed annotations to Uniprot

Andrew Perry — Mon, 03 Aug 2009 09:21:56 +0000

Today I attended a presentation by Maria J Martin about Uniprot and various other EBI database services. At the end of the talk, someone asked something to the effect of “How about simplifying user submission of annotations / corrections” – they wanted something in addition to the current ‘free text’ feedback and comments forms, and wanted a way to easily suggest annotations in a structured way. There was some suggestion of wikis etc, and how this had been tried to some extent, but they hadn’t got it right yet.

Here is my take on an approach to user submitted content to Uniprot. Essentially users should be able to add/change annotations piecewise, directly via the standard Uniprot web page for each protein record. These changes would ‘go live’ immediately, but since a large part of the value in Uniprot lies in its curation by expert annotators, the interface would also provide a very clear separation between user-submitted ‘uncurated’ annotations and the current expertly curated annotations.

I’ve made some mockups of how some parts of the UI may look in my little fantasy world:

• User login box at top (eg, OpenID)
• A History tab at the top.
• User submitted changes tab.
• Maybe a “Discussion” tab, ala Wikipedia (not pictured).
• Each field, or block of related fields, would have an Add/edit button at the top right of the block. (I’ve chosen the Universal Edit Button as an example)

Afterthought: Maybe putting these features under tabs isn’t quite the best place, since the existing tabs are ‘actions’ that can be taken rather than ‘extra info’ to be viewed. This UI detail could certainly be refined.

This proposal has many wiki-like features (history, attribution, open editing, curation by trusted users and potentially page/section locking) but doesn’t really fit my definition of a wiki since the input format is not free-form wiki-text, but is instead constrained by the interface to enforce the submission of (mostly) structured data (eg, a traditional data entry into an HTML form, or in-line editing of fields).

Any authenticated user would be able to add or edit fields by clicking on the “Add/edit annotations” button associated with that block (see mockup above). They would then be sent to a page where they can click to edit a particular field (in this case a point mutation and associated change in function), or click “Add new” to add a new mutation field and fill out the details (I didn’t make a mockup picture for this .. use your imagination). They also must specify one of the standard “evidence codes” from a dropdown box for each change/addition, including the PMID of a publication if relevant. User submissions are automatically flagged with some type of ‘user submitted’ flag too, and a username. Homologs (from UniRef clusters) could also be listed here to remind the user that certain annotations might need to be propagated to other members of the same family, if required (otherwise the curators would do this part, when applicable, for the next Uniprot release). For all I know, Uniprot may already have an interface similar to this, already in use by their professional curators. In effect, I’d like to see the 37signals “One interface” dictum applied.

User submitted changes would not automatically go live on the main Uniprot record page, but could be seen by clicking the “User submitted” tab at the top. Alternatively, the user submitted annotations could be put at the bottom of the page, like most blog comments, but clearly differentiated from the curated data by colour and other visual cues. The REST API could be told to include/exclude uncurated user annotations in responses by an extra query flag in the request (eg &userannotations=true). Uniprot curators can periodically review user submitted annotations and integrate them into the official Uniprot release as they see fit.
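To make the query flag idea a little more concrete, here is a minimal Python sketch of how a client might switch the flag on or off for a request URL. To be clear, `userannotations` is only the parameter I’m proposing here, not something the Uniprot REST API actually accepts as far as I know, and the helper name and example URL are placeholders of my own.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical flag name from the proposal above; not a real Uniprot parameter.
FLAG = "userannotations"


def with_user_annotations(url, include=True):
    """Return url with the proposed flag set, replacing any existing value."""
    parts = urlsplit(url)
    # Keep every existing query parameter except an earlier copy of the flag.
    query = [(key, value)
             for key, value in parse_qsl(parts.query, keep_blank_values=True)
             if key != FLAG]
    query.append((FLAG, "true" if include else "false"))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

So a (placeholder) request such as `http://www.uniprot.org/uniprot/?query=kinase` would become `http://www.uniprot.org/uniprot/?query=kinase&userannotations=true`, and clients that only want curated data would pass `include=False` or simply leave the flag off.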

Under the History tab, the history of changes to that Uniprot record, both by user submitted changes and by Uniprot release would be available. This functionality is already mostly available under “Entry history->Complete history” at the bottom of the page, but user submitted annotations would also be included here with appropriate diff colouring (eg, coloured differently to curated changes, until they are officially accepted).

Providing user pages at a URL: http://www.uniprot.org/user/some_sensible_username with an associated RSS/ATOM feed would encourage participation by highlighting individual user contributions, and potentially allow a Wikipedia-like community of expert/fanatical annotators to emerge.

The Discussion tab would be used in much the same way Wikipedia Talk pages are – passive users, contributors and curators would be able to discuss the finer details of any submitted annotations. I’m of two minds about this one, since anyone who has read Wikipedia Talk pages knows things can get quite ugly there sometimes. On the other hand, the communication it allows would be important for building a community of annotators and helping clarify contributions.

PS: I’m a Uniprot fanboy. Can you tell ?

Occyd : tagging for locations

Andrew Perry — Sat, 14 Feb 2009 02:08:16 +0000

Those who have been watching may have noticed I quietly started developing an Android application in the last month or so. It’s still super-buggy and far from feature complete, but I thought it was time to announce it here (“release early, release often”). It’s not ready for real users yet, but developers may like to take a little look.

Occyd (-k d .. sounds like rockied or oggied) is an application for tagging geolocations, aimed at GPS-enabled network-connected devices. It currently consists of an Android client, and a server backend running on Google App Engine. The (evolving) API is simple enough that it should be easy to write clients (or servers) for various platforms. The idea is to enable people to tag locations on the surface of the planet with a list of keywords, just like they can tag web pages with delicious. They should also be able to search for tagged locations, based on tag(s), on distance from their current location and recency of the post.

Here’s one possible elevator pitch (for a very long, slow elevator ride):

“You are a member of a large bird watching club. Your members like to record where they have spotted various species, and use Occyd to share the locations at which they have sighted various birds. You are out in the park, when you spot the rare Orange Bellied Parrot. You pull out your Android phone, fire up the Occyd client which automatically knows your location via GPS, and tag that current location ‘orangebelliedparrot parrot birds’. You then decide to see if others have spotted parrots in the area. You search for ‘parrot’ in the Occyd client; a map appears showing the locations of all the other sightings tagged ‘parrot’ in your vicinity. You tweak the search settings to show only ‘parrot’ sightings within 100 metres and 14 days … on the map you see that your friend RobHill spotted an Orange Bellied Parrot here last week – looks like the numbers of this population are recovering !”
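The search in that pitch (by tag, by distance and by recency) could be sketched roughly as below. This is not the actual Occyd server code or API; the class, function and parameter names are made up for illustration, and I’ve assumed a plain great-circle (haversine) distance is good enough at these scales.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in metres


@dataclass
class TaggedLocation:
    """One tagged post (hypothetical structure, not the real Occyd model)."""
    lat: float
    lon: float
    tags: frozenset
    posted: datetime
    user: str


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def search(posts, tag, here, now, max_metres=100.0, max_days=14):
    """Posts carrying `tag`, within max_metres of `here`, newest first."""
    lat, lon = here
    cutoff = now - timedelta(days=max_days)
    hits = [p for p in posts
            if tag in p.tags
            and p.posted >= cutoff
            and haversine_m(lat, lon, p.lat, p.lon) <= max_metres]
    return sorted(hits, key=lambda p: p.posted, reverse=True)
```

A real server would want some kind of spatial index rather than scanning every post, but a linear filter like this is enough to show the three search criteria working together.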

Ponder for a bit, and I’m sure you can think up at least a handful of other great uses (tagging good fishing spots, favorite cafes, or maybe even sightings of parking inspectors).

As with any new project, there are lots more ideas than time to implement them (and I have a day job that doesn’t involve Occyd …). The Occyd Android client and Occyd GAE server source is currently available under the GPL v3 on GitHub, and I’m keeping all my documentation and notes on the Occyd Android client wiki provided at GitHub. Watch this space ….