ptigas blog

Down To Earth

admin — Sun, 11 May 2014 16:04:22 +0000

It was about a year ago when I first participated in NASA’s Space Apps Challenge. I went there with no expectations, but in the end I had great fun, met amazing people and, after all, I was dealing with NASA’s problems. A few weekends ago the 3rd International Space Apps Challenge took place, and this time the London event was hosted by the Dana Centre, a few steps away from the Science Museum.

I got to the venue quite late, so I missed most of the challenge presentations. However, we didn’t waste a minute and started working on the project immediately. This year we decided to follow a different approach. Instead of getting our hands dirty with the implementation, we spent significantly more time on brainstorming and concept development, putting the presentation as our top priority. This approach helped us organize our time significantly better, working only on the items that matter.

Anyway, we worked hard and smart, and our efforts were rewarded in the end. We managed to get the People’s Choice award in London and made the list of the top 25 projects selected as finalists for the global People’s Choice award. Other than that, you know, tons of fun, great people and celestial problems.

Bringing Space Data Down to Earth! @spaceapps People’s Choice http://t.co/OcLBUEsEG8 #spaceapps #downtoearthapp pic.twitter.com/J9HhlV8eVK

— beth beck (@bethbeck) April 30, 2014

The challenge

After the typical procrastination and “what about this project” moments, we finally decided to solve the asteroid visualization task, according to which we had to visualize a publicly available dataset about near-earth asteroids in a way that makes sense. That sounds pretty straightforward, right? Well, it’s easy to think of the trivial solution and start building fancy infographics that don’t actually mean anything to you. We had so many parameters in the dataset that we couldn’t even decide what to visualize. After a few discussions with people (thanks James Parr for the inspiring ideas), we thought that what actually matters is to make the data easily interpretable.

To mobilise resources for asteroid research, we will need to create empathy for the potential moral and economic hazard that these objects present

James Parr, Founder of Open Space Agency

Imagine you read that an asteroid has a diameter of 1.7 km and the minimum distance from earth is 5 AU. What does this tell you? I’ll bet nothing. What if I tell you instead that the distance is 120 times the distance from the Eiffel Tower to Big Ben, or that the size is 7 times the size of the London Eye (given that you’re a Londoner)? That’s much easier to visualize, right?

Our proposed solution creates such familiar analogies from open data about asteroids, using your location and an open collaborative human knowledge database (freebase.org). Using this database we can get buildings or monuments that are near you, assuming that they are familiar to you (most of the time they are). Then, we use the retrieved buildings to make analogies that matter to you.
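The analogy step can be sketched in a few lines of Python. This is only an illustration of the idea, not the code we wrote at the hackathon: the function name `as_multiples` is made up, the landmark lookup is replaced by a hard-coded dictionary, and the landmark sizes are placeholders rather than measured values.

```python
def as_multiples(value_m, references):
    """Express a length in metres as a multiple of a familiar reference.

    `references` maps a landmark name to its size in metres.
    """
    if not references:
        raise ValueError("need at least one reference landmark")
    # Prefer the largest landmark that is not bigger than the value, so the
    # multiple stays a small, readable number; otherwise use the smallest one.
    fitting = {name: size for name, size in references.items() if size <= value_m}
    pool = fitting or references
    pick = max if fitting else min
    name = pick(pool, key=pool.get)
    return name, value_m / pool[name]


# Placeholder sizes, for illustration only (not measured values).
landmarks = {"double-decker bus": 10.0, "clock tower": 100.0}
name, times = as_multiples(250.0, landmarks)
print("about %.1f times the size of a %s" % (times, name))
```

Picking the largest landmark that still fits is just one possible rule; any rule that keeps the multiple small enough to picture would do.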

What about the blast impact on earth though? How do you visualize this? Well, you can always visualize the crater of a fictional catastrophe scenario, but you can’t really empathize with that. What if I tell you that when asteroid X hits London 100k people will die, or that the GDP of the destroyed area will be 1.5 million pounds? That’s easier to understand. Again, using freebase, we retrieved details about population density and we do the maths for you to get an idea.
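“Doing the maths” can be as crude as multiplying the affected area by the population density. The sketch below assumes a circular blast area with a uniform density, which is a strong simplification; `people_in_blast_area` and its parameters are hypothetical names, not part of our app.

```python
import math


def people_in_blast_area(radius_km, density_per_km2):
    """Rough head count inside a circular area of uniform population density."""
    if radius_km < 0 or density_per_km2 < 0:
        raise ValueError("radius and density must be non-negative")
    area_km2 = math.pi * radius_km ** 2
    return area_km2 * density_per_km2


# Made-up inputs: a 1 km blast radius over 1000 people per square km.
print(round(people_in_blast_area(1.0, 1000.0)))
```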

Next plans

That was a proof of concept and our plan is to make more analogies. For example, something we wanted to show you is the probability of an impact event (e.g. one asteroid might hit earth with probability 0.000005, or the equivalent of lightning striking a deer in Alaska. You get the point.)

Maybe, though, the whole idea of visualizing with familiar analogies could be applied in other domains as well. I haven’t researched this yet, but it sounds like a cool idea.

Team

Paris Selinas, Panagiotis Tigas, Dionysia Mylonaki

Links

Project page in spaceappchallenge.org

Cave of Sounds Documentation

admin — Wed, 12 Mar 2014 01:34:03 +0000

Last year I worked on a project called Cave of Sounds, a residency hosted by Music Hack Space. For 10 months, 8 people worked together to create an interactive music installation inspired by primitive music interactions. The outcome was documented by the amazing Mind the Film. Check caveofsounds.com for more details.

23 and me

admin — Sat, 25 May 2013 20:26:57 +0000

Yesterday I received my results from 23 and me, after 6 weeks of waiting. Well, that was quite exciting for obvious reasons, but most importantly because I was able to download the raw data to my computer. Let me repeat. I was able to download the raw data to my computer. A 7MB text file.

You should be excited too. That’s a miracle. You know, we know about biology, nature and stuff, but having the algorithm that builds you (actually, DNA is a Turing-complete language [1]) in a downloadable, searchable format is something that wasn’t possible a few years ago. Actually, it wasn’t possible for billions of years.

Alright, keep your pants on though, because you don’t actually get your complete genome sequenced but only the SNPs (single nucleotide polymorphisms). Let me explain, since once again journalists screwed this up and managed to create a myth. So, what are SNPs? To put it simply, a SNP is a single change in your DNA, a mutation, an accident, and these changes are what make each one of us different. Around 10,000,000 of them are more than enough to enable huge variation in human phenotypes. After 23andme extract the SNPs of interest, they search hundreds of published results to see if your DNA mutation is correlated with any phenotype – disease, characteristic or even behaviour (smoking addiction).

So, let me give you an example. My report showed that I have an increased probability (compared to the average) of having type-2 diabetes. The evidence for this is a mutation in my TCF7L2 gene, which is on chromosome 10 (DNA consists of regions, the genes, which make the proteins). This single mutation at that specific position of the gene is the SNP rs7903146. You can actually see the position in the gene from here. The normal sequence should’ve been:

... TTTTAGATA [C]  TATATAATTTAATTGC ...

but mine got shitfaced and now it looks like this

... TTTTAGATA [TT] TATATAATTTAATTGC ...

Apparently, that single mutation can change the behaviour of a protein involved in cell signalling, which for unknown reasons increases the chances of diabetes [2]. You see, it’s almost impossible to figure out the phenotype or the effect of a mutation analytically (no really, there is a proof that protein folding is NP-complete [3]), so bioinformaticians use statistics to correlate phenotypes with patterns in DNA.
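Since the raw data is just a text file, looking up a SNP such as rs7903146 yourself takes only a few lines. The sketch below assumes the file is tab-separated with the columns rsid, chromosome, position and genotype, and that comment lines start with `#`; check the header of your own download before relying on it. The function name `find_genotype` and the sample rows (including the positions) are made up for illustration.

```python
def find_genotype(lines, rsid):
    """Return (chromosome, position, genotype) for `rsid`, or None if absent.

    `lines` is any iterable of text lines, e.g. an open file object.
    """
    for line in lines:
        # Skip the comment header and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 4 and fields[0] == rsid:
            return fields[1], int(fields[2]), fields[3]
    return None


# Made-up sample rows standing in for the real download.
sample = [
    "# rsid\tchromosome\tposition\tgenotype\n",
    "rs0000001\t1\t12345\tAA\n",
    "rs7903146\t10\t67890\tTT\n",
]
print(find_genotype(sample, "rs7903146"))
```

With a real download you would pass `open("your_raw_data.txt")` (a hypothetical file name) instead of the sample list.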

If you’re still not convinced about the awesomeness of bioinformatics, I have another example for you. Mutations happen to humans for several reasons, but most frequently due to errors during DNA copying. Also, offspring inherit mutations. And some of the mutations get inherited only from mothers – e.g. mutations that happen in mitochondrial DNA (mtDNA). Also, humans coming from Asia share some common SNPs, humans from Europe some others, etc. So, by checking “region specific” mutations in mitochondrial DNA you can actually track the origin or path of your ancestors on the side of your mother’s mother’s etc., although that doesn’t actually matter much since we are all African.

[1] On the computational power of insertion-deletion systems
[2] Variant of transcription factor 7-like 2 (TCF7L2) gene confers risk of type 2 diabetes.
[3] On the Complexity of Protein Folding

Hack The Barbican – Bazaar

admin — Wed, 22 May 2013 19:56:08 +0000

In August 2013 the Barbican will be taken over by London’s largest ever experiment in inter-disciplinary collaboration. Hack the Barbican is a month-long event exploring the boundaries of technology, the arts and entrepreneurship. The core of Hack the Barbican will be a series of residencies hosted in specially constructed studios and workshops spread throughout the public foyer spaces in which a range of practitioners will develop new work across disciplines. Alongside these residencies a constellation of workshops, performances and talks will showcase their creations and explore the heritage of inter-disciplinary thinking. A strand of entrepreneurial learning sessions will bring young people into the thick of the action and help them develop new skills and perspectives.

A few weekends ago, I spent almost 48 hours 4 floors below the Barbican’s ground level. Having visited this space for the first time, I must admit it’s currently one of my favorite places in London (this awkward complex of buildings, a masterpiece of architecture, was the vision of the future 30 years ago – proper retrofuturism).

The event, called the Bazaar, was about getting a taste of what is going to happen in August at Barbican and get to know each other through collaborative activities. The main concept of the Bazaar was to conceptualize and implement an idea within 48 hours and then present it, but most importantly have fun.

My participation was with Music Hack Space, where we presented for the first time a first draft of our installation, “Cave Of Sounds”, a project led by Tim Murray-Browne. For this project, 7 people have collaborated for the last 6 months to create an installation consisting of DIY instruments, inspired by primitive cave music.

Resolutions for 2013

admin — Sat, 05 Jan 2013 13:43:47 +0000

In the last two years my life changed significantly. After being a student for the last 18 years of my life (6 in university, 12 in school), I moved to London to start my post-studies life. In short, here is a list:

Got my MSc in Machine Learning and Data Mining, from university of Bristol.
Became financially independent.
Moved to London.
Worked for an awesome startup, buyometric, for 1 year.
Started working for Microsoft.
I watched Herbie Hancock live.
Participated in Music Hack Day. A goal I had for the last few years now has a huge tick symbol next to it.
Started participating in Music Hack Space. Amazing space. Amazing people.

new year resolution

I can summarise my goals for the following year in one sentence: become better in my areas of expertise and invest more time in creativity.

Make writing a habit.
Prepare / search for a PhD.
Start playing the guitar again. I feel guilty not having played for many years.
Start writing music more seriously – Finish at least 4 tracks.
Write at least two conference papers.
Become a better scientist and exercise math and compsci skills as much as possible. That should be expanded to write a review of an article / publication / new knowledge at least every week.
Participate in Hackspace Ensemble residency project
Improve my thesis project (Real time jam session support system) and present / publish it. Give a talk at Music Hack Space about it.
Travel and have great vacations. I haven’t had proper vacations for the last 3 years. Need some serious inspiration.
Wake up earlier.
Run a marathon (I can see myself already failing on that :p)
Listen to more music.
Finally, those four rules from John Maeda http://creativeleadership.com/2012/11/20/my-four-rules-1999/

Happy New Year to all!

YouBox – Our hack for London’s Music Hack Day (2012)

admin — Wed, 28 Nov 2012 20:00:56 +0000

One thing I had on my todo list for several years was participating in Music Hack Day. Finally, this year I made it. I was lucky enough to book my ticket during the 5 minute window of the second round (the first time it sold out in 2 hours). With my team (Tijl De Bie and Raul Santos-Rodriguez) we developed YouBox, an app for the gamification of the party experience.

The main feature list contains a mobile app where people can request tracks but also vote and give implicit and explicit feedback to the system. A “brain” then tries to recognise the best DJ of the party and gives priority to his suggestions. Eventually, when the party finishes, the system awards people according to their DJing and dancing performance (we take accelerometer measurements).

The whole experience was amazing. I met and collaborated with interesting people, had fun coding non-stop for 24 hours and bootstrapped a project. Did I also mention that our hack got featured in Wired’s top 10 amazing hacks from Music Hack Day?

In detail our hack consists of the following three components.

Party host:

The party host creates a new party on Spotify, generating a unique party code (QR or text), and specifying the type of party (club / home party / bar / lounge). He can optionally choose to add one or several bots who continuously request tracks of a certain style. Optionally, the party host could award a “YouBot party DJ award” and / or a “YouBot party animal award” for the best scoring requester / dancer.

Guests:

Guests check into the YouBot phone app with the code, allowing them to:

[Consumer-mode] see the currently playing track on their phone, and give explicit like / dislike feedback. (Their dancing activity, measured as energy of the accelerometer signal around the bpm frequency, is recorded as implicit feedback.)
[DJ-mode] request tracks during the party.
[game-mode] see a leaderboard with their feedback score and dancing score, and the top-scorers.
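The implicit dancing feedback mentioned in consumer mode can be approximated without an FFT library by evaluating the signal power at a few frequencies around the beat with the Goertzel algorithm. This is a sketch of the idea rather than the code from the hack: the function names, the 10% band width and the number of probe frequencies are all assumptions.

```python
import math


def goertzel_power(samples, sample_rate, freq):
    """Power of `samples` at a single frequency (Goertzel algorithm)."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / sample_rate)
    s1 = s2 = 0.0
    for x in samples:
        s0 = x + coeff * s1 - s2
        s2, s1 = s1, s0
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def dance_energy(samples, sample_rate, bpm, rel_width=0.1, steps=5):
    """Sum the power at `steps` frequencies within +/- rel_width of the beat."""
    if not samples:
        return 0.0
    # Remove the constant offset (e.g. gravity) before measuring.
    mean = sum(samples) / len(samples)
    centered = [x - mean for x in samples]
    beat_hz = bpm / 60.0
    lo, hi = beat_hz * (1 - rel_width), beat_hz * (1 + rel_width)
    freqs = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return sum(goertzel_power(centered, sample_rate, f) for f in freqs)


# Synthetic accelerometer trace: someone bouncing at 2 Hz (120 bpm),
# sampled at 50 Hz for 5 seconds.
signal = [math.sin(2 * math.pi * 2.0 * n / 50.0) for n in range(250)]
on_beat = dance_energy(signal, 50.0, 120)
off_beat = dance_energy(signal, 50.0, 180)
print(on_beat > off_beat)
```

A dancer moving in time with a 120 bpm track scores far higher at 120 bpm than at 180 bpm, which is the kind of signal a leaderboard could rank on.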

YouBot:

YouBot randomly selects the next track from the set of requests, biased in several ways (implicit / explicit feedback score of the requester, number of requests made by the requester, but also how recent the request is and its similarity to the previous track). The type of party determines the default biases (e.g. implicit feedback from dancing is less important in a bar than in a club). In this way YouBot initially explores which requesters get good feedback and soon starts exploiting the knowledge of who understands best what keeps the party going.
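A biased random pick like this can be sketched with `random.choices`, where each request gets a weight built from its features and the party-type biases. The feature names (`feedback`, `recency`), the linear weighting and the function names are assumptions made for illustration; the actual scoring in the hack may have differed.

```python
import random


def request_weight(request, biases):
    """Linear score: each feature of the request times the bias for it."""
    return sum(biases[name] * request[name] for name in biases)


def pick_next(requests, biases, rng=random):
    """Pick one request at random, with probability proportional to its weight."""
    weights = [max(request_weight(r, biases), 0.0) for r in requests]
    return rng.choices(requests, weights=weights, k=1)[0]


# Hypothetical requests and "club" biases.
requests = [
    {"track": "a", "feedback": 0.0, "recency": 0.0},
    {"track": "b", "feedback": 1.0, "recency": 0.5},
]
club_biases = {"feedback": 2.0, "recency": 1.0}
print(pick_next(requests, club_biases)["track"])
```

Because the choice stays random, requesters with little feedback so far still get picked occasionally (as long as their weight is above zero), which is what lets the system explore before it exploits.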

If you are interested, you can find more details at youbox.fm.

Evolution of Ovibos Moschatus

admin — Sun, 01 Jul 2012 10:32:07 +0000

A couple of years ago, I returned home, a bit drunk, and started randomly clicking on wikipedia articles. I ended up reading about the muskox.
My first response was surprise and excitement at its severe ugliness. Several years later, during my MSc studies, I attended Computational Biology, and part of this course was to do a project on an animal of our choice, ask a question about it and answer it using data. So, I decided to choose the “muskox” and the question to ask was, “where does all this ugliness come from?”. The study is a bit technical but the answer is very interesting.

Source code of this study can be found in my github. The code is in MATLAB and depends on bioinformatics toolbox.

Introduction and Data description

This report investigates the relation of Ovibos moschatus (muskox) to several animals of the Bovidae family. Due to morphological similarities with Budorcas taxicolor (takin), I tested the hypothesis that those two animals have diverged from a recent common ancestor. Also, I compared the muskox with the sheep and the takin to see which one it is closer to. Finally, I answered the question of where the muskox originally came from.

Ovibos moschatus is an Arctic mammal which belongs to the Bovidae family, Caprinae subfamily. Although it primarily lives in the Canadian Arctic and Greenland, it may also be found in Sweden, Siberia and Norway. Several studies (including this one) show that the muskox is closer to sheep and goats than to Budorcas taxicolor (takin), a Bovidae mammal living in the Eastern Himalayas, in contradiction with their morphology (e.g. size).

Other animals which were compared with the muskox are the common sheep, long-tailed goral, Japanese serow and Taiwan serow. Those animals are located in East Asia and, as we will see, they are the closest animals to the muskox. Finally, I compared with Gallus gallus (chicken), which I used as an outgroup. Figure 1 shows some basic statistics about the mitochondrial DNA of the investigated species. The base count does not reveal any significant difference among the species, except for the chicken, which is the outgroup.

species	common name	accession no.	%A	%C	%G	%T	length
Ovibos moschatus	muskox	FJ207536	33.3698	26.5291	13.2554	26.7787	16431
Budorcas taxicolor	takin	NC_013069	33.8573	26.3095	12.8577	26.9755	16667
Ovis aries	sheep	NC_001941	33.6663	25.8125	13.1259	27.3953	16616
Capricornis crispus	Japanese serow	NC_012096	33.7446	26.7064	13.0067	26.5423	16453
Naemorhedus caudatus	Long-tailed goral	NC_013751	33.6340	25.9580	13.1243	27.2656	16519
Naemorhedus swinhoei	Taiwan serow	NC_010640	33.3999	26.6521	13.3442	26.6037	16524
Gallus gallus	chicken	NC_007236	30.2472	32.4873	13.5240	23.7414	16785

figure 1. Data Description
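The base percentages in figure 1 are simple to reproduce. The study itself used MATLAB with the bioinformatics toolbox; the snippet below is a plain Python stand-in, and the name `base_composition` and the toy sequence are made up for illustration.

```python
from collections import Counter


def base_composition(seq):
    """Return the percentage of A, C, G and T in a nucleotide sequence.

    Characters other than A, C, G, T (e.g. N for unknown) are ignored.
    """
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {base: 100.0 * counts[base] / total for base in "ACGT"}


# Toy sequence, not real mitochondrial DNA.
print(base_composition("AACGTTTT"))
```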

Data description

All genomes were extracted using the NCBI/Genbank database and taxonomy browser. Proteins were also extracted from the NCBI website. I used the positions given on the NCBI website to extract the genes coded in the mitochondrial genome. Also, I fetched the amino acid sequences directly from the NCBI database instead of translating them. The reason was that there were nucleotide sequences with errors (missing or unknown values) which had been corrected in the amino acid sequences.

For reasons I will describe later, I used the proteins Cytochrome B (CYTB) and Cytochrome C oxidase subunit 1 (COX1), which are commonly used in phylogenetic analyses.

Methods

I compared the nucleotide and amino acid sequences of the CYTB and COX1 proteins. I used both kinds of sequence because I wanted to measure two different mutation rates: the non-synonymous (N) substitution rate and the synonymous (S) substitution rate. (An N substitution is a mutation at the nucleotide level which results in a mutation at the amino acid level, whereas an S substitution is a mutation which doesn’t affect the amino acids.)
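To make the N / S distinction concrete, here is a small Python sketch that walks two aligned coding sequences codon by codon and counts how many differing codons change the amino acid. It assumes the vertebrate mitochondrial genetic code (NCBI translation table 2), since CYTB and COX1 are mitochondrial genes. It only counts differing codons; it is not a proper rate estimate, and it is not the MATLAB code used in the study. The function name is made up.

```python
BASES = "TCAG"
# NCBI translation table 2 (vertebrate mitochondrial), codons in the order
# TTT, TTC, TTA, TTG, TCT, ... ; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def count_substitutions(seq_a, seq_b):
    """Return (synonymous, non_synonymous) counts of differing codons."""
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be aligned, in frame, and equal length")
    syn = nonsyn = 0
    for i in range(0, len(seq_a), 3):
        codon_a, codon_b = seq_a[i:i + 3].upper(), seq_b[i:i + 3].upper()
        if codon_a == codon_b:
            continue
        # Skip codons with gaps or ambiguous bases.
        if codon_a not in CODON_TABLE or codon_b not in CODON_TABLE:
            continue
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            syn += 1
        else:
            nonsyn += 1
    return syn, nonsyn


# ATT -> ATC keeps isoleucine (S); CTT -> CCT turns leucine into proline (N).
print(count_substitutions("ATTCTT", "ATCCCT"))
```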

One reason I chose the proteins CYTB and COX1 was the fact that they are coded in mitochondrial DNA. Mitochondrial DNA is very useful for studying evolution since it passes from mother to offspring without recombination. Thus, proteins coded in mitochondrial DNA have the same property. What is more, those two proteins mutate at different rates, and different codons in the same protein mutate at different rates, properties which are very useful for phylogenetic analysis.

Cytochrome B
Cytochrome B is known for its fast mutation rate and is very commonly used in the literature for studying evolution, even within the same subfamily. It contains both slowly and rapidly evolving codon positions, as well as more conservative and more variable regions or domains overall. However, many problems have been encountered when using CYTB, including base compositional biases, rate variation between lineages, etc. [FOS+01].

Cytochrome C oxidase subunit 1
Cytochrome C is an essential and ubiquitous protein found in all organisms, including eukaryotes and bacteria (Voet and Voet 1995, p. 24). It is responsible for transporting electrons in the fundamental metabolic process of oxidative phosphorylation.

There is no standard answer as to which of the two proteins is better for phylogenetic analysis. [TKL09] suggests that cytochrome b will offer richer information for mammals; however, I used both proteins so as to compare the trees produced.

Phylogenetic tree
Neighbor-joining was used to generate the phylogenetic trees. This choice was made because neighbor-joining does not assume that all lineages evolve at the same rate (i.e. it does not rely on the molecular clock hypothesis). Although this method might not give us the correct tree, given data of sufficient length neighbor-joining will reconstruct the true tree with high probability. Also, so as to simplify our experiment, distances were calculated using maximum likelihood estimation (Jukes-Cantor), which assigns equal probability to every possible change of state for a given nucleotide base or amino acid.
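For reference, the Jukes-Cantor correction turns the observed proportion of differing sites p into an evolutionary distance d = -b ln(1 - p / b), with b = 3/4 for nucleotides and b = 19/20 for amino acids. The study used MATLAB's seqpdist for this; the Python function below is only a sketch of the formula, with a made-up name and an assumed `states` parameter.

```python
import math


def jukes_cantor(p, states=4):
    """Jukes-Cantor distance for an observed proportion `p` of differing sites.

    Use states=4 for nucleotides and states=20 for amino acids.
    """
    b = (states - 1) / states
    if not 0 <= p < b:
        raise ValueError("p must satisfy 0 <= p < %.2f" % b)
    return -b * math.log(1 - p / b)


# 10% of the nucleotide sites differ between two aligned sequences.
print(jukes_cantor(0.1))
```

The corrected distance is always a little larger than p, because some sites will have mutated more than once and only one change is visible.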

BLAST
The introduction of NCBI’s BLAST, or The Basic Local Alignment Search Tool, in 1990 made it easier to rapidly scan huge databases for overt homologies, or sequence similarity, and to statistically evaluate the resulting matches. BLAST works by comparing a sequence against the database of all known sequences to determine likely matches. The BLAST server compares the user’s sequence with up to a million known sequences and determines the closest matches ³.

Results

Analysis of BLAST results
I used BLAST to find similar animals to muskox and the protein used was Cytochrome B. The animals I found were Capra aegagrus (wild goat), Capricornis sumatraensis (Sumatran serow), Capricornis crispus (Japanese serow), Naemorhedus caudatus (Long-tailed goral) and Naemorhedus swinhoei (Taiwan serow). In figure 3 you can see the phylogenetic tree constructed using neighbor-joining and maximum likelihood estimation.

The phylogenetic analysis (fig. 3) showed that Capra aegagrus (wild goat) was the closest animal to Ovibos moschatus (muskox); the two seem to have diverged from the same ancestor. Also, as I expected from their names, Naemorhedus caudatus and Naemorhedus swinhoei formed a group which diverged from Capricornis sumatraensis. I noticed that the branch of Naemorhedus caudatus is longer than that of Naemorhedus swinhoei, which reveals that Naemorhedus caudatus evolved more after its divergence from Naemorhedus swinhoei. Finally, I observed that Naemorhedus and Capricornis sumatraensis are closely related to Capricornis crispus.

	Ovibos moschatus	Budorcas taxicolor	Ovis aries	Capricornis crispus	Naemorhedus caudatus	Naemorhedus swinhoei	Gallus gallus
Ovibos moschatus	0	0.0597	0.0541	0.0240	0.0486	0.0240	0.3191
Budorcas taxicolor	0.0137	0	0.0739	0.0569	0.0710	0.0625	0.3154
Ovis aries	0.0058	0.0078	0	0.0514	0.0597	0.0486	0.3377
Capricornis crispus	0.0058	0.0137	0.0078	0	0.0431	0.0159	0.3118
Naemorhedus caudatus	0.0098	0.0137	0.0078	0.0078	0	0.0376	0.3377
Naemorhedus swinhoei	0.0039	0.0117	0.0058	0.0019	0.0058	0	0.3191
Gallus gallus	0.1243	0.1243	0.1199	0.1243	0.1265	0.1221	0

figure 2. Pairwise distances (seqpdist with Jukes-Cantor). Above the diagonal is for CYTB and below the diagonal is for COX1

figure 3. CYTB phylogenetic tree (AA) for the BLASTed animals [phytree]

	Ovibos moschatus	Budorcas taxicolor	Ovis aries	Capricornis crispus	Naemorhedus caudatus	Naemorhedus swinhoei	Gallus gallus
Ovibos moschatus	0.00	12.54	12.98	7.89	9.65	8.60	8.60
Budorcas taxicolor	11.00	0.00	12.89	12.46	13.16	12.98	12.98
Ovis aries	10.23	10.36	0.00	11.32	13.25	11.67	11.67
Capricornis crispus	8.09	10.10	10.49	0.00	8.68	5.96	5.96
Naemorhedus caudatus	8.67	9.45	9.64	6.80	0.00	9.47	9.47
Naemorhedus swinhoei	8.03	10.16	10.68	3.56	7.18	0.00	0.00
Gallus gallus	22.05	23.15	22.57	21.60	22.95	22.31	22.31

figure 4. local alignment % score for each pair (nucleotide sequences). Above the diagonal is for CYTB and below the diagonal is for COX1

	Ovibos moschatus	Budorcas taxicolor	Ovis aries	Capricornis crispus	Naemorhedus caudatus	Naemorhedus swinhoei	Gallus gallus
Ovibos moschatus	0.00	6.15	5.56	2.43	4.97	2.43	2.43
Budorcas taxicolor	1.38	0.00	7.65	5.85	7.34	6.44	6.44
Ovis aries	0.59	0.78	0.00	5.26	6.15	4.97	4.97
Capricornis crispus	0.59	1.38	0.78	0.00	4.40	1.60	1.60
Naemorhedus caudatus	0.98	1.38	0.78	0.78	0.00	3.83	3.83
Naemorhedus swinhoei	0.39	1.18	0.59	0.19	0.59	0.00	0.00
Gallus gallus (chicken)	13.25	13.25	12.75	13.25	13.50	13.00	13.00

figure 5. local alignment % score for each pair (amino acid sequences). Above the diagonal is for CYTB and below the diagonal is for COX1

	Ovibos moschatus	Budorcas taxicolor	Ovis aries	Capricornis crispus	Naemorhedus caudatus	Naemorhedus swinhoei	Gallus gallus
Ovibos moschatus	0	143	148	90	110	98	98
Budorcas taxicolor	170	0	147	142	150	148	148
Ovis aries	158	160	0	129	151	133	133
Capricornis crispus	125	156	162	0	99	68	68
Naemorhedus caudatus	134	146	149	105	0	108	108
Naemorhedus swinhoei	124	157	165	55	111	0	0
Gallus gallus	342	359	350	335	356	346	346

figure 6. number of polymorphic sites per pair. Above the diagonal is for CYTB and below the diagonal is for COX1

figure 7. phylogenetic tree based on cytochrome b

figure 8. phylogenetic tree based on Cytochrome C oxidase subunit 1

figure 9. AA and NT consensus phylogenetic trees

Phylogenetic analysis
In my analysis I used the CYTB and COX1 proteins, which I selected for their fast evolution rate. I noticed that the amino acid and nucleotide phylogenetic trees had slightly different results.
This is explained by the fact that mutations at the nucleotide level accumulate faster than those at the amino acid level (N substitutions and S substitutions ⁴). One way to explain this is that more than one nucleotide codon is translated to the same amino acid. Thus, nucleotide mutations don’t always result in amino acid mutations.
What is more, I found that there were differences between the CYTB and COX1 phylogenetic trees. That was because those proteins evolve at different rates. As you can see in figure 6, the absolute number of polymorphic sites in COX1 is greater on average than in CYTB, which at first sight suggests that COX1 evolves faster than CYTB in our animals.

Since there were several small differences between the CYTB(AA), COX1(AA), CYTB(NT) and COX1(NT) phylogenetic trees, I constructed consensus trees: one for amino acids and another for nucleotides (figure 9). The reason was that I wanted to summarise the results from the different proteins. As you can see, the consensus trees don’t agree on the evolution of Ovibos moschatus and Naemorhedus caudatus. That might be explained by the different rate of nucleotide mutations. However, both trees clearly show that muskox and Ovis aries (common sheep) diverged from a recent ancestor, and that both diverged from Budorcas taxicolor (takin), a result that contradicts our hypothesis that the takin is closer to the muskox than the sheep is. Finally, we notice that takin is closer to sheep, making them more related than sheep and muskox. Similar results can be found in the literature [GS97].

Polymorphic sites
Figure 5 and figure 6 allow some interesting observations. I locally aligned each pair of CYTB and COX1 nt sequences. In figure 6 you can see the number of polymorphic sites per pair. The number of polymorphic sites in COX1 is greater for all pairs compared to CYTB. However, after comparing the percentage of polymorphic sites I noticed that the COX1 percentages are smaller than the CYTB ones for each pair. Also, similar results can be observed in the amino acid sequence alignments, which means that the substitutions happening are mainly non-synonymous. This shows that, in the animals of our family of interest, the CYTB protein mutates at a higher rate than COX1 and thus CYTB is more appropriate for phylogenetic analysis.
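The difference between the raw count and the percentage is easy to see in code: the count grows with the length of the gene, the percentage does not. The sketch below counts differing positions between two already-aligned sequences, skipping gap columns; the function name and the `gap` parameter are assumptions, and the study itself did this in MATLAB.

```python
def polymorphic_sites(aligned_a, aligned_b, gap="-"):
    """Return (number of differing sites, percentage of compared sites)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = diffs = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        # Columns with a gap in either sequence are not compared.
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            diffs += 1
    percent = 100.0 * diffs / compared if compared else 0.0
    return diffs, percent


# Toy alignment: one mismatch in five compared columns.
print(polymorphic_sites("ACGT-A", "ACCTTA"))
```

For the muskox / takin pair, for example, figures 4 and 6 give 143 sites (12.54%) for CYTB against 170 sites (11.00%) for COX1: more differing sites in COX1, but a smaller share of a longer sequence.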

Conclusions

As you can see, the appearance and morphology of species don’t show the correct evolutionary history. Also, the statistics of the sequence (percentage of each base) don’t contain much information. I observed that comparing proteins at both the amino acid and the nucleotide level is a more appropriate method for investigating the relations between species. In my report I showed that Ovibos moschatus, although very similar to the takin, is closer to sheep. Similar results can be found in [GS96]. They showed that the similarity in appearance between muskox and takin is due to convergent evolution ⁵, during which similarities in appearance arise among species although they have diverged from different ancestors.

Also, we asked about the area of origin of the muskox. BLAST results for the cytochrome B protein showed that animals from central Asia were the most closely related, which is an indication that the area of origin was there. In the literature there are studies which show that the earliest ancestors of the modern muskox evolved in southern central Asia during the late Miocene, more than ten million years ago. During the Pleistocene (1.8 million – 11 500 years ago) the muskox spread from Asia over the northern world [Len99].

http://en.wikipedia.org/wiki/Bovid ↩
http://en.wikipedia.org/wiki/Molecular_clock ↩
http://www.ncbi.nlm.nih.gov/About/primer/phylo.html↩
An N (non-synonymous) substitution is a nucleotide mutation which results in a change in the amino acid sequence, whereas an S (synonymous) substitution is a mutation which doesn’t result in an amino acid change.↩
http://en.wikipedia.org/wiki/Convergent_evolution ↩
[FOS+01] I P Farias, G Ortí, I Sampaio, H Schneider, and A Meyer. The cytochrome b gene as a phylogenetic marker: the limits of resolution for analyzing relationships among cichlid fishes. Journal of Molecular Evolution, 53(2):89–103, August 2001.↩
[TKL09] Shanan S. Tobe, Andrew Kitchener, and Adrian Linacre. Cytochrome b or cytochrome c oxidase subunit I for mammalian species identification.↩
[GS96] P Groves and G F Shields. Phylogenetics of the Caprinae based on cytochrome b sequence. Molecular Phylogenetics and Evolution, 5(3):467–76, June 1996.↩
[GS97] P Groves and G F Shields. Cytochrome B sequences suggest convergent evolution of the Asian takin and Arctic muskox. Molecular Phylogenetics and Evolution, 8(3):363–74, December 1997.↩
[Len99] P.C. Lent. Muskoxen and their hunters: A history. Univ of Oklahoma Pr, 1999.↩

MSc thesis poster and complete text

admin — Thu, 12 Apr 2012 22:07:10 +0000

My initial thought was to start writing posts so as to explain and document my thesis. However as a first step I decided to post here the thesis poster and upload the complete text of my thesis on arxiv.

Voila!

http://arxiv.org/abs/1201.6251

and

Memento decorator in python and fibonacci sequence

admin — Tue, 06 Mar 2012 20:17:37 +0000

Computing fibonacci with the naive method takes exponential time. However, it’s known that by using dynamic programming you can compute fibonacci in linear time. The trick is to check if you have already computed the value of fib(n), and if yes return the already-computed value instead of calling the function again (which is expensive).

Here is the naive implementation in python :

def fib( n ) :
	if n <= 1 :
		return 1
	return fib(n-1) + fib(n-2)

Calculating the fibonacci sequence from 1 to 35 takes about 8 seconds (8.056s). So, let's see how we can improve this by using decorators.

The Memento decorator will allow us to save time by not recomputing a call we've already made (the technique is commonly known as memoization).

Here is an implementation of Memento decorator:

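class Memento :
	def __init__(self, f) :
		self.f = f
		# one cache per decorated function, so that two decorated
		# functions never return each other's results
		self.mem = {}

	def __call__(self, arg) :
		# compute the value only the first time we see this argument
		if arg not in self.mem :
			self.mem[arg] = self.f(arg)
		return self.mem[arg]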

Let's see what it does. The __init__ function is pretty obvious. It assigns the function f to an object variable. The __call__ function is triggered every time the decorated function is called. By using a dictionary we check and return the result of a function call with the same argument (if we've already computed that). The cost of querying the dictionary is almost O(1).

By just applying this decorator to our function

@Memento
def fib( n ) :
	if n <= 1 :
		return 1
	return fib(n-1) + fib(n-2)

we manage to compute the first 1000 fibonacci numbers in less than a second.
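For comparison, Python 3 ships the same idea in the standard library as `functools.lru_cache`. The sketch below is an alternative to the hand-written decorator, not a replacement for the gist, and it keeps this post's convention that fib(0) and fib(1) are both 1.

```python
from functools import lru_cache


@lru_cache(maxsize=None)  # unbounded cache, like the dictionary above
def fib(n):
    if n <= 1:
        return 1
    return fib(n - 1) + fib(n - 2)


# Fill the cache bottom-up so the recursion never gets deep.
first_1000 = [fib(n) for n in range(1000)]
print(fib(35))
```

Computing the values in increasing order matters: asking for fib(999) first on an empty cache could recurse deep enough to hit Python's recursion limit.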

Here is a gist of this example https://gist.github.com/1988734 .

Loop detection in O(n) using Floyd’s algorithm

admin — Sat, 03 Mar 2012 21:32:29 +0000

Given a linked list find whether the list contains a loop or not.

There are several solutions to this, but the following is considered one of the best. Also, it’s dead simple. The algorithm is also known as the “Tortoise and Hare Algorithm”.

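def toirtoise_and_hare(l) :
  # both pointers start at the head; on every step the hare moves
  # two nodes and the tortoise moves one
  tortoise = l.head
  hare = l.head
  steps = 0
  while True:
    # if the hare runs off the end of the list, there is no loop
    if hare is None :
      return (steps, 'No loop found')
    hare = hare.next
    if hare is None :
      return (steps, 'No loop found')
    hare = hare.next
    tortoise = tortoise.next
    # inside a loop the hare eventually catches up with the tortoise
    if hare is tortoise:
      return (steps, 'Loop found')
    steps += 1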

Below you can find a simple implementation of linked list in python

class Node :
  def __init__(self, data) :
    self.next = None
    self.data = data

  def __str__(self) :
    return str(self.data)

class LinkedList :
  def __init__(self) :
    self.head = None

  def add_node(self, data):
    if self.head == None :
      self.head = Node(data)
    else :
      tmp = Node(data)
      tmp.next = self.head
      self.head = tmp

  def add_loop(self, f, t) :
    node = self.head
    i = 0
    node_f = None
    node_t = None
    while node != None :
      i += 1
      if i == f :
        node_f = node
      if i == t :
        node_t = node
      node = node.next
    node_f.next = node_t

  def print_list(self):
    node = self.head
    while node != None :
      print(node.data)
      node = node.next

… and test …

l = LinkedList()
for i in range(1000) :
  l.add_node(i)

l.add_loop(800, 200)

print(toirtoise_and_hare(l))

As mentioned, the complexity of the algorithm is O(n).
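Floyd's algorithm can also tell you where the loop begins: once the two pointers meet, move one of them back to the head and advance both one node at a time; they meet again at the first node of the loop. The self-contained Python 3 sketch below shows this second phase. The helper names `find_loop_start` and `build_chain` are made up for this example and are not part of the gist.

```python
class Node:
    def __init__(self, data):
        self.data = data
        self.next = None


def build_chain(n):
    """Return a list of n nodes linked in order, without a loop."""
    nodes = [Node(i) for i in range(n)]
    for current, following in zip(nodes, nodes[1:]):
        current.next = following
    return nodes


def find_loop_start(head):
    """Return the first node of the loop, or None if the list has no loop."""
    tortoise = hare = head
    # Phase 1: the hare moves two nodes per step, the tortoise one.
    while hare is not None and hare.next is not None:
        tortoise = tortoise.next
        hare = hare.next.next
        if tortoise is hare:
            break
    else:
        return None  # the hare reached the end of the list
    # Phase 2: restart the tortoise; both now move one node per step.
    tortoise = head
    while tortoise is not hare:
        tortoise = tortoise.next
        hare = hare.next
    return tortoise


nodes = build_chain(10)
nodes[9].next = nodes[3]  # close the loop: 9 -> 3
print(find_loop_start(nodes[0]).data)
```

Both phases visit each node a bounded number of times and keep only two pointers, so the extra work stays linear in the length of the list.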

Here is the complete gist.