<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">

<channel>
	<title>Mailund on the Internet</title>
	
	<link>http://www.mailund.dk</link>
	<description>Computer science, bioinformatics, genetics, and everything in between</description>
	<lastBuildDate>Tue, 23 Feb 2010 11:28:07 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.feedburner.com/MailundOnTheInternet" /><feedburner:info uri="mailundontheinternet" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><item>
		<title>First day of the workshop</title>
		<link>http://feedproxy.google.com/~r/MailundOnTheInternet/~3/_gC_VgZNF8U/</link>
		<comments>http://www.mailund.dk/index.php/2010/02/23/first-day-of-the-workshop/#comments</comments>
		<pubDate>Tue, 23 Feb 2010 11:28:07 +0000</pubDate>
		<dc:creator>Thomas Mailund</dc:creator>
				<category><![CDATA[Work]]></category>

		<guid isPermaLink="false">http://www.mailund.dk/?p=2037</guid>
		<description><![CDATA[It&#8217;s seven in the morning and I&#8217;m sitting in my hotel room having coffee.  I got up an hour ago, and now I only have decaf left in my room, so having more coffee seems a bit pointless&#8230;
Anyway, yesterday was the first day of the workshop, and I had great fun.  There were only three [...]]]></description>
			<content:encoded><![CDATA[<p>It&#8217;s seven in the morning and I&#8217;m sitting in my hotel room having coffee.  I got up an hour ago, and now I only have decaf left in my room, so having more coffee seems a bit pointless&#8230;</p>
<p>Anyway, yesterday was the first day of the <a href="http://mbi.osu.edu/2009/ws4description.html">workshop</a>, and I had great fun.  There were only three talks: the first a very interesting one by Graham Coop on competing selective sweeps, the second my own talk on the mathematical framework behind our CoalHMMs, and the last one by Mikkel Schierup on inference in CoalHMMs.  More interesting than the talks, though, were the discussions in the coffee breaks (there&#8217;s a half-hour coffee break between talks, precisely for these discussions).  There are a lot of interesting people here to talk to.</p>
<p>After the last talk there was a welcome reception, sorely lacking in alcohol but good for the discussions still.  When we got back to the hotel we (over?) compensated for the non-alcoholic reception in the hotel bar, so it all turned out ok in the end.</p>
<p>I&#8217;m still jet-lagged, so I went to bed before 22:00 and woke up at six.  I guess I&#8217;m slowly adjusting now, &#8217;cause that is later than the last two mornings.</p>
<p>In an hour we&#8217;re getting on the bus to campus where we will have breakfast and then have another exciting day at MBI. Until then, I&#8217;ll try to survive on decaf.  Why oh why do people drink that stuff?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.mailund.dk/index.php/2010/02/23/first-day-of-the-workshop/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		<feedburner:origLink>http://www.mailund.dk/index.php/2010/02/23/first-day-of-the-workshop/</feedburner:origLink></item>
		<item>
		<title>Sunday in Ohio</title>
		<link>http://feedproxy.google.com/~r/MailundOnTheInternet/~3/MnUxKfbVq_Y/</link>
		<comments>http://www.mailund.dk/index.php/2010/02/21/sunday-in-ohio/#comments</comments>
		<pubDate>Sun, 21 Feb 2010 14:20:45 +0000</pubDate>
		<dc:creator>Thomas Mailund</dc:creator>
				<category><![CDATA[Work]]></category>

		<guid isPermaLink="false">http://www.mailund.dk/?p=2035</guid>
		<description><![CDATA[I&#8217;m sitting in a hotel room in Columbus, Ohio, working on a talk.  I arrived here yesterday evening and I&#8217;m pretty jet-lagged, but coffee and work are keeping me on my feet so far.
I&#8217;m attending a workshop the coming week that I&#8217;ve been looking forward to for a while. Tomorrow I have a talk [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m sitting in a hotel room in Columbus, Ohio, working on a talk.  I arrived here yesterday evening and I&#8217;m pretty jet-lagged, but coffee and work are keeping me on my feet so far.</p>
<p>I&#8217;m attending a <a href="http://mbi.osu.edu/2009/ws4description.html">workshop</a> the coming week that I&#8217;ve been looking forward to for a while. Tomorrow I have a talk there, on the <a href="http://www.mailund.dk/index.php/2009/12/27/on-segment-lengths-going-back-in-time-in-the-coalescence-process-part-3-segment-lengths-of-mrcas-of-two-species/">mathematics behind</a> our latest CoalHMM work, and that talk is what I&#8217;m trying to put together now.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.mailund.dk/index.php/2010/02/21/sunday-in-ohio/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		<feedburner:origLink>http://www.mailund.dk/index.php/2010/02/21/sunday-in-ohio/</feedburner:origLink></item>
		<item>
		<title>Evolution in Health and Medicine</title>
		<link>http://feedproxy.google.com/~r/MailundOnTheInternet/~3/dfOqyV-AIdI/</link>
		<comments>http://www.mailund.dk/index.php/2010/02/09/evolution-in-health-and-medicine/#comments</comments>
		<pubDate>Tue, 09 Feb 2010 17:03:05 +0000</pubDate>
		<dc:creator>Thomas Mailund</dc:creator>
				<category><![CDATA[Fun]]></category>
		<category><![CDATA[Work]]></category>
		<category><![CDATA[genetics]]></category>
		<category><![CDATA[talks]]></category>

		<guid isPermaLink="false">http://www.mailund.dk/?p=2032</guid>
		<description><![CDATA[Hi readers.  Sorry I&#8217;ve been very slow in posting over the last two months.  My RSI kicked in badly in early January and I chose to limit my computer usage to the absolute minimum for a while.  That, combined with a lot of work on various projects, means that I haven&#8217;t been able to blog since around [...]]]></description>
			<content:encoded><![CDATA[<p>Hi readers.  Sorry I&#8217;ve been very slow in posting over the last two months.  My RSI kicked in badly in early January and I chose to limit my computer usage to the absolute minimum for a while.  That, combined with a lot of work on various projects, means that I haven&#8217;t been able to blog since around Christmas&#8230;</p>
<p>It will probably be another few months before I&#8217;m up to speed again.  I still haven&#8217;t recovered fully, but at least it is getting better&#8230;</p>
<p>Anyway, enough excuses!  I&#8217;m posting now just to share this nice list of talks I got by email today: <a href="http://www.nasonline.org/site/PageServer?pagename=Sackler_evolution_health_medicine_program">Evolution in Health and Medicine</a>.</p>
<p>I like the talks there, at least, and I hope you will too.</p>
<p>Stay tuned.  While the posting is at a very low rate right now, I <em>do</em> plan to pick up the speed over the coming weeks&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.mailund.dk/index.php/2010/02/09/evolution-in-health-and-medicine/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		<feedburner:origLink>http://www.mailund.dk/index.php/2010/02/09/evolution-in-health-and-medicine/</feedburner:origLink></item>
		<item>
		<title>The world as we know it</title>
		<link>http://feedproxy.google.com/~r/MailundOnTheInternet/~3/M1FGK12mLSk/</link>
		<comments>http://www.mailund.dk/index.php/2010/01/03/the-world-as-we-know-it/#comments</comments>
		<pubDate>Sun, 03 Jan 2010 10:38:03 +0000</pubDate>
		<dc:creator>Thomas Mailund</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.mailund.dk/?p=2029</guid>
		<description><![CDATA[
]]></description>
			<content:encoded><![CDATA[<p style="text-align: center;"><object width="425" height="344"><param name="movie" value="http://www.youtube.com/v/17jymDn0W6U&#038;fs=1" /><param name="allowFullScreen" value="true" /><param name="allowscriptaccess" value="always" /><embed src="http://www.youtube.com/v/17jymDn0W6U&#038;fs=1" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="425" height="344"></embed></object></p>
]]></content:encoded>
			<wfw:commentRss>http://www.mailund.dk/index.php/2010/01/03/the-world-as-we-know-it/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		<feedburner:origLink>http://www.mailund.dk/index.php/2010/01/03/the-world-as-we-know-it/</feedburner:origLink></item>
		<item>
		<title>Happy new year</title>
		<link>http://feedproxy.google.com/~r/MailundOnTheInternet/~3/9HjoMfz0geU/</link>
		<comments>http://www.mailund.dk/index.php/2009/12/31/happy-new-year-2/#comments</comments>
		<pubDate>Thu, 31 Dec 2009 14:56:40 +0000</pubDate>
		<dc:creator>Thomas Mailund</dc:creator>
				<category><![CDATA[Fun]]></category>

		<guid isPermaLink="false">http://www.mailund.dk/?p=2027</guid>
		<description><![CDATA[yup, that&#8217;s all I had to say right now :)
]]></description>
			<content:encoded><![CDATA[<p>yup, that&#8217;s all I had to say right now :)</p>
]]></content:encoded>
			<wfw:commentRss>http://www.mailund.dk/index.php/2009/12/31/happy-new-year-2/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		<feedburner:origLink>http://www.mailund.dk/index.php/2009/12/31/happy-new-year-2/</feedburner:origLink></item>
		<item>
		<title>On segment lengths, going back in time, in the coalescence process. Part 3: Segment lengths of MRCAs of two species</title>
		<link>http://feedproxy.google.com/~r/MailundOnTheInternet/~3/2OqTvKfinoo/</link>
		<comments>http://www.mailund.dk/index.php/2009/12/27/on-segment-lengths-going-back-in-time-in-the-coalescence-process-part-3-segment-lengths-of-mrcas-of-two-species/#comments</comments>
		<pubDate>Sun, 27 Dec 2009 12:57:37 +0000</pubDate>
		<dc:creator>Thomas Mailund</dc:creator>
				<category><![CDATA[Research]]></category>
		<category><![CDATA[ancestral effective population size]]></category>
		<category><![CDATA[coalHMM]]></category>
		<category><![CDATA[ctmc]]></category>
		<category><![CDATA[population genetics]]></category>
		<category><![CDATA[population genomics]]></category>

		<guid isPermaLink="false">http://www.mailund.dk/?p=2008</guid>
		<description><![CDATA[A while ago I wrote the first two posts in this &#8220;series&#8221; on the coalescent process and how genomic segments behave as a dynamic system back in time.  Read Part 1 (about a single sequence) here and Part 2 (about two sequences) here.  I wanted to write quite a bit more about it, but what [...]]]></description>
			<content:encoded><![CDATA[<p>A while ago I wrote the first two posts in this &#8220;series&#8221; on the coalescent process and how genomic segments behave as a dynamic system back in time.  Read Part 1 (about a single sequence) <a href="http://www.mailund.dk/index.php/2009/02/23/on-segment-lengths-going-back-in-time-in-the-coalescence-process-part-1-the-ancestry-of-a-single-species/">here</a> and Part 2 (about two sequences) <a href="http://www.mailund.dk/index.php/2009/02/27/on-segment-lengths-going-back-in-time-in-the-coalescence-process-part-2-the-ancestry-of-two-species/">here</a>.  I wanted to write quite a bit more about it, but then I managed to turn the ideas into a method for population genomics, and I was so busy developing it and writing a paper about it that I left the blogging for a while.  The paper is now under review, and I have time to write about some of the ideas here again.</p>
<p>As you may recall, we can think of the ancestry of a genomic sequence as a dynamic system that, when run back in time, splits up into different lineages through recombination and then merges again through coalescence events.  This is just the classical coalescence process with recombination, and the result of running it is the so-called Ancestral Recombination Graph, or ARG.</p>
<p>We can use this model both at the population level and at the inter-species level (where the ancestries just go back a lot longer in time).</p>
<p>The mathematics is fairly easy to deal with when simulating &#8212; it is a relatively simple continuous time Markov chain (CTMC) &#8212; but because the state space is rather large (infinite if you consider a continuous segment as the genomic segment), inference is rather difficult, as is working out some of the properties of the system, for example the properties I wrote about in <a href="http://www.mailund.dk/index.php/2009/02/27/on-segment-lengths-going-back-in-time-in-the-coalescence-process-part-2-the-ancestry-of-two-species/">Part 2</a>.</p>
<h3>Markov approximations</h3>
<p>In the work we are doing here at BiRC on &#8220;<a href="http://www.mailund.dk/index.php/2009/07/07/ancestral-population-genomics-the-coalescent-hidden-markov-model-approach/">ancestral population genomics</a>&#8221; we work around this problem by making a simplifying assumption.  We assume that the process is not only a Markov chain in time, but also along the genomic segment.  What this means is that instead of dealing with the enormous &#8212; or infinite &#8212; state space of the coalescent with recombination, we can consider only neighbouring pairs of nucleotides, work out their dynamics, and extrapolate from there using the Markov assumption.</p>
<p>In <a href="http://www.mailund.dk/index.php/2009/02/23/on-segment-lengths-going-back-in-time-in-the-coalescence-process-part-1-the-ancestry-of-a-single-species/">Part 1</a>, I described a simple example of this for the dynamics of a single genome going back in time, and explained that the Markov assumption is not exact but still gives a reasonable approximation of the coalescent process.</p>
<p>In this post I am going to approximate the ancestry of two (haploid) genomes through a CTMC modeling two neighbouring nucleotides from each of the genomes.</p>
<p>The states of the system are shown below:</p>
<p><a href="http://www.mailund.dk/wp-content/uploads/2009/12/Screen-shot-2009-12-27-at-1.07.33-PM.png"><img class="aligncenter size-medium wp-image-2011" title="CTMC states" src="http://www.mailund.dk/wp-content/uploads/2009/12/Screen-shot-2009-12-27-at-1.07.33-PM-300x35.png" alt="CTMC states" width="300" height="35" /></a>Essentially, the states capture all combinations of a left nucleotide being linked or not linked with a right nucleotide (a black line or no line) in the two sequences (top and bottom dots), and of the nucleotides having found their most recent common ancestor (MRCA, the white dots) or not (the black dots).</p>
<p>You can move between the states through recombination events, which remove the links between left and right nucleotides, or through coalescence events, which either add a link between a left and a right nucleotide or join two left or two right nucleotides in their MRCA.</p>
<p>The rate matrix for the system is shown below, where the blank cells are zero entries and the &#8220;-&#8221; cells contain minus the sum of the other entries in their row, as usual.</p>
<p><a href="http://www.mailund.dk/wp-content/uploads/2009/12/Screen-shot-2009-12-27-at-1.11.21-PM.png"><img class="aligncenter size-medium wp-image-2012" title="Rate matrix" src="http://www.mailund.dk/wp-content/uploads/2009/12/Screen-shot-2009-12-27-at-1.11.21-PM-300x161.png" alt="Rate matrix" width="300" height="161" /></a></p>
<p>Here <img src="http://www.mailund.dk/wp-content/cache/tex_0d61f8370cad1d412f80b84d143e1257.png" align="absmiddle" class="tex" alt="C" /> is the coalescent rate and <img src="http://www.mailund.dk/wp-content/cache/tex_e1e1d3d40573127e9ee0480caf1283d6.png" align="absmiddle" class="tex" alt="R" /> the recombination rate.  Typically, we use a time scale in coalescent theory of <img src="http://www.mailund.dk/wp-content/cache/tex_10ecf2ca4b4f4fefe9160ce5a396bce4.png" align="absmiddle" class="tex" alt="2 N_e" /> generations per time unit in which case <img src="http://www.mailund.dk/wp-content/cache/tex_0d61f8370cad1d412f80b84d143e1257.png" align="absmiddle" class="tex" alt="C" /> would be 1.  If <img src="http://www.mailund.dk/wp-content/cache/tex_ce1e8d68519bf70a08ff41fabdd2f9f2.png" align="absmiddle" class="tex" alt="N_e" /> is 10,000 as for humans, then a recombination rate of <img src="http://www.mailund.dk/wp-content/cache/tex_2fdc2e00689c1131c83c66493afbac88.png" align="absmiddle" class="tex" alt="1 cM/Mb" /> would be a rate of <img src="http://www.mailund.dk/wp-content/cache/tex_828a7107b4526a315fd57cca9e646d24.png" align="absmiddle" class="tex" alt="R=0.004" /> on that time scale.</p>
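<p>The &#8220;minus the row sum&#8221; convention for the diagonal is what makes each row of the transition matrix exp(Qt) a probability distribution. Below is a minimal Python sketch of that machinery using only the standard library: it completes a rate matrix from its off-diagonal rates and computes exp(Qt) by scaling and squaring a truncated Taylor series. The actual entries of the two-nucleotide rate matrix are only given in the figure above, so the usage example is a hypothetical two-state chain (not coalesced, coalesced) with coalescence rate C, and the function names are made up for illustration.</p>

```python
import math

def complete_rate_matrix(offdiag):
    """Copy a square matrix of off-diagonal rates and set each diagonal entry
    to minus the sum of the other entries in its row (rows then sum to 0)."""
    n = len(offdiag)
    q = [row[:] for row in offdiag]
    for i in range(n):
        q[i][i] = -sum(q[i][j] for j in range(n) if j != i)
    return q

def mat_mult(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def expm(q, t, terms=20):
    """Transition probabilities P(t) = exp(Q t), via scaling and squaring of a
    truncated Taylor series; adequate for small, well-scaled rate matrices."""
    n = len(q)
    norm = max(sum(abs(x) for x in row) for row in q) * abs(t)
    # Halve the time step until the scaled matrix has norm at most 1/2.
    squarings = max(0, math.ceil(math.log2(norm)) + 1) if norm > 0 else 0
    a = [[x * t / 2 ** squarings for x in row] for row in q]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):  # I + A + A^2/2! + ...
        term = [[x / k for x in row] for row in mat_mult(term, a)]
        result = [[r + s for r, s in zip(rr, tt)] for rr, tt in zip(result, term)]
    for _ in range(squarings):  # exp(A)^(2^s) = exp(Q t)
        result = mat_mult(result, result)
    return result

# Hypothetical two-state chain: 0 = not coalesced, 1 = coalesced (absorbing).
C = 1.0
Q = complete_rate_matrix([[0.0, C], [0.0, 0.0]])
P = expm(Q, 0.5)
print(P[0][0], math.exp(-C * 0.5))  # the two numbers should agree
```

<p>For this toy chain the probability of not having coalesced by time t should come out as exp(-Ct), which is a quick sanity check of the matrix exponential. For larger chains, a library routine such as <code>scipy.linalg.expm</code> would be the more robust choice.</p>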
<p>When considering two genomes back in time, we start in one of the states in <img src="http://www.mailund.dk/wp-content/cache/tex_0cd591a83cddb2c0bb064d812a7c75a2.png" align="absmiddle" class="tex" alt="\Omega_B" /> &#8212; the &#8220;beginning states&#8221;.  If we have two haploid genomes and we start the system at time zero, we would probably start in state 4 where both the genomes are linked, but if we consider an inter-species system we would get some probability distribution over the states in <img src="http://www.mailund.dk/wp-content/cache/tex_0cd591a83cddb2c0bb064d812a7c75a2.png" align="absmiddle" class="tex" alt="\Omega_B" /> from running the single sequence system from Part 1.</p>
<p>The states in <img src="http://www.mailund.dk/wp-content/cache/tex_b1473bdbccdecad53cf9edfd42827f29.png" align="absmiddle" class="tex" alt="\Omega_L" /> (or <img src="http://www.mailund.dk/wp-content/cache/tex_d91679d0662fd61d247b8f4b803c29ab.png" align="absmiddle" class="tex" alt="\Omega_R" />) correspond to states where only the left (or only the right) nucleotides have found their MRCA while the right (or left) have not.  The &#8220;end states&#8221;, <img src="http://www.mailund.dk/wp-content/cache/tex_7ae952d65a9091d63f52f4aa0841d2fb.png" align="absmiddle" class="tex" alt="\Omega_E" />, contain the two states where both left and right nucleotides have found their MRCA; this class is essentially just the single sequence system from Part 1.</p>
<p>Running the system back in time, the probability of being in one of these four classes of states evolves like this:</p>
<p><a href="http://www.mailund.dk/wp-content/uploads/2009/12/Screen-shot-2009-12-27-at-1.24.22-PM.png"><img class="aligncenter size-medium wp-image-2014" title="Dynamics of the CTMC" src="http://www.mailund.dk/wp-content/uploads/2009/12/Screen-shot-2009-12-27-at-1.24.22-PM-288x300.png" alt="Dynamics of the CTMC" width="288" height="300" /></a>Being in either <img src="http://www.mailund.dk/wp-content/cache/tex_b1473bdbccdecad53cf9edfd42827f29.png" align="absmiddle" class="tex" alt="\Omega_L" /> or <img src="http://www.mailund.dk/wp-content/cache/tex_d91679d0662fd61d247b8f4b803c29ab.png" align="absmiddle" class="tex" alt="\Omega_R" /> is always somewhat unlikely, since it requires that a recombination has decoupled a left and right nucleotide before finding a MRCA, and it is more likely that the left and right nucleotides find their MRCA in the same coalescent event, moving directly from <img src="http://www.mailund.dk/wp-content/cache/tex_0cd591a83cddb2c0bb064d812a7c75a2.png" align="absmiddle" class="tex" alt="\Omega_B" /> to <img src="http://www.mailund.dk/wp-content/cache/tex_7ae952d65a9091d63f52f4aa0841d2fb.png" align="absmiddle" class="tex" alt="\Omega_E" />.</p>
<h3>Deep and shallow coalescent segments</h3>
<p>With a finite state CTMC as the one above, we can compute pretty much everything of interest, and there is an extensive theory for how to manipulate these systems.  If we want to say anything about the ancestry of genomes, though, we have to remember that the system only deals with two nucleotides, which in anyone&#8217;s book is quite a bit shorter than a genome.</p>
<p>To get from a pair of nucleotides to a full genome, we use the Markov assumption.  When modeling <img src="http://www.mailund.dk/wp-content/cache/tex_7b8b965ad4bca0e41ab51de7b31363a1.png" align="absmiddle" class="tex" alt="n" /> nucleotides, we assume that the ancestry of nucleotide number <img src="http://www.mailund.dk/wp-content/cache/tex_7b8b965ad4bca0e41ab51de7b31363a1.png" align="absmiddle" class="tex" alt="n" />, conditional on the previous <img src="http://www.mailund.dk/wp-content/cache/tex_a438673491daae8148eae77373b6a467.png" align="absmiddle" class="tex" alt="n-1" /> nucleotides, depends only on nucleotide number <img src="http://www.mailund.dk/wp-content/cache/tex_a438673491daae8148eae77373b6a467.png" align="absmiddle" class="tex" alt="n-1" />, so we can compute the probabilities of those ancestries using our two-nucleotide CTMC.</p>
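<p>To make the Markov assumption concrete, here is a small Python sketch, assuming we already have the joint probabilities of the (left, right) states of two neighbouring nucleotides (in practice these would come from the two-nucleotide CTMC). It turns them into conditional probabilities Pr(right | left) and multiplies them along a sequence. The labels and numbers are hypothetical, and the labels could be any per-nucleotide summary of the ancestry.</p>

```python
from collections import defaultdict

def transition_probs(joint):
    """Turn joint probabilities Pr(left=a, right=b) for two neighbouring
    nucleotides into conditional probabilities Pr(right=b | left=a)."""
    marginal = defaultdict(float)
    for (a, _b), p in joint.items():
        marginal[a] += p
    return {(a, b): p / marginal[a] for (a, b), p in joint.items()}

def sequence_probability(labels, joint):
    """Probability of a sequence of per-nucleotide labels under the Markov
    assumption: Pr(x_1) * Pr(x_2 | x_1) * ... * Pr(x_n | x_(n-1))."""
    trans = transition_probs(joint)
    prob = sum(p for (a, _b), p in joint.items() if a == labels[0])  # Pr(x_1)
    for prev, cur in zip(labels, labels[1:]):
        prob *= trans.get((prev, cur), 0.0)
    return prob

# Hypothetical joint probabilities for (left, right) neighbouring nucleotides.
joint = {
    ("deep", "deep"): 0.5, ("deep", "shallow"): 0.1,
    ("shallow", "deep"): 0.1, ("shallow", "shallow"): 0.3,
}
print(sequence_probability(["deep", "deep", "shallow"], joint))
```

<p>The point of the approximation is visible in the loop: no matter how long the sequence is, only the two-nucleotide table is ever needed.</p>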
<p>For the rest of this post, I&#8217;ll give an example of this, and then give other examples in following posts.  When the paper I have under review now is out, I&#8217;ll also explain how this CTMC can be used to construct a CoalHMM for population genomics.</p>
<p>Anyway, the example: Let us consider the human-chimp-gorilla trio.  We know that these three species are very closely related, to the point where there is extensive <a href="http://www.mailund.dk/index.php/2009/02/12/on-gene-trees-and-species-trees/">incomplete lineage sorting</a>.  This happens when coalescences are deep compared to the speciation times.  If the human and chimp lineages have not found their MRCA before the gorilla speciation (when considering the process back in time), then it is possible that the human coalesces with the gorilla before the chimp, giving us a ((human,gorilla),chimp) genealogy, and similarly the chimp and gorilla lineages can coalesce before reaching the human lineage.</p>
<p>Some things about this system are easy to work out directly from the coalescence process, for example the probabilities of deep coalescences and incomplete lineage sorting.  The coalescence time has a simple exponential distribution, so figuring out the probability that the human and chimp have not coalesced before the gorilla speciation (as a function of effective population sizes and speciation times) is straightforward.</p>
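<p>As a sketch of that calculation in Python, assuming a constant ancestral effective population size and measuring time in units of 2N<sub>e</sub> generations (so two lineages coalesce at rate 1), the probability of no coalescence in the interval between the two speciations is exp(-t). The generation time has to be supplied as an assumption; the 20 years used below is our choice, not a value from this post, while the interval and ancestral N<sub>e</sub> match the values used at the end of the post.</p>

```python
import math

def prob_no_coalescence(interval_years, ancestral_ne, generation_time):
    """Pr(T > tau): probability that the human and chimp lineages, which enter
    the ancestral population at the human-chimp speciation, have not coalesced
    by the gorilla speciation, interval_years further back in time.

    Assumes a constant ancestral effective population size; time is measured
    in units of 2 * Ne generations, where two lineages coalesce at rate 1.
    """
    generations = interval_years / generation_time
    t = generations / (2 * ancestral_ne)
    return math.exp(-t)

# 1 million years and Ne = 50,000; the 20-year generation time is an assumption.
p_deep = prob_no_coalescence(1_000_000, 50_000, 20)
print(p_deep, 1 - p_deep)  # Pr(T > tau) and Pr(T <= tau)
```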
<p>If the lineages have not coalesced before the gorilla speciation, then there are three possible topologies for the genealogy, ((human,chimp),gorilla), ((human,gorilla),chimp) and ((chimp,gorilla),human), each equally likely, and only one of which, ((human,chimp),gorilla), is congruent with the species tree.</p>
<p><a href="http://www.mailund.dk/wp-content/uploads/2009/12/Screen-shot-2009-12-27-at-1.49.36-PM.png"><img class="aligncenter size-medium wp-image-2016" title="Congruence graph" src="http://www.mailund.dk/wp-content/uploads/2009/12/Screen-shot-2009-12-27-at-1.49.36-PM-300x218.png" alt="Congruence graph" width="300" height="218" /></a>So, e.g., to work out the probability of seeing a gene tree equal to the species tree, you just need the probability that human and chimp coalesce before the gorilla speciation time.  The gene tree is congruent whenever they do, and in 1/3 of the cases when they do not.</p>
<p><a href="http://www.mailund.dk/wp-content/uploads/2009/12/Screen-shot-2009-12-27-at-1.52.02-PM.png"><img class="aligncenter size-full wp-image-2017" title="Congruence equation" src="http://www.mailund.dk/wp-content/uploads/2009/12/Screen-shot-2009-12-27-at-1.52.02-PM.png" alt="Congruence equation" width="494" height="52" /></a>All in all, pretty simple, but it only tells us how many nucleotides we expect to see congruent with the species tree, not how they are distributed along the genome.</p>
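<p>The argument above translates directly into a few lines of Python: the gene tree is congruent if human and chimp coalesce within the ancestral branch, and in one third of the remaining cases. This is a sketch assuming the same exponential coalescence time, with t the length of the human-chimp ancestral branch in units of 2N<sub>e</sub> generations; it simplifies to the familiar 1 - (2/3)exp(-t).</p>

```python
import math

def prob_congruent(t):
    """Probability that the human-chimp-gorilla gene tree matches the species
    tree ((human,chimp),gorilla), where t is the length of the human-chimp
    ancestral branch in units of 2 * Ne generations."""
    p_shallow = 1.0 - math.exp(-t)  # human and chimp coalesce within the branch
    p_deep = math.exp(-t)           # they reach the gorilla speciation uncoalesced
    # Deep coalescence: the three topologies are equally likely, one is congruent.
    return p_shallow + p_deep / 3.0

print(prob_congruent(0.5))
```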
<p>Here, I am going to consider the slightly simpler question of segments above or below the speciation time.  It is straightforward to work out the probability that a nucleotide coalesces above the speciation time, <img src="http://www.mailund.dk/wp-content/cache/tex_aa40b62074a188ab2501134f1c2ab128.png" align="absmiddle" class="tex" alt="\Pr(T&gt;\tau)" />, or below it, <img src="http://www.mailund.dk/wp-content/cache/tex_48bb085a14f572f1bcfad4765357ac8e.png" align="absmiddle" class="tex" alt="\Pr(T\leq\tau)" />, but what is the pattern of segments above or below along the genome?</p>
<p>Let&#8217;s call a contiguous segment of nucleotides that all coalesce further back than the gorilla speciation time a &#8220;deep coalescence segment&#8221; and a contiguous segment of nucleotides that all coalesce before the gorilla speciation a &#8220;shallow coalescence segment&#8221;.</p>
<p>Oh, by the way, &#8220;before&#8221; here means closer to the present &#8230; it is confusing, I know, but time runs backwards in coalescence theory.</p>
<p>We can use the Markov approximation to work out the mean length of deep and shallow coalescence segments as a function of the speciation time.</p>
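<p>Under the Markov approximation, we can argue as follows: For deep coalescence segments, consider a left nucleotide in a deep state.  The nucleotide to the right of it is either also deep, or it is shallow.  If it is deep, then the next one can be deep or shallow, and so on.  Under the Markov approximation, the probability of moving from deep to shallow is the same in each such step, so the length of a deep segment is the waiting time until we move from deep to shallow, and this waiting time is geometrically distributed.</p>
<p>For the CTMC states, the left nucleotide is in a deep state if, at the speciation time, we are in one of the states in the classes <img src="http://www.mailund.dk/wp-content/cache/tex_0cd591a83cddb2c0bb064d812a7c75a2.png" align="absmiddle" class="tex" alt="\Omega_B" /> or <img src="http://www.mailund.dk/wp-content/cache/tex_d91679d0662fd61d247b8f4b803c29ab.png" align="absmiddle" class="tex" alt="\Omega_R" /> (the left nucleotides have not yet found their MRCA), while the left nucleotide is in a shallow state if we are in one of the states in the classes <img src="http://www.mailund.dk/wp-content/cache/tex_b1473bdbccdecad53cf9edfd42827f29.png" align="absmiddle" class="tex" alt="\Omega_L" /> or <img src="http://www.mailund.dk/wp-content/cache/tex_7ae952d65a9091d63f52f4aa0841d2fb.png" align="absmiddle" class="tex" alt="\Omega_E" />.</p>
<p>If the left nucleotide is in a deep state, then the right is also in a deep state with probability Pr(<img src="http://www.mailund.dk/wp-content/cache/tex_0cd591a83cddb2c0bb064d812a7c75a2.png" align="absmiddle" class="tex" alt="\Omega_B" />) / (Pr(<img src="http://www.mailund.dk/wp-content/cache/tex_0cd591a83cddb2c0bb064d812a7c75a2.png" align="absmiddle" class="tex" alt="\Omega_B" />) + Pr(<img src="http://www.mailund.dk/wp-content/cache/tex_d91679d0662fd61d247b8f4b803c29ab.png" align="absmiddle" class="tex" alt="\Omega_R" />)), while the right is in a shallow state with probability Pr(<img src="http://www.mailund.dk/wp-content/cache/tex_d91679d0662fd61d247b8f4b803c29ab.png" align="absmiddle" class="tex" alt="\Omega_R" />) / (Pr(<img src="http://www.mailund.dk/wp-content/cache/tex_0cd591a83cddb2c0bb064d812a7c75a2.png" align="absmiddle" class="tex" alt="\Omega_B" />) + Pr(<img src="http://www.mailund.dk/wp-content/cache/tex_d91679d0662fd61d247b8f4b803c29ab.png" align="absmiddle" class="tex" alt="\Omega_R" />)).  Similarly, if the left nucleotide is in a shallow state, the right is in a deep state with probability Pr(<img src="http://www.mailund.dk/wp-content/cache/tex_b1473bdbccdecad53cf9edfd42827f29.png" align="absmiddle" class="tex" alt="\Omega_L" />) / (Pr(<img src="http://www.mailund.dk/wp-content/cache/tex_b1473bdbccdecad53cf9edfd42827f29.png" align="absmiddle" class="tex" alt="\Omega_L" />) + Pr(<img src="http://www.mailund.dk/wp-content/cache/tex_7ae952d65a9091d63f52f4aa0841d2fb.png" align="absmiddle" class="tex" alt="\Omega_E" />)).</p>
<p>The mean length of a deep coalescence segment is then the reciprocal of the deep-to-shallow switching probability, (Pr(<img src="http://www.mailund.dk/wp-content/cache/tex_0cd591a83cddb2c0bb064d812a7c75a2.png" align="absmiddle" class="tex" alt="\Omega_B" />) + Pr(<img src="http://www.mailund.dk/wp-content/cache/tex_d91679d0662fd61d247b8f4b803c29ab.png" align="absmiddle" class="tex" alt="\Omega_R" />)) / Pr(<img src="http://www.mailund.dk/wp-content/cache/tex_d91679d0662fd61d247b8f4b803c29ab.png" align="absmiddle" class="tex" alt="\Omega_R" />), while the mean length of a shallow coalescence segment is (Pr(<img src="http://www.mailund.dk/wp-content/cache/tex_b1473bdbccdecad53cf9edfd42827f29.png" align="absmiddle" class="tex" alt="\Omega_L" />) + Pr(<img src="http://www.mailund.dk/wp-content/cache/tex_7ae952d65a9091d63f52f4aa0841d2fb.png" align="absmiddle" class="tex" alt="\Omega_E" />)) / Pr(<img src="http://www.mailund.dk/wp-content/cache/tex_b1473bdbccdecad53cf9edfd42827f29.png" align="absmiddle" class="tex" alt="\Omega_L" />).</p>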
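<p>The geometric waiting-time argument is easy to check by simulation. The Python sketch below is a hypothetical illustration, not the simulation used for the plots in this post: it runs a two-state deep/shallow Markov chain along a sequence with fixed per-nucleotide switching probabilities and measures the mean lengths of the deep and shallow runs. With a switching probability p, the mean run length should be close to 1/p.</p>

```python
import random
from itertools import groupby

def simulate_labels(n, p_deep_to_shallow, p_shallow_to_deep, start="deep", seed=1):
    """Simulate deep/shallow labels for n nucleotides as a two-state Markov
    chain along the sequence, with fixed per-nucleotide switching probabilities."""
    rng = random.Random(seed)
    labels = [start]
    for _ in range(n - 1):
        current = labels[-1]
        p_switch = p_deep_to_shallow if current == "deep" else p_shallow_to_deep
        if rng.random() < p_switch:
            labels.append("shallow" if current == "deep" else "deep")
        else:
            labels.append(current)
    return labels

def mean_run_lengths(labels):
    """Mean length of the maximal runs of each label."""
    runs = {}
    for label, group in groupby(labels):
        runs.setdefault(label, []).append(sum(1 for _ in group))
    return {label: sum(lengths) / len(lengths) for label, lengths in runs.items()}

# Hypothetical switching probabilities: expect means near 1/0.01 and 1/0.02.
means = mean_run_lengths(simulate_labels(500_000, 0.01, 0.02))
print(means)
```

<p>With switching probabilities 0.01 and 0.02, the mean deep and shallow run lengths should land near 100 and 50 nucleotides.</p>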
<p>Very easy to compute with the CTMC, but pretty hard from the coalescence process.</p>
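<p>As a Python sketch of this computation, reading the classes as defined earlier (in <img src="http://www.mailund.dk/wp-content/cache/tex_b1473bdbccdecad53cf9edfd42827f29.png" align="absmiddle" class="tex" alt="\Omega_L" /> only the left nucleotides have found their MRCA, in <img src="http://www.mailund.dk/wp-content/cache/tex_d91679d0662fd61d247b8f4b803c29ab.png" align="absmiddle" class="tex" alt="\Omega_R" /> only the right ones, in <img src="http://www.mailund.dk/wp-content/cache/tex_0cd591a83cddb2c0bb064d812a7c75a2.png" align="absmiddle" class="tex" alt="\Omega_B" /> neither, and in <img src="http://www.mailund.dk/wp-content/cache/tex_7ae952d65a9091d63f52f4aa0841d2fb.png" align="absmiddle" class="tex" alt="\Omega_E" /> both), a left nucleotide is deep at the speciation time in the first-named classes without a left MRCA and shallow otherwise. The function below takes the four class probabilities and returns the reciprocals of the two switching probabilities. The class probabilities in the usage example are made up for illustration; in practice they would come from running the CTMC up to the speciation time.</p>

```python
def mean_segment_lengths(pr_b, pr_l, pr_r, pr_e):
    """Mean deep and shallow segment lengths (in nucleotides) under the Markov
    approximation, from the class probabilities at the speciation time.
    Omega_B: neither nucleotide coalesced; Omega_L: only the left coalesced;
    Omega_R: only the right coalesced; Omega_E: both coalesced."""
    # Left deep means Omega_B or Omega_R; the right turns shallow in Omega_R.
    p_deep_to_shallow = pr_r / (pr_b + pr_r)
    # Left shallow means Omega_L or Omega_E; the right turns deep in Omega_L.
    p_shallow_to_deep = pr_l / (pr_l + pr_e)
    # Geometric waiting times: the mean length is 1 / switching probability.
    return 1.0 / p_deep_to_shallow, 1.0 / p_shallow_to_deep

# Hypothetical class probabilities at the speciation time (they sum to 1).
deep, shallow = mean_segment_lengths(0.59, 0.005, 0.005, 0.40)
print(deep, shallow)
```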
<p>Of course, it <em>is</em> only an approximation, so we should worry a bit about how accurate it is.  I&#8217;ve simulated some data sets using the coalescence process, using a human-chimp divergence of 4.5 mya, a present day effective population size of humans and chimps of 10,000 or 50,000, and an effective population size of the human-chimp ancestor of 50,000 or 100,000.  The plots below show the distribution of deep and shallow segment lengths as boxplots, their mean lengths as green bullets, and the expected mean lengths computed as above as the blue line.</p>
<p><a href="http://www.mailund.dk/wp-content/uploads/2009/12/10K-50K.png"><img class="aligncenter size-medium wp-image-2020" title="10K-50K" src="http://www.mailund.dk/wp-content/uploads/2009/12/10K-50K-300x209.png" alt="10K-50K" width="300" height="209" /></a><a href="http://www.mailund.dk/wp-content/uploads/2009/12/10K-100K.png"><img class="aligncenter size-medium wp-image-2021" title="10K-100K" src="http://www.mailund.dk/wp-content/uploads/2009/12/10K-100K-300x215.png" alt="10K-100K" width="300" height="215" /></a><a href="http://www.mailund.dk/wp-content/uploads/2009/12/50K-50K.png"><img class="aligncenter size-medium wp-image-2022" title="50K-50K" src="http://www.mailund.dk/wp-content/uploads/2009/12/50K-50K-300x216.png" alt="50K-50K" width="300" height="216" /></a><a href="http://www.mailund.dk/wp-content/uploads/2009/12/50K-100K.png"><img class="aligncenter size-medium wp-image-2023" title="50K-100K" src="http://www.mailund.dk/wp-content/uploads/2009/12/50K-100K-300x195.png" alt="50K-100K" width="300" height="195" /></a>All in all, it is a pretty good fit, so the approximation looks okay.</p>
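<p>The simulation code behind these plots is not shown here, but the bookkeeping step is simple. A minimal Python sketch, assuming we have a per-nucleotide list of human-chimp coalescence times from some coalescent simulation and the speciation time tau on the same time scale, splits the sequence into maximal deep and shallow runs and summarizes their lengths as boxplot inputs. The function names are hypothetical. Note that the first and last runs of a simulated sequence are cut off by the sequence ends, so they slightly underestimate the true segment lengths.</p>

```python
from itertools import groupby
from statistics import mean, quantiles

def segment_lengths(tmrca, tau):
    """Split per-nucleotide coalescence times into maximal runs that are deep
    (T > tau) or shallow (T <= tau); return the run lengths for each kind."""
    lengths = {"deep": [], "shallow": []}
    for is_deep, run in groupby(tmrca, key=lambda t: t > tau):
        lengths["deep" if is_deep else "shallow"].append(sum(1 for _ in run))
    return lengths

def summarize(lengths):
    """Mean and quartiles of each list of segment lengths (boxplot inputs)."""
    out = {}
    for kind, values in lengths.items():
        if len(values) >= 2:  # quantiles needs at least two data points
            out[kind] = {"mean": mean(values), "quartiles": quantiles(values, n=4)}
    return out

lengths = segment_lengths([0.2, 1.5, 1.7, 0.3, 0.1, 2.0], tau=1.0)
print(lengths)  # deep runs [2, 1], shallow runs [1, 2]
print(summarize(lengths))
```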
<p>If we assume that the present day Ne of humans and chimps is 20,000 (a bit high for humans and a bit low for chimps, but on average okay) and that the human-chimp ancestral Ne is 50,000 (as often estimated), then with the gorilla branching off 1 million years before the human-chimp speciation, we expect deep coalescence segments to be ~116bp long on average and shallow coalescence segments ~76bp.</p>
<p>Of course, this is only on average, and the true picture is more complicated, but it does tell us that the genomic relationship between these three species is a patchwork of many short genomic fragments with their MRCA either in the human-chimp ancestral species or in the shared African ape ancestor.</p>
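<p>There is a quick consistency check on these two numbers. A deep left nucleotide corresponds to the classes without a left MRCA and a shallow one to the classes with it, and each mean length is that total probability divided by the probability of the switching class. Because the model is symmetric in left and right, Pr(<img src="http://www.mailund.dk/wp-content/cache/tex_b1473bdbccdecad53cf9edfd42827f29.png" align="absmiddle" class="tex" alt="\Omega_L" />) = Pr(<img src="http://www.mailund.dk/wp-content/cache/tex_d91679d0662fd61d247b8f4b803c29ab.png" align="absmiddle" class="tex" alt="\Omega_R" />), so the ratio of mean deep to mean shallow length reduces to Pr(T &gt; &#964;) / Pr(T &#8804; &#964;). The Python sketch below computes that ratio assuming a constant ancestral Ne and a 20-year generation time; the generation time is our assumption, since the post does not state the value used.</p>

```python
import math

def deep_shallow_length_ratio(interval_years, ancestral_ne, generation_time):
    """Ratio of mean deep to mean shallow segment length under the Markov
    approximation, assuming Pr(Omega_L) == Pr(Omega_R) by symmetry, which
    reduces it to Pr(T > tau) / Pr(T <= tau) with a constant ancestral Ne."""
    t = interval_years / generation_time / (2 * ancestral_ne)
    p_deep = math.exp(-t)
    return p_deep / (1.0 - p_deep)

# 1 million years and ancestral Ne = 50,000 as in the text; the 20-year
# generation time is an assumption.
ratio = deep_shallow_length_ratio(1_000_000, 50_000, 20)
print(round(ratio, 2), round(116 / 76, 2))  # 1.54 1.53
```

<p>Under that assumption the ratio comes out at about 1.54, in the same ballpark as 116/76 &#8776; 1.53.</p>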
]]></content:encoded>
			<wfw:commentRss>http://www.mailund.dk/index.php/2009/12/27/on-segment-lengths-going-back-in-time-in-the-coalescence-process-part-3-segment-lengths-of-mrcas-of-two-species/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.mailund.dk/index.php/2009/12/27/on-segment-lengths-going-back-in-time-in-the-coalescence-process-part-3-segment-lengths-of-mrcas-of-two-species/</feedburner:origLink></item>
		<item>
		<title>Ok, I’m not gonna make it…</title>
		<link>http://feedproxy.google.com/~r/MailundOnTheInternet/~3/HwI3LQ5TUfk/</link>
		<comments>http://www.mailund.dk/index.php/2009/12/27/ok-im-not-gonna-make-it/#comments</comments>
		<pubDate>Sun, 27 Dec 2009 11:05:04 +0000</pubDate>
		<dc:creator>Thomas Mailund</dc:creator>
				<category><![CDATA[Fun]]></category>

		<guid isPermaLink="false">http://www.mailund.dk/?p=2006</guid>
		<description><![CDATA[I set out to post an average of one post per day this year, and I was on track until a month ago.  After that, I pretty much just stopped blogging.  The number of projects I was working on pretty much got out of hand and I was too stressed and it has taken me [...]]]></description>
			<content:encoded><![CDATA[<p>I set out to post an average of one post per day this year, and I was on track until a month ago.  After that, I pretty much just stopped blogging.  The number of projects I was working on got out of hand, I was too stressed, and it has taken me the Christmas holiday to recover.  Still, I&#8217;m so far behind on the blogging, more than 20 posts behind even with this one, that I doubt I&#8217;ll make it for my New Year&#8217;s resolution, but at least I am planning on getting back to blogging again now&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.mailund.dk/index.php/2009/12/27/ok-im-not-gonna-make-it/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		<feedburner:origLink>http://www.mailund.dk/index.php/2009/12/27/ok-im-not-gonna-make-it/</feedburner:origLink></item>
		<item>
		<title>Draft genomes and finished genomes</title>
		<link>http://feedproxy.google.com/~r/MailundOnTheInternet/~3/gcnVMQjpoao/</link>
		<comments>http://www.mailund.dk/index.php/2009/12/08/draft-genomes-and-finished-genomes/#comments</comments>
		<pubDate>Tue, 08 Dec 2009 18:24:53 +0000</pubDate>
		<dc:creator>Thomas Mailund</dc:creator>
				<category><![CDATA[Research]]></category>

		<guid isPermaLink="false">http://www.mailund.dk/?p=2003</guid>
		<description><![CDATA[Finch Talk has a nice post about draft and finished genomes, slightly related to my post from a few days ago.
–
342-340=+2
]]></description>
			<content:encoded><![CDATA[<p>Finch Talk has a <a href="http://www.geospiza.com/finchtalk/2009/12/expeditiously-exponential-genome.html">nice post</a> about draft and finished genomes, slightly related to <a href="http://www.mailund.dk/index.php/2009/12/03/oh-my-has-it-really-been-that-long/">my post from a few days ago</a>.</p>
<p>–</p>
<p>342-340=+2</p>
]]></content:encoded>
			<wfw:commentRss>http://www.mailund.dk/index.php/2009/12/08/draft-genomes-and-finished-genomes/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.mailund.dk/index.php/2009/12/08/draft-genomes-and-finished-genomes/</feedburner:origLink></item>
		<item>
		<title>Some feedback would be nice…</title>
		<link>http://feedproxy.google.com/~r/MailundOnTheInternet/~3/qYnq-8dobm0/</link>
		<comments>http://www.mailund.dk/index.php/2009/12/08/some-feedback-would-be-nice/#comments</comments>
		<pubDate>Tue, 08 Dec 2009 17:21:23 +0000</pubDate>
		<dc:creator>Thomas Mailund</dc:creator>
				<category><![CDATA[Rants]]></category>
		<category><![CDATA[Research]]></category>
		<category><![CDATA[Work]]></category>
		<category><![CDATA[research grant]]></category>
		<category><![CDATA[Research life]]></category>

		<guid isPermaLink="false">http://www.mailund.dk/?p=2001</guid>
		<description><![CDATA[I just got two research grant applications rejected this week.
Nothing wrong with that, really; the success rate where I applied is less than 10%, so I couldn&#8217;t really expect to get the grants.  Still, it annoys me a little bit that the letter I got back is just boilerplate.  &#8220;We receive more applications than [...]]]></description>
			<content:encoded><![CDATA[<p>I just got two research grant applications rejected this week.</p>
<p>Nothing wrong with that, really; the success rate where I applied is less than 10%, so I couldn&#8217;t really expect to get the grants.  Still, it annoys me a little bit that the letter I got back is just boilerplate.  &#8220;We receive more applications than we can fund, and unfortunately we could not fund yours.&#8221;</p>
<p>That doesn&#8217;t tell me if I was close to getting funded or light-years from it.  It certainly doesn&#8217;t help me improve the application if I want to try again.</p>
<p>Some feedback on the applications would really be nice.  If not a full reviewer report, then at least a score or something&#8230;</p>
<p>Oh well, I still have one grant application under review, this one for the European Research Council, and here at least I&#8217;ve already gotten a review report back, with scores and everything, and it looks really good, so I have my fingers crossed&#8230;</p>
<p>–</p>
<p>342-339=+3</p>
]]></content:encoded>
			<wfw:commentRss>http://www.mailund.dk/index.php/2009/12/08/some-feedback-would-be-nice/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		<feedburner:origLink>http://www.mailund.dk/index.php/2009/12/08/some-feedback-would-be-nice/</feedburner:origLink></item>
		<item>
		<title>One step closer to commercial space flight</title>
		<link>http://feedproxy.google.com/~r/MailundOnTheInternet/~3/d7SbPCCuJSk/</link>
		<comments>http://www.mailund.dk/index.php/2009/12/08/one-step-closer-to-commercial-space-flight/#comments</comments>
		<pubDate>Tue, 08 Dec 2009 17:09:54 +0000</pubDate>
		<dc:creator>Thomas Mailund</dc:creator>
				<category><![CDATA[Fun]]></category>
		<category><![CDATA[Space]]></category>
		<category><![CDATA[space exploration]]></category>

		<guid isPermaLink="false">http://www.mailund.dk/?p=1999</guid>
		<description><![CDATA[Remember SpaceShipOne, which won the X Prize in 2004?  Well, Virgin Galactic unveiled SpaceShipTwo yesterday.
With SpaceShipTwo, 6 passengers can get a short trip into space, and the plan is to sell such trips.  I don&#8217;t know what the price will be, but I guess it is safe to say that I won&#8217;t be able to get [...]]]></description>
			<content:encoded><![CDATA[<p>Remember SpaceShipOne, which won the X Prize in 2004?  Well, <a href="http://spacefellowship.com/2009/12/07/virgin-galactic-unveils-spaceshiptwo/">Virgin Galactic unveiled SpaceShipTwo yesterday</a>.</p>
<p>With SpaceShipTwo, 6 passengers can get a short trip into space, and the plan is to sell such trips.  I don&#8217;t know what the price will be, but I guess it is safe to say that I won&#8217;t be able to afford a trip any time soon.</p>
<p>Still, it is pretty cool, and if they get beyond sub-orbital flight with SpaceShipThree, we might be looking at the beginning of affordable space flight pretty soon.</p>
<p>&#8211;</p>
<p>342-338=+4</p>
]]></content:encoded>
			<wfw:commentRss>http://www.mailund.dk/index.php/2009/12/08/one-step-closer-to-commercial-space-flight/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.mailund.dk/index.php/2009/12/08/one-step-closer-to-commercial-space-flight/</feedburner:origLink></item>
	</channel>
</rss>
